From: Borislav Petkov <bp@alien8.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: "Joshi, Mukul" <Mukul.Joshi@amd.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCHv3 2/2] drm/amdgpu: Register MCE notifier for Aldebaran RAS
Date: Sat, 25 Sep 2021 13:20:57 +0200
Message-ID: <YU8GGSrQSbAZPz4z@zn.tnic>
In-Reply-To: <YU4rAigWIh8g6iOl@yaz-ubuntu>

On Fri, Sep 24, 2021 at 07:46:10PM +0000, Yazen Ghannam wrote:
> I agree with you in general. But this device isn't really a GPU. And
> users of this device seem to want to count *every* error, at least for
> now.

Aha, so something accelerator-y where they do general purpose computation.

So what's the big picture here: they count all the errors and when they
reach a certain amount, they decide to replace the GPUs just in case?

Or wait until they become uncorrectable? But then it doesn't matter
because we will handle it properly by excluding the VRAM range from
further use.

Or do they wanna see *when* they had the correctable errors so that they
can restart the computation, just in case?

Dunno, it would be a lot more helpful if we had some RAS strategy for
those things...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
