Date: Thu, 13 May 2021 16:57:37 +0200
From: Borislav Petkov
To: Alex Deucher
Cc: "Joshi, Mukul", x86-ml, "Kasiviswanathan, Harish", lkml, "amd-gfx@lists.freedesktop.org"
Subject: Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran
References: <20210512013058.6827-1-mukul.joshi@amd.com>

On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> Right. The sys admin can query the bad page count and decide when to
> retire the card.

Yap, although the driver should actively "tell" the sysadmin when some
critical counts of retired VRAM pages are reached because I doubt all
admins would go look at those counts on their own.

Btw, you say "admin" - am I to understand that those are some high end
GPU cards with ECC memory? If consumer grade stuff has this too, then the
driver should very much warn on such levels on its own because normal
users won't know what and where to look.

Other than that, the big picture sounds good to me.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
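
A minimal user-space sketch of the "actively tell the sysadmin" idea from the reply, assuming a single tunable limit on retired VRAM pages and a one-shot warning when that limit is first reached. The struct and function names (bad_page_monitor, record_retired_page, warn_threshold) are hypothetical and are not part of amdgpu; in the driver the printf() would presumably be a dev_warn() or similar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device state; NOT the real amdgpu structures. */
struct bad_page_monitor {
	unsigned int retired_pages;  /* VRAM pages retired so far */
	unsigned int warn_threshold; /* assumed admin-tunable limit */
	bool warned;                 /* warn only once, not on every page */
};

/*
 * Record one more retired page. Returns true exactly once: on the call
 * where the count first reaches the threshold, i.e. when the warning
 * is emitted. Later calls keep counting but stay silent.
 */
bool record_retired_page(struct bad_page_monitor *m)
{
	assert(m != NULL);

	m->retired_pages++;
	if (!m->warned && m->retired_pages >= m->warn_threshold) {
		m->warned = true;
		/* Stand-in for a kernel log message such as dev_warn(). */
		printf("warning: %u VRAM pages retired (threshold %u), consider retiring the card\n",
		       m->retired_pages, m->warn_threshold);
		return true;
	}
	return false;
}
```

With warn_threshold set to 3, the first two retirements are silent, the third prints the warning and returns true, and later retirements keep incrementing the count without repeating the message. Warning once rather than on every page keeps the log readable for the "normal users" case, where nobody is polling the counters.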