From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933141AbcKGSJO (ORCPT ); Mon, 7 Nov 2016 13:09:14 -0500 Received: from mail.skyhub.de ([78.46.96.112]:38811 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932635AbcKGSJI (ORCPT ); Mon, 7 Nov 2016 13:09:08 -0500 Date: Mon, 7 Nov 2016 19:08:54 +0100 From: Borislav Petkov To: "Luck, Tony" Cc: linux-edac , X86 ML , LKML Subject: Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP Message-ID: <20161107180853.4uxlvtoychzhwr2q@pd.tnic> References: <20161101120911.13163-1-bp@alien8.de> <20161105131104.drouavgezizz4w7v@pd.tnic> <3908561D78D1C84285E8C5FCA982C28F3A22720B@ORSMSX114.amr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F3A22720B@ORSMSX114.amr.corp.intel.com> User-Agent: NeoMutt/20161014 (1.7.1) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 07, 2016 at 05:48:46PM +0000, Luck, Tony wrote: > > So, get rid of all that and simply log an MCE with a TSC value always. > > Simplifies the code a bit too. > > I'm not necessarily opposed to this ... but there was once some logic behind when > logged TSC, and when we didn't. Essentially we wanted the TSC when we were > logging from #CMCI or #MC .... because the detection of the error was fresh, and > wanted as much precision on the logged time as possible to compare with logged > errors from other banks/cpus. This might allow us to distinguish multiple errors logged > in the same #CMCI, from errors logged in separate #CMCI a tenth of a second apart. > > If we found the error while polling, we didn’t want to provide a false sense of precision. > The error could have been logged up to five minutes previously (or when logging > errors during the initial poll of the banks an arbitrary time in the past). Right, looks like we've lost that logic: Functions calling this function: machine_check_poll File Function Line 0 mce-inject.c raise_poll 57 machine_check_poll(0, &b); 1 mce.c mce_timer_fn 1358 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks)); 2 mce.c __mcheck_cpu_init_generic 1508 machine_check_poll(MCP_UC | m_fl, &all_banks); 3 mce_intel.c mce_intel_cmci_poll 133 if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned))) 4 mce_intel.c intel_threshold_interrupt 253 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)); 5 mce_intel.c cmci_recheck 345 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)); So the TSC timestamp will be possibly inexact now in mce_timer_fn(), __mcheck_cpu_init_generic(), mce_intel_cmci_poll() and cmci_recheck(). Should we bother and add a flag to struct mce - maybe somewhere in the padding __u8 pad; - to denote that the logged TSC may not be exact? Mind you, there's also m->time = get_seconds(); which also collects time and which could also be possibly inexact. One other possibility would be to use ->time and write ->tsc *only* when exact - i.e., in the handler - and this is then enough info about timing. ->time will give you somewhere around where it happened and ->tsc - only if set - will give you exact, well, *timestamp* :) This sounds like a pretty straightforward logic to me... -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.