LKML Archive mirror
 help / color / mirror / Atom feed
* Re: MCE Bug?
       [not found] <FC9702EC51E4CA40A875703BEBD6CEF801AE738D@SHSMSX101.ccr.corp.intel.com>
@ 2015-06-17  9:41 ` Borislav Petkov
  2015-06-17 17:45   ` Luck, Tony
  2015-06-17 23:53   ` Luck, Tony
  0 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-17  9:41 UTC (permalink / raw
  To: Wang, Rui Y; +Cc: Luck, Tony, Chen, Gong, linux-kernel@vger.kernel.org

Hi Rui,

On Wed, Jun 17, 2015 at 02:04:56AM +0000, Wang, Rui Y wrote:
> Is it a known problem? I'm based on Linux 4.1.0-rc3-7.

Yeah, I triggered a similar thing in conjunction with testing Gong's
stuff:

https://lkml.kernel.org/r/20150611082747.GA30391@pd.tnic

And I was waiting in line to get a chance to do some injection on our
EINJ box here too. But it seems you have the required setup already so
if you want to give those changes a run, I've uploaded them here:

git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras

It'll be much appreciated.

Thanks!

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: MCE Bug?
  2015-06-17  9:41 ` MCE Bug? Borislav Petkov
@ 2015-06-17 17:45   ` Luck, Tony
  2015-06-17 23:53   ` Luck, Tony
  1 sibling, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2015-06-17 17:45 UTC (permalink / raw
  To: Borislav Petkov; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
> And I was waiting in line to get a chance to do some injection on our
> EINJ box here too. But it seems you have the required setup already so
> if you want to give those changes a run, I've uploaded them here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
> 
> It'll be much appreciated.

and the answer is <drum roll> ....



no. :-(

Here's the console log:


[    0.190921] Initializing cgroup subsys blkio
[    0.195701] Initializing cgroup subsys memory
[    0.200621] Initializing cgroup subsys devices
[    0.205591] Initializing cgroup subsys freezer
[    0.210563] Initializing cgroup subsys net_cls
[    0.215535] Initializing cgroup subsys perf_event
[    0.220837] Initializing cgroup subsys hugetlb
[    0.225884] CPU: Physical Processor ID: 0
[    0.230368] CPU: Processor Core ID: 0
[    0.235373] mce: CPU supports 22 MCE banks
[    0.239995] CPU0: Thermal monitoring enabled (TM1)
[    0.245388] process: using mwait in idle threads
[    0.250554] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[    0.257367] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[    0.265023] Freeing SMP alternatives memory: 28K (ffffffff81cd9000 - ffffffff81ce0000)
[    0.284681] ftrace: allocating 25590 entries in 100 pages
[    0.301981] x2apic: IRQ remapping doesn't support X2APIC mode
[    0.308540] Switched APIC routing to physical flat.
[    0.314866] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.331616] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 04)
[    0.342176] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[    0.353531] ... version:                3
[    0.358009] ... bit width:              48
[    0.362576] ... generic registers:      4
[    0.367056] ... value mask:             0000ffffffffffff
[    0.372992] ... max period:             0000ffffffffffff
[    0.378926] ... fixed-purpose events:   3
[    0.383404] ... event mask:             000000070000000f
[    0.392352] x86: Booting SMP configuration:
[    0.397027] .... node  #0, CPUs:          #1
[    0.423364] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.432694]    #2   #3   #4   #5   #6   #7   #8   #9  #10  #11  #12  #13  #14  #15  #16  #17
[    0.706722] .... node  #1, CPUs:    #18  #19  #20  #21  #22  #23  #24  #25  #26  #27  #28  #29  #30  #31  #32  #33  #34  #35
[    1.094522] .... node  #2, CPUs:    #36
[    1.192483] mce: [Hardware Error]: Machine check events logged
[    1.209188]   #37
[    1.209279] BUG: unable to handle kernel 
[    1.215830]   #38
[    1.217982] NULL pointer dereference at 0000000000000008
[    1.223925] IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
[    1.225696]   #39PGD 0 
[    1.233428] Oops: 0000 [#1] SMP 
[    1.237059] Modules linked in:
[    1.240486] CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
[    1.247969]   #40
[    1.247969] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
[    1.261708]   #41
[    1.261711] task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
[    1.272235] RIP: 0010:[<ffffffff810980a1>] [    1.275305]   #42
 [<ffffffff810980a1>] process_one_work+0x31/0x420
[    1.283854] RSP: 0000:ffff88181bd97e08  EFLAGS: 00010046
[    1.289799] RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
[    1.297780]   #43
[    1.297780] RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
[    1.307914] RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
[    1.315892]   #44
[    1.315892] R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
[    1.326023]   #45
[    1.326024] R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
[    1.336151] FS:  0000000000000000(0000) GS:ffff88181f400000(0000) knlGS:0000000000000000
[    1.345207]   #46
[    1.345209] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.353782] CR2: 00000000000000b8 CR3: 00000000019ca000 CR4: 00000000001406e0
[    1.361764]   #47
[    1.361765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.371900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.379879]   #48
[    1.379879] Stack:
[    1.384277]  0000000000000282 00000000bce8578f ffff88181f416f50 ffff88181c2660c0[    1.390883]   #49

[    1.394749]  ffff88181f416ed8 ffff88181f416ec0 ffff88181c284470 ffff88181c2660f0
[    1.403058]  ffff88181bd97eb8 ffffffff81098982[    1.407588]   #50
 ffff88181f4171a0 ffff88181f416ed8
[    1.413538] Call Trace:
[    1.416274]  [<ffffffff81098982>] worker_thread+0x112/0x520
[    1.422518]  [<ffffffff81098870>] ? rescuer_thread+0x3e0/0x3e0
[    1.429048]   #51
[    1.429048]  [<ffffffff8109e6d8>] kthread+0xd8/0xf0
[    1.436650]  [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[    1.443964]   #52
[    1.443966]  [<ffffffff816a0322>] ret_from_fork+0x42/0x70
[    1.452164]  [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[    1.459469]   #53
[    1.459471] Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 [    1.473959] 
[    1.473959] .... node  #3, CPUs:    #54
c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01 
[    1.489433] RIP  [<ffffffff810980a1>] process_one_work+0x31/0x420
[    1.496261]  RSP <ffff88181bd97e08>
[    1.500160] CR2: 0000000000000008
[    1.503872] ---[ end trace 8229a011b97532a0 ]---
[    1.509027] Kernel panic - not syncing: Fatal exception
[    1.514890] ---[ end Kernel panic - not syncing: Fatal exception

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: MCE Bug?
  2015-06-17  9:41 ` MCE Bug? Borislav Petkov
  2015-06-17 17:45   ` Luck, Tony
@ 2015-06-17 23:53   ` Luck, Tony
  2015-06-18 10:25     ` Borislav Petkov
                       ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Luck, Tony @ 2015-06-17 23:53 UTC (permalink / raw
  To: Borislav Petkov, Wang, Rui Y; +Cc: Chen, Gong, linux-kernel@vger.kernel.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2746 bytes --]

> if you want to give those changes a run, I've uploaded them here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras

Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
helps ... but mostly only when I fake some early logs from low numbered  cpus.  I added some
traces to the real case of a left-over fatal error and got this splat:

[    0.331551] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 04)
[    0.342117] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[    0.353471] ... version:                3
[    0.357948] ... bit width:              48
[    0.362523] ... generic registers:      4
[    0.367000] ... value mask:             0000ffffffffffff
[    0.372935] ... max period:             0000ffffffffffff
[    0.378870] ... fixed-purpose events:   3
[    0.383347] ... event mask:             000000070000000f
[    0.392357] x86: Booting SMP configuration:
[    0.397031] .... node  #0, CPUs:          #1
[    0.423373] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.432705]    #2   #3   #4   #5   #6   #7   #8   #9  #10  #11  #12  #13  #14  #15  #16  #17
[    0.706878] .... node  #1, CPUs:    #18  #19  #20  #21  #22  #23  #24  #25  #26  #27  #28  #29  #30  #31  #32  #33  #34  #35
[    1.094625] .... node  #2, CPUs:    #36
[    1.112958] mcelog: cpu 36 bank 8 status be00000000010090
[    1.119201] mcelog() stashed at entry=0
[    1.203602] mce: [Hardware Error]: Machine check events logged
[    1.220313]   #37
[    1.220412] BUG: unable to handle kernel 
[    1.226954]   #38
[    1.229107] NULL pointer dereference at 0000000000000008
[    1.235052] IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
[    1.236829]   #39PGD 0 
[    1.244558] Oops: 0000 [#1] SMP 
[    1.248189] Modules linked in:
[    1.251617] CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #9
[    1.259100]   #40
[    1.259100] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
[    1.272832]   #41
[    1.272833] task: ffff88181c1f4470 ti: ffff88181bd24000 task.ti: ffff88181bd24000
[    1.283350] RIP: 0010:[<ffffffff810980a1>] [    1.286433]   #42
 [<ffffffff810980a1>] process_one_work+0x31/0x420
[    1.294976] RSP: 0000:ffff88181bd27e08  EFLAGS: 00010046

I.e. we die on the first attempt to log ... but that attempt is a long way into bringing up all the cpus. 
CPU#36 is the first one from socket2 (counting 0, 1, 2, 3).

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: MCE Bug?
  2015-06-17 23:53   ` Luck, Tony
@ 2015-06-18 10:25     ` Borislav Petkov
  2015-06-18 13:10     ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
  2015-06-19  9:27     ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 10:25 UTC (permalink / raw
  To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
> 
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered  cpus.  I added some
> traces to the real case of a left-over fatal error and got this splat:

Hmm, and calling mce_log from __mcheck_cpu_init_generic() as you
suggested yesterday seems to work on this box here:

[    1.588713] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (fam: 06, model: 2d, stepping
: 07)
[    1.592727] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, full-w Broken BIOS d
etected, complain to your hardware vendor.
[    1.997344] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[    2.000146] Intel PMU driver.
[    2.001376] ... version:                3
[    2.002919] ... bit width:              48
[    2.004626] ... generic registers:      4
[    2.006137] ... value mask:             0000ffffffffffff
[    2.008064] ... max period:             0000ffffffffffff
[    2.010010] ... fixed-purpose events:   3
[    2.011528] ... event mask:             000000070000000f
[    2.017257] x86: Booting SMP configuration:
[    2.019232] .... node  #0, CPUs:          #1
[    2.033848] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[    2.038730] mce: [Hardware Error]: Machine check events logged
[    2.050735]    #2
[    2.050735] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[    2.056163] mce: [Hardware Error]: Machine check events logged
[    2.068133]    #3
[    2.068140] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[    2.07412.324641] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[    2.479404]    #5

Stuff gets logged just fine, no splats later.

Hmmm, more staring...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] x86/mce: Kill drain_mcelog_buffer()
  2015-06-17 23:53   ` Luck, Tony
  2015-06-18 10:25     ` Borislav Petkov
@ 2015-06-18 13:10     ` Borislav Petkov
  2015-06-19  9:27     ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
  2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 13:10 UTC (permalink / raw
  To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Kill drain_mcelog_buffer()

This used to flush out MCEs logged during early boot and which were in
the MCA registers from a previous system run. No need for that now,
since we're moving to a genpool.

Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h       |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c | 44 ++--------------------------------------
 drivers/acpi/acpi_extlog.c       |  2 +-
 drivers/edac/i7core_edac.c       |  2 +-
 drivers/edac/mce_amd.c           |  2 +-
 drivers/edac/sb_edac.c           |  2 +-
 6 files changed, 7 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfaa4de1dbb4..982dfc3679ad 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -140,7 +140,7 @@ struct mce_vendor_flags {
 extern struct mce_vendor_flags mce_flags;
 
 extern struct mca_config mca_cfg;
-extern void mce_register_decode_chain(struct notifier_block *nb, bool drain);
+extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
 
 #include <linux/percpu.h>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a3660cd1fda3..72b4acef36f4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -196,55 +196,15 @@ void mce_log(struct mce *mce)
 	set_bit(0, &mce_need_notify);
 }
 
-static void drain_mcelog_buffer(void)
-{
-	unsigned int next, i, prev = 0;
-
-	next = ACCESS_ONCE(mcelog.next);
-
-	do {
-		struct mce *m;
-
-		/* drain what was logged during boot */
-		for (i = prev; i < next; i++) {
-			unsigned long start = jiffies;
-			unsigned retries = 1;
-
-			m = &mcelog.entry[i];
-
-			while (!m->finished) {
-				if (time_after_eq(jiffies, start + 2*retries))
-					retries++;
-
-				cpu_relax();
-
-				if (!m->finished && retries >= 4) {
-					pr_err("skipping error being logged currently!\n");
-					break;
-				}
-			}
-			smp_rmb();
-			atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
-		}
-
-		memset(mcelog.entry + prev, 0, (next - prev) * sizeof(*m));
-		prev = next;
-		next = cmpxchg(&mcelog.next, prev, 0);
-	} while (next != prev);
-}
-
 static struct notifier_block mce_srao_nb;
 
-void mce_register_decode_chain(struct notifier_block *nb, bool drain)
+void mce_register_decode_chain(struct notifier_block *nb)
 {
 	/* Ensure SRAO notifier has the highest priority in the decode chain. */
 	if (nb != &mce_srao_nb && nb->priority == INT_MAX)
 		nb->priority -= 1;
 
 	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
-
-	if (drain)
-		drain_mcelog_buffer();
 }
 EXPORT_SYMBOL_GPL(mce_register_decode_chain);
 
@@ -2031,7 +1991,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
-	mce_register_decode_chain(&mce_srao_nb, false);
+	mce_register_decode_chain(&mce_srao_nb);
 	mcheck_vendor_init_severity();
 
 	return 0;
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 07e012e74c1b..b3842ffc19ba 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -286,7 +286,7 @@ static int __init extlog_init(void)
 	 */
 	old_edac_report_status = get_edac_report_status();
 	set_edac_report_status(EDAC_REPORTING_DISABLED);
-	mce_register_decode_chain(&extlog_mce_dec, true);
+	mce_register_decode_chain(&extlog_mce_dec);
 	/* enable OS to be involved to take over management from BIOS */
 	((struct extlog_l1_head *)extlog_l1_addr)->flags |= FLAG_OS_OPTIN;
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 13d77f4a892c..01087a38da22 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -2424,7 +2424,7 @@ static int __init i7core_init(void)
 	pci_rc = pci_register_driver(&i7core_driver);
 
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&i7_mce_dec, true);
+		mce_register_decode_chain(&i7_mce_dec);
 		return 0;
 	}
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index aca31a237073..58586d59bf8e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -895,7 +895,7 @@ static int __init mce_amd_init(void)
 
 	pr_info("MCE: In-kernel MCE decoding enabled.\n");
 
-	mce_register_decode_chain(&amd_mce_dec_nb, true);
+	mce_register_decode_chain(&amd_mce_dec_nb);
 
 	return 0;
 }
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 494a23e4f7e6..1acf57ba4c86 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2556,7 +2556,7 @@ static int __init sbridge_init(void)
 
 	pci_rc = pci_register_driver(&sbridge_driver);
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&sbridge_mce_dec, true);
+		mce_register_decode_chain(&sbridge_mce_dec);
 		if (get_edac_report_status() == EDAC_REPORTING_DISABLED)
 			sbridge_printk(KERN_WARNING, "Loading driver, error reporting disabled.\n");
 		return 0;
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] x86/mce: Initialize workqueues only once
  2015-06-17 23:53   ` Luck, Tony
  2015-06-18 10:25     ` Borislav Petkov
  2015-06-18 13:10     ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
@ 2015-06-19  9:27     ` Borislav Petkov
  2015-06-19 12:24       ` Borislav Petkov
  2015-06-19 17:36       ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
  2 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19  9:27 UTC (permalink / raw
  To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
> 
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered  cpus.  I added some
> traces to the real case of a left-over fatal error and got this splat:

Here's the fix:

--
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Initialize workqueues only once

96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.

And since mcheck_cpu_init() happens very early, using an initcall to do
this one-time initialization doesn't fly. Using work->func as a check
whether the workqueue has been initialized already might break in the
future if someone changes ->func, so doesn't work either.

So let's have a simple static boolean flag to do that one-time work.

  mce: [Hardware Error]: Machine check events logged
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
   PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in:
  CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
  Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
  task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
  RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
  RSP: 0000:ffff88181bd97e08  EFLAGS: 00010046
  RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
  RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
  RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
  R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
  R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
  			     ^^^^^^^^^^^^^^^^^

  27:   4c 0f 45 f2             cmovne %rdx,%r14
  2b:*  49 8b 46 08             mov    0x8(%r14),%rax           <-- trapping instruction
  2f:   44 8b b8 00 01 00 00    mov    0x100(%rax),%r15d

  ...

  Call Trace:
   worker_thread
   ? rescuer_thread
   kthread
   ? kthread_create_on_node
   ret_from_fork
   ? kthread_create_on_node
  Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
  RIP  [<ffffffff810980a1>] process_one_work
   RSP <ffff88181bd97e08>
  CR2: 0000000000000008
  ---[ end trace 8229a011b97532a0 ]---
  Kernel panic - not syncing: Fatal exception
  ---[ end Kernel panic - not syncing: Fatal exception

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..216d44d074df 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1645,6 +1645,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
  */
 void mcheck_cpu_init(struct cpuinfo_x86 *c)
 {
+	static bool __mce_init_once;
+
 	if (mca_cfg.disabled)
 		return;
 
@@ -1665,8 +1667,11 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 		return;
 	}
 
-	INIT_WORK(&mce_work, mce_process_work);
-	init_irq_work(&mce_irq_work, mce_irq_work_cb);
+	if (!__mce_init_once) {
+		INIT_WORK(&mce_work, mce_process_work);
+		init_irq_work(&mce_irq_work, mce_irq_work_cb);
+		__mce_init_once = 1;
+	}
 
 	machine_check_vector = do_machine_check;
 
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Initialize workqueues only once
  2015-06-19  9:27     ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
@ 2015-06-19 12:24       ` Borislav Petkov
  2015-06-19 17:36       ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
  1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 12:24 UTC (permalink / raw
  To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

On Fri, Jun 19, 2015 at 11:27:18AM +0200, Borislav Petkov wrote:
> Here's the fix:

Injected error:

[  201.237241] EINJ: Error INJection is initialized.
[  246.146290] [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0x1234/64/0/3/0]
[  257.868432] Disabling lock debugging due to kernel taint
[  259.808068] mce: [Hardware Error]: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[  257.868432] mce: [Hardware Error]: Machine check events logged
[  259.808068] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81344bbf> {intel_idle+0xbf/0x130}
[  259.808068] mce: [Hardware Error]: TSC 98aa47539f28 ADDR 1200 MISC 142303c86 
[  259.808068] mce: [Hardware Error]: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3 microcode 700
[  259.808068] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
[  259.808068] EDAC sbridge MC3: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[  259.808068] EDAC sbridge MC3: TSC 98aa47539f28 
[  259.808068] EDAC sbridge MC3: ADDR 1200 EDAC sbridge MC3: MISC 142303c86 
[  259.808068] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3
[  259.808068] EDAC MC3: 0 UE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1 offset:0x200 grain:32 -  recoverable area:DRAM err_code:0001:0090 socket:0 channel_mask:4 rank:0)

Let's see whether something's in the MCA MSRs when this monster reboots
(it takes forever...).

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
  2015-06-19  9:27     ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
  2015-06-19 12:24       ` Borislav Petkov
@ 2015-06-19 17:36       ` Luck, Tony
  2015-06-19 19:02         ` Borislav Petkov
  1 sibling, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2015-06-19 17:36 UTC (permalink / raw
  To: Borislav Petkov; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.

Move initialization to mcheck_init() so it happens only once.

  mce: [Hardware Error]: Machine check events logged
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
   PGD 0
  Oops: 0000 [#1] SMP
  Modules linked in:
  CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
  Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
+05/01/2015
  task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
  RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
  RSP: 0000:ffff88181bd97e08  EFLAGS: 00010046
  RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
  RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
  RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
  R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
  R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
                             ^^^^^^^^^^^^^^^^^

  27:   4c 0f 45 f2             cmovne %rdx,%r14
  2b:*  49 8b 46 08             mov    0x8(%r14),%rax           <-- trapping instruction
  2f:   44 8b b8 00 01 00 00    mov    0x100(%rax),%r15d

  ...

  Call Trace:
   worker_thread
   ? rescuer_thread
   kthread
   ? kthread_create_on_node
   ret_from_fork
   ? kthread_create_on_node
  Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
+8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
+89 f8 83 e0 01
  RIP  [<ffffffff810980a1>] process_one_work
   RSP <ffff88181bd97e08>
  CR2: 0000000000000008
  ---[ end trace 8229a011b97532a0 ]---
  Kernel panic - not syncing: Fatal exception
  ---[ end Kernel panic - not syncing: Fatal exception

Reported-by: Rui Wang <rui.y.wang@intel.com>
Debugged-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..158d9e7db974 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 		return;
 	}
 
-	INIT_WORK(&mce_work, mce_process_work);
-	init_irq_work(&mce_irq_work, mce_irq_work_cb);
-
 	machine_check_vector = do_machine_check;
 
 	__mcheck_cpu_init_generic();
@@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
 	mce_register_decode_chain(&mce_srao_nb);
 	mcheck_vendor_init_severity();
 
+	INIT_WORK(&mce_work, mce_process_work);
+	init_irq_work(&mce_irq_work, mce_irq_work_cb);
+
 	return 0;
 }
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
  2015-06-19 17:36       ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
@ 2015-06-19 19:02         ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 19:02 UTC (permalink / raw
  To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org

On Fri, Jun 19, 2015 at 10:36:20AM -0700, Luck, Tony wrote:
> 96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
> per-CPU workqueues in the MCE code but left the initialization per-CPU.
> This lead to early boot time splats (below) in the workqueues code
> because we were overwriting the workqueue during INIT_WORK() on each new
> CPU which would appear.
> 
> Move initialization to mcheck_init() so it happens only once.
> 
>   mce: [Hardware Error]: Machine check events logged
>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>   IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
>    PGD 0
>   Oops: 0000 [#1] SMP
>   Modules linked in:
>   CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
>   Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
> +05/01/2015
>   task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
>   RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
>   RSP: 0000:ffff88181bd97e08  EFLAGS: 00010046
>   RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
>   RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
>   RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
>   R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
>   R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
>                              ^^^^^^^^^^^^^^^^^
> 
>   27:   4c 0f 45 f2             cmovne %rdx,%r14
>   2b:*  49 8b 46 08             mov    0x8(%r14),%rax           <-- trapping instruction
>   2f:   44 8b b8 00 01 00 00    mov    0x100(%rax),%r15d
> 
>   ...
> 
>   Call Trace:
>    worker_thread
>    ? rescuer_thread
>    kthread
>    ? kthread_create_on_node
>    ret_from_fork
>    ? kthread_create_on_node
>   Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
> +8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
> +89 f8 83 e0 01
>   RIP  [<ffffffff810980a1>] process_one_work
>    RSP <ffff88181bd97e08>
>   CR2: 0000000000000008
>   ---[ end trace 8229a011b97532a0 ]---
>   Kernel panic - not syncing: Fatal exception
>   ---[ end Kernel panic - not syncing: Fatal exception
> 
> Reported-by: Rui Wang <rui.y.wang@intel.com>
> Debugged-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 478f81a6d824..158d9e7db974 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
>  		return;
>  	}
>  
> -	INIT_WORK(&mce_work, mce_process_work);
> -	init_irq_work(&mce_irq_work, mce_irq_work_cb);
> -
>  	machine_check_vector = do_machine_check;
>  
>  	__mcheck_cpu_init_generic();
> @@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
>  	mce_register_decode_chain(&mce_srao_nb);
>  	mcheck_vendor_init_severity();
>  
> +	INIT_WORK(&mce_work, mce_process_work);
> +	init_irq_work(&mce_irq_work, mce_irq_work_cb);
> +
>  	return 0;

Hmm, and I was under the impression that mcheck_init() runs much
later... Not really.

Anyway, your version is better, I've replaced mine with it.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-06-19 19:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <FC9702EC51E4CA40A875703BEBD6CEF801AE738D@SHSMSX101.ccr.corp.intel.com>
2015-06-17  9:41 ` MCE Bug? Borislav Petkov
2015-06-17 17:45   ` Luck, Tony
2015-06-17 23:53   ` Luck, Tony
2015-06-18 10:25     ` Borislav Petkov
2015-06-18 13:10     ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
2015-06-19  9:27     ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2015-06-19 12:24       ` Borislav Petkov
2015-06-19 17:36       ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
2015-06-19 19:02         ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).