* Re: MCE Bug?
[not found] <FC9702EC51E4CA40A875703BEBD6CEF801AE738D@SHSMSX101.ccr.corp.intel.com>
@ 2015-06-17 9:41 ` Borislav Petkov
2015-06-17 17:45 ` Luck, Tony
2015-06-17 23:53 ` Luck, Tony
0 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-17 9:41 UTC (permalink / raw
To: Wang, Rui Y; +Cc: Luck, Tony, Chen, Gong, linux-kernel@vger.kernel.org
Hi Rui,
On Wed, Jun 17, 2015 at 02:04:56AM +0000, Wang, Rui Y wrote:
> Is it a known problem? I'm based on Linux 4.1.0-rc3-7.
Yeah, I triggered a similar thing in conjunction with testing Gong's
stuff:
https://lkml.kernel.org/r/20150611082747.GA30391@pd.tnic
And I was waiting in line to get a chance to do some injection on our
EINJ box here too. But it seems you have the required setup already so
if you want to give those changes a run, I've uploaded them here:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
It'll be much appreciated.
Thanks!
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: MCE Bug?
2015-06-17 9:41 ` MCE Bug? Borislav Petkov
@ 2015-06-17 17:45 ` Luck, Tony
2015-06-17 23:53 ` Luck, Tony
1 sibling, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2015-06-17 17:45 UTC (permalink / raw
To: Borislav Petkov; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
> And I was waiting in line to get a chance to do some injection on our
> EINJ box here too. But it seems you have the required setup already so
> if you want to give those changes a run, I've uploaded them here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> It'll be much appreciated.
and the answer is <drum roll> ....
no. :-(
Here's the console log:
[ 0.190921] Initializing cgroup subsys blkio
[ 0.195701] Initializing cgroup subsys memory
[ 0.200621] Initializing cgroup subsys devices
[ 0.205591] Initializing cgroup subsys freezer
[ 0.210563] Initializing cgroup subsys net_cls
[ 0.215535] Initializing cgroup subsys perf_event
[ 0.220837] Initializing cgroup subsys hugetlb
[ 0.225884] CPU: Physical Processor ID: 0
[ 0.230368] CPU: Processor Core ID: 0
[ 0.235373] mce: CPU supports 22 MCE banks
[ 0.239995] CPU0: Thermal monitoring enabled (TM1)
[ 0.245388] process: using mwait in idle threads
[ 0.250554] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[ 0.257367] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[ 0.265023] Freeing SMP alternatives memory: 28K (ffffffff81cd9000 - ffffffff81ce0000)
[ 0.284681] ftrace: allocating 25590 entries in 100 pages
[ 0.301981] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.308540] Switched APIC routing to physical flat.
[ 0.314866] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.331616] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 04)
[ 0.342176] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[ 0.353531] ... version: 3
[ 0.358009] ... bit width: 48
[ 0.362576] ... generic registers: 4
[ 0.367056] ... value mask: 0000ffffffffffff
[ 0.372992] ... max period: 0000ffffffffffff
[ 0.378926] ... fixed-purpose events: 3
[ 0.383404] ... event mask: 000000070000000f
[ 0.392352] x86: Booting SMP configuration:
[ 0.397027] .... node #0, CPUs: #1
[ 0.423364] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.432694] #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
[ 0.706722] .... node #1, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.094522] .... node #2, CPUs: #36
[ 1.192483] mce: [Hardware Error]: Machine check events logged
[ 1.209188] #37
[ 1.209279] BUG: unable to handle kernel
[ 1.215830] #38
[ 1.217982] NULL pointer dereference at 0000000000000008
[ 1.223925] IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.225696] #39PGD 0
[ 1.233428] Oops: 0000 [#1] SMP
[ 1.237059] Modules linked in:
[ 1.240486] CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
[ 1.247969] #40
[ 1.247969] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
[ 1.261708] #41
[ 1.261711] task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
[ 1.272235] RIP: 0010:[<ffffffff810980a1>] [ 1.275305] #42
[<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.283854] RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
[ 1.289799] RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
[ 1.297780] #43
[ 1.297780] RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
[ 1.307914] RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
[ 1.315892] #44
[ 1.315892] R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
[ 1.326023] #45
[ 1.326024] R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
[ 1.336151] FS: 0000000000000000(0000) GS:ffff88181f400000(0000) knlGS:0000000000000000
[ 1.345207] #46
[ 1.345209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.353782] CR2: 00000000000000b8 CR3: 00000000019ca000 CR4: 00000000001406e0
[ 1.361764] #47
[ 1.361765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.371900] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.379879] #48
[ 1.379879] Stack:
[ 1.384277] 0000000000000282 00000000bce8578f ffff88181f416f50 ffff88181c2660c0[ 1.390883] #49
[ 1.394749] ffff88181f416ed8 ffff88181f416ec0 ffff88181c284470 ffff88181c2660f0
[ 1.403058] ffff88181bd97eb8 ffffffff81098982[ 1.407588] #50
ffff88181f4171a0 ffff88181f416ed8
[ 1.413538] Call Trace:
[ 1.416274] [<ffffffff81098982>] worker_thread+0x112/0x520
[ 1.422518] [<ffffffff81098870>] ? rescuer_thread+0x3e0/0x3e0
[ 1.429048] #51
[ 1.429048] [<ffffffff8109e6d8>] kthread+0xd8/0xf0
[ 1.436650] [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[ 1.443964] #52
[ 1.443966] [<ffffffff816a0322>] ret_from_fork+0x42/0x70
[ 1.452164] [<ffffffff8109e600>] ? kthread_create_on_node+0x1b0/0x1b0
[ 1.459469] #53
[ 1.459471] Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 [ 1.473959]
[ 1.473959] .... node #3, CPUs: #54
c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
[ 1.489433] RIP [<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.496261] RSP <ffff88181bd97e08>
[ 1.500160] CR2: 0000000000000008
[ 1.503872] ---[ end trace 8229a011b97532a0 ]---
[ 1.509027] Kernel panic - not syncing: Fatal exception
[ 1.514890] ---[ end Kernel panic - not syncing: Fatal exception
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: MCE Bug?
2015-06-17 9:41 ` MCE Bug? Borislav Petkov
2015-06-17 17:45 ` Luck, Tony
@ 2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
` (2 more replies)
1 sibling, 3 replies; 9+ messages in thread
From: Luck, Tony @ 2015-06-17 23:53 UTC (permalink / raw
To: Borislav Petkov, Wang, Rui Y; +Cc: Chen, Gong, linux-kernel@vger.kernel.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2746 bytes --]
> if you want to give those changes a run, I've uploaded them here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
traces to the real case of a left-over fatal error and got this splat:
[ 0.331551] smpboot: CPU0: Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (fam: 06, model: 3f, stepping: 04)
[ 0.342117] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
[ 0.353471] ... version: 3
[ 0.357948] ... bit width: 48
[ 0.362523] ... generic registers: 4
[ 0.367000] ... value mask: 0000ffffffffffff
[ 0.372935] ... max period: 0000ffffffffffff
[ 0.378870] ... fixed-purpose events: 3
[ 0.383347] ... event mask: 000000070000000f
[ 0.392357] x86: Booting SMP configuration:
[ 0.397031] .... node #0, CPUs: #1
[ 0.423373] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.432705] #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17
[ 0.706878] .... node #1, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.094625] .... node #2, CPUs: #36
[ 1.112958] mcelog: cpu 36 bank 8 status be00000000010090
[ 1.119201] mcelog() stashed at entry=0
[ 1.203602] mce: [Hardware Error]: Machine check events logged
[ 1.220313] #37
[ 1.220412] BUG: unable to handle kernel
[ 1.226954] #38
[ 1.229107] NULL pointer dereference at 0000000000000008
[ 1.235052] IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.236829] #39PGD 0
[ 1.244558] Oops: 0000 [#1] SMP
[ 1.248189] Modules linked in:
[ 1.251617] CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #9
[ 1.259100] #40
[ 1.259100] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
[ 1.272832] #41
[ 1.272833] task: ffff88181c1f4470 ti: ffff88181bd24000 task.ti: ffff88181bd24000
[ 1.283350] RIP: 0010:[<ffffffff810980a1>] [ 1.286433] #42
[<ffffffff810980a1>] process_one_work+0x31/0x420
[ 1.294976] RSP: 0000:ffff88181bd27e08 EFLAGS: 00010046
I.e. we die on the first attempt to log ... but that attempt is a long way into bringing up all the cpus.
CPU#36 is the first one from socket2 (counting 0, 1, 2, 3).
-Tony
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: MCE Bug?
2015-06-17 23:53 ` Luck, Tony
@ 2015-06-18 10:25 ` Borislav Petkov
2015-06-18 13:10 ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 10:25 UTC (permalink / raw
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:
Hmm, and calling mce_log from __mcheck_cpu_init_generic() as you
suggested yesterday seems to work on this box here:
[ 1.588713] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (fam: 06, model: 2d, stepping
: 07)
[ 1.592727] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, full-w Broken BIOS d
etected, complain to your hardware vendor.
[ 1.997344] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[ 2.000146] Intel PMU driver.
[ 2.001376] ... version: 3
[ 2.002919] ... bit width: 48
[ 2.004626] ... generic registers: 4
[ 2.006137] ... value mask: 0000ffffffffffff
[ 2.008064] ... max period: 0000ffffffffffff
[ 2.010010] ... fixed-purpose events: 3
[ 2.011528] ... event mask: 000000070000000f
[ 2.017257] x86: Booting SMP configuration:
[ 2.019232] .... node #0, CPUs: #1
[ 2.033848] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.038730] mce: [Hardware Error]: Machine check events logged
[ 2.050735] #2
[ 2.050735] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.056163] mce: [Hardware Error]: Machine check events logged
[ 2.068133] #3
[ 2.068140] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.07412.324641] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.479404] #5
Stuff gets logged just fine, no splats later.
Hmmm, more staring...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] x86/mce: Kill drain_mcelog_buffer()
2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
@ 2015-06-18 13:10 ` Borislav Petkov
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 13:10 UTC (permalink / raw
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Kill drain_mcelog_buffer()
This used to flush out MCEs logged during early boot and which were in
the MCA registers from a previous system run. No need for that now,
since we're moving to a genpool.
Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/include/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 44 ++--------------------------------------
drivers/acpi/acpi_extlog.c | 2 +-
drivers/edac/i7core_edac.c | 2 +-
drivers/edac/mce_amd.c | 2 +-
drivers/edac/sb_edac.c | 2 +-
6 files changed, 7 insertions(+), 47 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfaa4de1dbb4..982dfc3679ad 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -140,7 +140,7 @@ struct mce_vendor_flags {
extern struct mce_vendor_flags mce_flags;
extern struct mca_config mca_cfg;
-extern void mce_register_decode_chain(struct notifier_block *nb, bool drain);
+extern void mce_register_decode_chain(struct notifier_block *nb);
extern void mce_unregister_decode_chain(struct notifier_block *nb);
#include <linux/percpu.h>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a3660cd1fda3..72b4acef36f4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -196,55 +196,15 @@ void mce_log(struct mce *mce)
set_bit(0, &mce_need_notify);
}
-static void drain_mcelog_buffer(void)
-{
- unsigned int next, i, prev = 0;
-
- next = ACCESS_ONCE(mcelog.next);
-
- do {
- struct mce *m;
-
- /* drain what was logged during boot */
- for (i = prev; i < next; i++) {
- unsigned long start = jiffies;
- unsigned retries = 1;
-
- m = &mcelog.entry[i];
-
- while (!m->finished) {
- if (time_after_eq(jiffies, start + 2*retries))
- retries++;
-
- cpu_relax();
-
- if (!m->finished && retries >= 4) {
- pr_err("skipping error being logged currently!\n");
- break;
- }
- }
- smp_rmb();
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- }
-
- memset(mcelog.entry + prev, 0, (next - prev) * sizeof(*m));
- prev = next;
- next = cmpxchg(&mcelog.next, prev, 0);
- } while (next != prev);
-}
-
static struct notifier_block mce_srao_nb;
-void mce_register_decode_chain(struct notifier_block *nb, bool drain)
+void mce_register_decode_chain(struct notifier_block *nb)
{
/* Ensure SRAO notifier has the highest priority in the decode chain. */
if (nb != &mce_srao_nb && nb->priority == INT_MAX)
nb->priority -= 1;
atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
-
- if (drain)
- drain_mcelog_buffer();
}
EXPORT_SYMBOL_GPL(mce_register_decode_chain);
@@ -2031,7 +1991,7 @@ __setup("mce", mcheck_enable);
int __init mcheck_init(void)
{
mcheck_intel_therm_init();
- mce_register_decode_chain(&mce_srao_nb, false);
+ mce_register_decode_chain(&mce_srao_nb);
mcheck_vendor_init_severity();
return 0;
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 07e012e74c1b..b3842ffc19ba 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -286,7 +286,7 @@ static int __init extlog_init(void)
*/
old_edac_report_status = get_edac_report_status();
set_edac_report_status(EDAC_REPORTING_DISABLED);
- mce_register_decode_chain(&extlog_mce_dec, true);
+ mce_register_decode_chain(&extlog_mce_dec);
/* enable OS to be involved to take over management from BIOS */
((struct extlog_l1_head *)extlog_l1_addr)->flags |= FLAG_OS_OPTIN;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 13d77f4a892c..01087a38da22 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -2424,7 +2424,7 @@ static int __init i7core_init(void)
pci_rc = pci_register_driver(&i7core_driver);
if (pci_rc >= 0) {
- mce_register_decode_chain(&i7_mce_dec, true);
+ mce_register_decode_chain(&i7_mce_dec);
return 0;
}
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index aca31a237073..58586d59bf8e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -895,7 +895,7 @@ static int __init mce_amd_init(void)
pr_info("MCE: In-kernel MCE decoding enabled.\n");
- mce_register_decode_chain(&amd_mce_dec_nb, true);
+ mce_register_decode_chain(&amd_mce_dec_nb);
return 0;
}
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 494a23e4f7e6..1acf57ba4c86 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2556,7 +2556,7 @@ static int __init sbridge_init(void)
pci_rc = pci_register_driver(&sbridge_driver);
if (pci_rc >= 0) {
- mce_register_decode_chain(&sbridge_mce_dec, true);
+ mce_register_decode_chain(&sbridge_mce_dec);
if (get_edac_report_status() == EDAC_REPORTING_DISABLED)
sbridge_printk(KERN_WARNING, "Loading driver, error reporting disabled.\n");
return 0;
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] x86/mce: Initialize workqueues only once
2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
2015-06-18 13:10 ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
@ 2015-06-19 9:27 ` Borislav Petkov
2015-06-19 12:24 ` Borislav Petkov
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
2 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 9:27 UTC (permalink / raw
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:
Here's the fix:
--
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Initialize workqueues only once
96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.
And since mcheck_cpu_init() happens very early, using an initcall to do
this one-time initialization doesn't fly. Using work->func as a check
whether the workqueue has been initialized already might break in the
future if someone changes ->func, so doesn't work either.
So let's have a simple static boolean flag to do that one-time work.
mce: [Hardware Error]: Machine check events logged
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
^^^^^^^^^^^^^^^^^
27: 4c 0f 45 f2 cmovne %rdx,%r14
2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
...
Call Trace:
worker_thread
? rescuer_thread
kthread
? kthread_create_on_node
ret_from_fork
? kthread_create_on_node
Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
RIP [<ffffffff810980a1>] process_one_work
RSP <ffff88181bd97e08>
CR2: 0000000000000008
---[ end trace 8229a011b97532a0 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..216d44d074df 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1645,6 +1645,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
*/
void mcheck_cpu_init(struct cpuinfo_x86 *c)
{
+ static bool __mce_init_once;
+
if (mca_cfg.disabled)
return;
@@ -1665,8 +1667,11 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
return;
}
- INIT_WORK(&mce_work, mce_process_work);
- init_irq_work(&mce_irq_work, mce_irq_work_cb);
+ if (!__mce_init_once) {
+ INIT_WORK(&mce_work, mce_process_work);
+ init_irq_work(&mce_irq_work, mce_irq_work_cb);
+ __mce_init_once = 1;
+ }
machine_check_vector = do_machine_check;
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Initialize workqueues only once
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
@ 2015-06-19 12:24 ` Borislav Petkov
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 12:24 UTC (permalink / raw
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Fri, Jun 19, 2015 at 11:27:18AM +0200, Borislav Petkov wrote:
> Here's the fix:
Injected error:
[ 201.237241] EINJ: Error INJection is initialized.
[ 246.146290] [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0x1234/64/0/3/0]
[ 257.868432] Disabling lock debugging due to kernel taint
[ 259.808068] mce: [Hardware Error]: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[ 257.868432] mce: [Hardware Error]: Machine check events logged
[ 259.808068] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81344bbf> {intel_idle+0xbf/0x130}
[ 259.808068] mce: [Hardware Error]: TSC 98aa47539f28 ADDR 1200 MISC 142303c86
[ 259.808068] mce: [Hardware Error]: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3 microcode 700
[ 259.808068] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
[ 259.808068] EDAC sbridge MC3: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[ 259.808068] EDAC sbridge MC3: TSC 98aa47539f28
[ 259.808068] EDAC sbridge MC3: ADDR 1200 EDAC sbridge MC3: MISC 142303c86
[ 259.808068] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3
[ 259.808068] EDAC MC3: 0 UE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1 offset:0x200 grain:32 - recoverable area:DRAM err_code:0001:0090 socket:0 channel_mask:4 rank:0)
Let's see whether something's in the MCA MSRs when this monster reboots
(it takes forever...).
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2015-06-19 12:24 ` Borislav Petkov
@ 2015-06-19 17:36 ` Luck, Tony
2015-06-19 19:02 ` Borislav Petkov
1 sibling, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2015-06-19 17:36 UTC (permalink / raw
To: Borislav Petkov; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.
Move initialization to mcheck_init() so it happens only once.
mce: [Hardware Error]: Machine check events logged
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
+05/01/2015
task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
^^^^^^^^^^^^^^^^^
27: 4c 0f 45 f2 cmovne %rdx,%r14
2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
...
Call Trace:
worker_thread
? rescuer_thread
kthread
? kthread_create_on_node
ret_from_fork
? kthread_create_on_node
Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
+8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
+89 f8 83 e0 01
RIP [<ffffffff810980a1>] process_one_work
RSP <ffff88181bd97e08>
CR2: 0000000000000008
---[ end trace 8229a011b97532a0 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception
Reported-by: Rui Wang <rui.y.wang@intel.com>
Debugged-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..158d9e7db974 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
return;
}
- INIT_WORK(&mce_work, mce_process_work);
- init_irq_work(&mce_irq_work, mce_irq_work_cb);
-
machine_check_vector = do_machine_check;
__mcheck_cpu_init_generic();
@@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
mce_register_decode_chain(&mce_srao_nb);
mcheck_vendor_init_severity();
+ INIT_WORK(&mce_work, mce_process_work);
+ init_irq_work(&mce_irq_work, mce_irq_work_cb);
+
return 0;
}
--
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
@ 2015-06-19 19:02 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 19:02 UTC (permalink / raw
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Fri, Jun 19, 2015 at 10:36:20AM -0700, Luck, Tony wrote:
> 96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
> per-CPU workqueues in the MCE code but left the initialization per-CPU.
> This lead to early boot time splats (below) in the workqueues code
> because we were overwriting the workqueue during INIT_WORK() on each new
> CPU which would appear.
>
> Move initialization to mcheck_init() so it happens only once.
>
> mce: [Hardware Error]: Machine check events logged
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
> Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
> +05/01/2015
> task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
> RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
> RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
> RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
> RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
> RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
> R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
> R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
> ^^^^^^^^^^^^^^^^^
>
> 27: 4c 0f 45 f2 cmovne %rdx,%r14
> 2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
> 2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
>
> ...
>
> Call Trace:
> worker_thread
> ? rescuer_thread
> kthread
> ? kthread_create_on_node
> ret_from_fork
> ? kthread_create_on_node
> Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
> +8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
> +89 f8 83 e0 01
> RIP [<ffffffff810980a1>] process_one_work
> RSP <ffff88181bd97e08>
> CR2: 0000000000000008
> ---[ end trace 8229a011b97532a0 ]---
> Kernel panic - not syncing: Fatal exception
> ---[ end Kernel panic - not syncing: Fatal exception
>
> Reported-by: Rui Wang <rui.y.wang@intel.com>
> Debugged-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 478f81a6d824..158d9e7db974 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
> return;
> }
>
> - INIT_WORK(&mce_work, mce_process_work);
> - init_irq_work(&mce_irq_work, mce_irq_work_cb);
> -
> machine_check_vector = do_machine_check;
>
> __mcheck_cpu_init_generic();
> @@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
> mce_register_decode_chain(&mce_srao_nb);
> mcheck_vendor_init_severity();
>
> + INIT_WORK(&mce_work, mce_process_work);
> + init_irq_work(&mce_irq_work, mce_irq_work_cb);
> +
> return 0;
Hmm, and I was under the impression that mcheck_init() runs much
later... Not really.
Anyway, your version is better, I've replaced mine with it.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-06-19 19:02 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <FC9702EC51E4CA40A875703BEBD6CEF801AE738D@SHSMSX101.ccr.corp.intel.com>
2015-06-17 9:41 ` MCE Bug? Borislav Petkov
2015-06-17 17:45 ` Luck, Tony
2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
2015-06-18 13:10 ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2015-06-19 12:24 ` Borislav Petkov
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
2015-06-19 19:02 ` Borislav Petkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).