From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753058AbcBCJfq (ORCPT ); Wed, 3 Feb 2016 04:35:46 -0500
Received: from mail-wm0-f52.google.com ([74.125.82.52]:35716 "EHLO
	mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752616AbcBCJfg (ORCPT ); Wed, 3 Feb 2016 04:35:36 -0500
From: Jiri Slaby
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
To: Thomas Gleixner, Petr Mladek
References: <20160120211926.GJ10810@quack.suse.cz>
 <20160120213901.GA755895@devbig084.prn1.facebook.com>
 <20160121095234.GN10810@quack.suse.cz> <56A1817C.10300@oracle.com>
 <20160122160903.GH32380@htj.duckdns.org>
 <1453515623.3734.156.camel@decadent.org.uk>
 <20160126093400.GV24938@quack.suse.cz>
 <20160126111438.GA731@pathway.suse.cz>
Cc: Jan Kara, Ben Hutchings, Tejun Heo, Sasha Levin, Shaohua Li, LKML,
	stable@vger.kernel.org, Daniel Bilik
Message-ID: <56B1C9E4.4020400@suse.cz>
Date: Wed, 3 Feb 2016 10:35:32 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/26/2016, 02:09 PM, Thomas Gleixner wrote:
> On Tue, 26 Jan 2016, Petr Mladek wrote:
>> On Tue 2016-01-26 10:34:00, Jan Kara wrote:
>>> On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
>>>> On Sat, 23 Jan 2016, Ben Hutchings wrote:
>>>>> On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
>>>>>>> Looks like it requires more than trivial backport (I think). Tejun?
>>>>>>
>>>>>> The timer migration has changed quite a bit. Given that we've never
>>>>>> seen vmstat work crashing in 3.18 era, I wonder whether the right
>>>>>> thing to do here is reverting 874bbfe600a6 from 3.18 stable?
>>>>>
>>>>> It's not just 3.18 that has this; 874bbfe600a6 was backported to all
>>>>> stable branches from 3.10 onward. Only the 4.2-ckt branch has
>>>>> 22b886dd10180939.
>>>>
>>>> 22b886dd10180939 fixes a bug which was introduced with the timer wheel
>>>> overhaul in 4.2. So only 4.2/3 should have it backported.
>>>
>>> Thanks for the explanation. So do I understand right that timers are always
>>> run on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6
>>> (to run timer for delayed work on the calling CPU) doesn't make sense there?
>>> If that is true then reverting the commit from older stable kernels is
>>> probably the easiest way to resolve the crashes.
>>
>> The commit 874bbfe600a6 ("workqueue: make sure delayed work run in
>> local cpu") forces the timer to run on the local CPU. It might be correct
>> for vmstat. But I wonder if it might break some other delayed work
>> user that depends on running on different CPU.
>
> The default of add_timer() is to run on the current cpu. It only moves the
> timer to a different cpu when the power saving code says so. So 874bbfe600a6
> enforces that the timer runs on the cpu on which queue_delayed_work() is
> called, but before that commit it was likely that the timer was queued on the
> calling cpu. So there is nothing which can depend on running on a different
> CPU, except callers of queue_delayed_work_on() which provide the target cpu
> explicitly. 874bbfe600a6 does not affect those callers at all.
>
> Now, what's different is:
>
> +        if (cpu == WORK_CPU_UNBOUND)
> +                cpu = raw_smp_processor_id();
>          dwork->cpu = cpu;
>
> So before that change dwork->cpu was set to WORK_CPU_UNBOUND. Now it's set to
> the current cpu, but I can't see how that matters.

What happens in later kernels when the CPU is offlined before the
delayed_work timer fires?
In stable 3.12, with the patch, this scenario results in an oops:

 #5 [ffff8c03fdd63d80] page_fault at ffffffff81523a88
    [exception RIP: __queue_work+121]
    RIP: ffffffff81071989  RSP: ffff8c03fdd63e30  RFLAGS: 00010086
    RAX: ffff88048b96bc00  RBX: ffff8c03e9bcc800  RCX: ffff880473820478
    RDX: 0000000000000400  RSI: 0000000000000004  RDI: ffff880473820458
    RBP: 0000000000000000   R8: ffff8c03fdd71f40   R9: ffff8c03ea4c4002
    R10: 0000000000000000  R11: 0000000000000005  R12: ffff880473820458
    R13: 00000000000000a8  R14: 000000000000e328  R15: 00000000000000a8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8c03fdd63e68] call_timer_fn at ffffffff81065611
 #7 [ffff8c03fdd63e98] run_timer_softirq at ffffffff810663b7
 #8 [ffff8c03fdd63f00] __do_softirq at ffffffff8105e2c5
 #9 [ffff8c03fdd63f68] call_softirq at ffffffff8152cf9c
#10 [ffff8c03fdd63f80] do_softirq at ffffffff81004665
#11 [ffff8c03fdd63fa0] smp_apic_timer_interrupt at ffffffff8152d835
#12 [ffff8c03fdd63fb0] apic_timer_interrupt at ffffffff8152c2dd

The CPU was 168, and that one was offlined in the meantime. So __queue_work
fails at:

        if (!(wq->flags & WQ_UNBOUND))
                pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
        else
                pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
                ^^^                                  ^^^^ NODE is -1
                 \
                  pwq is NULL

        if (last_pool && last_pool != pwq->pool) {  <--- BOOM

Any ideas?

thanks,
-- 
js
suse labs