On Tue, 26 Jan 2016, Petr Mladek wrote:
> On Tue 2016-01-26 10:34:00, Jan Kara wrote:
> > On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
> > > On Sat, 23 Jan 2016, Ben Hutchings wrote:
> > > > On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
> > > > > > Looks like it requires more than trivial backport (I think). Tejun?
> > > > >
> > > > > The timer migration has changed quite a bit.  Given that we've never
> > > > > seen vmstat work crashing in 3.18 era, I wonder whether the right
> > > > > thing to do here is reverting 874bbfe600a6 from 3.18 stable?
> > > >
> > > > It's not just 3.18 that has this; 874bbfe600a6 was backported to all
> > > > stable branches from 3.10 onward.  Only the 4.2-ckt branch has
> > > > 22b886dd10180939.
> > >
> > > 22b886dd10180939 fixes a bug which was introduced with the timer wheel
> > > overhaul in 4.2. So only 4.2/3 should have it backported.
> >
> > Thanks for explanation. So do I understand right that timers are always run
> > on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6 (to
> > run timer for delayed work on the calling CPU) doesn't make sense there? If
> > that is true than reverting the commit from older stable kernels is
> > probably the easiest way to resolve the crashes.
>
> The commit 874bbfe600a6 ("workqueue: make sure delayed work run in
> local cpu") forces the timer to run on the local CPU. It might be correct
> for vmstat. But I wonder if it might break some other delayed work
> user that depends on running on different CPU.

The default of add_timer() is to run on the current cpu. It only moves the
timer to a different cpu when the power saving code says so. So 874bbfe600a6
enforces that the timer runs on the cpu on which queue_delayed_work() is
called, but before that commit it was likely that the timer was queued on the
calling cpu. So there is nothing which can depend on running on a different
CPU, except callers of queue_delayed_work_on() which provide the target cpu
explicitly. 874bbfe600a6 does not affect those callers at all.

Now, what's different is:

+       if (cpu == WORK_CPU_UNBOUND)
+               cpu = raw_smp_processor_id();
        dwork->cpu = cpu;

So before that change dwork->cpu was set to WORK_CPU_UNBOUND. Now it's set to
the current cpu, but I can't see how that matters.

Thanks,

	tglx
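
For illustration only, the dwork->cpu difference described above can be
modelled in plain user-space C. This is a rough sketch, not kernel code:
the MODEL_* constants, struct model_dwork and both model_queue_*() helpers
are hypothetical names made up for this example, and the current_cpu
parameter stands in for raw_smp_processor_id().

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define MODEL_NR_CPUS          8
#define MODEL_WORK_CPU_UNBOUND MODEL_NR_CPUS  /* sentinel: no specific cpu requested */

/* Models only the one field of struct delayed_work discussed here. */
struct model_dwork {
	int cpu;
};

/* Before 874bbfe600a6: the requested cpu is stored unchanged. */
static inline void model_queue_before(struct model_dwork *dwork, int cpu,
				      int current_cpu)
{
	(void)current_cpu;	/* not consulted before the change */
	dwork->cpu = cpu;
}

/* After 874bbfe600a6: an unbound request is resolved to the calling cpu. */
static inline void model_queue_after(struct model_dwork *dwork, int cpu,
				     int current_cpu)
{
	assert(current_cpu >= 0 && current_cpu < MODEL_NR_CPUS);

	if (cpu == MODEL_WORK_CPU_UNBOUND)
		cpu = current_cpu;	/* stands in for raw_smp_processor_id() */
	dwork->cpu = cpu;
}
```

With an unbound request the "before" helper leaves the sentinel in place,
while the "after" helper records the calling cpu. An explicit target cpu, as
passed by queue_delayed_work_on() callers, is stored unchanged in both.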