From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964968AbcAZJuw (ORCPT ); Tue, 26 Jan 2016 04:50:52 -0500
Received: from www.linutronix.de ([62.245.132.108]:33972 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S964851AbcAZJur (ORCPT ); Tue, 26 Jan 2016 04:50:47 -0500
Date: Tue, 26 Jan 2016 10:49:37 +0100 (CET)
From: Thomas Gleixner
To: Jan Kara
cc: Ben Hutchings, Tejun Heo, Sasha Levin, Shaohua Li, LKML,
	stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
In-Reply-To: <20160126093400.GV24938@quack.suse.cz>
Message-ID:
References: <20160120211926.GJ10810@quack.suse.cz>
	<20160120213901.GA755895@devbig084.prn1.facebook.com>
	<20160121095234.GN10810@quack.suse.cz>
	<56A1817C.10300@oracle.com>
	<20160122160903.GH32380@htj.duckdns.org>
	<1453515623.3734.156.camel@decadent.org.uk>
	<20160126093400.GV24938@quack.suse.cz>
User-Agent: Alpine 2.11 (DEB 23 2013-08-11)
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323329-1299937804-1453801778=:3886"
X-Linutronix-Spam-Score: -1.0
X-Linutronix-Spam-Level: -
X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--8323329-1299937804-1453801778=:3886
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8BIT

On Tue, 26 Jan 2016, Jan Kara wrote:
> On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
> > On Sat, 23 Jan 2016, Ben Hutchings wrote:
> > > On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
> > > > > Looks like it requires more than trivial backport (I think). Tejun?
> > > >
> > > > The timer migration has changed quite a bit.
> > > > Given that we've never
> > > > seen vmstat work crashing in 3.18 era, I wonder whether the right
> > > > thing to do here is reverting 874bbfe600a6 from 3.18 stable?
> > >
> > > It's not just 3.18 that has this; 874bbfe600a6 was backported to all
> > > stable branches from 3.10 onward.  Only the 4.2-ckt branch has
> > > 22b886dd10180939.
> >
> > 22b886dd10180939 fixes a bug which was introduced with the timer wheel
> > overhaul in 4.2. So only 4.2/3 should have it backported.
>
> Thanks for explanation. So do I understand right that timers are always run
> on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6 (to
> run timer for delayed work on the calling CPU) doesn't make sense there? If
> that is true than reverting the commit from older stable kernels is
> probably the easiest way to resolve the crashes.

I was merely referring to 22b886dd10180939, which is a bug fix for things we
reworked in the timer wheel core code in 4.2. It's completely unrelated to the
problem at hand.

Non-pinned timers can be migrated due to power saving decisions since
2.6.36. What changed over time is how the decision is made, but the general
principle still applies.

The timer code was completely unchanged between 3.18 and 4.0, and even with
the larger overhaul in 4.2 we did not change the migration logic. We merely
changed the internal implementation of the timer wheel.

I have no idea how 874bbfe600a6 can result in crashing on older kernels.

Can you ask the reporter to enable DEBUG_OBJECTS so we might get an idea what
goes wrong with that timer?

Thanks,

	tglx
--8323329-1299937804-1453801778=:3886--