From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756022AbcBDLHb (ORCPT ); Thu, 4 Feb 2016 06:07:31 -0500
Received: from mail-wm0-f41.google.com ([74.125.82.41]:34468 "EHLO
	mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752904AbcBDLH0 (ORCPT );
	Thu, 4 Feb 2016 06:07:26 -0500
Message-ID: <1454584036.3407.121.camel@gmail.com>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
From: Mike Galbraith
To: Thomas Gleixner
Cc: Tejun Heo, Michal Hocko, Jiri Slaby, Petr Mladek, Jan Kara,
	Ben Hutchings, Sasha Levin, Shaohua Li, LKML,
	stable@vger.kernel.org, Daniel Bilik
Date: Thu, 04 Feb 2016 12:07:16 +0100
In-Reply-To:
References: <20160122160903.GH32380@htj.duckdns.org>
	<1453515623.3734.156.camel@decadent.org.uk>
	<20160126093400.GV24938@quack.suse.cz>
	<20160126111438.GA731@pathway.suse.cz>
	<56B1C9E4.4020400@suse.cz>
	<20160203122855.GB6762@dhcp22.suse.cz>
	<20160203162441.GE14091@mtj.duckdns.org>
	<1454518913.6148.15.camel@gmail.com>
	<20160203170652.GI14091@mtj.duckdns.org>
	<1454580263.3407.114.camel@gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2016-02-04 at 11:46 +0100, Thomas Gleixner wrote:
> On Thu, 4 Feb 2016, Mike Galbraith wrote:
> > I'm also wondering why 22b886dd only applies to kernels >= 4.2.
> >
> >
> > Regardless of the previous CPU a timer was on, add_timer_on()
> > currently simply sets timer->flags to the new CPU. As the caller must
> > be seeing the timer as idle, this is locally fine, but the timer
> > leaving the old base while unlocked can lead to race conditions as
> > follows.
> >
> > Let's say timer was on cpu 0.
> >
> >   cpu 0                                 cpu 1
> >   -----------------------------------------------------------------------------
> >   del_timer(timer) succeeds
> >                                         del_timer(timer)
> >                                           lock_timer_base(timer) locks cpu_0_base
> >   add_timer_on(timer, 1)
> >     spin_lock(&cpu_1_base->lock)
> >     timer->flags set to cpu_1_base
> >     operates on @timer                    operates on @timer
> >
> >
> > What's the difference between...
> >         timer->flags = (timer->flags & ~TIMER_BASEMASK) | cpu;
> > and...
> >         timer_set_base(timer, base);
> >
> > ...that makes that fix unneeded prior to 4.2? We take the same locks
> > in < 4.2 kernels, so seemingly both will diddle concurrently above.
>
> Indeed, you are right.

Whew, thanks for confirming, looking for what the hell I was missing
wasn't going well at all, ate most of my day.

> The same can happen on pre 4.2, just the fix does not apply as we changed the
> internals how the base is managed in the timer itself. Backport below.

Exactly what I did locally.

	-Mike
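
For anyone following along, the interleaving quoted above can be replayed in a small userspace toy. This is only a sketch, not kernel code: `toy_base`, `toy_timer`, `toy_trylock()`, `add_timer_on_unfixed()`, `add_timer_on_fixed()` and `replay_race()` are all made-up names, the "lock" is just an owner field stepped through in a single thread, and the TIMER_MIGRATING handshake that the real migration relies on is left out. It assumes the fix amounts to taking the timer's current base lock before retargeting the timer, which is how the quoted commit message reads.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy stand-in for a per-CPU timer base: only records who holds its lock. */
struct toy_base { int lock_owner; };

/* Toy stand-in for a timer: only records which CPU's base it belongs to. */
struct toy_timer { int base_cpu; };

static struct toy_base bases[2];

/* Take the lock of base_cpu on behalf of by_cpu; fail if it is already held. */
static bool toy_trylock(int base_cpu, int by_cpu)
{
	if (bases[base_cpu].lock_owner != NO_OWNER)
		return false;
	bases[base_cpu].lock_owner = by_cpu;
	return true;
}

static void toy_unlock(int base_cpu)
{
	bases[base_cpu].lock_owner = NO_OWNER;
}

/* Unfixed behaviour: lock only the new base, then retarget the timer. */
static bool add_timer_on_unfixed(struct toy_timer *t, int new_cpu, int by_cpu)
{
	if (!toy_trylock(new_cpu, by_cpu))
		return false;
	t->base_cpu = new_cpu;	/* old base was never locked by us */
	return true;		/* caller now "operates on @timer" */
}

/* Assumed shape of the fix: lock the timer's current base before migrating. */
static bool add_timer_on_fixed(struct toy_timer *t, int new_cpu, int by_cpu)
{
	int old_cpu = t->base_cpu;

	if (!toy_trylock(old_cpu, by_cpu))
		return false;	/* a real spinlock would spin here instead */
	if (old_cpu != new_cpu) {
		toy_unlock(old_cpu);
		if (!toy_trylock(new_cpu, by_cpu))
			return false;
		t->base_cpu = new_cpu;
	}
	return true;
}

/*
 * Replay the quoted interleaving with the timer starting on cpu 0.
 * Returns true if cpu 0 gets to operate on the timer while cpu 1 is
 * still inside its own critical section on the same timer.
 */
bool replay_race(bool use_fix)
{
	struct toy_timer timer = { .base_cpu = 0 };
	bool cpu1_locked, cpu0_in;

	bases[0].lock_owner = NO_OWNER;
	bases[1].lock_owner = NO_OWNER;

	/* cpu 1: del_timer() -> lock_timer_base() locks cpu_0_base */
	cpu1_locked = toy_trylock(timer.base_cpu, 1);
	assert(cpu1_locked);

	/* cpu 0: add_timer_on(timer, 1) while cpu 1 still holds cpu_0_base */
	cpu0_in = use_fix ? add_timer_on_fixed(&timer, 1, 0)
			  : add_timer_on_unfixed(&timer, 1, 0);

	return cpu0_in;
}
```

In this model replay_race(false) reports the overlap, since cpu 0 only ever touches cpu_1_base's lock, while replay_race(true) does not, because cpu 0 is held off at cpu_0_base until cpu 1 lets go. The same reasoning should hold whether the base lives in timer->flags or is set through timer_set_base(), which is the point being made above about pre-4.2 kernels.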