From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757685AbcBDLUg (ORCPT ); Thu, 4 Feb 2016 06:20:36 -0500
Received: from mx2.suse.de ([195.135.220.15]:57042 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757244AbcBDLUc (ORCPT ); Thu, 4 Feb 2016 06:20:32 -0500
Date: Thu, 4 Feb 2016 12:20:44 +0100
From: Jan Kara
To: Thomas Gleixner
Cc: Mike Galbraith, Tejun Heo, Michal Hocko, Jiri Slaby, Petr Mladek,
	Jan Kara, Ben Hutchings, Sasha Levin, Shaohua Li, LKML,
	stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160204112044.GE4956@quack.suse.cz>
References: <20160126093400.GV24938@quack.suse.cz>
	<20160126111438.GA731@pathway.suse.cz>
	<56B1C9E4.4020400@suse.cz>
	<20160203122855.GB6762@dhcp22.suse.cz>
	<20160203162441.GE14091@mtj.duckdns.org>
	<1454518913.6148.15.camel@gmail.com>
	<20160203170652.GI14091@mtj.duckdns.org>
	<1454580263.3407.114.camel@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 04-02-16 11:46:47, Thomas Gleixner wrote:
> On Thu, 4 Feb 2016, Mike Galbraith wrote:
> > On Wed, 2016-02-03 at 12:06 -0500, Tejun Heo wrote:
> > > On Wed, Feb 03, 2016 at 06:01:53PM +0100, Mike Galbraith wrote:
> > > > Hm, so it's ok to queue work to an offline CPU?  What happens if it
> > > > doesn't come back for an eternity or two?
> > >
> > > Right now, it just loses affinity....
> >
> > WRT affinity...
> >
> > Somebody somewhere queues a delayed work, a timer is started on CPUX,
> > work is targeted at CPUX.  Now wash/rinse/repeat mod_delayed_work()
> > along with migrations.  Should __queue_delayed_work() not refrain from
> > altering dwork->cpu once set?
> >
> > I'm also wondering why 22b886dd only applies to kernels >= 4.2.
> >
> >
> > Regardless of the previous CPU a timer was on, add_timer_on()
> > currently simply sets timer->flags to the new CPU.  As the caller must
> > be seeing the timer as idle, this is locally fine, but the timer
> > leaving the old base while unlocked can lead to race conditions as
> > follows.
> >
> > Let's say timer was on cpu 0.
> >
> >   cpu 0                                 cpu 1
> >   -----------------------------------------------------------------------------
> >   del_timer(timer) succeeds
> >                                         del_timer(timer)
> >                                           lock_timer_base(timer) locks cpu_0_base
> >   add_timer_on(timer, 1)
> >     spin_lock(&cpu_1_base->lock)
> >     timer->flags set to cpu_1_base
> >     operates on @timer                    operates on @timer
> >
> >
> > What's the difference between...
> >         timer->flags = (timer->flags & ~TIMER_BASEMASK) | cpu;
> > and...
> >         timer_set_base(timer, base);
> >
> > ...that makes that fix unneeded prior to 4.2?  We take the same locks
> > in < 4.2 kernels, so seemingly both will diddle concurrently above.
>
> Indeed, you are right.
>
> The same can happen on pre 4.2, just the fix does not apply as we changed the
> internals how the base is managed in the timer itself. Backport below.

Thanks for the backport, Thomas, and thanks to Mike for his persistence :).
I've asked my friend who is seeing crashes with 3.18.25 to test whether this
patch fixes the issues. It may take some time, so stay tuned...

								Honza

> 8<----------------------------
>
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -956,13 +956,26 @@ EXPORT_SYMBOL(add_timer);
>   */
>  void add_timer_on(struct timer_list *timer, int cpu)
>  {
> -	struct tvec_base *base = per_cpu(tvec_bases, cpu);
> +	struct tvec_base *new_base = per_cpu(tvec_bases, cpu);
> +	struct tvec_base *base;
>  	unsigned long flags;
>  
>  	timer_stats_timer_set_start_info(timer);
>  	BUG_ON(timer_pending(timer) || !timer->function);
> -	spin_lock_irqsave(&base->lock, flags);
> -	timer_set_base(timer, base);
> +
> +	/*
> +	 * If @timer was on a different CPU, it must be migrated with the
> +	 * old base locked to prevent other operations proceeding with the
> +	 * wrong base locked.  See lock_timer_base().
> +	 */
> +	base = lock_timer_base(timer, &flags);
> +	if (base != new_base) {
> +		timer_set_base(timer, NULL);
> +		spin_unlock(&base->lock);
> +		base = new_base;
> +		spin_lock(&base->lock);
> +		timer_set_base(timer, base);
> +	}
>  	debug_activate(timer, timer->expires);
>  	internal_add_timer(base, timer);
>  	spin_unlock_irqrestore(&base->lock, flags);
>
>
-- 
Jan Kara
SUSE Labs, CR
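
For anyone who wants to poke at the locking pattern outside the kernel, here
is a rough user-space sketch of the "migrate with the old base locked" idea
from the backport above. It is not kernel code: every sketch_* name, the
struct layouts, and the atomic_flag spinlock are made up for illustration,
and interrupt disabling is left out entirely. The only thing it tries to
mirror is the ordering: lock the old base, mark the timer as migrating
(base == NULL), drop the old lock, take the new lock, then publish the new
base.

```c
/* User-space illustration only; all names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_base {
	atomic_flag lock;	/* stand-in for the per-CPU base spinlock */
	int nr_queued;		/* timers queued on this base */
};

struct sketch_timer {
	_Atomic(struct sketch_base *) base;	/* NULL while migrating */
	bool pending;
};

static struct sketch_base bases[2] = {
	{ ATOMIC_FLAG_INIT, 0 },
	{ ATOMIC_FLAG_INIT, 0 },
};

static void sketch_lock(struct sketch_base *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void sketch_unlock(struct sketch_base *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lock whichever base the timer is on, retrying if it moves meanwhile. */
static struct sketch_base *sketch_lock_timer_base(struct sketch_timer *t)
{
	for (;;) {
		struct sketch_base *base = atomic_load(&t->base);

		if (base != NULL) {
			sketch_lock(base);
			if (atomic_load(&t->base) == base)
				return base;	/* still ours, lock held */
			sketch_unlock(base);	/* timer moved, retry */
		}
	}
}

/* Queue the timer on base @cpu, migrating under the old base's lock. */
static void sketch_add_timer_on(struct sketch_timer *t, int cpu)
{
	struct sketch_base *new_base = &bases[cpu];
	struct sketch_base *base = sketch_lock_timer_base(t);

	if (base != new_base) {
		atomic_store(&t->base, (struct sketch_base *)NULL);
		sketch_unlock(base);
		base = new_base;
		sketch_lock(base);
		atomic_store(&t->base, base);
	}
	t->pending = true;
	base->nr_queued++;
	sketch_unlock(base);
}

/* Dequeue the timer if pending; returns whether it was pending. */
static bool sketch_del_timer(struct sketch_timer *t)
{
	struct sketch_base *base = sketch_lock_timer_base(t);
	bool was_pending = t->pending;

	if (was_pending) {
		t->pending = false;
		base->nr_queued--;
	}
	sketch_unlock(base);
	return was_pending;
}
```

A caller would initialise a timer on bases[0], call sketch_add_timer_on(&t, 1),
and then see the timer on bases[1]. A concurrent sketch_del_timer() either
locks the old base before the migration starts or spins in
sketch_lock_timer_base() until the new base has been published, so the two
sides should never operate on the timer under different locks.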