Date: Thu, 4 Feb 2016 08:40:49 +0100
From: Michal Hocko
To: Tejun Heo
Cc: Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara, Ben Hutchings, Sasha Levin, Shaohua Li, LKML, stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160204074049.GA12153@dhcp22.suse.cz>
In-Reply-To: <20160204063723.GB8581@dhcp22.suse.cz>

On Thu 04-02-16 07:37:23, Michal Hocko wrote:
> On Wed 03-02-16 11:59:01, Tejun Heo wrote:
> > On Wed, Feb 03, 2016 at 05:48:52PM +0100, Michal Hocko wrote:
> [...]
> > > anything and add_timer_on also for WORK_CPU_UNBOUND is really required
> > > then we should at least preserve WORK_CPU_UNBOUND in dwork->cpu so that
> > > __queue_work can actually move on to the local CPU properly and handle
> > > the offline cpu properly.
> >
> > delayed_work->cpu is determined on queueing time.  Dealing with
> > offlined cpus at execution is completely fine.  There's no need to
> > "preserve" anything.
>
> I've seen you have posted a fix in the meantime but just for my
> understanding.
> Why is the following not an appropriate fix?
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c579dbab2e36..52bb11cf20d1 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1459,9 +1459,9 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
>
>  	dwork->wq = wq;
>  	/* timer isn't guaranteed to run in this cpu, record earlier */
> +	dwork->cpu = cpu;
>  	if (cpu == WORK_CPU_UNBOUND)
>  		cpu = raw_smp_processor_id();
> -	dwork->cpu = cpu;
>  	timer->expires = jiffies + delay;
>
>  	add_timer_on(timer, cpu);

OK, so after thinking about this some more: this won't really help for a
memoryless CPU, which would still have NUMA_NO_NODE associated with it,
AFAIU. So this is definitely better handled at the unbound_pwq_by_node
level.
-- 
Michal Hocko
SUSE Labs
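
For illustration, the reasoning in the last paragraph can be modelled in
userspace C. This is a minimal sketch, not kernel code and not the fix
referenced in this thread: the two structs are stripped-down stand-ins,
and MAX_NODES, cpu_to_node_model() and pwq_by_node_model() are
hypothetical names. The CPU-to-node table encodes the assumption stated
above (a memoryless CPU maps to NUMA_NO_NODE). Only the value of
NUMA_NO_NODE (-1) mirrors the kernel definition.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value as in the kernel */
#define MAX_NODES 2		/* hypothetical, for this model only */
#define MAX_CPUS 4		/* hypothetical, for this model only */

/* Simplified stand-ins; NOT the real kernel definitions. */
struct pool_workqueue {
	int id;
};

struct workqueue_struct {
	struct pool_workqueue *dfl_pwq;			/* fallback pwq */
	struct pool_workqueue *numa_pwq_tbl[MAX_NODES];	/* one pwq per node */
};

/*
 * Hypothetical CPU -> node map. CPU 3 models a memoryless CPU which,
 * per the assumption in the mail, has NUMA_NO_NODE associated with it.
 */
static int cpu_to_node_model(int cpu)
{
	static const int node_of_cpu[MAX_CPUS] = { 0, 0, 1, NUMA_NO_NODE };

	assert(cpu >= 0 && cpu < MAX_CPUS);
	return node_of_cpu[cpu];
}

/*
 * Lookup with the guard at the by-node level: whatever CPU was recorded
 * at queueing time, a node of NUMA_NO_NODE never indexes the table and
 * falls back to the default pwq instead.
 */
static struct pool_workqueue *pwq_by_node_model(struct workqueue_struct *wq,
						int node)
{
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;

	assert(node >= 0 && node < MAX_NODES);
	return wq->numa_pwq_tbl[node];
}
```

Calling pwq_by_node_model(wq, cpu_to_node_model(3)) returns wq->dfl_pwq
in this model, while CPUs 0 to 2 resolve to their per-node entry.
Reordering the dwork->cpu assignment does not change the outcome for
CPU 3, because the problematic value only appears once the CPU is
translated to a node.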