From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757007AbcBEUrT (ORCPT ); Fri, 5 Feb 2016 15:47:19 -0500
Received: from mail-wm0-f50.google.com ([74.125.82.50]:35304 "EHLO
        mail-wm0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1756776AbcBEUrQ (ORCPT );
        Fri, 5 Feb 2016 15:47:16 -0500
Message-ID: <1454705231.3819.151.camel@gmail.com>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
From: Mike Galbraith
To: Tejun Heo
Cc: Michal Hocko, Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara,
        Ben Hutchings, Sasha Levin, Shaohua Li, LKML,
        stable@vger.kernel.org, Daniel Bilik
Date: Fri, 05 Feb 2016 21:47:11 +0100
In-Reply-To: <20160205164923.GC4401@htj.duckdns.org>
References: <20160126093400.GV24938@quack.suse.cz>
        <20160126111438.GA731@pathway.suse.cz>
        <56B1C9E4.4020400@suse.cz>
        <20160203122855.GB6762@dhcp22.suse.cz>
        <20160203162441.GE14091@mtj.duckdns.org>
        <1454518913.6148.15.camel@gmail.com>
        <20160203170652.GI14091@mtj.duckdns.org>
        <1454551217.3677.27.camel@gmail.com>
        <20160205164923.GC4401@htj.duckdns.org>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2016-02-05 at 11:49 -0500, Tejun Heo wrote:
> Hello, Mike.
>
> On Thu, Feb 04, 2016 at 03:00:17AM +0100, Mike Galbraith wrote:
> > Isn't it the case that, currently at least, each and every spot that
> > requires execution on a specific CPU yet does not take active measures
> > to deal with hotplug events is in fact buggy? The timer code clearly
> > states that the user is responsible, and so do both workqueue.[ch].
>
> Yeah, the usages which require affinity for correctness must flush the
> work items from a cpu down callback.

Good, we agree. Now bear with me a moment..

That very point is what makes it wrong for the workqueue code to ever
target a work item. The instant it does target selection, correctness
may be at stake, it doesn't know, thus it must assume the full onus,
which it has neither the knowledge nor the time to do. That's how we
exploded on node = -1, trying to help out the user by doing his job,
but then not doing the whole job.

IMHO, a better plan is to let the user screw it up all by himself.

	-Mike