From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757874AbcBCRCA (ORCPT ); Wed, 3 Feb 2016 12:02:00 -0500
Received: from mail-wm0-f47.google.com ([74.125.82.47]:33763 "EHLO
	mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756073AbcBCRB6 (ORCPT ); Wed, 3 Feb 2016 12:01:58 -0500
Message-ID: <1454518913.6148.15.camel@gmail.com>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
From: Mike Galbraith
To: Tejun Heo, Michal Hocko
Cc: Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara, Ben Hutchings,
	Sasha Levin, Shaohua Li, LKML, stable@vger.kernel.org, Daniel Bilik
Date: Wed, 03 Feb 2016 18:01:53 +0100
In-Reply-To: <20160203162441.GE14091@mtj.duckdns.org>
References: <20160121095234.GN10810@quack.suse.cz>
	<56A1817C.10300@oracle.com>
	<20160122160903.GH32380@htj.duckdns.org>
	<1453515623.3734.156.camel@decadent.org.uk>
	<20160126093400.GV24938@quack.suse.cz>
	<20160126111438.GA731@pathway.suse.cz>
	<56B1C9E4.4020400@suse.cz>
	<20160203122855.GB6762@dhcp22.suse.cz>
	<20160203162441.GE14091@mtj.duckdns.org>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.16.5
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2016-02-03 at 11:24 -0500, Tejun Heo wrote:
> On Wed, Feb 03, 2016 at 01:28:56PM +0100, Michal Hocko wrote:
> > > The CPU was 168, and that one was offlined in the meantime. So
> > > __queue_work fails at:
> > >     if (!(wq->flags & WQ_UNBOUND))
> > >         pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
> > >     else
> > >         pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
> > >         ^^^                           ^^^^ NODE is -1
> > >           \ pwq is NULL
> > >
> > >     if (last_pool && last_pool != pwq->pool) { <--- BOOM
>
> So, the proper fix here is keeping cpu <-> node mapping stable across
> cpu on/offlining which has been being worked on for a long time now.
> The patchset is pending and it fixes other issues too.

Hm, so it's ok to queue work to an offline CPU?  What happens if it
doesn't come back for an eternity or two?

	-Mike
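
For reference, the failure mode in the quoted trace can be modelled in
user space. What follows is only a minimal sketch under stated
assumptions: toy_wq, pool_wq, toy_cpu_to_node, toy_offline_cpu and
toy_pwq_by_cpu are hypothetical names, NO_NODE stands in for the
kernel's NUMA_NO_NODE, and the fallback to a default entry is one
possible guard, not the pending patchset mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE   (-1)  /* stand-in for the kernel's NUMA_NO_NODE */
#define TOY_CPUS  4     /* hypothetical toy topology: 4 CPUs ...  */
#define TOY_NODES 2     /* ... spread over 2 nodes                */

/* Hypothetical stand-in for a pool_workqueue. */
struct pool_wq {
	int node;
};

/* Hypothetical unbound workqueue: one entry per node plus a default. */
struct toy_wq {
	struct pool_wq *node_pwq[TOY_NODES];
	struct pool_wq *dfl_pwq;
};

/* Toy cpu -> node map; an offlined CPU loses its mapping. */
int toy_cpu_to_node[TOY_CPUS] = { 0, 0, 1, 1 };

/* Model CPU offlining: the mapping for this CPU becomes NO_NODE. */
void toy_offline_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_CPUS);
	toy_cpu_to_node[cpu] = NO_NODE;
}

/*
 * Guarded lookup: if the CPU no longer maps to a node, return the
 * default entry instead of indexing the per-node table with -1.
 * Without this check the caller would go on to dereference a bogus
 * or NULL pointer, which is the "BOOM" in the quoted trace.
 */
struct pool_wq *toy_pwq_by_cpu(struct toy_wq *wq, int cpu)
{
	int node;

	assert(cpu >= 0 && cpu < TOY_CPUS);
	node = toy_cpu_to_node[cpu];
	if (node == NO_NODE)
		return wq->dfl_pwq;
	return wq->node_pwq[node];
}
```

Note that this only keeps the lookup from crashing. It says nothing
about where the work item actually runs once its CPU is gone, which is
the question asked above.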