From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1757886AbcBCQYs (ORCPT ); Wed, 3 Feb 2016 11:24:48 -0500
Received: from mail-yk0-f170.google.com ([209.85.160.170]:35865 "EHLO
    mail-yk0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1757790AbcBCQYo (ORCPT ); Wed, 3 Feb 2016 11:24:44 -0500
Date: Wed, 3 Feb 2016 11:24:41 -0500
From: Tejun Heo
To: Michal Hocko
Cc: Jiri Slaby , Thomas Gleixner , Petr Mladek , Jan Kara , Ben Hutchings ,
    Sasha Levin , Shaohua Li , LKML , stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160203162441.GE14091@mtj.duckdns.org>
References: <20160121095234.GN10810@quack.suse.cz> <56A1817C.10300@oracle.com>
    <20160122160903.GH32380@htj.duckdns.org>
    <1453515623.3734.156.camel@decadent.org.uk>
    <20160126093400.GV24938@quack.suse.cz>
    <20160126111438.GA731@pathway.suse.cz> <56B1C9E4.4020400@suse.cz>
    <20160203122855.GB6762@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160203122855.GB6762@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 03, 2016 at 01:28:56PM +0100, Michal Hocko wrote:
> > The CPU was 168, and that one was offlined in the meantime. So
> > __queue_work fails at:
> >     if (!(wq->flags & WQ_UNBOUND))
> >         pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
> >     else
> >         pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
> >         ^^^                           ^^^^ NODE is -1
> >           \ pwq is NULL
> >
> >     if (last_pool && last_pool != pwq->pool) { <--- BOOM

So, the proper fix here is keeping cpu <-> node mapping stable across
cpu on/offlining, which has been being worked on for a long time now.
The patchset is pending and it fixes other issues too.

> So I think 874bbfe600a6 is really bogus. It should be reverted.
> We already have a proper fix for vmstat 176bed1de5bf ("vmstat: explicitly
> schedule per-cpu work on the CPU we need it to run on"). This
> should be used for the stable trees as a replacement.

It's not bogus. We can't flip a property that has been guaranteed
without any provision for verification. Why do you think vmstat blew
up in the first place? vmstat would be the canary case as it runs
frequently on all systems. It's exactly the sign that we can't break
this guarantee willy-nilly.

Thanks.

-- 
tejun
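
For illustration only, the failure quoted above (cpu_to_node() reporting -1 for an offlined CPU, so the per-node lookup yields no pwq) can be modeled in a small userspace C sketch. Everything here is an assumption made for the example: the toy_* functions, the table sizes and the map layout are hypothetical and are not kernel APIs, and returning NULL for an out-of-range node is a simplification of what the real lookup does. The explicit NULL check is also not the fix discussed in this thread, which is to keep the cpu <-> node mapping stable across on/offlining.

```c
#include <stddef.h>

#define TOY_NR_CPUS   4
#define TOY_NR_NODES  2
#define NUMA_NO_NODE  (-1)   /* same value the kernel uses for "no node" */

/* Hypothetical stand-in for struct pool_workqueue. */
struct toy_pwq {
    int node;
};

/* Toy cpu -> node map; an offlined CPU loses its mapping. */
static int toy_cpu_node_map[TOY_NR_CPUS] = { 0, 0, 1, 1 };

/* One toy pwq per node, indexed by node id. */
static struct toy_pwq toy_node_pwqs[TOY_NR_NODES] = { { 0 }, { 1 } };

/* Models cpu_to_node(): returns NUMA_NO_NODE once the mapping is gone. */
static int toy_cpu_to_node(int cpu)
{
    if (cpu < 0 || cpu >= TOY_NR_CPUS)
        return NUMA_NO_NODE;
    return toy_cpu_node_map[cpu];
}

/* Models the per-node lookup; an invalid node yields no pwq at all. */
static struct toy_pwq *toy_pwq_by_node(int node)
{
    if (node < 0 || node >= TOY_NR_NODES)
        return NULL;
    return &toy_node_pwqs[node];
}

/* Models CPU offlining dropping the cpu -> node mapping. */
static void toy_offline_cpu(int cpu)
{
    if (cpu >= 0 && cpu < TOY_NR_CPUS)
        toy_cpu_node_map[cpu] = NUMA_NO_NODE;
}

/*
 * Models the queueing path. Returns 0 when a pwq was found and -1 at the
 * point where an unguarded pwq->pool dereference would have crashed.
 */
static int toy_queue_work(int cpu)
{
    struct toy_pwq *pwq = toy_pwq_by_node(toy_cpu_to_node(cpu));

    if (pwq == NULL)
        return -1;
    return 0;
}
```

In this model, toy_queue_work(2) succeeds while CPU 2 is mapped to node 1 and fails after toy_offline_cpu(2), which is the sequence described in the quoted report.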