From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S965830AbcBCTCE (ORCPT ); Wed, 3 Feb 2016 14:02:04 -0500
Received: from mail-yk0-f169.google.com ([209.85.160.169]:35132 "EHLO
    mail-yk0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S965408AbcBCTCB (ORCPT ); Wed, 3 Feb 2016 14:02:01 -0500
Date: Wed, 3 Feb 2016 14:01:59 -0500
From: Tejun Heo
To: Thomas Gleixner
Cc: Michal Hocko, Jiri Slaby, Petr Mladek, Jan Kara, Ben Hutchings,
    Sasha Levin, Shaohua Li, LKML, stable@vger.kernel.org, Daniel Bilik
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160203190159.GM14091@mtj.duckdns.org>
References: <20160122160903.GH32380@htj.duckdns.org>
    <1453515623.3734.156.camel@decadent.org.uk>
    <20160126093400.GV24938@quack.suse.cz>
    <20160126111438.GA731@pathway.suse.cz>
    <56B1C9E4.4020400@suse.cz>
    <20160203122855.GB6762@dhcp22.suse.cz>
    <20160203162441.GE14091@mtj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Thomas.

On Wed, Feb 03, 2016 at 07:46:11PM +0100, Thomas Gleixner wrote:
> > > So I think 874bbfe600a6 is really bogus. It should be reverted. We
> > > already have a proper fix for vmstat 176bed1de5bf ("vmstat: explicitly
> > > schedule per-cpu work on the CPU we need it to run on"). This
> > > should be used for the stable trees as a replacement.
> >
> > It's not bogus. We can't flip a property that has been guaranteed
> > without any provision for verification. Why do you think vmstat blew
> > up in the first place? vmstat would be the canary case as it runs
> > frequently on all systems. It's exactly the sign that we can't break
> > this guarantee willy-nilly.
>
> You're in complete failure denial mode once again.

Well, you're in an unnecessary escalation mode as usual. Was the
attitude really necessary? Chill out and read the thread again.
Michal is saying the dwork->cpu assignment was bogus and I was
refuting that.

> Fact is:
>
> That patch breaks stuff because there is no stable cpu -> node mapping
> across cpu on/offlining. As a result this selects unbound_pwq_by_node() on
> node -1.
>
> The reason why you need to do that work->cpu assignment might be legitimate,
> but that does not justify that you expose systems to a lurking out-of-bounds
> access which results in a NULL pointer dereference.
>
> As long as cpu_to_node(cpu) can return -1, we need a sanity check there. And
> we need that now and not at some point in the future when the patches
> establishing a stable cpu -> node mapping are finished.
>
> Stop arguing around a bug which really exists and was exposed by this patch.

Michal brought it up here but there's a different thread where Mike
reported a NUMA_NO_NODE issue and I already posted the fix.

 http://lkml.kernel.org/g/20160203185425.GK14091@mtj.duckdns.org

Thanks.

-- 
tejun
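
For illustration only, here is a minimal user-space sketch of the kind of
sanity check being asked for above: fall back to a default entry instead of
indexing a per-node table with -1. It is not the workqueue code and not the
fix posted in the linked thread. NUMA_NO_NODE is -1 as in the kernel; the
struct layout, MAX_NODES, and the names wq_model, pwq_by_node_checked() and
self_check() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel value for "node unknown"; everything else is a hypothetical model. */
#define NUMA_NO_NODE (-1)
#define MAX_NODES 4

struct pwq {
    int id;
};

struct wq_model {
    struct pwq *dfl_pwq;                 /* fallback when the node is unknown */
    struct pwq *node_pwq_tbl[MAX_NODES]; /* one slot per NUMA node */
};

/*
 * Look up the per-node entry. A negative node (including NUMA_NO_NODE) or a
 * node past the end of the table must never be used as an index, so the
 * default entry is returned instead. An empty slot also falls back.
 */
struct pwq *pwq_by_node_checked(struct wq_model *wq, int node)
{
    if (node < 0 || node >= MAX_NODES)
        return wq->dfl_pwq;
    if (wq->node_pwq_tbl[node] == NULL)
        return wq->dfl_pwq;
    return wq->node_pwq_tbl[node];
}

/* Small usage example: a valid node, an unknown node, and an empty slot. */
void self_check(void)
{
    struct pwq dfl = { 0 }, node1 = { 1 };
    struct wq_model wq = { .dfl_pwq = &dfl };

    wq.node_pwq_tbl[1] = &node1;

    assert(pwq_by_node_checked(&wq, 1) == &node1);
    assert(pwq_by_node_checked(&wq, NUMA_NO_NODE) == &dfl);
    assert(pwq_by_node_checked(&wq, 0) == &dfl);
}
```

The point of the sketch is only that the check sits at the lookup itself, so
callers that pass through whatever cpu_to_node() returned cannot reach the
table with an invalid index.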