From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933676AbcBCQs6 (ORCPT <rfc822;w@1wt.eu>);
	Wed, 3 Feb 2016 11:48:58 -0500
Received: from mail-wm0-f42.google.com ([74.125.82.42]:35820 "EHLO
	mail-wm0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1757087AbcBCQsz (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 3 Feb 2016 11:48:55 -0500
Date: Wed, 3 Feb 2016 17:48:52 +0100
From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>, Thomas Gleixner <tglx@linutronix.de>,
        Petr Mladek <pmladek@suse.com>, Jan Kara <jack@suse.cz>,
        Ben Hutchings <ben@decadent.org.uk>,
        Sasha Levin <sasha.levin@oracle.com>, Shaohua Li <shli@fb.com>,
        LKML <linux-kernel@vger.kernel.org>, stable@vger.kernel.org,
        Daniel Bilik <daniel.bilik@neosystem.cz>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160203164852.GK6757@dhcp22.suse.cz>
References: <56A1817C.10300@oracle.com>
 <20160122160903.GH32380@htj.duckdns.org>
 <1453515623.3734.156.camel@decadent.org.uk>
 <alpine.DEB.2.11.1601231710210.3886@nanos>
 <20160126093400.GV24938@quack.suse.cz>
 <20160126111438.GA731@pathway.suse.cz>
 <alpine.DEB.2.11.1601261352010.3886@nanos>
 <56B1C9E4.4020400@suse.cz>
 <20160203122855.GB6762@dhcp22.suse.cz>
 <20160203162441.GE14091@mtj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160203162441.GE14091@mtj.duckdns.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 03-02-16 11:24:41, Tejun Heo wrote:
> On Wed, Feb 03, 2016 at 01:28:56PM +0100, Michal Hocko wrote:
> > > The CPU was 168, and that one was offlined in the meantime. So
> > > __queue_work fails at:
> > >   if (!(wq->flags & WQ_UNBOUND))
> > >     pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
> > >   else
> > >     pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
> > >     ^^^                           ^^^^ NODE is -1
> > >       \ pwq is NULL
> > > 
> > >   if (last_pool && last_pool != pwq->pool) { <--- BOOM
> 
> So, the proper fix here is keeping cpu <-> node mapping stable across
> cpu on/offlining which has been being worked on for a long time now.
> The patchset is pending and it fixes other issues too.

What if that node was memory offlined as well? It just doesn't make any
sense to stick to the old node when the old cpu went away already. If
add_timer_on really is required for WORK_CPU_UNBOUND as well, then we
should at least preserve WORK_CPU_UNBOUND in dwork->cpu so that
__queue_work can fall back to the local CPU and handle the offlined cpu
properly.
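
To make the dwork->cpu point concrete, here is a rough userland sketch,
not kernel code. All model_* names are made up for illustration, and the
"pinned" variant assumes that the arming CPU is what gets recorded when
the caller passed the unbound sentinel. It only contrasts recording the
arming CPU with keeping the sentinel until queue time:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_NR_CPUS     4
#define MODEL_CPU_UNBOUND MODEL_NR_CPUS  /* sentinel, playing the role of WORK_CPU_UNBOUND */

struct model_dwork {
	int cpu;  /* CPU recorded when the delayed work is armed */
};

static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };

/* Variant A: replace the sentinel with the arming CPU and remember that. */
static void model_arm_pinned(struct model_dwork *dwork, int requested, int local_cpu)
{
	dwork->cpu = (requested == MODEL_CPU_UNBOUND) ? local_cpu : requested;
}

/* Variant B: store the request unchanged, sentinel included. */
static void model_arm_preserving(struct model_dwork *dwork, int requested)
{
	dwork->cpu = requested;
}

/* Timer expiry: pick the CPU to queue on; the sentinel means "wherever we run now". */
static int model_resolve_cpu(const struct model_dwork *dwork, int local_cpu)
{
	return (dwork->cpu == MODEL_CPU_UNBOUND) ? local_cpu : dwork->cpu;
}

/* True if the resolved CPU is still online at queue time. */
static bool model_target_online(const struct model_dwork *dwork, int local_cpu)
{
	return model_cpu_online[model_resolve_cpu(dwork, local_cpu)];
}
```

If a CPU is offlined between arming and expiry, variant A still points
at the dead CPU, while variant B resolves to whichever CPU ends up
running the timer.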

> > So I think 874bbfe600a6 is really bogus. It should be reverted. We
> > already have a proper fix for vmstat 176bed1de5bf ("vmstat: explicitly
> > schedule per-cpu work on the CPU we need it to run on"), which should
> > be used for the stable trees as a replacement.
> 
> It's not bogus.  We can't flip a property that has been guaranteed
> without any provision for verification.  Why do you think vmstat blew
> up in the first place?

Because it needs a strong per-cpu guarantee but used to fail to say so.
My understanding was that this is exactly what queue_delayed_work_on is
for, while WORK_CPU_UNBOUND says that the caller doesn't really insist
on any particular CPU (the local CPU is merely preferred).

-- 
Michal Hocko
SUSE Labs