From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754022AbcBEQt1 (ORCPT <rfc822;w@1wt.eu>);
	Fri, 5 Feb 2016 11:49:27 -0500
Received: from mail-yk0-f169.google.com ([209.85.160.169]:36484 "EHLO
	mail-yk0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751510AbcBEQtY (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 5 Feb 2016 11:49:24 -0500
Date: Fri, 5 Feb 2016 11:49:23 -0500
From: Tejun Heo <tj@kernel.org>
To: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>, Jiri Slaby <jslaby@suse.cz>,
        Thomas Gleixner <tglx@linutronix.de>, Petr Mladek <pmladek@suse.com>,
        Jan Kara <jack@suse.cz>, Ben Hutchings <ben@decadent.org.uk>,
        Sasha Levin <sasha.levin@oracle.com>, Shaohua Li <shli@fb.com>,
        LKML <linux-kernel@vger.kernel.org>, stable@vger.kernel.org,
        Daniel Bilik <daniel.bilik@neosystem.cz>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Message-ID: <20160205164923.GC4401@htj.duckdns.org>
References: <alpine.DEB.2.11.1601231710210.3886@nanos>
 <20160126093400.GV24938@quack.suse.cz>
 <20160126111438.GA731@pathway.suse.cz>
 <alpine.DEB.2.11.1601261352010.3886@nanos>
 <56B1C9E4.4020400@suse.cz>
 <20160203122855.GB6762@dhcp22.suse.cz>
 <20160203162441.GE14091@mtj.duckdns.org>
 <1454518913.6148.15.camel@gmail.com>
 <20160203170652.GI14091@mtj.duckdns.org>
 <1454551217.3677.27.camel@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1454551217.3677.27.camel@gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Mike.

On Thu, Feb 04, 2016 at 03:00:17AM +0100, Mike Galbraith wrote:
> Isn't it the case that, currently at least, each and every spot that
> requires execution on a specific CPU yet does not take active measures
> to deal with hotplug events is in fact buggy?  The timer code clearly
> states that the user is responsible, and so do both workqueue.[ch].

Yeah, the usages which require affinity for correctness must flush
their work items from a CPU down callback.
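
To make the ordering concrete, here is a toy userspace model of that
rule.  It is not kernel code and calls no real workqueue or hotplug
API; every name in it (model_cpu, model_flush_cpu_work,
model_cpu_down) is made up for illustration.  The only point is that
the flush happens before the CPU is marked offline, so nothing that
needed that CPU is left behind.

```c
/* Toy userspace model, not kernel code. Every name here is hypothetical. */
#include <assert.h>   /* only needed by callers that want to check state */
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	bool online;
	int  pending;   /* items queued on this CPU but not yet executed */
	int  executed;  /* items that ran on this CPU */
};

/* Static storage, so all CPUs start offline with zero counters. */
static struct model_cpu model_cpus[MODEL_NR_CPUS];

/* Run everything still pending on @cpu, on @cpu itself. */
static void model_flush_cpu_work(int cpu)
{
	model_cpus[cpu].executed += model_cpus[cpu].pending;
	model_cpus[cpu].pending = 0;
}

/*
 * Stand-in for a CPU down callback: flush first, then go offline.
 * Reversing the two steps would strand the pending items.
 */
static void model_cpu_down(int cpu)
{
	model_flush_cpu_work(cpu);
	model_cpus[cpu].online = false;
}
```

With three items pending on CPU 1, calling model_cpu_down(1) leaves
pending at 0 and executed at 3 before the CPU goes away.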

> I was surprised to hear that some think they have an iron clad
> guarantee, given the null and void clause is prominently displayed.

Nobody is (or at least should be) expecting workqueue to handle
affinity across CPU offlining events.  That is not the problem.  The
problem is that queue_work(wq, work) and
queue_work_on(smp_processor_id(), wq, work) currently behave
identically, and there are likely affinity-for-correctness users
which are doing the former.
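
A toy userspace model of why that matters, under stated assumptions.
This is not kernel code; model_work and the model_queue_* helpers are
hypothetical names, and the "relaxed" variant is only an assumption
about what could happen if the unpinned form were ever allowed to pick
a different CPU.

```c
/* Toy userspace model, not kernel code. Every name here is hypothetical. */
#include <assert.h>   /* only needed by callers that want to check results */
#include <stdbool.h>

struct model_work {
	int  cpu;     /* CPU the item ends up running on */
	bool pinned;  /* true only if the caller named a CPU explicitly */
};

/* Explicit form: the caller states which CPU it needs. */
static void model_queue_work_on(int cpu, struct model_work *w)
{
	w->cpu = cpu;
	w->pinned = true;
}

/* Unpinned form as it behaves today: lands on the local CPU anyway. */
static void model_queue_work_today(int local_cpu, struct model_work *w)
{
	w->cpu = local_cpu;
	w->pinned = false;
}

/*
 * Unpinned form under the assumed relaxed behavior: the queueing side
 * is free to choose @chosen_cpu instead of the local one.
 */
static void model_queue_work_relaxed(int local_cpu, int chosen_cpu,
				     struct model_work *w)
{
	(void)local_cpu;   /* deliberately ignored in this variant */
	w->cpu = chosen_cpu;
	w->pinned = false;
}

/* What an affinity-for-correctness user silently relies on. */
static bool model_ran_where_caller_was(const struct model_work *w,
				       int local_cpu)
{
	return w->cpu == local_cpu;
}
```

In this model the "today" and "on" variants cannot be told apart by
where the item runs, which is how a user can depend on affinity
without ever asking for it.  Only the relaxed variant exposes the
difference.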

Thanks.

-- 
tejun