From: Mike Galbraith
To: Tejun Heo, Linus Torvalds
Cc: Michal Hocko, Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara,
	Ben Hutchings, Sasha Levin, Shaohua Li, LKML, stable, Daniel Bilik,
	Greg Kroah-Hartman
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Date: Tue, 09 Feb 2016 18:04:04 +0100
Message-ID: <1455037444.3604.3.camel@gmail.com>
In-Reply-To: <20160209165024.GA3741@mtj.duckdns.org>
References: <1454518913.6148.15.camel@gmail.com>
	 <20160203170652.GI14091@mtj.duckdns.org>
	 <1454551217.3677.27.camel@gmail.com>
	 <20160205164923.GC4401@htj.duckdns.org>
	 <1454705231.3819.151.camel@gmail.com>
	 <20160205205456.GG4401@htj.duckdns.org>
	 <1454705989.3819.158.camel@gmail.com>
	 <20160205210606.GH4401@htj.duckdns.org>
	 <1455031885.3807.74.camel@gmail.com>
	 <20160209165024.GA3741@mtj.duckdns.org>

On Tue, 2016-02-09 at 11:50 -0500, Tejun Heo wrote:
> Hello,
> 
> On Tue, Feb 09, 2016 at 08:39:15AM -0800, Linus Torvalds wrote:
> > > A niggling question remaining is when is it gonna be killed?
> > 
> > It probably should be killed sooner rather than later.
> > 
> > Just document that if you need something to run on a _particular_
> > cpu, you need to use "schedule_delayed_work_on()" and
> > "add_timer_on()".
> 
> I'll queue a patch to put unbound work items on foreign cpus (maybe
> every Nth to reduce perf impact).  Wanted to align it to rc1 and then
> let it get tested during the devel cycle but missed this window.  It's
> a bit late in devel cycle but we can still do it in this cycle.

Or do something like the below, and get guinea pigs for free.

workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs

WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally.  This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs.  Round robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.

Signed-off-by: Mike Galbraith
---
 kernel/workqueue.c |   27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -303,6 +303,9 @@ static bool workqueue_freezing;	/* PL:
 
 static cpumask_var_t wq_unbound_cpumask; /* PL: low level cpumask for all unbound wqs */
 
+/* CPU where WORK_CPU_UNBOUND work was last round robin scheduled from this CPU */
+static DEFINE_PER_CPU(unsigned int, wq_unbound_rr_cpu_last);
+
 /* the per-cpu worker pools */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_worker_pools);
@@ -1298,6 +1301,28 @@ static bool is_chained_work(struct workq
 	return worker && worker->current_pwq->wq == wq;
 }
 
+/*
+ * When queueing WORK_CPU_UNBOUND work to a !WQ_UNBOUND queue, round
+ * robin among wq_unbound_cpumask to avoid perturbing sensitive tasks.
+ */
+static unsigned int select_round_robin_cpu(unsigned int cpu)
+{
+	int new_cpu;
+
+	if (cpumask_test_cpu(cpu, wq_unbound_cpumask))
+		return cpu;
+	if (cpumask_empty(wq_unbound_cpumask))
+		return cpu;
+	new_cpu = __this_cpu_read(wq_unbound_rr_cpu_last);
+	new_cpu = cpumask_next_and(new_cpu, wq_unbound_cpumask, cpu_online_mask);
+	if (unlikely(new_cpu >= nr_cpu_ids))
+		new_cpu = cpumask_first_and(wq_unbound_cpumask, cpu_online_mask);
+	if (unlikely(WARN_ON_ONCE(new_cpu >= nr_cpu_ids)))
+		return cpu;
+	__this_cpu_write(wq_unbound_rr_cpu_last, new_cpu);
+	return new_cpu;
+}
+
 static void __queue_work(int cpu, struct workqueue_struct *wq,
			 struct work_struct *work)
 {
@@ -1323,7 +1348,7 @@ static void __queue_work(int cpu, struct
 		return;
 retry:
 	if (req_cpu == WORK_CPU_UNBOUND)
-		cpu = raw_smp_processor_id();
+		cpu = select_round_robin_cpu(raw_smp_processor_id());
 
 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND))
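
For a caller that genuinely depends on a particular CPU, the explicit
interfaces Linus mentions above would be used roughly as in the sketch
below (illustrative only, with made-up names such as my_cpu_work and
kick_cpu, not part of the patch):

	#include <linux/workqueue.h>

	static void my_cpu_work_fn(struct work_struct *work)
	{
		/* Runs on the CPU it was explicitly queued to. */
	}

	static DECLARE_WORK(my_cpu_work, my_cpu_work_fn);
	static DECLARE_DELAYED_WORK(my_cpu_dwork, my_cpu_work_fn);

	static void kick_cpu(int cpu)
	{
		/* Target @cpu explicitly instead of relying on local queueing. */
		queue_work_on(cpu, system_wq, &my_cpu_work);

		/* Same, but deferred by roughly one second. */
		schedule_delayed_work_on(cpu, &my_cpu_dwork, HZ);
	}

add_timer_on() plays the same role for raw timers.  Work queued this
way is unaffected by the change above, which only touches the
req_cpu == WORK_CPU_UNBOUND path.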