From: Mike Galbraith
To: Tejun Heo, Linus Torvalds
Cc: Michal Hocko, Jiri Slaby, Thomas Gleixner, Petr Mladek, Jan Kara,
	Ben Hutchings, Sasha Levin, Shaohua Li, LKML, stable, Daniel Bilik,
	Greg Kroah-Hartman
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25
Date: Tue, 09 Feb 2016 18:04:04 +0100
Message-ID: <1455037444.3604.3.camel@gmail.com>
In-Reply-To: <20160209165024.GA3741@mtj.duckdns.org>
References: <1454518913.6148.15.camel@gmail.com>
	 <20160203170652.GI14091@mtj.duckdns.org>
	 <1454551217.3677.27.camel@gmail.com>
	 <20160205164923.GC4401@htj.duckdns.org>
	 <1454705231.3819.151.camel@gmail.com>
	 <20160205205456.GG4401@htj.duckdns.org>
	 <1454705989.3819.158.camel@gmail.com>
	 <20160205210606.GH4401@htj.duckdns.org>
	 <1455031885.3807.74.camel@gmail.com>
	 <20160209165024.GA3741@mtj.duckdns.org>

On Tue, 2016-02-09 at 11:50 -0500, Tejun Heo wrote:
> Hello,
> 
> On Tue, Feb 09, 2016 at 08:39:15AM -0800, Linus Torvalds wrote:
> > > A niggling question remaining is when is it gonna be killed?
> > 
> > It probably should be killed sooner rather than later.
> > 
> > Just document that if you need something to run on a _particular_
> > cpu, you need to use "schedule_delayed_work_on()" and
> > "add_timer_on()".
> 
> I'll queue a patch to put unbound work items on foreign cpus (maybe
> every Nth to reduce perf impact).  Wanted to align it to rc1 and then
> let it get tested during the devel cycle but missed this window.  It's
> a bit late in devel cycle but we can still do it in this cycle.

Or do something like the below, and get guinea pigs for free.

workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs

WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally.  This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs.  Round robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.

Signed-off-by: Mike Galbraith
---
 kernel/workqueue.c |   27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -303,6 +303,9 @@ static bool workqueue_freezing;	/* PL:
 
 static cpumask_var_t wq_unbound_cpumask; /* PL: low level cpumask for all unbound wqs */
 
+/* CPU where WORK_CPU_UNBOUND work was last round robin scheduled from this CPU */
+static DEFINE_PER_CPU(unsigned int, wq_unbound_rr_cpu_last);
+
 /* the per-cpu worker pools */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_worker_pools);
@@ -1298,6 +1301,28 @@ static bool is_chained_work(struct workq
 	return worker && worker->current_pwq->wq == wq;
 }
 
+/*
+ * When queueing WORK_CPU_UNBOUND work to a !WQ_UNBOUND queue, round
+ * robin among wq_unbound_cpumask to avoid perturbing sensitive tasks.
+ */
+static unsigned int select_round_robin_cpu(unsigned int cpu)
+{
+	int new_cpu;
+
+	if (cpumask_test_cpu(cpu, wq_unbound_cpumask))
+		return cpu;
+	if (cpumask_empty(wq_unbound_cpumask))
+		return cpu;
+	new_cpu = __this_cpu_read(wq_unbound_rr_cpu_last);
+	new_cpu = cpumask_next_and(new_cpu, wq_unbound_cpumask, cpu_online_mask);
+	if (unlikely(new_cpu >= nr_cpu_ids))
+		new_cpu = cpumask_first_and(wq_unbound_cpumask, cpu_online_mask);
+	if (unlikely(WARN_ON_ONCE(new_cpu >= nr_cpu_ids)))
+		return cpu;
+	__this_cpu_write(wq_unbound_rr_cpu_last, new_cpu);
+	return new_cpu;
+}
+
 static void __queue_work(int cpu, struct workqueue_struct *wq,
			 struct work_struct *work)
 {
@@ -1323,7 +1348,7 @@ static void __queue_work(int cpu, struct
 		return;
 retry:
 	if (req_cpu == WORK_CPU_UNBOUND)
-		cpu = raw_smp_processor_id();
+		cpu = select_round_robin_cpu(raw_smp_processor_id());
 
 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND))
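
For a caller that genuinely depends on a particular CPU, the explicit
interfaces Linus mentions above would be used roughly as in the sketch
below (illustrative only, with made-up names such as my_cpu_work and
kick_cpu, not part of the patch):

	#include <linux/workqueue.h>

	static void my_cpu_work_fn(struct work_struct *work)
	{
		/* Runs on the CPU it was explicitly queued to. */
	}

	static DECLARE_WORK(my_cpu_work, my_cpu_work_fn);
	static DECLARE_DELAYED_WORK(my_cpu_dwork, my_cpu_work_fn);

	static void kick_cpu(int cpu)
	{
		/* Target @cpu explicitly instead of relying on local queueing. */
		queue_work_on(cpu, system_wq, &my_cpu_work);

		/* Same, but deferred by roughly one second. */
		schedule_delayed_work_on(cpu, &my_cpu_dwork, HZ);
	}

add_timer_on() plays the same role for raw timers.  Work queued this
way is unaffected by the change above, which only touches the
req_cpu == WORK_CPU_UNBOUND path.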