From: Lai Jiangshan
Date: Wed, 13 Jan 2021 20:00:40 +0800
Subject: Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
To: Peter Zijlstra
Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai, Vincent Donnefort,
 Dexuan Cui, Lai Jiangshan, Paul McKenney, Vincent Guittot, Steven Rostedt,
 Jens Axboe
References: <20201226025117.2770-1-jiangshanlai@gmail.com> <87o8hv7pnd.fsf@nanos.tec.linutronix.de>
List-ID: linux-kernel@vger.kernel.org

On Wed, Jan 13, 2021 at 7:11 PM Peter Zijlstra wrote:
>
> On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:
>
> > But the hard problem is "how to suppress the warning of
> > online&!active in __set_cpus_allowed_ptr()" for late spawned
> > unbound workers during hotplug.
>
> I cannot see create_worker() go bad like that.
>
> The thing is, it uses:
>
>   kthread_bind_mask(, pool->attr->cpumask)
>   worker_attach_to_pool()
>     set_cpus_allowed_ptr(, pool->attr->cpumask)
>
> which means set_cpus_allowed_ptr() must be a NOP, because the affinity
> is already set by kthread_bind_mask(). Further, the first wakeup of that
> worker will then hit:
>
>   select_task_rq()
>     is_cpu_allowed()
>       is_per_cpu_kthread() -- false
>     select_fallback_rq()
>
> So normally that really isn't a problem. I can only see a tiny hole
> there, where someone changes the cpumask between kthread_bind_mask() and
> set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:
>
>  - add wq_pool_mutex around things in create_worker(), or
>  - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
>    into rescuer_thread().
>
> Which then brings us to rescuer_thread... If we manage to trigger the
> rescuer during hotplug, then yes, I think that can go wobbly.

Oh, I forgot that set_cpus_allowed_ptr() is a NOP when it follows
kthread_bind_mask() in create_worker(). So the problem becomes "how to
suppress the warning of online&!active in __set_cpus_allowed_ptr()" for
late *attached unbound rescuer* workers during hotplug.

> Let me consider that a bit more while I try and make sense of that splat
> Paul reported.
>
> ---
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ec0771e4a3fb..fe05308dc472 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1844,15 +1844,19 @@ static struct worker *alloc_worker(int node)
>   * cpu-[un]hotplugs.
>   */
>  static void worker_attach_to_pool(struct worker *worker,
> -                                  struct worker_pool *pool)
> +                                  struct worker_pool *pool,
> +                                  bool set_affinity)
>  {
>         mutex_lock(&wq_pool_attach_mutex);
>
> -       /*
> -        * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
> -        * online CPUs. It'll be re-applied when any of the CPUs come up.
> -        */
> -       set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +       if (set_affinity) {
> +               /*
> +                * set_cpus_allowed_ptr() will fail if the cpumask doesn't have
> +                * any online CPUs. It'll be re-applied when any of the CPUs
> +                * come up.
> +                */
> +               set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +       }
>
>         /*
>          * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -1944,7 +1948,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>         kthread_bind_mask(worker->task, pool->attrs->cpumask);
>
>         /* successful, attach the worker to the pool */
> -       worker_attach_to_pool(worker, pool);
> +       worker_attach_to_pool(worker, pool, false);
>
>         /* start the newly created worker */
>         raw_spin_lock_irq(&pool->lock);
> @@ -2509,7 +2513,11 @@ static int rescuer_thread(void *__rescuer)
>
>                 raw_spin_unlock_irq(&wq_mayday_lock);
>
> -               worker_attach_to_pool(rescuer, pool);
> +               /*
> +                * XXX can go splat when running during hot-un-plug and
> +                * the pool affinity is wobbly.
> +                */
> +               worker_attach_to_pool(rescuer, pool, true);
>
>                 raw_spin_lock_irq(&pool->lock);
>