From: Gabriele Monaco <gmonaco@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, aubrey.li@linux.intel.com,
yu.c.chen@intel.com, Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH v6 2/3] sched: Move task_mm_cid_work to mm delayed work
Date: Thu, 13 Feb 2025 15:37:47 +0100
Message-ID: <35fe8e74229af24f45954dd27789363dd5c2f8b8.camel@redhat.com>
In-Reply-To: <1a295a1e-08da-4684-81be-9539773a1c94@efficios.com>
On Thu, 2025-02-13 at 08:55 -0500, Mathieu Desnoyers wrote:
> On 2025-02-13 08:25, Gabriele Monaco wrote:
> > On Thu, 2025-02-13 at 14:52 +0800, kernel test robot wrote:
> > > kernel test robot noticed
> > > "WARNING:at_kernel/workqueue.c:#__queue_delayed_work" on:
> > >
> > > [ 2.640924][ T0] ------------[ cut here ]------------
> > > [ 2.641646][ T0] WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2495 __queue_delayed_work (kernel/workqueue.c:2495 (discriminator 9))
> > > [ 2.642874][ T0] Modules linked in:
> > > [ 2.643381][ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc2-00002-g287adf9e9c1f #1
> > > [ 2.644582][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 2.645943][ T0] RIP: 0010:__queue_delayed_work (kernel/workqueue.c:2495 (discriminator 9))
> >
> > There seem to be major problems with this configuration, I'm trying to
> > understand what's wrong but, for the time being, this patchset is not
> > ready for inclusion.
>
> I think there is an issue with the order of init functions at boot.
>
> poking_init() calls mm_alloc(), which ends up calling mm_init().
>
> The WARN_ON() is about a NULL wq pointer, which I suspect happens
> if poking_init() is called before workqueue_init_early(), which
> allocates system_wq.
>
> Indeed, in start_kernel(), poking_init() is called before
> workqueue_init_early().
>
> I'm not sure what are the init order dependencies across subsystems here.
> There is the following order in start_kernel():
>
> [...]
> mm_core_init();
> poking_init();
> ftrace_init();
>
> /* trace_printk can be enabled here */
> early_trace_init();
>
> /*
> * Set up the scheduler prior starting any interrupts (such as the
> * timer interrupt). Full topology setup happens at smp_init()
> * time - but meanwhile we still have a functioning scheduler.
> */
> sched_init();
>
> if (WARN(!irqs_disabled(),
> "Interrupts were enabled *very* early, fixing it\n"))
> local_irq_disable();
> radix_tree_init();
> maple_tree_init();
>
> /*
> * Set up housekeeping before setting up workqueues to allow the unbound
> * workqueue to take non-housekeeping into account.
> */
> housekeeping_init();
>
> /*
> * Allow workqueue creation and work item queueing/cancelling
> * early. Work item execution depends on kthreads and starts after
> * workqueue_init().
> */
> workqueue_init_early();
> [...]
>
> So either we find a way to reorder this, or we make sure poking_init()
> does not require the workqueue.
>
> Thanks,
>
> Mathieu
>
Nice suggestion! That seems to be the culprit.
From the full dmesg of the failure, I see there's also a problem
with disabling the delayed work synchronously, since mmdrop() cannot
sleep unless we are on PREEMPT_RT.
I'm trying to come up with a satisfactory solution for both. Ideally:
1. the delayed work is not needed in early boot, so we may find a better
place to start it
2. we can cancel the work asynchronously on mmdrop() and abort it if
pcpu_cid is NULL, but that seems racy; perhaps there's a better place for
that too
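To make the two points a bit more concrete, here is a rough userspace
model of the idea. It is not kernel code and not a proposed patch: every
model_* name is hypothetical, the "workqueue" is just a pointer that
stays NULL until the modelled workqueue_init_early() runs, and it is
single-threaded, so it says nothing about the race mentioned in point 2
or about the lifetime of the mm.

```c
/* Userspace model only: none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_wq { int queued; };          /* stands in for system_wq */
static struct model_wq model_wq_storage;
static struct model_wq *model_system_wq;  /* NULL until early init has run */

struct model_mm {
	int *pcpu_cid;      /* set to NULL when the mm is torn down */
	bool work_pending;  /* delayed work queued but not yet run */
	int scans;          /* times the work function did real work */
};

/* Models workqueue_init_early(): queueing is allowed from here on. */
static void model_workqueue_init_early(void)
{
	model_system_wq = &model_wq_storage;
}

/*
 * Point 1: arm the work lazily rather than at mm creation. Queueing is
 * refused (instead of hitting a warning) while the workqueue does not
 * exist, when the mm has no pcpu_cid, or when work is already pending.
 */
static bool model_mm_cid_arm(struct model_mm *mm)
{
	if (!model_system_wq || !mm->pcpu_cid || mm->work_pending)
		return false;
	mm->work_pending = true;
	model_system_wq->queued++;
	return true;
}

/* Point 2: the work function bails out if pcpu_cid is already gone. */
static void model_mm_cid_work(struct model_mm *mm)
{
	mm->work_pending = false;
	if (!mm->pcpu_cid)
		return;
	mm->scans++;
}

/* Point 2: teardown that never waits for the work, it only clears the pointer. */
static void model_mmdrop(struct model_mm *mm)
{
	mm->pcpu_cid = NULL;
}
```

In this model an mm created before model_workqueue_init_early() (the
poking_init() case) is simply never armed at that point, and work that
fires after model_mmdrop() returns without touching anything.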
Thanks,
Gabriele