Keyrings Archive mirror
From: "Jarkko Sakkinen" <jarkko@kernel.org>
To: "Maria Yu" <quic_aiquny@quicinc.com>, <ebiederm@xmission.com>
Cc: <kernel@quicinc.com>, <quic_pkondeti@quicinc.com>,
	<keescook@chromium.or>, <viro@zeniv.linux.org.uk>,
	<brauner@kernel.org>, <oleg@redhat.com>, <dhowells@redhat.com>,
	<paul@paul-moore.com>, <jmorris@namei.org>, <serge@hallyn.com>,
	<linux-mm@kvack.org>, <linux-fsdevel@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <keyrings@vger.kernel.org>,
	<linux-security-module@vger.kernel.org>,
	<linux-arm-msm@vger.kernel.org>
Subject: Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for tasklist_lock
Date: Wed, 03 Jan 2024 16:04:54 +0200
Message-ID: <CY54MOETXVFI.1102C6BQTO36@suppilovahvero>
In-Reply-To: <20231225081932.17752-1-quic_aiquny@quicinc.com>

On Mon Dec 25, 2023 at 10:19 AM EET, Maria Yu wrote:
> As tasklist_lock is an rwlock, there are multiple scenarios in which
> readers hold it while a writer has to wait.
> In freeze_processes/thaw_processes, the read lock of tasklist_lock can be
> held for 200+ ms while walking and freezing/thawing tasks on commercial
> devices. write_lock_irq disables preemption and local IRQs and then
> spins until tasklist_lock can be acquired. This leads to poor
> responsiveness of the system.
> Take an example:
> 1. cpu0 is holding the read lock of tasklist_lock in thaw_processes.
> 2. cpu1 is waiting for the write lock of tasklist_lock to exec a new
>    thread, with preemption and local IRQs disabled.
> 3. cpu2 is waiting for the write lock of tasklist_lock in do_exit, with
>    preemption and local IRQs disabled.
> 4. cpu3 is waiting for the write lock of tasklist_lock in do_exit, with
>    preemption and local IRQs disabled.
> So introduce a write lock/unlock wrapper specifically for tasklist_lock.
> The current tasklist_lock writers all use write_lock_irq to acquire
> tasklist_lock and write_unlock_irq to release it, which means none of
> them can be waiting on tasklist_lock from a context that already has
> IRQs disabled. So the write lock/unlock wrapper here simply follows the
> current design of calling local_irq_disable and local_irq_enable
> directly, and does not take callers with IRQs already disabled into
> account.
> Use write_trylock in a loop and re-enable IRQs between attempts so the
> CPU can respond to interrupts while the lock cannot be taken.
>
> Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
> ---
>  fs/exec.c                  | 10 +++++-----
>  include/linux/sched/task.h | 29 +++++++++++++++++++++++++++++
>  kernel/exit.c              | 16 ++++++++--------
>  kernel/fork.c              |  6 +++---
>  kernel/ptrace.c            | 12 ++++++------
>  kernel/sys.c               |  8 ++++----
>  security/keys/keyctl.c     |  4 ++--
>  7 files changed, 57 insertions(+), 28 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 4aa19b24f281..030eef6852eb 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1086,7 +1086,7 @@ static int de_thread(struct task_struct *tsk)
>  
>  		for (;;) {
>  			cgroup_threadgroup_change_begin(tsk);
> -			write_lock_irq(&tasklist_lock);
> +			write_lock_tasklist_lock();
>  			/*
>  			 * Do this under tasklist_lock to ensure that
>  			 * exit_notify() can't miss ->group_exec_task
> @@ -1095,7 +1095,7 @@ static int de_thread(struct task_struct *tsk)
>  			if (likely(leader->exit_state))
>  				break;
>  			__set_current_state(TASK_KILLABLE);
> -			write_unlock_irq(&tasklist_lock);
> +			write_unlock_tasklist_lock();
>  			cgroup_threadgroup_change_end(tsk);
>  			schedule();
>  			if (__fatal_signal_pending(tsk))
> @@ -1150,7 +1150,7 @@ static int de_thread(struct task_struct *tsk)
>  		 */
>  		if (unlikely(leader->ptrace))
>  			__wake_up_parent(leader, leader->parent);
> -		write_unlock_irq(&tasklist_lock);
> +		write_unlock_tasklist_lock();
>  		cgroup_threadgroup_change_end(tsk);
>  
>  		release_task(leader);
> @@ -1198,13 +1198,13 @@ static int unshare_sighand(struct task_struct *me)
>  
>  		refcount_set(&newsighand->count, 1);
>  
> -		write_lock_irq(&tasklist_lock);
> +		write_lock_tasklist_lock();
>  		spin_lock(&oldsighand->siglock);
>  		memcpy(newsighand->action, oldsighand->action,
>  		       sizeof(newsighand->action));
>  		rcu_assign_pointer(me->sighand, newsighand);
>  		spin_unlock(&oldsighand->siglock);
> -		write_unlock_irq(&tasklist_lock);
> +		write_unlock_tasklist_lock();
>  
>  		__cleanup_sighand(oldsighand);
>  	}
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index a23af225c898..6f69d9a3c868 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -50,6 +50,35 @@ struct kernel_clone_args {
>   * a separate lock).
>   */
>  extern rwlock_t tasklist_lock;
> +
> +/*
> + * tasklist_lock is a special lock: its readers can take a good amount
> + * of time to finish, and the plain write_lock_irq() API disables local
> + * interrupts before it starts waiting, so the current CPU spins on
> + * the lock while it cannot respond to interrupts.
> + *
> + * The current tasklist_lock writers all use write_lock_irq() to acquire
> + * tasklist_lock and write_unlock_irq() to release it, which means
> + * none of them waits on tasklist_lock with interrupts already
> + * disabled. So the write lock/unlock wrapper here simply follows the
> + * current design of calling local_irq_disable() and
> + * local_irq_enable() directly.
> + */
> +static inline void write_lock_tasklist_lock(void)
> +{
> +	while (1) {
> +		local_irq_disable();
> +		if (write_trylock(&tasklist_lock))
> +			break;
> +		local_irq_enable();
> +		cpu_relax();
> +	}

Maybe:

	local_irq_disable();
	while (!write_trylock(&tasklist_lock)) {
		local_irq_enable();
		cpu_relax();
		local_irq_disable();
	}
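Both loop shapes should leave the CPU in the same state on exit: interrupts disabled and the write lock held, with interrupts re-enabled once after every failed attempt. The sketch below models only that control flow in userspace. `struct mock_cpu` and the `mock_*` helpers are hypothetical stand-ins for `local_irq_disable()`, `local_irq_enable()`, `cpu_relax()` and `write_trylock(&tasklist_lock)`; they record state instead of touching real interrupts or locks, so this is a sketch of the loop logic, not of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's state during the lock attempt. */
struct mock_cpu {
	bool irqs_disabled;	/* models the local IRQ-disable state */
	bool lock_held;		/* models ownership of the write lock */
	int fails_left;		/* trylock attempts that will still fail */
	int irq_enables;	/* windows in which pending IRQs could run */
	int attempts;		/* total trylock attempts */
};

static void mock_irq_disable(struct mock_cpu *c) { c->irqs_disabled = true; }

static void mock_irq_enable(struct mock_cpu *c)
{
	c->irqs_disabled = false;
	c->irq_enables++;	/* each enable opens a window for pending IRQs */
}

static void mock_cpu_relax(struct mock_cpu *c) { (void)c; }

/* Fails until fails_left reaches zero, then "takes" the lock. */
static bool mock_write_trylock(struct mock_cpu *c)
{
	c->attempts++;
	if (c->fails_left > 0) {
		c->fails_left--;
		return false;
	}
	c->lock_held = true;
	return true;
}

/* Loop shape from the posted patch. */
static void lock_as_posted(struct mock_cpu *c)
{
	assert(!c->lock_held);	/* must not already own the lock */
	while (1) {
		mock_irq_disable(c);
		if (mock_write_trylock(c))
			break;	/* exit with IRQs disabled, lock held */
		mock_irq_enable(c);
		mock_cpu_relax(c);
	}
}

/* Loop shape suggested in the reply. */
static void lock_as_suggested(struct mock_cpu *c)
{
	assert(!c->lock_held);	/* must not already own the lock */
	mock_irq_disable(c);
	while (!mock_write_trylock(c)) {
		mock_irq_enable(c);
		mock_cpu_relax(c);
		mock_irq_disable(c);	/* re-disable before the next attempt */
	}
}
```

In this model, if the first three trylocks fail, both variants make four attempts and open three interrupt windows before returning with interrupts disabled. The suggested form just makes the exit condition the loop condition instead of using `while (1)` with `break`.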

BR, Jarkko

Thread overview: 13+ messages
2023-12-25  8:19 [PATCH] kernel: Introduce a write lock/unlock wrapper for tasklist_lock Maria Yu
2023-12-25  8:26 ` Aiqun Yu (Maria)
2024-01-03 14:04 ` Jarkko Sakkinen [this message]
  -- strict thread matches above, loose matches on Subject: below --
2023-12-13 10:17 Maria Yu
2023-12-13 16:22 ` Matthew Wilcox
2023-12-13 18:27   ` Eric W. Biederman
2023-12-15  5:52     ` Aiqun Yu (Maria)
2023-12-28 22:20     ` Matthew Wilcox
2024-01-02  2:19       ` Aiqun Yu (Maria)
2024-01-02  9:14         ` Matthew Wilcox
2024-01-03  2:58           ` Aiqun Yu (Maria)
2024-01-03 18:18             ` Matthew Wilcox
2024-01-04  0:46               ` Aiqun Yu (Maria)
