From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
To: "Paul E . McKenney" <paulmck@kernel.org>
Cc: RCU <rcu@vger.kernel.org>,
	Neeraj upadhyay <Neeraj.Upadhyay@amd.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Hillf Danton <hdanton@sina.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Subject: [PATCH 35/48] rcu: Reduce synchronize_rcu() latency
Date: Tue,  7 May 2024 11:35:17 +0200
Message-ID: <20240507093530.3043-36-urezki@gmail.com>
In-Reply-To: <20240507093530.3043-1-urezki@gmail.com>

A call to synchronize_rcu() can be optimized from a latency
point of view. Workloads which depend on this can benefit from it.

The delay of the wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:

- how fast the offloading process is started. A combination of:
    - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
    - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
    - other factors.
- once started, whether the callback-invocation path is interrupted due to:
    - the time limit;
    - need_resched();
    - the callback batch limit being reached.
- where in the nocb list the callback is located;
- how fast the preceding callbacks complete.

Example:

1. On our embedded devices I can easily trigger a scenario where the
waiter's callback is the last in a list of ~3600 callbacks:

<snip>
  <...>-29      [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
  <...>-29      [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
  <...>-29      [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
  <...>-29      [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>
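
As a side note, this last-in-the-list delay can also be observed directly
by timing synchronize_rcu() from a small throwaway test module. Below is a
minimal sketch of such a probe; it is illustrative only and not part of
this patch:

<snip>
/* Illustrative latency probe, not part of this patch. */
#include <linux/module.h>
#include <linux/ktime.h>
#include <linux/rcupdate.h>

static int __init sr_latency_probe_init(void)
{
	int i;

	for (i = 0; i < 10; i++) {
		ktime_t start = ktime_get();

		synchronize_rcu();
		pr_info("synchronize_rcu() took %lld us\n",
			ktime_to_us(ktime_sub(ktime_get(), start)));
	}

	return 0;
}

static void __exit sr_latency_probe_exit(void)
{
}

module_init(sr_latency_probe_init);
module_exit(sr_latency_probe_exit);
MODULE_LICENSE("GPL");
<snip>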

2. We use cpuset/cgroup to classify tasks and assign them to
different cgroups. For example, a "background" group binds tasks
only to little CPUs, whereas a "foreground" group makes use of all
CPUs. Tasks can be migrated between groups on request if
acceleration is needed.

Below is an example of how the "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup, which
allows it to run only on little cores. In order to speed it up, it
can be temporarily moved into the "foreground" cgroup, which allows
it to use big/all CPUs:

cgroup_attach_task():
 -> cgroup_migrate_execute()
   -> cpuset_can_attach()
     -> percpu_down_write()
       -> rcu_sync_enter()
         -> synchronize_rcu()
   -> now move tasks to the new cgroup.
 -> cgroup_migrate_finish()

<snip>
         rcuop/1-29      [000] .....  7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
    PERFD-SERVER-1855    [000] d..1.  7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
   TimerDispatch-2768    [002] d..5.  7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>

"Boosting a task" depends on synchronize_rcu() latency:

- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.
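
For completeness, the migration itself is requested from user space by
writing the task's pid into the destination group's control file. A hedged
user-space sketch follows; the /dev/cpuset mount point and group names are
assumptions about an Android-like setup, not something defined by this patch:

<snip>
/* Illustrative user-space snippet; paths and group names are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int move_task_to_group(pid_t pid, const char *group)
{
	char path[256];
	FILE *f;

	/* cgroup v1 cpuset hierarchy assumed to be mounted at /dev/cpuset. */
	snprintf(path, sizeof(path), "/dev/cpuset/%s/tasks", group);

	f = fopen(path, "w");
	if (!f)
		return -1;

	/* This write ends up in the cgroup_attach_task() path shown above. */
	fprintf(f, "%d\n", (int) pid);
	fclose(f);
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 2)
		return 1;

	/* Temporarily "boost" the given task, e.g. surfaceflinger. */
	return move_task_to_group((pid_t) atoi(argv[1]), "foreground");
}
<snip>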

3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period,
the users are handed off to a dedicated worker that processes the requests.
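
Conceptually, the fast path boils down to queueing an on-stack completion
on an llist and letting a post-GP worker complete it. The sketch below is a
heavily simplified illustration of that idea only; the real implementation
in the diff below additionally manages per-GP wait heads, rel-acq ordering
of the done tail and retries when no wait head is available:

<snip>
/* Simplified illustration only; see the diff below for the real code. */
#include <linux/llist.h>
#include <linux/completion.h>
#include <linux/workqueue.h>
#include <linux/rcupdate.h>

struct sr_waiter {
	struct llist_node node;
	struct completion done;
};

static LLIST_HEAD(sr_waiters);

/* Caller side: queue itself and make sure a grace period is in flight. */
static void sr_wait_for_gp(void)
{
	struct sr_waiter w;

	init_completion(&w.done);
	llist_add(&w.node, &sr_waiters);
	(void) start_poll_synchronize_rcu();
	wait_for_completion(&w.done);
}

/*
 * Worker side: in the real patch this work is scheduled from the
 * grace-period cleanup path, so it only wakes waiters whose grace
 * period has fully elapsed (tracked via the wait/done tail pointers).
 */
static void sr_complete_waiters(struct work_struct *unused)
{
	struct llist_node *pos, *next;

	llist_for_each_safe(pos, next, llist_del_all(&sr_waiters))
		complete(&container_of(pos, struct sr_waiter, node)->done);
}
<snip>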

4. This patch reduces the latency of synchronize_rcu() by approximately
30-40% on synthetic tests. The real test case, camera launch time,
shows the following (time is in milliseconds):

1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%

Synthetic test (no "noise" from other callbacks):
Hardware: x86_64, 64 CPUs, 64GB of memory
Linux-6.6

- 10K tasks (simultaneous);
- each task does (1000 loops):
     synchronize_rcu();
     kfree(p);

default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.

Running 60K tasks gives approximately the same results on my setup. Please
note this is without any interaction with other types of callbacks; otherwise
the default case would be impacted much more.
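
A hedged sketch of what such a synthetic test can look like is shown below;
it is a throwaway stress module, the thread and loop counts are arbitrary,
and the total completion time is measured externally:

<snip>
/* Illustrative stress-test skeleton, not part of this patch. */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>

#define NR_TEST_TASKS	10000
#define NR_LOOPS	1000

static int sr_stress_fn(void *unused)
{
	int i;

	for (i = 0; i < NR_LOOPS; i++) {
		void *p = kmalloc(64, GFP_KERNEL);

		synchronize_rcu();	/* the call under test */
		kfree(p);
	}

	return 0;
}

static int __init sr_stress_init(void)
{
	int i;

	for (i = 0; i < NR_TEST_TASKS; i++)
		kthread_run(sr_stress_fn, NULL, "sr-stress/%d", i);

	return 0;
}

module_init(sr_stress_init);
MODULE_LICENSE("GPL");
<snip>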

5. By default this feature is disabled. To enable it, use one of the
following:

echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  14 +
 kernel/rcu/tree.c                             | 331 +++++++++++++++++-
 kernel/rcu/tree_exp.h                         |   2 +-
 3 files changed, 345 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bb884c14b2f6..0a3b0fd1910e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5091,6 +5091,20 @@
 			delay, memory pressure or callback list growing too
 			big.
 
+	rcutree.rcu_normal_wake_from_gp= [KNL]
+			Reduces the latency of the synchronize_rcu() call. This
+			approach maintains its own track of synchronize_rcu()
+			callers, so it does not interact with regular callbacks
+			because it does not use the call_rcu[_hurry]() path. Please
+			note that this applies to normal grace periods only.
+
+			How to enable it:
+
+			echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
+			or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"
+
+			Default is 0.
+
 	rcuscale.gp_async= [KNL]
 			Measure performance of asynchronous
 			grace-period primitives such as call_rcu().
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d9642dd06c25..f65255205e44 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -75,6 +75,7 @@
 #define MODULE_PARAM_PREFIX "rcutree."
 
 /* Data structures. */
+static void rcu_sr_normal_gp_cleanup_work(struct work_struct *);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.gpwrap = true,
@@ -93,6 +94,8 @@ static struct rcu_state rcu_state = {
 	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
 	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
 	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
+	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
+		rcu_sr_normal_gp_cleanup_work),
 };
 
 /* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -1422,6 +1425,282 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
+/*
+ * There is a single llist, which is used for handling
+ * synchronize_rcu() users' enqueued rcu_synchronize nodes.
+ * Within this llist, there are two tail pointers:
+ *
+ * wait tail: Tracks the set of nodes which need to
+ *            wait for the current GP to complete.
+ * done tail: Tracks the set of nodes for which the grace
+ *            period has elapsed. Processing of these nodes
+ *            will be done as part of the cleanup work
+ *            execution by a kworker.
+ *
+ * At every grace period init, a new wait node is added
+ * to the llist. This wait node is used as the wait tail
+ * for this new grace period. Given that there is a fixed
+ * number of wait nodes, all wait nodes can be in use
+ * (which can happen when kworker callback processing
+ * is delayed) while an additional grace period is requested.
+ * This means that the system is slow in processing callbacks.
+ *
+ * TODO: If slow processing is detected, the first node
+ * in the llist should be used as the wait tail for this
+ * grace period, so that users which have to wait due
+ * to slow processing are handled by _this_ grace period
+ * and not the next one.
+ *
+ * Below is an illustration of how the done and wait
+ * tail pointers move from one set of rcu_synchronize nodes
+ * to the other, as grace periods start and finish and
+ * nodes are processed by kworker.
+ *
+ *
+ * a. Initial llist callbacks list:
+ *
+ * +----------+           +--------+          +-------+
+ * |          |           |        |          |       |
+ * |   head   |---------> |   cb2  |--------->| cb1   |
+ * |          |           |        |          |       |
+ * +----------+           +--------+          +-------+
+ *
+ *
+ *
+ * b. New GP1 Start:
+ *
+ *                    WAIT TAIL
+ *                      |
+ *                      |
+ *                      v
+ * +----------+     +--------+      +--------+        +-------+
+ * |          |     |        |      |        |        |       |
+ * |   head   ------> wait   |------>   cb2  |------> |  cb1  |
+ * |          |     | head1  |      |        |        |       |
+ * +----------+     +--------+      +--------+        +-------+
+ *
+ *
+ *
+ * c. GP completion:
+ *
+ * WAIT_TAIL == DONE_TAIL
+ *
+ *                   DONE TAIL
+ *                     |
+ *                     |
+ *                     v
+ * +----------+     +--------+      +--------+        +-------+
+ * |          |     |        |      |        |        |       |
+ * |   head   ------> wait   |------>   cb2  |------> |  cb1  |
+ * |          |     | head1  |      |        |        |       |
+ * +----------+     +--------+      +--------+        +-------+
+ *
+ *
+ *
+ * d. New callbacks and GP2 start:
+ *
+ *                    WAIT TAIL                          DONE TAIL
+ *                      |                                 |
+ *                      |                                 |
+ *                      v                                 v
+ * +----------+     +------+    +------+    +------+    +-----+    +-----+    +-----+
+ * |          |     |      |    |      |    |      |    |     |    |     |    |     |
+ * |   head   ------> wait |--->|  cb4 |--->| cb3  |--->|wait |--->| cb2 |--->| cb1 |
+ * |          |     | head2|    |      |    |      |    |head1|    |     |    |     |
+ * +----------+     +------+    +------+    +------+    +-----+    +-----+    +-----+
+ *
+ *
+ *
+ * e. GP2 completion:
+ *
+ * WAIT_TAIL == DONE_TAIL
+ *                   DONE TAIL
+ *                      |
+ *                      |
+ *                      v
+ * +----------+     +------+    +------+    +------+    +-----+    +-----+    +-----+
+ * |          |     |      |    |      |    |      |    |     |    |     |    |     |
+ * |   head   ------> wait |--->|  cb4 |--->| cb3  |--->|wait |--->| cb2 |--->| cb1 |
+ * |          |     | head2|    |      |    |      |    |head1|    |     |    |     |
+ * +----------+     +------+    +------+    +------+    +-----+    +-----+    +-----+
+ *
+ *
+ * While the llist state transitions from d to e, a kworker
+ * can start executing rcu_sr_normal_gp_cleanup_work() and
+ * can observe either the old done tail (@c) or the new
+ * done tail (@e). So, done tail updates and reads need
+ * to use the rel-acq semantics. If the concurrent kworker
+ * observes the old done tail, the newly queued work
+ * execution will process the updated done tail. If the
+ * concurrent kworker observes the new done tail, then
+ * the newly queued work will skip processing the done
+ * tail, as workqueue semantics guarantees that the new
+ * work is executed only after the previous one completes.
+ *
+ * f. kworker callbacks processing complete:
+ *
+ *
+ *                   DONE TAIL
+ *                     |
+ *                     |
+ *                     v
+ * +----------+     +--------+
+ * |          |     |        |
+ * |   head   ------> wait   |
+ * |          |     | head2  |
+ * +----------+     +--------+
+ *
+ */
+static bool rcu_sr_is_wait_head(struct llist_node *node)
+{
+	return &(rcu_state.srs_wait_nodes)[0].node <= node &&
+		node <= &(rcu_state.srs_wait_nodes)[SR_NORMAL_GP_WAIT_HEAD_MAX - 1].node;
+}
+
+static struct llist_node *rcu_sr_get_wait_head(void)
+{
+	struct sr_wait_node *sr_wn;
+	int i;
+
+	for (i = 0; i < SR_NORMAL_GP_WAIT_HEAD_MAX; i++) {
+		sr_wn = &(rcu_state.srs_wait_nodes)[i];
+
+		if (!atomic_cmpxchg_acquire(&sr_wn->inuse, 0, 1))
+			return &sr_wn->node;
+	}
+
+	return NULL;
+}
+
+static void rcu_sr_put_wait_head(struct llist_node *node)
+{
+	struct sr_wait_node *sr_wn = container_of(node, struct sr_wait_node, node);
+
+	atomic_set_release(&sr_wn->inuse, 0);
+}
+
+/* Disabled by default. */
+static int rcu_normal_wake_from_gp;
+module_param(rcu_normal_wake_from_gp, int, 0644);
+
+static void rcu_sr_normal_complete(struct llist_node *node)
+{
+	struct rcu_synchronize *rs = container_of(
+		(struct rcu_head *) node, struct rcu_synchronize, head);
+	unsigned long oldstate = (unsigned long) rs->head.func;
+
+	WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
+		!poll_state_synchronize_rcu(oldstate),
+		"A full grace period is not passed yet: %lu",
+		rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
+
+	/* Finally. */
+	complete(&rs->completion);
+}
+
+static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
+{
+	struct llist_node *done, *rcu, *next, *head;
+
+	/*
+	 * This work item can potentially execute
+	 * while a new done tail is being updated by
+	 * the grace-period kthread in rcu_sr_normal_gp_cleanup().
+	 * So, reads and updates of the done tail need to
+	 * follow acq-rel semantics.
+	 *
+	 * Given that wq semantics guarantee that a single work
+	 * item cannot be executed concurrently by multiple kworkers,
+	 * the done-tail list manipulations are protected here.
+	 */
+	done = smp_load_acquire(&rcu_state.srs_done_tail);
+	if (!done)
+		return;
+
+	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
+	head = done->next;
+	done->next = NULL;
+
+	/*
+	 * The dummy node, which is pointed to by the
+	 * done tail acq-read above, is not removed
+	 * here.  This allows lockless additions of new
+	 * rcu_synchronize nodes in rcu_sr_normal_add_req(),
+	 * while the cleanup work executes. The dummy
+	 * node is removed in the next round of cleanup
+	 * work execution.
+	 */
+	llist_for_each_safe(rcu, next, head) {
+		if (!rcu_sr_is_wait_head(rcu)) {
+			rcu_sr_normal_complete(rcu);
+			continue;
+		}
+
+		rcu_sr_put_wait_head(rcu);
+	}
+}
+
+/*
+ * Helper function for rcu_gp_cleanup().
+ */
+static void rcu_sr_normal_gp_cleanup(void)
+{
+	struct llist_node *wait_tail;
+
+	wait_tail = rcu_state.srs_wait_tail;
+	if (wait_tail == NULL)
+		return;
+
+	rcu_state.srs_wait_tail = NULL;
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_wait_tail);
+
+	// concurrent sr_normal_gp_cleanup work might observe this update.
+	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
+
+	schedule_work(&rcu_state.srs_cleanup_work);
+}
+
+/*
+ * Helper function for rcu_gp_init().
+ */
+static bool rcu_sr_normal_gp_init(void)
+{
+	struct llist_node *first;
+	struct llist_node *wait_head;
+	bool start_new_poll = false;
+
+	first = READ_ONCE(rcu_state.srs_next.first);
+	if (!first || rcu_sr_is_wait_head(first))
+		return start_new_poll;
+
+	wait_head = rcu_sr_get_wait_head();
+	if (!wait_head) {
+		// Kick another GP to retry.
+		start_new_poll = true;
+		return start_new_poll;
+	}
+
+	/* Inject a wait-dummy-node. */
+	llist_add(wait_head, &rcu_state.srs_next);
+
+	/*
+	 * The waiting list of rcu_synchronize nodes should be empty at
+	 * this step, since the GP kthread, rcu_gp_init() -> gp_cleanup(),
+	 * rolls it over. If not, it is a BUG, so warn the user.
+	 */
+	WARN_ON_ONCE(rcu_state.srs_wait_tail != NULL);
+	rcu_state.srs_wait_tail = wait_head;
+	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_wait_tail);
+
+	return start_new_poll;
+}
+
+static void rcu_sr_normal_add_req(struct rcu_synchronize *rs)
+{
+	llist_add((struct llist_node *) &rs->head, &rcu_state.srs_next);
+}
+
 /*
  * Initialize a new grace period.  Return false if no grace period required.
  */
@@ -1432,6 +1711,7 @@ static noinline_for_stack bool rcu_gp_init(void)
 	unsigned long mask;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp = rcu_get_root();
+	bool start_new_poll;
 
 	WRITE_ONCE(rcu_state.gp_activity, jiffies);
 	raw_spin_lock_irq_rcu_node(rnp);
@@ -1456,10 +1736,24 @@ static noinline_for_stack bool rcu_gp_init(void)
 	/* Record GP times before starting GP, hence rcu_seq_start(). */
 	rcu_seq_start(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
+	start_new_poll = rcu_sr_normal_gp_init();
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
 	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
 	raw_spin_unlock_irq_rcu_node(rnp);
 
+	/*
+	 * The "start_new_poll" is set to true only when this GP is not able
+	 * to handle anything and there are outstanding users. It happens when
+	 * the rcu_sr_normal_gp_init() function was not able to insert a dummy
+	 * separator into the llist, because there were no dummy nodes left.
+	 *
+	 * The number of dummy nodes is fixed, so it is possible to run out of
+	 * them; if so, start a new poll request to retry. This is rare and it
+	 * means that the system is slow in processing callbacks.
+	 */
+	if (start_new_poll)
+		(void) start_poll_synchronize_rcu();
+
 	/*
 	 * Apply per-leaf buffered online and offline operations to
 	 * the rcu_node tree. Note that this new grace period need not
@@ -1825,6 +2119,9 @@ static noinline void rcu_gp_cleanup(void)
 	}
 	raw_spin_unlock_irq_rcu_node(rnp);
 
+	// Make synchronize_rcu() users aware of the end of the old grace period.
+	rcu_sr_normal_gp_cleanup();
+
 	// If strict, make all CPUs aware of the end of the old grace period.
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
 		on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
@@ -3559,6 +3856,38 @@ static int rcu_blocking_is_gp(void)
 	return true;
 }
 
+/*
+ * Helper function for the synchronize_rcu() API.
+ */
+static void synchronize_rcu_normal(void)
+{
+	struct rcu_synchronize rs;
+
+	if (!READ_ONCE(rcu_normal_wake_from_gp)) {
+		wait_rcu_gp(call_rcu_hurry);
+		return;
+	}
+
+	init_rcu_head_on_stack(&rs.head);
+	init_completion(&rs.completion);
+
+	/*
+	 * This code might be preempted, therefore take a GP
+	 * snapshot before adding a request.
+	 */
+	if (IS_ENABLED(CONFIG_PROVE_RCU))
+		rs.head.func = (void *) get_state_synchronize_rcu();
+
+	rcu_sr_normal_add_req(&rs);
+
+	/* Kick a GP and start waiting. */
+	(void) start_poll_synchronize_rcu();
+
+	/* Now we can wait. */
+	wait_for_completion(&rs.completion);
+	destroy_rcu_head_on_stack(&rs.head);
+}
+
 /**
  * synchronize_rcu - wait until a grace period has elapsed.
  *
@@ -3610,7 +3939,7 @@ void synchronize_rcu(void)
 		if (rcu_gp_is_expedited())
 			synchronize_rcu_expedited();
 		else
-			wait_rcu_gp(call_rcu_hurry);
+			synchronize_rcu_normal();
 		return;
 	}
 
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6b83537480b1..8a1d9c8bd9f7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -930,7 +930,7 @@ void synchronize_rcu_expedited(void)
 
 	/* If expedited grace periods are prohibited, fall back to normal. */
 	if (rcu_gp_is_normal()) {
-		wait_rcu_gp(call_rcu_hurry);
+		synchronize_rcu_normal();
 		return;
 	}
 
-- 
2.39.2

