* [PATCH v4 0/4] Fix some warnings about rq clock
@ 2023-06-08 6:33 Hao Jia
2023-06-08 6:33 ` [PATCH v4 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline() Hao Jia
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Hao Jia @ 2023-06-08 6:33 UTC (permalink / raw)
To: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
mgorman
Cc: linux-kernel, Hao Jia
These four patches fix some warnings about the rq clock.
Patch 1 fixes a warning about using a stale rq clock caused by a
missing rq clock update.
Patches 2-4 fix warnings about the rq clock being updated multiple
times while holding the rq lock.
v3->v4:
- Add Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> for
all patches.
v2->v3:
- Modify the commit message of patch 1 to make the description clearer.
- Split v2 patch2 into 3 separate patches.
v1->v2:
- Vincent Guittot suggested using rq_clock_start_loop_update()
to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq()
instead of removing update_rq_clock() from unthrottle_cfs_rq()
and updating the rq clock before unthrottle_cfs_rq() for patch2.
[v2] https://lore.kernel.org/all/20230510083450.62334-1-jiahao.os@bytedance.com
[v1] https://lore.kernel.org/all/20230410081206.23441-1-jiahao.os@bytedance.com
Hao Jia (4):
sched/core: Fixed missing rq clock update before calling
set_rq_offline()
sched/core: Avoid double calling update_rq_clock() in
__balance_push_cpu_stop()
sched/core: Avoid multiple calling update_rq_clock() in
__cfsb_csd_unthrottle()
sched/core: Avoid multiple calling update_rq_clock() in
unthrottle_offline_cfs_rqs()
kernel/sched/core.c | 7 ++++---
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
kernel/sched/topology.c | 10 ++++++----
4 files changed, 49 insertions(+), 7 deletions(-)
--
2.37.0 (Apple Git-136)
* [PATCH v4 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline()
2023-06-08 6:33 [PATCH v4 0/4] Fix some warnings about rq clock Hao Jia
@ 2023-06-08 6:33 ` Hao Jia
2023-06-08 6:33 ` [PATCH v4 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop() Hao Jia
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Hao Jia @ 2023-06-08 6:33 UTC (permalink / raw)
To: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
mgorman
Cc: linux-kernel, Hao Jia
This warning is triggered during cpu offline when CONFIG_CPU_FREQ is
enabled and the cpufreq governor is set to powersave:
------------[ cut here ]------------
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 24 PID: 754 at kernel/sched/sched.h:1496
enqueue_top_rt_rq+0x139/0x160
Call Trace:
<TASK>
? intel_pstate_update_util+0x3b0/0x3b0
rq_offline_rt+0x1b7/0x250
set_rq_offline.part.120+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
? __schedule+0x65e/0x1310
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
? percpu_rwsem_wait+0x140/0x140
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0x120
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x116/0x410
? __schedule+0x65e/0x1310 ? internal_add_timer+0x42/0x60
? _raw_spin_unlock_irqrestore+0x23/0x40
? add_timer_on+0xd5/0x130
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x56/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
More detailed key function call graph:
rq_offline_rt()
__disable_runtime()
sched_rt_rq_enqueue()
enqueue_top_rt_rq()
cpufreq_update_util() <-- depends on CONFIG_CPU_FREQ
data->func(data, *rq_clock(rq)*, flags)
intel_pstate_update_util() <-- powersave policy callback function
Before calling set_rq_offline() we need to update the rq clock to avoid
using a stale rq clock, and use rq_lock_irqsave()/rq_unlock_irqrestore()
instead of raw_spin_rq_lock_irqsave()/raw_spin_rq_unlock_irqrestore() to
ensure that rq->clock_update_flags is cleared before the rq clock is
updated.
Steps to reproduce:
1. Enable CONFIG_SMP and CONFIG_CPU_FREQ when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
3. cpupower -c all frequency-set -g powersave
4. Run some rt tasks, e.g. create 5*n rt (100% running) tasks on a
system with n CPUs
5. Offline the cpus one by one until the warning is triggered
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/topology.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6682535e37c8..b89497696880 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -487,15 +487,17 @@ static void free_rootdomain(struct rcu_head *rcu)
void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
struct root_domain *old_rd = NULL;
- unsigned long flags;
+ struct rq_flags rf;
- raw_spin_rq_lock_irqsave(rq, flags);
+ rq_lock_irqsave(rq, &rf);
if (rq->rd) {
old_rd = rq->rd;
- if (cpumask_test_cpu(rq->cpu, old_rd->online))
+ if (cpumask_test_cpu(rq->cpu, old_rd->online)) {
+ update_rq_clock(rq);
set_rq_offline(rq);
+ }
cpumask_clear_cpu(rq->cpu, old_rd->span);
@@ -515,7 +517,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq);
- raw_spin_rq_unlock_irqrestore(rq, flags);
+ rq_unlock_irqrestore(rq, &rf);
if (old_rd)
call_rcu(&old_rd->rcu, free_rootdomain);
--
2.37.0 (Apple Git-136)
* [PATCH v4 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
2023-06-08 6:33 [PATCH v4 0/4] Fix some warnings about rq clock Hao Jia
2023-06-08 6:33 ` [PATCH v4 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline() Hao Jia
@ 2023-06-08 6:33 ` Hao Jia
2023-06-08 6:33 ` [PATCH v4 3/4] sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle() Hao Jia
2023-06-08 6:33 ` [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs() Hao Jia
3 siblings, 0 replies; 9+ messages in thread
From: Hao Jia @ 2023-06-08 6:33 UTC (permalink / raw)
To: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
mgorman
Cc: linux-kernel, Hao Jia
The WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 17 PID: 138 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
__balance_push_cpu_stop+0x146/0x180
? migration_cpu_stop+0x2a0/0x2a0
cpu_stopper_thread+0xa3/0x140
smpboot_thread_fn+0x14f/0x210
? sort_range+0x20/0x20
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
To avoid this warning, remove the update_rq_clock() call from
__migrate_task(). To avoid missing an rq clock update, call
update_rq_clock() in migration_cpu_stop() before it calls
__migrate_task().
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a68d1276bab0..1fd87657f521 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2398,7 +2398,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
if (!is_cpu_allowed(p, dest_cpu))
return rq;
- update_rq_clock(rq);
rq = move_queued_task(rq, rf, p, dest_cpu);
return rq;
@@ -2456,10 +2455,12 @@ static int migration_cpu_stop(void *data)
goto out;
}
- if (task_on_rq_queued(p))
+ if (task_on_rq_queued(p)) {
+ update_rq_clock(rq);
rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
- else
+ } else {
p->wake_cpu = arg->dest_cpu;
+ }
/*
* XXX __migrate_task() can fail, at which point we might end
--
2.37.0 (Apple Git-136)
* [PATCH v4 3/4] sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()
2023-06-08 6:33 [PATCH v4 0/4] Fix some warnings about rq clock Hao Jia
2023-06-08 6:33 ` [PATCH v4 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline() Hao Jia
2023-06-08 6:33 ` [PATCH v4 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop() Hao Jia
@ 2023-06-08 6:33 ` Hao Jia
2023-06-08 6:33 ` [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs() Hao Jia
3 siblings, 0 replies; 9+ messages in thread
From: Hao Jia @ 2023-06-08 6:33 UTC (permalink / raw)
To: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
mgorman
Cc: linux-kernel, Hao Jia
After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs
bandwidth"), the rq clock may be updated multiple times in the loop of
__cfsb_csd_unthrottle(), which triggers the following warning.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 54 PID: 0 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
__cfsb_csd_unthrottle+0xe0/0x100
__flush_smp_call_function_queue+0xaf/0x1d0
flush_smp_call_function_queue+0x49/0x90
do_idle+0x17c/0x270
cpu_startup_entry+0x19/0x20
start_secondary+0xfa/0x120
secondary_startup_64_no_verify+0xce/0xdb
Before the loop starts, update the rq clock once and call
rq_clock_start_loop_update() to prevent updating the rq clock
multiple times. After the loop, call rq_clock_stop_loop_update()
to clear rq->clock_update_flags.
Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth")
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 9 +++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..af9604f4b135 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5576,6 +5576,14 @@ static void __cfsb_csd_unthrottle(void *arg)
rq_lock(rq, &rf);
+ /*
+ * Iterating over the list can trigger several calls to
+ * update_rq_clock() in unthrottle_cfs_rq().
+ * Do it once and skip the potential next ones.
+ */
+ update_rq_clock(rq);
+ rq_clock_start_loop_update(rq);
+
/*
* Since we hold rq lock we're safe from concurrent manipulation of
* the CSD list. However, this RCU critical section annotates the
@@ -5595,6 +5603,7 @@ static void __cfsb_csd_unthrottle(void *arg)
rcu_read_unlock();
+ rq_clock_stop_loop_update(rq);
rq_unlock(rq, &rf);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..50446e401b9f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,27 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
rq->clock_update_flags &= ~RQCF_REQ_SKIP;
}
+/*
+ * During cpu offlining and rq-wide unthrottling, we can trigger an
+ * update_rq_clock() for several cfs and rt runqueues (typically when
+ * using list_for_each_entry_*).
+ * rq_clock_start_loop_update() can be called after updating the clock
+ * once and before iterating over the list to prevent multiple updates.
+ * After the iteration, call rq_clock_stop_loop_update() to clear
+ * RQCF_ACT_SKIP in rq->clock_update_flags.
+ */
+static inline void rq_clock_start_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags |= RQCF_ACT_SKIP;
+}
+
+static inline void rq_clock_stop_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+}
+
struct rq_flags {
unsigned long flags;
struct pin_cookie cookie;
--
2.37.0 (Apple Git-136)
* [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()
2023-06-08 6:33 [PATCH v4 0/4] Fix some warnings about rq clock Hao Jia
` (2 preceding siblings ...)
2023-06-08 6:33 ` [PATCH v4 3/4] sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle() Hao Jia
@ 2023-06-08 6:33 ` Hao Jia
2023-06-08 21:06 ` Benjamin Segall
3 siblings, 1 reply; 9+ messages in thread
From: Hao Jia @ 2023-06-08 6:33 UTC (permalink / raw)
To: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
mgorman
Cc: linux-kernel, Hao Jia
This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
rq_offline_fair+0x89/0x90
set_rq_offline.part.118+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0xf0
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x291/0x410
? __schedule+0x65e/0x1310
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x92/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
The rq clock has already been updated before set_rq_offline()
runs, so we don't need to call update_rq_clock() in
unthrottle_offline_cfs_rqs().
To avoid this warning, we only need to call rq_clock_start_loop_update()
before the loop starts and rq_clock_stop_loop_update() after the loop.
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af9604f4b135..9e961e0ec971 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6124,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
lockdep_assert_rq_held(rq);
+ /*
+ * The rq clock has already been updated before the
+ * set_rq_offline() runs, so we should skip updating
+ * the rq clock again in unthrottle_cfs_rq().
+ */
+ rq_clock_start_loop_update(rq);
+
rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6146,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
unthrottle_cfs_rq(cfs_rq);
}
rcu_read_unlock();
+
+ rq_clock_stop_loop_update(rq);
}
#else /* CONFIG_CFS_BANDWIDTH */
--
2.37.0 (Apple Git-136)
* Re: [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()
2023-06-08 6:33 ` [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs() Hao Jia
@ 2023-06-08 21:06 ` Benjamin Segall
2023-06-09 3:19 ` [External] " Hao Jia
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Segall @ 2023-06-08 21:06 UTC (permalink / raw)
To: Hao Jia
Cc: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, mgorman, bristot, vschneid, mgorman,
linux-kernel
Hao Jia <jiahao.os@bytedance.com> writes:
> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
> ------------[ cut here ]------------
> rq->clock_update_flags & RQCF_UPDATED
> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
> update_rq_clock+0xaf/0x180
> Call Trace:
> <TASK>
> unthrottle_cfs_rq+0x4b/0x300
> rq_offline_fair+0x89/0x90
> set_rq_offline.part.118+0x28/0x60
> rq_attach_root+0xc4/0xd0
> cpu_attach_domain+0x3dc/0x7f0
> partition_sched_domains_locked+0x2a5/0x3c0
> rebuild_sched_domains_locked+0x477/0x830
> rebuild_sched_domains+0x1b/0x30
> cpuset_hotplug_workfn+0x2ca/0xc90
> ? balance_push+0x56/0xf0
> ? _raw_spin_unlock+0x15/0x30
> ? finish_task_switch+0x98/0x2f0
> ? __switch_to+0x291/0x410
> ? __schedule+0x65e/0x1310
> process_one_work+0x1bc/0x3d0
> worker_thread+0x4c/0x380
> ? preempt_count_add+0x92/0xa0
> ? rescuer_thread+0x310/0x310
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
>
> The rq clock has been updated before the set_rq_offline()
> function runs, so we don't need to call update_rq_clock() in
> unthrottle_offline_cfs_rqs().
I don't think we do in the path from rq_attach_root (though that's easy
enough to fix, of course).
* Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()
2023-06-08 21:06 ` Benjamin Segall
@ 2023-06-09 3:19 ` Hao Jia
2023-06-09 20:11 ` Benjamin Segall
0 siblings, 1 reply; 9+ messages in thread
From: Hao Jia @ 2023-06-09 3:19 UTC (permalink / raw)
To: Benjamin Segall
Cc: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, mgorman, bristot, vschneid, mgorman,
linux-kernel
On 2023/6/9 Benjamin Segall wrote:
> Hao Jia <jiahao.os@bytedance.com> writes:
>
>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>> ------------[ cut here ]------------
>> rq->clock_update_flags & RQCF_UPDATED
>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>> update_rq_clock+0xaf/0x180
>> Call Trace:
>> <TASK>
>> unthrottle_cfs_rq+0x4b/0x300
>> rq_offline_fair+0x89/0x90
>> set_rq_offline.part.118+0x28/0x60
>> rq_attach_root+0xc4/0xd0
>> cpu_attach_domain+0x3dc/0x7f0
>> partition_sched_domains_locked+0x2a5/0x3c0
>> rebuild_sched_domains_locked+0x477/0x830
>> rebuild_sched_domains+0x1b/0x30
>> cpuset_hotplug_workfn+0x2ca/0xc90
>> ? balance_push+0x56/0xf0
>> ? _raw_spin_unlock+0x15/0x30
>> ? finish_task_switch+0x98/0x2f0
>> ? __switch_to+0x291/0x410
>> ? __schedule+0x65e/0x1310
>> process_one_work+0x1bc/0x3d0
>> worker_thread+0x4c/0x380
>> ? preempt_count_add+0x92/0xa0
>> ? rescuer_thread+0x310/0x310
>> kthread+0xe6/0x110
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x1f/0x30
>>
>> The rq clock has been updated before the set_rq_offline()
>> function runs, so we don't need to call update_rq_clock() in
>> unthrottle_offline_cfs_rqs().
>
> I don't think we do in the path from rq_attach_root (though that's easy
> enough to fix, of course).
>
Thanks for your review.
Our current fix is: after applying patch 1, we update the rq clock
before set_rq_offline(). Then we use rq_clock_{start,stop}_loop_update()
to avoid updating the rq clock multiple times in unthrottle_cfs_rq().
Do you have any better suggestions?
Thanks,
Hao
* Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()
2023-06-09 3:19 ` [External] " Hao Jia
@ 2023-06-09 20:11 ` Benjamin Segall
2023-06-12 2:40 ` Hao Jia
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Segall @ 2023-06-09 20:11 UTC (permalink / raw)
To: Hao Jia
Cc: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, mgorman, bristot, vschneid, mgorman,
linux-kernel
Hao Jia <jiahao.os@bytedance.com> writes:
> On 2023/6/9 Benjamin Segall wrote:
>> Hao Jia <jiahao.os@bytedance.com> writes:
>>
>>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>>> ------------[ cut here ]------------
>>> rq->clock_update_flags & RQCF_UPDATED
>>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>>> update_rq_clock+0xaf/0x180
>>> Call Trace:
>>> <TASK>
>>> unthrottle_cfs_rq+0x4b/0x300
>>> rq_offline_fair+0x89/0x90
>>> set_rq_offline.part.118+0x28/0x60
>>> rq_attach_root+0xc4/0xd0
>>> cpu_attach_domain+0x3dc/0x7f0
>>> partition_sched_domains_locked+0x2a5/0x3c0
>>> rebuild_sched_domains_locked+0x477/0x830
>>> rebuild_sched_domains+0x1b/0x30
>>> cpuset_hotplug_workfn+0x2ca/0xc90
>>> ? balance_push+0x56/0xf0
>>> ? _raw_spin_unlock+0x15/0x30
>>> ? finish_task_switch+0x98/0x2f0
>>> ? __switch_to+0x291/0x410
>>> ? __schedule+0x65e/0x1310
>>> process_one_work+0x1bc/0x3d0
>>> worker_thread+0x4c/0x380
>>> ? preempt_count_add+0x92/0xa0
>>> ? rescuer_thread+0x310/0x310
>>> kthread+0xe6/0x110
>>> ? kthread_complete_and_exit+0x20/0x20
>>> ret_from_fork+0x1f/0x30
>>>
>>> The rq clock has been updated before the set_rq_offline()
>>> function runs, so we don't need to call update_rq_clock() in
>>> unthrottle_offline_cfs_rqs().
>> I don't think we do in the path from rq_attach_root (though that's easy
>> enough to fix, of course).
>>
>
> Thanks for your review.
>
> Now our fix method is that after applying patch1, we update the rq clock before
> set_rq_offline(). Then use rq_clock_{start, stop}_loop_update to avoid updating
> the rq clock multiple times in unthrottle_cfs_rq().
>
> Do you have any better suggestions?
>
> Thanks,
> Hao
Yeah, the obvious fixes are to either add an update_rq_clock in
rq_attach_root as you suggest, or put it in set_rq_offline instead of
the callers.
* Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()
2023-06-09 20:11 ` Benjamin Segall
@ 2023-06-12 2:40 ` Hao Jia
0 siblings, 0 replies; 9+ messages in thread
From: Hao Jia @ 2023-06-12 2:40 UTC (permalink / raw)
To: Benjamin Segall
Cc: mingo, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, mgorman, bristot, vschneid, mgorman,
linux-kernel
On 2023/6/10 Benjamin Segall wrote:
> Hao Jia <jiahao.os@bytedance.com> writes:
>
>> On 2023/6/9 Benjamin Segall wrote:
>>> Hao Jia <jiahao.os@bytedance.com> writes:
>>>
>>>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>>>> ------------[ cut here ]------------
>>>> rq->clock_update_flags & RQCF_UPDATED
>>>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>>>> update_rq_clock+0xaf/0x180
>>>> Call Trace:
>>>> <TASK>
>>>> unthrottle_cfs_rq+0x4b/0x300
>>>> rq_offline_fair+0x89/0x90
>>>> set_rq_offline.part.118+0x28/0x60
>>>> rq_attach_root+0xc4/0xd0
>>>> cpu_attach_domain+0x3dc/0x7f0
>>>> partition_sched_domains_locked+0x2a5/0x3c0
>>>> rebuild_sched_domains_locked+0x477/0x830
>>>> rebuild_sched_domains+0x1b/0x30
>>>> cpuset_hotplug_workfn+0x2ca/0xc90
>>>> ? balance_push+0x56/0xf0
>>>> ? _raw_spin_unlock+0x15/0x30
>>>> ? finish_task_switch+0x98/0x2f0
>>>> ? __switch_to+0x291/0x410
>>>> ? __schedule+0x65e/0x1310
>>>> process_one_work+0x1bc/0x3d0
>>>> worker_thread+0x4c/0x380
>>>> ? preempt_count_add+0x92/0xa0
>>>> ? rescuer_thread+0x310/0x310
>>>> kthread+0xe6/0x110
>>>> ? kthread_complete_and_exit+0x20/0x20
>>>> ret_from_fork+0x1f/0x30
>>>>
>>>> The rq clock has been updated before the set_rq_offline()
>>>> function runs, so we don't need to call update_rq_clock() in
>>>> unthrottle_offline_cfs_rqs().
>>> I don't think we do in the path from rq_attach_root (though that's easy
>>> enough to fix, of course).
>>>
>>
>> Thanks for your review.
>>
>> Now our fix method is that after applying patch1, we update the rq clock before
>> set_rq_offline(). Then use rq_clock_{start, stop}_loop_update to avoid updating
>> the rq clock multiple times in unthrottle_cfs_rq().
>>
>> Do you have any better suggestions?
>>
>> Thanks,
>> Hao
>
> Yeah, the obvious fixes are to either add an update_rq_clock in
> rq_attach_root as you suggest, or put it in set_rq_offline instead of
> the callers.
Thanks for your suggestion. I will do it in the next version.
Thanks,
Hao