From: Rabin Vincent <rabin.vincent@axis.com>
To: mingo@redhat.com, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Tue, 30 Jun 2015 16:30:58 +0200 [thread overview]
Message-ID: <20150630143057.GA31689@axis.com> (raw)
Hi,
We're seeing a livelock where two CPUs both loop with interrupts
disabled in pick_next_task_fair() / idle_balance() and continuously
fetch all tasks from each other. This has been observed on a 3.18
kernel but the relevant code paths appear to be the same in current
mainline.
The problem situation appears to be this:
cpu0 cpu1
pick_next_task
take rq0->lock
pick_next_task_fair
running=0
idle_balance()
drop rq0->lock
wakeup A,B on cpu0
pick_next_task
take rq1->lock
pick_next_task_fair
running=0
idle_balance()
drop r1->lock
load_balance()
busiest=rq0
take rq0->lock
detach A,B
drop rq0->lock
take rq1->lock
attach A,B
pulled_task = 2
drop rq1->lock
load_balance()
busiest=rq1
take rq1->lock
detach A,B
drop rq1->lock
take rq0->lock
attach A,B
pulled_task = 2
drop rq0->lock
running=0()
idle_balance()
busiest=rq0, pull A,B, etc.
running = 0
load_balance()
busiest=rq1, pull A,B, etc
And this goes on, with interrupts disabled on both CPUs, for at least a
100 ms (which is when a hardware watchdog hits).
The conditions needed, apart from the right timing, are:
- cgroups. One of the tasks, say A, needs to be in a CPU cgroup. When
the problem occurs, A's ->se has zero load_avg_contrib and
task_h_load(A) is zero. However, the se->parent->parent of A has a
(relatively) high load_avg_contrib. cpu0's cfs_rq has therefore a
relatively high runnable_load_avg. find_busiest_group() therefore
detects imbalance, and detach_tasks() detaches all tasks.
- PREEMPT=n. Otherwise, the code under #ifdef in detach_tasks() would
ensure that we'd only ever pull a maximum of one task during idle
balancing.
- cpu0 and cpu1 are threads on the same core (cpus_share_cache() ==
true). otherwise, cpu1 will not be able to wakeup tasks on cpu0
while cpu0 has interrupts disabled (since an IPI would be required).
Turning off the default TTWU_QUEUE feature would also provide the
same effect.
I see two simple ways to prevent the livelock. One is to just to remove
the #ifdef:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..74c94dc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5937,7 +5937,6 @@ static int detach_tasks(struct lb_env *env)
detached++;
env->imbalance -= load;
-#ifdef CONFIG_PREEMPT
/*
* NEWIDLE balancing is a source of latency, so preemptible
* kernels will stop after the first task is detached to minimize
@@ -5945,7 +5944,6 @@ static int detach_tasks(struct lb_env *env)
*/
if (env->idle == CPU_NEWLY_IDLE)
break;
-#endif
/*
* We only want to steal up to the prescribed amount of
Or this should work too:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d57cc0..13358cf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,6 +7487,8 @@ static int idle_balance(struct rq *this_rq)
*/
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;
+ else if (!this_rq->cfs.h_nr_running && pulled_task)
+ pulled_task = 0;
out:
/* Move the next balance forward */
But it seems that the real problem is that detach_tasks() can remove all
tasks, when cgroups are involved as described above?
Thanks.
/Rabin
next reply other threads:[~2015-06-30 14:31 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-30 14:30 Rabin Vincent [this message]
2015-07-01 5:36 ` [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Mike Galbraith
2015-07-01 14:55 ` Rabin Vincent
2015-07-01 15:47 ` Mike Galbraith
2015-07-01 20:44 ` Peter Zijlstra
2015-07-01 23:25 ` Yuyang Du
2015-07-02 8:05 ` Mike Galbraith
2015-07-02 1:05 ` Yuyang Du
2015-07-02 10:25 ` Mike Galbraith
2015-07-02 11:40 ` Morten Rasmussen
2015-07-02 19:37 ` Yuyang Du
2015-07-03 9:34 ` Morten Rasmussen
2015-07-03 16:38 ` Peter Zijlstra
2015-07-05 22:31 ` Yuyang Du
2015-07-09 14:32 ` Morten Rasmussen
2015-07-09 23:24 ` Yuyang Du
2015-07-05 20:12 ` Yuyang Du
2015-07-06 17:36 ` Dietmar Eggemann
2015-07-07 11:17 ` Rabin Vincent
2015-07-13 17:43 ` Dietmar Eggemann
2015-07-09 13:53 ` Morten Rasmussen
2015-07-09 22:34 ` Yuyang Du
2015-07-02 10:53 ` Peter Zijlstra
2015-07-02 11:44 ` Morten Rasmussen
2015-07-02 18:42 ` Yuyang Du
2015-07-03 4:42 ` Mike Galbraith
2015-07-03 16:39 ` Peter Zijlstra
2015-07-05 22:11 ` Yuyang Du
2015-07-09 6:15 ` Stefan Ekenberg
2015-07-26 18:57 ` Yuyang Du
2015-08-03 17:05 ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150630143057.GA31689@axis.com \
--to=rabin.vincent@axis.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.