All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] Fix tasks being forgotten for a long time on SMP
       [not found] <CACOPJM4ZeSLvxX8dEnFyL50NqnCVm5=qwNL-Cck0CsH2sf9jWw@mail.gmail.com>
@ 2016-09-22  7:34 ` Peter Zijlstra
  2016-09-22 19:06   ` Yuriy Romanenko
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2016-09-22  7:34 UTC (permalink / raw
  To: Yuriy Romanenko; +Cc: Ingo Molnar, linux-kernel

On Tue, Sep 20, 2016 at 06:14:34PM -0700, Yuriy Romanenko wrote:
> From e9a304ae91fa2a4427bde7d3aea18296d0ebb27f Mon Sep 17 00:00:00 2001
> From: Yuriy Romanenko <yromanenko@carrietech.com>
> Date: Tue, 20 Sep 2016 17:47:28 -0700
> Subject: [PATCH] Fix tasks being forgotten for a long time on SMP
> 
> Observed occasional very high latency on an embedded SMP system between
> a task becoming ready to run and actually running with low system load,
> impacting interactive usage.
> 
> A sched_wake() from CPUx on CPUy puts the task into the run queue and
> marks it runnable, but does not trigger an IPI to have the scheduler
> re-run on CPUy and see if the current task needs to get pre-empted and
> does not wake up CPUy if it is asleep.
> 
> This is especially evident when a CPU is in SWFI and simply does not
> wake up even though it now has a runnable task.

WTH his SWFI and which arch has that?

> This is probably not the most elegant fix and definitely generates some
> unnecessary scheduler runs, but it's better for overall latency.
> 
> Signed-off-by: Yuriy Romanenko <yromanenko@carrietech.com>
> ---
>  kernel/sched/core.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 860070f..7c334b7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1686,6 +1686,14 @@ static void ttwu_do_wakeup(struct rq *rq, struct
> task_struct *p, int wake_flags,
>   trace_sched_wakeup(p);
> 
>  #ifdef CONFIG_SMP
> + /*
> + * If the task is not on the current cpu, there is a chance
> + * the other cpu might be asleep and will not get to our task
> + * for a really long time. Send an IPI to avoid that
> + */
> + if (task_cpu(p) != smp_processor_id())
> + smp_send_reschedule(task_cpu(p));
> +

Yeah, no, this is completely insane.

If the remote cpu is running the idle task, check_preempt_curr() should
very much wake it up, if its not the idle class, it should never get
there because there is now an actually runnable task on.

Please explain in detail what happens and/or provide traces of this
happening.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix tasks being forgotten for a long time on SMP
  2016-09-22  7:34 ` [PATCH] Fix tasks being forgotten for a long time on SMP Peter Zijlstra
@ 2016-09-22 19:06   ` Yuriy Romanenko
  2016-09-23  8:15     ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Yuriy Romanenko @ 2016-09-22 19:06 UTC (permalink / raw
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

Hello,

I just spent three hours studying TOT scheduler code, and I am leaning
towards that this patch being no longer relevant (or being never
relevant, really).
If it ever fixed a real issue, I think it fixed it incorrectly. However, I would
love to understand what exactly we were seeing and why this helped at all.

So a little background on this. I found this change I made that I failed to
upstream from a few years ago, so my memory is a little rusty. This
was definitely an issue on the setup we had, which was an arch/arm
mach-msm (now known as mach-qcom, I believe) fork of 3.4.0

> WTH his SWFI and which arch has that?

SWFI is "Stop and Wait For Interrupt".  This appears to be a qcom specific term
that they stopped using, so I apologize for that. It specifically
refers to a CPU that
is sitting in some sort of a "wait" or "wfi" instruction and not
executing anything
until an interrupt occurs.

> Yeah, no, this is completely insane.

Totally possible. In fact, I sort of hope that it is.

> If the remote cpu is running the idle task, check_preempt_curr() should
> very much wake it up, if its not the idle class, it should never get
> there because there is now an actually runnable task on.

> Please explain in detail what happens and/or provide traces of this
> happening.

We observed two relevant scenarios that are similar but distinct.
Looking through
top-of-tree code more carefully I don't see either as possible
anymore, so the patch
probably does not apply, but I would still defer to your expertise

We have an SCHED_FIFO thread that falls asleep in a poll() waiting for
an interrupt from a
device driver.

1)  It is the last thread on that CPU, the CPU enters idle and goes
into a 'wfi' instruction
because of cpuidle_idle_call(), and will not wake up until it receives
an interrupt of some sort (like IPI).
The interrupt occurs on a different CPU, the task is woken, but
doesn't run until much later.

2)  It is not the last thread on that CPU. The CPU switches to a lower
priority task in a different class.
The interrupt occurs on a different CPU, the task is woken, but
doesn't preempt until much later.

Unfortunately I don't have the ftrace logs from back then, and it
would be hard to recreate the scenario.

If I recall correctly, the problem seemed to occur because the task
cpus_share_cache() was true, the task
was never put on the wake_list, and correspondingly the
scheduler_ipi() that was triggered by check_preempt_curr()
did nothing, when it returned it would somehow wind up in schedule()
where it did nothing because task was not TASK_RUNNING.
Does that make any amount of sense at all?

Sorry if this is a stupendous waste of time,

Regards,

Yuriy


On Thu, Sep 22, 2016 at 12:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 20, 2016 at 06:14:34PM -0700, Yuriy Romanenko wrote:
>> From e9a304ae91fa2a4427bde7d3aea18296d0ebb27f Mon Sep 17 00:00:00 2001
>> From: Yuriy Romanenko <yromanenko@carrietech.com>
>> Date: Tue, 20 Sep 2016 17:47:28 -0700
>> Subject: [PATCH] Fix tasks being forgotten for a long time on SMP
>>
>> Observed occasional very high latency on an embedded SMP system between
>> a task becoming ready to run and actually running with low system load,
>> impacting interactive usage.
>>
>> A sched_wake() from CPUx on CPUy puts the task into the run queue and
>> marks it runnable, but does not trigger an IPI to have the scheduler
>> re-run on CPUy and see if the current task needs to get pre-empted and
>> does not wake up CPUy if it is asleep.
>>
>> This is especially evident when a CPU is in SWFI and simply does not
>> wake up even though it now has a runnable task.
>
> WTH his SWFI and which arch has that?
>
>> This is probably not the most elegant fix and definitely generates some
>> unnecessary scheduler runs, but it's better for overall latency.
>>
>> Signed-off-by: Yuriy Romanenko <yromanenko@carrietech.com>
>> ---
>>  kernel/sched/core.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 860070f..7c334b7 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1686,6 +1686,14 @@ static void ttwu_do_wakeup(struct rq *rq, struct
>> task_struct *p, int wake_flags,
>>   trace_sched_wakeup(p);
>>
>>  #ifdef CONFIG_SMP
>> + /*
>> + * If the task is not on the current cpu, there is a chance
>> + * the other cpu might be asleep and will not get to our task
>> + * for a really long time. Send an IPI to avoid that
>> + */
>> + if (task_cpu(p) != smp_processor_id())
>> + smp_send_reschedule(task_cpu(p));
>> +
>
> Yeah, no, this is completely insane.
>
> If the remote cpu is running the idle task, check_preempt_curr() should
> very much wake it up, if its not the idle class, it should never get
> there because there is now an actually runnable task on.
>
> Please explain in detail what happens and/or provide traces of this
> happening.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix tasks being forgotten for a long time on SMP
  2016-09-22 19:06   ` Yuriy Romanenko
@ 2016-09-23  8:15     ` Peter Zijlstra
  0 siblings, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2016-09-23  8:15 UTC (permalink / raw
  To: Yuriy Romanenko; +Cc: Ingo Molnar, linux-kernel

On Thu, Sep 22, 2016 at 12:06:03PM -0700, Yuriy Romanenko wrote:
> Hello,
> 
> I just spent three hours studying TOT scheduler code, and I am leaning
> towards that this patch being no longer relevant (or being never
> relevant, really).
> If it ever fixed a real issue, I think it fixed it incorrectly. However, I would
> love to understand what exactly we were seeing and why this helped at all.
> 
> So a little background on this. I found this change I made that I failed to
> upstream from a few years ago, so my memory is a little rusty. This
> was definitely an issue on the setup we had, which was an arch/arm
> mach-msm (now known as mach-qcom, I believe) fork of 3.4.0

Urgh, 3.4 is a _long_ time ago ;-) Let me check it out to remind myself
what it looked like.

> > WTH his SWFI and which arch has that?
> 
> SWFI is "Stop and Wait For Interrupt".  This appears to be a qcom
> specific term that they stopped using, so I apologize for that. It
> specifically refers to a CPU that is sitting in some sort of a "wait"
> or "wfi" instruction and not executing anything until an interrupt
> occurs.

Ah, ok. So I'm somewhat familiar with the ARM 'WFI' instruction and had
a vague idea this might be an ARM related part.

> > If the remote cpu is running the idle task, check_preempt_curr() should
> > very much wake it up, if its not the idle class, it should never get
> > there because there is now an actually runnable task on.
> 
> > Please explain in detail what happens and/or provide traces of this
> > happening.
> 
> We observed two relevant scenarios that are similar but distinct.
> Looking through top-of-tree code more carefully I don't see either as
> possible anymore, so the patch probably does not apply, but I would
> still defer to your expertise
> 
> We have an SCHED_FIFO thread that falls asleep in a poll() waiting for
> an interrupt from a device driver.
> 
> 1)  It is the last thread on that CPU, the CPU enters idle and goes
> into a 'wfi' instruction because of cpuidle_idle_call(), and will not
> wake up until it receives an interrupt of some sort (like IPI).  The
> interrupt occurs on a different CPU, the task is woken, but doesn't
> run until much later.
> 
> 2)  It is not the last thread on that CPU. The CPU switches to a lower
> priority task in a different class.  The interrupt occurs on a
> different CPU, the task is woken, but doesn't preempt until much
> later.
> 
> Unfortunately I don't have the ftrace logs from back then, and it
> would be hard to recreate the scenario.
> 
> If I recall correctly, the problem seemed to occur because the task
> cpus_share_cache() was true, the task was never put on the wake_list,
> and correspondingly the scheduler_ipi() that was triggered by
> check_preempt_curr() did nothing, when it returned it would somehow
> wind up in schedule() where it did nothing because task was not
> TASK_RUNNING.  Does that make any amount of sense at all?

So if cpus_share_cache() is true, we'll do the remote enqueue, that is,
the waking CPU will acquire the rq->lock of the CPU we want to wake the
task on and enqueue.

The code is like:

   raw_spin_lock(rq->lock);
   ttwu_do_activate(rq, p, 0);
     ttwu_activate(...); // does the actual enqueue
     ttwu_do_wakeup(...);
       check_preempt_curr() // <-- this is supposed to issue the IPI

   raw_spin_unlock(rq->lock);

In either scenario, 'current' is of a lower scheduling class (idle or
fair) than the newly woken task (fifo), so check_preempt_curr() should
find (class == p->sched_class) true and issue resched_task(rq->curr).

resched_task() in its turn tests TIF_NEED_RESCHED, if not set, it will
set it and issue an IPI (lets ignore the polling thing for now).

The IPI will execute scheduler_ipi(), and as you say, fail to find work
to do -- this is expected and OK.

_However_ on the return to user path (from interrupt), you're supposed
to check TIF_NEED_RESCHED and call schedule(). _THIS_ is what should
affect the actual task switch and get our freshly woken FIFO task
running.

Not to be confused with CONFIG_PREEMPT which should check
TIF_NEED_RESCHED on any return from interrupt path and call
preempt_schedule().

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-09-23  8:15 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CACOPJM4ZeSLvxX8dEnFyL50NqnCVm5=qwNL-Cck0CsH2sf9jWw@mail.gmail.com>
2016-09-22  7:34 ` [PATCH] Fix tasks being forgotten for a long time on SMP Peter Zijlstra
2016-09-22 19:06   ` Yuriy Romanenko
2016-09-23  8:15     ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.