All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Chen Yu <yu.c.chen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com,
	tglx@linutronix.de, efault@gmx.de, tim.c.chen@intel.com,
	yu.c.chen.y@gmail.com
Subject: Re: [RFC][PATCH 10/10] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion
Date: Tue, 14 May 2024 20:53:16 +0530	[thread overview]
Message-ID: <f2eafde3-b9e8-afc9-8934-ca8e597c33e6@amd.com> (raw)
In-Reply-To: <ZkMsf4Fz7/AFoQfC@chenyu5-mobl2>

Hello Chenyu,

On 5/14/2024 2:48 PM, Chen Yu wrote:
>>> [..snip..]
>>>  /*
>>>   * Scan the LLC domain for idle CPUs; this is dynamically regulated by
>>>   * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
>>> @@ -7384,10 +7402,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>  	if (sched_feat(SIS_UTIL)) {
>>>  		sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
>>>  		if (sd_share) {
>>> -			/* because !--nr is the condition to stop scan */> -			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
>>> +			nr = adjust_idle_scan(p, READ_ONCE(sd_share->nr_idle_scan));
>>>  			/* overloaded LLC is unlikely to have idle cpu/core */
>>> -			if (nr == 1)
>>> +			if (nr <= 0)
>>
>> I was wondering if this would preserve the current behavior with
>> SIS_FAST toggled off? Since the implementation below still does a
>> "--nr <= 0" , wouldn't it effectively visit one CPU less overall now?
>>
>> Have you tried something similar to the below hunk?
>>
>> 	/* because !--nr is the condition to stop scan */
>> 	nr = adjust_idle_scan(p, READ_ONCE(sd_share->nr_idle_scan)) + 1;
>> 	if (nr == 1)
>> 		return -1;
>>
> 
> Yeah, right, to keep the scan depth consistent, the "+1" should be kept.
>  
>> I agree with Mike that looking at slice to limit scan-depth seems odd.
>> My experience with netperf is that the workload cares more about the
>> server-client being co-located on the closest cache domain and by
>> limiting scan-depth using slice, this is indirectly achieved since all
>> the wakeups carry the WF_SYNc flag.
>>
> 
> Exactly. This is the original motivation.
>  
>> P.S. have you tried using the slice in __select_idle_cpu()? Similar to
>> sched_idle_cpu() check, perhaps an additional sched_preempt_short_cpu()
>> which compares rq->curr->se.slice with the waking task's slice and
>> returs that cpu if SIS_SHORT can help run the workload quicker?
> 
> This is a good idea, it seems to be benefit PREEMPT_SHORT. If the customized
> task slice is introduced, we can leverage this hint for latency related
> optimization. Task wakeup is one thing, I can also think of other aspects,
> like idle load balance, etc. I'm not sure what is the proper usage of the
> task slice though, this is why I sent this RFC.
> 
>> Note:
>> This will not work if the SIS scan itself is the largest overhead in the
>> wakeup cycle and not the task placement itself. Previously during
>> SIS_UTIL testing, to measure the overheads of scan vs placement, we
>> would do a full scan but return the result that SIS_UTIL would have
>> returned to determine the overhead of the search itself.
>>
> 
> Regarding the task placement, do you mean the time between a task is enqueued
> and picked up? Do you have any recommendation which workload can expose the
> scan overhead most?

Sorry for not being clear here. From what I've observed in the past,
there are two dimensions to slect_idle_sibling():

i)  Placement: Final CPU select_idle_sibling() returns
ii) Search: Do we find an idle core/CPU in select_idle_sibling()

In case of netperf, I've observed that i) is more important than ii)
wherin a placement of client on same core/thread as that of the server
results in better performance vs finding an idle CPU on a remote LLC.
For hackbench/tbench, when runqueues are under high utilization (~75%),
reduction in search time ii) seems to be more beneficial.

There was also a wakeup from IPI / without IPI angle that I never quite
got to the bottom of that Mathieu has highlighted last year. I'll go
get some more data on that front and give your patch a try. Expect
results in a couple of days. 

Thank you.

> 
> thanks,
> Chenyu
>  
>>>  				return -1;
>>>  		}
>>>  	}
>>> [..snip..]
--
Thanks and Regards,
Prateek

  reply	other threads:[~2024-05-14 15:23 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 10:27 [RFC][PATCH 00/10] sched/fair: Complete EEVDF Peter Zijlstra
2024-04-05 10:27 ` [RFC][PATCH 01/10] sched/eevdf: Add feature comments Peter Zijlstra
2024-04-05 10:27 ` [RFC][PATCH 02/10] sched/eevdf: Remove min_vruntime_copy Peter Zijlstra
2024-04-05 10:27 ` [RFC][PATCH 03/10] sched/fair: Cleanup pick_task_fair() vs throttle Peter Zijlstra
2024-04-05 21:11   ` Benjamin Segall
2024-04-05 10:27 ` [RFC][PATCH 04/10] sched/fair: Cleanup pick_task_fair()s curr Peter Zijlstra
2024-04-05 10:27 ` [RFC][PATCH 05/10] sched/fair: Unify pick_{,next_}_task_fair() Peter Zijlstra
2024-04-06  2:20   ` Mike Galbraith
2024-04-05 10:28 ` [RFC][PATCH 06/10] sched: Allow sched_class::dequeue_task() to fail Peter Zijlstra
2024-04-05 10:28 ` [RFC][PATCH 07/10] sched/fair: Re-organize dequeue_task_fair() Peter Zijlstra
2024-04-05 10:28 ` [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue Peter Zijlstra
2024-04-06  9:23   ` Chen Yu
2024-04-08  9:06     ` Peter Zijlstra
2024-04-11  1:32       ` Yan-Jie Wang
2024-04-25 10:25         ` Peter Zijlstra
2024-04-12 10:42   ` K Prateek Nayak
2024-04-15 10:56     ` Mike Galbraith
2024-04-16  3:18       ` K Prateek Nayak
2024-04-16  5:36         ` Mike Galbraith
2024-04-18 16:24           ` Mike Galbraith
2024-04-18 17:08             ` K Prateek Nayak
2024-04-24 15:20             ` Peter Zijlstra
2024-04-25 11:28             ` Peter Zijlstra
2024-04-26 10:56               ` Peter Zijlstra
2024-04-26 11:16                 ` Peter Zijlstra
2024-04-26 16:03                   ` Mike Galbraith
2024-04-27  6:42                     ` Mike Galbraith
2024-04-28 16:32                       ` Mike Galbraith
2024-04-29 12:14                         ` Peter Zijlstra
2024-04-15 17:07   ` Luis Machado
2024-04-24 15:15     ` Luis Machado
2024-04-25 10:42       ` Peter Zijlstra
2024-04-25 11:49         ` Peter Zijlstra
2024-04-26  9:32           ` Peter Zijlstra
2024-04-26  9:36             ` Peter Zijlstra
2024-04-26 10:16             ` Luis Machado
2024-04-29 14:33             ` Luis Machado
2024-05-02 10:26               ` Luis Machado
2024-05-10 14:49                 ` Luis Machado
2024-05-15  9:36                   ` Peter Zijlstra
2024-05-15 11:48                     ` Peter Zijlstra
2024-05-15 18:03                       ` Mike Galbraith
2024-05-20 15:20                       ` Luis Machado
2024-05-29 22:50                 ` Peter Zijlstra
2024-05-23  8:45               ` Peter Zijlstra
2024-05-23  9:06                 ` Luis Machado
2024-05-23  9:33                   ` Peter Zijlstra
2024-04-26 10:15         ` Luis Machado
2024-04-20  5:57   ` Mike Galbraith
2024-04-22 13:13   ` Tobias Huschle
2024-04-05 10:28 ` [RFC][PATCH 09/10] sched/eevdf: Allow shorter slices to wakeup-preempt Peter Zijlstra
2024-04-05 10:28 ` [RFC][PATCH 10/10] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion Peter Zijlstra
2024-04-06  8:16   ` Hillf Danton
2024-05-07  5:34   ` Mike Galbraith
2024-05-15 10:13     ` Peter Zijlstra
2024-05-07 15:15   ` Chen Yu
2024-05-08 13:52     ` Mike Galbraith
2024-05-09  3:48       ` Chen Yu
2024-05-09  5:00         ` Mike Galbraith
2024-05-13  4:07     ` K Prateek Nayak
2024-05-14  9:18       ` Chen Yu
2024-05-14 15:23         ` K Prateek Nayak [this message]
2024-05-14 16:15           ` Chen Yu
2024-05-22 14:48           ` Chen Yu
2024-05-27 10:11 ` [RFC][PATCH 00/10] sched/fair: Complete EEVDF K Prateek Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2eafde3-b9e8-afc9-8934-ca8e597c33e6@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=efault@gmx.de \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=wuyun.abel@bytedance.com \
    --cc=yu.c.chen.y@gmail.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.