LinuxPPC-Dev Archive mirror
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Srikar Dronamraju <srikar@linux.ibm.com>, linux-kernel@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, Ben Segall <bsegall@google.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity
Date: Tue, 28 Oct 2025 20:35:05 +0530
Message-ID: <8ec843b6-ac7d-4cef-a0b1-12b85470fde8@linux.ibm.com>
In-Reply-To: <20251028104255.1892485-1-srikar@linux.ibm.com>



On 10/28/25 4:12 PM, Srikar Dronamraju wrote:
> At present, scheduler scales CPU capacity for fair tasks based on time
> spent on irq and steal time. If a CPU sees irq or steal time, its
> capacity for fair tasks decreases causing tasks to migrate to other CPU
> that are not affected by irq and steal time. All of this is gated by
> NONTASK_CAPACITY.
> 
> In virtualized setups, a CPU that reports steal time (time taken by the
> hypervisor) can cause tasks to migrate unnecessarily to sibling CPUs that
> appear to be less busy, only for the situation to reverse shortly.
> 
> To mitigate this ping-pong behaviour, this change introduces a new
> scheduler feature flag: ACCT_STEAL which will control whether steal time
> contributes to non-task capacity adjustments (used for fair scheduling).
> 
> Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
> ---
>   include/linux/sched.h   | 1 +
>   kernel/sched/core.c     | 7 +++++--
>   kernel/sched/debug.c    | 8 ++++++++
>   kernel/sched/features.h | 1 +
>   4 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index aa9c5be7a632..451931cce5bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2272,5 +2272,6 @@ static __always_inline void alloc_tag_restore(struct alloc_tag *tag, struct allo
>   #define alloc_tag_save(_tag)			NULL
>   #define alloc_tag_restore(_tag, _old)		do {} while (0)
>   #endif
> +extern void steal_updates_cpu_capacity(bool enable);
>   
>   #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 81c6df746df1..3a7c4e307371 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -792,8 +792,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
>   	rq->clock_task += delta;
>   
>   #ifdef CONFIG_HAVE_SCHED_AVG_IRQ

Curious to know whether any users or distros build with CONFIG_HAVE_SCHED_AVG_IRQ=n.

> -	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> -		update_irq_load_avg(rq, irq_delta + steal);
> +	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
> +		if (steal && sched_feat(ACCT_STEAL))
> +			irq_delta += steal;
> +		update_irq_load_avg(rq, irq_delta);
> +	}
>   #endif
>   	update_rq_clock_pelt(rq, delta);
>   }
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 557246880a7e..a0393dd43bb2 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1307,3 +1307,11 @@ void resched_latency_warn(int cpu, u64 latency)
>   	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>   	dump_stack();
>   }
> +
> +void steal_updates_cpu_capacity(bool enable)
> +{
> +	if (enable)
> +		sched_feat_set("ACCT_STEAL");
> +	else
> +		sched_feat_set("NO_ACCT_STEAL");
> +}
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 3c12d9f93331..82d7806ea515 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -121,3 +121,4 @@ SCHED_FEAT(WA_BIAS, true)
>   SCHED_FEAT(UTIL_EST, true)
>   
>   SCHED_FEAT(LATENCY_WARN, false)
> +SCHED_FEAT(ACCT_STEAL, true)




Thread overview: 6+ messages
2025-10-28 10:42 [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity Srikar Dronamraju
2025-10-28 10:42 ` [PATCH 2/2] powerpc/smp: Disable ACCT_STEAL for shared LPARs Srikar Dronamraju
2025-10-28 11:18 ` [PATCH 1/2] sched: Feature to decide if steal should update CPU capacity Peter Zijlstra
2025-10-28 11:42   ` Srikar Dronamraju
2025-10-28 15:05 ` Shrikanth Hegde [this message]
2025-10-29  6:08   ` K Prateek Nayak
