Re: [linus:master] [timers] 7ee9887703: stress-ng.uprobe.ops_per_sec -17.1% regression

LKML Archive mirror
 help / color / mirror / Atom feed

From: Frederic Weisbecker <frederic@kernel.org>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Christian Loehle <christian.loehle@arm.com>,
	Oliver Sang <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	linux-pm@vger.kernel.org
Subject: Re: [linus:master] [timers] 7ee9887703: stress-ng.uprobe.ops_per_sec -17.1% regression
Date: Fri, 26 Apr 2024 13:35:39 +0200	[thread overview]
Message-ID: <ZiuRiwOZ1eMOZN5J@pavilion.home> (raw)
In-Reply-To: <87h6foig4s.fsf@somnus>

Le Fri, Apr 26, 2024 at 12:15:47PM +0200, Anna-Maria Behnsen a écrit :
> Christian Loehle <christian.loehle@arm.com> writes:
> 
> > On 25/04/2024 09:23, Anna-Maria Behnsen wrote:
> >> Hi,
> >> 
> >> (adding cpuidle/power people to cc-list)
> >> 
> >> Oliver Sang <oliver.sang@intel.com> writes:
> >> 
> >>> hi, Frederic Weisbecker,
> >>>
> >>> On Tue, Apr 02, 2024 at 12:46:15AM +0200, Frederic Weisbecker wrote:
> >>>> Le Wed, Mar 27, 2024 at 04:39:17PM +0800, kernel test robot a écrit :
> >>>>>
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>>
> >>>>> we reported
> >>>>> "[tip:timers/core] [timers]  7ee9887703:  netperf.Throughput_Mbps -1.2% regression"
> >>>>> in
> >>>>> https://lore.kernel.org/all/202403011511.24defbbd-oliver.sang@intel.com/
> >>>>>
> >>>>> now we noticed this commit is in mainline and we captured further results.
> >>>>>
> >>>>> still include netperf results for complete. below details FYI.
> >>>>>
> >>>>>
> >>>>> kernel test robot noticed a -17.1% regression of stress-ng.uprobe.ops_per_sec
> >>>>> on:
> >>>>
> >>>> The good news is that I can reproduce.
> >>>> It has made me spot something already:
> >>>>
> >>>>    https://lore.kernel.org/lkml/ZgsynV536q1L17IS@pavilion.home/T/#m28c37a943fdbcbadf0332cf9c32c350c74c403b0
> >>>>
> >>>> But that's not enough to fix the regression. Investigation continues...
> >>>
> >>> Thanks a lot for information! if you want us test any patch, please let us know.
> >> 
> >> Oliver, I would be happy to see, whether the patch at the end of the
> >> message restores the original behaviour also in your test setup. I
> >> applied it on 6.9-rc4. This patch is not a fix - it is just a pointer to
> >> the kernel path, that might cause the regression. I know, it is
> >> probable, that a warning in tick_sched is triggered. This happens when
> >> the first timer is alredy in the past. I didn't add an extra check when
> >> creating the 'defacto' timer thingy. But existing code handles this
> >> problem already properly. So the warning could be ignored here.
> >> 
> >> For the cpuidle people, let me explain what I oberserved, my resulting
> >> assumption and my request for help:
> >> 
> >> cpuidle governors use expected sleep length values (beside other data)
> >> to decide which idle state would be good to enter. The expected sleep
> >> length takes the first queued timer of the CPU into account and is
> >> provided by tick_nohz_get_sleep_length(). With the timer pull model in
> >> place the non pinned timers are not taken into account when there are
> >> other CPUs up and running which could handle those timers. This could
> >> lead to increased sleep length values. On my system during the stress-ng
> >> uprobes test it was in the range of maximum 100us without the patch set
> >> and with the patch set the maximum was in a range of 200sec. This is
> >> intended behaviour, because timers which could expire on any CPU should
> >> expire on the CPU which is busy anyway and the non busy CPU should be
> >> able to go idle.
> >> 
> >> Those increased sleep length values were the only anomalies I could find
> >> in the traces with the regression.
> >> 
> >> I created the patch below which simply fakes the sleep length values
> >> that they take all timers of the CPU into account (also the non
> >> pinned). This patch kind of restores the behavoir of
> >> tick_nohz_get_sleep_length() before the change but still with the timer
> >> pull model in place.
> >> 
> >> With the patch the regression was gone, at least on my system (using
> >> cpuidle governor menu but also teo).
> >
> > I assume the regression is reproducible for both?
> > (The original report is using menu for anyone else looking at this)
> 
> Yes. (at least in my setup)
> 
> >> 
> >> So my assumption here is, that cpuidle governors assume that a deeper
> >> idle state could be choosen and selecting the deeper idle state makes an
> >> overhead when returning from idle. But I have to notice here, that I'm
> >> still not familiar with cpuidle internals... So I would be happy about
> >> some hints how I can debug/trace cpuidle internals to falsify or verify
> >> this assumption.
> >
> > I'd say that sounds correct.
> > Comparing cpu_idle_miss would be interesting for both.
> 
> 	total nr	above		below
> "bad":	2518343		2329072		189271
> "good":	3016019         2960004		56015
> 
> -> this is the result of just a single run using:
> 
>   perf script record -a -e power:cpu_idle_miss /home/anna-maria/src/stress-ng/stress-ng --timeout 60  --times --verify --metrics --no-rand-seed --uprobe 112
> 
> 
> 
> But beside of this, when running this stress-ng test, the cpus seems to
> be mostly idle (top tells me). So the question here fore me is, what is
> the stress in this test and what should the numbers tell we are
> comparing? It is not totally clear to me even after looking at the code.

I can at least help a bit with that since I stared at stress-uprobe for a while.

A single stress-uprobe thread creates a uprobe trace event to fire everytime
getpid() is called. Then it does a lot of getpid() calls, which creates uprobes
events in ftrace and then it does a loop reading /sys/kernel/tracing/trace_pipe
until the end.

The bogomips measured is the total number of uprobes trace events read from
trace_pipe.

And since there are 112 threads doing all this at the same time, there is
probably a lot of contention on trace_pipe rwsem: trace_access_lock() ->
down_write(). Although what I observed with perf was more about mutex contention
so there could be another lock somewhere I missed...

Christian may correct me if I'm wrong...

Thanks.

> Thanks,
> 
> 	Anna-Maria
>

next prev parent reply	other threads:[~2024-04-26 11:35 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-27  8:39 [linus:master] [timers] 7ee9887703: stress-ng.uprobe.ops_per_sec -17.1% regression kernel test robot
2024-04-01 22:46 ` Frederic Weisbecker
2024-04-02  1:46   ` Oliver Sang
2024-04-04 14:05     ` Frederic Weisbecker
2024-04-25  8:23     ` Anna-Maria Behnsen
2024-04-25 10:15       ` Christian Loehle
2024-04-26 10:15         ` Anna-Maria Behnsen
2024-04-26 11:35           ` Frederic Weisbecker [this message]
2024-04-26 15:39           ` Christian Loehle
2024-04-26  6:53       ` Oliver Sang
2024-04-26 16:03       ` Rafael J. Wysocki
2024-04-29  7:53         ` Lukasz Luba
2024-04-29  9:26           ` Anna-Maria Behnsen
2024-04-29 10:40             ` Anna-Maria Behnsen
2024-04-29 17:02               ` Rafael J. Wysocki
2024-05-02 12:56                 ` Anna-Maria Behnsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiuRiwOZ1eMOZN5J@pavilion.home \
    --to=frederic@kernel.org \
    --cc=anna-maria@linutronix.de \
    --cc=christian.loehle@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=oliver.sang@intel.com \
    --cc=rafael@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).