LKML Archive mirror
* Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
@ 2023-06-08 22:48 Saeed Mirzamohammadi
  2023-06-09 16:52 ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-06-08 22:48 UTC (permalink / raw)
  To: Ingo Molnar, peterz@infradead.org, Linux Kernel Mailing List,
	vincent.guittot@linaro.org, zhangqiao22@huawei.com

Hi all,

I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:

Commit Data:
 commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
 subject          : sched/fair: Sanitize vruntime of entity being migrated
 author           : vincent.guittot@linaro.org
 author date      : 2023-03-17 16:08:10


We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.

ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%

Link to unixbench:
github.com/kdlucas/byte-unixbench

Info about benchmark:
 "The shells scripts test measures the number of times per minute a
  process can start and reap a set of one, two, four and eight concurrent
copies of a shell scripts where the shell script applies a series of
transformation to a data file”
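In essence, each copy of the workload is a start-and-reap loop over short-lived
shell processes; a minimal C sketch of that pattern (the script name, copy
count, and iteration count below are placeholders, not the benchmark's actual
driver):

#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* Start and reap 8 concurrent copies of a script, repeatedly,
	 * the way the shell-scripts tests do; "workload.sh" stands in
	 * for the benchmark's transformation script. */
	for (int iter = 0; iter < 100; iter++) {
		for (int i = 0; i < 8; i++) {
			pid_t pid = fork();
			if (pid == 0) {
				execl("/bin/sh", "sh", "workload.sh", (char *)NULL);
				_exit(127); /* exec failed */
			}
		}
		for (int i = 0; i < 8; i++)
			wait(NULL); /* reap one finished copy */
	}
	return 0;
}

This constant churn of short-lived tasks is what makes the workload sensitive
to wakeup placement and migration policy in sched/fair.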

I have also evaluated performance before and after both of these two commits (one fixes the other), but I still observe the same regression (C1 is still the source of the regression).
C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed

Thank you very much,
Saeed


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-08 22:48 Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4 Saeed Mirzamohammadi
@ 2023-06-09 16:52 ` Vincent Guittot
  2023-06-13 19:35   ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2023-06-09 16:52 UTC (permalink / raw)
  To: Saeed Mirzamohammadi
  Cc: Ingo Molnar, peterz@infradead.org, Linux Kernel Mailing List,
	zhangqiao22@huawei.com

Hi Saeed,

On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
<saeed.mirzamohammadi@oracle.com> wrote:
>
> Hi all,
>
> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>
> Commit Data:
>  commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>  subject          : sched/fair: Sanitize vruntime of entity being migrated
>  author           : vincent.guittot@linaro.org
>  author date      : 2023-03-17 16:08:10
>
>
> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.

It would be good to confirm that the regression is present on v6.3,
where the patch was originally merged. It can be that there is a
hidden dependency on other patches introduced since v5.4.

>
> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>
> Link to unixbench:
> github.com/kdlucas/byte-unixbench

I tried to reproduce the problem with v6.3 on my system, but I don't
see any difference with or without the patch.

Do you have more details on your setup? Number of CPUs and topology?

>
> Info about benchmark:
>  "The shells scripts test measures the number of times per minute a
>   process can start and reap a set of one, two, four and eight concurrent
> copies of a shell scripts where the shell script applies a series of
> transformation to a data file”
>
> I have also evaluated performance before and after both of these two commits (one fixes the other), but I still observe the same regression (C1 is still the source of the regression).
> C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
> C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed

C2 introduced some regressions because newly migrated tasks were not
correctly managed, and C1 fixes this problem. Beyond that, both
commits have an impact on systems that run for days with low-priority
tasks.
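
The failure mode both commits guard against is a wrap of the signed
comparison CFS uses on u64 vruntimes. Below is a userspace model of
max_vruntime() from kernel/sched/fair.c with hypothetical values (only
the signed-difference test is taken from the kernel):

#include <inttypes.h>
#include <stdio.h>

/* u64 vruntimes are compared via their signed difference, which is
 * only meaningful while the two values stay within 2^63 of each
 * other. */
static uint64_t max_vruntime(uint64_t max, uint64_t vruntime)
{
	if ((int64_t)(vruntime - max) > 0)
		max = vruntime;
	return max;
}

int main(void)
{
	uint64_t stale = 1000;                        /* sleeper's old vruntime */
	uint64_t min_near = stale + (1ULL << 40);
	uint64_t min_far  = stale + (1ULL << 63) + 1; /* > 2^63 ahead */

	/* Normal wakeup: the sleeper is pulled up to min_vruntime. */
	printf("%" PRIu64 "\n", max_vruntime(min_near, stale));

	/* Very long sleep: the comparison wraps, the stale value wins,
	 * and the woken task looks like it is owed years of runtime. */
	printf("%" PRIu64 "\n", max_vruntime(min_far, stale));
	return 0;
}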

Thanks,
Vincent


>
> Thank you very much,
> Saeed
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-09 16:52 ` Vincent Guittot
@ 2023-06-13 19:35   ` Saeed Mirzamohammadi
  2023-06-14  6:37     ` Chen Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-06-13 19:35 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, peterz@infradead.org, Linux Kernel Mailing List,
	zhangqiao22@huawei.com

Hi Vincent,

> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> 
> Hi Saeed,
> 
> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> <saeed.mirzamohammadi@oracle.com> wrote:
>> 
>> Hi all,
>> 
>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>> 
>> Commit Data:
>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>> subject          : sched/fair: Sanitize vruntime of entity being migrated
>> author           : vincent.guittot@linaro.org
>> author date      : 2023-03-17 16:08:10
>> 
>> 
>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
> 
> It would be good to confirm that the regression is present on v6.3,
> where the patch was originally merged. It can be that there is a
> hidden dependency on other patches introduced since v5.4.

Regression is present on v6.3 as well, examples:
ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> 
> 
>> 
>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>> 
>> Link to unixbench:
>> github.com/kdlucas/byte-unixbench
> 
> I tried to reproduce the problem with v6.3 on my system, but I don't
> see any difference with or without the patch.
> 
> Do you have more details on your setup? Number of CPUs and topology?
> 
model name	: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz

Topology:
node   0   1 
  0:  10  21 
  1:  21  10 

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2

Thanks,


>> 
>> Info about benchmark:
>> "The shells scripts test measures the number of times per minute a
>>  process can start and reap a set of one, two, four and eight concurrent
>> copies of a shell scripts where the shell script applies a series of
>> transformation to a data file”
>> 
>> I have also evaluated performance before and after both of these two commits (one fixes the other), but I still observe the same regression (C1 is still the source of the regression).
>> C1. a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
>> C2. 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
> 
> C2 introduced some regressions because newly migrated tasks were not
> correctly managed, and C1 fixes this problem. Beyond that, both
> commits have an impact on systems that run for days with low-priority
> tasks.
> 
> Thanks,
> Vincent
> 
> 
>> 
>> Thank you very much,
>> Saeed


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-13 19:35   ` Saeed Mirzamohammadi
@ 2023-06-14  6:37     ` Chen Yu
  2023-06-21 16:41       ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 10+ messages in thread
From: Chen Yu @ 2023-06-14  6:37 UTC (permalink / raw)
  To: Saeed Mirzamohammadi
  Cc: Vincent Guittot, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> Hi Vincent,
> 
> > On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> > 
> > Hi Saeed,
> > 
> > On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> > <saeed.mirzamohammadi@oracle.com> wrote:
> >> 
> >> Hi all,
> >> 
> >> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >> 
> >> Commit Data:
> >> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >> subject          : sched/fair: Sanitize vruntime of entity being migrated
> >> author           : vincent.guittot@linaro.org
> >> author date      : 2023-03-17 16:08:10
> >> 
> >> 
> >> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
> > 
> > It would be good to confirm that the regression is present on v6.3,
> > where the patch was originally merged. It can be that there is a
> > hidden dependency on other patches introduced since v5.4.
> 
> Regression is present on v6.3 as well, examples:
> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> > 
> > 
> >> 
> >> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
> >> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
> >> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
> >> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
> >> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
> >> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
> >> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
> >> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
> >> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
> >> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
> >> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
> >> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
> >> 
> >> Link to unixbench:
> >> github.com/kdlucas/byte-unixbench
> > 
> > I tried to reproduce the problem with v6.3 on my system, but I don't
> > see any difference with or without the patch.
> > 
> > Do you have more details on your setup? Number of CPUs and topology?
> > 
> model name	: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> 
> Topology:
> node   0   1 
>   0:  10  21 
>   1:  21  10 
> 
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> CPU(s):                56
> On-line CPU(s) list:   0-55
> Thread(s) per core:    2
> Core(s) per socket:    14
> Socket(s):             2
> NUMA node(s):          2
>
Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
24 cores/48 CPUs in total; however, I could not reproduce the issue.
Since the regression was reported mainly for the 224- and 448-copy cases
on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.


a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     21304            +0.5%      21420        unixbench.score
    632.43            +0.0%     632.44        unixbench.time.elapsed_time
    632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
  11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
    864713            +0.1%     865914        unixbench.time.major_page_faults
      9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
      4096            +0.0%       4096        unixbench.time.page_size
      3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
     18341            +1.3%      18572        unixbench.time.system_time
      5323            +0.6%       5353        unixbench.time.user_time
  78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
  57178573            +0.4%   57399061        unixbench.workload

There is not much difference with a53ce18cacb477dd applied or not.





a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     19985            +8.6%      21697        unixbench.score
    632.64            -0.0%     632.53        unixbench.time.elapsed_time
    632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
  11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
    818996            +3.1%     844681        unixbench.time.major_page_faults
      9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
      4096            +0.0%       4096        unixbench.time.page_size
      3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
     18873            -2.4%      18423        unixbench.time.system_time
      4960            +7.1%       5313        unixbench.time.user_time
  75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
  53553404            +8.7%   58235303        unixbench.workload

Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
and this improvement remains with a53ce18cacb477dd applied.

Can you send the full test script so I can have a try locally?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-14  6:37     ` Chen Yu
@ 2023-06-21 16:41       ` Saeed Mirzamohammadi
  2023-06-29 22:19         ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-06-21 16:41 UTC (permalink / raw)
  To: Chen Yu
  Cc: Vincent Guittot, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

Hi Chen, Vincent,

> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
> 
> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>> Hi Vincent,
>> 
>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>> 
>>> Hi Saeed,
>>> 
>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>> <saeed.mirzamohammadi@oracle.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>> 
>>>> Commit Data:
>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
>>>> author           : vincent.guittot@linaro.org
>>>> author date      : 2023-03-17 16:08:10
>>>> 
>>>> 
>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
>>> 
>>> It would be good to confirm that the regression is present on v6.3,
>>> where the patch was originally merged. It can be that there is a
>>> hidden dependency on other patches introduced since v5.4.
>> 
>> Regression is present on v6.3 as well, examples:
>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%

Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
v6.3.y -> no regression
v5.15.y -> no regression
v5.4.y -> 5-8% regression.


>>> 
>>> 
>>>> 
>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>>>> 
>>>> Link to unixbench:
>>>> github.com/kdlucas/byte-unixbench
>>> 
>>> I tried to reproduce the problem with v6.3 on my system, but I don't
>>> see any difference with or without the patch.
>>> 
>>> Do you have more details on your setup? Number of CPUs and topology?
>>> 
>> model name	: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>> 
>> Topology:
>> node   0   1 
>>  0:  10  21 
>>  1:  21  10 
>> 
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> CPU(s):                56
>> On-line CPU(s) list:   0-55
>> Thread(s) per core:    2
>> Core(s) per socket:    14
>> Socket(s):             2
>> NUMA node(s):          2
>> 
> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
> Since the regression was reported mainly for the 224- and 448-copy cases
> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
> 
> 
> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
> ---------------- ---------------------------
>         %stddev     %change         %stddev
>             \          |                \
>     21304            +0.5%      21420        unixbench.score
>    632.43            +0.0%     632.44        unixbench.time.elapsed_time
>    632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
>  11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
>    864713            +0.1%     865914        unixbench.time.major_page_faults
>      9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
>      4096            +0.0%       4096        unixbench.time.page_size
>      3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
>     18341            +1.3%      18572        unixbench.time.system_time
>      5323            +0.6%       5353        unixbench.time.user_time
>  78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
>  57178573            +0.4%   57399061        unixbench.workload
> 
> There is not much difference with a53ce18cacb477dd applied or not.
> 
> 
> 
> 
> 
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
>         %stddev     %change         %stddev
>             \          |                \
>     19985            +8.6%      21697        unixbench.score
>    632.64            -0.0%     632.53        unixbench.time.elapsed_time
>    632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
>  11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
>    818996            +3.1%     844681        unixbench.time.major_page_faults
>      9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
>      4096            +0.0%       4096        unixbench.time.page_size
>      3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
>     18873            -2.4%      18423        unixbench.time.system_time
>      4960            +7.1%       5313        unixbench.time.user_time
>  75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
>  53553404            +8.7%   58235303        unixbench.workload
> 
> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
> and this improvement remains with a53ce18cacb477dd applied.
> 
> Can you send the full test script so I can have a try locally?

Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent.
And that’s ’shell8’ with ‘-c 448’ (the number of copies) passed as an argument.

Thanks,
Saeed

> 
> thanks,
> Chenyu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-21 16:41       ` Saeed Mirzamohammadi
@ 2023-06-29 22:19         ` Saeed Mirzamohammadi
  2023-06-30  8:28           ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-06-29 22:19 UTC (permalink / raw)
  To: Chen Yu, Vincent Guittot
  Cc: Ingo Molnar, peterz@infradead.org, Linux Kernel Mailing List,
	zhangqiao22@huawei.com



> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> wrote:
> 
> Hi Chen, Vincent,
> 
>> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
>> 
>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>> Hi Vincent,
>>> 
>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>>> 
>>>> Hi Saeed,
>>>> 
>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>> <saeed.mirzamohammadi@oracle.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>> 
>>>>> Commit Data:
>>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
>>>>> author           : vincent.guittot@linaro.org
>>>>> author date      : 2023-03-17 16:08:10
>>>>> 
>>>>> 
>>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
>>>> 
>>>> It would be good to confirm that the regression is present on v6.3,
>>>> where the patch was originally merged. It can be that there is a
>>>> hidden dependency on other patches introduced since v5.4.
>>> 
>>> Regression is present on v6.3 as well, examples:
>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> 
> Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> v6.3.y -> no regression
> v5.15.y -> no regression
> v5.4.y -> 5-8% regression.

A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!

> 
> 
>>>> 
>>>> 
>>>>> 
>>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
>>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
>>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
>>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
>>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
>>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
>>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
>>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
>>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>>>>> 
>>>>> Link to unixbench:
>>>>> github.com/kdlucas/byte-unixbench
>>>> 
>>>> I tried to reproduce the problem with v6.3 on my system, but I don't
>>>> see any difference with or without the patch.
>>>> 
>>>> Do you have more details on your setup? Number of CPUs and topology?
>>>> 
>>> model name	: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>> 
>>> Topology:
>>> node   0   1 
>>> 0:  10  21 
>>> 1:  21  10 
>>> 
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> CPU(s):                56
>>> On-line CPU(s) list:   0-55
>>> Thread(s) per core:    2
>>> Core(s) per socket:    14
>>> Socket(s):             2
>>> NUMA node(s):          2
>>> 
>> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
>> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
>> Since the regression was reported mainly for the 224- and 448-copy cases
>> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
>> 
>> 
>> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
>> ---------------- ---------------------------
>>        %stddev     %change         %stddev
>>            \          |                \
>>    21304            +0.5%      21420        unixbench.score
>>   632.43            +0.0%     632.44        unixbench.time.elapsed_time
>>   632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
>> 11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
>>   864713            +0.1%     865914        unixbench.time.major_page_faults
>>     9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
>> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
>>     4096            +0.0%       4096        unixbench.time.page_size
>>     3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
>>    18341            +1.3%      18572        unixbench.time.system_time
>>     5323            +0.6%       5353        unixbench.time.user_time
>> 78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
>> 57178573            +0.4%   57399061        unixbench.workload
>> 
>> There is not much difference with a53ce18cacb477dd applied or not.
>> 
>> 
>> 
>> 
>> 
>> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
>> ---------------- ---------------------------
>>        %stddev     %change         %stddev
>>            \          |                \
>>    19985            +8.6%      21697        unixbench.score
>>   632.64            -0.0%     632.53        unixbench.time.elapsed_time
>>   632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
>> 11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
>>   818996            +3.1%     844681        unixbench.time.major_page_faults
>>     9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
>> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
>>     4096            +0.0%       4096        unixbench.time.page_size
>>     3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
>>    18873            -2.4%      18423        unixbench.time.system_time
>>     4960            +7.1%       5313        unixbench.time.user_time
>> 75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
>> 53553404            +8.7%   58235303        unixbench.workload
>> 
>> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
>> and this improvement remains with a53ce18cacb477dd applied.
>> 
>> Can you send the full test script so I can have a try locally?
> 
> Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent.
> And that’s ’shell8’ with ‘-c 448’ (the number of copies) passed as an argument.
> 
> Thanks,
> Saeed
> 
>> 
>> thanks,
>> Chenyu


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-29 22:19         ` Saeed Mirzamohammadi
@ 2023-06-30  8:28           ` Vincent Guittot
  2023-07-20 23:04             ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2023-06-30  8:28 UTC (permalink / raw)
  To: Saeed Mirzamohammadi
  Cc: Chen Yu, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
<saeed.mirzamohammadi@oracle.com> wrote:
>
>
>
> > On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> wrote:
> >
> > Hi Chen, Vincent,
> >
> >> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
> >>
> >> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> >>> Hi Vincent,
> >>>
> >>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >>>>
> >>>> Hi Saeed,
> >>>>
> >>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> >>>> <saeed.mirzamohammadi@oracle.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >>>>>
> >>>>> Commit Data:
> >>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
> >>>>> author           : vincent.guittot@linaro.org
> >>>>> author date      : 2023-03-17 16:08:10
> >>>>>
> >>>>>
> >>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
> >>>>
> >>>> It would be good to confirm that the regression is present on v6.3,
> >>>> where the patch was originally merged. It can be that there is a
> >>>> hidden dependency on other patches introduced since v5.4.
> >>>
> >>> Regression is present on v6.3 as well, examples:
> >>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> >>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> >>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> >
> > Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> > v6.3.y -> no regression
> > v5.15.y -> no regression
> > v5.4.y -> 5-8% regression.
>
> A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!

I tried to find out why the regression happens only for v5.4.y (or
lower) and not for v5.15.y (or above), but I haven't been able to find
any possible reason in the code.

Regarding the 2 commits below, they must come together, so we can't
simply revert one and not the other.
commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated

entity_is_long_sleeper() should never return true in your case. Could
you check whether that holds for you?
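
For reference, a userspace sketch of the check added by a53ce18cacb4
(in the kernel the times come from rq_clock_task(); the
2^63/NICE_0_LOAD threshold is roughly a hundred days of wall-clock
time, so a benchmark run should never cross it):

#include <stdbool.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* load weight of a nice-0 task */

static bool entity_is_long_sleeper(uint64_t rq_clock, uint64_t exec_start)
{
	if (exec_start == 0)		/* freshly migrated */
		return false;
	if (rq_clock <= exec_start)	/* rq clocks diverged across CPUs */
		return false;

	return (rq_clock - exec_start) > ((1ULL << 63) / NICE_0_LOAD);
}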





>
> >
> >
> >>>>
> >>>>
> >>>>>
> >>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
> >>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
> >>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
> >>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
> >>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
> >>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
> >>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
> >>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
> >>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
> >>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
> >>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
> >>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
> >>>>>
> >>>>> Link to unixbench:
> >>>>> github.com/kdlucas/byte-unixbench
> >>>>
> >>>> I tried to reproduce the problem with v6.3 on my system, but I don't
> >>>> see any difference with or without the patch.
> >>>>
> >>>> Do you have more details on your setup? Number of CPUs and topology?
> >>>>
> >>> model name  : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> >>>
> >>> Topology:
> >>> node   0   1
> >>> 0:  10  21
> >>> 1:  21  10
> >>>
> >>> Architecture:          x86_64
> >>> CPU op-mode(s):        32-bit, 64-bit
> >>> CPU(s):                56
> >>> On-line CPU(s) list:   0-55
> >>> Thread(s) per core:    2
> >>> Core(s) per socket:    14
> >>> Socket(s):             2
> >>> NUMA node(s):          2
> >>>
> >> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
> >> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
> >> Since the regression was reported mainly for the 224- and 448-copy cases
> >> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
> >>
> >>
> >> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
> >> ---------------- ---------------------------
> >>        %stddev     %change         %stddev
> >>            \          |                \
> >>    21304            +0.5%      21420        unixbench.score
> >>   632.43            +0.0%     632.44        unixbench.time.elapsed_time
> >>   632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
> >> 11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
> >>   864713            +0.1%     865914        unixbench.time.major_page_faults
> >>     9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
> >> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
> >>     4096            +0.0%       4096        unixbench.time.page_size
> >>     3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
> >>    18341            +1.3%      18572        unixbench.time.system_time
> >>     5323            +0.6%       5353        unixbench.time.user_time
> >> 78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
> >> 57178573            +0.4%   57399061        unixbench.workload
> >>
> >> There is not much difference with a53ce18cacb477dd applied or not.
> >>
> >>
> >>
> >>
> >>
> >> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> >> ---------------- ---------------------------
> >>        %stddev     %change         %stddev
> >>            \          |                \
> >>    19985            +8.6%      21697        unixbench.score
> >>   632.64            -0.0%     632.53        unixbench.time.elapsed_time
> >>   632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
> >> 11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
> >>   818996            +3.1%     844681        unixbench.time.major_page_faults
> >>     9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
> >> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
> >>     4096            +0.0%       4096        unixbench.time.page_size
> >>     3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
> >>    18873            -2.4%      18423        unixbench.time.system_time
> >>     4960            +7.1%       5313        unixbench.time.user_time
> >> 75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
> >> 53553404            +8.7%   58235303        unixbench.workload
> >>
> >> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
> >> and this improvement remains with a53ce18cacb477dd applied.
> >>
> >> Can you send the full test script so I can have a try locally?
> >
> > Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent.
> > And that’s ’shell8’ with ‘-c 448’ (the number of copies) passed as an argument.
> >
> > Thanks,
> > Saeed
> >
> >>
> >> thanks,
> >> Chenyu
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-06-30  8:28           ` Vincent Guittot
@ 2023-07-20 23:04             ` Saeed Mirzamohammadi
  2023-07-21 14:01               ` Vincent Guittot
  0 siblings, 1 reply; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-07-20 23:04 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Chen Yu, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

Hi Vincent,

> On Jun 30, 2023, at 1:28 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> 
> On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
> <saeed.mirzamohammadi@oracle.com> wrote:
>> 
>> 
>> 
>>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> wrote:
>>> 
>>> Hi Chen, Vincent,
>>> 
>>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
>>>> 
>>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>>>> Hi Vincent,
>>>>> 
>>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>>>>> 
>>>>>> Hi Saeed,
>>>>>> 
>>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>>>> <saeed.mirzamohammadi@oracle.com> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>>>> 
>>>>>>> Commit Data:
>>>>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
>>>>>>> author           : vincent.guittot@linaro.org
>>>>>>> author date      : 2023-03-17 16:08:10
>>>>>>> 
>>>>>>> 
>>>>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
>>>>>> 
>>>>>> It would be good to confirm that the regression is present on v6.3,
>>>>>> where the patch was originally merged. It can be that there is a
>>>>>> hidden dependency on other patches introduced since v5.4.
>>>>> 
>>>>> Regression is present on v6.3 as well, examples:
>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>>> 
>>> Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
>>> v6.3.y -> no regression
>>> v5.15.y -> no regression
>>> v5.4.y -> 5-8% regression.
>> 
>> A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!
> 
> I tried to find out why the regression happens only for v5.4.y (or
> lower) and not for v5.15.y (or above), but I haven't been able to find
> any possible reason in the code.
>
> Regarding the 2 commits below, they must come together, so we can't
> simply revert one and not the other.
> commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
> commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
> 
Tests were done before and after these 2 commits.

> entity_is_long_sleeper() should never return true in your case. Could
> you check whether that holds for you?
> 
I tested this, and entity_is_long_sleeper() never returns true.

I actually removed the related part, tested, and the regression is gone with the following change (partial revert):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3ebd2054996bc..0d70dd6e14844 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 
 void activate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-       if (task_on_rq_migrating(p))
-               flags |= ENQUEUE_MIGRATED;
-
        if (task_contributes_to_load(p))
                rq->nr_uninterruptible--;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83a7cf62c0f53..ef9aca05c7bdf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3779,9 +3779,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
        if (flags & ENQUEUE_WAKEUP)
                place_entity(cfs_rq, se, 0);
-       /* Entity has migrated, no longer consider this task hot */
-       if (flags & ENQUEUE_MIGRATED)
-               se->exec_start = 0;
 
        check_schedstat_required();
        update_stats_enqueue(cfs_rq, se, flags);
@@ -6182,6 +6179,9 @@ static void migrate_task_rq_fair(struct task_struct *p)
 
        /* Tell new CPU we are migrated */
        p->se.avg.last_update_time = 0;
+
+       /* We have migrated, no longer consider this task hot */
+       p->se.exec_start = 0;
 }
 
 static void task_dead_fair(struct task_struct *p)
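
For context on what the moved line changes: both the commit and this
partial revert eventually zero exec_start so that the load balancer
treats a migrated task as cache-cold; the revert just does it at
migration time instead of at enqueue on the new CPU. A userspace model
of the exec_start part of task_hot(), with the kernel's default 0.5 ms
sysctl_sched_migration_cost:

#include <stdbool.h>
#include <stdint.h>

#define SCHED_MIGRATION_COST_NS 500000ULL	/* 0.5 ms default */

static bool task_hot(uint64_t rq_clock, uint64_t exec_start)
{
	if (exec_start == 0)	/* just migrated: treat the cache as cold */
		return false;
	return (rq_clock - exec_start) < SCHED_MIGRATION_COST_NS;
}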


> 
> 
> 
> 
>> 
>>> 
>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
>>>>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
>>>>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
>>>>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
>>>>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
>>>>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
>>>>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
>>>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
>>>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
>>>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
>>>>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
>>>>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>>>>>>> 
>>>>>>> Link to unixbench:
>>>>>>> github.com/kdlucas/byte-unixbench
>>>>>> 
>>>>>> I tried to reproduce the problem with v6.3 on my system, but I don't
>>>>>> see any difference with or without the patch.
>>>>>>
>>>>>> Do you have more details on your setup? Number of CPUs and topology?
>>>>>> 
>>>>> model name  : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>>>> 
>>>>> Topology:
>>>>> node   0   1
>>>>> 0:  10  21
>>>>> 1:  21  10
>>>>> 
>>>>> Architecture:          x86_64
>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>> CPU(s):                56
>>>>> On-line CPU(s) list:   0-55
>>>>> Thread(s) per core:    2
>>>>> Core(s) per socket:    14
>>>>> Socket(s):             2
>>>>> NUMA node(s):          2
>>>>> 
>>>> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
>>>> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
>>>> Since the regression was reported mainly for the 224- and 448-copy cases
>>>> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
>>>> 
>>>> 
>>>> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
>>>> ---------------- ---------------------------
>>>>       %stddev     %change         %stddev
>>>>           \          |                \
>>>>   21304            +0.5%      21420        unixbench.score
>>>>  632.43            +0.0%     632.44        unixbench.time.elapsed_time
>>>>  632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
>>>> 11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
>>>>  864713            +0.1%     865914        unixbench.time.major_page_faults
>>>>    9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
>>>> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
>>>>    4096            +0.0%       4096        unixbench.time.page_size
>>>>    3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
>>>>   18341            +1.3%      18572        unixbench.time.system_time
>>>>    5323            +0.6%       5353        unixbench.time.user_time
>>>> 78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
>>>> 57178573            +0.4%   57399061        unixbench.workload
>>>> 
>>>> There is not much difference with a53ce18cacb477dd applied or not.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
>>>> ---------------- ---------------------------
>>>>       %stddev     %change         %stddev
>>>>           \          |                \
>>>>   19985            +8.6%      21697        unixbench.score
>>>>  632.64            -0.0%     632.53        unixbench.time.elapsed_time
>>>>  632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
>>>> 11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
>>>>  818996            +3.1%     844681        unixbench.time.major_page_faults
>>>>    9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
>>>> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
>>>>    4096            +0.0%       4096        unixbench.time.page_size
>>>>    3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
>>>>   18873            -2.4%      18423        unixbench.time.system_time
>>>>    4960            +7.1%       5313        unixbench.time.user_time
>>>> 75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
>>>> 53553404            +8.7%   58235303        unixbench.workload
>>>> 
>>>> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
>>>> and this improvement remains with a53ce18cacb477dd applied.
>>>> 
>>>> Can you send the full test script so I can have a try locally?
>>> 
>>> Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent.
>>> And that’s ’shell8’ with ‘-c 448’ (the number of copies) passed as an argument.
>>> 
>>> Thanks,
>>> Saeed
>>> 
>>>> 
>>>> thanks,
>>>> Chenyu


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-07-20 23:04             ` Saeed Mirzamohammadi
@ 2023-07-21 14:01               ` Vincent Guittot
  2023-07-26  0:03                 ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2023-07-21 14:01 UTC (permalink / raw)
  To: Saeed Mirzamohammadi
  Cc: Chen Yu, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

Hi Saeed,

On Fri, 21 Jul 2023 at 01:04, Saeed Mirzamohammadi
<saeed.mirzamohammadi@oracle.com> wrote:
>
> Hi Vincent,
>
> > On Jun 30, 2023, at 1:28 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >
> > On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
> > <saeed.mirzamohammadi@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> wrote:
> >>>
> >>> Hi Chen, Vincent,
> >>>
> >>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
> >>>>
> >>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
> >>>>> Hi Vincent,
> >>>>>
> >>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> >>>>>>
> >>>>>> Hi Saeed,
> >>>>>>
> >>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
> >>>>>> <saeed.mirzamohammadi@oracle.com> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
> >>>>>>>
> >>>>>>> Commit Data:
> >>>>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
> >>>>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
> >>>>>>> author           : vincent.guittot@linaro.org
> >>>>>>> author date      : 2023-03-17 16:08:10
> >>>>>>>
> >>>>>>>
> >>>>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested v5.15, but I expect the same there.
> >>>>>>
> >>>>>> It would be good to confirm that the regression is present on v6.3,
> >>>>>> where the patch was originally merged. It can be that there is a
> >>>>>> hidden dependency on other patches introduced since v5.4.
> >>>>>
> >>>>> Regression is present on v6.3 as well, examples:
> >>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
> >>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
> >>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
> >>>
> >>> Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
> >>> v6.3.y -> no regression
> >>> v5.15.y -> no regression
> >>> v5.4.y -> 5-8% regression.
> >>
> >> A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!
> >
> > I tried to find out why the regression happens only for v5.4.y (or
> > lower) and not for v5.15.y (or above), but I haven't been able to find
> > any possible reason in the code.
> >
> > Regarding the 2 commits below, they must come together, so we can't
> > simply revert one and not the other.
> > commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
> > commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
> >
> Tests were done before and after these 2 commits.
>
> > entity_is_long_sleeper() should never return true in your case. Could
> > you check whether that holds for you?
> >
> I tested this, and entity_is_long_sleeper() never returns true.
>
> I actually removed the related part, tested, and the regression is gone with the following change (partial revert):
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3ebd2054996bc..0d70dd6e14844 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>
>  void activate_task(struct rq *rq, struct task_struct *p, int flags)
>  {
> -       if (task_on_rq_migrating(p))
> -               flags |= ENQUEUE_MIGRATED;
> -
>         if (task_contributes_to_load(p))
>                 rq->nr_uninterruptible--;
>

Is the regression still there if you only apply the partial revert
below but not the above part?
I have rechecked the code, but I can't see any obvious reason why
there is a regression on v5.4 and not on v5.15.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 83a7cf62c0f53..ef9aca05c7bdf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3779,9 +3779,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
>         if (flags & ENQUEUE_WAKEUP)
>                 place_entity(cfs_rq, se, 0);
> -       /* Entity has migrated, no longer consider this task hot */
> -       if (flags & ENQUEUE_MIGRATED)
> -               se->exec_start = 0;
>
>         check_schedstat_required();
>         update_stats_enqueue(cfs_rq, se, flags);
> @@ -6182,6 +6179,9 @@ static void migrate_task_rq_fair(struct task_struct *p)
>
>         /* Tell new CPU we are migrated */
>         p->se.avg.last_update_time = 0;
> +
> +       /* We have migrated, no longer consider this task hot */
> +       p->se.exec_start = 0;
>  }
>
>  static void task_dead_fair(struct task_struct *p)
>
>
> >
> >
> >
> >
> >>
> >>>
> >>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
> >>>>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
> >>>>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
> >>>>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
> >>>>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
> >>>>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
> >>>>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
> >>>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
> >>>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
> >>>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
> >>>>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
> >>>>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
> >>>>>>>
> >>>>>>> Link to unixbench:
> >>>>>>> github.com/kdlucas/byte-unixbench
> >>>>>>
> >>>>>> I tried to reproduce the problem with v6.3 on my system, but I don't
> >>>>>> see any difference with or without the patch.
> >>>>>>
> >>>>>> Do you have more details on your setup? Number of CPUs and topology?
> >>>>>>
> >>>>> model name  : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> >>>>>
> >>>>> Topology:
> >>>>> node   0   1
> >>>>> 0:  10  21
> >>>>> 1:  21  10
> >>>>>
> >>>>> Architecture:          x86_64
> >>>>> CPU op-mode(s):        32-bit, 64-bit
> >>>>> CPU(s):                56
> >>>>> On-line CPU(s) list:   0-55
> >>>>> Thread(s) per core:    2
> >>>>> Core(s) per socket:    14
> >>>>> Socket(s):             2
> >>>>> NUMA node(s):          2
> >>>>>
> >>>> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
> >>>> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
> >>>> Since the regression was reported mainly for the 224- and 448-copy cases
> >>>> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
> >>>>
> >>>>
> >>>> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
> >>>> ---------------- ---------------------------
> >>>>       %stddev     %change         %stddev
> >>>>           \          |                \
> >>>>   21304            +0.5%      21420        unixbench.score
> >>>>  632.43            +0.0%     632.44        unixbench.time.elapsed_time
> >>>>  632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
> >>>> 11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
> >>>>  864713            +0.1%     865914        unixbench.time.major_page_faults
> >>>>    9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
> >>>> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
> >>>>    4096            +0.0%       4096        unixbench.time.page_size
> >>>>    3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
> >>>>   18341            +1.3%      18572        unixbench.time.system_time
> >>>>    5323            +0.6%       5353        unixbench.time.user_time
> >>>> 78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
> >>>> 57178573            +0.4%   57399061        unixbench.workload
> >>>>
> >>>> There is not much difference with a53ce18cacb477dd applied or not.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> >>>> ---------------- ---------------------------
> >>>>       %stddev     %change         %stddev
> >>>>           \          |                \
> >>>>   19985            +8.6%      21697        unixbench.score
> >>>>  632.64            -0.0%     632.53        unixbench.time.elapsed_time
> >>>>  632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
> >>>> 11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
> >>>>  818996            +3.1%     844681        unixbench.time.major_page_faults
> >>>>    9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
> >>>> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
> >>>>    4096            +0.0%       4096        unixbench.time.page_size
> >>>>    3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
> >>>>   18873            -2.4%      18423        unixbench.time.system_time
> >>>>    4960            +7.1%       5313        unixbench.time.user_time
> >>>> 75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
> >>>> 53553404            +8.7%   58235303        unixbench.workload
> >>>>
> >>>> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement,
> >>>> and this improvement remains with a53ce18cacb477dd applied.
> >>>>
> >>>> Can you send the full test script so I can have a try locally?
> >>>
> >>> Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent.
> >>> And that’s ’shell8’ with ‘-c 448’ (the number of copies) passed as an argument.
> >>>
> >>> Thanks,
> >>> Saeed
> >>>
> >>>>
> >>>> thanks,
> >>>> Chenyu
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4
  2023-07-21 14:01               ` Vincent Guittot
@ 2023-07-26  0:03                 ` Saeed Mirzamohammadi
  0 siblings, 0 replies; 10+ messages in thread
From: Saeed Mirzamohammadi @ 2023-07-26  0:03 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Chen Yu, Ingo Molnar, peterz@infradead.org,
	Linux Kernel Mailing List, zhangqiao22@huawei.com

Hi Vincent,

> On Jul 21, 2023, at 7:01 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> 
> Hi Saeed,
> 
> On Fri, 21 Jul 2023 at 01:04, Saeed Mirzamohammadi
> <saeed.mirzamohammadi@oracle.com> wrote:
>> 
>> Hi Vincent,
>> 
>>> On Jun 30, 2023, at 1:28 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>> 
>>> On Fri, 30 Jun 2023 at 00:20, Saeed Mirzamohammadi
>>> <saeed.mirzamohammadi@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jun 21, 2023, at 9:41 AM, Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> wrote:
>>>>> 
>>>>> Hi Chen, Vincent,
>>>>> 
>>>>>> On Jun 13, 2023, at 11:37 PM, Chen Yu <yu.c.chen@intel.com> wrote:
>>>>>> 
>>>>>> On 2023-06-13 at 19:35:55 +0000, Saeed Mirzamohammadi wrote:
>>>>>>> Hi Vincent,
>>>>>>> 
>>>>>>>> On Jun 9, 2023, at 9:52 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>>>>>>>> 
>>>>>>>> Hi Saeed,
>>>>>>>> 
>>>>>>>> On Fri, 9 Jun 2023 at 00:48, Saeed Mirzamohammadi
>>>>>>>> <saeed.mirzamohammadi@oracle.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I’m reporting a regression of up to 8% with Unixbench Shell Scripts benchmarks after the following commit:
>>>>>>>>> 
>>>>>>>>> Commit Data:
>>>>>>>>> commit-id        : a53ce18cacb477dd0513c607f187d16f0fa96f71
>>>>>>>>> subject          : sched/fair: Sanitize vruntime of entity being migrated
>>>>>>>>> author           : vincent.guittot@linaro.org
>>>>>>>>> author date      : 2023-03-17 16:08:10
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We have observed this on our v5.4 and v4.14 kernels; we have not yet tested 5.15, but I expect the same there.
>>>>>>>> 
>>>>>>>> It would be good to confirm that the regression is present on v6.3,
>>>>>>>> where the patch was merged originally. It may be that there is a
>>>>>>>> hidden dependency on other patches introduced since v5.4.
>>>>>>> 
>>>>>>> The regression is present on v6.3 as well; examples:
>>>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent: ~6%
>>>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent: ~8%
>>>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent: ~2%
>>>>> 
>>>>> Apologies for the confusion; I should correct the v6.3 upstream result above. v6.3 doesn’t have any regression.
>>>>> v6.3.y -> no regression
>>>>> v5.15.y -> no regression
>>>>> v5.4.y -> 5-8% regression.
>>>> 
>>>> A gentle reminder: is there any recommendation for the v5.4.y and v4.14.y regression? Thanks!
>>> 
>>> I tried to find out why the regression happens only on v5.4.y (or lower)
>>> and not on v5.15.y (or above), but I haven't been able to find any
>>> plausible reason in the code.
>>> 
>>> Regarding the two commits below: they must come together, so we can't
>>> simply revert one without the other.
>>> commit 829c1651e9c4 sched/fair: sanitize vruntime of entity being placed
>>> commit a53ce18cacb4 sched/fair: Sanitize vruntime of entity being migrated
>>> 
>> Tests were done before and after these two commits.
>> 
>>> entity_is_long_sleeper() should never return true in your case. Could
>>> you check whether that is the case for you?
>>> 
>> I tested this, and entity_is_long_sleeper() never returns true.
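
(For reference, a check along these lines can be done with a one-off debug
hunk in place_entity() in kernel/sched/fair.c; the snippet below is only a
sketch of such instrumentation, not the exact patch that was used.)

/*
 * Hypothetical debug hunk mirroring the structure added by
 * a53ce18cacb4: trace_printk() fires whenever the long-sleeper
 * path is actually taken.
 */
if (entity_is_long_sleeper(se)) {
        if (entity_is_task(se))
                trace_printk("long sleeper: comm=%s pid=%d\n",
                             task_of(se)->comm, task_of(se)->pid);
        se->vruntime = vruntime;
} else {
        se->vruntime = max_vruntime(se->vruntime, vruntime);
}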
>> 
>> I actually removed the related part and tested; the regression is gone with the following change (partial revert):
>> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 3ebd2054996bc..0d70dd6e14844 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -792,9 +792,6 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>> 
>> void activate_task(struct rq *rq, struct task_struct *p, int flags)
>> {
>> -       if (task_on_rq_migrating(p))
>> -               flags |= ENQUEUE_MIGRATED;
>> -
>>        if (task_contributes_to_load(p))
>>                rq->nr_uninterruptible--;
>> 
> 
> Is the regression still there if you only apply the partial revert
> below but not the above part?
The regression is still gone after I added back the following change from the partial revert:

+       if (task_on_rq_migrating(p))
+               flags |= ENQUEUE_MIGRATED;
+

So the partial revert below fixes the regression:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e19fe88914574..ccc0acd477a09 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3777,9 +3777,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
        if (flags & ENQUEUE_WAKEUP)
                place_entity(cfs_rq, se, 0);
-       /* Entity has migrated, no longer consider this task hot */
-       if (flags & ENQUEUE_MIGRATED)
-               se->exec_start = 0;
 
        check_schedstat_required();
        update_stats_enqueue(cfs_rq, se, flags);
@@ -6180,6 +6177,9 @@ static void migrate_task_rq_fair(struct task_struct *p)
 
        /* Tell new CPU we are migrated */
        p->se.avg.last_update_time = 0;
+
+       /* We have migrated, no longer consider this task hot */
+       p->se.exec_start = 0;
 }
 
 static void task_dead_fair(struct task_struct *p)
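
For context on why the location of this reset matters: se.exec_start feeds
task_hot() in the load balancer, which treats a task that ran very recently
as cache-hot and avoids migrating it, so clearing exec_start makes the task
look arbitrarily cold. Below is a simplified sketch of that check,
paraphrased from kernel/sched/fair.c (details differ across stable
branches), not an exact copy of any one tree:

/*
 * Simplified sketch of task_hot(): a task whose last run ended less
 * than sysctl_sched_migration_cost ago is considered cache-hot, so
 * the load balancer avoids pulling it to another CPU. With
 * se.exec_start == 0 the delta is huge and the task is never "hot".
 */
static int task_hot(struct task_struct *p, struct lb_env *env)
{
        s64 delta;

        if (p->sched_class != &fair_sched_class)
                return 0;

        if (sysctl_sched_migration_cost == -1)
                return 1;
        if (sysctl_sched_migration_cost == 0)
                return 0;

        delta = rq_clock_task(env->src_rq) - p->se.exec_start;

        return delta < (s64)sysctl_sched_migration_cost;
}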


> I have rechecked the code but can't see any obvious reason why there
> is a regression on v5.4 and not on v5.15.
> 
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 83a7cf62c0f53..ef9aca05c7bdf 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3779,9 +3779,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> 
>>        if (flags & ENQUEUE_WAKEUP)
>>                place_entity(cfs_rq, se, 0);
>> -       /* Entity has migrated, no longer consider this task hot */
>> -       if (flags & ENQUEUE_MIGRATED)
>> -               se->exec_start = 0;
>> 
>>        check_schedstat_required();
>>        update_stats_enqueue(cfs_rq, se, flags);
>> @@ -6182,6 +6179,9 @@ static void migrate_task_rq_fair(struct task_struct *p)
>> 
>>        /* Tell new CPU we are migrated */
>>        p->se.avg.last_update_time = 0;
>> +
>> +       /* We have migrated, no longer consider this task hot */
>> +       p->se.exec_start = 0;
>> }
>> 
>> static void task_dead_fair(struct task_struct *p)
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>>> 
>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ub_gcc_1copy_Shell_Scripts_1_concurrent  :  -0.01%
>>>>>>>>> ub_gcc_1copy_Shell_Scripts_8_concurrent  :  -0.1%
>>>>>>>>> ub_gcc_1copy_Shell_Scripts_16_concurrent  :  -0.12%
>>>>>>>>> ub_gcc_56copies_Shell_Scripts_1_concurrent  :  -2.29%
>>>>>>>>> ub_gcc_56copies_Shell_Scripts_8_concurrent  :  -4.22%
>>>>>>>>> ub_gcc_56copies_Shell_Scripts_16_concurrent  :  -4.23%
>>>>>>>>> ub_gcc_224copies_Shell_Scripts_1_concurrent  :  -5.54%
>>>>>>>>> ub_gcc_224copies_Shell_Scripts_8_concurrent  :  -8%
>>>>>>>>> ub_gcc_224copies_Shell_Scripts_16_concurrent  :  -7.05%
>>>>>>>>> ub_gcc_448copies_Shell_Scripts_1_concurrent  :  -6.4%
>>>>>>>>> ub_gcc_448copies_Shell_Scripts_8_concurrent  :  -8.35%
>>>>>>>>> ub_gcc_448copies_Shell_Scripts_16_concurrent  :  -7.09%
>>>>>>>>> 
>>>>>>>>> Link to unixbench:
>>>>>>>>> github.com/kdlucas/byte-unixbench
>>>>>>>> 
>>>>>>>> I tried to reproduce the problem with v6.3 on my system, but I don't
>>>>>>>> see any difference with or without the patch.
>>>>>>>> 
>>>>>>>> Do you have more details on your setup? Number of CPUs and topology?
>>>>>>>> 
>>>>>>> model name  : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
>>>>>>> 
>>>>>>> Topology:
>>>>>>> node   0   1
>>>>>>> 0:  10  21
>>>>>>> 1:  21  10
>>>>>>> 
>>>>>>> Architecture:          x86_64
>>>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>>>> CPU(s):                56
>>>>>>> On-line CPU(s) list:   0-55
>>>>>>> Thread(s) per core:    2
>>>>>>> Core(s) per socket:    14
>>>>>>> Socket(s):             2
>>>>>>> NUMA node(s):          2
>>>>>>> 
>>>>>> Tested on a similar platform, an E5-2697 v2 @ 2.70GHz with 2 nodes and
>>>>>> 24 cores/48 CPUs in total; however, I could not reproduce the issue.
>>>>>> Since the regression was reported mainly for the 224- and 448-copies cases
>>>>>> on your platform, I tested unixbench shell1 with 4 x 48 = 192 copies.
>>>>>> 
>>>>>> 
>>>>>> a53ce18cacb477dd 213acadd21a080fc8cda8eebe6d
>>>>>> ---------------- ---------------------------
>>>>>>      %stddev     %change         %stddev
>>>>>>          \          |                \
>>>>>>  21304            +0.5%      21420        unixbench.score
>>>>>> 632.43            +0.0%     632.44        unixbench.time.elapsed_time
>>>>>> 632.43            +0.0%     632.44        unixbench.time.elapsed_time.max
>>>>>> 11837046            -4.7%   11277727        unixbench.time.involuntary_context_switches
>>>>>> 864713            +0.1%     865914        unixbench.time.major_page_faults
>>>>>>   9600            +4.0%       9984        unixbench.time.maximum_resident_set_size
>>>>>> 8.433e+08            +0.6%   8.48e+08        unixbench.time.minor_page_faults
>>>>>>   4096            +0.0%       4096        unixbench.time.page_size
>>>>>>   3741            +1.1%       3783        unixbench.time.percent_of_cpu_this_job_got
>>>>>>  18341            +1.3%      18572        unixbench.time.system_time
>>>>>>   5323            +0.6%       5353        unixbench.time.user_time
>>>>>> 78197044            -3.1%   75791701        unixbench.time.voluntary_context_switches
>>>>>> 57178573            +0.4%   57399061        unixbench.workload
>>>>>> 
>>>>>> There is not much difference with a53ce18cacb477dd applied or not.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
>>>>>> ---------------- ---------------------------
>>>>>>      %stddev     %change         %stddev
>>>>>>          \          |                \
>>>>>>  19985            +8.6%      21697        unixbench.score
>>>>>> 632.64            -0.0%     632.53        unixbench.time.elapsed_time
>>>>>> 632.64            -0.0%     632.53        unixbench.time.elapsed_time.max
>>>>>> 11453985            +3.7%   11880259        unixbench.time.involuntary_context_switches
>>>>>> 818996            +3.1%     844681        unixbench.time.major_page_faults
>>>>>>   9600            +0.0%       9600        unixbench.time.maximum_resident_set_size
>>>>>> 7.911e+08            +8.4%  8.575e+08        unixbench.time.minor_page_faults
>>>>>>   4096            +0.0%       4096        unixbench.time.page_size
>>>>>>   3767            -0.4%       3752        unixbench.time.percent_of_cpu_this_job_got
>>>>>>  18873            -2.4%      18423        unixbench.time.system_time
>>>>>>   4960            +7.1%       5313        unixbench.time.user_time
>>>>>> 75436000           +10.8%   83581483        unixbench.time.voluntary_context_switches
>>>>>> 53553404            +8.7%   58235303        unixbench.workload
>>>>>> 
>>>>>> Previously, when 829c1651e9c4a6f was introduced, there was an 8.6% improvement, and this
>>>>>> improvement remains with a53ce18cacb477dd applied.
>>>>>> 
>>>>>> Can you send the full test script so I can try it locally?
>>>>> 
>>>>> Thanks for testing this. For the v5.4.y kernel (not for v6.3.y or v5.15.y), there is an 8% regression with the following test: ub_gcc_448copies_Shell_Scripts_8_concurrent
>>>>> That is the 'shell8' test, with '-c 448' passed as the copies argument.
>>>>> 
>>>>> Thanks,
>>>>> Saeed
>>>>> 
>>>>>> 
>>>>>> thanks,
>>>>>> Chenyu


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-07-26  0:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-08 22:48 Reporting a performance regression in sched/fair on Unixbench Shell Scripts with commit a53ce18cacb4 Saeed Mirzamohammadi
2023-06-09 16:52 ` Vincent Guittot
2023-06-13 19:35   ` Saeed Mirzamohammadi
2023-06-14  6:37     ` Chen Yu
2023-06-21 16:41       ` Saeed Mirzamohammadi
2023-06-29 22:19         ` Saeed Mirzamohammadi
2023-06-30  8:28           ` Vincent Guittot
2023-07-20 23:04             ` Saeed Mirzamohammadi
2023-07-21 14:01               ` Vincent Guittot
2023-07-26  0:03                 ` Saeed Mirzamohammadi
