From: kernel test robot <oliver.sang@intel.com>
To: Abel Wu <wuyun.abel@bytedance.com>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, <ying.huang@intel.com>,
	<feng.tang@intel.com>, <fengwei.yin@intel.com>,
	<aubrey.li@linux.intel.com>, <yu.c.chen@intel.com>,
	<oliver.sang@intel.com>
Subject: [linus:master] [sched/eevdf] 2227a957e1: will-it-scale.per_process_ops 2.5% improvement
Date: Mon, 29 Jan 2024 22:16:30 +0800
Message-ID: <202401292151.829b01b0-oliver.sang@intel.com>



Hello,

kernel test robot noticed a 2.5% improvement of will-it-scale.per_process_ops on:


commit: 2227a957e1d5b1941be4e4207879ec74f4bb37f8 ("sched/eevdf: Sort the rbtree by virtual deadline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
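
For context, this patch re-keys the cfs_rq entity rbtree by virtual deadline
instead of by vruntime, so the EEVDF pick no longer needs a separate scan of
candidates. Below is a minimal, self-contained sketch of the idea; a sorted
array stands in for the kernel's augmented rbtree, and all names and fields
here are illustrative, not the kernel's actual API:

    /*
     * Sketch of "earliest eligible virtual deadline first" selection
     * over entities kept sorted by virtual deadline.
     */
    #include <stddef.h>

    struct entity_sketch {
        unsigned long long vruntime;   /* virtual runtime consumed */
        unsigned long long deadline;   /* virtual deadline */
    };

    /* An entity is eligible when its vruntime is not ahead of the
     * queue-wide average vruntime. */
    static int eligible(const struct entity_sketch *se,
                        unsigned long long avg_vruntime)
    {
        return se->vruntime <= avg_vruntime;
    }

    /* With entries sorted by deadline, the first eligible entity in
     * order has the earliest eligible virtual deadline.  The kernel
     * gets the same effect in O(log n) by augmenting each rbtree node
     * with the minimum vruntime of its subtree. */
    static struct entity_sketch *
    pick_eevdf_sketch(struct entity_sketch *sorted_by_deadline, size_t nr,
                      unsigned long long avg_vruntime)
    {
        for (size_t i = 0; i < nr; i++)
            if (eligible(&sorted_by_deadline[i], avg_vruntime))
                return &sorted_by_deadline[i];
        return NULL;
    }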

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 16
	mode: process
	test: sched_yield
	cpufreq_governor: performance

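For reference, will-it-scale's sched_yield testcase is essentially a tight
yield loop per task, shaped roughly like the snippet below (simplified; the
benchmark's real harness forks nr_task processes and samples the shared
iteration counters to compute per_process_ops):

    #include <sched.h>

    /* Each benchmark process spins calling sched_yield() and counts
     * completed operations; the harness reads *iterations per second. */
    void testcase(unsigned long long *iterations)
    {
        for (;;) {
            sched_yield();
            (*iterations)++;
        }
    }
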

In addition, the commit has a significant impact on the following test:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 2.6% improvement                                      |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=process                                                                                       |
|                  | nr_task=50%                                                                                        |
|                  | test=sched_yield                                                                                   |
+------------------+----------------------------------------------------------------------------------------------------+



Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240129/202401292151.829b01b0-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/sched_yield/will-it-scale

commit: 
  84db47ca71 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")
  2227a957e1 ("sched/eevdf: Sort the rbtree by virtual deadline")

84db47ca7146d7bd 2227a957e1d5b1941be4e420787 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    363.99 ±141%    +104.2%     743.31 ± 69%  numa-meminfo.node1.Inactive(file)
     91.00 ±141%    +104.2%     185.83 ± 69%  numa-vmstat.node1.nr_inactive_file
     91.00 ±141%    +104.2%     185.83 ± 69%  numa-vmstat.node1.nr_zone_inactive_file
  16803184            +2.5%   17227597        will-it-scale.16.processes
   1050198            +2.5%    1076724        will-it-scale.per_process_ops
  16803184            +2.5%   17227597        will-it-scale.workload
      1.70 ±  5%     -12.0%       1.50 ±  4%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      1.72 ±  5%     -11.7%       1.51 ±  4%  perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      3.41 ±  5%     -12.0%       3.00 ±  4%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      3.43 ±  5%     -11.7%       3.03 ±  4%  perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.18            +7.1%       0.19        perf-stat.i.MPKI
 3.486e+09            -7.6%  3.222e+09        perf-stat.i.branch-instructions
      1.34            +0.1        1.47        perf-stat.i.branch-miss-rate%
  46582130            +1.6%   47319245        perf-stat.i.branch-misses
      2.67            +8.3%       2.90        perf-stat.i.cpi
      0.33            +0.0        0.36        perf-stat.i.dTLB-load-miss-rate%
  18084714            +2.4%   18523285        perf-stat.i.dTLB-load-misses
 5.491e+09            -5.2%  5.204e+09        perf-stat.i.dTLB-loads
 3.036e+09            -1.1%  3.003e+09        perf-stat.i.dTLB-stores
    741655            -4.3%     709869 ±  2%  perf-stat.i.iTLB-loads
 1.811e+10            -7.4%  1.677e+10        perf-stat.i.instructions
      1115            -9.5%       1009 ±  5%  perf-stat.i.instructions-per-iTLB-miss
      0.38            -7.4%       0.35        perf-stat.i.ipc
    115.51            -4.9%     109.88        perf-stat.i.metric.M/sec
      0.21 ±  3%      +7.6%       0.22        perf-stat.overall.MPKI
      1.34            +0.1        1.47        perf-stat.overall.branch-miss-rate%
      2.62            +8.0%       2.83        perf-stat.overall.cpi
      0.33            +0.0        0.35        perf-stat.overall.dTLB-load-miss-rate%
      0.00            +0.0        0.00        perf-stat.overall.dTLB-store-miss-rate%
      1032           -10.4%     925.55 ±  5%  perf-stat.overall.instructions-per-iTLB-miss
      0.38            -7.4%       0.35        perf-stat.overall.ipc
    324242            -9.7%     292715        perf-stat.overall.path-length
 3.474e+09            -7.6%  3.211e+09        perf-stat.ps.branch-instructions
  46423565            +1.6%   47153977        perf-stat.ps.branch-misses
  18023667            +2.4%   18460935        perf-stat.ps.dTLB-load-misses
 5.473e+09            -5.2%  5.186e+09        perf-stat.ps.dTLB-loads
 3.026e+09            -1.1%  2.993e+09        perf-stat.ps.dTLB-stores
    739444            -4.3%     707693 ±  2%  perf-stat.ps.iTLB-loads
 1.805e+10            -7.4%  1.671e+10        perf-stat.ps.instructions
 5.448e+12            -7.4%  5.043e+12        perf-stat.total.instructions
      7.82 ±  2%      -1.5        6.30        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
     12.22 ±  2%      -1.3       10.90        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.72 ±  2%      -1.3       11.42        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     15.64 ±  2%      -1.1       14.55        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.60            +0.0        0.64 ±  3%  perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.56            +0.0        0.61 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.28 ±  2%      +0.2        2.44 ±  3%  perf-profile.calltrace.cycles-pp.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      8.28            +0.3        8.54        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__sched_yield
      0.09 ±223%      +0.4        0.53 ±  3%  perf-profile.calltrace.cycles-pp.update_min_vruntime.update_curr.pick_next_task_fair.__schedule.schedule
     15.48            +0.5       16.01        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.00            +0.6        0.61 ±  3%  perf-profile.calltrace.cycles-pp.pick_eevdf.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield
      2.38 ±  2%      -2.2        0.23 ±  4%  perf-profile.children.cycles-pp.pick_next_entity
      8.21 ±  2%      -1.5        6.71        perf-profile.children.cycles-pp.pick_next_task_fair
     12.32 ±  2%      -1.3       11.01        perf-profile.children.cycles-pp.__schedule
     12.75 ±  2%      -1.3       11.46        perf-profile.children.cycles-pp.schedule
     15.82 ±  2%      -1.1       14.75        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      0.64            +0.0        0.68 ±  3%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.49 ±  4%      +0.0        0.54 ±  3%  perf-profile.children.cycles-pp.update_min_vruntime
      1.19 ±  3%      +0.1        1.28        perf-profile.children.cycles-pp._raw_spin_lock
      2.34 ±  2%      +0.2        2.51 ±  3%  perf-profile.children.cycles-pp.do_sched_yield
      8.17            +0.2        8.42        perf-profile.children.cycles-pp.entry_SYSCALL_64
     15.76            +0.6       16.31        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.6        0.63 ±  3%  perf-profile.children.cycles-pp.pick_eevdf
      0.55 ±  2%      -0.5        0.05 ± 45%  perf-profile.self.cycles-pp.pick_next_entity
      0.44 ±  4%      +0.0        0.48 ±  3%  perf-profile.self.cycles-pp.update_min_vruntime
      1.30            +0.1        1.36        perf-profile.self.cycles-pp.__sched_yield
      1.46 ±  3%      +0.1        1.53 ±  2%  perf-profile.self.cycles-pp.__schedule
      1.14 ±  2%      +0.1        1.22        perf-profile.self.cycles-pp._raw_spin_lock
      7.13            +0.2        7.33        perf-profile.self.cycles-pp.entry_SYSCALL_64
      9.36 ±  2%      +0.3        9.70        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
     14.93            +0.5       15.47        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.6        0.57 ±  3%  perf-profile.self.cycles-pp.pick_eevdf
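
Reading the profile: the self time formerly attributed to pick_next_entity
(0.55%) moves almost entirely into the new pick_eevdf (0.57%), while
pick_next_task_fair as a whole drops by about 1.5 percentage points,
consistent with the cheaper deadline-ordered pick. The path-length metric
appears to be total instructions divided by workload operations, which
checks out against the raw numbers above:

    path-length ~= perf-stat.total.instructions / will-it-scale.workload
      base:    5.448e12 / 16803184 ~= 324242
      patched: 5.043e12 / 17227597 ~= 292715

So each sched_yield operation retires about 9.7% fewer instructions; with
IPC down about 7.4%, the net is the reported 2.5% throughput gain
(0.926 / 0.903 ~= 1.025).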


***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/sched_yield/will-it-scale

commit: 
  84db47ca71 ("sched/numa: Fix mm numa_scan_seq based unconditional scan")
  2227a957e1 ("sched/eevdf: Sort the rbtree by virtual deadline")

84db47ca7146d7bd 2227a957e1d5b1941be4e420787 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.01 ± 33%     +56.6%       0.01 ± 15%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
      0.01 ± 13%     +29.0%       0.01 ± 21%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
  54153138            +2.6%   55542860        will-it-scale.52.processes
   1041406            +2.6%    1068131        will-it-scale.per_process_ops
  54153138            +2.6%   55542860        will-it-scale.workload
    125729 ± 92%     -58.8%      51829 ± 20%  numa-meminfo.node0.Mapped
   3437584 ± 28%     -56.5%    1494873 ± 61%  numa-meminfo.node0.MemUsed
   1980318 ± 52%     -66.7%     660255 ±131%  numa-meminfo.node0.Unevictable
    814154 ±127%    +162.1%    2134179 ± 40%  numa-meminfo.node1.Unevictable
     31380 ± 91%     -58.7%      12965 ± 20%  numa-vmstat.node0.nr_mapped
    495079 ± 52%     -66.7%     165063 ±131%  numa-vmstat.node0.nr_unevictable
    495079 ± 52%     -66.7%     165063 ±131%  numa-vmstat.node0.nr_zone_unevictable
    203538 ±127%    +162.1%     533544 ± 40%  numa-vmstat.node1.nr_unevictable
    203538 ±127%    +162.1%     533544 ± 40%  numa-vmstat.node1.nr_zone_unevictable
      8.82            -1.6        7.23        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
     13.35            -1.5       11.86        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.86            -1.5       12.40        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
     16.88            -1.4       15.52        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      2.48 ±  2%      +0.1        2.56        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__sched_yield
      8.35            +0.3        8.60        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__sched_yield
     17.48            +0.5       17.96        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__sched_yield
      0.00            +0.6        0.63 ±  3%  perf-profile.calltrace.cycles-pp.pick_eevdf.pick_next_task_fair.__schedule.schedule.__x64_sys_sched_yield
      2.40 ±  3%      -2.2        0.23 ±  5%  perf-profile.children.cycles-pp.pick_next_entity
      9.22            -1.6        7.65        perf-profile.children.cycles-pp.pick_next_task_fair
     13.44            -1.5       11.96        perf-profile.children.cycles-pp.__schedule
     13.89            -1.5       12.43        perf-profile.children.cycles-pp.schedule
     17.07            -1.4       15.72        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      1.55 ±  2%      +0.0        1.60        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      8.22            +0.3        8.48        perf-profile.children.cycles-pp.entry_SYSCALL_64
     17.65            +0.5       18.12        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00            +0.7        0.66 ±  2%  perf-profile.children.cycles-pp.pick_eevdf
      0.57 ±  2%      -0.5        0.06 ±  8%  perf-profile.self.cycles-pp.pick_next_entity
      1.16 ±  2%      +0.0        1.19        perf-profile.self.cycles-pp._raw_spin_lock
      7.17            +0.2        7.39        perf-profile.self.cycles-pp.entry_SYSCALL_64
      9.54            +0.3        9.88        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
     17.60            +0.5       18.07        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +0.6        0.60 ±  3%  perf-profile.self.cycles-pp.pick_eevdf
 1.099e+10            -7.9%  1.012e+10        perf-stat.i.branch-instructions
      1.08            +0.1        1.21        perf-stat.i.branch-miss-rate%
 1.192e+08            +2.2%  1.219e+08        perf-stat.i.branch-misses
      2.61            +8.6%       2.83        perf-stat.i.cpi
      0.33            +0.0        0.35        perf-stat.i.dTLB-load-miss-rate%
  56475655            +2.1%   57669096        perf-stat.i.dTLB-load-misses
 1.743e+10            -5.4%  1.649e+10        perf-stat.i.dTLB-loads
 9.656e+09            -1.2%  9.536e+09        perf-stat.i.dTLB-stores
  55897710            +3.6%   57909818 ±  3%  perf-stat.i.iTLB-load-misses
 5.716e+10            -7.7%  5.276e+10        perf-stat.i.instructions
      1103           -10.6%     987.24 ±  3%  perf-stat.i.instructions-per-iTLB-miss
      0.39            -7.6%       0.36        perf-stat.i.ipc
    366.15            -5.1%     347.53        perf-stat.i.metric.M/sec
      1.08            +0.1        1.20        perf-stat.overall.branch-miss-rate%
      2.56            +8.2%       2.77        perf-stat.overall.cpi
      0.32            +0.0        0.35        perf-stat.overall.dTLB-load-miss-rate%
      1022           -10.8%     912.21 ±  3%  perf-stat.overall.instructions-per-iTLB-miss
      0.39            -7.6%       0.36        perf-stat.overall.ipc
    317393            -9.9%     286100        perf-stat.overall.path-length
 1.096e+10            -7.9%  1.009e+10        perf-stat.ps.branch-instructions
 1.188e+08            +2.2%  1.215e+08        perf-stat.ps.branch-misses
  56295357            +2.1%   57482750        perf-stat.ps.dTLB-load-misses
 1.738e+10            -5.4%  1.643e+10        perf-stat.ps.dTLB-loads
 9.625e+09            -1.2%  9.505e+09        perf-stat.ps.dTLB-stores
  55706724            +3.6%   57713872 ±  3%  perf-stat.ps.iTLB-load-misses
 5.698e+10            -7.7%  5.259e+10        perf-stat.ps.instructions
 1.719e+13            -7.5%  1.589e+13        perf-stat.total.instructions



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

