Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: Morten Rasmussen <morten.rasmussen@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steve Muckle <steve.muckle@linaro.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"yuyang.du@intel.com" <yuyang.du@intel.com>,
	"mturquette@baylibre.com" <mturquette@baylibre.com>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	Juri Lelli <Juri.Lelli@arm.com>,
	"sgurrappadi@nvidia.com" <sgurrappadi@nvidia.com>,
	"pang.xunlei@zte.com.cn" <pang.xunlei@zte.com.cn>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Date: Fri, 11 Sep 2015 18:22:47 +0100
Message-ID: <20150911172246.GI27098@e105550-lin.cambridge.arm.com>
In-Reply-To: <20150909111309.GD27098@e105550-lin.cambridge.arm.com>

On Wed, Sep 09, 2015 at 12:13:10PM +0100, Morten Rasmussen wrote:
> On Wed, Sep 09, 2015 at 11:43:05AM +0200, Peter Zijlstra wrote:
> > Sadly that makes the code worse; I get 14 mul instructions where
> > previously I had 11.
> > 
> > What happens is that GCC gets confused and cannot constant propagate the
> > new variables, so what used to be shifts now end up being actual
> > multiplications.
> > 
> > With this, I get back to 11. Can you see what happens on ARM where you
> > have both functions defined to non constants?
> 
> We repeated the experiment on arm and arm64 but still with functions
> defined to constant to compare with your results. The mul instruction
> count seems to be somewhat compiler version dependent, but consistently
> show no effect of the patch:
> 
> arm	before	after
> gcc4.9	12	12
> gcc4.8	10	10
> 
> arm64	before	after
> gcc4.9	11	11
> 
> I will get numbers with the arch-functions implemented as well and do
> hackbench runs to see what happens in terms of performance.

I have done some runs with the proposed fixes added:

1. PeterZ's util_sum shift fix (change util_sum).
2. Morten's scaling of weight instead of time (reduce bit loss).
3. PeterZ's unconditional calls to arch*() functions (compiler opt).

To be clear: 2 includes 1, and 3 includes 1 and 2.
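
To illustrate the bit-loss concern behind item 2, here is a toy sketch. It is not taken from the patches: the 10-bit shift, the function names and the sample values are all assumptions picked for illustration. It only shows that truncating a small time delta before multiplying by a large weight throws away more than truncating the scaled weight does.

```python
SHIFT = 10  # assumed fixed-point shift: a scale factor of 1024 means 1.0


def scale_time_first(delta, weight, scale):
    """Scale the time delta, truncate, then multiply by the weight."""
    return ((delta * scale) >> SHIFT) * weight


def scale_weight_first(delta, weight, scale):
    """Scale the weight, truncate, then multiply by the time delta."""
    return delta * ((weight * scale) >> SHIFT)


def exact(delta, weight, scale):
    """Reference value without any intermediate truncation."""
    return delta * weight * scale / (1 << SHIFT)


if __name__ == "__main__":
    # Hypothetical inputs: a short delta, a large weight, scale factor 0.5.
    delta, weight, scale = 3, 1024, 512
    print("time first:  ", scale_time_first(delta, weight, scale))
    print("weight first:", scale_weight_first(delta, weight, scale))
    print("exact:       ", exact(delta, weight, scale))
```

Note that scaling the weight first can lose bits as well when the weight itself is small, so this only shows the direction of the effect for the inputs chosen here.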

Runs were done with the default (#define) implementation of the
arch-functions and with an arch-specific implementation for ARM.

I realized that just counting 'mul' instructions in
update_blocked_averages() is probably not a fair comparison on ARM, as it
turned out to contain quite a few multiply-accumulate instructions. So
I have also included a total count that covers those.


Test platforms:

ARM TC2 (A7x3 only)
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 200
#mul: grep -e mul (in update_blocked_averages())
#mul_all: grep -e mul -e mla -e mls -e mia (in update_blocked_averages())
gcc: 4.9.3

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 15000
#mul: grep -e mul (in update_blocked_averages())
gcc: 4.9.2
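
For anyone wanting to repeat the counting, a rough Python equivalent of the grep patterns above could look like the sketch below. It assumes `objdump -d` style lines (address, raw bytes, instruction text, separated by tabs) for a single function. Unlike grep, it matches on the mnemonic only rather than anywhere in the line. The function name, the sample listing and its byte values are made up for illustration and are not real compiler output.

```python
# Mnemonic substrings from the test description: 'mul' for #mul, and
# 'mul', 'mla', 'mls', 'mia' for #mul_all.
MUL_ONLY = ("mul",)
MUL_ALL = ("mul", "mla", "mls", "mia")


def count_matching(disassembly, patterns):
    """Count instruction lines whose mnemonic contains any pattern."""
    count = 0
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # symbol labels, blank lines, byte continuation lines
        words = fields[2].split()
        if not words:
            continue
        mnemonic = words[0]
        if any(p in mnemonic for p in patterns):
            count += 1
    return count


# Illustrative listing only; the byte values are placeholders.
SAMPLE = "\n".join([
    "00000000 <example>:",
    "   0:\t00000000 \tmul\tr3, r1, r2",
    "   4:\t00000000 \tmla\tr0, r2, r1, r3",
    "   8:\t00000000 \tadd\tr0, r1, r2",
    "   c:\t00000000 \tsmull\tr2, r3, r0, r1",
    "  10:\t00000000 \tbx\tlr",
])

if __name__ == "__main__":
    print("#mul    ", count_matching(SAMPLE, MUL_ONLY))
    print("#mul_all", count_matching(SAMPLE, MUL_ALL))
```

As with grep, the substring match also picks up long multiplies such as smull.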


Results:

perf numbers are elapsed time in seconds, averaged over three runs of
perf stat --repeat 10. Raw data is available further down.

ARM TC2          #mul           #mul_all       perf bench (s)
arch*()          default  arm   default  arm   default  arm

1 shift_fix      10       16    22       36    13.401   13.288
2 scaled_weight  12       14    30       32    13.282   13.238
3 unconditional  12       14    26       32    13.296   13.428

Intel E5-2690    #mul           perf bench (s)
arch*()          default        default

1 shift_fix      13             14.786
2 scaled_weight  18             15.078
3 unconditional  14             15.195


Overall, it appears that fewer 'mul' instructions don't necessarily
mean a better perf bench score. For ARM, 2 seems the best choice overall,
while 1 is better for Intel. If we want to try to avoid the bit loss by
scaling weight instead of time, 2 does better than 3 on both platforms.
However, all that said, looking at the raw numbers there is a significant
difference between runs of perf --repeat, so we can't really draw any
strong conclusions. It all appears to be in the noise.

I suggest that I spin a v2 of this series and go with scaled_weight to
reduce bit loss. Any objections?

While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
mess?


Raw numbers:

ARM TC2

shift_fix	default_arch
gcc4.9.3
#mul 10
#mul+mla+mls+mia 22
13.384416727 seconds time elapsed ( +-  0.17% )
13.431014702 seconds time elapsed ( +-  0.18% )
13.387434890 seconds time elapsed ( +-  0.15% )

shift_fix	arm_arch
gcc4.9.3
#mul 16
#mul+mla+mls+mia 36
13.271044081 seconds time elapsed ( +-  0.11% )
13.310189123 seconds time elapsed ( +-  0.19% )
13.283594740 seconds time elapsed ( +-  0.12% )

scaled_weight	default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 30
13.295649553 seconds time elapsed ( +-  0.20% )
13.271634654 seconds time elapsed ( +-  0.19% )
13.280081329 seconds time elapsed ( +-  0.14% )

scaled_weight	arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.230659223 seconds time elapsed ( +-  0.15% )
13.222276527 seconds time elapsed ( +-  0.15% )
13.260275081 seconds time elapsed ( +-  0.21% )

unconditional	default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 26
13.274904460 seconds time elapsed ( +-  0.13% )
13.307853511 seconds time elapsed ( +-  0.15% )
13.304084844 seconds time elapsed ( +-  0.22% )

unconditional	arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.432878577 seconds time elapsed ( +-  0.13% )
13.417950552 seconds time elapsed ( +-  0.12% )
13.431682719 seconds time elapsed ( +-  0.18% )


Intel

shift_fix	default_arch
gcc4.9.2
#mul 13
14.905815416 seconds time elapsed ( +-  0.61% )
14.811113694 seconds time elapsed ( +-  0.84% )
14.639739309 seconds time elapsed ( +-  0.76% )

scaled_weight	default_arch
gcc4.9.2
#mul 18
15.113275474 seconds time elapsed ( +-  0.64% )
15.056777680 seconds time elapsed ( +-  0.44% )
15.064074416 seconds time elapsed ( +-  0.71% )

unconditional	default_arch
gcc4.9.2
#mul 14
15.105152500 seconds time elapsed ( +-  0.71% )
15.346405473 seconds time elapsed ( +-  0.81% )
15.132933523 seconds time elapsed ( +-  0.82% )
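
For reference, the per-configuration means can be recomputed from the raw numbers above with a few lines of Python. This is only a convenience sketch: the dictionary layout and the summarize() helper are my own naming, and the spread (max minus min of the three runs) is a crude noise indicator, not a proper statistical test.

```python
from statistics import mean

# (platform, variant, arch implementation) -> three elapsed times in seconds
RAW = {
    ("ARM TC2", "shift_fix", "default"): (13.384416727, 13.431014702, 13.387434890),
    ("ARM TC2", "shift_fix", "arm"): (13.271044081, 13.310189123, 13.283594740),
    ("ARM TC2", "scaled_weight", "default"): (13.295649553, 13.271634654, 13.280081329),
    ("ARM TC2", "scaled_weight", "arm"): (13.230659223, 13.222276527, 13.260275081),
    ("ARM TC2", "unconditional", "default"): (13.274904460, 13.307853511, 13.304084844),
    ("ARM TC2", "unconditional", "arm"): (13.432878577, 13.417950552, 13.431682719),
    ("Intel", "shift_fix", "default"): (14.905815416, 14.811113694, 14.639739309),
    ("Intel", "scaled_weight", "default"): (15.113275474, 15.056777680, 15.064074416),
    ("Intel", "unconditional", "default"): (15.105152500, 15.346405473, 15.132933523),
}


def summarize(runs):
    """Return the mean and the max-min spread of a set of runs."""
    return {"mean": mean(runs), "spread": max(runs) - min(runs)}


if __name__ == "__main__":
    for (platform, variant, arch), runs in RAW.items():
        s = summarize(runs)
        print(f"{platform:8} {variant:14} {arch:8} "
              f"mean={s['mean']:.3f} spread={s['spread']:.3f}")
```

Comparing the spread within one configuration against the gap between the means of two configurations gives a quick feel for how much of the difference could be run-to-run variation.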
