From: Peter Zijlstra <peterz@infradead.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steve Muckle <steve.muckle@linaro.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
"yuyang.du@intel.com" <yuyang.du@intel.com>,
"mturquette@baylibre.com" <mturquette@baylibre.com>,
"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
Juri Lelli <Juri.Lelli@arm.com>,
"sgurrappadi@nvidia.com" <sgurrappadi@nvidia.com>,
"pang.xunlei@zte.com.cn" <pang.xunlei@zte.com.cn>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Date: Thu, 17 Sep 2015 11:51:38 +0200
Message-ID: <20150917095138.GF3604@twins.programming.kicks-ass.net>
In-Reply-To: <20150911172246.GI27098@e105550-lin.cambridge.arm.com>

On Fri, Sep 11, 2015 at 06:22:47PM +0100, Morten Rasmussen wrote:
> I have done some runs with the proposed fixes added:
>
> 1. PeterZ's util_sum shift fix (change util_sum).
> 2. Morten's scaling of weight instead of time (reduce bit loss).
> 3. PeterZ's unconditional calls to arch*() functions (compiler opt).
>
> To be clear: 2 includes 1, and 3 includes 1 and 2.
>
> Runs were done with the default (#define) implementation of the
> arch-functions and with an arch-specific implementation for ARM.
> Results:
>
> The perf numbers are the average of three (x10) runs. Raw data is
> available further down.
>
> ARM TC2             #mul          #mul_all       perf bench
> arch*()          default  arm  default  arm   default     arm
>
> 1 shift_fix          10   16       22   36    13.401  13.288
> 2 scaled_weight      12   14       30   32    13.282  13.238
> 3 unconditional      12   14       26   32    13.296  13.427
>
> Intel E5-2690       #mul   #mul_all   perf bench
> arch*()          default    default      default
>
> 1 shift_fix          13                  14.786
> 2 scaled_weight      18                  15.078
> 3 unconditional      14                  15.195
>
>
> Overall it appears that fewer 'mul' instructions don't necessarily
> mean a better perf bench score. For ARM, 2 seems the best choice overall.
I suspect you're paying for having to do an actual load there, which can
miss. So that makes sense.
> While 1 is better for Intel.
Right, because GCC shits itself with those conditionals. Weirdly though,
the version below does not seem so affected.
> I suggest that I spin a v2 of this series and go with scaled_weight to
> reduce bit loss. Any objections?
Just playing devil's advocate to myself: what about cgroups? Won't a
per-cpu share of the cgroup weight often be very small?
So I had a little play, and I'm not at all convinced we want to do this
(I've not actually run any numbers on it, but I can well imagine the
extra condition hurting on a branch mispredict), but it does show GCC
need not always get confused.
---
 kernel/sched/fair.c | 58 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9176f7c588a8..1b60fbe3b86c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2519,7 +2519,25 @@ static u32 __compute_runnable_contrib(u64 n)
 #error "load tracking assumes 2^10 as unit"
 #endif
 
-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+static __always_inline unsigned long fp_mult2(unsigned long x, unsigned long y)
+{
+	y *= x;
+	y >>= 10;
+
+	return y;
+}
+
+static __always_inline unsigned long fp_mult3(unsigned long x, unsigned long y, unsigned long z)
+{
+	if (x > y)
+		swap(x, y);
+
+	z *= y;
+	z >>= 10;
+	z *= x;
+
+	return z;
+}
 
 /*
  * We can represent the historical contribution to runnable average as the
@@ -2553,9 +2571,9 @@ static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
-	u64 delta, scaled_delta, periods;
+	u64 delta, periods;
 	u32 contrib;
-	unsigned int delta_w, scaled_delta_w, decayed = 0;
+	unsigned int delta_w, decayed = 0;
 	unsigned long scale_freq, scale_cpu;
 
 	delta = now - sa->last_update_time;
@@ -2577,8 +2595,10 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
-	scale_freq = arch_scale_freq_capacity(NULL, cpu);
-	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+	if (weight || running)
+		scale_freq = arch_scale_freq_capacity(NULL, cpu);
+	if (running)
+		scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
 	/* delta_w is the amount already accumulated against our next period */
 	delta_w = sa->period_contrib;
@@ -2594,16 +2614,14 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		 * period and accrue it.
 		 */
 		delta_w = 1024 - delta_w;
-		scaled_delta_w = cap_scale(delta_w, scale_freq);
 		if (weight) {
-			sa->load_sum += weight * scaled_delta_w;
-			if (cfs_rq) {
-				cfs_rq->runnable_load_sum +=
-						weight * scaled_delta_w;
-			}
+			unsigned long t = fp_mult3(delta_w, weight, scale_freq);
+			sa->load_sum += t;
+			if (cfs_rq)
+				cfs_rq->runnable_load_sum += t;
 		}
 		if (running)
-			sa->util_sum += scaled_delta_w * scale_cpu;
+			sa->util_sum += delta_w * fp_mult2(scale_cpu, scale_freq);
 
 		delta -= delta_w;
 
@@ -2620,25 +2638,25 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 
 		/* Efficiently calculate \sum (1..n_period) 1024*y^i */
 		contrib = __compute_runnable_contrib(periods);
-		contrib = cap_scale(contrib, scale_freq);
 		if (weight) {
-			sa->load_sum += weight * contrib;
+			unsigned long t = fp_mult3(contrib, weight, scale_freq);
+			sa->load_sum += t;
 			if (cfs_rq)
-				cfs_rq->runnable_load_sum += weight * contrib;
+				cfs_rq->runnable_load_sum += t;
 		}
 		if (running)
-			sa->util_sum += contrib * scale_cpu;
+			sa->util_sum += contrib * fp_mult2(scale_cpu, scale_freq);
 	}
 
 	/* Remainder of delta accrued against u_0` */
-	scaled_delta = cap_scale(delta, scale_freq);
 	if (weight) {
-		sa->load_sum += weight * scaled_delta;
+		unsigned long t = fp_mult3(delta, weight, scale_freq);
+		sa->load_sum += t;
 		if (cfs_rq)
-			cfs_rq->runnable_load_sum += weight * scaled_delta;
+			cfs_rq->runnable_load_sum += t;
 	}
 	if (running)
-		sa->util_sum += scaled_delta * scale_cpu;
+		sa->util_sum += delta * fp_mult2(scale_cpu, scale_freq);
 
 	sa->period_contrib += delta;