Date: Tue, 22 Sep 2015 07:39:01 +0800
From: Yuyang Du
To: bsegall@google.com
Cc: Peter Zijlstra, Morten Rasmussen, Vincent Guittot, Dietmar Eggemann,
	Steve Muckle, mingo@redhat.com, daniel.lezcano@linaro.org,
	mturquette@baylibre.com, rjw@rjwysocki.net, Juri Lelli,
	sgurrappadi@nvidia.com, pang.xunlei@zte.com.cn,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Message-ID: <20150921233900.GE11102@intel.com>
References: <20150908122606.GH3644@twins.programming.kicks-ass.net>
 <20150908125205.GW18673@twins.programming.kicks-ass.net>
 <20150908143157.GA27098@e105550-lin.cambridge.arm.com>
 <20150908165331.GC27098@e105550-lin.cambridge.arm.com>
 <20150909094305.GO3644@twins.programming.kicks-ass.net>
 <20150909111309.GD27098@e105550-lin.cambridge.arm.com>
 <20150911172246.GI27098@e105550-lin.cambridge.arm.com>
 <20150917103825.GG3604@twins.programming.kicks-ass.net>
 <20150921011652.GD11102@intel.com>
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 21, 2015 at 10:30:04AM -0700, bsegall@google.com wrote:
> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
> > either high or low resolution.
> > So we have no reason to keep a low-resolution (10-bit) load_avg when
> > NICE_0_LOAD has high resolution (20 bits), because load_avg =
> > runnable% * load, as opposed to the current load_avg =
> > runnable% * scale_load_down(load).
> >
> > Shall we get rid of all scale_load_down() for runnable load average?
>
> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, i.e., barely within
> a 32-bit unsigned long, but LOAD_AVG_MAX * MAX_SHARES is already going
> to give errors on 32-bit (even with the old code, in fact). This should
> probably be fixed... somehow (dividing load_sum by 4 on 32-bit would
> work, though it would be ugly. Reducing MAX_SHARES by 2 bits on 32-bit
> might have made sense, but it would be a weird difference between 32-
> and 64-bit and could break userspace anyway, so it's presumably too late
> for that).
>
> 64-bit has ~30 bits free, so this would be fine as long as SLR is 0 on
> 32-bit.

load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW,
load_avg <= load. So, on 32-bit, a cfs_rq's load_avg can hold
2^32 / prio_to_weight[0] / 1024 = 47 nice-(-20) tasks with 20-bit load
resolution. This is OK, because struct load_weight's load is also an
unsigned long: with that many tasks attached, cfs_rq->load.weight would
overflow in the first place.

However, on second thought, this is not quite right, because load_avg is
not necessarily <= load: load_avg also includes blocked load. Although
load_avg stays at the same level as load (converging to be <= load), we
may not want to risk overflow on 32-bit.

> > +/*
> > + * NICE_0's weight (visible to user) and its load (invisible to user) have
> > + * independent resolution, but they should be well calibrated. We use
> > + * scale_load() and scale_load_down(w) to convert between them; the
> > + * following must be true:
> > + *
> > + *   scale_load(prio_to_weight[20]) == NICE_0_LOAD
> > + */
> > #define NICE_0_LOAD		SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT		SCHED_LOAD_SHIFT
>
> I still think tying the scale_load() shift to the SCHED_CAPACITY/etc.
> shift is silly, and tying NICE_0_LOAD/SHIFT in is worse. Honestly, if I
> were going to change anything, it would be to define NICE_0_LOAD/SHIFT
> entirely separately from SCHED_LOAD_SCALE/SHIFT.

If NICE_0_LOAD is nice-0's load, and SCHED_LOAD_SHIFT says how nice-0's
load is obtained, I don't understand why you would want to separate them.