Date: Tue, 22 Sep 2015 07:39:01 +0800
From: Yuyang Du
To: bsegall@google.com
Cc: Peter Zijlstra, Morten Rasmussen, Vincent Guittot, Dietmar Eggemann,
	Steve Muckle, mingo@redhat.com, daniel.lezcano@linaro.org,
	mturquette@baylibre.com, rjw@rjwysocki.net, Juri Lelli,
	sgurrappadi@nvidia.com, pang.xunlei@zte.com.cn,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig
Message-ID: <20150921233900.GE11102@intel.com>
References: <20150908122606.GH3644@twins.programming.kicks-ass.net>
 <20150908125205.GW18673@twins.programming.kicks-ass.net>
 <20150908143157.GA27098@e105550-lin.cambridge.arm.com>
 <20150908165331.GC27098@e105550-lin.cambridge.arm.com>
 <20150909094305.GO3644@twins.programming.kicks-ass.net>
 <20150909111309.GD27098@e105550-lin.cambridge.arm.com>
 <20150911172246.GI27098@e105550-lin.cambridge.arm.com>
 <20150917103825.GG3604@twins.programming.kicks-ass.net>
 <20150921011652.GD11102@intel.com>
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 21, 2015 at 10:30:04AM -0700, bsegall@google.com wrote:
> > But first, I think load_sum and load_avg can afford NICE_0_LOAD with
> > either high or low resolution.
> > So we have no reason to keep a low-resolution (10-bit) load_avg when
> > NICE_0_LOAD has high resolution (20 bits), because load_avg =
> > runnable% * load, as opposed to the current load_avg =
> > runnable% * scale_load_down(load).
> >
> > Shall we get rid of all scale_load_down() for runnable load average?
>
> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, i.e., barely within
> a 32-bit unsigned long, but LOAD_AVG_MAX * MAX_SHARES is already going
> to give errors on 32-bit (even with the old code, in fact). This should
> probably be fixed... somehow (dividing load_sum by 4 on 32-bit would
> work, though it would be ugly. Reducing MAX_SHARES by 2 bits on 32-bit
> might have made sense, but it would be a weird difference between 32-
> and 64-bit and could break userspace anyway, so it's presumably too late
> for that).
>
> 64-bit has ~30 bits free, so this would be fine as long as SLR is 0 on
> 32-bit.

load_avg has no LOAD_AVG_MAX term in it; it is runnable% * load, IOW,
load_avg <= load. So, on 32-bit, a cfs_rq's load_avg can hold
2^32 / prio_to_weight[0] / 1024 = 47 nice-(-20) tasks with 20-bit load
resolution. This is OK, because struct load_weight's load is also an
unsigned long: with that many tasks attached, cfs_rq->load.weight would
overflow in the first place.

However, on second thought, this is not quite right, because load_avg is
not necessarily <= load: load_avg also includes blocked load. Although
load_avg stays at the same level as load (converging to be <= load), we
may not want to risk overflow on 32-bit.

> > +/*
> > + * NICE_0's weight (visible to user) and its load (invisible to user) have
> > + * independent resolution, but they should be well calibrated. We use
> > + * scale_load() and scale_load_down(w) to convert between them; the
> > + * following must be true:
> > + *
> > + *   scale_load(prio_to_weight[20]) == NICE_0_LOAD
> > + */
> > #define NICE_0_LOAD		SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT		SCHED_LOAD_SHIFT
>
> I still think tying the scale_load() shift to the SCHED_CAPACITY/etc.
> shift is silly, and tying NICE_0_LOAD/SHIFT in is worse. Honestly, if I
> were going to change anything, it would be to define NICE_0_LOAD/SHIFT
> entirely separately from SCHED_LOAD_SCALE/SHIFT.

If NICE_0_LOAD is nice-0's load, and SCHED_LOAD_SHIFT says how nice-0's
load is obtained, I don't understand why you would want to separate them.