From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1753237AbbGBLiI (ORCPT ); Thu, 2 Jul 2015 07:38:08 -0400
Received: from foss.arm.com ([217.140.101.70]:49185 "EHLO foss.arm.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1753415AbbGBLiF (ORCPT ); Thu, 2 Jul 2015 07:38:05 -0400
Date: Thu, 2 Jul 2015 12:40:32 +0100
From: Morten Rasmussen
To: Yuyang Du
Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, "mingo@redhat.com",
    "linux-kernel@vger.kernel.org", Paul Turner, Ben Segall
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Message-ID: <20150702114032.GA7598@e105550-lin.cambridge.arm.com>
References: <20150630143057.GA31689@axis.com>
    <1435728995.9397.7.camel@gmail.com>
    <20150701145551.GA15690@axis.com>
    <20150701204404.GH25159@twins.programming.kicks-ass.net>
    <20150701232511.GA5197@intel.com>
    <1435824347.5351.18.camel@gmail.com>
    <20150702010539.GB5197@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150702010539.GB5197@intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 02, 2015 at 09:05:39AM +0800, Yuyang Du wrote:
> Hi Mike,
>
> On Thu, Jul 02, 2015 at 10:05:47AM +0200, Mike Galbraith wrote:
> > On Thu, 2015-07-02 at 07:25 +0800, Yuyang Du wrote:
> >
> > > That being said, it is also obvious to prevent the livelock from happening:
> > > idle pulling until the source rq's nr_running is 1, because otherwise we
> > > just avoid idleness by making another idleness.
> >
> > Yeah, but that's just the symptom, not the disease. Better for the idle
> > balance symptom may actually be to only pull one when idle balancing.
> > After all, the immediate goal is to find something better to do than
> > idle, not to achieve continual perfect (is the enemy of good) balance.
> >
> Symptom? :)
>
> You mean "pull one and stop, can't be greedy"? Right, but still need to
> assure you don't make another idle CPU (meaning until nr_running == 1), which
> is the cure to disease.
>
> I am ok with at most "pull one", but probably we stick to the load_balance()
> by pulling a fair amount, assuming load_balance() magically computes the
> right imbalance, otherwise you may have to do multiple "pull one"s.

Talking about the disease and looking at the debug data that Rabin has
provided, I think the problem is due to the way blocked load is handled
(or not handled) in calculate_imbalance().

We have three entities in the root cfs_rq on cpu1:

1. Task entity pid 7, load_avg_contrib = 5.
2. Task entity pid 30, load_avg_contrib = 10.
3. Group entity, load_avg_contrib = 118, but contains task entity pid
   413 further down the hierarchy with task_h_load() = 0. The 118 comes
   from the blocked load contribution in the system.slice task group.

calculate_imbalance() figures out the average loads are:

cpu0: load/capacity = 0*1024/1024 = 0
cpu1: load/capacity = (5 + 10 + 118)*1024/1024 = 133

domain: load/capacity = (0 + 133)*1024/(2*1024) = 66

env->imbalance = 66

Rabin reported env->imbalance = 60 after pulling the rcu task with
load_avg_contrib = 5. It doesn't match my numbers exactly, but it is
pretty close ;-)

detach_tasks() will attempt to pull 66 based on the tasks' task_h_load(),
but the task_h_load() sum is only 5 + 10 + 0 and hence detach_tasks()
will empty the src_rq.

IOW, since task groups include blocked load in the load_avg_contrib (see
__update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()),
the imbalance includes blocked load and hence env->imbalance >=
sum(task_h_load(p)) for all tasks p on the rq. This leads to
detach_tasks() emptying the rq completely in the reported scenario where
blocked load > runnable load.

Whether emptying the src_rq is the right thing to do depends on your
point of view. Does balanced load (runnable+blocked) take priority over
keeping cpus busy or not? For idle_balance() it seems intuitively correct
to not empty the rq, and hence you could consider env->imbalance to be
too big.

I think we will see more of this kind of problem if we include
weighted_cpuload() as well. Parts of the imbalance calculation code are
quite old and could use some attention first.

A short term fix could be what Yuyang proposes: stop pulling tasks when
there is only one left in detach_tasks(). It won't affect active load
balance, where we may want to migrate the last task, as active load
balance doesn't use detach_tasks().

Morten
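
For illustration only, the arithmetic in this mail can be reproduced with
a small standalone C sketch. This is not kernel code: struct toy_task,
toy_avg_load(), toy_domain_avg() and toy_detach() are hypothetical names
made up here, and toy_detach() is a much simplified stand-in for
detach_tasks() that only models "keep pulling while imbalance remains".
The keep_one flag models the proposed short term fix, assuming it amounts
to stopping once a single task is left on the source rq.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical model of one runnable task on the source rq. */
struct toy_task {
	int pid;
	unsigned long h_load;	/* stand-in for task_h_load(p) */
};

/* Load scaled by capacity, as in the per-cpu figures in the mail. */
unsigned long toy_avg_load(unsigned long load, unsigned long capacity)
{
	return load * SCHED_CAPACITY_SCALE / capacity;
}

/* Domain-wide average over two cpus (integer division, rounds down). */
unsigned long toy_domain_avg(unsigned long load0, unsigned long load1,
			     unsigned long cap0, unsigned long cap1)
{
	return (load0 + load1) * SCHED_CAPACITY_SCALE / (cap0 + cap1);
}

/*
 * Simplified stand-in for detach_tasks(): pull tasks in order while
 * some imbalance remains. Only the runnable tasks' h_load is subtracted,
 * so blocked load inflating the imbalance is never "paid off".
 * If keep_one is non-zero, stop once a single task is left.
 * Returns the number of tasks left on the source rq.
 */
int toy_detach(const struct toy_task *tasks, int nr, long imbalance,
	       int keep_one)
{
	int left = nr;
	int i;

	for (i = 0; i < nr && imbalance > 0; i++) {
		if (keep_one && left <= 1)
			break;
		imbalance -= (long)tasks[i].h_load;
		left--;
	}
	return left;
}
```

With the three tasks from the debug data (h_load 5, 10 and 0) and the
imbalance taken from toy_domain_avg(0, 133, 1024, 1024), toy_detach()
returns 0 tasks left when keep_one is 0 and 1 task left when keep_one
is 1, since the runnable sum of 15 never covers an imbalance that is
dominated by blocked load.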