From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933644AbcBDFvO (ORCPT ); Thu, 4 Feb 2016 00:51:14 -0500 Received: from mail-pf0-f175.google.com ([209.85.192.175]:32883 "EHLO mail-pf0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750936AbcBDFvL (ORCPT ); Thu, 4 Feb 2016 00:51:11 -0500 Date: Thu, 4 Feb 2016 11:21:08 +0530 From: Viresh Kumar To: "Rafael J. Wysocki" Cc: Shilpa Bhat , Juri Lelli , Rafael Wysocki , Lists linaro-kernel , "linux-pm@vger.kernel.org" , Saravana Kannan , Peter Zijlstra , Michael Turquette , Steve Muckle , Vincent Guittot , Morten Rasmussen , dietmar.eggemann@arm.com, Linux Kernel Mailing List Subject: Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Message-ID: <20160204055108.GY3469@vireshk> References: <20160203155428.GY3947@e106622-lin> <20160203161059.GH3469@vireshk> <20160203172009.GC12132@e106622-lin> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04-02-16, 00:50, Rafael J. Wysocki wrote: > On Thu, Feb 4, 2016 at 12:31 AM, Shilpa Bhat wrote: > > Sorry for the delayed report. But I see the below backtrace on Power8 box. It > > has 4 chips with 128 cpus. Honestly, I wasn't expecting you to test this stuff, but I really appreciate you doing that. Thanks a lot .. > > [ 906.765768] Possible unsafe locking scenario: > > > > [ 906.765880] CPU0 CPU1 > > [ 906.765969] ---- ---- This race scenario is perhaps incomplete and difficult to understand without below lines: Governor's EXIT Update sampling rate from sysfs lock(s_active#91); > > [ 906.766058] lock(od_dbs_cdata.mutex); > > [ 906.766170] lock(&dbs_data->mutex); > > [ 906.766304] lock(od_dbs_cdata.mutex); > > [ 906.766461] lock(s_active#91); > > [ 906.766572] > > *** DEADLOCK *** > > This is exactly right. We've avoided one deadlock only to trip into > another one. As we discussed on IRC, we haven't introduced this deadlock with the current series. But this is what Juri has reported some days back, while he tested linus/master on TC2. > This happens because update_sampling_rate() acquires > od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by > cpufreq_governor_dbs(). > > Worse yet, a deadlock can still happen without (the new) > dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if > update_sampling_rate() runs in parallel with > cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins > the race. > > It looks like we need to drop the governor mutex before putting the > kobject in cpufreq_governor_exit(). That wouldn't be trivial to implement as we discussed. Okay, here is a proposal for the current series and the series's you have post Rafael: - Firstly, I would like to clarify that I don't have any issues with rebasing on top of your series, it should be easy enough. - One thing is for sure that nothing from these 3 series's is getting merged in 4.5, as we aren't fixing the real issue Shilpa/Juril have reported. - I think the first 4 patches here are just fine and don't need any updates. They actually do the right thing and makes code so much cleaner. - So, can we apply the first 4 patches (which you have already applied to bleeding-edge) now and do all work on top of that ? Again, I can rebase if you merge your patches first, no issues at all :) -- viresh