Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups

From: Viresh Kumar <viresh.kumar@linaro.org>
To: Juri Lelli <juri.lelli@arm.com>
Cc: Rafael Wysocki <rjw@rjwysocki.net>,
	linaro-kernel@lists.linaro.org, linux-pm@vger.kernel.org,
	skannan@codeaurora.org, peterz@infradead.org,
	mturquette@baylibre.com, steve.muckle@linaro.org,
	vincent.guittot@linaro.org, morten.rasmussen@arm.com,
	dietmar.eggemann@arm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
Date: Thu, 4 Feb 2016 11:54:39 +0530
Message-ID: <20160204062439.GZ3469@vireshk>
In-Reply-To: <20160203161059.GH3469@vireshk>

On 03-02-16, 21:40, Viresh Kumar wrote:
> On 03-02-16, 15:54, Juri Lelli wrote:
> > Ouch, I've just got this executing -f basic on Juno. :(
> > It happens with the hotplug_1_by_1 test.
> > 
> > 
> > [ 1086.531252] IRQ1 no longer affine to CPU1
> > [ 1086.531495] CPU1: shutdown
> > [ 1086.538199] psci: CPU1 killed.
> > [ 1086.583396]
> > [ 1086.584881] ======================================================
> > [ 1086.590999] [ INFO: possible circular locking dependency detected ]
> > [ 1086.597205] 4.5.0-rc2+ #37 Not tainted
> > [ 1086.600914] -------------------------------------------------------
> > [ 1086.607118] runme.sh/1052 is trying to acquire lock:
> > [ 1086.612031]  (sb_writers#7){.+.+.+}, at: [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.620090]
> > [ 1086.620090] but task is already holding lock:
> > [ 1086.625865]  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> > [ 1086.634081]
> > [ 1086.634081] which lock already depends on the new lock.
> > [ 1086.634081]
> > [ 1086.642180]
> > [ 1086.642180] the existing dependency chain (in reverse order) is:
> > [ 1086.649589]
> > -> #1 (&policy->rwsem){+++++.}:
> > [ 1086.653929]        [<ffffffc00011d9a4>] check_prev_add+0x670/0x754
> > [ 1086.660060]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.666876]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.673001]        [<ffffffc000120b58>] lock_release+0x244/0x570
> > [ 1086.678955]        [<ffffffc0007351d0>] __mutex_unlock_slowpath+0xa0/0x18c
> > [ 1086.685771]        [<ffffffc0007352dc>] mutex_unlock+0x20/0x2c
> > [ 1086.691553]        [<ffffffc0002ccd24>] kernfs_fop_write+0xb0/0x194
> > [ 1086.697768]        [<ffffffc00024478c>] __vfs_write+0x48/0x104
> > [ 1086.703550]        [<ffffffc0002457a4>] vfs_write+0x98/0x198
> > [ 1086.709161]        [<ffffffc0002465e4>] SyS_write+0x54/0xb0
> > [ 1086.714684]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1086.720555]
> > -> #0 (sb_writers#7){.+.+.+}:
> > [ 1086.724730]        [<ffffffc00011c574>] print_circular_bug+0x80/0x2e4
> > [ 1086.731116]        [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> > [ 1086.737243]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.744059]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.750184]        [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> > [ 1086.756052]        [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> > [ 1086.762180]        [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.768306]        [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> > [ 1086.774263]        [<ffffffc0002555f8>] do_last+0x660/0xcb8
> > [ 1086.779788]        [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> > [ 1086.785570]        [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> > [ 1086.791353]        [<ffffffc000244058>] do_sys_open+0x150/0x214
> > [ 1086.797222]        [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> > [ 1086.802831]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1086.808700]
> > [ 1086.808700] other info that might help us debug this:
> > [ 1086.808700]
> > [ 1086.816627]  Possible unsafe locking scenario:
> > [ 1086.816627]
> > [ 1086.822488]        CPU0                    CPU1
> > [ 1086.826971]        ----                    ----
> > [ 1086.831453]   lock(&policy->rwsem);
> > [ 1086.834918]                                lock(sb_writers#7);
> > [ 1086.840713]                                lock(&policy->rwsem);
> > [ 1086.846671]   lock(sb_writers#7);
> > [ 1086.849972]
> > [ 1086.849972]  *** DEADLOCK ***
> > [ 1086.849972]
> > [ 1086.855836] 1 lock held by runme.sh/1052:
> > [ 1086.859802]  #0:  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> > [ 1086.868453]
> > [ 1086.868453] stack backtrace:
> > [ 1086.872769] CPU: 5 PID: 1052 Comm: runme.sh Not tainted 4.5.0-rc2+ #37
> > [ 1086.879229] Hardware name: ARM Juno development board (r2) (DT)
> > [ 1086.885089] Call trace:
> > [ 1086.887511] [<ffffffc00008a788>] dump_backtrace+0x0/0x1f4
> > [ 1086.892858] [<ffffffc00008a99c>] show_stack+0x20/0x28
> > [ 1086.897861] [<ffffffc00041a380>] dump_stack+0x84/0xc0
> > [ 1086.902863] [<ffffffc00011c6c8>] print_circular_bug+0x1d4/0x2e4
> > [ 1086.908725] [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> > [ 1086.914244] [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.920448] [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.925965] [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> > [ 1086.931224] [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> > [ 1086.936742] [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.942260] [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> > [ 1086.947605] [<ffffffc0002555f8>] do_last+0x660/0xcb8
> > [ 1086.952520] [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> > [ 1086.957693] [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> > [ 1086.962865] [<ffffffc000244058>] do_sys_open+0x150/0x214
> > [ 1086.968123] [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> > [ 1086.973124] [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1087.019315] Detected PIPT I-cache on CPU1
> > [ 1087.019373] CPU1: Booted secondary processor [410fd080]
> 
> Urg..

Urg square :(

> I failed to understand it for now though. Please test only the first 4
> patches and leave the bottom three. AFAICT, this is caused by the 6th
> patch.

From the code alone I had failed to understand this until a little
while ago, when something caught my eye: the early return in
cpufreq_offline() leaves policy->rwsem held. The 6th patch needs this
fixup:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7bc8a5ed97e5..ac3348ecde7b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1351,7 +1351,7 @@ static void cpufreq_offline(unsigned int cpu)
                                pr_err("%s: Failed to start governor\n", __func__);
                }
 
-               return;
+               goto unlock;
        }
 
        if (cpufreq_driver->stop_cpu)
@@ -1373,6 +1373,8 @@ static void cpufreq_offline(unsigned int cpu)
                cpufreq_driver->exit(policy);
                policy->freq_table = NULL;
        }
+
+unlock:
        up_write(&policy->rwsem);
 }

I ran the basic tests using './runme' and they no longer report the
same lockdep warning. And yes, your lockdep warning occurred on my
exynos board as well :)

I have re-pushed my patches to the same branch. All 7 look fine to me
now :)

-- 
viresh
