From: Daniel Kurtz <djkurtz@chromium.org> To: Ulf Hansson <ulf.hansson@linaro.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>, Kevin Hilman <khilman@kernel.org>, linux-pm@vger.kernel.org, Len Brown <len.brown@intel.com>, Russell King <linux@arm.linux.org.uk>, Geert Uytterhoeven <geert+renesas@glider.be>, Krzysztof Kozlowski <k.kozlowski.k@gmail.com>, Dmitry Torokhov <dmitry.torokhov@gmail.com>, Axel Haslam <ahaslam+renesas@baylibre.com>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>, Pavel Machek <pavel@ucw.cz>, Lina Iyer <lina.iyer@linaro.org>, Nicolas Boichat <drinkcat@chromium.org> Subject: Re: [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence Date: Tue, 22 Dec 2015 20:49:15 +0800 [thread overview] Message-ID: <CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com> (raw) In-Reply-To: <1434633473-12908-1-git-send-email-ulf.hansson@linaro.org> Hi All, On Thu, Jun 18, 2015 at 9:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > Genpd's ->runtime_suspend() (assigned to pm_genpd_runtime_suspend()) > doesn't immediately walk the hierarchy of ->runtime_suspend() callbacks. > Instead, pm_genpd_runtime_suspend() calls pm_genpd_poweroff() which > postpones that until *all* the devices in the genpd are runtime suspended. [snip...] > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> > --- > > Changes in v4: > - According to comments from Lina, restore to existing behavior to > invoke the ->start|stop() callbacks for IRQ safe devices. > - Improve fine-grained PM support for IRQ safe devices, by allowing to > walk the hierarchy of the ->runtime_suspend|resume() callbacks. > - Updated the changelog to describe the forth positive side effect case > conserning IRQ safe devices. > - Rebased on top Geert's recently queued genpd patch. > > Changes in v3: > Moved from RFC to PATCH. > According to comment from Lina, changed __pm_genpd_poweron() to keep > current behavior which means to *not* power off potentially unused > masters in the error path. Instead, let's deal with that from a separate > patch. > Updated a comment in pm_genpd_poweroff(). > > Changes in v2: > Updated the changelog and the commit message header. > Header for v1 was "PM / Domains: Minimize latencies by not delaying > save/restore". > > --- > drivers/base/power/domain.c | 363 ++++++++------------------------------------ > include/linux/pm_domain.h | 7 - > 2 files changed, 62 insertions(+), 308 deletions(-) > I was testing a kernel with this patch on one of my systems today I saw a splat like this on boot: [ 1.893134] [ 1.893137] ============================================= [ 1.893139] [ INFO: possible recursive locking detected ] [ 1.893143] 3.18.0 #531 Not tainted [ 1.893145] --------------------------------------------- [ 1.893148] kworker/u8:4/113 is trying to acquire lock: [ 1.893167] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893169] [ 1.893169] but task is already holding lock: [ 1.893179] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893182] [ 1.893182] other info that might help us debug this: [ 1.893184] Possible unsafe locking scenario: [ 1.893184] [ 1.893185] CPU0 [ 1.893187] ---- [ 1.893191] lock(&genpd->lock); [ 1.893195] lock(&genpd->lock); [ 1.893196] [ 1.893196] *** DEADLOCK *** [ 1.893196] [ 1.893198] May be due to missing lock nesting notation [ 1.893198] [ 1.893201] 4 locks held by kworker/u8:4/113: [ 1.893217] #0: ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c [ 1.893229] #1: (deferred_probe_work){+.+.+.}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c [ 1.893241] #2: (&dev->mutex){......}, at: [<ffffffc000560920>] __device_attach+0x40/0x12c [ 1.893251] #3: (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893253] [ 1.893253] stack backtrace: [ 1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531 [ 1.893261] Hardware name: Mediatek Oak rev4 board (DT) [ 1.893269] Workqueue: deferwq deferred_probe_work_func [ 1.893271] Call trace: [ 1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140 [ 1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28 [ 1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4 [ 1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8 [ 1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164 [ 1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4 [ 1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70 [ 1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc [ 1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70 [ 1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c [ 1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c [ 1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4 [ 1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8 [ 1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90 [ 1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4 [ 1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c [ 1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30 [ 1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c [ 1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0 [ 1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c [ 1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470 [ 1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc I have not been able to reproduce, but by looking at how genpd_poweron() was modified by this patch, it looks like it might have been introduced by this hunk: > @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd) > */ > list_for_each_entry(link, &genpd->slave_links, slave_node) { > genpd_sd_counter_inc(link->master); > - genpd->status = GPD_STATE_WAIT_MASTER; > - > - mutex_unlock(&genpd->lock); > > ret = pm_genpd_poweron(link->master); > - > - mutex_lock(&genpd->lock); > - > - /* > - * The "wait for parent" status is guaranteed not to change > - * while the master is powering on. > - */ > - genpd->status = GPD_STATE_POWER_OFF; > - wake_up_all(&genpd->status_wait_queue); AFAICT, the mutex_*() calls here that were removed unlock (&genpd->lock) and allowed pm_genpd_poweron() to be recursively called without the above splat in the case where a pm_genpd_poweron() on the master would take a lock of the same lock class as the current (slave) genpd. Other places in domain.c use something like this to lock a second genpd while already holding one genpd's lock: mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING); However, it seems like in this case we might actually need to walk up several layers of a hierarchy. So, AFAICT the best thing to do would be to restore the mutex_unlock()/_lock() around the call to pm_genpd_poweron(). It isn't clear to me whether this was removed by this patch on purpose or as an accident. Thanks for your help! -Dan
WARNING: multiple messages have this Message-ID (diff)
From: djkurtz@chromium.org (Daniel Kurtz) To: linux-arm-kernel@lists.infradead.org Subject: [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence Date: Tue, 22 Dec 2015 20:49:15 +0800 [thread overview] Message-ID: <CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com> (raw) In-Reply-To: <1434633473-12908-1-git-send-email-ulf.hansson@linaro.org> Hi All, On Thu, Jun 18, 2015 at 9:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > Genpd's ->runtime_suspend() (assigned to pm_genpd_runtime_suspend()) > doesn't immediately walk the hierarchy of ->runtime_suspend() callbacks. > Instead, pm_genpd_runtime_suspend() calls pm_genpd_poweroff() which > postpones that until *all* the devices in the genpd are runtime suspended. [snip...] > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> > --- > > Changes in v4: > - According to comments from Lina, restore to existing behavior to > invoke the ->start|stop() callbacks for IRQ safe devices. > - Improve fine-grained PM support for IRQ safe devices, by allowing to > walk the hierarchy of the ->runtime_suspend|resume() callbacks. > - Updated the changelog to describe the forth positive side effect case > conserning IRQ safe devices. > - Rebased on top Geert's recently queued genpd patch. > > Changes in v3: > Moved from RFC to PATCH. > According to comment from Lina, changed __pm_genpd_poweron() to keep > current behavior which means to *not* power off potentially unused > masters in the error path. Instead, let's deal with that from a separate > patch. > Updated a comment in pm_genpd_poweroff(). > > Changes in v2: > Updated the changelog and the commit message header. > Header for v1 was "PM / Domains: Minimize latencies by not delaying > save/restore". > > --- > drivers/base/power/domain.c | 363 ++++++++------------------------------------ > include/linux/pm_domain.h | 7 - > 2 files changed, 62 insertions(+), 308 deletions(-) > I was testing a kernel with this patch on one of my systems today I saw a splat like this on boot: [ 1.893134] [ 1.893137] ============================================= [ 1.893139] [ INFO: possible recursive locking detected ] [ 1.893143] 3.18.0 #531 Not tainted [ 1.893145] --------------------------------------------- [ 1.893148] kworker/u8:4/113 is trying to acquire lock: [ 1.893167] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893169] [ 1.893169] but task is already holding lock: [ 1.893179] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893182] [ 1.893182] other info that might help us debug this: [ 1.893184] Possible unsafe locking scenario: [ 1.893184] [ 1.893185] CPU0 [ 1.893187] ---- [ 1.893191] lock(&genpd->lock); [ 1.893195] lock(&genpd->lock); [ 1.893196] [ 1.893196] *** DEADLOCK *** [ 1.893196] [ 1.893198] May be due to missing lock nesting notation [ 1.893198] [ 1.893201] 4 locks held by kworker/u8:4/113: [ 1.893217] #0: ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c [ 1.893229] #1: (deferred_probe_work){+.+.+.}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c [ 1.893241] #2: (&dev->mutex){......}, at: [<ffffffc000560920>] __device_attach+0x40/0x12c [ 1.893251] #3: (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70 [ 1.893253] [ 1.893253] stack backtrace: [ 1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531 [ 1.893261] Hardware name: Mediatek Oak rev4 board (DT) [ 1.893269] Workqueue: deferwq deferred_probe_work_func [ 1.893271] Call trace: [ 1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140 [ 1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28 [ 1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4 [ 1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8 [ 1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164 [ 1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4 [ 1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70 [ 1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc [ 1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70 [ 1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c [ 1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c [ 1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4 [ 1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8 [ 1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90 [ 1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4 [ 1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c [ 1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30 [ 1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c [ 1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0 [ 1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c [ 1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470 [ 1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc I have not been able to reproduce, but by looking at how genpd_poweron() was modified by this patch, it looks like it might have been introduced by this hunk: > @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd) > */ > list_for_each_entry(link, &genpd->slave_links, slave_node) { > genpd_sd_counter_inc(link->master); > - genpd->status = GPD_STATE_WAIT_MASTER; > - > - mutex_unlock(&genpd->lock); > > ret = pm_genpd_poweron(link->master); > - > - mutex_lock(&genpd->lock); > - > - /* > - * The "wait for parent" status is guaranteed not to change > - * while the master is powering on. > - */ > - genpd->status = GPD_STATE_POWER_OFF; > - wake_up_all(&genpd->status_wait_queue); AFAICT, the mutex_*() calls here that were removed unlock (&genpd->lock) and allowed pm_genpd_poweron() to be recursively called without the above splat in the case where a pm_genpd_poweron() on the master would take a lock of the same lock class as the current (slave) genpd. Other places in domain.c use something like this to lock a second genpd while already holding one genpd's lock: mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING); However, it seems like in this case we might actually need to walk up several layers of a hierarchy. So, AFAICT the best thing to do would be to restore the mutex_unlock()/_lock() around the call to pm_genpd_poweron(). It isn't clear to me whether this was removed by this patch on purpose or as an accident. Thanks for your help! -Dan
next prev parent reply other threads:[~2015-12-22 12:49 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2015-06-18 13:17 [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence Ulf Hansson 2015-06-18 13:17 ` Ulf Hansson 2015-07-08 19:56 ` Kevin Hilman 2015-07-08 19:56 ` Kevin Hilman 2015-07-09 7:05 ` Geert Uytterhoeven 2015-07-09 7:05 ` Geert Uytterhoeven 2015-07-09 14:05 ` Lina Iyer 2015-07-09 14:05 ` Lina Iyer 2015-07-10 1:26 ` Rafael J. Wysocki 2015-07-10 1:26 ` Rafael J. Wysocki 2015-12-22 12:49 ` Daniel Kurtz [this message] 2015-12-22 12:49 ` Daniel Kurtz 2015-12-23 11:40 ` Ulf Hansson 2015-12-23 11:40 ` Ulf Hansson
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com \ --to=djkurtz@chromium.org \ --cc=ahaslam+renesas@baylibre.com \ --cc=dmitry.torokhov@gmail.com \ --cc=drinkcat@chromium.org \ --cc=geert+renesas@glider.be \ --cc=k.kozlowski.k@gmail.com \ --cc=khilman@kernel.org \ --cc=len.brown@intel.com \ --cc=lina.iyer@linaro.org \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-pm@vger.kernel.org \ --cc=linux@arm.linux.org.uk \ --cc=pavel@ucw.cz \ --cc=rjw@rjwysocki.net \ --cc=ulf.hansson@linaro.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.