From: Ulf Hansson <ulf.hansson@linaro.org>
To: Daniel Kurtz <djkurtz@chromium.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Kevin Hilman <khilman@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Len Brown <len.brown@intel.com>,
	Russell King <linux@arm.linux.org.uk>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Krzysztof Kozlowski <k.kozlowski.k@gmail.com>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Axel Haslam <ahaslam+renesas@baylibre.com>,
	"linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
	Pavel Machek <pavel@ucw.cz>,
	Lina Iyer <lina.iyer@linaro.org>,
	Nicolas Boichat <drinkcat@chromium.org>
Subject: Re: [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence
Date: Wed, 23 Dec 2015 12:40:49 +0100
Message-ID: <CAPDyKFqs_ZPMCYxUQDWbxRJoKsu907Hz2vu0WG8T1ykyM29aSA@mail.gmail.com>
In-Reply-To: <CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com>

[...]

> I was testing a kernel with this patch on one of my systems today I
> saw a splat like this on boot:

Interesting, and thanks for reporting. Could you be a bit more detailed
about which kernel and which system you saw this on? In particular, I
would like to look at the code that initializes the genpds and adds the
subdomains.
>
> [ 1.893134]
> [ 1.893137] =============================================
> [ 1.893139] [ INFO: possible recursive locking detected ]
> [ 1.893143] 3.18.0 #531 Not tainted
> [ 1.893145] ---------------------------------------------
> [ 1.893148] kworker/u8:4/113 is trying to acquire lock:
> [ 1.893167] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893169]
> [ 1.893169] but task is already holding lock:
> [ 1.893179] (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893182]
> [ 1.893182] other info that might help us debug this:
> [ 1.893184] Possible unsafe locking scenario:
> [ 1.893184]
> [ 1.893185] CPU0
> [ 1.893187] ----
> [ 1.893191] lock(&genpd->lock);
> [ 1.893195] lock(&genpd->lock);
> [ 1.893196]
> [ 1.893196] *** DEADLOCK ***
> [ 1.893196]
> [ 1.893198] May be due to missing lock nesting notation
> [ 1.893198]
> [ 1.893201] 4 locks held by kworker/u8:4/113:
> [ 1.893217] #0: ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [ 1.893229] #1: (deferred_probe_work){+.+.+.}, at: [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [ 1.893241] #2: (&dev->mutex){......}, at: [<ffffffc000560920>] __device_attach+0x40/0x12c
> [ 1.893251] #3: (&genpd->lock){+.+...}, at: [<ffffffc000573818>] genpd_poweron+0x30/0x70
> [ 1.893253]
> [ 1.893253] stack backtrace:
> [ 1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531
> [ 1.893261] Hardware name: Mediatek Oak rev4 board (DT)
> [ 1.893269] Workqueue: deferwq deferred_probe_work_func
> [ 1.893271] Call trace:
> [ 1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140
> [ 1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28
> [ 1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4
> [ 1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8
> [ 1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164
> [ 1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4
> [ 1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70
> [ 1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc
> [ 1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70
> [ 1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c
> [ 1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c
> [ 1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4
> [ 1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8
> [ 1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90
> [ 1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4
> [ 1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c
> [ 1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30
> [ 1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c
> [ 1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0
> [ 1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c
> [ 1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470
> [ 1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc
>
> I have not been able to reproduce, but by looking at how
> genpd_poweron() was modified by this patch, it looks like it might
> have been introduced by this hunk:
>
>> @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd)
>>          */
>>         list_for_each_entry(link, &genpd->slave_links, slave_node) {
>>                 genpd_sd_counter_inc(link->master);
>> -               genpd->status = GPD_STATE_WAIT_MASTER;
>> -
>> -               mutex_unlock(&genpd->lock);
>>
>>                 ret = pm_genpd_poweron(link->master);
>> -
>> -               mutex_lock(&genpd->lock);
>> -
>> -               /*
>> -                * The "wait for parent" status is guaranteed not to change
>> -                * while the master is powering on.
>> -                */
>> -               genpd->status = GPD_STATE_POWER_OFF;
>> -               wake_up_all(&genpd->status_wait_queue);
>
> AFAICT, the mutex_*() calls here that were removed unlock
> (&genpd->lock) and allowed pm_genpd_poweron() to be recursively called
> without the above splat in the case where a pm_genpd_poweron() on the
> master would take a lock of the same lock class as the current (slave)
> genpd.
>
> Other places in domain.c use something like this to lock a second
> genpd while already holding one genpd's lock:
>   mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
>
> However, it seems like in this case we might actually need to walk up
> several layers of a hierarchy.
>
> So, AFAICT the best thing to do would be to restore the
> mutex_unlock()/_lock() around the call to pm_genpd_poweron().
>
> It isn't clear to me whether this was removed by this patch on purpose
> or as an accident.

The removal of the unlock/lock pair was done on purpose, simply because
there are no longer any intermediate power states to consider.

Regarding re-entrancy on a genpd struct that is already locked, I wonder
whether the configuration of the subdomains/masters is done properly. To
reach the deadlock described above, one subdomain's master would have to
be a subdomain of its own child. Is that really correct?

Perhaps I have already entered Christmas "mode", so I might be wrong in
my analysis and simplifying things too much. :-) I will give this some
more thought after the holidays.

Kind regards
Uffe

>
> Thanks for your help!
> -Dan