All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Ulf Hansson <ulf.hansson@linaro.org>
To: Daniel Kurtz <djkurtz@chromium.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Kevin Hilman <khilman@kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Len Brown <len.brown@intel.com>,
	Russell King <linux@arm.linux.org.uk>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Krzysztof Kozlowski <k.kozlowski.k@gmail.com>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Axel Haslam <ahaslam+renesas@baylibre.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Pavel Machek <pavel@ucw.cz>, Lina Iyer <lina.iyer@linaro.org>,
	Nicolas Boichat <drinkcat@chromium.org>
Subject: Re: [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence
Date: Wed, 23 Dec 2015 12:40:49 +0100	[thread overview]
Message-ID: <CAPDyKFqs_ZPMCYxUQDWbxRJoKsu907Hz2vu0WG8T1ykyM29aSA@mail.gmail.com> (raw)
In-Reply-To: <CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com>

[...]

>
> I was testing a kernel with this patch on one of my systems today I
> saw a splat like this on boot:

Interesting and thanks for reporting.

Although, could you be a bit more detailed about the what kernel and
what system.

Especially I am interested to look at the code which is initializing
the genpds and adding subdomains.

>
> [    1.893134]
> [    1.893137] =============================================
> [    1.893139] [ INFO: possible recursive locking detected ]
> [    1.893143] 3.18.0 #531 Not tainted
> [    1.893145] ---------------------------------------------
> [    1.893148] kworker/u8:4/113 is trying to acquire lock:
> [    1.893167]  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893169]
> [    1.893169] but task is already holding lock:
> [    1.893179]  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893182]
> [    1.893182] other info that might help us debug this:
> [    1.893184]  Possible unsafe locking scenario:
> [    1.893184]
> [    1.893185]        CPU0
> [    1.893187]        ----
> [    1.893191]   lock(&genpd->lock);
> [    1.893195]   lock(&genpd->lock);
> [    1.893196]
> [    1.893196]  *** DEADLOCK ***
> [    1.893196]
> [    1.893198]  May be due to missing lock nesting notation
> [    1.893198]
> [    1.893201] 4 locks held by kworker/u8:4/113:
> [    1.893217]  #0:  ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>]
> process_one_work+0x1f8/0x50c
> [    1.893229]  #1:  (deferred_probe_work){+.+.+.}, at:
> [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [    1.893241]  #2:  (&dev->mutex){......}, at: [<ffffffc000560920>]
> __device_attach+0x40/0x12c
> [    1.893251]  #3:  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893253]
> [    1.893253] stack backtrace:
> [    1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531
> [    1.893261] Hardware name: Mediatek Oak rev4 board (DT)
> [    1.893269] Workqueue: deferwq deferred_probe_work_func
> [    1.893271] Call trace:
> [    1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140
> [    1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28
> [    1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4
> [    1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8
> [    1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164
> [    1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4
> [    1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70
> [    1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc
> [    1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70
> [    1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c
> [    1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c
> [    1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4
> [    1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8
> [    1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90
> [    1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4
> [    1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c
> [    1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30
> [    1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c
> [    1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0
> [    1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c
> [    1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470
> [    1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc
>
> I have not been able to reproduce, but by looking at how
> genpd_poweron() was modified by this patch, it looks like it might
> have been introduced by this hunk:
>
>> @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd)
>>          */
>>         list_for_each_entry(link, &genpd->slave_links, slave_node) {
>>                 genpd_sd_counter_inc(link->master);
>> -               genpd->status = GPD_STATE_WAIT_MASTER;
>> -
>> -               mutex_unlock(&genpd->lock);
>>
>>                 ret = pm_genpd_poweron(link->master);
>> -
>> -               mutex_lock(&genpd->lock);
>> -
>> -               /*
>> -                * The "wait for parent" status is guaranteed not to change
>> -                * while the master is powering on.
>> -                */
>> -               genpd->status = GPD_STATE_POWER_OFF;
>> -               wake_up_all(&genpd->status_wait_queue);
>
> AFAICT, the mutex_*() calls here that were removed unlock
> (&genpd->lock) and allowed pm_genpd_poweron() to be recursively called
> without the above splat in the case where a pm_genpd_poweron() on the
> master would take a lock of the same lock class as the current (slave)
> genpd.
>
> Other places in domain.c use something like this to lock a second
> genpd while already holding one genpd's lock:
>    mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
>
> However, it seems like in this case we might actually need to walk up
> several layers of a hierarchy.
>
> So, AFAICT the best thing to do would be to restore the
> mutex_unlock()/_lock() around the call to pm_genpd_poweron().
>
> It isn't clear to me whether this was removed by this patch on purpose
> or as an accident.

The removal of the unlock|lock were done on purpose. The reason was
simply because there are no intermediate power states to consider.

Regarding re-entrancy with a gendp struct which is already locked, I
am wondering whether the configuration of the subdomain/masters are
properly done. To reach the deadlock described above, one subdomain's
master must be a subdomain of its own child. Is that really correct?

Perhaps I have already entered Christmas "mode", so I might be wrong
in my analyse and simplifying things too much. :-)

I will give this some more thinking after the holidays.

Kind regards
Uffe

>
> Thanks for your help!
> -Dan

WARNING: multiple messages have this Message-ID (diff)
From: ulf.hansson@linaro.org (Ulf Hansson)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence
Date: Wed, 23 Dec 2015 12:40:49 +0100	[thread overview]
Message-ID: <CAPDyKFqs_ZPMCYxUQDWbxRJoKsu907Hz2vu0WG8T1ykyM29aSA@mail.gmail.com> (raw)
In-Reply-To: <CAGS+omDa6miWTSCL2ETx2vx-XRt1Aq2EEu+WwTgmtdkJBKrp9A@mail.gmail.com>

[...]

>
> I was testing a kernel with this patch on one of my systems today I
> saw a splat like this on boot:

Interesting and thanks for reporting.

Although, could you be a bit more detailed about the what kernel and
what system.

Especially I am interested to look at the code which is initializing
the genpds and adding subdomains.

>
> [    1.893134]
> [    1.893137] =============================================
> [    1.893139] [ INFO: possible recursive locking detected ]
> [    1.893143] 3.18.0 #531 Not tainted
> [    1.893145] ---------------------------------------------
> [    1.893148] kworker/u8:4/113 is trying to acquire lock:
> [    1.893167]  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893169]
> [    1.893169] but task is already holding lock:
> [    1.893179]  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893182]
> [    1.893182] other info that might help us debug this:
> [    1.893184]  Possible unsafe locking scenario:
> [    1.893184]
> [    1.893185]        CPU0
> [    1.893187]        ----
> [    1.893191]   lock(&genpd->lock);
> [    1.893195]   lock(&genpd->lock);
> [    1.893196]
> [    1.893196]  *** DEADLOCK ***
> [    1.893196]
> [    1.893198]  May be due to missing lock nesting notation
> [    1.893198]
> [    1.893201] 4 locks held by kworker/u8:4/113:
> [    1.893217]  #0:  ("%s""deferwq"){++++.+}, at: [<ffffffc00023b4e0>]
> process_one_work+0x1f8/0x50c
> [    1.893229]  #1:  (deferred_probe_work){+.+.+.}, at:
> [<ffffffc00023b4e0>] process_one_work+0x1f8/0x50c
> [    1.893241]  #2:  (&dev->mutex){......}, at: [<ffffffc000560920>]
> __device_attach+0x40/0x12c
> [    1.893251]  #3:  (&genpd->lock){+.+...}, at: [<ffffffc000573818>]
> genpd_poweron+0x30/0x70
> [    1.893253]
> [    1.893253] stack backtrace:
> [    1.893259] CPU: 2 PID: 113 Comm: kworker/u8:4 Not tainted 3.18.0 #531
> [    1.893261] Hardware name: Mediatek Oak rev4 board (DT)
> [    1.893269] Workqueue: deferwq deferred_probe_work_func
> [    1.893271] Call trace:
> [    1.893278] [<ffffffc000208cf0>] dump_backtrace+0x0/0x140
> [    1.893283] [<ffffffc000208e4c>] show_stack+0x1c/0x28
> [    1.893289] [<ffffffc00084ab48>] dump_stack+0x80/0xb4
> [    1.893295] [<ffffffc000269dcc>] __lock_acquire+0x68c/0x19a8
> [    1.893299] [<ffffffc00026b954>] lock_acquire+0x128/0x164
> [    1.893304] [<ffffffc00084e090>] mutex_lock_nested+0x90/0x3b4
> [    1.893308] [<ffffffc000573814>] genpd_poweron+0x2c/0x70
> [    1.893312] [<ffffffc0005738ac>] __genpd_poweron.part.14+0x54/0xcc
> [    1.893316] [<ffffffc000573834>] genpd_poweron+0x4c/0x70
> [    1.893321] [<ffffffc00057447c>] genpd_dev_pm_attach+0x160/0x19c
> [    1.893326] [<ffffffc00056931c>] dev_pm_domain_attach+0x1c/0x2c
> [    1.893331] [<ffffffc000562754>] platform_drv_probe+0x3c/0xa4
> [    1.893336] [<ffffffc000560604>] driver_probe_device+0x124/0x2c8
> [    1.893340] [<ffffffc000560824>] __device_attach_driver+0x7c/0x90
> [    1.893344] [<ffffffc00055e6fc>] bus_for_each_drv+0x90/0xc4
> [    1.893348] [<ffffffc000560988>] __device_attach+0xa8/0x12c
> [    1.893352] [<ffffffc000560a5c>] device_initial_probe+0x20/0x30
> [    1.893356] [<ffffffc00055f994>] bus_probe_device+0x34/0x9c
> [    1.893360] [<ffffffc00055fe38>] deferred_probe_work_func+0x7c/0xb0
> [    1.893365] [<ffffffc00023b5d4>] process_one_work+0x2ec/0x50c
> [    1.893370] [<ffffffc00023c340>] worker_thread+0x350/0x470
> [    1.893374] [<ffffffc00024165c>] kthread+0xf0/0xfc
>
> I have not been able to reproduce, but by looking at how
> genpd_poweron() was modified by this patch, it looks like it might
> have been introduced by this hunk:
>
>> @@ -291,20 +239,8 @@ static int __pm_genpd_poweron(struct generic_pm_domain *genpd)
>>          */
>>         list_for_each_entry(link, &genpd->slave_links, slave_node) {
>>                 genpd_sd_counter_inc(link->master);
>> -               genpd->status = GPD_STATE_WAIT_MASTER;
>> -
>> -               mutex_unlock(&genpd->lock);
>>
>>                 ret = pm_genpd_poweron(link->master);
>> -
>> -               mutex_lock(&genpd->lock);
>> -
>> -               /*
>> -                * The "wait for parent" status is guaranteed not to change
>> -                * while the master is powering on.
>> -                */
>> -               genpd->status = GPD_STATE_POWER_OFF;
>> -               wake_up_all(&genpd->status_wait_queue);
>
> AFAICT, the mutex_*() calls here that were removed unlock
> (&genpd->lock) and allowed pm_genpd_poweron() to be recursively called
> without the above splat in the case where a pm_genpd_poweron() on the
> master would take a lock of the same lock class as the current (slave)
> genpd.
>
> Other places in domain.c use something like this to lock a second
> genpd while already holding one genpd's lock:
>    mutex_lock_nested(&genpd->lock, SINGLE_DEPTH_NESTING);
>
> However, it seems like in this case we might actually need to walk up
> several layers of a hierarchy.
>
> So, AFAICT the best thing to do would be to restore the
> mutex_unlock()/_lock() around the call to pm_genpd_poweron().
>
> It isn't clear to me whether this was removed by this patch on purpose
> or as an accident.

The removal of the unlock|lock were done on purpose. The reason was
simply because there are no intermediate power states to consider.

Regarding re-entrancy with a gendp struct which is already locked, I
am wondering whether the configuration of the subdomain/masters are
properly done. To reach the deadlock described above, one subdomain's
master must be a subdomain of its own child. Is that really correct?

Perhaps I have already entered Christmas "mode", so I might be wrong
in my analyse and simplifying things too much. :-)

I will give this some more thinking after the holidays.

Kind regards
Uffe

>
> Thanks for your help!
> -Dan

  reply	other threads:[~2015-12-23 11:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-18 13:17 [PATCH V4] PM / Domains: Remove intermediate states from the power off sequence Ulf Hansson
2015-06-18 13:17 ` Ulf Hansson
2015-07-08 19:56 ` Kevin Hilman
2015-07-08 19:56   ` Kevin Hilman
2015-07-09  7:05   ` Geert Uytterhoeven
2015-07-09  7:05     ` Geert Uytterhoeven
2015-07-09 14:05   ` Lina Iyer
2015-07-09 14:05     ` Lina Iyer
2015-07-10  1:26     ` Rafael J. Wysocki
2015-07-10  1:26       ` Rafael J. Wysocki
2015-12-22 12:49 ` Daniel Kurtz
2015-12-22 12:49   ` Daniel Kurtz
2015-12-23 11:40   ` Ulf Hansson [this message]
2015-12-23 11:40     ` Ulf Hansson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPDyKFqs_ZPMCYxUQDWbxRJoKsu907Hz2vu0WG8T1ykyM29aSA@mail.gmail.com \
    --to=ulf.hansson@linaro.org \
    --cc=ahaslam+renesas@baylibre.com \
    --cc=djkurtz@chromium.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=drinkcat@chromium.org \
    --cc=geert+renesas@glider.be \
    --cc=k.kozlowski.k@gmail.com \
    --cc=khilman@kernel.org \
    --cc=len.brown@intel.com \
    --cc=lina.iyer@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=pavel@ucw.cz \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.