All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()
@ 2020-09-22 13:11 chenxiang
  2020-09-25 15:02 ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: chenxiang @ 2020-09-22 13:11 UTC (permalink / raw
  To: rafael, pavel, sre; +Cc: linuxarm, linux-pm, john.garry, Xiang Chen

From: Xiang Chen <chenxiang66@hisilicon.com>

To support runtime PM for hisi SAS driver (the driver is in directory
drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
(consumer device) and hisi_hba->dev(supplier device) with flags
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
After runtime suspended consumers and supplier, unload the dirver which
cause a hung. We find that it calls function device_release_driver_internal()
to release supplier device hisi_hba->dev, as the device link is busy,
it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
device_release_driver_internal() to release consumer device
scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
to resume consumer device, as consumer-supplier relation exists, it will try
to resume supplier first, but as the link state is already set as
DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
consumer which cause a hung (it sends IOs to resume scsi_device while
SAS controller is suspended). Simple flow is as follows:

device_release_driver_internal -> (supplier device)
    if device_links_busy ->
	device_links_unbind_consumers ->
	    ...
	    WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
	    device_release_driver_internal (consumer device)
    pm_runtime_get_sync -> (consumer device)
	...
	__rpm_callback ->
	    rpm_get_suppliers ->
		if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
		...
    pm_runtime_clean_up_links
    ...

It should guarantee correct suspend/resume ordering between a supplier
device and its consumer devices (resume the supplier device before resuming
consumer devices, and suspend consumer devices before suspending supplier
device) for runtime PM, but it seems the checks in rpm_get_supplier() and
rpm_put_supplier() break the rule, so remove them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/base/power/runtime.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 8143210..6f605f7 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
 				device_links_read_lock_held()) {
 		int retval;
 
-		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
-		    READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
+		if (!(link->flags & DL_FLAG_PM_RUNTIME))
 			continue;
 
 		retval = pm_runtime_get_sync(link->supplier);
@@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
 
 	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
 				device_links_read_lock_held()) {
-		if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
-			continue;
 
 		while (refcount_dec_not_one(&link->rpm_active))
 			pm_runtime_put(link->supplier);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()
  2020-09-22 13:11 [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier() chenxiang
@ 2020-09-25 15:02 ` Rafael J. Wysocki
  2020-10-19 17:03   ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2020-09-25 15:02 UTC (permalink / raw
  To: chenxiang
  Cc: Rafael J. Wysocki, Pavel Machek, Sebastian Reichel, Linuxarm,
	Linux PM, John Garry

On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
>
> From: Xiang Chen <chenxiang66@hisilicon.com>
>
> To support runtime PM for hisi SAS driver (the driver is in directory
> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> (consumer device) and hisi_hba->dev(supplier device) with flags
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> After runtime suspended consumers and supplier, unload the dirver which
> cause a hung. We find that it calls function device_release_driver_internal()
> to release supplier device hisi_hba->dev, as the device link is busy,
> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> device_release_driver_internal() to release consumer device
> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> to resume consumer device, as consumer-supplier relation exists, it will try
> to resume supplier first, but as the link state is already set as
> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> consumer which cause a hung (it sends IOs to resume scsi_device while
> SAS controller is suspended). Simple flow is as follows:
>
> device_release_driver_internal -> (supplier device)
>     if device_links_busy ->
>         device_links_unbind_consumers ->
>             ...
>             WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>             device_release_driver_internal (consumer device)
>     pm_runtime_get_sync -> (consumer device)
>         ...
>         __rpm_callback ->
>             rpm_get_suppliers ->
>                 if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>                 ...
>     pm_runtime_clean_up_links
>     ...
>
> It should guarantee correct suspend/resume ordering between a supplier
> device and its consumer devices (resume the supplier device before resuming
> consumer devices, and suspend consumer devices before suspending supplier
> device) for runtime PM, but it seems the checks in rpm_get_supplier() and
> rpm_put_supplier() break the rule, so remove them.
>
> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> ---
>  drivers/base/power/runtime.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index 8143210..6f605f7 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>                                 device_links_read_lock_held()) {
>                 int retval;
>
> -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
>                         continue;
>
>                 retval = pm_runtime_get_sync(link->supplier);
> @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
>
>         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
>                                 device_links_read_lock_held()) {
> -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> -                       continue;
>
>                 while (refcount_dec_not_one(&link->rpm_active))
>                         pm_runtime_put(link->supplier);
> --

Applied as 5.10 material with some edits in the subject and changelog, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()
  2020-09-25 15:02 ` Rafael J. Wysocki
@ 2020-10-19 17:03   ` Rafael J. Wysocki
  2020-10-20  2:23     ` chenxiang (M)
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2020-10-19 17:03 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Sebastian Reichel, Linuxarm, Linux PM, John Garry,
	Xiang Chen

On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
> >
> > From: Xiang Chen <chenxiang66@hisilicon.com>
> >
> > To support runtime PM for hisi SAS driver (the driver is in directory
> > drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> > (consumer device) and hisi_hba->dev(supplier device) with flags
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> > After runtime suspended consumers and supplier, unload the dirver which
> > cause a hung. We find that it calls function device_release_driver_internal()
> > to release supplier device hisi_hba->dev, as the device link is busy,
> > it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> > device_release_driver_internal() to release consumer device
> > scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> > to resume consumer device, as consumer-supplier relation exists, it will try
> > to resume supplier first,

It looks like I have overlooked one thing here.

Does it help (without the $subject patch) to move the
pm_runtime_get_sync(dev) in __device_release_driver() before the while
(device_links_busy(dev)) loop?

> > but as the link state is already set as
> > DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
> > consumer which cause a hung (it sends IOs to resume scsi_device while
> > SAS controller is suspended). Simple flow is as follows:
> >
> > device_release_driver_internal -> (supplier device)
> >     if device_links_busy ->
> >         device_links_unbind_consumers ->
> >             ...
> >             WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
> >             device_release_driver_internal (consumer device)
> >     pm_runtime_get_sync -> (consumer device)
> >         ...
> >         __rpm_callback ->
> >             rpm_get_suppliers ->
> >                 if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
> >                 ...
> >     pm_runtime_clean_up_links
> >     ...
> >
> > It should guarantee correct suspend/resume ordering between a supplier
> > device and its consumer devices (resume the supplier device before resuming
> > consumer devices, and suspend consumer devices before suspending supplier
> > device) for runtime PM, but it seems the checks in rpm_get_supplier() and
> > rpm_put_supplier() break the rule, so remove them.
> >
> > Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
> > ---
> >  drivers/base/power/runtime.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> > index 8143210..6f605f7 100644
> > --- a/drivers/base/power/runtime.c
> > +++ b/drivers/base/power/runtime.c
> > @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
> >                                 device_links_read_lock_held()) {
> >                 int retval;
> >
> > -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
> > -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
> >                         continue;
> >
> >                 retval = pm_runtime_get_sync(link->supplier);
> > @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
> >
> >         list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
> >                                 device_links_read_lock_held()) {
> > -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
> > -                       continue;
> >
> >                 while (refcount_dec_not_one(&link->rpm_active))
> >                         pm_runtime_put(link->supplier);
> > --
>
> Applied as 5.10 material with some edits in the subject and changelog, thanks!

I may need to revert it, because it introduced a (theoretical for now)
race condition between PM-runtime and pm_runtime_clean_up_links().

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()
  2020-10-19 17:03   ` Rafael J. Wysocki
@ 2020-10-20  2:23     ` chenxiang (M)
  2020-10-20 12:45       ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: chenxiang (M) @ 2020-10-20  2:23 UTC (permalink / raw
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Sebastian Reichel, Linuxarm, Linux PM, John Garry

Hi Rafael,

在 2020/10/20 1:03, Rafael J. Wysocki 写道:
> On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
>>> From: Xiang Chen <chenxiang66@hisilicon.com>
>>>
>>> To support runtime PM for hisi SAS driver (the driver is in directory
>>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
>>> (consumer device) and hisi_hba->dev(supplier device) with flags
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
>>> After runtime suspended consumers and supplier, unload the dirver which
>>> cause a hung. We find that it calls function device_release_driver_internal()
>>> to release supplier device hisi_hba->dev, as the device link is busy,
>>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
>>> device_release_driver_internal() to release consumer device
>>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
>>> to resume consumer device, as consumer-supplier relation exists, it will try
>>> to resume supplier first,
> It looks like I have overlooked one thing here.
>
> Does it help (without the $subject patch) to move the
> pm_runtime_get_sync(dev) in __device_release_driver() before the while
> (device_links_busy(dev)) loop?

I have tested it, and it also solve the issue.


>
>>> but as the link state is already set as
>>> DL_STATE_SUPPLIER_UNBIND, so it skips resuming supplier and only resume
>>> consumer which cause a hung (it sends IOs to resume scsi_device while
>>> SAS controller is suspended). Simple flow is as follows:
>>>
>>> device_release_driver_internal -> (supplier device)
>>>      if device_links_busy ->
>>>          device_links_unbind_consumers ->
>>>              ...
>>>              WRITE_ONCE(link->status, DL_STATE_SUPPLIER_UNBIND)
>>>              device_release_driver_internal (consumer device)
>>>      pm_runtime_get_sync -> (consumer device)
>>>          ...
>>>          __rpm_callback ->
>>>              rpm_get_suppliers ->
>>>                  if link->state == DL_STATE_SUPPLIER_UNBIND -> skip the action of resuming the supplier
>>>                  ...
>>>      pm_runtime_clean_up_links
>>>      ...
>>>
>>> It should guarantee correct suspend/resume ordering between a supplier
>>> device and its consumer devices (resume the supplier device before resuming
>>> consumer devices, and suspend consumer devices before suspending supplier
>>> device) for runtime PM, but it seems the checks in rpm_get_supplier() and
>>> rpm_put_supplier() break the rule, so remove them.
>>>
>>> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
>>> ---
>>>   drivers/base/power/runtime.c | 5 +----
>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>>> index 8143210..6f605f7 100644
>>> --- a/drivers/base/power/runtime.c
>>> +++ b/drivers/base/power/runtime.c
>>> @@ -291,8 +291,7 @@ static int rpm_get_suppliers(struct device *dev)
>>>                                  device_links_read_lock_held()) {
>>>                  int retval;
>>>
>>> -               if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
>>> -                   READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> +               if (!(link->flags & DL_FLAG_PM_RUNTIME))
>>>                          continue;
>>>
>>>                  retval = pm_runtime_get_sync(link->supplier);
>>> @@ -312,8 +311,6 @@ static void rpm_put_suppliers(struct device *dev)
>>>
>>>          list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
>>>                                  device_links_read_lock_held()) {
>>> -               if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
>>> -                       continue;
>>>
>>>                  while (refcount_dec_not_one(&link->rpm_active))
>>>                          pm_runtime_put(link->supplier);
>>> --
>> Applied as 5.10 material with some edits in the subject and changelog, thanks!
> I may need to revert it, because it introduced a (theoretical for now)
> race condition between PM-runtime and pm_runtime_clean_up_links().
>
> Thanks!
>
> .
>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier()
  2020-10-20  2:23     ` chenxiang (M)
@ 2020-10-20 12:45       ` Rafael J. Wysocki
  0 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2020-10-20 12:45 UTC (permalink / raw
  To: chenxiang (M)
  Cc: Rafael J. Wysocki, Pavel Machek, Sebastian Reichel, Linuxarm,
	Linux PM, John Garry

On Tue, Oct 20, 2020 at 4:24 AM chenxiang (M) <chenxiang66@hisilicon.com> wrote:
>
> Hi Rafael,
>
> 在 2020/10/20 1:03, Rafael J. Wysocki 写道:
> > On Fri, Sep 25, 2020 at 5:02 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> On Tue, Sep 22, 2020 at 3:15 PM chenxiang <chenxiang66@hisilicon.com> wrote:
> >>> From: Xiang Chen <chenxiang66@hisilicon.com>
> >>>
> >>> To support runtime PM for hisi SAS driver (the driver is in directory
> >>> drivers/scsi/hisi_sas), we add device link between scsi_device->sdev_gendev
> >>> (consumer device) and hisi_hba->dev(supplier device) with flags
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE.
> >>> After runtime suspended consumers and supplier, unload the dirver which
> >>> cause a hung. We find that it calls function device_release_driver_internal()
> >>> to release supplier device hisi_hba->dev, as the device link is busy,
> >>> it sets the device link as DL_STATE_SUPPLIER_UNBIND, and then call function
> >>> device_release_driver_internal() to release consumer device
> >>> scsi_device->sdev_gendev). Then it will try to call pm_runtime_get_sync()
> >>> to resume consumer device, as consumer-supplier relation exists, it will try
> >>> to resume supplier first,
> > It looks like I have overlooked one thing here.
> >
> > Does it help (without the $subject patch) to move the
> > pm_runtime_get_sync(dev) in __device_release_driver() before the while
> > (device_links_busy(dev)) loop?
>
> I have tested it, and it also solve the issue.

OK, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-10-20 12:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-09-22 13:11 [PATCH v2] PM:runtime:Remove the link state check in function rpm_get_supplier() and rpm_put_supplier() chenxiang
2020-09-25 15:02 ` Rafael J. Wysocki
2020-10-19 17:03   ` Rafael J. Wysocki
2020-10-20  2:23     ` chenxiang (M)
2020-10-20 12:45       ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.