LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH] usb: host: xhci: Replace bus lock with host controller lock
@ 2016-02-04 19:46 Chris Bainbridge
  2016-02-04 21:00 ` Alan Stern
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Bainbridge @ 2016-02-04 19:46 UTC (permalink / raw)
  To: mathias.nyman; +Cc: Chris Bainbridge, johan, linux-usb, linux-kernel, stern

The XHCI controller presents two USB buses to the system - one for USB 2
and one for USB 3. When only one bus is locked there is a race condition
during hub init that results in errors like:

 [   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this error occurred on 6% of boots. Fix this by locking
the bus controller instead of the bus, thus preventing simultaneous hub
init on both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel")
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
---
 drivers/usb/core/hcd.c | 2 +-
 drivers/usb/core/hub.c | 8 ++++----
 include/linux/usb.h    | 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index df0e3b92533a..c468d047ec5e 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus)
 	bus->bandwidth_allocated = 0;
 	bus->bandwidth_int_reqs  = 0;
 	bus->bandwidth_isoc_reqs = 0;
-	mutex_init(&bus->usb_address0_mutex);
+	mutex_init(&bus->devnum_next_mutex);
 
 	INIT_LIST_HEAD (&bus->bus_list);
 }
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 350dcd9af5d8..e03c2f05c2f6 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2066,7 +2066,7 @@ static void choose_devnum(struct usb_device *udev)
 	struct usb_bus	*bus = udev->bus;
 
 	/* be safe when more hub events are proceed in parallel */
-	mutex_lock(&bus->usb_address0_mutex);
+	mutex_lock(&bus->devnum_next_mutex);
 	if (udev->wusb) {
 		devnum = udev->portnum + 1;
 		BUG_ON(test_bit(devnum, bus->devmap.devicemap));
@@ -2084,7 +2084,7 @@ static void choose_devnum(struct usb_device *udev)
 		set_bit(devnum, bus->devmap.devicemap);
 		udev->devnum = devnum;
 	}
-	mutex_unlock(&bus->usb_address0_mutex);
+	mutex_unlock(&bus->devnum_next_mutex);
 }
 
 static void release_devnum(struct usb_device *udev)
@@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 	if (oldspeed == USB_SPEED_LOW)
 		delay = HUB_LONG_RESET_TIME;
 
-	mutex_lock(&hdev->bus->usb_address0_mutex);
+	mutex_lock(&hdev->bus->controller->mutex);
 
 	/* Reset the device; full speed may morph to high speed */
 	/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
@@ -4588,7 +4588,7 @@ fail:
 		hub_port_disable(hub, port1, 0);
 		update_devnum(udev, devnum);	/* for disconnect processing */
 	}
-	mutex_unlock(&hdev->bus->usb_address0_mutex);
+	mutex_unlock(&hdev->bus->controller->mutex);
 	return retval;
 }
 
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 89533ba38691..6b736c82b9d1 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -371,14 +371,13 @@ struct usb_bus {
 
 	int devnum_next;		/* Next open device number in
 					 * round-robin allocation */
+	struct mutex devnum_next_mutex; /* devnum_next mutex */
 
 	struct usb_devmap devmap;	/* device address allocation map */
 	struct usb_device *root_hub;	/* Root hub */
 	struct usb_bus *hs_companion;	/* Companion EHCI bus, if any */
 	struct list_head bus_list;	/* list of busses */
 
-	struct mutex usb_address0_mutex; /* unaddressed device mutex */
-
 	int bandwidth_allocated;	/* on this bus: how much of the time
 					 * reserved for periodic (intr/iso)
 					 * requests is used, on average?
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
  2016-02-04 19:46 [PATCH] usb: host: xhci: Replace bus lock with host controller lock Chris Bainbridge
@ 2016-02-04 21:00 ` Alan Stern
  2016-02-04 22:06   ` Chris Bainbridge
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Stern @ 2016-02-04 21:00 UTC (permalink / raw)
  To: Chris Bainbridge; +Cc: mathias.nyman, johan, linux-usb, linux-kernel

On Thu, 4 Feb 2016, Chris Bainbridge wrote:

> The XHCI controller presents two USB buses to the system - one for USB 2
> and one for USB 3. When only one bus is locked there is a race condition
> during hub init that results in errors like:
> 
>  [   13.183701] usb 3-3: device descriptor read/all, error -110

What exactly is the race condition?  Why does locking both buses fix 
it?


> @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
>  	if (oldspeed == USB_SPEED_LOW)
>  		delay = HUB_LONG_RESET_TIME;
>  
> -	mutex_lock(&hdev->bus->usb_address0_mutex);
> +	mutex_lock(&hdev->bus->controller->mutex);
>  
>  	/* Reset the device; full speed may morph to high speed */
>  	/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
> @@ -4588,7 +4588,7 @@ fail:
>  		hub_port_disable(hub, port1, 0);
>  		update_devnum(udev, devnum);	/* for disconnect processing */
>  	}
> -	mutex_unlock(&hdev->bus->usb_address0_mutex);
> +	mutex_unlock(&hdev->bus->controller->mutex);
>  	return retval;
>  }

I don't think this is a good idea.  The driver core needs to be able to
access the controller while this function is running.  You can
introduce a new mutex if you want, perhaps in the primary hcd
structure, but don't use bus->controller->mutex.

Alan Stern

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
  2016-02-04 21:00 ` Alan Stern
@ 2016-02-04 22:06   ` Chris Bainbridge
  2016-02-05  2:45     ` Alan Stern
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Bainbridge @ 2016-02-04 22:06 UTC (permalink / raw)
  To: Alan Stern; +Cc: mathias.nyman, johan, linux-usb, linux-kernel

On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote:
> On Thu, 4 Feb 2016, Chris Bainbridge wrote:
> 
> > The XHCI controller presents two USB buses to the system - one for USB 2
> > and one for USB 3. When only one bus is locked there is a race condition
> > during hub init that results in errors like:
> > 
> >  [   13.183701] usb 3-3: device descriptor read/all, error -110
> 
> What exactly is the race condition?  Why does locking both buses fix 
> it?

[    2.692571] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    2.693279] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[    2.694867] xhci_hcd 0000:00:14.0: hcc params 0x20007181 hci version 0x100 quirks 0x0000b930
[    2.694880] xhci_hcd 0000:00:14.0: cache line size of 256 is not supported
[    2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[    2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.696016] usb usb3: Product: xHCI Host Controller
[    2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd
[    2.696031] usb usb3: SerialNumber: 0000:00:14.0
[    2.698414] hub 3-0:1.0: USB hub found
[    2.704723] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    2.705502] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
[    2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[    2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.706139] usb usb4: Product: xHCI Host Controller
[    2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd
[    2.706141] usb usb4: SerialNumber: 0000:00:14.0
[    2.708467] hub 4-0:1.0: USB hub found
[    3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd
[    8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
...bus4 enumeration...
[   13.297835] usb 3-3: device descriptor read/all, error -110
...bus3 enumeration...

Note there seem to be two timeouts of 5 seconds/10 seconds after the
last message at 3.021779 seconds.

hub_port_init is called in parallel for both buses.
The first thread is in usb_get_device_descriptor when the second one
enters the function and calls the code to get an address. I don't know
precisely how it fails - it looks like the functions for doing the
initialisation are synchronous and sleeping waiting for a response and
that gets disrupted when the second thread tries to initialise the hub.
What was the basis for using a lock on the bus rather than the
controller?  Does the spec say that buses of the same controller can be
initialised in parallel? Mathias previously said:

> Just found an additional note in the xhci specs section 4.5.3 saying that:
> "Note: Software shall not transition more than one Device Slot to the Default State at a time"
> which is what xhci_setup_device() does in addition to moving slots to the addressed state

But I don't know if that means you can do the reset/set address/read
descriptors in parallel?


> > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
> >  	if (oldspeed == USB_SPEED_LOW)
> >  		delay = HUB_LONG_RESET_TIME;
> >  
> > -	mutex_lock(&hdev->bus->usb_address0_mutex);
> > +	mutex_lock(&hdev->bus->controller->mutex);
> >  
> >  	/* Reset the device; full speed may morph to high speed */
> >  	/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
> > @@ -4588,7 +4588,7 @@ fail:
> >  		hub_port_disable(hub, port1, 0);
> >  		update_devnum(udev, devnum);	/* for disconnect processing */
> >  	}
> > -	mutex_unlock(&hdev->bus->usb_address0_mutex);
> > +	mutex_unlock(&hdev->bus->controller->mutex);
> >  	return retval;
> >  }
> 
> I don't think this is a good idea.  The driver core needs to be able to
> access the controller while this function is running.  You can
> introduce a new mutex if you want, perhaps in the primary hcd
> structure, but don't use bus->controller->mutex.

An explicit lock might be a good idea. I was trying to avoid adding
another lock so used the one in struct device as it appeared unused. The
XHCI code seems to only use the lock in struct xhci_hcd and ehci uses
struct ehci->lock.

btw I think this bug may be the same as reported at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
  2016-02-04 22:06   ` Chris Bainbridge
@ 2016-02-05  2:45     ` Alan Stern
  2016-02-05 15:14       ` Chris Bainbridge
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Stern @ 2016-02-05  2:45 UTC (permalink / raw)
  To: Chris Bainbridge; +Cc: mathias.nyman, johan, linux-usb, linux-kernel

On Thu, 4 Feb 2016, Chris Bainbridge wrote:

> On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote:
> > On Thu, 4 Feb 2016, Chris Bainbridge wrote:
> > 
> > > The XHCI controller presents two USB buses to the system - one for USB 2
> > > and one for USB 3. When only one bus is locked there is a race condition
> > > during hub init that results in errors like:
> > > 
> > >  [   13.183701] usb 3-3: device descriptor read/all, error -110
> > 
> > What exactly is the race condition?  Why does locking both buses fix 
> > it?

...

> hub_port_init is called in parallel for both buses.
> The first thread is in usb_get_device_descriptor when the second one
> enters the function and calls the code to get an address. I don't know
> precisely how it fails - it looks like the functions for doing the
> initialisation are synchronous and sleeping waiting for a response and
> that gets disrupted when the second thread tries to initialise the hub.
> What was the basis for using a lock on the bus rather than the
> controller?

I don't remember exactly.  At the time the code was written, there was
no important distinction between a bus and a controller.  This was long
before USB-3 appeared.

When USB-3 support was added, the basis for keeping the lock on the bus 
was that I assumed there would be no problem talking to different 
devices at address 0 if they were on different buses.

>  Does the spec say that buses of the same controller can be
> initialised in parallel? Mathias previously said:
> 
> > Just found an additional note in the xhci specs section 4.5.3 saying that:
> > "Note: Software shall not transition more than one Device Slot to the Default State at a time"
> > which is what xhci_setup_device() does in addition to moving slots to the addressed state
> 
> But I don't know if that means you can do the reset/set address/read
> descriptors in parallel?

In fact you can do the Set-Address and Read-Descriptor parts in
parallel.  (In USB-3 the Set-Address thing is a no-op anyhow; the
hardware takes over that role completely and does it during the reset.)  
But that quote from the spec implies that the resets must not be done
in parallel.

> > I don't think this is a good idea.  The driver core needs to be able to
> > access the controller while this function is running.  You can
> > introduce a new mutex if you want, perhaps in the primary hcd
> > structure, but don't use bus->controller->mutex.
> 
> An explicit lock might be a good idea. I was trying to avoid adding
> another lock so used the one in struct device as it appeared unused.

It gets used by the driver core.  Don't worry about the overhead of 
adding a new lock if it really is needed; the number of USB buses or 
controllers on any computer isn't big enough to matter.

> The
> XHCI code seems to only use the lock in struct xhci_hcd and ehci uses
> struct ehci->lock.
> 
> btw I think this bug may be the same as reported at
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492

It could well be.

Alan Stern

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
  2016-02-05  2:45     ` Alan Stern
@ 2016-02-05 15:14       ` Chris Bainbridge
  2016-02-10 16:50         ` Mathias Nyman
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Bainbridge @ 2016-02-05 15:14 UTC (permalink / raw)
  To: Alan Stern; +Cc: mathias.nyman, johan, linux-usb, linux-kernel

Running task list at fail point:

[    8.978405] kworker/3:0     R  running task    11760    24      2 0x00000000
[    8.979624] Workqueue: usb_hub_wq hub_event
[    8.980831]  ffff880260613af8 ffff880260613ac0 00000000811247ed ffff8802604597c0
[    8.982061]  ffff880260608000 ffff880260614000 0000000000000100 ffff880260613b70
[    8.983280]  ffff880260613c14 0000000000001388 ffff880260613b10 ffffffff81b9bab7
[    8.984500] Call Trace:
[    8.985698]  [<ffffffff81b9bab7>] schedule+0x37/0x90
[    8.986918]  [<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0
[    8.988130]  [<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30
[    8.989343]  [<ffffffff817dafbe>] usb_start_wait_urb+0xbe/0x150
[    8.990561]  [<ffffffff817db10c>] usb_control_msg+0xbc/0xf0
[    8.991766]  [<ffffffff817d07de>] hub_port_init+0x51e/0xb70
[    8.992964]  [<ffffffff817d4697>] hub_event+0x817/0x1570
[    8.994156]  [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[    8.995342]  [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[    8.996528]  [<ffffffff810f4684>] worker_thread+0x64/0x4b0
[    8.997707]  [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390  
[    8.998883]  [<ffffffff810fa7f5>] kthread+0x105/0x120
[    9.000056]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[    9.001241]  [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[    9.002420]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200

[    9.094152] kworker/3:1     R  running task    11496   238      2 0x00000000
[    9.095361] Workqueue: pm pm_runtime_work
[    9.096545]  ffff88025fc13ae0 ffff88025fc13aa8 00000003811247ed ffff880260bc0000
[    9.097767]  ffff8802603aaf80 ffff88025fc14000 ffff88025cc7a520 ffff88007c4b7978
[    9.098986]  ffff88025cc7a520 ffff88007c4b7978 ffff88025fc13af8 ffffffff81b9bab7
[    9.100213] Call Trace:
[    9.101415]  [<ffffffff81b9bab7>] schedule+0x37/0x90
[    9.102626]  [<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0
[    9.103835]  [<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30
[    9.105035]  [<ffffffff817d264b>] hub_quiesce+0x6b/0xa0
[    9.106226]  [<ffffffff817d281a>] hub_suspend+0x12a/0x250
[    9.107417]  [<ffffffff817de4b5>] usb_suspend_both+0x95/0x1d0
[    9.108600]  [<ffffffff817df64e>] usb_runtime_suspend+0x2e/0x70
[    9.109783]  [<ffffffff817df620>] ? usb_probe_interface+0x310/0x310
[    9.110958]  [<ffffffff8170d5ed>] __rpm_callback+0x2d/0x70
[    9.112130]  [<ffffffff817df620>] ? usb_probe_interface+0x310/0x310
[    9.113305]  [<ffffffff8170d64d>] rpm_callback+0x1d/0x90
[    9.114482]  [<ffffffff8170dbdb>] rpm_suspend+0x14b/0x750
[    9.115664]  [<ffffffff8170f697>] __pm_runtime_suspend+0x57/0x90
[    9.116852]  [<ffffffff817df6b0>] ? usb_runtime_resume+0x20/0x20
[    9.118034]  [<ffffffff817df6d5>] usb_runtime_idle+0x25/0x30
[    9.119210]  [<ffffffff8170d5ed>] __rpm_callback+0x2d/0x70
[    9.120382]  [<ffffffff8170e501>] rpm_idle+0x221/0x410
[    9.121541]  [<ffffffff8170f7e9>] pm_runtime_work+0xa9/0xc0
[    9.122693]  [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[    9.123850]  [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[    9.124993]  [<ffffffff810f4684>] worker_thread+0x64/0x4b0
[    9.126134]  [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390
[    9.127272]  [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390  
[    9.128403]  [<ffffffff810fa7f5>] kthread+0x105/0x120
[    9.129524]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[    9.130653]  [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[    9.131781]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200

[    9.864737] kworker/3:2     R  running task    11888  1348      2 0x00000008
[    9.865947] Workqueue: usb_hub_wq hub_event
[    9.867152]  ffff88025dbcdf00 ffff88025dbd3b88 ffffffff81109a8c ffffffff811099a9
[    9.868386]  ffff88025dbcdf00 0000000000000000 ffff88025dbce270 ffff88025dbd3bb8
[    9.869610]  ffffffff81109c04 ffff88025dac8000 ffff88025dac8368 ffff88025c8a4000
[    9.870837] Call Trace:
[    9.872044]  [<ffffffff81109a8c>] sched_show_task+0x15c/0x260
[    9.873248]  [<ffffffff811099a9>] ? sched_show_task+0x79/0x260
[    9.874456]  [<ffffffff81109c04>] show_state_filter+0x74/0xc0   
[    9.875664]  [<ffffffff817fd36d>] xhci_setup_device+0x53d/0xa40 
[    9.876871]  [<ffffffff817fd87e>] xhci_address_device+0xe/0x10
[    9.878068]  [<ffffffff817d047f>] hub_port_init+0x1bf/0xb70
[    9.879262]  [<ffffffff811247ed>] ? trace_hardirqs_on+0xd/0x10
[    9.880465]  [<ffffffff817d4697>] hub_event+0x817/0x1570
[    9.881668]  [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[    9.882869]  [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[    9.884074]  [<ffffffff810f4684>] worker_thread+0x64/0x4b0
[    9.885268]  [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390
[    9.886457]  [<ffffffff810fa7f5>] kthread+0x105/0x120
[    9.887634]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[    9.888817]  [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[    9.889995]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200

[    9.891174] kworker/3:3     R  running task    10808  1350      2 0x00000000
[    9.892367] Workqueue: events_freezable usb_stor_scan_dwork
[    9.893560]  ffff88025dbdbb78 ffff88026a0ccd80 000000006a0ccd80 ffff8802604597c0
[    9.894773]  ffff88025dbcc740 ffff88025dbdc000 ffff88025dbdbbb0 ffff88026a0ccd80
[    9.896004]  ffff88026a0ccd80 0000000000000003 ffff88025dbdbb90 ffffffff81b9bab7
[    9.897233] Call Trace:
[    9.898444]  [<ffffffff81b9bab7>] schedule+0x37/0x90
[    9.899670]  [<ffffffff81ba026c>] schedule_timeout+0x17c/0x2f0
[    9.900893]  [<ffffffff81143f10>] ? init_timer_on_stack_key+0x40/0x40
[    9.902099]  [<ffffffff81b9c98d>] wait_for_completion_interruptible_timeout+0xcd/0x160
[    9.903324]  [<ffffffff81106620>] ? wake_up_q+0x70/0x70
[    9.904536]  [<ffffffff818197e8>] usb_stor_msg_common+0xd8/0x130
[    9.905751]  [<ffffffff818198d6>] usb_stor_control_msg+0x96/0xb0
[    9.906961]  [<ffffffff8181a6d1>] usb_stor_Bulk_max_lun+0x51/0xa0
[    9.908163]  [<ffffffff8181afbf>] usb_stor_scan_dwork+0x7f/0xe0 
[    9.909361]  [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620
[    9.910563]  [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620
[    9.911762]  [<ffffffff810f4684>] worker_thread+0x64/0x4b0
[    9.912976]  [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390
[    9.914190]  [<ffffffff810fa7f5>] kthread+0x105/0x120
[    9.915396]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200
[    9.916607]  [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70
[    9.917817]  [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200

[   11.003856] 7 locks held by kworker/3:2/1348:
[   11.005032]  #0:  ("usb_hub_wq"){.+.+.+}, at: [<ffffffff810f3dcf>] process_one_work+0x15f/0x620
[   11.006250]  #1:  ((&hub->events)){+.+.+.}, at: [<ffffffff810f3dcf>] process_one_work+0x15f/0x620
[   11.007479]  #2:  (&dev->mutex){......}, at: [<ffffffff817d3ee4>] hub_event+0x64/0x1570
[   11.008723]  #3:  (&port_dev->status_lock){+.+.+.}, at: [<ffffffff817d4685>] hub_event+0x805/0x1570
[   11.009975]  #4:  (&bus->usb_address0_mutex){+.+.+.}, at: [<ffffffff817d031d>] hub_port_init+0x5d/0xb70
[   11.011233]  #5:  (&xhci->mutex){+.+.+.}, at: [<ffffffff817fce8f>] xhci_setup_device+0x5f/0xa40
[   11.012496]  #6:  (tasklist_lock){.+.+..}, at: [<ffffffff8112319f>] debug_show_all_locks+0x3f/0x1b0


kworker/3:0 and kworker/3:2 are both in hub_port_init. (I don't think
the other running threads are relevant but include them for
completeness).

Some of the functions appear to be inlined, the exact call chain is:

hub_port_init
    usb_get_device_descriptor
        usb_get_descriptor
            usb_control_msg
                usb_internal_control_msg
                    usb_start_wait_urb
                        usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
    hub_set_address
        xhci_address_device
            xhci_setup_device

So xhci_setup_device is entered while there is an outstanding URB on the
other bus. Unless anyone can think of a better way to fix this I'll make
the requested changes and resend my patch.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
  2016-02-05 15:14       ` Chris Bainbridge
@ 2016-02-10 16:50         ` Mathias Nyman
  0 siblings, 0 replies; 6+ messages in thread
From: Mathias Nyman @ 2016-02-10 16:50 UTC (permalink / raw)
  To: Chris Bainbridge, Alan Stern; +Cc: johan, linux-usb, linux-kernel

On 05.02.2016 17:14, Chris Bainbridge wrote:
> Running task list at fail point:

...

> Some of the functions appear to be inlined, the exact call chain is:
>
> hub_port_init
>      usb_get_device_descriptor
>          usb_get_descriptor
>              usb_control_msg
>                  usb_internal_control_msg
>                      usb_start_wait_urb
>                          usb_submit_urb / wait_for_completion_timeout / usb_kill_urb
>
> hub_port_init
>      hub_set_address
>          xhci_address_device
>              xhci_setup_device
>

hub_port_reset() will end up moving the corresponding xhci device slot to default state.

As hub_port_reset() is called several times in hub_port_init() It sounds reasonable
that we could end up with two threads having their xhci device slots in default state at
the same time,  which according to xhci 4.5.3 specs still is a big no no.

So both threads fail at their next task after this.
One fails to read the descriptor, and the other fails addressing the device.

Nice catch btw.

> So xhci_setup_device is entered while there is an outstanding URB on the
> other bus. Unless anyone can think of a better way to fix this I'll make
> the requested changes and resend my patch.
>

For what it's wort I think that this suggested controller mutex sounds like a good idea.
Should work for xhci at least.

-Mathias

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-02-10 16:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-04 19:46 [PATCH] usb: host: xhci: Replace bus lock with host controller lock Chris Bainbridge
2016-02-04 21:00 ` Alan Stern
2016-02-04 22:06   ` Chris Bainbridge
2016-02-05  2:45     ` Alan Stern
2016-02-05 15:14       ` Chris Bainbridge
2016-02-10 16:50         ` Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).