LKML Archive mirror
* [PATCH v4 0/1] nvmet-fcloop: unblock module removal
@ 2023-06-06 12:24 Daniel Wagner
  2023-06-06 12:24 ` [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Wagner @ 2023-06-06 12:24 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	Daniel Wagner

With the recent FC-related changes in blktests, we only need one patch to
unblock all tests. That means blktests is able to run through all tests, though
not all of them pass. These failures look like real bugs in the FC code, except
for nvme/040; I see the loop module unload failure with tcp, rdma and loop as
well.

We have an issue open on github to track the 'module unload before fc resource
cleanup' bug:

  https://github.com/osandov/blktests/issues/120

Anyway, let's address one thing at a time:

The nvme/048 test blocks forever on the current nvme-6.4 branch:

  run blktests nvme/048 at 2023-06-06 13:04:49
  nvmet: adding nsid 1 to subsystem blktests-subsystem-1
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
  (NULL device *): {0:0} Association created
  [478] nvmet: ctrl 1 start keep-alive timer for 1 secs
  [478] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
  [478] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
  [478] nvmet: No authentication provided
  nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
  [478] nvmet: adding queue 1 to ctrl 1.
  [407] nvmet: adding queue 2 to ctrl 1.
  [6549] nvmet: adding queue 3 to ctrl 1.
  [1760] nvmet: adding queue 4 to ctrl 1.
  nvme nvme2: NVME-FC{0}: controller connect complete
  nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
  [478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
  [407] nvmet: ctrl 1 update keep-alive timer for 1 secs
  nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
  nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
  nvme nvme2: NVME-FC{0}: resetting controller
  [478] nvmet: ctrl 1 stop keep-alive
  (NULL device *): {0:0} Association deleted
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
  (NULL device *): {0:0} Association freed
  (NULL device *): {0:0} Association created
  (NULL device *): Disconnect LS failed: No Association
  [407] nvmet: ctrl 1 start keep-alive timer for 1 secs
  [407] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
  [407] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
  [407] nvmet: No authentication provided
  nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
  nvme nvme2: reconnect: revising io queue count from 4 to 1
  [478] nvmet: adding queue 1 to ctrl 1.
  nvme nvme2: NVME-FC{0}: controller connect complete
  [478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [407] nvmet: ctrl 1 update keep-alive timer for 1 secs
  nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
  nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
  nvme nvme2: NVME-FC{0}: resetting controller
  [478] nvmet: ctrl 1 stop keep-alive
  (NULL device *): {0:0} Association deleted
  (NULL device *): {0:0} Association freed
  (NULL device *): Disconnect LS failed: No Association
  nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
  (NULL device *): {0:0} Association created
  [1760] nvmet: ctrl 1 start keep-alive timer for 1 secs
  [1760] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
  [1760] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
  [1760] nvmet: No authentication provided
  nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
  nvme nvme2: reconnect: revising io queue count from 1 to 4
  [478] nvmet: adding queue 1 to ctrl 1.
  [407] nvmet: adding queue 2 to ctrl 1.
  [6549] nvmet: adding queue 3 to ctrl 1.
  [1760] nvmet: adding queue 4 to ctrl 1.
  nvme nvme2: NVME-FC{0}: controller connect complete
  [1760] nvmet: ctrl 1 reschedule traffic based keep-alive timer
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
  [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
  nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
  [407] nvmet: ctrl 1 stop keep-alive
  (NULL device *): {0:0} Association deleted
  (NULL device *): {0:0} Association freed
  (NULL device *): Disconnect LS failed: No Association
  nvme_fc: nvme_fc_exit_module: waiting for ctlr deletes
  BTRFS info (device vda2): qgroup scan completed (inconsistency flag cleared)
  INFO: task modprobe:11758 blocked for more than 491 seconds.
        Tainted: G        W          6.4.0-rc2+ #3
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:modprobe        state:D stack:0     pid:11758 ppid:11585  flags:0x00004002
  Call Trace:
   <TASK>
   __schedule+0x17b5/0x4870
   ? finish_lock_switch+0x8f/0x120
   ? finish_task_switch+0x17f/0x5c0
   ? __sched_text_start+0x10/0x10
   ? __schedule+0x17bd/0x4870
   ? do_raw_spin_trylock+0xc9/0x1f0
   schedule+0xe6/0x1e0
   schedule_timeout+0x7e/0x1e0
   ? __cfi_schedule_timeout+0x10/0x10
   ? do_raw_spin_trylock+0xc9/0x1f0
   ? __cfi_lock_release+0x10/0x10
   ? do_raw_spin_unlock+0x116/0x8a0
   wait_for_common+0x377/0x600
   ? wait_for_completion+0x30/0x30
   cleanup_module+0x222/0x240 [nvme_fc bda1ef6f83d00208ba257c5d0ac34dd6bdb58bf9]
   __se_sys_delete_module+0x388/0x550
   ? __x64_sys_delete_module+0x50/0x50
   ? task_work_run+0x236/0x290
   ? syscall_enter_from_user_mode+0x2e/0x210
   do_syscall_64+0x6e/0xa0
   ? syscall_exit_to_user_mode+0x5e/0x220
   ? do_syscall_64+0x7d/0xa0
   ? syscall_exit_to_user_mode+0x5e/0x220
   ? do_syscall_64+0x7d/0xa0
   ? syscall_exit_to_user_mode+0x5e/0x220
   ? do_syscall_64+0x7d/0xa0
   entry_SYSCALL_64_after_hwframe+0x72/0xdc
  RIP: 0033:0x7fa80811aebb
  RSP: 002b:00007fff80ea0a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
  RAX: ffffffffffffffda RBX: 000055a0b5acf1f0 RCX: 00007fa80811aebb
  RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a0b5acf258
  RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
  R10: 00007fa808193ac0 R11: 0000000000000206 R12: 00007fff80ea0ad0
  R13: 000055a0b5acf1f0 R14: 0000000000000000 R15: 0000000000000000
   </TASK>
  INFO: lockdep is turned off.
 

With the patch applied, the nvme/048 test at least terminates:


  # nvme_trtype=fc nvme_img_size=16M nvme_num_iter=10 ./check nvme
  nvme/002 (create many subsystems and test discovery)         [not run]
      nvme_trtype=fc is not supported in this test
  nvme/003 (test if we're sending keep-alives to a discovery controller) [passed]
      runtime  10.337s  ...  10.413s
  nvme/004 (test nvme and nvmet UUID NS descriptors)           [passed]
      runtime  1.817s  ...  1.988s
  nvme/005 (reset local loopback target)                       [passed]
      runtime  1.951s  ...  2.128s
  nvme/006 (create an NVMeOF target with a block device-backed ns) [passed]
      runtime  0.176s  ...  0.271s
  nvme/007 (create an NVMeOF target with a file-backed ns)     [passed]
      runtime  0.124s  ...  0.191s
  nvme/008 (create an NVMeOF host with a block device-backed ns) [passed]
      runtime  1.899s  ...  2.062s
  nvme/009 (create an NVMeOF host with a file-backed ns)       [passed]
      runtime  1.815s  ...  2.004s
  nvme/010 (run data verification fio job on NVMeOF block device-backed ns) [passed]
      runtime  2.795s  ...  3.367s
  nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
      runtime  3.196s  ...  3.934s
  nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [not run]
      nvme_img_size must be at least 350m
  nvme/013 (run mkfs and data verification fio job on NVMeOF file-backed ns) [not run]
      nvme_img_size must be at least 350m
  nvme/014 (flush a NVMeOF block device-backed ns)             [passed]
      runtime  2.012s  ...  2.270s
  nvme/015 (unit test for NVMe flush for file backed ns)       [passed]
      runtime  1.953s  ...  2.237s
  nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
      nvme_trtype=fc is not supported in this test
  nvme/017 (create/delete many file-ns and test discovery)     [not run]
      nvme_trtype=fc is not supported in this test
  nvme/018 (unit test NVMe-oF out of range access on a file backend) [passed]
      runtime  1.862s  ...  1.996s
  nvme/019 (test NVMe DSM Discard command on NVMeOF block-device ns) [passed]
      runtime  1.807s  ...  2.075s
  nvme/020 (test NVMe DSM Discard command on NVMeOF file-backed ns) [passed]
      runtime  1.739s  ...  1.926s
  nvme/021 (test NVMe list command on NVMeOF file-backed ns)   [passed]
      runtime  1.807s  ...  2.041s
  nvme/022 (test NVMe reset command on NVMeOF file-backed ns)  [passed]
      runtime  1.944s  ...  2.066s
  nvme/023 (test NVMe smart-log command on NVMeOF block-device ns) [passed]
      runtime  1.772s  ...  1.901s
  nvme/024 (test NVMe smart-log command on NVMeOF file-backed ns) [passed]
      runtime  1.800s  ...  1.997s
  nvme/025 (test NVMe effects-log command on NVMeOF file-backed ns) [passed]
      runtime  1.806s  ...  1.955s
  nvme/026 (test NVMe ns-descs command on NVMeOF file-backed ns) [passed]
      runtime  1.837s  ...  1.945s
  nvme/027 (test NVMe ns-rescan command on NVMeOF file-backed ns) [passed]
      runtime  1.762s  ...  2.006s
  nvme/028 (test NVMe list-subsys command on NVMeOF file-backed ns) [passed]
      runtime  1.801s  ...  2.102s
  nvme/029 (test userspace IO via nvme-cli read/write interface) [passed]
      runtime  2.175s  ...  2.597s
  nvme/030 (ensure the discovery generation counter is updated appropriately) [passed]
      runtime  0.616s  ...  0.805s
  nvme/031 (test deletion of NVMeOF controllers immediately after setup) [passed]
      runtime  3.415s  ...  4.329s
  nvme/038 (test deletion of NVMeOF subsystem without enabling) [passed]
      runtime  0.064s  ...  0.114s
  nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O)
  nvme/040 (test nvme fabrics controller reset/disconnect operation during I/O) [passed]
      runtime  7.854s  ...  8.058srk/blktests/results/tmpdir.nvme.040.qkb': Directory not empty
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  modprobe: FATAL: Module loop is in use.
  nvme/041 (Create authenticated connections)                 
      runtime  0.302s  ...
  WARNING: Test did not clean up fc device: nvme2
  nvme/041 (Create authenticated connections)                  [failed]
      runtime  0.302s  ...  0.435s
      --- tests/nvme/041.out      2023-04-18 17:43:18.167745976 +0200
      +++ /home/wagi/work/blktests/results/nodev/nvme/041.out.bad 2023-06-06 14:07:48.836546072 +0200
      @@ -1,6 +1,8 @@
       Running nvme/041
       Test unauthenticated connection (should fail)
      +failed to lookup subsystem for controller nvme2
       NQN:blktests-subsystem-1 disconnected 0 controller(s)
       Test authenticated connection
      -NQN:blktests-subsystem-1 disconnected 1 controller(s)
      +failed to lookup subsystem for controller nvme2
      ...
      (Run 'diff -u tests/nvme/041.out /home/wagi/work/blktests/results/nodev/nvme/041.out.bad' to see the entire diff)
  nvme/042 (Test dhchap key types for authenticated connections) [passed]
      runtime  4.832s  ...  4.489s
  nvme/043 (Test hash and DH group variations for authenticated connections) [passed]
      runtime  2.698s  ...  9.475s
  nvme/044 (Test bi-directional authentication)               
      runtime  1.205s  ...
  WARNING: Test did not clean up fc device: nvme2
  nvme/044 (Test bi-directional authentication)                [failed]
      runtime  1.205s  ...  1.048s
      --- tests/nvme/044.out      2023-04-18 17:43:18.167745976 +0200
      +++ /home/wagi/work/blktests/results/nodev/nvme/044.out.bad 2023-06-06 14:08:08.524660313 +0200
      @@ -2,9 +2,12 @@
       Test host authentication
       NQN:blktests-subsystem-1 disconnected 1 controller(s)
       Test invalid ctrl authentication (should fail)
      +failed to lookup subsystem for controller nvme2
       NQN:blktests-subsystem-1 disconnected 0 controller(s)
       Test valid ctrl authentication
      -NQN:blktests-subsystem-1 disconnected 1 controller(s)
      ...
      (Run 'diff -u tests/nvme/044.out /home/wagi/work/blktests/results/nodev/nvme/044.out.bad' to see the entire diff)
  nvme/045 (Test re-authentication)                            [passed]
      runtime  6.133s  ...  4.742s
  nvme/047 (test different queue types for fabric transports)  [not run]
      nvme_trtype=fc is not supported in this test
  nvme/048 (Test queue count changes on reconnect)             [failed]
      runtime  16.300s  ...  16.278s
      --- tests/nvme/048.out      2023-06-05 17:20:59.370208331 +0200
      +++ /home/wagi/work/blktests/results/nodev/nvme/048.out.bad 2023-06-06 14:08:33.720806513 +0200
      @@ -1,3 +1,7 @@
       Running nvme/048
      +expected state "connecting" not  reached within 5 seconds
      +FAIL
      +expected state "connecting" not  reached within 5 seconds
      +FAIL
       NQN:blktests-subsystem-1 disconnected 1 controller(s)
       Test complete
  

Daniel Wagner (1):
  nvmet-fcloop: Do not wait on completion when unregister fails

changes:
v4:
  - drop module unload before fc resource cleanup patches
 
v3:
  - do not unlink rport twice
  - https://lore.kernel.org/linux-nvme/20230418130159.11075-1-dwagner@suse.de/

v2:
  - added additional fixes
  - https://lore.kernel.org/linux-nvme/20230411120718.14477-1-dwagner@suse.de/

v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20230411092209.12719-1-dwagner@suse.de/
  
 drivers/nvme/target/fcloop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails
  2023-06-06 12:24 [PATCH v4 0/1] nvmet-fcloop: unblock module removal Daniel Wagner
@ 2023-06-06 12:24 ` Daniel Wagner
  2023-06-07  7:36   ` Christoph Hellwig
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Daniel Wagner @ 2023-06-06 12:24 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	Daniel Wagner

nvme_fc_unregister_localport() returns an error code when the localport
pointer is NULL or the localport has already been unregistered. The
localport is either in the ONLINE state (all resources allocated) or has
already been put into the DELETED state.

In the error case we will never receive a wakeup call, and thus any
caller, e.g. module unload, will hang.
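The failure mode can be sketched as a toy user-space model (not the kernel
code itself: unregister_localport() stands in for
nvme_fc_unregister_localport(), and the unreg_done flag stands in for the
lport->unreg_done completion):

```c
/*
 * Toy model of the fix.  When unregistration fails, the delete
 * callback never runs, so the completion is never signalled and an
 * unconditional wait would hang forever -- hence the !ret guard.
 */
#include <assert.h>
#include <stdbool.h>

static bool unreg_done;

/* Stub: succeeds (and eventually signals the completion) only for a
 * valid, still-registered localport. */
static int unregister_localport(bool valid)
{
	if (!valid)
		return -1;	/* NULL or already-deleted localport */
	unreg_done = true;	/* delete callback signals completion */
	return 0;
}

/* Mirrors __wait_localport_unreg() with the fix applied: only wait
 * when unregistration actually started. */
static int try_unregister_and_wait(bool valid)
{
	int ret = unregister_localport(valid);

	if (!ret)
		assert(unreg_done);	/* wait_for_completion() here */

	return ret;
}
```

Without the guard, the second call path (invalid localport) would block on a
completion that nobody will ever complete, which is exactly the modprobe hang
in the trace above.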

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fcloop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 1ab3d900f2bf..c65a73433c05 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -1166,7 +1166,8 @@ __wait_localport_unreg(struct fcloop_lport *lport)
 
 	ret = nvme_fc_unregister_localport(lport->localport);
 
-	wait_for_completion(&lport->unreg_done);
+	if (!ret)
+		wait_for_completion(&lport->unreg_done);
 
 	kfree(lport);
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails
  2023-06-06 12:24 ` [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
@ 2023-06-07  7:36   ` Christoph Hellwig
  2023-06-07 15:09   ` Sagi Grimberg
  2023-06-09 15:48   ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2023-06-07  7:36 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails
  2023-06-06 12:24 ` [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
  2023-06-07  7:36   ` Christoph Hellwig
@ 2023-06-07 15:09   ` Sagi Grimberg
  2023-06-09 15:48   ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Sagi Grimberg @ 2023-06-07 15:09 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails
  2023-06-06 12:24 ` [PATCH v4 1/1] nvmet-fcloop: Do not wait on completion when unregister fails Daniel Wagner
  2023-06-07  7:36   ` Christoph Hellwig
  2023-06-07 15:09   ` Sagi Grimberg
@ 2023-06-09 15:48   ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2023-06-09 15:48 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke

Thanks, applied for nvme-6.5.

^ permalink raw reply	[flat|nested] 5+ messages in thread
