From: kwb <wangbing.kuang@shopee.com>
To: sagi@grimberg.me
Cc: axboe@fb.com, chunguang.xu@shopee.com, hch@lst.de,
	james.smart@broadcom.com, kbusch@kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	wangbing.kuang@shopee.com
Subject: Re: [Bug Report] nvme connect deadlock in allocating tag
Date: Sun, 28 Apr 2024 18:25:27 +0800
Message-ID: <20240428102527.37462-1-wangbing.kuang@shopee.com>
In-Reply-To: <d200fc7c-c781-49f1-8277-bdb5d537b1f4@grimberg.me>

>On 28/04/2024 12:16, Wangbing Kuang wrote:
>> "The error_recovery work should unquiesce the admin_q, which should fail
>> fast all pending admin commands,
>> so it is unclear to me how the connect process gets stuck."
>> I think the reason is: the command can be unquiesce but the tag cannot be
>> return until command success.
>
>The error recovery also cancels all pending requests. See 
>nvme_cancel_admin_tagset

nvme_cancel_admin_tagset can cancel the requests that are pending before the
admin queue is stopped, but it cannot cancel requests that are issued after
that, before the next reconnect attempt.
The timeline is:
recovery fails (we can reproduce this by hanging io for a longer time)
-> reconnect delay
-> multiple "nvme list" commands are issued (they use up the tagset)
-> reconnect starts (waits for a tag when calling nvme_enable_ctrl and nvme_wait_ready)
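
To make the timeline above concrete, here is a minimal toy model of it in
Python. It is not kernel code: TagPool, simulate and all parameter names are
hypothetical, and it assumes that commands issued during the reconnect delay
keep their tags without completing, as the timeline describes. The second
call only illustrates, as an assumption, what would change if the
reconnect-path command could draw from a reserved tag; it is not a claim
about what any actual fix does.

```python
# Toy model of a fixed-size admin tag pool. NOT kernel code; every name
# here is hypothetical and exists only to illustrate the timeline.

class TagPool:
    def __init__(self, total, reserved=0):
        # Normal and reserved tags are tracked as two separate counters.
        self.free_normal = total - reserved
        self.free_reserved = reserved

    def try_get(self, reserved=False):
        """Return True if a tag was taken, False if the caller would wait."""
        if reserved:
            if self.free_reserved > 0:
                self.free_reserved -= 1
                return True
            return False
        if self.free_normal > 0:
            self.free_normal -= 1
            return True
        return False


def simulate(total_tags, reserved_tags, pending_admin_cmds,
             reconnect_uses_reserved):
    pool = TagPool(total_tags, reserved_tags)

    # Reconnect delay: admin commands (e.g. from repeated "nvme list")
    # each take a normal tag and, by assumption, never give it back.
    held = sum(pool.try_get() for _ in range(pending_admin_cmds))

    # Reconnect start: the reconnect path needs one tag for its command.
    reconnect_ok = pool.try_get(reserved=reconnect_uses_reserved)
    return held, reconnect_ok


if __name__ == "__main__":
    # All 4 tags are held, so the reconnect cannot get one.
    print(simulate(4, 0, 10, reconnect_uses_reserved=False))
    # Hypothetical variant: one tag is set aside for the reconnect path.
    print(simulate(4, 1, 10, reconnect_uses_reserved=True))
```

In the first case the reconnect would wait for a tag that is never returned,
which is the stuck state I am describing.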


>>
>> "What is step (2) - make nvme io timeout to recover the connection?"
>> I use spdk-nvmf-target for backend.  It is easy to set read/write
>> nvmf-target io  hang and unhang.  So I just set the io hang for over 30
>> seconds, then trigger linux-nvmf-host trigger io timeout event. then io
>> timeout will trigger connection recover.
>> by the way, I use multipath=0
>
>Interesting, does this happen with multipath=Y ?
>I didn't expect people to be using multipath=0 for fabrics in the past few
>years.

Not sure, I did not test with multipath=Y. We chose multipath=0 because it
involves less code and we need only one path.

>>
>> "Is this reproducing with upstream nvme? or is this some distro kernel
>> where this happens?"
>> it is reproduced in a kernel based from v5.15, but I think this is common
>> error.
>
>It would be beneficial to verify this.

OK. Testing on upstream needs more time; for now we can only verify it on v5.15.

>Do you have the below patch applied?
>de105068fead ("nvme: fix reconnection fail due to reserved tag allocation")

Yes, my modification is inspired by that commit. Chunguang.xu is my colleague.
