Linux-NFS Archive mirror
 help / color / mirror / Atom feed
From: Zhihao Cheng <chengzhihao1@huawei.com>
To: Trond Myklebust <trondmy@kernel.org>, <anna@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<yangerkun@huawei.com>, Li Lingfeng <lilingfeng3@huawei.com>
Subject: Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
Date: Wed, 6 May 2026 09:25:59 +0800	[thread overview]
Message-ID: <c6dd0b1e-ace8-0d50-91f6-f8669d7b41b7@huawei.com> (raw)
In-Reply-To: <863ecd04-01be-114e-f625-47b6a918a546@huawei.com>

在 2026/4/23 17:05, Zhihao Cheng 写道:
> 在 2026/4/22 20:38, Trond Myklebust 写道:
>> On Wed, 2026-04-22 at 14:55 +0800, Zhihao Cheng wrote:
>>> 在 2026/4/22 14:44, Zhihao Cheng 写道:
>>> Add lilingfeng3@huawei.com
>>>> NFS server restart causes client to enter an infinite loop during
>>>> state
>>>> recovery. The state manager gets stuck in NFS4CLNT_RECLAIM_NOGRACE
>>>> processing,
>>>> with the server repeatedly returning NFS4ERR_GRACE for each file
>>>> iteration.
>>>> This problem is reported in [1].
>>>>
>>>> Trigger sequence:
>>>>    1. Client opens 2 files. After server reboot, client enters
>>>>       nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period
>>>> and returns
>>>>       NFS4ERR_NO_GRACE, causing client to set
>>>> NFS4CLNT_RECLAIM_NOGRACE.
>>>>    2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover
>>>> first file.
>>>>       Server reboots again, open request returns NFS4ERR_BADSESSION,
>>>> client
>>>>       sets NFS4CLNT_SESSION_RESET.
>>>>    3. nfs4_reset_session calls nfs4_proc_create_session which fails
>>>> with
>>>>       ETIMEDOUT due to network¹ÊÕÏ, nfs4_handle_reclaim_lease_error
>>>> sets
>>>>       NFS4CLNT_LEASE_EXPIRED but does NOT set
>>>> NFS4CLNT_RECLAIM_REBOOT.
>>>>    4. When nfs4_reclaim_lease runs, because NFS4CLNT_RECLAIM_NOGRACE
>>>> is already
>>>>       set, it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug,
>>>> modified by
>>>>       commit b42353ff8d346 ("NFSv4.1: Clean up
>>>> nfs4_reclaim_lease")).
>>>>    5. Server never receives RECLAIM_COMPLETE, so cl_flags lacks
>>>>       NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent
>>>> files,
>>>>       server always returns nfserr_grace, causing infinite retry
>>>> loop.
>>>>
>>>> Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
>>>> NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
>>>> RECLAIM_COMPLETE to the server first, allowing subsequent nograce
>>>> recovery to proceed.
>>>>
>>>> Fetch a reproducer in [2].
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/ 
>>>>
>>>> [2] https://bugzilla.kernel.org/show_bug.cgi?id=221399
>>>>
>>>> Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
>>>> Cc: stable@vger.kernel.org
>>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>> Closes:
>>>> https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/ 
>>>>
>>>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>>>> ---
>>>>    fs/nfs/nfs4state.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index 305a772e5497..817327e73d88 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct
>>>> nfs_client *clp)
>>>>            return nfs4_handle_reclaim_lease_error(clp,
>>>> status);
>>>>        if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH,
>>>> &clp->cl_state))
>>>>            nfs4_state_start_reclaim_nograce(clp);
>>>> -    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
>>>> +    else
>>>>            set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
>>>>        clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>>>>        clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>>
>>>
>> This will cause the client to try to do reboot recovery in a situation
>> where it isn't allowed to do so by the spec. We should never be setting
>> NFS4CLNT_RECLAIM_REBOOT if NFS4CLNT_RECLAIM_NOGRACE is already set.
> Hi Trond, the client can only do reclaim reboot during the grace 
> period(after server reboot), according to [1], any reclaim type messages 
> should be rejected by server for NOGRACE state, am I understanding right?
> [1] https://datatracker.ietf.org/doc/html/rfc5661#section-8.4.2.1
>>
>> One solution would be to just immediately call
>> nfs4_state_end_reclaim_reboot() if NFS4CLNT_RECLAIM_NOGRACE is set.
>>
friendly ping
> I tried with your suggestion, and the problem still happens:
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..5b6cb3f7ff2e 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2014,6 +2014,8 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
>                  nfs4_state_start_reclaim_nograce(clp);
>          if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
>                  set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
> +       else
> +               nfs4_state_end_reclaim_reboot(clp);
>          clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>          clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>          return 0;
> 
> The nfs4_state_end_reclaim_reboot sends RECLAIM_COMPLETE only if the clp 
> has NFS4CLNT_RECLAIM_REBOOT flag, but it does not.
> 
> The root cause is that the client has identified an incorrect server 
> status after the server's second reboot(since step 2). I think the 
> client should do reclaim reboot every time the server restarts, so we 
> should let client know the server reboots again, can we take following 
> methods?
> 1. Clear NFS4CLNT_RECLAIM_NOGRACE flag if client knows that the server 
> restart while doing nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE). Client 
> could identify the server state by NFS4ERR_BADSESSION retcode:
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..3f73dc3919a2 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp, 
> const struct nfs4_state_recov
>                           clp->cl_hostname, lost_locks);
>                   set_bit(ops->owner_flag_bit, &sp->so_flags);
>                   nfs4_put_state_owner(sp);
> +                if (ops == clp->cl_mvops->nograce_recovery_ops &&
> +                    status == -NFS4ERR_BADSESSION)
> +                    clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
>                   status = nfs4_recovery_handle_error(clp, status);
>                   nfs4_free_state_owners(&freeme);
>                   return (status != 0) ? status : -EAGAIN;
> 
> 2. Client knows the server restarts by error code 
> NFS4ERR_STALE_CLIENTID(nfs4_proc_create_session -> 
> nfs4_handle_reclaim_lease_error), but nfs4_reset_session won't retry if 
> the error code is ETIMEDOUT(eg. network failure in step 3), we should 
> let client retry nfs4_reset_session if network failure.
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..62fd57e602d6 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2459,6 +2459,8 @@ static int nfs4_reset_session(struct nfs_client *clp)
>       if (status) {
>           dprintk("%s: session reset failed with status %d for server 
> %s!\n",
>               __func__, status, clp->cl_hostname);
> +        if (status == -ETIMEDOUT)
> +            goto out;
>           status = nfs4_handle_reclaim_lease_error(clp, status);
>           goto out;
>       }
> @@ -2552,6 +2554,8 @@ static void nfs4_state_manager(struct nfs_client 
> *clp)
>           if (test_and_clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state)) {
>               section = "reset session";
>               status = nfs4_reset_session(clp);
> +            if (status == -ETIMEDOUT)
> +                continue;
>               if (test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
>                   continue;
>               if (status < 0)
> 3. Conditionaly set NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease, if 
> the client has ever been recovered failed by 
> nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE)->BAD_SESSION, mark clp with 
> NFS4CLNT_RECLAIM_REBOOT.
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..82eb650cc135 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp, 
> const struct nfs4_state_recov
>                           clp->cl_hostname, lost_locks);
>                   set_bit(ops->owner_flag_bit, &sp->so_flags);
>                   nfs4_put_state_owner(sp);
> +                if (ops == clp->cl_mvops->nograce_recovery_ops &&
> +                    status == -NFS4ERR_BADSESSION)
> +                    __set_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
>                   status = nfs4_recovery_handle_error(clp, status);
>                   nfs4_free_state_owners(&freeme);
>                   return (status != 0) ? status : -EAGAIN;
> @@ -2012,7 +2015,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
>           return nfs4_handle_reclaim_lease_error(clp, status);
>       if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, 
> &clp->cl_state))
>           nfs4_state_start_reclaim_nograce(clp);
> -    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
> +    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state) || 
> test_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags))
>           set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
>       clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>       clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> @@ -2624,6 +2627,7 @@ static void nfs4_state_manager(struct nfs_client 
> *clp)
>                   continue;
>               if (status < 0)
>                   goto out_error;
> +            clear_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
>               clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
>           }
> 
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 4daee27fa5eb..8d36a727513c 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -51,6 +51,7 @@ struct nfs_client {
>   #define NFS_CS_REUSEPORT    8        /* - reuse src port on reconnect */
>   #define NFS_CS_PNFS        9        /* - Server used for pnfs */
>   #define NFS_CS_NETUNREACH_FATAL    10        /* - ENETUNREACH errors 
> are fatal */
> +#define NFS_CS_NOGRACE_REBOOT    11        /* - Server reboot during 
> nograce recovery */
>       struct sockaddr_storage    cl_addr;    /* server identifier */
>       size_t            cl_addrlen;
>       char *            cl_hostname;    /* hostname of server */
> 
> .


      reply	other threads:[~2026-05-06  1:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-22  6:44 [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period Zhihao Cheng
2026-04-22  6:55 ` Zhihao Cheng
2026-04-22 12:38   ` Trond Myklebust
2026-04-23  9:05     ` Zhihao Cheng
2026-05-06  1:25       ` Zhihao Cheng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6dd0b1e-ace8-0d50-91f6-f8669d7b41b7@huawei.com \
    --to=chengzhihao1@huawei.com \
    --cc=anna@kernel.org \
    --cc=lilingfeng3@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@kernel.org \
    --cc=yangerkun@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).