From: Zhihao Cheng <chengzhihao1@huawei.com>
To: Trond Myklebust <trondmy@kernel.org>, <anna@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<yangerkun@huawei.com>, Li Lingfeng <lilingfeng3@huawei.com>
Subject: Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
Date: Wed, 6 May 2026 09:25:59 +0800 [thread overview]
Message-ID: <c6dd0b1e-ace8-0d50-91f6-f8669d7b41b7@huawei.com> (raw)
In-Reply-To: <863ecd04-01be-114e-f625-47b6a918a546@huawei.com>
在 2026/4/23 17:05, Zhihao Cheng 写道:
> 在 2026/4/22 20:38, Trond Myklebust 写道:
>> On Wed, 2026-04-22 at 14:55 +0800, Zhihao Cheng wrote:
>>> 在 2026/4/22 14:44, Zhihao Cheng 写道:
>>> Add lilingfeng3@huawei.com
>>>> NFS server restart causes client to enter an infinite loop during
>>>> state
>>>> recovery. The state manager gets stuck in NFS4CLNT_RECLAIM_NOGRACE
>>>> processing,
>>>> with the server repeatedly returning NFS4ERR_GRACE for each file
>>>> iteration.
>>>> This problem is reported in [1].
>>>>
>>>> Trigger sequence:
>>>> 1. Client opens 2 files. After server reboot, client enters
>>>> nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period
>>>> and returns
>>>> NFS4ERR_NO_GRACE, causing client to set
>>>> NFS4CLNT_RECLAIM_NOGRACE.
>>>> 2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover
>>>> first file.
>>>> Server reboots again, open request returns NFS4ERR_BADSESSION,
>>>> client
>>>> sets NFS4CLNT_SESSION_RESET.
>>>> 3. nfs4_reset_session calls nfs4_proc_create_session which fails
>>>> with
>>>> ETIMEDOUT due to network¹ÊÕÏ, nfs4_handle_reclaim_lease_error
>>>> sets
>>>> NFS4CLNT_LEASE_EXPIRED but does NOT set
>>>> NFS4CLNT_RECLAIM_REBOOT.
>>>> 4. When nfs4_reclaim_lease runs, because NFS4CLNT_RECLAIM_NOGRACE
>>>> is already
>>>> set, it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug,
>>>> modified by
>>>> commit b42353ff8d346 ("NFSv4.1: Clean up
>>>> nfs4_reclaim_lease")).
>>>> 5. Server never receives RECLAIM_COMPLETE, so cl_flags lacks
>>>> NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent
>>>> files,
>>>> server always returns nfserr_grace, causing infinite retry
>>>> loop.
>>>>
>>>> Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
>>>> NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
>>>> RECLAIM_COMPLETE to the server first, allowing subsequent nograce
>>>> recovery to proceed.
>>>>
>>>> Fetch a reproducer in [2].
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
>>>>
>>>> [2] https://bugzilla.kernel.org/show_bug.cgi?id=221399
>>>>
>>>> Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
>>>> Cc: stable@vger.kernel.org
>>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>> Closes:
>>>> https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
>>>>
>>>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>>>> ---
>>>> fs/nfs/nfs4state.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index 305a772e5497..817327e73d88 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct
>>>> nfs_client *clp)
>>>> return nfs4_handle_reclaim_lease_error(clp,
>>>> status);
>>>> if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH,
>>>> &clp->cl_state))
>>>> nfs4_state_start_reclaim_nograce(clp);
>>>> - if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
>>>> + else
>>>> set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
>>>> clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
>>>> clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>>
>>>
>> This will cause the client to try to do reboot recovery in a situation
>> where it isn't allowed to do so by the spec. We should never be setting
>> NFS4CLNT_RECLAIM_REBOOT if NFS4CLNT_RECLAIM_NOGRACE is already set.
> Hi Trond, the client can only do reclaim reboot during the grace
> period(after server reboot), according to [1], any reclaim type messages
> should be rejected by server for NOGRACE state, am I understanding right?
> [1] https://datatracker.ietf.org/doc/html/rfc5661#section-8.4.2.1
>>
>> One solution would be to just immediately call
>> nfs4_state_end_reclaim_reboot() if NFS4CLNT_RECLAIM_NOGRACE is set.
>>
friendly ping
> I tried with your suggestion, and the problem still happens:
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..5b6cb3f7ff2e 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2014,6 +2014,8 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
> nfs4_state_start_reclaim_nograce(clp);
> if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
> set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
> + else
> + nfs4_state_end_reclaim_reboot(clp);
> clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
> clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> return 0;
>
> The nfs4_state_end_reclaim_reboot sends RECLAIM_COMPLETE only if the clp
> has NFS4CLNT_RECLAIM_REBOOT flag, but it does not.
>
> The root cause is that the client has identified an incorrect server
> status after the server's second reboot(since step 2). I think the
> client should do reclaim reboot every time the server restarts, so we
> should let client know the server reboots again, can we take following
> methods?
> 1. Clear NFS4CLNT_RECLAIM_NOGRACE flag if client knows that the server
> restart while doing nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE). Client
> could identify the server state by NFS4ERR_BADSESSION retcode:
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..3f73dc3919a2 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp,
> const struct nfs4_state_recov
> clp->cl_hostname, lost_locks);
> set_bit(ops->owner_flag_bit, &sp->so_flags);
> nfs4_put_state_owner(sp);
> + if (ops == clp->cl_mvops->nograce_recovery_ops &&
> + status == -NFS4ERR_BADSESSION)
> + clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
> status = nfs4_recovery_handle_error(clp, status);
> nfs4_free_state_owners(&freeme);
> return (status != 0) ? status : -EAGAIN;
>
> 2. Client knows the server restarts by error code
> NFS4ERR_STALE_CLIENTID(nfs4_proc_create_session ->
> nfs4_handle_reclaim_lease_error), but nfs4_reset_session won't retry if
> the error code is ETIMEDOUT(eg. network failure in step 3), we should
> let client retry nfs4_reset_session if network failure.
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..62fd57e602d6 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2459,6 +2459,8 @@ static int nfs4_reset_session(struct nfs_client *clp)
> if (status) {
> dprintk("%s: session reset failed with status %d for server
> %s!\n",
> __func__, status, clp->cl_hostname);
> + if (status == -ETIMEDOUT)
> + goto out;
> status = nfs4_handle_reclaim_lease_error(clp, status);
> goto out;
> }
> @@ -2552,6 +2554,8 @@ static void nfs4_state_manager(struct nfs_client
> *clp)
> if (test_and_clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state)) {
> section = "reset session";
> status = nfs4_reset_session(clp);
> + if (status == -ETIMEDOUT)
> + continue;
> if (test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
> continue;
> if (status < 0)
> 3. Conditionaly set NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease, if
> the client has ever been recovered failed by
> nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE)->BAD_SESSION, mark clp with
> NFS4CLNT_RECLAIM_REBOOT.
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 5044bb4c870f..82eb650cc135 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp,
> const struct nfs4_state_recov
> clp->cl_hostname, lost_locks);
> set_bit(ops->owner_flag_bit, &sp->so_flags);
> nfs4_put_state_owner(sp);
> + if (ops == clp->cl_mvops->nograce_recovery_ops &&
> + status == -NFS4ERR_BADSESSION)
> + __set_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
> status = nfs4_recovery_handle_error(clp, status);
> nfs4_free_state_owners(&freeme);
> return (status != 0) ? status : -EAGAIN;
> @@ -2012,7 +2015,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
> return nfs4_handle_reclaim_lease_error(clp, status);
> if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH,
> &clp->cl_state))
> nfs4_state_start_reclaim_nograce(clp);
> - if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
> + if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state) ||
> test_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags))
> set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
> clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
> clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> @@ -2624,6 +2627,7 @@ static void nfs4_state_manager(struct nfs_client
> *clp)
> continue;
> if (status < 0)
> goto out_error;
> + clear_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
> }
>
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 4daee27fa5eb..8d36a727513c 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -51,6 +51,7 @@ struct nfs_client {
> #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
> #define NFS_CS_PNFS 9 /* - Server used for pnfs */
> #define NFS_CS_NETUNREACH_FATAL 10 /* - ENETUNREACH errors
> are fatal */
> +#define NFS_CS_NOGRACE_REBOOT 11 /* - Server reboot during
> nograce recovery */
> struct sockaddr_storage cl_addr; /* server identifier */
> size_t cl_addrlen;
> char * cl_hostname; /* hostname of server */
>
> .
prev parent reply other threads:[~2026-05-06 1:26 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-22 6:44 [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period Zhihao Cheng
2026-04-22 6:55 ` Zhihao Cheng
2026-04-22 12:38 ` Trond Myklebust
2026-04-23 9:05 ` Zhihao Cheng
2026-05-06 1:25 ` Zhihao Cheng [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c6dd0b1e-ace8-0d50-91f6-f8669d7b41b7@huawei.com \
--to=chengzhihao1@huawei.com \
--cc=anna@kernel.org \
--cc=lilingfeng3@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@kernel.org \
--cc=yangerkun@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).