From: yangerkun <yangerkun@huawei.com>
To: Jeff Layton <jlayton@kernel.org>, <trondmy@kernel.org>,
<anna@kernel.org>, <chuck.lever@oracle.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<yangerkun@huaweicloud.com>, <lilingfeng3@huawei.com>,
<zhangjian496@h-partners.com>, <yi.zhang@huawei.com>
Subject: Re: [RFC PATCH] nfs: use nfsi->rwsem to protect traversal of the file lock list
Date: Fri, 8 May 2026 16:33:30 +0800 [thread overview]
Message-ID: <403e3267-70ef-4fb3-9700-0a14d28abc4c@huawei.com> (raw)
In-Reply-To: <163aebb3-4233-4aaa-872c-c36aa3fcb3af@huawei.com>
Gently ping...
在 2026/4/16 11:01, yangerkun 写道:
> Hi Anna and Trond,
>
> Could you please help check if there are any issues with this patch, and
> if there are none, could you help merge it in?
>
> Thanks,
> Erkun.
>
> 在 2026/3/9 22:09, Jeff Layton 写道:
>> On Thu, 2026-02-26 at 09:22 +0800, Yang Erkun wrote:
>>> Lingfeng identified a bug and suggested two solutions, but both appear
>>> to have issues.
>>>
>>> Generally, we cannot release flc_lock while iterating over the file lock
>>> list to avoid use-after-free (UAF) problems with file locks. However,
>>> functions like nfs_delegation_claim_locks and nfs4_reclaim_locks cannot
>>> adhere to this rule because recover_lock or nfs4_lock_delegation_recall
>>> may take a long time. To resolve this, NFS switches to using nfsi->rwsem
>>> for the same protection, and nfs_reclaim_locks follows this approach.
>>> Although nfs_delegation_claim_locks uses so_delegreturn_mutex instead,
>>> this is inadequate since a single inode can have multiple nfs4_state
>>> instances. Therefore, the fix is to also use nfsi->rwsem in this case.
>>>
>>> Furthermore, after commit c69899a17ca4 ("NFSv4: Update of VFS byte range
>>> lock must be atomic with the stateid update"), the functions
>>> nfs4_locku_done and nfs4_lock_done also break this rule because they
>>> call locks_lock_inode_wait without holding nfsi->rwsem. Simply adding
>>> this protection could cause many deadlocks, so instead, the call to
>>> locks_lock_inode_wait is moved into _nfs4_proc_setlk. Regarding the bug
>>> fixed by commit c69899a17ca4 ("NFSv4: Update of VFS byte range
>>> lock must be atomic with the stateid update"), it has been resolved
>>> after commit 0460253913e5 ("NFSv4: nfs4_do_open() is incorrectly
>>> triggering
>>> state recovery") because all slots are drained before calling
>>> nfs4_do_reclaim, which prevents concurrent stateid changes along this
>>> path.
>>> Also, nfs_delegation_claim_locks does not cause this concurrency either
>>> since when _nfs4_proc_setlk is called with NFS_DELEGATED_STATE, no
>>> RPC is
>>> sent, so nfs4_lock_done is not called. Therefore,
>>> nfs4_lock_delegation_recall from nfs_delegation_claim_locks is the first
>>> time the stateid is set.
>>>
>>> Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
>>> Closes: https://lore.kernel.org/all/20250419085709.1452492-1-
>>> lilingfeng3@huawei.com/
>>> Closes: https://lore.kernel.org/all/20250715030559.2906634-1-
>>> lilingfeng3@huawei.com/
>>> Fixes: c69899a17ca4 ("NFSv4: Update of VFS byte range lock must be
>>> atomic with the stateid update")
>>> Signed-off-by: Yang Erkun <yangerkun@huawei.com>
>>> ---
>>> fs/nfs/delegation.c | 9 ++++++++-
>>> fs/nfs/nfs4proc.c | 22 +++++++++++-----------
>>> include/linux/nfs_xdr.h | 1 -
>>> 3 files changed, 19 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
>>> index 122fb3f14ffb..9546d2195c25 100644
>>> --- a/fs/nfs/delegation.c
>>> +++ b/fs/nfs/delegation.c
>>> @@ -173,6 +173,7 @@ int nfs4_check_delegation(struct inode *inode,
>>> fmode_t type)
>>> static int nfs_delegation_claim_locks(struct nfs4_state *state,
>>> const nfs4_stateid *stateid)
>>> {
>>> struct inode *inode = state->inode;
>>> + struct nfs_inode *nfsi = NFS_I(inode);
>>> struct file_lock *fl;
>>> struct file_lock_context *flctx = locks_inode_context(inode);
>>> struct list_head *list;
>>> @@ -182,6 +183,9 @@ static int nfs_delegation_claim_locks(struct
>>> nfs4_state *state, const nfs4_state
>>> goto out;
>>> list = &flctx->flc_posix;
>>> +
>>> + /* Guard against reclaim and new lock/unlock calls */
>>> + down_write(&nfsi->rwsem);
>>> spin_lock(&flctx->flc_lock);
>>> restart:
>>> for_each_file_lock(fl, list) {
>>> @@ -189,8 +193,10 @@ static int nfs_delegation_claim_locks(struct
>>> nfs4_state *state, const nfs4_state
>>> continue;
>>> spin_unlock(&flctx->flc_lock);
>>> status = nfs4_lock_delegation_recall(fl, state, stateid);
>>> - if (status < 0)
>>> + if (status < 0) {
>>> + up_write(&nfsi->rwsem);
>>> goto out;
>>> + }
>>> spin_lock(&flctx->flc_lock);
>>> }
>>> if (list == &flctx->flc_posix) {
>>> @@ -198,6 +204,7 @@ static int nfs_delegation_claim_locks(struct
>>> nfs4_state *state, const nfs4_state
>>> goto restart;
>>> }
>>> spin_unlock(&flctx->flc_lock);
>>> + up_write(&nfsi->rwsem);
>>> out:
>>> return status;
>>> }
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 91bcf67bd743..9d6fbca8798b 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -7076,7 +7076,6 @@ static void nfs4_locku_done(struct rpc_task
>>> *task, void *data)
>>> switch (task->tk_status) {
>>> case 0:
>>> renew_lease(calldata->server, calldata->timestamp);
>>> - locks_lock_inode_wait(calldata->lsp->ls_state->inode,
>>> &calldata->fl);
>>> if (nfs4_update_lock_stateid(calldata->lsp,
>>> &calldata->res.stateid))
>>> break;
>>> @@ -7344,11 +7343,6 @@ static void nfs4_lock_done(struct rpc_task
>>> *task, void *calldata)
>>> case 0:
>>> renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
>>> data->timestamp);
>>> - if (data->arg.new_lock && !data->cancelled) {
>>> - data->fl.c.flc_flags &= ~(FL_SLEEP | FL_ACCESS);
>>> - if (locks_lock_inode_wait(lsp->ls_state->inode, &data-
>>> >fl) < 0)
>>> - goto out_restart;
>>> - }
>>> if (data->arg.new_lock_owner != 0) {
>>> nfs_confirm_seqid(&lsp->ls_seqid, 0);
>>> nfs4_stateid_copy(&lsp->ls_stateid, &data->res.stateid);
>>> @@ -7459,11 +7453,10 @@ static int _nfs4_do_setlk(struct nfs4_state
>>> *state, int cmd, struct file_lock *f
>>> msg.rpc_argp = &data->arg;
>>> msg.rpc_resp = &data->res;
>>> task_setup_data.callback_data = data;
>>> - if (recovery_type > NFS_LOCK_NEW) {
>>> - if (recovery_type == NFS_LOCK_RECLAIM)
>>> - data->arg.reclaim = NFS_LOCK_RECLAIM;
>>> - } else
>>> - data->arg.new_lock = 1;
>>> +
>>> + if (recovery_type == NFS_LOCK_RECLAIM)
>>> + data->arg.reclaim = NFS_LOCK_RECLAIM;
>>> +
>>> task = rpc_run_task(&task_setup_data);
>>> if (IS_ERR(task))
>>> return PTR_ERR(task);
>>> @@ -7573,6 +7566,13 @@ static int _nfs4_proc_setlk(struct nfs4_state
>>> *state, int cmd, struct file_lock
>>> up_read(&nfsi->rwsem);
>>> mutex_unlock(&sp->so_delegreturn_mutex);
>>> status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
>>> + if (status)
>>> + goto out;
>>> +
>>> + down_read(&nfsi->rwsem);
>>> + request->c.flc_flags &= ~(FL_SLEEP | FL_ACCESS);
>>> + status = locks_lock_inode_wait(state->inode, request);
>>> + up_read(&nfsi->rwsem);
>>> out:
>>> request->c.flc_flags = flags;
>>> return status;
>>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>>> index ff1f12aa73d2..9599ad15c3ad 100644
>>> --- a/include/linux/nfs_xdr.h
>>> +++ b/include/linux/nfs_xdr.h
>>> @@ -580,7 +580,6 @@ struct nfs_lock_args {
>>> struct nfs_lowner lock_owner;
>>> unsigned char block : 1;
>>> unsigned char reclaim : 1;
>>> - unsigned char new_lock : 1;
>>> unsigned char new_lock_owner : 1;
>>> };
>>
>> Nice work!
>>
>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>
>>
>
next prev parent reply other threads:[~2026-05-08 8:33 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-26 1:22 [RFC PATCH] nfs: use nfsi->rwsem to protect traversal of the file lock list Yang Erkun
2026-02-26 8:34 ` yangerkun
2026-03-09 13:03 ` yangerkun
2026-03-09 14:09 ` Jeff Layton
2026-03-10 1:33 ` yangerkun
2026-04-16 3:01 ` yangerkun
2026-05-08 8:33 ` yangerkun [this message]
2026-03-09 14:12 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=403e3267-70ef-4fb3-9700-0a14d28abc4c@huawei.com \
--to=yangerkun@huawei.com \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=lilingfeng3@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@kernel.org \
--cc=yangerkun@huaweicloud.com \
--cc=yi.zhang@huawei.com \
--cc=zhangjian496@h-partners.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).