Linux-NFS Archive mirror
 help / color / mirror / Atom feed
From: Dan Aloni <dan.aloni@vastdata.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler
Date: Tue, 21 May 2024 17:36:38 +0300	[thread overview]
Message-ID: <20240521143638.34u3kn6nm5qslaao@gmail.com> (raw)
In-Reply-To: <fa1a77fee6403454444fffce839924157778df95.camel@kernel.org>

On 2024-05-21 09:22:46, Jeff Layton wrote:
> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
> > There is an inherent race where a symlink file may have been overriden
> > (by a different client) between lookup and readlink, resulting in a
> > spurious EIO error returned to userspace. Fix this by propagating back
> > ESTALE errors such that the vfs will retry the lookup/get_link (similar
> > to nfs4_file_open) at least once.
[..]
> FWIW, I think we shouldn't try to do any retry looping on ESTALE beyond
> what we already do.
> 
> Yes, we can sometimes trigger ESTALE errors to bubble up to userland if
> we really thrash the underlying filesystem when testing, but I think
> that's actually desirable:
> 
> If you have real workloads across multiple machines that are racing
> with other that tightly, then you should probably be using some sort of
> locking or other synchronization. If it's clever enough that it
> doesn''t need that, then it should be able to deal with the occasional
> ESTALE error by retrying on its own.

We saw an issue in a real workload employing the method I describe in
the following test case, where the user was surprised getting an error.

It's a symlink atomicity semantics case, where there's a symlink that is
frequently being overridden using a rename. This use case works well
with local file systems and with some other network file systems
implementations (this was also noticeable as other options where
tested).

There is fixed set of regular files `test_file_{1,2,3}`, and a 'shunt'
symlink that keeps getting recreated to one of them:

    while true; do
        i=1;
	while (( i <= 3 )) ; do
	    ln -s -f test_file_$i shunt
	    i=$((i + 1))
	done
    done

Behind the scenes `ln` creates a symlink with a random name, then
performs the `rename` system call to override `shunt`. When another
client is trying to call `open` via the symlink so it would necessarily
resolve to one of the regular files client-side. The previous FH of `shunt`
becomes stale and sometimes fails this test.

It is why we saw a retry loop being worthwhile to implement, whether
inside the NFS client or outside in the VFS layer, the justification for
it was to prevent the workload from breaking when moving between file
systems.

-- 
Dan Aloni

  reply	other threads:[~2024-05-21 14:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-21 12:58 [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler Sagi Grimberg
2024-05-21 13:22 ` Jeff Layton
2024-05-21 14:36   ` Dan Aloni [this message]
2024-05-21 15:05   ` Sagi Grimberg
2024-05-21 15:13     ` Trond Myklebust
2024-05-21 15:24       ` Chuck Lever III
2024-05-22  4:41         ` Dan Aloni
2024-05-22 12:11           ` Trond Myklebust
2024-05-21 16:09       ` Sagi Grimberg
2024-05-22 19:40         ` Sagi Grimberg
2024-05-22 21:04           ` Trond Myklebust
2024-05-22 21:19             ` Sagi Grimberg
2024-05-30 18:08               ` Sagi Grimberg
2024-05-30 18:21                 ` Trond Myklebust
2024-05-31  4:24                   ` Sagi Grimberg
2024-05-22 21:20             ` Jeff Layton
2024-05-21 14:02 ` Chuck Lever
2024-05-21 14:59   ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240521143638.34u3kn6nm5qslaao@gmail.com \
    --to=dan.aloni@vastdata.com \
    --cc=chuck.lever@oracle.com \
    --cc=hch@lst.de \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).