From: Nikolay Amiantov <ab@fmap.me>
To: linux-nfs@vger.kernel.org
Subject: Help tracking down a possible race
Date: Thu, 16 Apr 2026 00:24:09 +0700
Message-ID: <42bcb43b-5782-4353-b3f7-6e68d919f3c6@fmap.me>
Hi all,
tl;dr: weird test results in
https://github.com/abbradar/nfs_stale_cache_test possibly showing a race
in NFS, or that I'm stupid :)
Disclaimer: I don't usually use NFS at all, and I'm now down a rabbit
hole while researching a bug in a FUSE-based network FS (JuiceFS
[1]). Since I'm in way over my head here (my first time with the
VFS/FUSE/NFS kernel subsystems, some prior kernelspace experience), I
may not understand what I'm talking about at all, so feel free to send
me away if I'm missing something obvious.
The race I'm talking about goes like this:
* On host A, a writer appends to a file. In my MRE it just writes 0xAA
  one byte at a time;
* On host B, we simultaneously:
  + Read the file, also one byte at a time, possibly from multiple
    processes/threads at once;
  + Hammer the same file with stat() calls.
In this case you may randomly read a zero byte instead of the byte you
are expecting to read.
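
For illustration, the host B side could look roughly like the sketch
below. This is not the code from the linked repository: the function
names `first_bad_byte` and `hammer_stat` are hypothetical, and the
sketch assumes the writer only ever appends 0xAA bytes.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std modes */
#include <assert.h>
#include <fcntl.h>          /* open(), for callers */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read [start, end) one byte at a time with pread() and return the
 * offset of the first byte that differs from `expected`. Returns -1 if
 * every byte read matched; the scan also stops at EOF or a read error.
 */
off_t first_bad_byte(int fd, off_t start, off_t end, unsigned char expected)
{
    assert(fd >= 0 && start <= end);
    for (off_t pos = start; pos < end; pos++) {
        unsigned char b;

        if (pread(fd, &b, 1, pos) != 1)
            break;              /* EOF or error: nothing more to check */
        if (b != expected)
            return pos;         /* e.g. a stale zero byte */
    }
    return -1;
}

/* Call stat() on `path` repeatedly; 0 on success, -1 on first failure. */
int hammer_stat(const char *path, int iterations)
{
    struct stat st;

    for (int i = 0; i < iterations; i++)
        if (stat(path, &st) != 0)
            return -1;
    return 0;
}
```

One or more threads would loop over `first_bad_byte(fd, pos, pos + n,
0xAA)` on the growing file while another thread runs `hammer_stat()` on
the same path; any offset returned is a byte that was read as something
other than 0xAA.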
After hunting this bug in JuiceFS, I went down to the FUSE level and
managed to implement an MRE [2]. The setup is similar, only instead of a
writer there is a FUSE FS presenting a slowly growing file.
After much (disclosure: LLM-assisted) research of the kernel code, the
race, as I understand it, is actually relevant to *any* network FS
where updates can bypass the local cache layer. I checked whether it
happens with NFS, and indeed, I can randomly observe zero bytes:
https://github.com/abbradar/nfs_stale_cache_test . I'm repeating my
understanding of the issue here for convenience:
------
When a file grows remotely, the cached page containing the old EOF is
zero-filled beyond the old size. Those zeroes are harmless while the
inode size is <= the old size (they are beyond EOF), but become stale
once the inode size is updated to reflect the remote growth: the remote
host wrote real data there, but the local cache still has the old
zero-fill.
In filemap_read() (mm/filemap.c) we have:
```
do {
    ...
    error = filemap_get_pages(iocb, ...); // (1) get cached folios
    ...
    isize = i_size_read(inode);           // (2) get file size
    ...
    // (3) copy from folio to user, capped at isize
} while (...);
```
If the inode size grows between (1) and (2), the race happens: the copy
from the old page is capped at the new size, so userspace reads zeroes
where there should be actual data.
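
The interleaving can be replayed deterministically in a small userspace
model. This is only a sketch under simplifying assumptions: one page,
a file that grows by one 0xAA byte, and hypothetical names throughout
(`model_inode`, `model_page`, `fill_page`, `copy_capped`,
`read_byte_at_old_eof`); none of this is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical stand-ins for an inode and one cached page. */
struct model_inode { size_t size; };
struct model_page  { unsigned char data[PAGE_SZ]; };

/* Fill the page from a remote file of 0xAA bytes, zero-filling past EOF. */
void fill_page(struct model_page *pg, size_t remote_size)
{
    assert(remote_size <= PAGE_SZ);
    memset(pg->data, 0xAA, remote_size);                      /* real data */
    memset(pg->data + remote_size, 0, PAGE_SZ - remote_size); /* zero tail */
}

/* Steps (2) and (3): sample the size, then copy capped at that size. */
size_t copy_capped(const struct model_page *pg, const struct model_inode *ino,
                   size_t pos, unsigned char *out, size_t len)
{
    size_t isize = ino->size;                          /* (2) */
    size_t limit = isize < PAGE_SZ ? isize : PAGE_SZ;

    if (pos >= limit)
        return 0;
    if (len > limit - pos)
        len = limit - pos;
    memcpy(out, pg->data + pos, len);                  /* (3) */
    return len;
}

/*
 * Read the byte at the old EOF after the remote file grew by one byte.
 * racy != 0: the size update lands between (1) and (2).
 * racy == 0: the size update and a cache refresh both finish before (1).
 * Returns the byte read, or -1 if nothing was read.
 */
int read_byte_at_old_eof(size_t old_size, int racy)
{
    struct model_inode ino = { .size = old_size };
    struct model_page cached;
    const struct model_page *pg;
    unsigned char b = 0;

    assert(old_size < PAGE_SZ);
    fill_page(&cached, old_size);     /* cache filled before the growth */
    if (!racy) {
        ino.size = old_size + 1;      /* growth noticed ...             */
        fill_page(&cached, ino.size); /* ... and page re-read first     */
    }
    pg = &cached;                     /* (1) reader picks up the page   */
    if (racy)
        ino.size = old_size + 1;      /* growth noticed after (1)       */
    return copy_capped(pg, &ino, old_size, &b, 1) == 1 ? b : -1;
}
```

In this model `read_byte_at_old_eof(10, 1)` returns 0x00 (the stale
zero-fill), while `read_byte_at_old_eof(10, 0)` returns 0xAA.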
To trigger this bug, something must change the inode size in parallel
with a read, and the change must not come from a local `write()`, since
writes are coherent with reads via the cache layer. In a network FS
this may happen on getattr, when we discover that the remote file has
grown and update the inode's size. When this happens we need to mark
the cached pages as stale, but there is no way to "lock" the page and
the inode size simultaneously, so the race cannot be fixed just by
invalidating the cache in getattr.
NFS already marks the cache stale: it sets NFS_INO_INVALID_DATA, then
invalidates the cache as needed before the next read. However, the
window between (1) and (2) is still there.
------
Apart from the patches introducing NFS_INO_INVALIDATING I can't find any
prior discussion of this issue for FUSE-based, NFS, CIFS or any other
network FS.
I'd be glad for any help refuting or confirming my findings, or
generally pointing me in the right direction.
Cheers,
Nikolay.
1: https://github.com/juicedata/juicefs/issues/5038
2: https://github.com/abbradar/fuse_growtest