Linux-XFS Archive mirror
From: Leah Rumancik <leah.rumancik@gmail.com>
To: linux-xfs@vger.kernel.org
Cc: amir73il@gmail.com, chandan.babu@oracle.com, fred@cloudflare.com,
	mngyadam@amazon.com, "Darrick J. Wong" <djwong@kernel.org>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Dave Chinner <dchinner@redhat.com>,
	Leah Rumancik <leah.rumancik@gmail.com>
Subject: [PATCH 6.1 CANDIDATE 14/24] xfs: invalidate block device page cache during unmount
Date: Fri, 26 Apr 2024 14:55:01 -0700
Message-ID: <20240426215512.2673806-15-leah.rumancik@gmail.com>
In-Reply-To: <20240426215512.2673806-1-leah.rumancik@gmail.com>

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 032e160305f6872e590c77f11896fb28365c6d6c ]

Every now and then I see fstests failures on aarch64 (64k pages) that
trigger on the following sequence:

mkfs.xfs $dev
mount $dev $mnt
touch $mnt/a
umount $mnt
xfs_db -c 'path /a' -c 'print' $dev

99% of the time this succeeds, but every now and then xfs_db cannot find
/a and fails.  This turns out to be a race involving udev/blkid, the
page cache for the block device, and the xfs_db process.

udev is triggered whenever anyone closes a block device or unmounts it.
The default udev rules invoke blkid to read the fs super and create
symlinks to the bdev under /dev/disk.  For this, it uses buffered reads
through the page cache.
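For illustration only, the sketch below shows the kind of buffered read involved. It reads the first four bytes of a device (or image file) through the page cache and compares them with the XFS superblock magic "XFSB" (0x58465342). The helper name looks_like_xfs() is hypothetical. It is not libblkid's actual probing code, which checks far more than the magic.

```c
#define _XOPEN_SOURCE 700 /* expose pread() in strict C modes */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if offset 0 of fd holds the XFS superblock
 * magic "XFSB", 0 if it does not, or -1 on a short or failed read.
 * Without O_DIRECT, pread() is served from (and populates) the page cache,
 * so a concurrent reader of the same device may see the same cached page.
 */
static int looks_like_xfs(int fd)
{
	char magic[4];
	ssize_t n = pread(fd, magic, sizeof(magic), 0);

	if (n != (ssize_t)sizeof(magic))
		return -1;
	return memcmp(magic, "XFSB", sizeof(magic)) == 0;
}
```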

xfs_db also uses buffered reads to examine metadata.  There is no
coordination between xfs_db and udev, which means that they can run
concurrently.  Note there is no coordination between the kernel and
blkid either.

On a system with 64k pages, the page cache can cache the superblock and
the root inode (and hence the root dir) in the same 64k page.  If
udev spawns blkid after the mkfs and the system is busy enough that it
is still running when xfs_db starts up, they'll both read from the same
page in the pagecache.
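The page-sharing condition is plain integer division. Two byte offsets on the device share a page-cache page when they have the same page index. The sketch below assumes a hypothetical root inode location of 32 KiB purely for illustration. The real location depends on the geometry mkfs chooses. With 64 KiB pages, offsets 0 and 32768 both fall in page 0. With 4 KiB pages, they fall in pages 0 and 8.

```c
#include <stdint.h>

/* Index of the page-cache page that holds a given byte offset. */
static uint64_t page_index(uint64_t byte_offset, uint64_t page_size)
{
	return byte_offset / page_size;
}

/* Nonzero if two byte offsets land in the same page-cache page. */
static int same_page(uint64_t a, uint64_t b, uint64_t page_size)
{
	return page_index(a, page_size) == page_index(b, page_size);
}
```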

The unmount writes updated inode metadata to disk directly.  The XFS
buffer cache does not use the bdev pagecache, nor does it invalidate the
pagecache on umount.  If the above scenario occurs, the pagecache no
longer reflects what's on disk, xfs_db reads the stale metadata, and
fails to find /a.  Most of the time this succeeds because closing a bdev
invalidates the page cache, but when processes race, everyone loses.

Fix the problem by invalidating the bdev pagecache after flushing the
bdev, so that xfs_db will see up-to-date metadata.
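The patch performs the flush-then-invalidate step in the kernel by calling invalidate_bdev() after blkdev_issue_flush(). A roughly analogous step can be written in userspace, sketched below. It calls fsync() on a descriptor and then posix_fadvise(POSIX_FADV_DONTNEED), which asks the kernel to drop clean cached pages so that later reads go back to the device. This is not part of the patch. The helper name flush_and_drop_cache() is hypothetical, and POSIX_FADV_DONTNEED is only advisory: it does not evict pages that are dirty or mapped. The flush comes first for the same reason it does in the kernel, because only clean pages can be dropped.

```c
#define _XOPEN_SOURCE 700 /* expose fsync() and posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush, then drop, cached pages for an open file or
 * block device. Returns 0 on success, -1 if fsync() fails (errno is set),
 * or the positive error number that posix_fadvise() returns directly.
 */
static int flush_and_drop_cache(int fd)
{
	if (fsync(fd) != 0)
		return -1;
	/* offset 0 with length 0 covers the whole file */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```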

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 fs/xfs/xfs_buf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index dde346450952..54c774af6e1c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1945,6 +1945,7 @@ xfs_free_buftarg(
 	list_lru_destroy(&btp->bt_lru);
 
 	blkdev_issue_flush(btp->bt_bdev);
+	invalidate_bdev(btp->bt_bdev);
 	fs_put_dax(btp->bt_daxdev, btp->bt_mount);
 
 	kmem_free(btp);
-- 
2.44.0.769.g3c40516874-goog


Thread overview: 26+ messages
2024-04-26 21:54 [PATCH 6.1 CANDIDATE 00/24] more backport proposals for linux-6.1.y Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 01/24] xfs: write page faults in iomap are not buffered writes Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 02/24] xfs: punching delalloc extents on write failure is racy Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 03/24] xfs: use byte ranges for write cleanup ranges Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 04/24] xfs,iomap: move delalloc punching to iomap Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 05/24] iomap: buffered write failure should not truncate the page cache Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 06/24] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 07/24] iomap: write iomap validity checks Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 08/24] xfs: use iomap_valid method to detect stale cached iomaps Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 09/24] xfs: drop write error injection is unfixable, remove it Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 10/24] xfs: fix off-by-one-block in xfs_discard_folio() Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 11/24] xfs: fix incorrect error-out in xfs_remove Leah Rumancik
2024-04-26 21:54 ` [PATCH 6.1 CANDIDATE 12/24] xfs: fix sb write verify for lazysbcount Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 13/24] xfs: fix incorrect i_nlink caused by inode racing Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 14/24] xfs: invalidate block device page cache during unmount Leah Rumancik [this message]
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 15/24] xfs: attach dquots to inode before reading data/cow fork mappings Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 16/24] xfs: wait iclog complete before tearing down AIL Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 17/24] xfs: fix super block buf log item UAF during force shutdown Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 18/24] xfs: hoist refcount record merge predicates Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 19/24] xfs: estimate post-merge refcounts correctly Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 20/24] xfs: invalidate xfs_bufs when allocating cow extents Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 21/24] xfs: allow inode inactivation during a ro mount log recovery Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 22/24] xfs: fix log recovery when unknown rocompat bits are set Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 23/24] xfs: get root inode correctly at bulkstat Leah Rumancik
2024-04-26 21:55 ` [PATCH 6.1 CANDIDATE 24/24] xfs: short circuit xfs_growfs_data_private() if delta is zero Leah Rumancik
2024-04-26 23:14 ` [PATCH 6.1 CANDIDATE 00/24] more backport proposals for linux-6.1.y Darrick J. Wong
