[PATCH 0/2] btrfs: fix data corruption/hang/rsv leak in subpage zoned cases


* [PATCH 0/2] btrfs: fix data corruption/hang/rsv leak in subpage zoned cases
@ 2024-03-05 22:15 Qu Wenruo
From: Qu Wenruo @ 2024-03-05 22:15 UTC
  To: linux-btrfs

[REPO]
https://github.com/adam900710/linux/tree/subpage_delalloc

The repo includes the subpage delalloc rework; without it, subpage zoned
won't work at all.

Although my previous subpage delalloc rework fixes quite a lot of
crashes in subpage + zoned support, it's still pretty easy to trigger an
rsv leak using single-threaded fsstress.

It turns out that this is not a simple rsv leak: certain dirty data
ranges never get written back and are just skipped with their dirty
flags cleared, so it's no wonder that this leads to an rsv leak.

The root cause is again extent_write_locked_range() doing weird,
subpage-incompatible things, especially clearing the page dirty flag
for the whole page, which leaves __extent_writepage_io() unable to
locate any dirty ranges to submit.

The first patch solves the problem, while the 2nd patch is a cleanup,
as we will never hit the error in the current subpage + zoned cases.

Qu Wenruo (2):
  btrfs: do not clear page dirty inside extent_write_locked_range()
  btrfs: make extent_write_locked_range() to handle subpage writeback
    correctly

 fs/btrfs/extent_io.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.44.0


* [PATCH 1/2] btrfs: do not clear page dirty inside extent_write_locked_range()
From: Qu Wenruo @ 2024-03-05 22:15 UTC
  To: linux-btrfs

[BUG]
For the subpage + zoned case, btrfs can easily hang with the following
workload, even with the previous subpage delalloc rework:

 # mkfs.btrfs -f $dev
 # mount $dev $mnt
 # xfs_io -f -c "pwrite 32k 128k" $mnt/foobar
 # umount $mnt

The system would hang at unmount due to unfinished ordered extents.

Here $dev is a tcmu-runner emulated zoned HDD with a max zone append
size of 64K.

[CAUSE]
There is a bug in extent_write_locked_range() (well, I'm already
surprised by how much subpage-incompatible code is inside that
function):

- If @pages_dirty is true, we will clear the page dirty flag for the
  whole page

  This means that, for the above case, since the max zone append size
  is 64K, we get an ordered extent sized 64K, resulting in the
  following writeback range:

  0      32K        64K       96K       128K    192K    256K
  |      |//////////|/////////|/////////|///////|       |
          \     Write back    /

  |///| = subpage dirty range

  Since we clear the dirty flag for the page at 64K before entering
  __extent_writepage_io(), we end up with the following page flags:

  0      32K        64K       96K       128K    192K    256K
  |      |          |         |         |///////|       |

  Then for the next delalloc range run, we would create an ordered
  extent for the range [96K, 192K) and write back the range.

  But since the whole 2nd page has no dirty flag set, we only submit
  the range [128K, 192K), while our ordered extent is still 64K in
  size, so it would never be properly finished.
  This also means that dirty data is not properly submitted for
  writeback, which would cause data corruption.

This bug only affects the subpage + zoned case.
For the non-subpage + zoned case, find_next_dirty_byte() just returns
the whole page no matter whether it has dirty flags or not.

For the subpage + non-zoned case, we never go into
extent_write_locked_range().

[FIX]
Just do not clear the page dirty flag at all, as
__extent_writepage_io() would do a more accurate, subpage-compatible
clear of the page dirty flag anyway.
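
To make the difference concrete, here is a minimal user-space sketch
(not kernel code) of the scenario drawn in the diagrams above. It is a
toy model under stated assumptions: 64K pages, 4K sectors, one dirty bit
per sector, and a whole-page clear that simply wipes every dirty bit of
that page. All sim_* names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_KIB         1024u
#define SIM_PAGE_SIZE   (64u * SIM_KIB)   /* assumed page size */
#define SIM_SECTOR_SIZE (4u * SIM_KIB)    /* assumed sector size */
#define SIM_FILE_SIZE   (256u * SIM_KIB)  /* covers the diagram above */
#define SIM_NR_SECTORS  (SIM_FILE_SIZE / SIM_SECTOR_SIZE)

/* One dirty bit per sector; a stand-in for the subpage dirty bitmap. */
struct sim_file {
	bool dirty[SIM_NR_SECTORS];
};

/* Set or clear the dirty bit of every sector in [start, end). */
static void sim_mark(struct sim_file *f, uint32_t start, uint32_t end,
		     bool dirty)
{
	assert(start % SIM_SECTOR_SIZE == 0 && end % SIM_SECTOR_SIZE == 0);
	assert(start <= end && end <= SIM_FILE_SIZE);
	for (uint32_t s = start / SIM_SECTOR_SIZE; s < end / SIM_SECTOR_SIZE; s++)
		f->dirty[s] = dirty;
}

/* Count the bytes in [start, end) that are still marked dirty. */
static uint32_t sim_dirty_bytes(const struct sim_file *f, uint32_t start,
				uint32_t end)
{
	uint32_t bytes = 0;

	assert(start % SIM_SECTOR_SIZE == 0 && end % SIM_SECTOR_SIZE == 0);
	assert(start <= end && end <= SIM_FILE_SIZE);
	for (uint32_t s = start / SIM_SECTOR_SIZE; s < end / SIM_SECTOR_SIZE; s++)
		if (f->dirty[s])
			bytes += SIM_SECTOR_SIZE;
	return bytes;
}

/*
 * Replay the first writeback run [32K, 96K) and report how many dirty
 * bytes the second run would still find in [96K, 192K).
 */
uint32_t sim_second_run_dirty_bytes(bool clear_whole_page)
{
	struct sim_file f;

	memset(&f, 0, sizeof(f));
	sim_mark(&f, 32 * SIM_KIB, 192 * SIM_KIB, true);

	/* Old behavior (modelled): wipe the whole page at 64K. */
	if (clear_whole_page)
		sim_mark(&f, SIM_PAGE_SIZE, 2 * SIM_PAGE_SIZE, false);

	/* Sectors that are really written back lose their dirty bit. */
	sim_mark(&f, 32 * SIM_KIB, 96 * SIM_KIB, false);

	return sim_dirty_bytes(&f, 96 * SIM_KIB, 192 * SIM_KIB);
}
```

In this model, sim_second_run_dirty_bytes(true) only finds the 64K at
[128K, 192K), while sim_second_run_dirty_bytes(false) finds the full
96K at [96K, 192K), which is what the second run is expected to submit.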

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fb63055f42f3..bdd0e29ba848 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2290,10 +2290,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
 		ASSERT(PageLocked(page));
-		if (pages_dirty && page != locked_page) {
+		if (pages_dirty && page != locked_page)
 			ASSERT(PageDirty(page));
-			clear_page_dirty_for_io(page);
-		}
 
 		ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
 					    &bio_ctrl, i_size, &nr);
-- 
2.44.0



* [PATCH 2/2] btrfs: make extent_write_locked_range() to handle subpage writeback correctly
From: Qu Wenruo @ 2024-03-05 22:15 UTC
  To: linux-btrfs

When extent_write_locked_range() generates an inline extent, it sets
and finishes the writeback for the whole page.

Although this is currently safe since subpage disables inline creation,
for the sake of consistency, let it use the subpage helpers to set and
clear the writeback flags.
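
As an illustration of why range-based helpers are the more consistent
choice, here is a small user-space sketch (not kernel code). It assumes
a toy model in which a page counts as under writeback as long as any of
its sectors is, with 64K pages and 4K sectors. The wb_* names are
hypothetical, and the scenario itself is hypothetical too, since as
noted above it cannot be hit today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WB_PAGE_SIZE   (64u * 1024u)  /* assumed page size */
#define WB_SECTOR_SIZE (4u * 1024u)   /* assumed sector size */
#define WB_SECTORS     (WB_PAGE_SIZE / WB_SECTOR_SIZE)

/* Toy page: per-sector writeback bits plus a page-level flag. */
struct wb_page {
	bool sector_writeback[WB_SECTORS];
	bool page_writeback;
};

/* Range helper: mark [start, start + len) as under writeback. */
static void wb_set_range(struct wb_page *p, uint32_t start, uint32_t len)
{
	assert(start % WB_SECTOR_SIZE == 0 && len % WB_SECTOR_SIZE == 0);
	assert(start + len <= WB_PAGE_SIZE);
	for (uint32_t s = start / WB_SECTOR_SIZE;
	     s < (start + len) / WB_SECTOR_SIZE; s++)
		p->sector_writeback[s] = true;
	p->page_writeback = true;
}

/*
 * Range helper: finish writeback for [start, start + len). The page
 * flag is dropped only once no sector is left under writeback.
 */
static void wb_clear_range(struct wb_page *p, uint32_t start, uint32_t len)
{
	assert(start % WB_SECTOR_SIZE == 0 && len % WB_SECTOR_SIZE == 0);
	assert(start + len <= WB_PAGE_SIZE);
	for (uint32_t s = start / WB_SECTOR_SIZE;
	     s < (start + len) / WB_SECTOR_SIZE; s++)
		p->sector_writeback[s] = false;
	for (uint32_t s = 0; s < WB_SECTORS; s++)
		if (p->sector_writeback[s])
			return;
	p->page_writeback = false;
}

/*
 * Hypothetical scenario: [0, 32K) of the page has IO in flight, then
 * [32K, 64K) is set and immediately finished. Returns whether the page
 * is still flagged as under writeback afterwards.
 */
bool wb_page_still_under_writeback(bool use_range_helpers)
{
	struct wb_page p;

	memset(&p, 0, sizeof(p));
	wb_set_range(&p, 0, 32u * 1024u);

	if (use_range_helpers) {
		wb_set_range(&p, 32u * 1024u, 32u * 1024u);
		wb_clear_range(&p, 32u * 1024u, 32u * 1024u);
	} else {
		/* Whole-page variant: flip only the page-level flag. */
		p.page_writeback = true;
		p.page_writeback = false;
	}
	return p.page_writeback;
}
```

With the range helpers the page stays flagged until the in-flight
[0, 32K) range finishes; the whole-page variant drops the flag early.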

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bdd0e29ba848..0a194dd659e7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2286,6 +2286,7 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 		u64 cur_end = min(round_down(cur, PAGE_SIZE) + PAGE_SIZE - 1, end);
 		u32 cur_len = cur_end + 1 - cur;
 		struct page *page;
+		struct folio *folio;
 		int nr = 0;
 
 		page = find_get_page(mapping, cur >> PAGE_SHIFT);
@@ -2300,8 +2301,9 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
 
 		/* Make sure the mapping tag for page dirty gets cleared. */
 		if (nr == 0) {
-			set_page_writeback(page);
-			end_page_writeback(page);
+			folio = page_folio(page);
+			btrfs_folio_set_writeback(fs_info, folio, cur, cur_len);
+			btrfs_folio_clear_writeback(fs_info, folio, cur, cur_len);
 		}
 		if (ret) {
 			btrfs_mark_ordered_io_finished(BTRFS_I(inode), page,
-- 
2.44.0


