All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* optimize COW end I/O remapping v2
@ 2024-04-29  4:49 Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

Hi all,

this series optimizes how we handle unmapping the old data fork
extents when finishing COW I/O.  To get there I also did a bunch
of cleanups for helpers called by this code.

Changes since v1:
 - remove the racy if_bytes in xfs_reflink_end_cow_extent enirely
 - pass an unsigned value to xfs_quota_unreserve_blkres
 - rename xfs_iext_count_upgrade to xfs_iext_count_ensure
 - only check for XFS_ERRTAG_REDUCE_MAX_IEXTENTS once in
   xfs_iext_count_ensure
 - roll the transaction after 128 EFIs
 - use a different name for the irec of the remapped extent in
   xfs_reflink_end_cow_extent

Diffstat:
 libxfs/xfs_attr.c       |    5 -
 libxfs/xfs_bmap.c       |   21 +-----
 libxfs/xfs_bmap.h       |    2 
 libxfs/xfs_inode_fork.c |   57 +++++++----------
 libxfs/xfs_inode_fork.h |    6 -
 libxfs/xfs_trans_resv.h |    7 ++
 scrub/newbt.c           |    2 
 scrub/reap.c            |    2 
 scrub/repair.h          |    8 --
 xfs_aops.c              |    6 -
 xfs_bmap_item.c         |    4 -
 xfs_bmap_util.c         |   33 ++-------
 xfs_bmap_util.h         |    2 
 xfs_dquot.c             |    5 -
 xfs_iomap.c             |   13 +--
 xfs_quota.h             |   23 ++----
 xfs_reflink.c           |  160 ++++++++++++++++++++++++++----------------------
 xfs_rtalloc.c           |    5 -
 18 files changed, 159 insertions(+), 202 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:26   ` Darrick J. Wong
  2024-04-29  4:49 ` [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

Defer the extent counter size upgrade until we know we're going to
modify the extent mapping.  This also defers dirtying the transaction
and will allow us safely back out later in the function in later
changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7da0e8f961d351..9ce37d366534c3 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -751,14 +751,6 @@ xfs_reflink_end_cow_extent(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
-			XFS_IEXT_REFLINK_END_COW_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip,
-				XFS_IEXT_REFLINK_END_COW_CNT);
-	if (error)
-		goto out_cancel;
-
 	/*
 	 * In case of racing, overlapping AIO writes no COW extents might be
 	 * left by the time I/O completes for the loser of the race.  In that
@@ -787,6 +779,14 @@ xfs_reflink_end_cow_extent(
 	del = got;
 	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
 
+	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+			XFS_IEXT_REFLINK_END_COW_CNT);
+	if (error == -EFBIG)
+		error = xfs_iext_count_upgrade(tp, ip,
+				XFS_IEXT_REFLINK_END_COW_CNT);
+	if (error)
+		goto out_cancel;
+
 	/* Grab the corresponding mapping in the data fork. */
 	nmaps = 1;
 	error = xfs_bmapi_read(ip, del.br_startoff, del.br_blockcount, &data,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:27   ` Darrick J. Wong
  2024-04-29  4:49 ` [PATCH 3/8] xfs: consolidate the xfs_quota_reserve_blkres definitions Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

Accessing if_bytes without the ilock is racy.  Remove the initial
if_bytes == 0 check in xfs_reflink_end_cow_extent and let
ext_iext_lookup_extent fail for this case after we've taken the ilock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9ce37d366534c3..0ab2ef5b58f6c4 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -731,12 +731,6 @@ xfs_reflink_end_cow_extent(
 	int			nmaps;
 	int			error;
 
-	/* No COW extents?  That's easy! */
-	if (ifp->if_bytes == 0) {
-		*offset_fsb = end_fsb;
-		return 0;
-	}
-
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
 			XFS_TRANS_RESERVE, &tp);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/8] xfs: consolidate the xfs_quota_reserve_blkres definitions
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

xfs_trans_reserve_quota_nblks is already stubbed out if quota support
is disabled, no need for an extra xfs_quota_reserve_blkres stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_quota.h | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 85a4ae1a17f672..621ea9d7cf06d9 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -123,12 +123,6 @@ extern int xfs_qm_newmount(struct xfs_mount *, uint *, uint *);
 extern void xfs_qm_mount_quotas(struct xfs_mount *);
 extern void xfs_qm_unmount(struct xfs_mount *);
 extern void xfs_qm_unmount_quotas(struct xfs_mount *);
-
-static inline int
-xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks)
-{
-	return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false);
-}
 bool xfs_inode_near_dquot_enforcement(struct xfs_inode *ip, xfs_dqtype_t type);
 
 # ifdef CONFIG_XFS_LIVE_HOOKS
@@ -187,12 +181,6 @@ static inline int xfs_trans_reserve_quota_bydquots(struct xfs_trans *tp,
 	return 0;
 }
 
-static inline int
-xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks)
-{
-	return 0;
-}
-
 static inline int
 xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_dquot *udqp,
 		struct xfs_dquot *gdqp, struct xfs_dquot *pdqp, int64_t dblocks)
@@ -221,6 +209,12 @@ xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_dquot *udqp,
 
 #endif /* CONFIG_XFS_QUOTA */
 
+static inline int
+xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks)
+{
+	return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false);
+}
+
 static inline int
 xfs_quota_unreserve_blkres(struct xfs_inode *ip, int64_t blocks)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-04-29  4:49 ` [PATCH 3/8] xfs: consolidate the xfs_quota_reserve_blkres definitions Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:29   ` Darrick J. Wong
  2024-04-29  4:49 ` [PATCH 5/8] xfs: simplify iext overflow checking and upgrade Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

Unreserving quotas can't fail due to quota limits, and we'll notice a
shut down file system a bit later in all the callers anyway.  Return
void and remove the error checking and propagation in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 16 +++++-----------
 fs/xfs/libxfs/xfs_bmap.h |  2 +-
 fs/xfs/xfs_aops.c        |  6 +-----
 fs/xfs/xfs_bmap_util.c   |  9 +++------
 fs/xfs/xfs_bmap_util.h   |  2 +-
 fs/xfs/xfs_iomap.c       |  4 ++--
 fs/xfs/xfs_quota.h       |  7 ++++---
 fs/xfs/xfs_reflink.c     | 11 +++--------
 8 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6053f5e5c71eec..68e80e8eaaeebe 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4870,7 +4870,7 @@ xfs_bmap_split_indlen(
 	*indlen2 = len2;
 }
 
-int
+void
 xfs_bmap_del_extent_delay(
 	struct xfs_inode	*ip,
 	int			whichfork,
@@ -4886,7 +4886,6 @@ xfs_bmap_del_extent_delay(
 	xfs_filblks_t		got_indlen, new_indlen, stolen = 0;
 	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
 	uint64_t		fdblocks;
-	int			error = 0;
 	bool			isrt;
 
 	XFS_STATS_INC(mp, xs_del_exlist);
@@ -4906,9 +4905,7 @@ xfs_bmap_del_extent_delay(
 	 * sb counters as we might have to borrow some blocks for the
 	 * indirect block accounting.
 	 */
-	error = xfs_quota_unreserve_blkres(ip, del->br_blockcount);
-	if (error)
-		return error;
+	xfs_quota_unreserve_blkres(ip, del->br_blockcount);
 	ip->i_delayed_blks -= del->br_blockcount;
 
 	if (got->br_startoff == del->br_startoff)
@@ -5006,7 +5003,6 @@ xfs_bmap_del_extent_delay(
 
 	xfs_add_fdblocks(mp, fdblocks);
 	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
-	return error;
 }
 
 void
@@ -5564,18 +5560,16 @@ __xfs_bunmapi(
 
 delete:
 		if (wasdel) {
-			error = xfs_bmap_del_extent_delay(ip, whichfork, &icur,
-					&got, &del);
+			xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
 		} else {
 			error = xfs_bmap_del_extent_real(ip, tp, &icur, cur,
 					&del, &tmp_logflags, whichfork,
 					flags);
 			logflags |= tmp_logflags;
+			if (error)
+				goto error0;
 		}
 
-		if (error)
-			goto error0;
-
 		end = del.br_startoff - 1;
 nodelete:
 		/*
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index e98849eb9bbae3..667b0c2b33d1d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -202,7 +202,7 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags,
 		xfs_extnum_t nexts, int *done);
-int	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
+void	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
 		struct xfs_bmbt_irec *del);
 void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3f428620ebf2a3..c51bc17f5cfa03 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -469,7 +469,6 @@ xfs_discard_folio(
 {
 	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error;
 
 	if (xfs_is_shutdown(mp))
 		return;
@@ -483,11 +482,8 @@ xfs_discard_folio(
 	 * byte of the next folio. Hence the end offset is only dependent on the
 	 * folio itself and not the start offset that is passed in.
 	 */
-	error = xfs_bmap_punch_delalloc_range(ip, pos,
+	xfs_bmap_punch_delalloc_range(ip, pos,
 				folio_pos(folio) + folio_size(folio));
-
-	if (error && !xfs_is_shutdown(mp))
-		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 }
 
 static const struct iomap_writeback_ops xfs_writeback_ops = {
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 53aa90a0ee3a85..df370d7112dc54 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -440,7 +440,7 @@ xfs_getbmap(
  * if the ranges only partially overlap them, so it is up to the caller to
  * ensure that partial blocks are not passed in.
  */
-int
+void
 xfs_bmap_punch_delalloc_range(
 	struct xfs_inode	*ip,
 	xfs_off_t		start_byte,
@@ -452,7 +452,6 @@ xfs_bmap_punch_delalloc_range(
 	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
 	struct xfs_bmbt_irec	got, del;
 	struct xfs_iext_cursor	icur;
-	int			error = 0;
 
 	ASSERT(!xfs_need_iread_extents(ifp));
 
@@ -476,15 +475,13 @@ xfs_bmap_punch_delalloc_range(
 			continue;
 		}
 
-		error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
-						  &got, &del);
-		if (error || !xfs_iext_get_extent(ifp, &icur, &got))
+		xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur, &got, &del);
+		if (!xfs_iext_get_extent(ifp, &icur, &got))
 			break;
 	}
 
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return error;
 }
 
 /*
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 77ecbb753ef207..51f84d8ff372fa 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -30,7 +30,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
 }
 #endif /* CONFIG_XFS_RT */
 
-int	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
+void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
 		xfs_off_t start_byte, xfs_off_t end_byte);
 
 struct kgetbmap {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9ce0f6b9df93e6..c06fca2e751c7c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1211,8 +1211,8 @@ xfs_buffered_write_delalloc_punch(
 	loff_t			offset,
 	loff_t			length)
 {
-	return xfs_bmap_punch_delalloc_range(XFS_I(inode), offset,
-			offset + length);
+	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
+	return 0;
 }
 
 static int
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
index 621ea9d7cf06d9..23d71a55bbc006 100644
--- a/fs/xfs/xfs_quota.h
+++ b/fs/xfs/xfs_quota.h
@@ -215,10 +215,11 @@ xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks)
 	return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false);
 }
 
-static inline int
-xfs_quota_unreserve_blkres(struct xfs_inode *ip, int64_t blocks)
+static inline void
+xfs_quota_unreserve_blkres(struct xfs_inode *ip, uint64_t blocks)
 {
-	return xfs_quota_reserve_blkres(ip, -blocks);
+	/* don't return an error as unreserving quotas can't fail */
+	xfs_quota_reserve_blkres(ip, -(int64_t)blocks);
 }
 
 extern int xfs_mount_reset_sbqflags(struct xfs_mount *);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0ab2ef5b58f6c4..02cb6c2b257058 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -606,10 +606,8 @@ xfs_reflink_cancel_cow_blocks(
 		trace_xfs_reflink_cancel_cow(ip, &del);
 
 		if (isnullstartblock(del.br_startblock)) {
-			error = xfs_bmap_del_extent_delay(ip, XFS_COW_FORK,
-					&icur, &got, &del);
-			if (error)
-				break;
+			xfs_bmap_del_extent_delay(ip, XFS_COW_FORK, &icur, &got,
+					&del);
 		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
 			ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER);
 
@@ -632,10 +630,7 @@ xfs_reflink_cancel_cow_blocks(
 			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
 
 			/* Remove the quota reservation */
-			error = xfs_quota_unreserve_blkres(ip,
-					del.br_blockcount);
-			if (error)
-				break;
+			xfs_quota_unreserve_blkres(ip, del.br_blockcount);
 		} else {
 			/* Didn't do anything, push cursor back. */
 			xfs_iext_prev(ifp, &icur);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/8] xfs: simplify iext overflow checking and upgrade
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-04-29  4:49 ` [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:32   ` Darrick J. Wong
  2024-04-29  4:49 ` [PATCH 6/8] xfs: lift XREP_MAX_ITRUNCATE_EFIS out of the scrub code Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

Currently the calls to xfs_iext_count_may_overflow and
xfs_iext_count_upgrade are always paired.  Merge them into a single
function to simplify the callers and the actual check and upgrade
logic itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_attr.c       |  5 +--
 fs/xfs/libxfs/xfs_bmap.c       |  5 +--
 fs/xfs/libxfs/xfs_inode_fork.c | 57 +++++++++++++++-------------------
 fs/xfs/libxfs/xfs_inode_fork.h |  6 ++--
 fs/xfs/xfs_bmap_item.c         |  4 +--
 fs/xfs/xfs_bmap_util.c         | 24 +++-----------
 fs/xfs/xfs_dquot.c             |  5 +--
 fs/xfs/xfs_iomap.c             |  9 ++----
 fs/xfs/xfs_reflink.c           |  9 ++----
 fs/xfs/xfs_rtalloc.c           |  5 +--
 10 files changed, 41 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 1c2a27fce08a9d..ded92ccefe9f6d 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1050,11 +1050,8 @@ xfs_attr_set(
 		return error;
 
 	if (op != XFS_ATTRUPDATE_REMOVE || xfs_inode_hasattr(dp)) {
-		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
+		error = xfs_iext_count_ensure(args->trans, dp, XFS_ATTR_FORK,
 				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
-		if (error == -EFBIG)
-			error = xfs_iext_count_upgrade(args->trans, dp,
-					XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
 		if (error)
 			goto out_trans_cancel;
 	}
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 68e80e8eaaeebe..9a55ce4f1f0d45 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4621,11 +4621,8 @@ xfs_bmapi_convert_delalloc(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, whichfork,
+	error = xfs_iext_count_ensure(tp, ip, whichfork,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
 	if (error)
 		goto out_trans_cancel;
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 7d660a9739090a..82e670dd1212c4 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -765,53 +765,46 @@ xfs_ifork_verify_local_attr(
 	return 0;
 }
 
+/*
+ * Check if the inode fork supports adding nr_to_add more extents.
+ *
+ * If it doesn't but we can upgrade it to large extent counters, do the upgrade.
+ * If we can't upgrade or are already using big counters but still can't fit the
+ * additional extents, return -EFBIG.
+ */
 int
-xfs_iext_count_may_overflow(
+xfs_iext_count_ensure(
+	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
 	int			whichfork,
-	int			nr_to_add)
+	uint			nr_to_add)
 {
+	struct xfs_mount	*mp = ip->i_mount;
+	bool			has_large =
+		xfs_inode_has_large_extent_counts(ip);
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
-	uint64_t		max_exts;
 	uint64_t		nr_exts;
 
+	ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
+
 	if (whichfork == XFS_COW_FORK)
 		return 0;
 
-	max_exts = xfs_iext_max_nextents(xfs_inode_has_large_extent_counts(ip),
-				whichfork);
-
-	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
-		max_exts = 10;
-
+	/* no point in upgrading if if_nextents overflows */
 	nr_exts = ifp->if_nextents + nr_to_add;
-	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
+	if (nr_exts < ifp->if_nextents)
 		return -EFBIG;
 
-	return 0;
-}
-
-/*
- * Upgrade this inode's extent counter fields to be able to handle a potential
- * increase in the extent count by nr_to_add.  Normally this is the same
- * quantity that caused xfs_iext_count_may_overflow() to return -EFBIG.
- */
-int
-xfs_iext_count_upgrade(
-	struct xfs_trans	*tp,
-	struct xfs_inode	*ip,
-	uint			nr_to_add)
-{
-	ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
-
-	if (!xfs_has_large_extent_counts(ip->i_mount) ||
-	    xfs_inode_has_large_extent_counts(ip) ||
-	    XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) &&
+	    nr_exts > 10)
 		return -EFBIG;
 
-	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
-	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
+	if (nr_exts > xfs_iext_max_nextents(has_large, whichfork)) {
+		if (has_large || !xfs_has_large_extent_counts(mp))
+			return -EFBIG;
+		ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	}
 	return 0;
 }
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index bd53eb951b6515..9e1456f5cc2c85 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -256,10 +256,8 @@ extern void xfs_ifork_init_cow(struct xfs_inode *ip);
 
 int xfs_ifork_verify_local_data(struct xfs_inode *ip);
 int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
-int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
-		int nr_to_add);
-int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
-		uint nr_to_add);
+int xfs_iext_count_ensure(struct xfs_trans *tp, struct xfs_inode *ip,
+		int whichfork, uint nr_to_add);
 bool xfs_ifork_is_realtime(struct xfs_inode *ip, int whichfork);
 
 /* returns true if the fork has extents but they are not read in yet. */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index d27859a684aa69..38067d02ee3ca7 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -524,9 +524,7 @@ xfs_bmap_recover_work(
 	else
 		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
 
-	error = xfs_iext_count_may_overflow(ip, work->bi_whichfork, iext_delta);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
+	error = xfs_iext_count_ensure(tp, ip, work->bi_whichfork, iext_delta);
 	if (error)
 		goto err_cancel;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index df370d7112dc54..cad3b3e4f1c33e 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -710,11 +710,8 @@ xfs_alloc_file_space(
 		if (error)
 			break;
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
-		if (error == -EFBIG)
-			error = xfs_iext_count_upgrade(tp, ip,
-					XFS_IEXT_ADD_NOSPLIT_CNT);
 		if (error)
 			goto error;
 
@@ -772,10 +769,8 @@ xfs_unmap_extent(
 	if (error)
 		return error;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_PUNCH_HOLE_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
 	if (error)
 		goto out_trans_cancel;
 
@@ -1051,10 +1046,8 @@ xfs_insert_file_space(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_PUNCH_HOLE_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
 	if (error)
 		goto out_trans_cancel;
 
@@ -1280,23 +1273,16 @@ xfs_swap_extent_rmap(
 			trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec);
 
 			if (xfs_bmap_is_real_extent(&uirec)) {
-				error = xfs_iext_count_may_overflow(ip,
-						XFS_DATA_FORK,
+				error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 						XFS_IEXT_SWAP_RMAP_CNT);
-				if (error == -EFBIG)
-					error = xfs_iext_count_upgrade(tp, ip,
-							XFS_IEXT_SWAP_RMAP_CNT);
 				if (error)
 					goto out;
 			}
 
 			if (xfs_bmap_is_real_extent(&irec)) {
-				error = xfs_iext_count_may_overflow(tip,
+				error = xfs_iext_count_ensure(tp, tip,
 						XFS_DATA_FORK,
 						XFS_IEXT_SWAP_RMAP_CNT);
-				if (error == -EFBIG)
-					error = xfs_iext_count_upgrade(tp, ip,
-							XFS_IEXT_SWAP_RMAP_CNT);
 				if (error)
 					goto out;
 			}
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 13aba84bd64afb..c2e66d392399dd 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -341,11 +341,8 @@ xfs_dquot_disk_alloc(
 		goto err_cancel;
 	}
 
-	error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
+	error = xfs_iext_count_ensure(tp, quotip, XFS_DATA_FORK,
 			XFS_IEXT_ADD_NOSPLIT_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, quotip,
-				XFS_IEXT_ADD_NOSPLIT_CNT);
 	if (error)
 		goto err_cancel;
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c06fca2e751c7c..128ad834ca69b1 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -299,9 +299,7 @@ xfs_iomap_write_direct(
 	if (error)
 		return error;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip, nr_exts);
+	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK, nr_exts);
 	if (error)
 		goto out_trans_cancel;
 
@@ -625,11 +623,8 @@ xfs_iomap_write_unwritten(
 		if (error)
 			return error;
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_WRITE_UNWRITTEN_CNT);
-		if (error == -EFBIG)
-			error = xfs_iext_count_upgrade(tp, ip,
-					XFS_IEXT_WRITE_UNWRITTEN_CNT);
 		if (error)
 			goto error_on_bmapi_transaction;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 02cb6c2b257058..af388f2caef304 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -768,11 +768,8 @@ xfs_reflink_end_cow_extent(
 	del = got;
 	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_REFLINK_END_COW_CNT);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip,
-				XFS_IEXT_REFLINK_END_COW_CNT);
 	if (error)
 		goto out_cancel;
 
@@ -1272,9 +1269,7 @@ xfs_reflink_remap_extent(
 	if (dmap_written)
 		++iext_delta;
 
-	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
-	if (error == -EFBIG)
-		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
+	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK, iext_delta);
 	if (error)
 		goto out_cancel;
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index b476a876478d93..37edf4c5ce73ad 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -695,11 +695,8 @@ xfs_growfs_rt_alloc(
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
-		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
+		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 				XFS_IEXT_ADD_NOSPLIT_CNT);
-		if (error == -EFBIG)
-			error = xfs_iext_count_upgrade(tp, ip,
-					XFS_IEXT_ADD_NOSPLIT_CNT);
 		if (error)
 			goto out_trans_cancel;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/8] xfs: lift XREP_MAX_ITRUNCATE_EFIS out of the scrub code
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-04-29  4:49 ` [PATCH 5/8] xfs: simplify iext overflow checking and upgrade Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent Christoph Hellwig
  2024-04-29  4:49 ` [PATCH 8/8] xfs: rename the del variable " Christoph Hellwig
  7 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

We want to play the same trick in the COW end I/O handler, so lift this
constant out of the scrub directory and change the prefix to XFS_.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_trans_resv.h | 7 +++++++
 fs/xfs/scrub/newbt.c           | 2 +-
 fs/xfs/scrub/reap.c            | 2 +-
 fs/xfs/scrub/repair.h          | 8 --------
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index 0554b9d775d269..51eb56560ee189 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -95,6 +95,13 @@ struct xfs_trans_resv {
 #define	XFS_ITRUNCATE_LOG_COUNT_REFLINK	8
 #define	XFS_WRITE_LOG_COUNT_REFLINK	8
 
+/*
+ * This is the maximum number of deferred extent freeing item extents (EFIs)
+ * that we'll attach to a transaction without rolling the transaction to avoid
+ * overrunning a tr_itruncate reservation.
+ */
+#define XFS_MAX_ITRUNCATE_EFIS		128
+
 void xfs_trans_resv_calc(struct xfs_mount *mp, struct xfs_trans_resv *resp);
 uint xfs_allocfree_block_count(struct xfs_mount *mp, uint num_ops);
 
diff --git a/fs/xfs/scrub/newbt.c b/fs/xfs/scrub/newbt.c
index 4a0271123d94ea..872f97db2a6425 100644
--- a/fs/xfs/scrub/newbt.c
+++ b/fs/xfs/scrub/newbt.c
@@ -452,7 +452,7 @@ xrep_newbt_free(
 		}
 
 		freed += ret;
-		if (freed >= XREP_MAX_ITRUNCATE_EFIS) {
+		if (freed >= XFS_MAX_ITRUNCATE_EFIS) {
 			error = xrep_defer_finish(sc);
 			if (error)
 				goto junkit;
diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c
index 01ceaa4efa16bf..433891b0d08c73 100644
--- a/fs/xfs/scrub/reap.c
+++ b/fs/xfs/scrub/reap.c
@@ -169,7 +169,7 @@ static inline bool xreap_want_roll(const struct xreap_state *rs)
 {
 	if (rs->force_roll)
 		return true;
-	if (rs->deferred > XREP_MAX_ITRUNCATE_EFIS)
+	if (rs->deferred > XFS_MAX_ITRUNCATE_EFIS)
 		return true;
 	if (rs->invalidated > XREAP_MAX_BINVAL)
 		return true;
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 0e0dc2bf985c21..ec76774afffb31 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -17,14 +17,6 @@ static inline int xrep_notsupported(struct xfs_scrub *sc)
 
 #ifdef CONFIG_XFS_ONLINE_REPAIR
 
-/*
- * This is the maximum number of deferred extent freeing item extents (EFIs)
- * that we'll attach to a transaction without rolling the transaction to avoid
- * overrunning a tr_itruncate reservation.
- */
-#define XREP_MAX_ITRUNCATE_EFIS	(128)
-
-
 /* Repair helpers */
 
 int xrep_attempt(struct xfs_scrub *sc, struct xchk_stats_run *run);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-04-29  4:49 ` [PATCH 6/8] xfs: lift XREP_MAX_ITRUNCATE_EFIS out of the scrub code Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:43   ` Darrick J. Wong
  2024-04-29  4:49 ` [PATCH 8/8] xfs: rename the del variable " Christoph Hellwig
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

xfs_reflink_end_cow_extent currently caps the range it works on to
fit both the existing extent (or hole) in the data fork and the
new COW range.  For overwrites of fragmented regions that is highly
inefficient, as we need to split the new region at every boundary,
just for it to be merge back in the next pass.

Switch to unmapping the old data using a chain of deferred bmap
and extent free ops ops first, and then handle remapping the new
data in one single transaction instead.

Note that this also switches from a write to an itruncate transaction
reservation as the xfs_reflink_end_cow_extent doesn't touch any of
the allocator data structures in the AGF or the RT inodes.  Instead
the lead transaction just unmaps blocks, and later they get freed,
COW records get freed and the new blocks get mapped into the inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 111 ++++++++++++++++++++++++++++---------------
 1 file changed, 72 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index af388f2caef304..e20db39d1cc46f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -701,6 +701,72 @@ xfs_reflink_cancel_cow_range(
 	return error;
 }
 
+/*
+ * Unmap any old data covering the COW target.
+ */
+static int
+xfs_reflink_unmap_old_data(
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb)
+{
+	struct xfs_ifork	*ifp = &ip->i_df;
+	struct xfs_bmbt_irec	got, del;
+	struct xfs_iext_cursor	icur;
+	unsigned int		freed;
+	int			error;
+
+	ASSERT(!xfs_need_iread_extents(ifp));
+
+restart:
+	freed = 0;
+	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
+		return 0;
+
+	while (got.br_startoff + got.br_blockcount > offset_fsb) {
+		del = got;
+		xfs_trim_extent(&del, offset_fsb, end_fsb - offset_fsb);
+
+		/* Extent delete may have bumped us forward */
+		if (!del.br_blockcount)
+			goto prev_extent;
+
+		trace_xfs_reflink_cow_remap_to(ip, &del);
+		if (isnullstartblock(del.br_startblock)) {
+			xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
+					&got, &del);
+			goto refresh;
+		}
+
+		xfs_bmap_unmap_extent(*tpp, ip, XFS_DATA_FORK, &del);
+		xfs_refcount_decrease_extent(*tpp, &del);
+		xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
+				-del.br_blockcount);
+
+		/*
+		 * We can't add an unlimited number of EFIs and thus deferred
+		 * unmapped items to a transaction.  Once we've filled our
+		 * quota roll the transaction, which requires us to restart
+		 * the lookup as the deferred item processing will change the
+		 * iext tree.
+		 */
+		if (++freed == XFS_MAX_ITRUNCATE_EFIS) {
+			error = xfs_defer_finish(tpp);
+			if (error)
+				return error;
+			goto restart;
+		}
+prev_extent:
+		xfs_iext_prev(ifp, &icur);
+refresh:
+		if (!xfs_iext_get_extent(ifp, &icur, &got))
+			break;
+	}
+
+	return 0;
+}
+
 /*
  * Remap part of the CoW fork into the data fork.
  *
@@ -718,16 +784,15 @@ xfs_reflink_end_cow_extent(
 	xfs_fileoff_t		end_fsb)
 {
 	struct xfs_iext_cursor	icur;
-	struct xfs_bmbt_irec	got, del, data;
+	struct xfs_bmbt_irec	got, del;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
 	unsigned int		resblks;
-	int			nmaps;
 	int			error;
 
 	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, 0,
 			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
@@ -767,51 +832,19 @@ xfs_reflink_end_cow_extent(
 	}
 	del = got;
 	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
+	trace_xfs_reflink_cow_remap_from(ip, &del);
 
 	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_REFLINK_END_COW_CNT);
 	if (error)
 		goto out_cancel;
 
-	/* Grab the corresponding mapping in the data fork. */
-	nmaps = 1;
-	error = xfs_bmapi_read(ip, del.br_startoff, del.br_blockcount, &data,
-			&nmaps, 0);
+	/* Unmap the old data. */
+	error = xfs_reflink_unmap_old_data(&tp, ip, del.br_startoff,
+			del.br_startoff + del.br_blockcount);
 	if (error)
 		goto out_cancel;
 
-	/* We can only remap the smaller of the two extent sizes. */
-	data.br_blockcount = min(data.br_blockcount, del.br_blockcount);
-	del.br_blockcount = data.br_blockcount;
-
-	trace_xfs_reflink_cow_remap_from(ip, &del);
-	trace_xfs_reflink_cow_remap_to(ip, &data);
-
-	if (xfs_bmap_is_real_extent(&data)) {
-		/*
-		 * If the extent we're remapping is backed by storage (written
-		 * or not), unmap the extent and drop its refcount.
-		 */
-		xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data);
-		xfs_refcount_decrease_extent(tp, &data);
-		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
-				-data.br_blockcount);
-	} else if (data.br_startblock == DELAYSTARTBLOCK) {
-		int		done;
-
-		/*
-		 * If the extent we're remapping is a delalloc reservation,
-		 * we can use the regular bunmapi function to release the
-		 * incore state.  Dropping the delalloc reservation takes care
-		 * of the quota reservation for us.
-		 */
-		error = xfs_bunmapi(NULL, ip, data.br_startoff,
-				data.br_blockcount, 0, 1, &done);
-		if (error)
-			goto out_cancel;
-		ASSERT(done);
-	}
-
 	/* Free the CoW orphan record. */
 	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/8] xfs: rename the del variable in xfs_reflink_end_cow_extent
  2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-04-29  4:49 ` [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent Christoph Hellwig
@ 2024-04-29  4:49 ` Christoph Hellwig
  2024-04-29 15:33   ` Darrick J. Wong
  7 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29  4:49 UTC (permalink / raw
  To: Chandan Babu R, Darrick J. Wong; +Cc: linux-xfs

del contains the new extent that we are remapping.  Give it a somewhat
less confusing name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index e20db39d1cc46f..f4c4cd4ef72336 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -784,7 +784,7 @@ xfs_reflink_end_cow_extent(
 	xfs_fileoff_t		end_fsb)
 {
 	struct xfs_iext_cursor	icur;
-	struct xfs_bmbt_irec	got, del;
+	struct xfs_bmbt_irec	got, remap;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
@@ -820,7 +820,7 @@ xfs_reflink_end_cow_extent(
 	 * Only remap real extents that contain data.  With AIO, speculative
 	 * preallocations can leak into the range we are called upon, and we
 	 * need to skip them.  Preserve @got for the eventual CoW fork
-	 * deletion; from now on @del represents the mapping that we're
+	 * deletion; from now on @remap represents the mapping that we're
 	 * actually remapping.
 	 */
 	while (!xfs_bmap_is_written_extent(&got)) {
@@ -830,9 +830,9 @@ xfs_reflink_end_cow_extent(
 			goto out_cancel;
 		}
 	}
-	del = got;
-	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
-	trace_xfs_reflink_cow_remap_from(ip, &del);
+	remap = got;
+	xfs_trim_extent(&remap, *offset_fsb, end_fsb - *offset_fsb);
+	trace_xfs_reflink_cow_remap_from(ip, &remap);
 
 	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
 			XFS_IEXT_REFLINK_END_COW_CNT);
@@ -840,23 +840,24 @@ xfs_reflink_end_cow_extent(
 		goto out_cancel;
 
 	/* Unmap the old data. */
-	error = xfs_reflink_unmap_old_data(&tp, ip, del.br_startoff,
-			del.br_startoff + del.br_blockcount);
+	error = xfs_reflink_unmap_old_data(&tp, ip, remap.br_startoff,
+			remap.br_startoff + remap.br_blockcount);
 	if (error)
 		goto out_cancel;
 
 	/* Free the CoW orphan record. */
-	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
+	xfs_refcount_free_cow_extent(tp, remap.br_startblock,
+			remap.br_blockcount);
 
 	/* Map the new blocks into the data fork. */
-	xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del);
+	xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &remap);
 
 	/* Charge this new data fork mapping to the on-disk quota. */
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
-			(long)del.br_blockcount);
+			(long)remap.br_blockcount);
 
 	/* Remove the mapping from the CoW fork. */
-	xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
+	xfs_bmap_del_extent_cow(ip, &icur, &got, &remap);
 
 	error = xfs_trans_commit(tp);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -864,7 +865,7 @@ xfs_reflink_end_cow_extent(
 		return error;
 
 	/* Update the caller about how much progress we made. */
-	*offset_fsb = del.br_startoff + del.br_blockcount;
+	*offset_fsb = remap.br_startoff + remap.br_blockcount;
 	return 0;
 
 out_cancel:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
  2024-04-29  4:49 ` [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Christoph Hellwig
@ 2024-04-29 15:26   ` Darrick J. Wong
  2024-04-29 17:16     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:26 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:10AM +0200, Christoph Hellwig wrote:
> Defer the extent counter size upgrade until we know we're going to
> modify the extent mapping.  This also defers dirtying the transaction
> and will allow us safely back out later in the function in later
> changes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_reflink.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 7da0e8f961d351..9ce37d366534c3 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -751,14 +751,6 @@ xfs_reflink_end_cow_extent(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> -			XFS_IEXT_REFLINK_END_COW_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip,
> -				XFS_IEXT_REFLINK_END_COW_CNT);
> -	if (error)
> -		goto out_cancel;
> -
>  	/*
>  	 * In case of racing, overlapping AIO writes no COW extents might be
>  	 * left by the time I/O completes for the loser of the race.  In that

I think this is actually a bug fix.  If an xfs_iext_count_upgrade
dirties the transaction and then xfs_iext_lookup_extent cancels the
transaction due to the overlapping AIO race, the _trans_cancel shuts
down the fs, right?

Fixes: 4f86bb4b66c9 ("xfs: Conditionally upgrade existing inodes to use large extent counters")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> @@ -787,6 +779,14 @@ xfs_reflink_end_cow_extent(
>  	del = got;
>  	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
>  
> +	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +			XFS_IEXT_REFLINK_END_COW_CNT);
> +	if (error == -EFBIG)
> +		error = xfs_iext_count_upgrade(tp, ip,
> +				XFS_IEXT_REFLINK_END_COW_CNT);
> +	if (error)
> +		goto out_cancel;
> +
>  	/* Grab the corresponding mapping in the data fork. */
>  	nmaps = 1;
>  	error = xfs_bmapi_read(ip, del.br_startoff, del.br_blockcount, &data,
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  2024-04-29  4:49 ` [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent Christoph Hellwig
@ 2024-04-29 15:27   ` Darrick J. Wong
  2024-04-29 17:17     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:27 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:11AM +0200, Christoph Hellwig wrote:
> Accessing if_bytes without the ilock is racy.  Remove the initial
> if_bytes == 0 check in xfs_reflink_end_cow_extent and let
> ext_iext_lookup_extent fail for this case after we've taken the ilock.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I wonder if this has any practical (mal)effects on the system?

Regardless,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_reflink.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 9ce37d366534c3..0ab2ef5b58f6c4 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -731,12 +731,6 @@ xfs_reflink_end_cow_extent(
>  	int			nmaps;
>  	int			error;
>  
> -	/* No COW extents?  That's easy! */
> -	if (ifp->if_bytes == 0) {
> -		*offset_fsb = end_fsb;
> -		return 0;
> -	}
> -
>  	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
>  			XFS_TRANS_RESERVE, &tp);
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail
  2024-04-29  4:49 ` [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail Christoph Hellwig
@ 2024-04-29 15:29   ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:29 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:13AM +0200, Christoph Hellwig wrote:
> Unreserving quotas can't fail due to quota limits, and we'll notice a
> shut down file system a bit later in all the callers anyway.  Return
> void and remove the error checking and propagation in the callers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 16 +++++-----------
>  fs/xfs/libxfs/xfs_bmap.h |  2 +-
>  fs/xfs/xfs_aops.c        |  6 +-----
>  fs/xfs/xfs_bmap_util.c   |  9 +++------
>  fs/xfs/xfs_bmap_util.h   |  2 +-
>  fs/xfs/xfs_iomap.c       |  4 ++--
>  fs/xfs/xfs_quota.h       |  7 ++++---
>  fs/xfs/xfs_reflink.c     | 11 +++--------
>  8 files changed, 20 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 6053f5e5c71eec..68e80e8eaaeebe 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4870,7 +4870,7 @@ xfs_bmap_split_indlen(
>  	*indlen2 = len2;
>  }
>  
> -int
> +void
>  xfs_bmap_del_extent_delay(
>  	struct xfs_inode	*ip,
>  	int			whichfork,
> @@ -4886,7 +4886,6 @@ xfs_bmap_del_extent_delay(
>  	xfs_filblks_t		got_indlen, new_indlen, stolen = 0;
>  	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
>  	uint64_t		fdblocks;
> -	int			error = 0;
>  	bool			isrt;
>  
>  	XFS_STATS_INC(mp, xs_del_exlist);
> @@ -4906,9 +4905,7 @@ xfs_bmap_del_extent_delay(
>  	 * sb counters as we might have to borrow some blocks for the
>  	 * indirect block accounting.
>  	 */
> -	error = xfs_quota_unreserve_blkres(ip, del->br_blockcount);
> -	if (error)
> -		return error;
> +	xfs_quota_unreserve_blkres(ip, del->br_blockcount);
>  	ip->i_delayed_blks -= del->br_blockcount;
>  
>  	if (got->br_startoff == del->br_startoff)
> @@ -5006,7 +5003,6 @@ xfs_bmap_del_extent_delay(
>  
>  	xfs_add_fdblocks(mp, fdblocks);
>  	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
> -	return error;
>  }
>  
>  void
> @@ -5564,18 +5560,16 @@ __xfs_bunmapi(
>  
>  delete:
>  		if (wasdel) {
> -			error = xfs_bmap_del_extent_delay(ip, whichfork, &icur,
> -					&got, &del);
> +			xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
>  		} else {
>  			error = xfs_bmap_del_extent_real(ip, tp, &icur, cur,
>  					&del, &tmp_logflags, whichfork,
>  					flags);
>  			logflags |= tmp_logflags;
> +			if (error)
> +				goto error0;
>  		}
>  
> -		if (error)
> -			goto error0;
> -
>  		end = del.br_startoff - 1;
>  nodelete:
>  		/*
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index e98849eb9bbae3..667b0c2b33d1d5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -202,7 +202,7 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
>  int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
>  		xfs_fileoff_t bno, xfs_filblks_t len, uint32_t flags,
>  		xfs_extnum_t nexts, int *done);
> -int	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
> +void	xfs_bmap_del_extent_delay(struct xfs_inode *ip, int whichfork,
>  		struct xfs_iext_cursor *cur, struct xfs_bmbt_irec *got,
>  		struct xfs_bmbt_irec *del);
>  void	xfs_bmap_del_extent_cow(struct xfs_inode *ip,
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 3f428620ebf2a3..c51bc17f5cfa03 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -469,7 +469,6 @@ xfs_discard_folio(
>  {
>  	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	int			error;
>  
>  	if (xfs_is_shutdown(mp))
>  		return;
> @@ -483,11 +482,8 @@ xfs_discard_folio(
>  	 * byte of the next folio. Hence the end offset is only dependent on the
>  	 * folio itself and not the start offset that is passed in.
>  	 */
> -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> +	xfs_bmap_punch_delalloc_range(ip, pos,
>  				folio_pos(folio) + folio_size(folio));
> -
> -	if (error && !xfs_is_shutdown(mp))
> -		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  }
>  
>  static const struct iomap_writeback_ops xfs_writeback_ops = {
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 53aa90a0ee3a85..df370d7112dc54 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -440,7 +440,7 @@ xfs_getbmap(
>   * if the ranges only partially overlap them, so it is up to the caller to
>   * ensure that partial blocks are not passed in.
>   */
> -int
> +void
>  xfs_bmap_punch_delalloc_range(
>  	struct xfs_inode	*ip,
>  	xfs_off_t		start_byte,
> @@ -452,7 +452,6 @@ xfs_bmap_punch_delalloc_range(
>  	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
>  	struct xfs_bmbt_irec	got, del;
>  	struct xfs_iext_cursor	icur;
> -	int			error = 0;
>  
>  	ASSERT(!xfs_need_iread_extents(ifp));
>  
> @@ -476,15 +475,13 @@ xfs_bmap_punch_delalloc_range(
>  			continue;
>  		}
>  
> -		error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
> -						  &got, &del);
> -		if (error || !xfs_iext_get_extent(ifp, &icur, &got))
> +		xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur, &got, &del);
> +		if (!xfs_iext_get_extent(ifp, &icur, &got))
>  			break;
>  	}
>  
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> index 77ecbb753ef207..51f84d8ff372fa 100644
> --- a/fs/xfs/xfs_bmap_util.h
> +++ b/fs/xfs/xfs_bmap_util.h
> @@ -30,7 +30,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
>  }
>  #endif /* CONFIG_XFS_RT */
>  
> -int	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
> +void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
>  		xfs_off_t start_byte, xfs_off_t end_byte);
>  
>  struct kgetbmap {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 9ce0f6b9df93e6..c06fca2e751c7c 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1211,8 +1211,8 @@ xfs_buffered_write_delalloc_punch(
>  	loff_t			offset,
>  	loff_t			length)
>  {
> -	return xfs_bmap_punch_delalloc_range(XFS_I(inode), offset,
> -			offset + length);
> +	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
> +	return 0;
>  }
>  
>  static int
> diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h
> index 621ea9d7cf06d9..23d71a55bbc006 100644
> --- a/fs/xfs/xfs_quota.h
> +++ b/fs/xfs/xfs_quota.h
> @@ -215,10 +215,11 @@ xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks)
>  	return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false);
>  }
>  
> -static inline int
> -xfs_quota_unreserve_blkres(struct xfs_inode *ip, int64_t blocks)
> +static inline void
> +xfs_quota_unreserve_blkres(struct xfs_inode *ip, uint64_t blocks)
>  {
> -	return xfs_quota_reserve_blkres(ip, -blocks);
> +	/* don't return an error as unreserving quotas can't fail */
> +	xfs_quota_reserve_blkres(ip, -(int64_t)blocks);
>  }
>  
>  extern int xfs_mount_reset_sbqflags(struct xfs_mount *);
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 0ab2ef5b58f6c4..02cb6c2b257058 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -606,10 +606,8 @@ xfs_reflink_cancel_cow_blocks(
>  		trace_xfs_reflink_cancel_cow(ip, &del);
>  
>  		if (isnullstartblock(del.br_startblock)) {
> -			error = xfs_bmap_del_extent_delay(ip, XFS_COW_FORK,
> -					&icur, &got, &del);
> -			if (error)
> -				break;
> +			xfs_bmap_del_extent_delay(ip, XFS_COW_FORK, &icur, &got,
> +					&del);
>  		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
>  			ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER);
>  
> @@ -632,10 +630,7 @@ xfs_reflink_cancel_cow_blocks(
>  			xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
>  
>  			/* Remove the quota reservation */
> -			error = xfs_quota_unreserve_blkres(ip,
> -					del.br_blockcount);
> -			if (error)
> -				break;
> +			xfs_quota_unreserve_blkres(ip, del.br_blockcount);
>  		} else {
>  			/* Didn't do anything, push cursor back. */
>  			xfs_iext_prev(ifp, &icur);
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/8] xfs: simplify iext overflow checking and upgrade
  2024-04-29  4:49 ` [PATCH 5/8] xfs: simplify iext overflow checking and upgrade Christoph Hellwig
@ 2024-04-29 15:32   ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:32 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:14AM +0200, Christoph Hellwig wrote:
> Currently the calls to xfs_iext_count_may_overflow and
> xfs_iext_count_upgrade are always paired.  Merge them into a single
> function to simplify the callers and the actual check and upgrade
> logic itself.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Much cleaner; I wish I could remember why we did it this way in the
first place...

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c       |  5 +--
>  fs/xfs/libxfs/xfs_bmap.c       |  5 +--
>  fs/xfs/libxfs/xfs_inode_fork.c | 57 +++++++++++++++-------------------
>  fs/xfs/libxfs/xfs_inode_fork.h |  6 ++--
>  fs/xfs/xfs_bmap_item.c         |  4 +--
>  fs/xfs/xfs_bmap_util.c         | 24 +++-----------
>  fs/xfs/xfs_dquot.c             |  5 +--
>  fs/xfs/xfs_iomap.c             |  9 ++----
>  fs/xfs/xfs_reflink.c           |  9 ++----
>  fs/xfs/xfs_rtalloc.c           |  5 +--
>  10 files changed, 41 insertions(+), 88 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 1c2a27fce08a9d..ded92ccefe9f6d 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1050,11 +1050,8 @@ xfs_attr_set(
>  		return error;
>  
>  	if (op != XFS_ATTRUPDATE_REMOVE || xfs_inode_hasattr(dp)) {
> -		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> +		error = xfs_iext_count_ensure(args->trans, dp, XFS_ATTR_FORK,
>  				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
> -		if (error == -EFBIG)
> -			error = xfs_iext_count_upgrade(args->trans, dp,
> -					XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 68e80e8eaaeebe..9a55ce4f1f0d45 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4621,11 +4621,8 @@ xfs_bmapi_convert_delalloc(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	error = xfs_iext_count_may_overflow(ip, whichfork,
> +	error = xfs_iext_count_ensure(tp, ip, whichfork,
>  			XFS_IEXT_ADD_NOSPLIT_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip,
> -				XFS_IEXT_ADD_NOSPLIT_CNT);
>  	if (error)
>  		goto out_trans_cancel;
>  
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index 7d660a9739090a..82e670dd1212c4 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -765,53 +765,46 @@ xfs_ifork_verify_local_attr(
>  	return 0;
>  }
>  
> +/*
> + * Check if the inode fork supports adding nr_to_add more extents.
> + *
> + * If it doesn't but we can upgrade it to large extent counters, do the upgrade.
> + * If we can't upgrade or are already using big counters but still can't fit the
> + * additional extents, return -EFBIG.
> + */
>  int
> -xfs_iext_count_may_overflow(
> +xfs_iext_count_ensure(
> +	struct xfs_trans	*tp,
>  	struct xfs_inode	*ip,
>  	int			whichfork,
> -	int			nr_to_add)
> +	uint			nr_to_add)
>  {
> +	struct xfs_mount	*mp = ip->i_mount;
> +	bool			has_large =
> +		xfs_inode_has_large_extent_counts(ip);
>  	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
> -	uint64_t		max_exts;
>  	uint64_t		nr_exts;
>  
> +	ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
> +
>  	if (whichfork == XFS_COW_FORK)
>  		return 0;
>  
> -	max_exts = xfs_iext_max_nextents(xfs_inode_has_large_extent_counts(ip),
> -				whichfork);
> -
> -	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> -		max_exts = 10;
> -
> +	/* no point in upgrading if if_nextents overflows */
>  	nr_exts = ifp->if_nextents + nr_to_add;
> -	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
> +	if (nr_exts < ifp->if_nextents)
>  		return -EFBIG;
>  
> -	return 0;
> -}
> -
> -/*
> - * Upgrade this inode's extent counter fields to be able to handle a potential
> - * increase in the extent count by nr_to_add.  Normally this is the same
> - * quantity that caused xfs_iext_count_may_overflow() to return -EFBIG.
> - */
> -int
> -xfs_iext_count_upgrade(
> -	struct xfs_trans	*tp,
> -	struct xfs_inode	*ip,
> -	uint			nr_to_add)
> -{
> -	ASSERT(nr_to_add <= XFS_MAX_EXTCNT_UPGRADE_NR);
> -
> -	if (!xfs_has_large_extent_counts(ip->i_mount) ||
> -	    xfs_inode_has_large_extent_counts(ip) ||
> -	    XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
> +	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) &&
> +	    nr_exts > 10)
>  		return -EFBIG;
>  
> -	ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> -	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> -
> +	if (nr_exts > xfs_iext_max_nextents(has_large, whichfork)) {
> +		if (has_large || !xfs_has_large_extent_counts(mp))
> +			return -EFBIG;
> +		ip->i_diflags2 |= XFS_DIFLAG2_NREXT64;
> +		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +	}
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index bd53eb951b6515..9e1456f5cc2c85 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -256,10 +256,8 @@ extern void xfs_ifork_init_cow(struct xfs_inode *ip);
>  
>  int xfs_ifork_verify_local_data(struct xfs_inode *ip);
>  int xfs_ifork_verify_local_attr(struct xfs_inode *ip);
> -int xfs_iext_count_may_overflow(struct xfs_inode *ip, int whichfork,
> -		int nr_to_add);
> -int xfs_iext_count_upgrade(struct xfs_trans *tp, struct xfs_inode *ip,
> -		uint nr_to_add);
> +int xfs_iext_count_ensure(struct xfs_trans *tp, struct xfs_inode *ip,
> +		int whichfork, uint nr_to_add);
>  bool xfs_ifork_is_realtime(struct xfs_inode *ip, int whichfork);
>  
>  /* returns true if the fork has extents but they are not read in yet. */
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index d27859a684aa69..38067d02ee3ca7 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -524,9 +524,7 @@ xfs_bmap_recover_work(
>  	else
>  		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
>  
> -	error = xfs_iext_count_may_overflow(ip, work->bi_whichfork, iext_delta);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> +	error = xfs_iext_count_ensure(tp, ip, work->bi_whichfork, iext_delta);
>  	if (error)
>  		goto err_cancel;
>  
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index df370d7112dc54..cad3b3e4f1c33e 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -710,11 +710,8 @@ xfs_alloc_file_space(
>  		if (error)
>  			break;
>  
> -		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  				XFS_IEXT_ADD_NOSPLIT_CNT);
> -		if (error == -EFBIG)
> -			error = xfs_iext_count_upgrade(tp, ip,
> -					XFS_IEXT_ADD_NOSPLIT_CNT);
>  		if (error)
>  			goto error;
>  
> @@ -772,10 +769,8 @@ xfs_unmap_extent(
>  	if (error)
>  		return error;
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  			XFS_IEXT_PUNCH_HOLE_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
>  	if (error)
>  		goto out_trans_cancel;
>  
> @@ -1051,10 +1046,8 @@ xfs_insert_file_space(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  			XFS_IEXT_PUNCH_HOLE_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip, XFS_IEXT_PUNCH_HOLE_CNT);
>  	if (error)
>  		goto out_trans_cancel;
>  
> @@ -1280,23 +1273,16 @@ xfs_swap_extent_rmap(
>  			trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec);
>  
>  			if (xfs_bmap_is_real_extent(&uirec)) {
> -				error = xfs_iext_count_may_overflow(ip,
> -						XFS_DATA_FORK,
> +				error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  						XFS_IEXT_SWAP_RMAP_CNT);
> -				if (error == -EFBIG)
> -					error = xfs_iext_count_upgrade(tp, ip,
> -							XFS_IEXT_SWAP_RMAP_CNT);
>  				if (error)
>  					goto out;
>  			}
>  
>  			if (xfs_bmap_is_real_extent(&irec)) {
> -				error = xfs_iext_count_may_overflow(tip,
> +				error = xfs_iext_count_ensure(tp, tip,
>  						XFS_DATA_FORK,
>  						XFS_IEXT_SWAP_RMAP_CNT);
> -				if (error == -EFBIG)
> -					error = xfs_iext_count_upgrade(tp, ip,
> -							XFS_IEXT_SWAP_RMAP_CNT);
>  				if (error)
>  					goto out;
>  			}
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index 13aba84bd64afb..c2e66d392399dd 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -341,11 +341,8 @@ xfs_dquot_disk_alloc(
>  		goto err_cancel;
>  	}
>  
> -	error = xfs_iext_count_may_overflow(quotip, XFS_DATA_FORK,
> +	error = xfs_iext_count_ensure(tp, quotip, XFS_DATA_FORK,
>  			XFS_IEXT_ADD_NOSPLIT_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, quotip,
> -				XFS_IEXT_ADD_NOSPLIT_CNT);
>  	if (error)
>  		goto err_cancel;
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c06fca2e751c7c..128ad834ca69b1 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -299,9 +299,7 @@ xfs_iomap_write_direct(
>  	if (error)
>  		return error;
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip, nr_exts);
> +	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK, nr_exts);
>  	if (error)
>  		goto out_trans_cancel;
>  
> @@ -625,11 +623,8 @@ xfs_iomap_write_unwritten(
>  		if (error)
>  			return error;
>  
> -		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  				XFS_IEXT_WRITE_UNWRITTEN_CNT);
> -		if (error == -EFBIG)
> -			error = xfs_iext_count_upgrade(tp, ip,
> -					XFS_IEXT_WRITE_UNWRITTEN_CNT);
>  		if (error)
>  			goto error_on_bmapi_transaction;
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 02cb6c2b257058..af388f2caef304 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -768,11 +768,8 @@ xfs_reflink_end_cow_extent(
>  	del = got;
>  	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  			XFS_IEXT_REFLINK_END_COW_CNT);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip,
> -				XFS_IEXT_REFLINK_END_COW_CNT);
>  	if (error)
>  		goto out_cancel;
>  
> @@ -1272,9 +1269,7 @@ xfs_reflink_remap_extent(
>  	if (dmap_written)
>  		++iext_delta;
>  
> -	error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, iext_delta);
> -	if (error == -EFBIG)
> -		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
> +	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK, iext_delta);
>  	if (error)
>  		goto out_cancel;
>  
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index b476a876478d93..37edf4c5ce73ad 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -695,11 +695,8 @@ xfs_growfs_rt_alloc(
>  		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
> -		error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK,
> +		error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  				XFS_IEXT_ADD_NOSPLIT_CNT);
> -		if (error == -EFBIG)
> -			error = xfs_iext_count_upgrade(tp, ip,
> -					XFS_IEXT_ADD_NOSPLIT_CNT);
>  		if (error)
>  			goto out_trans_cancel;
>  
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/8] xfs: rename the del variable in xfs_reflink_end_cow_extent
  2024-04-29  4:49 ` [PATCH 8/8] xfs: rename the del variable " Christoph Hellwig
@ 2024-04-29 15:33   ` Darrick J. Wong
  2024-04-29 17:18     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:33 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:17AM +0200, Christoph Hellwig wrote:
> del contains the new extent that we are remapping.  Give it a somewhat
> less confusing name.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

That's a much better choice of name!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_reflink.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index e20db39d1cc46f..f4c4cd4ef72336 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -784,7 +784,7 @@ xfs_reflink_end_cow_extent(
>  	xfs_fileoff_t		end_fsb)
>  {
>  	struct xfs_iext_cursor	icur;
> -	struct xfs_bmbt_irec	got, del;
> +	struct xfs_bmbt_irec	got, remap;
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
>  	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
> @@ -820,7 +820,7 @@ xfs_reflink_end_cow_extent(
>  	 * Only remap real extents that contain data.  With AIO, speculative
>  	 * preallocations can leak into the range we are called upon, and we
>  	 * need to skip them.  Preserve @got for the eventual CoW fork
> -	 * deletion; from now on @del represents the mapping that we're
> +	 * deletion; from now on @remap represents the mapping that we're
>  	 * actually remapping.
>  	 */
>  	while (!xfs_bmap_is_written_extent(&got)) {
> @@ -830,9 +830,9 @@ xfs_reflink_end_cow_extent(
>  			goto out_cancel;
>  		}
>  	}
> -	del = got;
> -	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
> -	trace_xfs_reflink_cow_remap_from(ip, &del);
> +	remap = got;
> +	xfs_trim_extent(&remap, *offset_fsb, end_fsb - *offset_fsb);
> +	trace_xfs_reflink_cow_remap_from(ip, &remap);
>  
>  	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  			XFS_IEXT_REFLINK_END_COW_CNT);
> @@ -840,23 +840,24 @@ xfs_reflink_end_cow_extent(
>  		goto out_cancel;
>  
>  	/* Unmap the old data. */
> -	error = xfs_reflink_unmap_old_data(&tp, ip, del.br_startoff,
> -			del.br_startoff + del.br_blockcount);
> +	error = xfs_reflink_unmap_old_data(&tp, ip, remap.br_startoff,
> +			remap.br_startoff + remap.br_blockcount);
>  	if (error)
>  		goto out_cancel;
>  
>  	/* Free the CoW orphan record. */
> -	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
> +	xfs_refcount_free_cow_extent(tp, remap.br_startblock,
> +			remap.br_blockcount);
>  
>  	/* Map the new blocks into the data fork. */
> -	xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del);
> +	xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &remap);
>  
>  	/* Charge this new data fork mapping to the on-disk quota. */
>  	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
> -			(long)del.br_blockcount);
> +			(long)remap.br_blockcount);
>  
>  	/* Remove the mapping from the CoW fork. */
> -	xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> +	xfs_bmap_del_extent_cow(ip, &icur, &got, &remap);
>  
>  	error = xfs_trans_commit(tp);
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> @@ -864,7 +865,7 @@ xfs_reflink_end_cow_extent(
>  		return error;
>  
>  	/* Update the caller about how much progress we made. */
> -	*offset_fsb = del.br_startoff + del.br_blockcount;
> +	*offset_fsb = remap.br_startoff + remap.br_blockcount;
>  	return 0;
>  
>  out_cancel:
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent
  2024-04-29  4:49 ` [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent Christoph Hellwig
@ 2024-04-29 15:43   ` Darrick J. Wong
  2024-04-30  9:38     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2024-04-29 15:43 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 06:49:16AM +0200, Christoph Hellwig wrote:
> xfs_reflink_end_cow_extent currently caps the range it works on to
> fit both the existing extent (or hole) in the data fork and the
> new COW range.  For overwrites of fragmented regions that is highly
> inefficient, as we need to split the new region at every boundary,
> just for it to be merge back in the next pass.
> 
> Switch to unmapping the old data using a chain of deferred bmap
> and extent free ops ops first, and then handle remapping the new
> data in one single transaction instead.
> 
> Note that this also switches from a write to an itruncate transaction
> reservation as the xfs_reflink_end_cow_extent doesn't touch any of
> the allocator data structures in the AGF or the RT inodes.  Instead
> the lead transaction just unmaps blocks, and later they get freed,
> COW records get freed and the new blocks get mapped into the inode.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_reflink.c | 111 ++++++++++++++++++++++++++++---------------
>  1 file changed, 72 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index af388f2caef304..e20db39d1cc46f 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -701,6 +701,72 @@ xfs_reflink_cancel_cow_range(
>  	return error;
>  }
>  
> +/*
> + * Unmap any old data covering the COW target.
> + */
> +static int
> +xfs_reflink_unmap_old_data(
> +	struct xfs_trans	**tpp,
> +	struct xfs_inode	*ip,
> +	xfs_fileoff_t		offset_fsb,
> +	xfs_fileoff_t		end_fsb)
> +{
> +	struct xfs_ifork	*ifp = &ip->i_df;
> +	struct xfs_bmbt_irec	got, del;
> +	struct xfs_iext_cursor	icur;
> +	unsigned int		freed;
> +	int			error;
> +
> +	ASSERT(!xfs_need_iread_extents(ifp));
> +
> +restart:
> +	freed = 0;
> +	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
> +		return 0;
> +
> +	while (got.br_startoff + got.br_blockcount > offset_fsb) {
> +		del = got;
> +		xfs_trim_extent(&del, offset_fsb, end_fsb - offset_fsb);
> +
> +		/* Extent delete may have bumped us forward */
> +		if (!del.br_blockcount)
> +			goto prev_extent;
> +
> +		trace_xfs_reflink_cow_remap_to(ip, &del);
> +		if (isnullstartblock(del.br_startblock)) {
> +			xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
> +					&got, &del);
> +			goto refresh;
> +		}
> +
> +		xfs_bmap_unmap_extent(*tpp, ip, XFS_DATA_FORK, &del);
> +		xfs_refcount_decrease_extent(*tpp, &del);
> +		xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> +				-del.br_blockcount);
> +
> +		/*
> +		 * We can't add an unlimited number of EFIs and thus deferred
> +		 * unmapped items to a transaction.  Once we've filled our
> +		 * quota roll the transaction, which requires us to restart
> +		 * the lookup as the deferred item processing will change the
> +		 * iext tree.

Nit: This loop actually queues multiple intent items -- one BUI to
handle the unmap, one RUI if the rmapbt needs updating, one CUI to
decrement the old data fork extent's refcount, and one EFI if that was
the last ref to that space.  So I guess 128 of these is small enough not
to overflow a tr_itruncate transaction...

> +		 */
> +		if (++freed == XFS_MAX_ITRUNCATE_EFIS) {
> +			error = xfs_defer_finish(tpp);

...but the bigger problem here is that defer_finish rolls the
transaction having not added the log intent items for the cow ->
data fork remap operation.  If we commit the old *tpp and crash before
the new *tpp commits, then log recovery will zap the data fork extent
without replacing it with anything, and the file contents turn to
zeroes.

To fix this, you'd need to do this stuff (copy-pasted from
xfs_reflink_end_cow_extent):

	/* Free the CoW orphan record. */
	xfs_refcount_free_cow_extent(tp, remap.br_startblock, remap.br_blockcount);

	/* Map the new blocks into the data fork. */
	xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &remap);

	/* Charge this new data fork mapping to the on-disk quota. */
	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
			(long)remap.br_blockcount);

	/* Remove the mapping from the CoW fork. */
	xfs_bmap_del_extent_cow(ip, &icur, &got, &remap);

before the xfs_defer_finish call, and with the same file block range as
was unmapped in this transaction.

--D

> +			if (error)
> +				return error;
> +			goto restart;
> +		}
> +prev_extent:
> +		xfs_iext_prev(ifp, &icur);
> +refresh:
> +		if (!xfs_iext_get_extent(ifp, &icur, &got))
> +			break;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Remap part of the CoW fork into the data fork.
>   *
> @@ -718,16 +784,15 @@ xfs_reflink_end_cow_extent(
>  	xfs_fileoff_t		end_fsb)
>  {
>  	struct xfs_iext_cursor	icur;
> -	struct xfs_bmbt_irec	got, del, data;
> +	struct xfs_bmbt_irec	got, del;
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
>  	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, XFS_COW_FORK);
>  	unsigned int		resblks;
> -	int			nmaps;
>  	int			error;
>  
>  	resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0,
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, 0,
>  			XFS_TRANS_RESERVE, &tp);
>  	if (error)
>  		return error;
> @@ -767,51 +832,19 @@ xfs_reflink_end_cow_extent(
>  	}
>  	del = got;
>  	xfs_trim_extent(&del, *offset_fsb, end_fsb - *offset_fsb);
> +	trace_xfs_reflink_cow_remap_from(ip, &del);
>  
>  	error = xfs_iext_count_ensure(tp, ip, XFS_DATA_FORK,
>  			XFS_IEXT_REFLINK_END_COW_CNT);
>  	if (error)
>  		goto out_cancel;
>  
> -	/* Grab the corresponding mapping in the data fork. */
> -	nmaps = 1;
> -	error = xfs_bmapi_read(ip, del.br_startoff, del.br_blockcount, &data,
> -			&nmaps, 0);
> +	/* Unmap the old data. */
> +	error = xfs_reflink_unmap_old_data(&tp, ip, del.br_startoff,
> +			del.br_startoff + del.br_blockcount);
>  	if (error)
>  		goto out_cancel;
>  
> -	/* We can only remap the smaller of the two extent sizes. */
> -	data.br_blockcount = min(data.br_blockcount, del.br_blockcount);
> -	del.br_blockcount = data.br_blockcount;
> -
> -	trace_xfs_reflink_cow_remap_from(ip, &del);
> -	trace_xfs_reflink_cow_remap_to(ip, &data);
> -
> -	if (xfs_bmap_is_real_extent(&data)) {
> -		/*
> -		 * If the extent we're remapping is backed by storage (written
> -		 * or not), unmap the extent and drop its refcount.
> -		 */
> -		xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data);
> -		xfs_refcount_decrease_extent(tp, &data);
> -		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
> -				-data.br_blockcount);
> -	} else if (data.br_startblock == DELAYSTARTBLOCK) {
> -		int		done;
> -
> -		/*
> -		 * If the extent we're remapping is a delalloc reservation,
> -		 * we can use the regular bunmapi function to release the
> -		 * incore state.  Dropping the delalloc reservation takes care
> -		 * of the quota reservation for us.
> -		 */
> -		error = xfs_bunmapi(NULL, ip, data.br_startoff,
> -				data.br_blockcount, 0, 1, &done);
> -		if (error)
> -			goto out_cancel;
> -		ASSERT(done);
> -	}
> -
>  	/* Free the CoW orphan record. */
>  	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
>  
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
  2024-04-29 15:26   ` Darrick J. Wong
@ 2024-04-29 17:16     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29 17:16 UTC (permalink / raw
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 08:26:52AM -0700, Darrick J. Wong wrote:
> I think this is actually a bug fix.  If an xfs_iext_count_upgrade
> dirties the transaction and then xfs_iext_lookup_extent cancels the
> transaction due to the overlapping AIO race, the _trans_cancel shuts
> down the fs, right?

True.  Pretty unlikely to hit, but still a bug.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  2024-04-29 15:27   ` Darrick J. Wong
@ 2024-04-29 17:17     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29 17:17 UTC (permalink / raw
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 08:27:43AM -0700, Darrick J. Wong wrote:
> On Mon, Apr 29, 2024 at 06:49:11AM +0200, Christoph Hellwig wrote:
> > Accessing if_bytes without the ilock is racy.  Remove the initial
> > if_bytes == 0 check in xfs_reflink_end_cow_extent and let
> > ext_iext_lookup_extent fail for this case after we've taken the ilock.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I wonder if this has any practical (mal)effects on the system?

No, just found by code audit.  I though I did hit it but that was another
bug, but in the process I did a somewhat paranoid audit.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/8] xfs: rename the del variable in xfs_reflink_end_cow_extent
  2024-04-29 15:33   ` Darrick J. Wong
@ 2024-04-29 17:18     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-29 17:18 UTC (permalink / raw
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 08:33:56AM -0700, Darrick J. Wong wrote:
> On Mon, Apr 29, 2024 at 06:49:17AM +0200, Christoph Hellwig wrote:
> > del contains the new extent that we are remapping.  Give it a somewhat
> > less confusing name.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> That's a much better choice of name!

It's your choice so it must be good.  I actually slightly prefer the old
"new" naming :)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent
  2024-04-29 15:43   ` Darrick J. Wong
@ 2024-04-30  9:38     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2024-04-30  9:38 UTC (permalink / raw
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Chandan Babu R, linux-xfs

On Mon, Apr 29, 2024 at 08:43:44AM -0700, Darrick J. Wong wrote:
> Nit: This loop actually queues multiple intent items -- one BUI to
> handle the unmap, one RUI if the rmapbt needs updating, one CUI to
> decrement the old data fork extent's refcount, and one EFI if that was
> the last ref to that space.  So I guess 128 of these is small enough not
> to overflow a tr_itruncate transaction...

I've not actually had 128 hit by xfstests, to stress this patch I did
reduce the number to 4.  I played around with asserts a bit and
I can reliably hit 64 items, but I haven't tried bisecting further.

> before the xfs_defer_finish call, and with the same file block range as
> was unmapped in this transaction.

Indeed.  That's going to be a big rework, so for now I'm just going
to resend the reset of the series to get the fix and the cleanups in.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-04-30  9:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-29  4:49 optimize COW end I/O remapping v2 Christoph Hellwig
2024-04-29  4:49 ` [PATCH 1/8] xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later Christoph Hellwig
2024-04-29 15:26   ` Darrick J. Wong
2024-04-29 17:16     ` Christoph Hellwig
2024-04-29  4:49 ` [PATCH 2/8] xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent Christoph Hellwig
2024-04-29 15:27   ` Darrick J. Wong
2024-04-29 17:17     ` Christoph Hellwig
2024-04-29  4:49 ` [PATCH 3/8] xfs: consolidate the xfs_quota_reserve_blkres definitions Christoph Hellwig
2024-04-29  4:49 ` [PATCH 4/8] xfs: xfs_quota_unreserve_blkres can't fail Christoph Hellwig
2024-04-29 15:29   ` Darrick J. Wong
2024-04-29  4:49 ` [PATCH 5/8] xfs: simplify iext overflow checking and upgrade Christoph Hellwig
2024-04-29 15:32   ` Darrick J. Wong
2024-04-29  4:49 ` [PATCH 6/8] xfs: lift XREP_MAX_ITRUNCATE_EFIS out of the scrub code Christoph Hellwig
2024-04-29  4:49 ` [PATCH 7/8] xfs: optimize extent remapping in xfs_reflink_end_cow_extent Christoph Hellwig
2024-04-29 15:43   ` Darrick J. Wong
2024-04-30  9:38     ` Christoph Hellwig
2024-04-29  4:49 ` [PATCH 8/8] xfs: rename the del variable " Christoph Hellwig
2024-04-29 15:33   ` Darrick J. Wong
2024-04-29 17:18     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.