* [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead
@ 2025-09-16 23:44 Joanne Koong
2025-09-16 23:44 ` [PATCH v3 01/15] iomap: move bio read logic into helper function Joanne Koong
` (15 more replies)
0 siblings, 16 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
This series adds fuse iomap support for buffered reads and readahead.
This is needed so that fuse can use granular uptodate tracking when
large folios are enabled, so that only the non-uptodate portions of a
folio need to be read in instead of the entire folio. It is also needed
to turn on large folios for servers that use the writeback cache:
without granular tracking, a partial write does not mark the folio
uptodate, so a read issued before that write has undergone writeback
will read in the entire folio from disk and overwrite the partially
written data.
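As a concrete (hypothetical) timeline of that race, with large folios
enabled but without granular uptodate tracking (the 64K folio size is
illustrative):
  1. A partial write dirties the first 4K of a 64K folio; the folio
     cannot be marked uptodate since the rest of it has not been read.
  2. Before that write undergoes writeback, a read of another part of
     the folio sees the folio !uptodate and reads the entire 64K in
     from disk.
  3. The read data lands in the page cache over the still-dirty first
     4K, overwriting the partial write.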
This is on top of commit 1228c548bb98 ("Merge branch 'vfs-6.18.writeback' into
vfs.all") in Christian's vfs.all tree.
This series was run through fstests on fuse passthrough_hp with an
out-of-tree kernel patch enabling fuse large folios.
This patchset does not enable large folios on fuse yet. That will be part
of a different patchset.
Thanks,
Joanne
Changelog
---------
v2:
https://lore.kernel.org/linux-fsdevel/20250908185122.3199171-1-joannelkoong@gmail.com/
v2 -> v3:
* Incorporate Christoph's feedback
- Change naming to iomap_bio_* instead of iomap_xxx_bio
- Take his patch for moving bio logic into its own file (patch 11)
- Make ->read_folio_range interface not need pos arg (patch 9)
- Make ->submit_read return void (patch 9)
- Merge cur_folio_in_bio rename w/ tracking folio_owned internally (patch 7)
- Drop patch propagating error and replace with void return (patch 12)
- Make bias code easier to read (patch 10)
* Add WARN_ON_ONCE check in iteration refactoring (patch 4)
* Rename ->read_submit to ->submit_read (patch 9)
v1:
https://lore.kernel.org/linux-fsdevel/20250829235627.4053234-1-joannelkoong@gmail.com/
v1 -> v2:
* Don't pass in caller-provided arg through iter->private, pass it through
ctx->private instead (Darrick & Christoph)
* Separate 'bias' for ifs->read_bytes_pending into separate patch (Christoph)
* Rework read/readahead interface to take in struct iomap_read_folio_ctx
(Christoph)
* Add patch for removing fuse fc->blkbits workaround, now that Miklos's tree
has been merged into Christian's
Joanne Koong (15):
iomap: move bio read logic into helper function
iomap: move read/readahead bio submission logic into helper function
iomap: store read/readahead bio generically
iomap: iterate over entire folio in iomap_readpage_iter()
iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
iomap: track read/readahead folio ownership internally
iomap: add public start/finish folio read helpers
iomap: add caller-provided callbacks for read and readahead
iomap: add bias for async read requests
iomap: move buffered io bio logic into new file
iomap: make iomap_read_folio() a void return
fuse: use iomap for read_folio
fuse: use iomap for readahead
fuse: remove fc->blkbits workaround for partial writes
.../filesystems/iomap/operations.rst | 45 +++
block/fops.c | 5 +-
fs/erofs/data.c | 5 +-
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 289 +++++++++++-------
fs/fuse/fuse_i.h | 8 -
fs/fuse/inode.c | 13 +-
fs/gfs2/aops.c | 6 +-
fs/iomap/Makefile | 3 +-
fs/iomap/bio.c | 90 ++++++
fs/iomap/buffered-io.c | 273 ++++++++---------
fs/iomap/internal.h | 12 +
fs/xfs/xfs_aops.c | 5 +-
fs/zonefs/file.c | 5 +-
include/linux/iomap.h | 65 +++-
15 files changed, 530 insertions(+), 296 deletions(-)
create mode 100644 fs/iomap/bio.c
--
2.47.3
* [PATCH v3 01/15] iomap: move bio read logic into helper function
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:27 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 02/15] iomap: move read/readahead bio submission " Joanne Koong
` (14 subsequent siblings)
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move the iomap_readpage_iter() bio read logic into a separate helper
function, iomap_bio_read_folio_range(). This is needed to make iomap
read/readahead more generically usable, especially for filesystems that
do not require CONFIG_BLOCK.
Additionally rename buffered write's iomap_read_folio_range() function
to iomap_bio_read_folio_range_sync() to better describe its synchronous
behavior.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 68 ++++++++++++++++++++++++------------------
1 file changed, 39 insertions(+), 29 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fd827398afd2..05399aaa1361 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -357,36 +357,15 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
-static int iomap_readpage_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
+ struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- loff_t pos = iter->pos;
+ struct iomap_folio_state *ifs = folio->private;
+ size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
- struct folio *folio = ctx->cur_folio;
- struct iomap_folio_state *ifs;
- size_t poff, plen;
sector_t sector;
- int ret;
-
- if (iomap->type == IOMAP_INLINE) {
- ret = iomap_read_inline_data(iter, folio);
- if (ret)
- return ret;
- return iomap_iter_advance(iter, &length);
- }
-
- /* zero post-eof blocks as the page may be mapped */
- ifs = ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
-
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- goto done;
- }
ctx->cur_folio_in_bio = true;
if (ifs) {
@@ -425,6 +404,37 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
ctx->bio->bi_end_io = iomap_read_end_io;
bio_add_folio_nofail(ctx->bio, folio, plen, poff);
}
+}
+
+static int iomap_readpage_iter(struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx)
+{
+ const struct iomap *iomap = &iter->iomap;
+ loff_t pos = iter->pos;
+ loff_t length = iomap_length(iter);
+ struct folio *folio = ctx->cur_folio;
+ size_t poff, plen;
+ int ret;
+
+ if (iomap->type == IOMAP_INLINE) {
+ ret = iomap_read_inline_data(iter, folio);
+ if (ret)
+ return ret;
+ return iomap_iter_advance(iter, &length);
+ }
+
+ /* zero post-eof blocks as the page may be mapped */
+ ifs_alloc(iter->inode, folio, iter->flags);
+ iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
+ if (plen == 0)
+ goto done;
+
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_bio_read_folio_range(iter, ctx, pos, plen);
+ }
done:
/*
@@ -549,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
}
EXPORT_SYMBOL_GPL(iomap_readahead);
-static int iomap_read_folio_range(const struct iomap_iter *iter,
+static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t len)
{
const struct iomap *srcmap = iomap_iter_srcmap(iter);
@@ -562,7 +572,7 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
return submit_bio_wait(&bio);
}
#else
-static int iomap_read_folio_range(const struct iomap_iter *iter,
+static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
struct folio *folio, loff_t pos, size_t len)
{
WARN_ON_ONCE(1);
@@ -739,7 +749,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
status = write_ops->read_folio_range(iter,
folio, block_start, plen);
else
- status = iomap_read_folio_range(iter,
+ status = iomap_bio_read_folio_range_sync(iter,
folio, block_start, plen);
if (status)
return status;
--
2.47.3
* [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-16 23:44 ` [PATCH v3 01/15] iomap: move bio read logic into helper function Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:27 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 03/15] iomap: store read/readahead bio generically Joanne Koong
` (13 subsequent siblings)
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move the read/readahead bio submission logic into a separate helper.
This is needed to make iomap read/readahead more generically usable,
especially for filesystems that do not require CONFIG_BLOCK.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 05399aaa1361..ee96558b6d99 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -357,6 +357,14 @@ struct iomap_readpage_ctx {
struct readahead_control *rac;
};
+static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
+{
+ struct bio *bio = ctx->bio;
+
+ if (bio)
+ submit_bio(bio);
+}
+
static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
{
@@ -382,8 +390,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
gfp_t orig_gfp = gfp;
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
- if (ctx->bio)
- submit_bio(ctx->bio);
+ iomap_bio_submit_read(ctx);
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
@@ -478,13 +485,10 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.status = iomap_read_folio_iter(&iter, &ctx);
- if (ctx.bio) {
- submit_bio(ctx.bio);
- WARN_ON_ONCE(!ctx.cur_folio_in_bio);
- } else {
- WARN_ON_ONCE(ctx.cur_folio_in_bio);
+ iomap_bio_submit_read(&ctx);
+
+ if (!ctx.cur_folio_in_bio)
folio_unlock(folio);
- }
/*
* Just like mpage_readahead and block_read_full_folio, we always
@@ -550,12 +554,10 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
while (iomap_iter(&iter, ops) > 0)
iter.status = iomap_readahead_iter(&iter, &ctx);
- if (ctx.bio)
- submit_bio(ctx.bio);
- if (ctx.cur_folio) {
- if (!ctx.cur_folio_in_bio)
- folio_unlock(ctx.cur_folio);
- }
+ iomap_bio_submit_read(&ctx);
+
+ if (ctx.cur_folio && !ctx.cur_folio_in_bio)
+ folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
* [PATCH v3 03/15] iomap: store read/readahead bio generically
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-16 23:44 ` [PATCH v3 01/15] iomap: move bio read logic into helper function Joanne Koong
2025-09-16 23:44 ` [PATCH v3 02/15] iomap: move read/readahead bio submission " Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:29 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
` (12 subsequent siblings)
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Store the iomap_readpage_ctx bio generically as a "void *read_ctx".
This makes the read/readahead interface more generic, which allows it to
be used by filesystems that may not be block-based and may not have
CONFIG_BLOCK set.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ee96558b6d99..2a1709e0757b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -353,13 +353,13 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_readpage_ctx {
struct folio *cur_folio;
bool cur_folio_in_bio;
- struct bio *bio;
+ void *read_ctx;
struct readahead_control *rac;
};
static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
{
- struct bio *bio = ctx->bio;
+ struct bio *bio = ctx->read_ctx;
if (bio)
submit_bio(bio);
@@ -374,6 +374,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
sector_t sector;
+ struct bio *bio = ctx->read_ctx;
ctx->cur_folio_in_bio = true;
if (ifs) {
@@ -383,9 +384,8 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
}
sector = iomap_sector(iomap, pos);
- if (!ctx->bio ||
- bio_end_sector(ctx->bio) != sector ||
- !bio_add_folio(ctx->bio, folio, plen, poff)) {
+ if (!bio || bio_end_sector(bio) != sector ||
+ !bio_add_folio(bio, folio, plen, poff)) {
gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
gfp_t orig_gfp = gfp;
unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
@@ -394,22 +394,21 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
- ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
- REQ_OP_READ, gfp);
+ bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
+ gfp);
/*
* If the bio_alloc fails, try it again for a single page to
* avoid having to deal with partial page reads. This emulates
* what do_mpage_read_folio does.
*/
- if (!ctx->bio) {
- ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
- orig_gfp);
- }
+ if (!bio)
+ bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
if (ctx->rac)
- ctx->bio->bi_opf |= REQ_RAHEAD;
- ctx->bio->bi_iter.bi_sector = sector;
- ctx->bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(ctx->bio, folio, plen, poff);
+ bio->bi_opf |= REQ_RAHEAD;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(bio, folio, plen, poff);
+ ctx->read_ctx = bio;
}
}
--
2.47.3
* [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (2 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 03/15] iomap: store read/readahead bio generically Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:37 ` Darrick J. Wong
2025-09-22 22:33 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
` (11 subsequent siblings)
15 siblings, 2 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
Iterate over all non-uptodate ranges in a single call to
iomap_readpage_iter() instead of leaving the partial folio iteration to
the caller.
This will be useful for supporting caller-provided async folio read
callbacks (added in a later commit), because those will require tracking
when the first and last async read requests for a folio are sent, in
order to prevent premature read completion of the folio.
This additionally makes the iomap_readahead_iter() logic a bit simpler.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 69 ++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 37 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2a1709e0757b..0c4ba2a63490 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -420,6 +420,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
+ loff_t count;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -431,39 +432,33 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
/* zero post-eof blocks as the page may be mapped */
ifs_alloc(iter->inode, folio, iter->flags);
- iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
- if (plen == 0)
- goto done;
- if (iomap_block_needs_zeroing(iter, pos)) {
- folio_zero_range(folio, poff, plen);
- iomap_set_range_uptodate(folio, poff, plen);
- } else {
- iomap_bio_read_folio_range(iter, ctx, pos, plen);
- }
+ length = min_t(loff_t, length,
+ folio_size(folio) - offset_in_folio(folio, pos));
+ while (length) {
+ iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff,
+ &plen);
-done:
- /*
- * Move the caller beyond our range so that it keeps making progress.
- * For that, we have to include any leading non-uptodate ranges, but
- * we can skip trailing ones as they will be handled in the next
- * iteration.
- */
- length = pos - iter->pos + plen;
- return iomap_iter_advance(iter, &length);
-}
+ count = pos - iter->pos + plen;
+ if (WARN_ON_ONCE(count > length))
+ return -EIO;
-static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
-{
- int ret;
+ if (plen == 0)
+ return iomap_iter_advance(iter, &count);
- while (iomap_length(iter)) {
- ret = iomap_readpage_iter(iter, ctx);
+ if (iomap_block_needs_zeroing(iter, pos)) {
+ folio_zero_range(folio, poff, plen);
+ iomap_set_range_uptodate(folio, poff, plen);
+ } else {
+ iomap_bio_read_folio_range(iter, ctx, pos, plen);
+ }
+
+ length -= count;
+ ret = iomap_iter_advance(iter, &count);
if (ret)
return ret;
+ pos = iter->pos;
}
-
return 0;
}
@@ -482,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx);
+ iter.status = iomap_readpage_iter(&iter, &ctx);
iomap_bio_submit_read(&ctx);
@@ -504,16 +499,16 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
int ret;
while (iomap_length(iter)) {
- if (ctx->cur_folio &&
- offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
- if (!ctx->cur_folio_in_bio)
- folio_unlock(ctx->cur_folio);
- ctx->cur_folio = NULL;
- }
- if (!ctx->cur_folio) {
- ctx->cur_folio = readahead_folio(ctx->rac);
- ctx->cur_folio_in_bio = false;
- }
+ if (ctx->cur_folio && !ctx->cur_folio_in_bio)
+ folio_unlock(ctx->cur_folio);
+ ctx->cur_folio = readahead_folio(ctx->rac);
+ /*
+ * We should never in practice hit this case since the iter
+ * length matches the readahead length.
+ */
+ if (WARN_ON_ONCE(!ctx->cur_folio))
+ return -EINVAL;
+ ctx->cur_folio_in_bio = false;
ret = iomap_readpage_iter(iter, ctx);
if (ret)
return ret;
--
2.47.3
* [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (3 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
` (10 subsequent siblings)
15 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0c4ba2a63490..d6266cb702e3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -412,7 +412,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
}
}
-static int iomap_readpage_iter(struct iomap_iter *iter,
+static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_readpage_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
@@ -477,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_readpage_iter(&iter, &ctx);
+ iter.status = iomap_read_folio_iter(&iter, &ctx);
iomap_bio_submit_read(&ctx);
@@ -509,7 +509,7 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
if (WARN_ON_ONCE(!ctx->cur_folio))
return -EINVAL;
ctx->cur_folio_in_bio = false;
- ret = iomap_readpage_iter(iter, ctx);
+ ret = iomap_read_folio_iter(iter, ctx);
if (ret)
return ret;
}
--
2.47.3
* [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (4 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 07/15] iomap: track read/readahead folio ownership internally Joanne Koong
` (9 subsequent siblings)
15 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
->readpage was deprecated and reads are now on folios.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d6266cb702e3..6c5a631848b7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -350,14 +350,14 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_readpage_ctx {
+struct iomap_read_folio_ctx {
struct folio *cur_folio;
bool cur_folio_in_bio;
void *read_ctx;
struct readahead_control *rac;
};
-static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
+static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
{
struct bio *bio = ctx->read_ctx;
@@ -366,7 +366,7 @@ static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
}
static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
+ struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
@@ -413,7 +413,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
}
static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx)
{
const struct iomap *iomap = &iter->iomap;
loff_t pos = iter->pos;
@@ -469,7 +469,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
.pos = folio_pos(folio),
.len = folio_size(folio),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_read_folio_ctx ctx = {
.cur_folio = folio,
};
int ret;
@@ -494,7 +494,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_readpage_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx)
{
int ret;
@@ -539,7 +539,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
.pos = readahead_pos(rac),
.len = readahead_length(rac),
};
- struct iomap_readpage_ctx ctx = {
+ struct iomap_read_folio_ctx ctx = {
.rac = rac,
};
--
2.47.3
* [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (5 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:49 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 08/15] iomap: add public start/finish folio read helpers Joanne Koong
` (8 subsequent siblings)
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
The purpose of "struct iomap_read_folio_ctx->cur_folio_in_bio" is to
track folio ownership to know who is responsible for unlocking it.
Rename "cur_folio_in_bio" to "cur_folio_owned" to better reflect this
purpose and so that this can be generically used later on by filesystems
that are not block-based.
Since "struct iomap_read_folio_ctx" will be made a public interface
later on when read/readahead takes in caller-provided callbacks, track
the folio ownership state internally instead of exposing it in "struct
iomap_read_folio_ctx" to make the interface simpler for end users.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
1 file changed, 23 insertions(+), 11 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6c5a631848b7..587bbdbd24bc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -352,7 +352,6 @@ static void iomap_read_end_io(struct bio *bio)
struct iomap_read_folio_ctx {
struct folio *cur_folio;
- bool cur_folio_in_bio;
void *read_ctx;
struct readahead_control *rac;
};
@@ -376,7 +375,6 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
sector_t sector;
struct bio *bio = ctx->read_ctx;
- ctx->cur_folio_in_bio = true;
if (ifs) {
spin_lock_irq(&ifs->state_lock);
ifs->read_bytes_pending += plen;
@@ -413,7 +411,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
}
static int iomap_read_folio_iter(struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
const struct iomap *iomap = &iter->iomap;
loff_t pos = iter->pos;
@@ -450,6 +448,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
} else {
+ *cur_folio_owned = true;
iomap_bio_read_folio_range(iter, ctx, pos, plen);
}
@@ -472,16 +471,22 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
struct iomap_read_folio_ctx ctx = {
.cur_folio = folio,
};
+ /*
+ * If an external IO helper takes ownership of the folio, it is
+ * responsible for unlocking it when the read completes.
+ */
+ bool cur_folio_owned = false;
int ret;
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx);
+ iter.status = iomap_read_folio_iter(&iter, &ctx,
+ &cur_folio_owned);
iomap_bio_submit_read(&ctx);
- if (!ctx.cur_folio_in_bio)
+ if (!cur_folio_owned)
folio_unlock(folio);
/*
@@ -494,12 +499,13 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_read_folio);
static int iomap_readahead_iter(struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx)
+ struct iomap_read_folio_ctx *ctx,
+ bool *cur_folio_owned)
{
int ret;
while (iomap_length(iter)) {
- if (ctx->cur_folio && !ctx->cur_folio_in_bio)
+ if (ctx->cur_folio && !*cur_folio_owned)
folio_unlock(ctx->cur_folio);
ctx->cur_folio = readahead_folio(ctx->rac);
/*
@@ -508,8 +514,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
*/
if (WARN_ON_ONCE(!ctx->cur_folio))
return -EINVAL;
- ctx->cur_folio_in_bio = false;
- ret = iomap_read_folio_iter(iter, ctx);
+ *cur_folio_owned = false;
+ ret = iomap_read_folio_iter(iter, ctx, cur_folio_owned);
if (ret)
return ret;
}
@@ -542,15 +548,21 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
struct iomap_read_folio_ctx ctx = {
.rac = rac,
};
+ /*
+ * If an external IO helper takes ownership of the folio, it is
+ * responsible for unlocking it when the read completes.
+ */
+ bool cur_folio_owned = false;
trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
while (iomap_iter(&iter, ops) > 0)
- iter.status = iomap_readahead_iter(&iter, &ctx);
+ iter.status = iomap_readahead_iter(&iter, &ctx,
+ &cur_folio_owned);
iomap_bio_submit_read(&ctx);
- if (ctx.cur_folio && !ctx.cur_folio_in_bio)
+ if (ctx.cur_folio && !cur_folio_owned)
folio_unlock(ctx.cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
--
2.47.3
* [PATCH v3 08/15] iomap: add public start/finish folio read helpers
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (6 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 07/15] iomap: track read/readahead folio ownership internally Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead Joanne Koong
` (7 subsequent siblings)
15 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
Move ifs read_bytes_pending increment logic into a separate helper,
iomap_start_folio_read(), which will be needed later on by
caller-provided read callbacks (added in a later commit) for
read/readahead. This is the counterpart to the currently existing
iomap_finish_folio_read().
Make iomap_start_folio_read() and iomap_finish_folio_read() publicly
accessible so that caller-provided read callbacks can use them.
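As a rough sketch of the intended pairing (hypothetical caller code;
send_async_read() stands in for whatever mechanism issues the I/O):

	iomap_start_folio_read(folio, len);
	err = send_async_read(folio, poff, len);
	if (err)
		/* no completion will run for this range */
		iomap_finish_folio_read(folio, poff, len, err);
	/*
	 * On success, the async completion path is responsible for
	 * calling iomap_finish_folio_read(folio, poff, len, error).
	 */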
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/buffered-io.c | 26 +++++++++++++++++---------
include/linux/iomap.h | 3 +++
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 587bbdbd24bc..379438970347 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -317,9 +317,20 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
return 0;
}
-#ifdef CONFIG_BLOCK
-static void iomap_finish_folio_read(struct folio *folio, size_t off,
- size_t len, int error)
+void iomap_start_folio_read(struct folio *folio, size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ if (ifs) {
+ spin_lock_irq(&ifs->state_lock);
+ ifs->read_bytes_pending += len;
+ spin_unlock_irq(&ifs->state_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(iomap_start_folio_read);
+
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
@@ -339,7 +350,9 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off,
if (finished)
folio_end_read(folio, uptodate);
}
+EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
+#ifdef CONFIG_BLOCK
static void iomap_read_end_io(struct bio *bio)
{
int error = blk_status_to_errno(bio->bi_status);
@@ -369,17 +382,12 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
- struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
sector_t sector;
struct bio *bio = ctx->read_ctx;
- if (ifs) {
- spin_lock_irq(&ifs->state_lock);
- ifs->read_bytes_pending += plen;
- spin_unlock_irq(&ifs->state_lock);
- }
+ iomap_start_folio_read(folio, plen);
sector = iomap_sector(iomap, pos);
if (!bio || bio_end_sector(bio) != sector ||
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..0938c4a57f4c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -467,6 +467,9 @@ ssize_t iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, struct folio *folio,
loff_t pos, loff_t end_pos, unsigned int dirty_len);
int iomap_ioend_writeback_submit(struct iomap_writepage_ctx *wpc, int error);
+void iomap_start_folio_read(struct folio *folio, size_t len);
+void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
+ int error);
void iomap_start_folio_write(struct inode *inode, struct folio *folio,
size_t len);
void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
--
2.47.3
* [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (7 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 08/15] iomap: add public start/finish folio read helpers Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 10/15] iomap: add bias for async read requests Joanne Koong
` (6 subsequent siblings)
15 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Add caller-provided callbacks for read and readahead so that iomap
read/readahead can be used generically, especially by filesystems that
are not block-based.
In particular, this:
* Modifies the read and readahead interface to take in a
struct iomap_read_folio_ctx that is publicly defined as:
struct iomap_read_folio_ctx {
const struct iomap_read_ops *ops;
struct folio *cur_folio;
struct readahead_control *rac;
void *read_ctx;
};
where struct iomap_read_ops is defined as:
struct iomap_read_ops {
int (*read_folio_range)(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx,
size_t len);
void (*submit_read)(struct iomap_read_folio_ctx *ctx);
};
read_folio_range() reads in the folio range and must be provided by
the caller. submit_read() is optional and is used for submitting any
pending read requests.
* Modifies existing filesystems that use iomap for read and readahead to
use the new API, through the new static inline helpers
iomap_bio_read_folio() and iomap_bio_readahead(). There is no change
in functionality for those filesystems.
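As an illustration, a non-block-based filesystem might wire this up
roughly as follows (a hypothetical sketch: the example_* helpers and
ops are stand-ins, not part of this series):

static int example_read_folio_range(const struct iomap_iter *iter,
		struct iomap_read_folio_ctx *ctx, size_t len)
{
	struct folio *folio = ctx->cur_folio;
	size_t poff = offset_in_folio(folio, iter->pos);
	int err;

	iomap_start_folio_read(folio, len);
	/*
	 * Queue an async read for [iter->pos, iter->pos + len); its
	 * completion path calls iomap_finish_folio_read().
	 */
	err = example_queue_read(ctx->read_ctx, folio, poff, len);
	if (err)
		/* no completion will run; finish the range here */
		iomap_finish_folio_read(folio, poff, len, err);
	return err;
}

static void example_submit_read(struct iomap_read_folio_ctx *ctx)
{
	/* kick off any reads still queued on ctx->read_ctx */
	example_flush_queued_reads(ctx->read_ctx);
}

static const struct iomap_read_ops example_read_ops = {
	.read_folio_range	= example_read_folio_range,
	.submit_read		= example_submit_read,
};

static int example_read_folio(struct file *file, struct folio *folio)
{
	struct iomap_read_folio_ctx ctx = {
		.ops		= &example_read_ops,
		.cur_folio	= folio,
	};

	return iomap_read_folio(&example_iomap_ops, &ctx);
}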
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/iomap/operations.rst | 45 ++++++++++++
block/fops.c | 5 +-
fs/erofs/data.c | 5 +-
fs/gfs2/aops.c | 6 +-
fs/iomap/buffered-io.c | 69 +++++++++++--------
fs/xfs/xfs_aops.c | 5 +-
fs/zonefs/file.c | 5 +-
include/linux/iomap.h | 62 ++++++++++++++++-
8 files changed, 159 insertions(+), 43 deletions(-)
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 067ed8e14ef3..dbb193415c0e 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -135,6 +135,29 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t len);
+ void (*submit_read)(struct iomap_read_folio_ctx *ctx);
+ };
+
+iomap calls these functions:
+
+ - ``read_folio_range``: Called to read in the range. This must be provided
+ by the caller. The caller is responsible for calling
+ iomap_start_folio_read() and iomap_finish_folio_read() before and after
+ reading in the folio range. This should be done even if an error is
+ encountered during the read. This returns 0 on success or a negative error
+ on failure.
+
+ - ``submit_read``: Submit any pending read requests. This function is
+ optional.
+
Internal per-Folio State
------------------------
@@ -182,6 +205,28 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
The pagecache takes whatever locks it needs before calling the
filesystem.
+Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
+iomap_read_folio_ctx``:
+
+.. code-block:: c
+
+ struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *read_ctx;
+ };
+
+``iomap_readahead`` must set:
+ * ``ops->read_folio_range()`` and ``rac``
+
+``iomap_read_folio`` must set:
+ * ``ops->read_folio_range()`` and ``cur_folio``
+
+``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
+pass in any custom data the caller needs accessible in the ops callbacks for
+fulfilling reads.
+
Buffered Writes
---------------
diff --git a/block/fops.c b/block/fops.c
index ddbc69c0922b..a2c2391d8dfa 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -533,12 +533,13 @@ const struct address_space_operations def_blk_aops = {
#else /* CONFIG_BUFFER_HEAD */
static int blkdev_read_folio(struct file *file, struct folio *folio)
{
- return iomap_read_folio(folio, &blkdev_iomap_ops);
+ iomap_bio_read_folio(folio, &blkdev_iomap_ops);
+ return 0;
}
static void blkdev_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &blkdev_iomap_ops);
+ iomap_bio_readahead(rac, &blkdev_iomap_ops);
}
static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 3b1ba571c728..be4191b33321 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -371,7 +371,8 @@ static int erofs_read_folio(struct file *file, struct folio *folio)
{
trace_erofs_read_folio(folio, true);
- return iomap_read_folio(folio, &erofs_iomap_ops);
+ iomap_bio_read_folio(folio, &erofs_iomap_ops);
+ return 0;
}
static void erofs_readahead(struct readahead_control *rac)
@@ -379,7 +380,7 @@ static void erofs_readahead(struct readahead_control *rac)
trace_erofs_readahead(rac->mapping->host, readahead_index(rac),
readahead_count(rac), true);
- return iomap_readahead(rac, &erofs_iomap_ops);
+ iomap_bio_readahead(rac, &erofs_iomap_ops);
}
static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 47d74afd63ac..38d4f343187a 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -424,11 +424,11 @@ static int gfs2_read_folio(struct file *file, struct folio *folio)
struct inode *inode = folio->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
- int error;
+ int error = 0;
if (!gfs2_is_jdata(ip) ||
(i_blocksize(inode) == PAGE_SIZE && !folio_buffers(folio))) {
- error = iomap_read_folio(folio, &gfs2_iomap_ops);
+ iomap_bio_read_folio(folio, &gfs2_iomap_ops);
} else if (gfs2_is_stuffed(ip)) {
error = stuffed_read_folio(ip, folio);
} else {
@@ -503,7 +503,7 @@ static void gfs2_readahead(struct readahead_control *rac)
else if (gfs2_is_jdata(ip))
mpage_readahead(rac, gfs2_block_map);
else
- iomap_readahead(rac, &gfs2_iomap_ops);
+ iomap_bio_readahead(rac, &gfs2_iomap_ops);
}
/**
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 379438970347..561378f2b9bb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -363,12 +363,6 @@ static void iomap_read_end_io(struct bio *bio)
bio_put(bio);
}
-struct iomap_read_folio_ctx {
- struct folio *cur_folio;
- void *read_ctx;
- struct readahead_control *rac;
-};
-
static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
{
struct bio *bio = ctx->read_ctx;
@@ -377,11 +371,12 @@ static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
submit_bio(bio);
}
-static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx, loff_t pos, size_t plen)
+static int iomap_bio_read_folio_range(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t plen)
{
struct folio *folio = ctx->cur_folio;
const struct iomap *iomap = &iter->iomap;
+ loff_t pos = iter->pos;
size_t poff = offset_in_folio(folio, pos);
loff_t length = iomap_length(iter);
sector_t sector;
@@ -416,8 +411,15 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
bio_add_folio_nofail(bio, folio, plen, poff);
ctx->read_ctx = bio;
}
+ return 0;
}
+const struct iomap_read_ops iomap_bio_read_ops = {
+ .read_folio_range = iomap_bio_read_folio_range,
+ .submit_read = iomap_bio_submit_read,
+};
+EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
+
static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
@@ -426,7 +428,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
loff_t length = iomap_length(iter);
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
- loff_t count;
+ loff_t delta;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -445,23 +447,30 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff,
&plen);
- count = pos - iter->pos + plen;
- if (WARN_ON_ONCE(count > length))
+ delta = pos - iter->pos;
+ if (WARN_ON_ONCE(delta + plen > length))
return -EIO;
+ length -= delta + plen;
+
+ ret = iomap_iter_advance(iter, &delta);
+ if (ret)
+ return ret;
if (plen == 0)
- return iomap_iter_advance(iter, &count);
+ return 0;
if (iomap_block_needs_zeroing(iter, pos)) {
folio_zero_range(folio, poff, plen);
iomap_set_range_uptodate(folio, poff, plen);
} else {
*cur_folio_owned = true;
- iomap_bio_read_folio_range(iter, ctx, pos, plen);
+ ret = ctx->ops->read_folio_range(iter, ctx, plen);
+ if (ret)
+ return ret;
}
- length -= count;
- ret = iomap_iter_advance(iter, &count);
+ delta = plen;
+ ret = iomap_iter_advance(iter, &delta);
if (ret)
return ret;
pos = iter->pos;
@@ -469,16 +478,15 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
return 0;
}
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
+int iomap_read_folio(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx)
{
+ struct folio *folio = ctx->cur_folio;
struct iomap_iter iter = {
.inode = folio->mapping->host,
.pos = folio_pos(folio),
.len = folio_size(folio),
};
- struct iomap_read_folio_ctx ctx = {
- .cur_folio = folio,
- };
/*
* If an external IO helper takes ownership of the folio, it is
* responsible for unlocking it when the read completes.
@@ -489,10 +497,11 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.status = iomap_read_folio_iter(&iter, &ctx,
+ iter.status = iomap_read_folio_iter(&iter, ctx,
&cur_folio_owned);
- iomap_bio_submit_read(&ctx);
+ if (ctx->ops->submit_read)
+ ctx->ops->submit_read(ctx);
if (!cur_folio_owned)
folio_unlock(folio);
@@ -533,8 +542,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
/**
* iomap_readahead - Attempt to read pages from a file.
- * @rac: Describes the pages to be read.
* @ops: The operations vector for the filesystem.
+ * @ctx: The ctx used for issuing readahead.
*
* This function is for filesystems to call to implement their readahead
* address_space operation.
@@ -546,16 +555,15 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
* function is called with memalloc_nofs set, so allocations will not cause
* the filesystem to be reentered.
*/
-void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
+void iomap_readahead(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx)
{
+ struct readahead_control *rac = ctx->rac;
struct iomap_iter iter = {
.inode = rac->mapping->host,
.pos = readahead_pos(rac),
.len = readahead_length(rac),
};
- struct iomap_read_folio_ctx ctx = {
- .rac = rac,
- };
/*
* If an external IO helper takes ownership of the folio, it is
* responsible for unlocking it when the read completes.
@@ -565,13 +573,14 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
while (iomap_iter(&iter, ops) > 0)
- iter.status = iomap_readahead_iter(&iter, &ctx,
+ iter.status = iomap_readahead_iter(&iter, ctx,
&cur_folio_owned);
- iomap_bio_submit_read(&ctx);
+ if (ctx->ops->submit_read)
+ ctx->ops->submit_read(ctx);
- if (ctx.cur_folio && !cur_folio_owned)
- folio_unlock(ctx.cur_folio);
+ if (ctx->cur_folio && !cur_folio_owned)
+ folio_unlock(ctx->cur_folio);
}
EXPORT_SYMBOL_GPL(iomap_readahead);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a26f79815533..0c2ed00733f2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -742,14 +742,15 @@ xfs_vm_read_folio(
struct file *unused,
struct folio *folio)
{
- return iomap_read_folio(folio, &xfs_read_iomap_ops);
+ iomap_bio_read_folio(folio, &xfs_read_iomap_ops);
+ return 0;
}
STATIC void
xfs_vm_readahead(
struct readahead_control *rac)
{
- iomap_readahead(rac, &xfs_read_iomap_ops);
+ iomap_bio_readahead(rac, &xfs_read_iomap_ops);
}
static int
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index fd3a5922f6c3..4d6e7eb52966 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -112,12 +112,13 @@ static const struct iomap_ops zonefs_write_iomap_ops = {
static int zonefs_read_folio(struct file *unused, struct folio *folio)
{
- return iomap_read_folio(folio, &zonefs_read_iomap_ops);
+ iomap_bio_read_folio(folio, &zonefs_read_iomap_ops);
+ return 0;
}
static void zonefs_readahead(struct readahead_control *rac)
{
- iomap_readahead(rac, &zonefs_read_iomap_ops);
+ iomap_bio_readahead(rac, &zonefs_read_iomap_ops);
}
/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0938c4a57f4c..4a168ebb40f5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -16,6 +16,7 @@ struct inode;
struct iomap_iter;
struct iomap_dio;
struct iomap_writepage_ctx;
+struct iomap_read_folio_ctx;
struct iov_iter;
struct kiocb;
struct page;
@@ -339,8 +340,10 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
-int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
-void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
+int iomap_read_folio(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx);
+void iomap_readahead(const struct iomap_ops *ops,
+ struct iomap_read_folio_ctx *ctx);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
@@ -478,6 +481,35 @@ void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
int iomap_writeback_folio(struct iomap_writepage_ctx *wpc, struct folio *folio);
int iomap_writepages(struct iomap_writepage_ctx *wpc);
+struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *read_ctx;
+};
+
+struct iomap_read_ops {
+ /*
+ * Read in a folio range.
+ *
+ * The caller is responsible for calling iomap_start_folio_read() and
+ * iomap_finish_folio_read() before and after reading in the folio
+ * range. This should be done even if an error is encountered during the
+ * read.
+ *
+ * Returns 0 on success or a negative error on failure.
+ */
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t len);
+
+ /*
+ * Submit any pending read requests.
+ *
+ * This is optional.
+ */
+ void (*submit_read)(struct iomap_read_folio_ctx *ctx);
+};
+
/*
* Flags for direct I/O ->end_io:
*/
@@ -543,4 +575,30 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
extern struct bio_set iomap_ioend_bioset;
+#ifdef CONFIG_BLOCK
+extern const struct iomap_read_ops iomap_bio_read_ops;
+
+static inline void iomap_bio_read_folio(struct folio *folio,
+ const struct iomap_ops *ops)
+{
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_bio_read_ops,
+ .cur_folio = folio,
+ };
+
+ iomap_read_folio(ops, &ctx);
+}
+
+static inline void iomap_bio_readahead(struct readahead_control *rac,
+ const struct iomap_ops *ops)
+{
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &iomap_bio_read_ops,
+ .rac = rac,
+ };
+
+ iomap_readahead(ops, &ctx);
+}
+#endif /* CONFIG_BLOCK */
+
#endif /* LINUX_IOMAP_H */
--
2.47.3
* [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (8 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 22:30 ` Darrick J. Wong
2025-09-22 23:19 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
` (5 subsequent siblings)
15 siblings, 2 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Non-block-based filesystems will be using iomap read/readahead. If they
read in ranges asynchronously and fulfill those read requests on an
ongoing basis (instead of all together at the end), then the read on the
folio may be ended prematurely if earlier async requests complete before
the later ones have been issued.
For example, take a large folio and a readahead request for 16 pages in
that folio, with the readahead split into 4 async requests. If the first
request is sent off and completes before the second request has been
sent off, then when the first request calls iomap_finish_folio_read(),
ifs->read_bytes_pending will be 0, which will end the read and unlock
the folio prematurely.
To mitigate this, a "bias" is added to ifs->read_bytes_pending before
the first range is forwarded to the caller and removed after the last
range has been forwarded.
iomap writeback uses the same bias scheme for its async requests to
prevent prematurely ending writeback.
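As a worked example of the counter (using the 16-page folio above,
split into 4 async requests of 4 pages each, assuming 4K pages):

  without the bias:
    issue request 1:       read_bytes_pending = 16K
    request 1 completes:   read_bytes_pending = 0
      -> folio_end_read() runs and the folio is unlocked before
         requests 2-4 have even been issued

  with the bias:
    add bias:              read_bytes_pending = 1
    issue request 1:       read_bytes_pending = 1 + 16K
    request 1 completes:   read_bytes_pending = 1, read stays open
    issue requests 2-4, then drop the bias; the read now ends only
    when the last outstanding request completes.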
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/buffered-io.c | 55 ++++++++++++++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 8 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 561378f2b9bb..667a49cb5ae5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -420,6 +420,38 @@ const struct iomap_read_ops iomap_bio_read_ops = {
};
EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
+/*
+ * Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
+ * being ended prematurely.
+ *
+ * Otherwise, if the ranges are read asynchronously and read requests are
+ * fulfilled on an ongoing basis, there is the possibility that the read on the
+ * folio may be prematurely ended if earlier async requests complete before the
+ * later ones have been issued.
+ */
+static void iomap_read_add_bias(struct folio *folio)
+{
+ iomap_start_folio_read(folio, 1);
+}
+
+static void iomap_read_remove_bias(struct folio *folio, bool *cur_folio_owned)
+{
+ struct iomap_folio_state *ifs = folio->private;
+ bool finished, uptodate;
+
+ if (ifs) {
+ spin_lock_irq(&ifs->state_lock);
+ ifs->read_bytes_pending -= 1;
+ finished = !ifs->read_bytes_pending;
+ if (finished)
+ uptodate = ifs_is_fully_uptodate(folio, ifs);
+ spin_unlock_irq(&ifs->state_lock);
+ if (finished)
+ folio_end_read(folio, uptodate);
+ *cur_folio_owned = true;
+ }
+}
+
static int iomap_read_folio_iter(struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
{
@@ -429,7 +461,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
struct folio *folio = ctx->cur_folio;
size_t poff, plen;
loff_t delta;
- int ret;
+ int ret = 0;
if (iomap->type == IOMAP_INLINE) {
ret = iomap_read_inline_data(iter, folio);
@@ -441,6 +473,8 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
/* zero post-eof blocks as the page may be mapped */
ifs_alloc(iter->inode, folio, iter->flags);
+ iomap_read_add_bias(folio);
+
length = min_t(loff_t, length,
folio_size(folio) - offset_in_folio(folio, pos));
while (length) {
@@ -448,16 +482,18 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
&plen);
delta = pos - iter->pos;
- if (WARN_ON_ONCE(delta + plen > length))
- return -EIO;
+ if (WARN_ON_ONCE(delta + plen > length)) {
+ ret = -EIO;
+ break;
+ }
length -= delta + plen;
ret = iomap_iter_advance(iter, &delta);
if (ret)
- return ret;
+ break;
if (plen == 0)
- return 0;
+ break;
if (iomap_block_needs_zeroing(iter, pos)) {
folio_zero_range(folio, poff, plen);
@@ -466,16 +502,19 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
*cur_folio_owned = true;
ret = ctx->ops->read_folio_range(iter, ctx, plen);
if (ret)
- return ret;
+ break;
}
delta = plen;
ret = iomap_iter_advance(iter, &delta);
if (ret)
- return ret;
+ break;
pos = iter->pos;
}
- return 0;
+
+ iomap_read_remove_bias(folio, cur_folio_owned);
+
+ return ret;
}
int iomap_read_folio(const struct iomap_ops *ops,
--
2.47.3
* [PATCH v3 11/15] iomap: move buffered io bio logic into new file
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (9 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 10/15] iomap: add bias for async read requests Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-17 21:40 ` kernel test robot
` (2 more replies)
2025-09-16 23:44 ` [PATCH v3 12/15] iomap: make iomap_read_folio() a void return Joanne Koong
` (4 subsequent siblings)
15 siblings, 3 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Move bio logic in the buffered io code into its own file and remove
CONFIG_BLOCK gating for iomap read/readahead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/iomap/Makefile | 3 +-
fs/iomap/bio.c | 90 ++++++++++++++++++++++++++++++++++++++++++
fs/iomap/buffered-io.c | 90 +-----------------------------------------
fs/iomap/internal.h | 12 ++++++
4 files changed, 105 insertions(+), 90 deletions(-)
create mode 100644 fs/iomap/bio.c
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index f7e1c8534c46..a572b8808524 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -14,5 +14,6 @@ iomap-y += trace.o \
iomap-$(CONFIG_BLOCK) += direct-io.o \
ioend.o \
fiemap.o \
- seek.o
+ seek.o \
+ bio.o
iomap-$(CONFIG_SWAP) += swapfile.o
diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
new file mode 100644
index 000000000000..8a51c9d70268
--- /dev/null
+++ b/fs/iomap/bio.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
+ */
+#include <linux/iomap.h>
+#include <linux/pagemap.h>
+#include "internal.h"
+#include "trace.h"
+
+static void iomap_read_end_io(struct bio *bio)
+{
+ int error = blk_status_to_errno(bio->bi_status);
+ struct folio_iter fi;
+
+ bio_for_each_folio_all(fi, bio)
+ iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+ bio_put(bio);
+}
+
+static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
+{
+ struct bio *bio = ctx->read_ctx;
+
+ if (bio)
+ submit_bio(bio);
+}
+
+static int iomap_bio_read_folio_range(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t plen)
+{
+ struct folio *folio = ctx->cur_folio;
+ const struct iomap *iomap = &iter->iomap;
+ loff_t pos = iter->pos;
+ size_t poff = offset_in_folio(folio, pos);
+ loff_t length = iomap_length(iter);
+ sector_t sector;
+ struct bio *bio = ctx->read_ctx;
+
+ iomap_start_folio_read(folio, plen);
+
+ sector = iomap_sector(iomap, pos);
+ if (!bio || bio_end_sector(bio) != sector ||
+ !bio_add_folio(bio, folio, plen, poff)) {
+ gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
+ gfp_t orig_gfp = gfp;
+ unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
+
+ if (bio)
+ submit_bio(bio);
+
+ if (ctx->rac) /* same as readahead_gfp_mask */
+ gfp |= __GFP_NORETRY | __GFP_NOWARN;
+ bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
+ gfp);
+ /*
+ * If the bio_alloc fails, try it again for a single page to
+ * avoid having to deal with partial page reads. This emulates
+ * what do_mpage_read_folio does.
+ */
+ if (!bio)
+ bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
+ if (ctx->rac)
+ bio->bi_opf |= REQ_RAHEAD;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_end_io = iomap_read_end_io;
+ bio_add_folio_nofail(bio, folio, plen, poff);
+ ctx->read_ctx = bio;
+ }
+ return 0;
+}
+
+const struct iomap_read_ops iomap_bio_read_ops = {
+ .read_folio_range = iomap_bio_read_folio_range,
+ .submit_read = iomap_bio_submit_read,
+};
+EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
+
+int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ const struct iomap *srcmap = iomap_iter_srcmap(iter);
+ struct bio_vec bvec;
+ struct bio bio;
+
+ bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
+ bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
+ bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
+ return submit_bio_wait(&bio);
+}
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 667a49cb5ae5..72258b0109ec 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -8,6 +8,7 @@
#include <linux/writeback.h>
#include <linux/swap.h>
#include <linux/migrate.h>
+#include "internal.h"
#include "trace.h"
#include "../internal.h"
@@ -352,74 +353,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
}
EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
-#ifdef CONFIG_BLOCK
-static void iomap_read_end_io(struct bio *bio)
-{
- int error = blk_status_to_errno(bio->bi_status);
- struct folio_iter fi;
-
- bio_for_each_folio_all(fi, bio)
- iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
- bio_put(bio);
-}
-
-static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
-{
- struct bio *bio = ctx->read_ctx;
-
- if (bio)
- submit_bio(bio);
-}
-
-static int iomap_bio_read_folio_range(const struct iomap_iter *iter,
- struct iomap_read_folio_ctx *ctx, size_t plen)
-{
- struct folio *folio = ctx->cur_folio;
- const struct iomap *iomap = &iter->iomap;
- loff_t pos = iter->pos;
- size_t poff = offset_in_folio(folio, pos);
- loff_t length = iomap_length(iter);
- sector_t sector;
- struct bio *bio = ctx->read_ctx;
-
- iomap_start_folio_read(folio, plen);
-
- sector = iomap_sector(iomap, pos);
- if (!bio || bio_end_sector(bio) != sector ||
- !bio_add_folio(bio, folio, plen, poff)) {
- gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
- gfp_t orig_gfp = gfp;
- unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
-
- iomap_bio_submit_read(ctx);
-
- if (ctx->rac) /* same as readahead_gfp_mask */
- gfp |= __GFP_NORETRY | __GFP_NOWARN;
- bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
- gfp);
- /*
- * If the bio_alloc fails, try it again for a single page to
- * avoid having to deal with partial page reads. This emulates
- * what do_mpage_read_folio does.
- */
- if (!bio)
- bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
- if (ctx->rac)
- bio->bi_opf |= REQ_RAHEAD;
- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = iomap_read_end_io;
- bio_add_folio_nofail(bio, folio, plen, poff);
- ctx->read_ctx = bio;
- }
- return 0;
-}
-
-const struct iomap_read_ops iomap_bio_read_ops = {
- .read_folio_range = iomap_bio_read_folio_range,
- .submit_read = iomap_bio_submit_read,
-};
-EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
-
/*
* Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
* being ended prematurely.
@@ -623,27 +556,6 @@ void iomap_readahead(const struct iomap_ops *ops,
}
EXPORT_SYMBOL_GPL(iomap_readahead);
-static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- const struct iomap *srcmap = iomap_iter_srcmap(iter);
- struct bio_vec bvec;
- struct bio bio;
-
- bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
- bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
- bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
- return submit_bio_wait(&bio);
-}
-#else
-static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
- struct folio *folio, loff_t pos, size_t len)
-{
- WARN_ON_ONCE(1);
- return -EIO;
-}
-#endif /* CONFIG_BLOCK */
-
/*
* iomap_is_partially_uptodate checks whether blocks within a folio are
* uptodate or not.
diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
index d05cb3aed96e..7ab1033ab4b7 100644
--- a/fs/iomap/internal.h
+++ b/fs/iomap/internal.h
@@ -6,4 +6,16 @@
u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
+#ifdef CONFIG_BLOCK
+int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len);
+#else
+int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
+ struct folio *folio, loff_t pos, size_t len)
+{
+ WARN_ON_ONCE(1);
+ return -EIO;
+}
+#endif /* CONFIG_BLOCK */
+
#endif /* _IOMAP_INTERNAL_H */
--
2.47.3
* [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (10 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 21:55 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 13/15] fuse: use iomap for read_folio Joanne Koong
` (3 subsequent siblings)
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
iomap_read_folio() never propagates errors to its caller; it always
returns 0. Change it to a void return to make this explicit to callers.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
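(Illustration, not part of the diff: a ->read_folio implementation now
just fires off the iomap read and reports success, since per-range
errors are reflected in folio state rather than in the return value.
A sketch with a hypothetical myfs_iomap_ops:)

static int myfs_read_folio(struct file *file, struct folio *folio)
{
	struct iomap_read_folio_ctx ctx = {
		.cur_folio	= folio,
		.ops		= &iomap_bio_read_ops,
	};

	iomap_read_folio(&myfs_iomap_ops, &ctx);	/* void return now */
	return 0;
}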
fs/iomap/buffered-io.c | 9 +--------
include/linux/iomap.h | 2 +-
2 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 72258b0109ec..be535bd3aeca 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -450,7 +450,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
return ret;
}
-int iomap_read_folio(const struct iomap_ops *ops,
+void iomap_read_folio(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx)
{
struct folio *folio = ctx->cur_folio;
@@ -477,13 +477,6 @@ int iomap_read_folio(const struct iomap_ops *ops,
if (!cur_folio_owned)
folio_unlock(folio);
-
- /*
- * Just like mpage_readahead and block_read_full_folio, we always
- * return 0 and just set the folio error flag on errors. This
- * should be cleaned up throughout the stack eventually.
- */
- return 0;
}
EXPORT_SYMBOL_GPL(iomap_read_folio);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4a168ebb40f5..fa55ec611fff 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -340,7 +340,7 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops,
const struct iomap_write_ops *write_ops, void *private);
-int iomap_read_folio(const struct iomap_ops *ops,
+void iomap_read_folio(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx);
void iomap_readahead(const struct iomap_ops *ops,
struct iomap_read_folio_ctx *ctx);
--
2.47.3
* [PATCH v3 13/15] fuse: use iomap for read_folio
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (11 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 12/15] iomap: make iomap_read_folio() a void return Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 14/15] fuse: use iomap for readahead Joanne Koong
` (2 subsequent siblings)
15 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Read folio data into the page cache using iomap. This gives us granular
uptodate tracking for large folios, which optimizes how much data needs
to be read in. If some portions of the folio are already uptodate (e.g.
through a prior write), we only need to read in the non-uptodate
portions.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
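(Worked example, not part of the diff, assuming a 16k folio with 4k
blocks: a prior 4k write at offset 0 marked block 0 uptodate in the
folio's iomap_folio_state. On a later read_folio,
iomap_adjust_read_range() skips the uptodate block, so
->read_folio_range() -- and hence FUSE_READ -- is issued for pos = 4096,
len = 12288: a 12k read instead of 16k.)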
fs/fuse/file.c | 81 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 57 insertions(+), 24 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4adcf09d4b01..4f27a3b0c20a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -828,23 +828,70 @@ static int fuse_do_readfolio(struct file *file, struct folio *folio,
return 0;
}
+static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ iomap->type = IOMAP_MAPPED;
+ iomap->length = length;
+ iomap->offset = offset;
+ return 0;
+}
+
+static const struct iomap_ops fuse_iomap_ops = {
+ .iomap_begin = fuse_iomap_begin,
+};
+
+struct fuse_fill_read_data {
+ struct file *file;
+};
+
+static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx,
+ size_t len)
+{
+ struct fuse_fill_read_data *data = ctx->read_ctx;
+ struct folio *folio = ctx->cur_folio;
+ loff_t pos = iter->pos;
+ size_t off = offset_in_folio(folio, pos);
+ struct file *file = data->file;
+ int ret;
+
+ /*
+ * for non-readahead read requests, do reads synchronously since
+ * it's not guaranteed that the server can handle out-of-order reads
+ */
+ iomap_start_folio_read(folio, len);
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ return ret;
+}
+
+static const struct iomap_read_ops fuse_iomap_read_ops = {
+ .read_folio_range = fuse_iomap_read_folio_range_async,
+};
+
static int fuse_read_folio(struct file *file, struct folio *folio)
{
struct inode *inode = folio->mapping->host;
- int err;
+ struct fuse_fill_read_data data = {
+ .file = file,
+ };
+ struct iomap_read_folio_ctx ctx = {
+ .cur_folio = folio,
+ .ops = &fuse_iomap_read_ops,
+ .read_ctx = &data,
- err = -EIO;
- if (fuse_is_bad(inode))
- goto out;
+ };
- err = fuse_do_readfolio(file, folio, 0, folio_size(folio));
- if (!err)
- folio_mark_uptodate(folio);
+ if (fuse_is_bad(inode)) {
+ folio_unlock(folio);
+ return -EIO;
+ }
+ iomap_read_folio(&fuse_iomap_ops, &ctx);
fuse_invalidate_atime(inode);
- out:
- folio_unlock(folio);
- return err;
+ return 0;
}
static int fuse_iomap_read_folio_range(const struct iomap_iter *iter,
@@ -1394,20 +1441,6 @@ static const struct iomap_write_ops fuse_iomap_write_ops = {
.read_folio_range = fuse_iomap_read_folio_range,
};
-static int fuse_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
- unsigned int flags, struct iomap *iomap,
- struct iomap *srcmap)
-{
- iomap->type = IOMAP_MAPPED;
- iomap->length = length;
- iomap->offset = offset;
- return 0;
-}
-
-static const struct iomap_ops fuse_iomap_ops = {
- .iomap_begin = fuse_iomap_begin,
-};
-
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
--
2.47.3
* [PATCH v3 14/15] fuse: use iomap for readahead
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (12 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 13/15] fuse: use iomap for read_folio Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 22:35 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
2025-09-17 8:30 ` [syzbot ci] Re: fuse: use iomap for buffered reads + readahead syzbot ci
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Do readahead in fuse using iomap. This gives us granular uptodate
tracking for large folios, which optimizes how much data needs to be
read in. If some portions of the folio are already uptodate (e.g. through
a prior write), we only need to read in the non-uptodate portions.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
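(Illustration, not part of the diff: the contract being leaned on here
is that ->read_folio_range() may merely queue a range, and iomap calls
->submit_read() once at the end to flush whatever is still pending. A
compressed sketch; the myfs_batch_* helpers are hypothetical stand-ins
for the fuse_io_args machinery:)

static int myfs_read_folio_range(const struct iomap_iter *iter,
		struct iomap_read_folio_ctx *ctx, size_t len)
{
	struct folio *folio = ctx->cur_folio;
	size_t off = offset_in_folio(folio, iter->pos);
	int err;

	iomap_start_folio_read(folio, len);
	err = myfs_batch_add(ctx->read_ctx, folio, off, len);	/* queue */
	if (err)	/* nothing will complete the range on failure */
		iomap_finish_folio_read(folio, off, len, err);
	return err;
}

static void myfs_submit_read(struct iomap_read_folio_ctx *ctx)
{
	/* completion path calls iomap_finish_folio_read() per range */
	myfs_batch_send(ctx->read_ctx);
}

static const struct iomap_read_ops myfs_read_ops = {
	.read_folio_range	= myfs_read_folio_range,
	.submit_read		= myfs_submit_read,
};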
fs/fuse/file.c | 220 ++++++++++++++++++++++++++++---------------------
1 file changed, 124 insertions(+), 96 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4f27a3b0c20a..db0b1f20fee4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -844,8 +844,65 @@ static const struct iomap_ops fuse_iomap_ops = {
struct fuse_fill_read_data {
struct file *file;
+
+ /* Fields below are used if sending the read request asynchronously */
+ struct fuse_conn *fc;
+ struct fuse_io_args *ia;
+ unsigned int nr_bytes;
};
+/* forward declarations */
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write);
+static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
+ unsigned int count, bool async);
+
+static int fuse_handle_readahead(struct folio *folio,
+ struct readahead_control *rac,
+ struct fuse_fill_read_data *data, loff_t pos,
+ size_t len)
+{
+ struct fuse_io_args *ia = data->ia;
+ size_t off = offset_in_folio(folio, pos);
+ struct fuse_conn *fc = data->fc;
+ struct fuse_args_pages *ap;
+ unsigned int nr_pages;
+
+ if (ia && fuse_folios_need_send(fc, pos, len, &ia->ap, data->nr_bytes,
+ false)) {
+ fuse_send_readpages(ia, data->file, data->nr_bytes,
+ fc->async_read);
+ data->nr_bytes = 0;
+ data->ia = NULL;
+ ia = NULL;
+ }
+ if (!ia) {
+ if (fc->num_background >= fc->congestion_threshold &&
+ rac->ra->async_size >= readahead_count(rac))
+ /*
+ * Congested and only async pages left, so skip the
+ * rest.
+ */
+ return -EAGAIN;
+
+ nr_pages = min(fc->max_pages, readahead_count(rac));
+ data->ia = fuse_io_alloc(NULL, nr_pages);
+ if (!data->ia)
+ return -ENOMEM;
+ ia = data->ia;
+ }
+ folio_get(folio);
+ ap = &ia->ap;
+ ap->folios[ap->num_folios] = folio;
+ ap->descs[ap->num_folios].offset = off;
+ ap->descs[ap->num_folios].length = len;
+ data->nr_bytes += len;
+ ap->num_folios++;
+
+ return 0;
+}
+
static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
struct iomap_read_folio_ctx *ctx,
size_t len)
@@ -857,18 +914,40 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
struct file *file = data->file;
int ret;
- /*
- * for non-readahead read requests, do reads synchronously since
- * it's not guaranteed that the server can handle out-of-order reads
- */
iomap_start_folio_read(folio, len);
- ret = fuse_do_readfolio(file, folio, off, len);
- iomap_finish_folio_read(folio, off, len, ret);
+ if (ctx->rac) {
+ ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
+ /*
+ * If fuse_handle_readahead was successful, fuse_readpages_end
+ * will do the iomap_finish_folio_read, else we need to call it
+ * here
+ */
+ if (ret)
+ iomap_finish_folio_read(folio, off, len, ret);
+ } else {
+ /*
+ * for non-readahead read requests, do reads synchronously
+ * since it's not guaranteed that the server can handle
+ * out-of-order reads
+ */
+ ret = fuse_do_readfolio(file, folio, off, len);
+ iomap_finish_folio_read(folio, off, len, ret);
+ }
return ret;
}
+static void fuse_iomap_read_submit(struct iomap_read_folio_ctx *ctx)
+{
+ struct fuse_fill_read_data *data = ctx->read_ctx;
+
+ if (data->ia)
+ fuse_send_readpages(data->ia, data->file, data->nr_bytes,
+ data->fc->async_read);
+}
+
static const struct iomap_read_ops fuse_iomap_read_ops = {
.read_folio_range = fuse_iomap_read_folio_range_async,
+ .submit_read = fuse_iomap_read_submit,
};
static int fuse_read_folio(struct file *file, struct folio *folio)
@@ -930,7 +1009,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
for (i = 0; i < ap->num_folios; i++) {
- folio_end_read(ap->folios[i], !err);
+ iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
+ ap->descs[i].length, err);
folio_put(ap->folios[i]);
}
if (ia->ff)
@@ -940,7 +1020,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
}
static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
- unsigned int count)
+ unsigned int count, bool async)
{
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
@@ -962,7 +1042,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
ia->read.attr_ver = fuse_get_attr_version(fm->fc);
- if (fm->fc->async_read) {
+ if (async) {
ia->ff = fuse_file_get(ff);
ap->args.end = fuse_readpages_end;
err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
@@ -979,81 +1059,20 @@ static void fuse_readahead(struct readahead_control *rac)
{
struct inode *inode = rac->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- unsigned int max_pages, nr_pages;
- struct folio *folio = NULL;
+ struct fuse_fill_read_data data = {
+ .file = rac->file,
+ .fc = fc,
+ };
+ struct iomap_read_folio_ctx ctx = {
+ .ops = &fuse_iomap_read_ops,
+ .rac = rac,
+ .read_ctx = &data
+ };
if (fuse_is_bad(inode))
return;
- max_pages = min_t(unsigned int, fc->max_pages,
- fc->max_read / PAGE_SIZE);
-
- /*
- * This is only accurate the first time through, since readahead_folio()
- * doesn't update readahead_count() from the previous folio until the
- * next call. Grab nr_pages here so we know how many pages we're going
- * to have to process. This means that we will exit here with
- * readahead_count() == folio_nr_pages(last_folio), but we will have
- * consumed all of the folios, and read_pages() will call
- * readahead_folio() again which will clean up the rac.
- */
- nr_pages = readahead_count(rac);
-
- while (nr_pages) {
- struct fuse_io_args *ia;
- struct fuse_args_pages *ap;
- unsigned cur_pages = min(max_pages, nr_pages);
- unsigned int pages = 0;
-
- if (fc->num_background >= fc->congestion_threshold &&
- rac->ra->async_size >= readahead_count(rac))
- /*
- * Congested and only async pages left, so skip the
- * rest.
- */
- break;
-
- ia = fuse_io_alloc(NULL, cur_pages);
- if (!ia)
- break;
- ap = &ia->ap;
-
- while (pages < cur_pages) {
- unsigned int folio_pages;
-
- /*
- * This returns a folio with a ref held on it.
- * The ref needs to be held until the request is
- * completed, since the splice case (see
- * fuse_try_move_page()) drops the ref after it's
- * replaced in the page cache.
- */
- if (!folio)
- folio = __readahead_folio(rac);
-
- folio_pages = folio_nr_pages(folio);
- if (folio_pages > cur_pages - pages) {
- /*
- * Large folios belonging to fuse will never
- * have more pages than max_pages.
- */
- WARN_ON(!pages);
- break;
- }
-
- ap->folios[ap->num_folios] = folio;
- ap->descs[ap->num_folios].length = folio_size(folio);
- ap->num_folios++;
- pages += folio_pages;
- folio = NULL;
- }
- fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
- nr_pages -= pages;
- }
- if (folio) {
- folio_end_read(folio, false);
- folio_put(folio);
- }
+ iomap_readahead(&fuse_iomap_ops, &ctx);
}
static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -2084,7 +2103,7 @@ struct fuse_fill_wb_data {
struct fuse_file *ff;
unsigned int max_folios;
/*
- * nr_bytes won't overflow since fuse_writepage_need_send() caps
+ * nr_bytes won't overflow since fuse_folios_need_send() caps
* wb requests to never exceed fc->max_pages (which has an upper bound
* of U16_MAX).
*/
@@ -2129,14 +2148,15 @@ static void fuse_writepages_send(struct inode *inode,
spin_unlock(&fi->lock);
}
-static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
- unsigned len, struct fuse_args_pages *ap,
- struct fuse_fill_wb_data *data)
+static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
+ unsigned len, struct fuse_args_pages *ap,
+ unsigned cur_bytes, bool write)
{
struct folio *prev_folio;
struct fuse_folio_desc prev_desc;
- unsigned bytes = data->nr_bytes + len;
+ unsigned bytes = cur_bytes + len;
loff_t prev_pos;
+ size_t max_bytes = write ? fc->max_write : fc->max_read;
WARN_ON(!ap->num_folios);
@@ -2144,8 +2164,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
return true;
- /* Reached max write bytes */
- if (bytes > fc->max_write)
+ if (bytes > max_bytes)
return true;
/* Discontinuity */
@@ -2155,11 +2174,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
if (prev_pos != pos)
return true;
- /* Need to grow the pages array? If so, did the expansion fail? */
- if (ap->num_folios == data->max_folios &&
- !fuse_pages_realloc(data, fc->max_pages))
- return true;
-
return false;
}
@@ -2183,10 +2197,24 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
return -EIO;
}
- if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
- fuse_writepages_send(inode, data);
- data->wpa = NULL;
- data->nr_bytes = 0;
+ if (wpa) {
+ bool send = fuse_folios_need_send(fc, pos, len, ap,
+ data->nr_bytes, true);
+
+ if (!send) {
+ /*
+ * Need to grow the pages array? If so, did the
+ * expansion fail?
+ */
+ send = (ap->num_folios == data->max_folios) &&
+ !fuse_pages_realloc(data, fc->max_pages);
+ }
+
+ if (send) {
+ fuse_writepages_send(inode, data);
+ data->wpa = NULL;
+ data->nr_bytes = 0;
+ }
}
if (data->wpa == NULL) {
--
2.47.3
* [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (13 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 14/15] fuse: use iomap for readahead Joanne Koong
@ 2025-09-16 23:44 ` Joanne Koong
2025-09-18 22:35 ` Darrick J. Wong
2025-09-17 8:30 ` [syzbot ci] Re: fuse: use iomap for buffered reads + readahead syzbot ci
15 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-16 23:44 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
Now that fuse is integrated with iomap for read/readahead, we can remove
the workaround added in commit bd24d2108e9c ("fuse: fix fuseblk
i_blkbits for iomap partial writes"), which was needed to avoid a race
condition where an iomap partial write could be overwritten by a read
if blocksize < PAGE_SIZE. With iomap read/readahead, blocks have
granular uptodate tracking, so the race can no longer happen and the
workaround is unnecessary.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
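(Timeline of the race the workaround papered over, for reference,
assuming blocksize < PAGE_SIZE and the writeback cache enabled:)

1. An iomap buffered write fills one block of a folio; only that block
   is marked uptodate, the folio as a whole is not.
2. A read of the folio arrives before writeback.
3. The legacy read path sees a !uptodate folio and reads the whole
   folio from the server, clobbering the data from step 1.

With iomap read/readahead, step 3 consults the per-block uptodate
bitmap and leaves the written block alone, so forcing a PAGE_SIZE
blocksize is no longer necessary.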
fs/fuse/dir.c | 2 +-
fs/fuse/fuse_i.h | 8 --------
fs/fuse/inode.c | 13 +------------
3 files changed, 2 insertions(+), 21 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5c569c3cb53f..ebee7e0b1cd3 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1199,7 +1199,7 @@ static void fuse_fillattr(struct mnt_idmap *idmap, struct inode *inode,
if (attr->blksize != 0)
blkbits = ilog2(attr->blksize);
else
- blkbits = fc->blkbits;
+ blkbits = inode->i_sb->s_blocksize_bits;
stat->blksize = 1 << blkbits;
}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index cc428d04be3e..1647eb7ca6fa 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -975,14 +975,6 @@ struct fuse_conn {
/* Request timeout (in jiffies). 0 = no timeout */
unsigned int req_timeout;
} timeout;
-
- /*
- * This is a workaround until fuse uses iomap for reads.
- * For fuseblk servers, this represents the blocksize passed in at
- * mount time and for regular fuse servers, this is equivalent to
- * inode->i_blkbits.
- */
- u8 blkbits;
};
/*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index fdecd5a90dee..5899a47faaef 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -292,7 +292,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
if (attr->blksize)
fi->cached_i_blkbits = ilog2(attr->blksize);
else
- fi->cached_i_blkbits = fc->blkbits;
+ fi->cached_i_blkbits = inode->i_sb->s_blocksize_bits;
/*
* Don't set the sticky bit in i_mode, unless we want the VFS
@@ -1810,21 +1810,10 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
err = -EINVAL;
if (!sb_set_blocksize(sb, ctx->blksize))
goto err;
- /*
- * This is a workaround until fuse hooks into iomap for reads.
- * Use PAGE_SIZE for the blocksize else if the writeback cache
- * is enabled, buffered writes go through iomap and a read may
- * overwrite partially written data if blocksize < PAGE_SIZE
- */
- fc->blkbits = sb->s_blocksize_bits;
- if (ctx->blksize != PAGE_SIZE &&
- !sb_set_blocksize(sb, PAGE_SIZE))
- goto err;
#endif
} else {
sb->s_blocksize = PAGE_SIZE;
sb->s_blocksize_bits = PAGE_SHIFT;
- fc->blkbits = sb->s_blocksize_bits;
}
sb->s_subtype = ctx->subtype;
--
2.47.3
* [syzbot ci] Re: fuse: use iomap for buffered reads + readahead
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
` (14 preceding siblings ...)
2025-09-16 23:44 ` [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
@ 2025-09-17 8:30 ` syzbot ci
2025-09-17 19:59 ` Joanne Koong
15 siblings, 1 reply; 40+ messages in thread
From: syzbot ci @ 2025-09-17 8:30 UTC (permalink / raw)
To: brauner, djwong, gfs2, hch, hch, hsiangkao, joannelkoong,
kernel-team, linux-block, linux-doc, linux-fsdevel, linux-xfs,
miklos
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v3] fuse: use iomap for buffered reads + readahead
https://lore.kernel.org/all/20250916234425.1274735-1-joannelkoong@gmail.com
* [PATCH v3 01/15] iomap: move bio read logic into helper function
* [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
* [PATCH v3 03/15] iomap: store read/readahead bio generically
* [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
* [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
* [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
* [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
* [PATCH v3 08/15] iomap: add public start/finish folio read helpers
* [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead
* [PATCH v3 10/15] iomap: add bias for async read requests
* [PATCH v3 11/15] iomap: move buffered io bio logic into new file
* [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
* [PATCH v3 13/15] fuse: use iomap for read_folio
* [PATCH v3 14/15] fuse: use iomap for readahead
* [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
and found the following issues:
* WARNING in iomap_iter_advance
* WARNING in iomap_readahead
* kernel BUG in folio_end_read
Full report is available here:
https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
***
WARNING in iomap_iter_advance
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: f83ec76bf285bea5727f478a68b894f5543ca76e
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/4ec82322-3509-4b63-9881-f639f1b61f20/config
C repro: https://ci.syzbot.org/findings/73807f10-d659-4291-84af-f982be7cabad/c_repro
syz repro: https://ci.syzbot.org/findings/73807f10-d659-4291-84af-f982be7cabad/syz_repro
loop0: detected capacity change from 0 to 16
erofs (device loop0): mounted with root inode @ nid 36.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5996 at fs/iomap/iter.c:22 iomap_iter_advance+0x2c1/0x2f0 fs/iomap/iter.c:22
Modules linked in:
CPU: 1 UID: 0 PID: 5996 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:iomap_iter_advance+0x2c1/0x2f0 fs/iomap/iter.c:22
Code: 74 08 4c 89 ef e8 2f f1 ce ff 49 89 5d 00 31 c0 48 83 c4 50 5b 41 5c 41 5d 41 5e 41 5f 5d e9 46 0c 29 09 cc e8 80 7d 6b ff 90 <0f> 0b 90 b8 fb ff ff ff eb dc 44 89 f9 80 e1 07 fe c1 38 c1 0f 8c
RSP: 0018:ffffc90002a6f118 EFLAGS: 00010293
RAX: ffffffff82543fe0 RBX: 0000000000001000 RCX: ffff88801ec80000
RDX: 0000000000000000 RSI: 0000000000000f8c RDI: 0000000000001000
RBP: 0000000000000074 R08: ffffea00043daf87 R09: 1ffffd400087b5f0
R10: dffffc0000000000 R11: fffff9400087b5f1 R12: 0000000000001000
R13: 0000000000000f8c R14: ffffc90002a6f340 R15: 0000000000000f8c
FS: 000055556dfe9500(0000) GS:ffff8881a3c15000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31563fff CR3: 00000000262da000 CR4: 00000000000006f0
Call Trace:
<TASK>
iomap_read_folio_iter+0x614/0xb70 fs/iomap/buffered-io.c:424
iomap_read_folio+0x2e7/0x570 fs/iomap/buffered-io.c:472
iomap_bio_read_folio include/linux/iomap.h:589 [inline]
erofs_read_folio+0x13c/0x2e0 fs/erofs/data.c:374
filemap_read_folio+0x117/0x380 mm/filemap.c:2413
do_read_cache_folio+0x350/0x590 mm/filemap.c:3957
read_mapping_folio include/linux/pagemap.h:991 [inline]
erofs_bread+0x46f/0x7f0 fs/erofs/data.c:40
erofs_find_target_block fs/erofs/namei.c:103 [inline]
erofs_namei+0x36b/0x1030 fs/erofs/namei.c:177
erofs_lookup+0x148/0x340 fs/erofs/namei.c:206
lookup_open fs/namei.c:3686 [inline]
open_last_lookups fs/namei.c:3807 [inline]
path_openat+0x1101/0x3830 fs/namei.c:4043
do_filp_open+0x1fa/0x410 fs/namei.c:4073
do_sys_openat2+0x121/0x1c0 fs/open.c:1435
do_sys_open fs/open.c:1450 [inline]
__do_sys_openat fs/open.c:1466 [inline]
__se_sys_openat fs/open.c:1461 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1461
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f86ddb8eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff12610128 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f86dddd5fa0 RCX: 00007f86ddb8eba9
RDX: 0000000000043142 RSI: 0000200000000040 RDI: ffffffffffffff9c
RBP: 00007f86ddc11e19 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f86dddd5fa0 R14: 00007f86dddd5fa0 R15: 0000000000000004
</TASK>
***
WARNING in iomap_readahead
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: f83ec76bf285bea5727f478a68b894f5543ca76e
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/4ec82322-3509-4b63-9881-f639f1b61f20/config
C repro: https://ci.syzbot.org/findings/6ee3f719-e6ab-468c-92dc-e197c12d8171/c_repro
syz repro: https://ci.syzbot.org/findings/6ee3f719-e6ab-468c-92dc-e197c12d8171/syz_repro
XFS: noikeep mount option is deprecated.
XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Ending clean mount
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5993 at fs/iomap/buffered-io.c:497 iomap_readahead_iter fs/iomap/buffered-io.c:497 [inline]
WARNING: CPU: 1 PID: 5993 at fs/iomap/buffered-io.c:497 iomap_readahead+0x5ed/0xa40 fs/iomap/buffered-io.c:541
Modules linked in:
CPU: 1 UID: 0 PID: 5993 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:iomap_readahead_iter fs/iomap/buffered-io.c:497 [inline]
RIP: 0010:iomap_readahead+0x5ed/0xa40 fs/iomap/buffered-io.c:541
Code: a5 ff eb 35 e8 14 55 6b ff 48 8b 5c 24 20 48 8b 44 24 28 42 80 3c 28 00 74 08 48 89 df e8 8b c8 ce ff 48 c7 03 00 00 00 00 90 <0f> 0b 90 bb ea ff ff ff eb 53 e8 e4 54 6b ff 48 8b 44 24 28 42 80
RSP: 0018:ffffc90003096260 EFLAGS: 00010246
RAX: 1ffff92000612c8d RBX: ffffc90003096468 RCX: ffff888024038000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc90003096430 R08: ffffffff8fa3a637 R09: 1ffffffff1f474c6
R10: dffffc0000000000 R11: fffffbfff1f474c7 R12: 0000000000000001
R13: dffffc0000000000 R14: 0000000000000001 R15: 0000000000000001
FS: 00005555761de500(0000) GS:ffff8881a3c15000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30f63fff CR3: 000000010defc000 CR4: 00000000000006f0
Call Trace:
<TASK>
iomap_bio_readahead include/linux/iomap.h:600 [inline]
xfs_vm_readahead+0x9f/0xe0 fs/xfs/xfs_aops.c:753
read_pages+0x17a/0x580 mm/readahead.c:160
page_cache_ra_order+0x8ca/0xd40 mm/readahead.c:512
filemap_get_pages+0x43c/0x1ea0 mm/filemap.c:2603
filemap_read+0x3f6/0x11a0 mm/filemap.c:2712
xfs_file_buffered_read+0x1a2/0x350 fs/xfs/xfs_file.c:292
xfs_file_read_iter+0x280/0x510 fs/xfs/xfs_file.c:317
__kernel_read+0x4cf/0x960 fs/read_write.c:530
integrity_kernel_read+0x89/0xd0 security/integrity/iint.c:28
ima_calc_file_hash_tfm security/integrity/ima/ima_crypto.c:480 [inline]
ima_calc_file_shash security/integrity/ima/ima_crypto.c:511 [inline]
ima_calc_file_hash+0x85e/0x16f0 security/integrity/ima/ima_crypto.c:568
ima_collect_measurement+0x428/0x8e0 security/integrity/ima/ima_api.c:293
process_measurement+0x1121/0x1a40 security/integrity/ima/ima_main.c:405
ima_file_check+0xd7/0x120 security/integrity/ima/ima_main.c:633
security_file_post_open+0xbb/0x290 security/security.c:3160
do_open fs/namei.c:3889 [inline]
path_openat+0x2f26/0x3830 fs/namei.c:4046
do_filp_open+0x1fa/0x410 fs/namei.c:4073
do_sys_openat2+0x121/0x1c0 fs/open.c:1435
do_sys_open fs/open.c:1450 [inline]
__do_sys_openat fs/open.c:1466 [inline]
__se_sys_openat fs/open.c:1461 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1461
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe33b38eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff93f3bee8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007fe33b5d5fa0 RCX: 00007fe33b38eba9
RDX: 0000000000183042 RSI: 0000200000000740 RDI: ffffffffffffff9c
RBP: 00007fe33b411e19 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe33b5d5fa0 R14: 00007fe33b5d5fa0 R15: 0000000000000004
</TASK>
***
kernel BUG in folio_end_read
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: f83ec76bf285bea5727f478a68b894f5543ca76e
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/4ec82322-3509-4b63-9881-f639f1b61f20/config
C repro: https://ci.syzbot.org/findings/caca928c-4ba3-49fe-b564-cbb9aeab1706/c_repro
syz repro: https://ci.syzbot.org/findings/caca928c-4ba3-49fe-b564-cbb9aeab1706/syz_repro
handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1525!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 5995 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:folio_end_read+0x22e/0x230 mm/filemap.c:1525
Code: 24 c9 ff 48 89 df 48 c7 c6 20 3e 94 8b e8 5a 63 31 ff 90 0f 0b e8 62 24 c9 ff 48 89 df 48 c7 c6 80 36 94 8b e8 43 63 31 ff 90 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa
RSP: 0018:ffffc9000336eec8 EFLAGS: 00010246
RAX: 3b63e53e8591b500 RBX: ffffea0000c47280 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8d9ba482 RDI: 00000000ffffffff
RBP: 0000000000000001 R08: ffffffff8fa3a637 R09: 1ffffffff1f474c6
R10: dffffc0000000000 R11: fffffbfff1f474c7 R12: 1ffffd4000188e51
R13: 1ffffd4000188e50 R14: ffffea0000c47288 R15: 0000000000000008
FS: 00005555827bc500(0000) GS:ffff8880b8615000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30d63fff CR3: 0000000027336000 CR4: 00000000000006f0
Call Trace:
<TASK>
iomap_read_remove_bias fs/iomap/buffered-io.c:383 [inline]
iomap_read_folio_iter+0x9af/0xb70 fs/iomap/buffered-io.c:448
iomap_readahead_iter fs/iomap/buffered-io.c:500 [inline]
iomap_readahead+0x632/0xa40 fs/iomap/buffered-io.c:541
iomap_bio_readahead include/linux/iomap.h:600 [inline]
xfs_vm_readahead+0x9f/0xe0 fs/xfs/xfs_aops.c:753
read_pages+0x17a/0x580 mm/readahead.c:160
page_cache_ra_order+0x8ca/0xd40 mm/readahead.c:512
filemap_readahead mm/filemap.c:2572 [inline]
filemap_get_pages+0xb22/0x1ea0 mm/filemap.c:2617
filemap_splice_read+0x581/0xc60 mm/filemap.c:2991
xfs_file_splice_read+0x2c4/0x600 fs/xfs/xfs_file.c:345
do_splice_read fs/splice.c:982 [inline]
splice_direct_to_actor+0x4a9/0xcc0 fs/splice.c:1086
do_splice_direct_actor fs/splice.c:1204 [inline]
do_splice_direct+0x181/0x270 fs/splice.c:1230
do_sendfile+0x4da/0x7e0 fs/read_write.c:1370
__do_sys_sendfile64 fs/read_write.c:1431 [inline]
__se_sys_sendfile64+0x13e/0x190 fs/read_write.c:1417
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f897438eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffec28881f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007f89745d5fa0 RCX: 00007f897438eba9
RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000005
RBP: 00007f8974411e19 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000e0000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f89745d5fa0 R14: 00007f89745d5fa0 R15: 0000000000000004
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:folio_end_read+0x22e/0x230 mm/filemap.c:1525
Code: 24 c9 ff 48 89 df 48 c7 c6 20 3e 94 8b e8 5a 63 31 ff 90 0f 0b e8 62 24 c9 ff 48 89 df 48 c7 c6 80 36 94 8b e8 43 63 31 ff 90 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa
RSP: 0018:ffffc9000336eec8 EFLAGS: 00010246
RAX: 3b63e53e8591b500 RBX: ffffea0000c47280 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8d9ba482 RDI: 00000000ffffffff
RBP: 0000000000000001 R08: ffffffff8fa3a637 R09: 1ffffffff1f474c6
R10: dffffc0000000000 R11: fffffbfff1f474c7 R12: 1ffffd4000188e51
R13: 1ffffd4000188e50 R14: ffffea0000c47288 R15: 0000000000000008
FS: 00005555827bc500(0000) GS:ffff8880b8615000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30d63fff CR3: 0000000027336000 CR4: 00000000000006f0
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: fuse: use iomap for buffered reads + readahead
2025-09-17 8:30 ` [syzbot ci] Re: fuse: use iomap for buffered reads + readahead syzbot ci
@ 2025-09-17 19:59 ` Joanne Koong
2025-09-18 15:48 ` Aleksandr Nogikh
0 siblings, 1 reply; 40+ messages in thread
From: Joanne Koong @ 2025-09-17 19:59 UTC (permalink / raw)
To: syzbot ci, syzbot
Cc: brauner, djwong, gfs2, hch, hch, hsiangkao, kernel-team,
linux-block, linux-doc, linux-fsdevel, linux-xfs, miklos, syzbot,
syzkaller-bugs
On Wed, Sep 17, 2025 at 1:37 AM syzbot ci
<syzbot+ci9b5a486340e6bcdf@syzkaller.appspotmail.com> wrote:
>
> syzbot ci has tested the following series
>
> [v3] fuse: use iomap for buffered reads + readahead
> https://lore.kernel.org/all/20250916234425.1274735-1-joannelkoong@gmail.com
> * [PATCH v3 01/15] iomap: move bio read logic into helper function
> * [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
> * [PATCH v3 03/15] iomap: store read/readahead bio generically
> * [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
> * [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
> * [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
> * [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
> * [PATCH v3 08/15] iomap: add public start/finish folio read helpers
> * [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead
> * [PATCH v3 10/15] iomap: add bias for async read requests
> * [PATCH v3 11/15] iomap: move buffered io bio logic into new file
> * [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
> * [PATCH v3 13/15] fuse: use iomap for read_folio
> * [PATCH v3 14/15] fuse: use iomap for readahead
> * [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
>
> and found the following issues:
> * WARNING in iomap_iter_advance
> * WARNING in iomap_readahead
> * kernel BUG in folio_end_read
>
> Full report is available here:
> https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
>
> ***
>
Thanks. Does this get run on every patchset that is sent upstream, or
is it random? I'm trying to figure out whether this means v2 was fine
and I just messed up v3, or whether you simply didn't run on v2.
Thanks,
Joanne
* Re: [PATCH v3 11/15] iomap: move buffered io bio logic into new file
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
@ 2025-09-17 21:40 ` kernel test robot
2025-09-18 22:31 ` Darrick J. Wong
2025-09-19 15:32 ` Christoph Hellwig
2 siblings, 0 replies; 40+ messages in thread
From: kernel test robot @ 2025-09-17 21:40 UTC (permalink / raw)
To: Joanne Koong, brauner, miklos
Cc: oe-kbuild-all, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
Hi Joanne,
kernel test robot noticed the following build warnings:
[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on mszeredi-fuse/for-next axboe-block/for-next xiang-erofs/dev-test xiang-erofs/dev xiang-erofs/fixes gfs2/for-next xfs-linux/for-next linus/master v6.17-rc6 next-20250917]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/iomap-move-bio-read-logic-into-helper-function/20250917-075255
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250916234425.1274735-12-joannelkoong%40gmail.com
patch subject: [PATCH v3 11/15] iomap: move buffered io bio logic into new file
config: x86_64-randconfig-071-20250918 (https://download.01.org/0day-ci/archive/20250918/202509180515.Cc8A1Wu9-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250918/202509180515.Cc8A1Wu9-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509180515.Cc8A1Wu9-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from fs/iomap/buffered-io.c:11:
>> fs/iomap/internal.h:13:5: warning: no previous prototype for 'iomap_bio_read_folio_range_sync' [-Wmissing-prototypes]
13 | int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/iomap_bio_read_folio_range_sync +13 fs/iomap/internal.h
8
9 #ifdef CONFIG_BLOCK
10 int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
11 struct folio *folio, loff_t pos, size_t len);
12 #else
> 13 int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
14 struct folio *folio, loff_t pos, size_t len)
15 {
16 WARN_ON_ONCE(1);
17 return -EIO;
18 }
19 #endif /* CONFIG_BLOCK */
20
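(The underlying issue: the !CONFIG_BLOCK stub is a plain function
defined in a header, so every compilation unit including internal.h
sees a non-static definition with no prior prototype. The conventional
fix is to make the stub static inline, roughly:)

#ifdef CONFIG_BLOCK
int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
		struct folio *folio, loff_t pos, size_t len);
#else
static inline int iomap_bio_read_folio_range_sync(
		const struct iomap_iter *iter, struct folio *folio,
		loff_t pos, size_t len)
{
	WARN_ON_ONCE(1);
	return -EIO;
}
#endif /* CONFIG_BLOCK */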
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [syzbot ci] Re: fuse: use iomap for buffered reads + readahead
2025-09-17 19:59 ` Joanne Koong
@ 2025-09-18 15:48 ` Aleksandr Nogikh
2025-09-18 21:15 ` Joanne Koong
0 siblings, 1 reply; 40+ messages in thread
From: Aleksandr Nogikh @ 2025-09-18 15:48 UTC (permalink / raw)
To: Joanne Koong
Cc: syzbot ci, syzbot, brauner, djwong, gfs2, hch, hch, hsiangkao,
kernel-team, linux-block, linux-doc, linux-fsdevel, linux-xfs,
miklos, syzbot, syzkaller-bugs
Hi Joanne,
On Wed, Sep 17, 2025 at 9:59 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Sep 17, 2025 at 1:37 AM syzbot ci
> <syzbot+ci9b5a486340e6bcdf@syzkaller.appspotmail.com> wrote:
> >
> > syzbot ci has tested the following series
> >
> > [v3] fuse: use iomap for buffered reads + readahead
> > https://lore.kernel.org/all/20250916234425.1274735-1-joannelkoong@gmail.com
> > * [PATCH v3 01/15] iomap: move bio read logic into helper function
> > * [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
> > * [PATCH v3 03/15] iomap: store read/readahead bio generically
> > * [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
> > * [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
> > * [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
> > * [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
> > * [PATCH v3 08/15] iomap: add public start/finish folio read helpers
> > * [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead
> > * [PATCH v3 10/15] iomap: add bias for async read requests
> > * [PATCH v3 11/15] iomap: move buffered io bio logic into new file
> > * [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
> > * [PATCH v3 13/15] fuse: use iomap for read_folio
> > * [PATCH v3 14/15] fuse: use iomap for readahead
> > * [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
> >
> > and found the following issues:
> > * WARNING in iomap_iter_advance
> > * WARNING in iomap_readahead
> > * kernel BUG in folio_end_read
> >
> > Full report is available here:
> > https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> >
> > ***
> >
> Thanks. Do you get run on every patchset that is sent upstream or is
> it random? Trying to figure out if this means v2 is right and i just
> messed up v3 or if you just didn't run on v2.
The intent is to run on every patchset, but since the system is still
experimental, some series are skipped for various reasons. For example,
syzbot tried to process v2 but failed to find a kernel tree to which
the series applied cleanly:
https://ci.syzbot.org/series/7085b21e-ae1e-4bf9-b486-24a82ea9b37d
The original email contains links to the C reproducers, which can be
run locally to determine whether v1/v2 were affected.
--
Aleksandr
>
> Thanks,
> Joanne
>
* Re: [syzbot ci] Re: fuse: use iomap for buffered reads + readahead
2025-09-18 15:48 ` Aleksandr Nogikh
@ 2025-09-18 21:15 ` Joanne Koong
0 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-18 21:15 UTC (permalink / raw)
To: Aleksandr Nogikh
Cc: syzbot ci, syzbot, brauner, djwong, gfs2, hch, hch, hsiangkao,
kernel-team, linux-block, linux-doc, linux-fsdevel, linux-xfs,
miklos, syzbot, syzkaller-bugs
On Thu, Sep 18, 2025 at 8:48 AM Aleksandr Nogikh <nogikh@google.com> wrote:
>
> Hi Joanne,
>
> On Wed, Sep 17, 2025 at 9:59 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Sep 17, 2025 at 1:37 AM syzbot ci
> > <syzbot+ci9b5a486340e6bcdf@syzkaller.appspotmail.com> wrote:
> > >
> > > syzbot ci has tested the following series
> > >
> > > [v3] fuse: use iomap for buffered reads + readahead
> > > https://lore.kernel.org/all/20250916234425.1274735-1-joannelkoong@gmail.com
> > > * [PATCH v3 01/15] iomap: move bio read logic into helper function
> > > * [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
> > > * [PATCH v3 03/15] iomap: store read/readahead bio generically
> > > * [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
> > > * [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter()
> > > * [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx
> > > * [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
> > > * [PATCH v3 08/15] iomap: add public start/finish folio read helpers
> > > * [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead
> > > * [PATCH v3 10/15] iomap: add bias for async read requests
> > > * [PATCH v3 11/15] iomap: move buffered io bio logic into new file
> > > * [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
> > > * [PATCH v3 13/15] fuse: use iomap for read_folio
> > > * [PATCH v3 14/15] fuse: use iomap for readahead
> > > * [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
> > >
> > > and found the following issues:
> > > * WARNING in iomap_iter_advance
> > > * WARNING in iomap_readahead
> > > * kernel BUG in folio_end_read
> > >
> > > Full report is available here:
> > > https://ci.syzbot.org/series/6845596a-1ec9-4396-b9c4-48bddc606bef
> > >
> > > ***
> > >
> > Thanks. Do you get run on every patchset that is sent upstream or is
> > it random? Trying to figure out if this means v2 is right and i just
> > messed up v3 or if you just didn't run on v2.
>
> The intent is to run on every patchset, but since the system is
> currently still in the experimental state, some of the series are
> skipped due to various reasons. E.g. syzbot tried to process v2, but
> failed to find the kernel tree to which the series applies without
> problems: https://ci.syzbot.org/series/7085b21e-ae1e-4bf9-b486-24a82ea9b37d
>
> In the original email, there are links to the C reproducers, so these
> can be used locally to determine if v1/v2 were affected.
Thanks, I was able to repro it using the syz-executor.
It turns out the bug is in the upstream code:
iomap_adjust_read_range() assumes the position and length passed in
are always block-aligned, and skips uptodate blocks at block-size
granularity. With non-block-aligned positions and lengths, this can
underflow the returned length and "overflow" the returned position
(return a position that is beyond the end of the folio).
The warning never showed up upstream because the underflowed plen and
the overflowed pos cancel each other out in the calculation:
length = pos - iter->pos + plen;
return iomap_iter_advance(iter, &length);
but in this patchset the iter gets advanced first by "pos - iter->pos"
and then by "plen", which surfaces the warning.
I'll submit a fix for this separately and then resend this fuse iomap
patchset rebased on top of that.
Thanks,
Joanne
>
> --
> Aleksandr
>
> >
> > Thanks,
> > Joanne
> >
* Re: [PATCH v3 01/15] iomap: move bio read logic into helper function
2025-09-16 23:44 ` [PATCH v3 01/15] iomap: move bio read logic into helper function Joanne Koong
@ 2025-09-18 21:27 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:27 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:11PM -0700, Joanne Koong wrote:
> Move the iomap_readpage_iter() bio read logic into a separate helper
> function, iomap_bio_read_folio_range(). This is needed to make iomap
> read/readahead more generically usable, especially for filesystems that
> do not require CONFIG_BLOCK.
>
> Additionally rename buffered write's iomap_read_folio_range() function
> to iomap_bio_read_folio_range_sync() to better describe its synchronous
> behavior.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 68 ++++++++++++++++++++++++------------------
> 1 file changed, 39 insertions(+), 29 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index fd827398afd2..05399aaa1361 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -357,36 +357,15 @@ struct iomap_readpage_ctx {
> struct readahead_control *rac;
> };
>
> -static int iomap_readpage_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> +static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
/me wonders if you could shorten these function names to
iomap_bio_read_folio{,_sync} now, but ... eh.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> {
> + struct folio *folio = ctx->cur_folio;
> const struct iomap *iomap = &iter->iomap;
> - loff_t pos = iter->pos;
> + struct iomap_folio_state *ifs = folio->private;
> + size_t poff = offset_in_folio(folio, pos);
> loff_t length = iomap_length(iter);
> - struct folio *folio = ctx->cur_folio;
> - struct iomap_folio_state *ifs;
> - size_t poff, plen;
> sector_t sector;
> - int ret;
> -
> - if (iomap->type == IOMAP_INLINE) {
> - ret = iomap_read_inline_data(iter, folio);
> - if (ret)
> - return ret;
> - return iomap_iter_advance(iter, &length);
> - }
> -
> - /* zero post-eof blocks as the page may be mapped */
> - ifs = ifs_alloc(iter->inode, folio, iter->flags);
> - iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> - if (plen == 0)
> - goto done;
> -
> - if (iomap_block_needs_zeroing(iter, pos)) {
> - folio_zero_range(folio, poff, plen);
> - iomap_set_range_uptodate(folio, poff, plen);
> - goto done;
> - }
>
> ctx->cur_folio_in_bio = true;
> if (ifs) {
> @@ -425,6 +404,37 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> ctx->bio->bi_end_io = iomap_read_end_io;
> bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> }
> +}
> +
> +static int iomap_readpage_iter(struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx)
> +{
> + const struct iomap *iomap = &iter->iomap;
> + loff_t pos = iter->pos;
> + loff_t length = iomap_length(iter);
> + struct folio *folio = ctx->cur_folio;
> + size_t poff, plen;
> + int ret;
> +
> + if (iomap->type == IOMAP_INLINE) {
> + ret = iomap_read_inline_data(iter, folio);
> + if (ret)
> + return ret;
> + return iomap_iter_advance(iter, &length);
> + }
> +
> + /* zero post-eof blocks as the page may be mapped */
> + ifs_alloc(iter->inode, folio, iter->flags);
> + iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> + if (plen == 0)
> + goto done;
> +
> + if (iomap_block_needs_zeroing(iter, pos)) {
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + } else {
> + iomap_bio_read_folio_range(iter, ctx, pos, plen);
> + }
>
> done:
> /*
> @@ -549,7 +559,7 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> -static int iomap_read_folio_range(const struct iomap_iter *iter,
> +static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> struct folio *folio, loff_t pos, size_t len)
> {
> const struct iomap *srcmap = iomap_iter_srcmap(iter);
> @@ -562,7 +572,7 @@ static int iomap_read_folio_range(const struct iomap_iter *iter,
> return submit_bio_wait(&bio);
> }
> #else
> -static int iomap_read_folio_range(const struct iomap_iter *iter,
> +static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> struct folio *folio, loff_t pos, size_t len)
> {
> WARN_ON_ONCE(1);
> @@ -739,7 +749,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter,
> status = write_ops->read_folio_range(iter,
> folio, block_start, plen);
> else
> - status = iomap_read_folio_range(iter,
> + status = iomap_bio_read_folio_range_sync(iter,
> folio, block_start, plen);
> if (status)
> return status;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 02/15] iomap: move read/readahead bio submission logic into helper function
2025-09-16 23:44 ` [PATCH v3 02/15] iomap: move read/readahead bio submission " Joanne Koong
@ 2025-09-18 21:27 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:27 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:12PM -0700, Joanne Koong wrote:
> Move the read/readahead bio submission logic into a separate helper.
> This is needed to make iomap read/readahead more generically usable,
> especially for filesystems that do not require CONFIG_BLOCK.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Looks good to me!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 05399aaa1361..ee96558b6d99 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -357,6 +357,14 @@ struct iomap_readpage_ctx {
> struct readahead_control *rac;
> };
>
> +static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
> +{
> + struct bio *bio = ctx->bio;
> +
> + if (bio)
> + submit_bio(bio);
> +}
> +
> static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> struct iomap_readpage_ctx *ctx, loff_t pos, size_t plen)
> {
> @@ -382,8 +390,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> gfp_t orig_gfp = gfp;
> unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
>
> - if (ctx->bio)
> - submit_bio(ctx->bio);
> + iomap_bio_submit_read(ctx);
>
> if (ctx->rac) /* same as readahead_gfp_mask */
> gfp |= __GFP_NORETRY | __GFP_NOWARN;
> @@ -478,13 +485,10 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> while ((ret = iomap_iter(&iter, ops)) > 0)
> iter.status = iomap_read_folio_iter(&iter, &ctx);
>
> - if (ctx.bio) {
> - submit_bio(ctx.bio);
> - WARN_ON_ONCE(!ctx.cur_folio_in_bio);
> - } else {
> - WARN_ON_ONCE(ctx.cur_folio_in_bio);
> + iomap_bio_submit_read(&ctx);
> +
> + if (!ctx.cur_folio_in_bio)
> folio_unlock(folio);
> - }
>
> /*
> * Just like mpage_readahead and block_read_full_folio, we always
> @@ -550,12 +554,10 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> while (iomap_iter(&iter, ops) > 0)
> iter.status = iomap_readahead_iter(&iter, &ctx);
>
> - if (ctx.bio)
> - submit_bio(ctx.bio);
> - if (ctx.cur_folio) {
> - if (!ctx.cur_folio_in_bio)
> - folio_unlock(ctx.cur_folio);
> - }
> + iomap_bio_submit_read(&ctx);
> +
> + if (ctx.cur_folio && !ctx.cur_folio_in_bio)
> + folio_unlock(ctx.cur_folio);
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 03/15] iomap: store read/readahead bio generically
2025-09-16 23:44 ` [PATCH v3 03/15] iomap: store read/readahead bio generically Joanne Koong
@ 2025-09-18 21:29 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:29 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:13PM -0700, Joanne Koong wrote:
> Store the iomap_readpage_ctx bio generically as a "void *read_ctx".
> This makes the read/readahead interface more generic, which allows it to
> be used by filesystems that may not be block-based and may not have
> CONFIG_BLOCK set.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Wheee type changes :)
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ee96558b6d99..2a1709e0757b 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -353,13 +353,13 @@ static void iomap_read_end_io(struct bio *bio)
> struct iomap_readpage_ctx {
> struct folio *cur_folio;
> bool cur_folio_in_bio;
> - struct bio *bio;
> + void *read_ctx;
> struct readahead_control *rac;
> };
>
> static void iomap_bio_submit_read(struct iomap_readpage_ctx *ctx)
> {
> - struct bio *bio = ctx->bio;
> + struct bio *bio = ctx->read_ctx;
>
> if (bio)
> submit_bio(bio);
> @@ -374,6 +374,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> size_t poff = offset_in_folio(folio, pos);
> loff_t length = iomap_length(iter);
> sector_t sector;
> + struct bio *bio = ctx->read_ctx;
>
> ctx->cur_folio_in_bio = true;
> if (ifs) {
> @@ -383,9 +384,8 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> }
>
> sector = iomap_sector(iomap, pos);
> - if (!ctx->bio ||
> - bio_end_sector(ctx->bio) != sector ||
> - !bio_add_folio(ctx->bio, folio, plen, poff)) {
> + if (!bio || bio_end_sector(bio) != sector ||
> + !bio_add_folio(bio, folio, plen, poff)) {
> gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> gfp_t orig_gfp = gfp;
> unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> @@ -394,22 +394,21 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
>
> if (ctx->rac) /* same as readahead_gfp_mask */
> gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> - REQ_OP_READ, gfp);
> + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
> + gfp);
> /*
> * If the bio_alloc fails, try it again for a single page to
> * avoid having to deal with partial page reads. This emulates
> * what do_mpage_read_folio does.
> */
> - if (!ctx->bio) {
> - ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
> - orig_gfp);
> - }
> + if (!bio)
> + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> if (ctx->rac)
> - ctx->bio->bi_opf |= REQ_RAHEAD;
> - ctx->bio->bi_iter.bi_sector = sector;
> - ctx->bio->bi_end_io = iomap_read_end_io;
> - bio_add_folio_nofail(ctx->bio, folio, plen, poff);
> + bio->bi_opf |= REQ_RAHEAD;
> + bio->bi_iter.bi_sector = sector;
> + bio->bi_end_io = iomap_read_end_io;
> + bio_add_folio_nofail(bio, folio, plen, poff);
> + ctx->read_ctx = bio;
> }
> }
>
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
2025-09-16 23:44 ` [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
@ 2025-09-18 21:37 ` Darrick J. Wong
2025-09-22 22:33 ` Joanne Koong
1 sibling, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:37 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
On Tue, Sep 16, 2025 at 04:44:14PM -0700, Joanne Koong wrote:
> Iterate over all non-uptodate ranges in a single call to
> iomap_readpage_iter() instead of leaving the partial folio iteration to
> the caller.
>
> This will be useful for supporting caller-provided async folio read
> callbacks (added in a later commit), because those will require tracking
> when the first and last async read requests for a folio are sent, in
> order to prevent premature read completion of the folio.
>
> This additionally makes the iomap_readahead_iter() logic a bit simpler.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
This looks pretty straightforward, so
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 69 ++++++++++++++++++++----------------------
> 1 file changed, 32 insertions(+), 37 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 2a1709e0757b..0c4ba2a63490 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -420,6 +420,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> loff_t length = iomap_length(iter);
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> + loff_t count;
> int ret;
>
> if (iomap->type == IOMAP_INLINE) {
> @@ -431,39 +432,33 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
>
> /* zero post-eof blocks as the page may be mapped */
> ifs_alloc(iter->inode, folio, iter->flags);
> - iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> - if (plen == 0)
> - goto done;
>
> - if (iomap_block_needs_zeroing(iter, pos)) {
> - folio_zero_range(folio, poff, plen);
> - iomap_set_range_uptodate(folio, poff, plen);
> - } else {
> - iomap_bio_read_folio_range(iter, ctx, pos, plen);
> - }
> + length = min_t(loff_t, length,
> + folio_size(folio) - offset_in_folio(folio, pos));
> + while (length) {
> + iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff,
> + &plen);
>
> -done:
> - /*
> - * Move the caller beyond our range so that it keeps making progress.
> - * For that, we have to include any leading non-uptodate ranges, but
> - * we can skip trailing ones as they will be handled in the next
> - * iteration.
> - */
> - length = pos - iter->pos + plen;
> - return iomap_iter_advance(iter, &length);
> -}
> + count = pos - iter->pos + plen;
> + if (WARN_ON_ONCE(count > length))
> + return -EIO;
>
> -static int iomap_read_folio_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> -{
> - int ret;
> + if (plen == 0)
> + return iomap_iter_advance(iter, &count);
>
> - while (iomap_length(iter)) {
> - ret = iomap_readpage_iter(iter, ctx);
> + if (iomap_block_needs_zeroing(iter, pos)) {
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + } else {
> + iomap_bio_read_folio_range(iter, ctx, pos, plen);
> + }
> +
> + length -= count;
> + ret = iomap_iter_advance(iter, &count);
> if (ret)
> return ret;
> + pos = iter->pos;
> }
> -
> return 0;
> }
>
> @@ -482,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_read_folio_iter(&iter, &ctx);
> + iter.status = iomap_readpage_iter(&iter, &ctx);
>
> iomap_bio_submit_read(&ctx);
>
> @@ -504,16 +499,16 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> int ret;
>
> while (iomap_length(iter)) {
> - if (ctx->cur_folio &&
> - offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
> - if (!ctx->cur_folio_in_bio)
> - folio_unlock(ctx->cur_folio);
> - ctx->cur_folio = NULL;
> - }
> - if (!ctx->cur_folio) {
> - ctx->cur_folio = readahead_folio(ctx->rac);
> - ctx->cur_folio_in_bio = false;
> - }
> + if (ctx->cur_folio && !ctx->cur_folio_in_bio)
> + folio_unlock(ctx->cur_folio);
> + ctx->cur_folio = readahead_folio(ctx->rac);
> + /*
> + * We should never in practice hit this case since the iter
> + * length matches the readahead length.
> + */
> + if (WARN_ON_ONCE(!ctx->cur_folio))
> + return -EINVAL;
> + ctx->cur_folio_in_bio = false;
> ret = iomap_readpage_iter(iter, ctx);
> if (ret)
> return ret;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
2025-09-16 23:44 ` [PATCH v3 07/15] iomap: track read/readahead folio ownership internally Joanne Koong
@ 2025-09-18 21:49 ` Darrick J. Wong
2025-09-19 18:14 ` Joanne Koong
0 siblings, 1 reply; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:49 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:17PM -0700, Joanne Koong wrote:
> The purpose of "struct iomap_read_folio_ctx->cur_folio_in_bio" is to
> track folio ownership to know who is responsible for unlocking it.
> Rename "cur_folio_in_bio" to "cur_folio_owned" to better reflect this
> purpose and so that this can be generically used later on by filesystems
> that are not block-based.
Hrmm, well if this is becoming a private variable, then I'd shorten it
to "folio_owned" since it's not really in the ctx/cur anymore, but I
might be bikeshedding now. :)
> Since "struct iomap_read_folio_ctx" will be made a public interface
> later on when read/readahead takes in caller-provided callbacks, track
> the folio ownership state internally instead of exposing it in "struct
> iomap_read_folio_ctx" to make the interface simpler for end users.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> 1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 6c5a631848b7..587bbdbd24bc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -352,7 +352,6 @@ static void iomap_read_end_io(struct bio *bio)
>
> struct iomap_read_folio_ctx {
> struct folio *cur_folio;
> - bool cur_folio_in_bio;
> void *read_ctx;
> struct readahead_control *rac;
> };
> @@ -376,7 +375,6 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> sector_t sector;
> struct bio *bio = ctx->read_ctx;
>
> - ctx->cur_folio_in_bio = true;
> if (ifs) {
> spin_lock_irq(&ifs->state_lock);
> ifs->read_bytes_pending += plen;
> @@ -413,7 +411,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> }
>
> static int iomap_read_folio_iter(struct iomap_iter *iter,
> - struct iomap_read_folio_ctx *ctx)
> + struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
> {
> const struct iomap *iomap = &iter->iomap;
> loff_t pos = iter->pos;
> @@ -450,6 +448,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> folio_zero_range(folio, poff, plen);
> iomap_set_range_uptodate(folio, poff, plen);
> } else {
> + *cur_folio_owned = true;
> iomap_bio_read_folio_range(iter, ctx, pos, plen);
> }
>
> @@ -472,16 +471,22 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> struct iomap_read_folio_ctx ctx = {
> .cur_folio = folio,
> };
> + /*
> + * If an external IO helper takes ownership of the folio, it is
> + * responsible for unlocking it when the read completes.
Not sure what "external" means here -- I think for your project it means
a custom folio read method supplied by a filesystem, but for the existing
code (xfs submitting unaltered bios), that part is still mostly internal
to iomap.
If we were *only* refactoring code I would suggest s/external/async/
because that's what the bio code does, but a filesystem supplying its
own folio read function could very well fill the folio synchronously and
it'd still have to unlock the folio.
Maybe just get rid of the word external? The rest of the code changes
look fine to me.
--D
> + */
> + bool cur_folio_owned = false;
> int ret;
>
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_read_folio_iter(&iter, &ctx);
> + iter.status = iomap_read_folio_iter(&iter, &ctx,
> + &cur_folio_owned);
>
> iomap_bio_submit_read(&ctx);
>
> - if (!ctx.cur_folio_in_bio)
> + if (!cur_folio_owned)
> folio_unlock(folio);
>
> /*
> @@ -494,12 +499,13 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> EXPORT_SYMBOL_GPL(iomap_read_folio);
>
> static int iomap_readahead_iter(struct iomap_iter *iter,
> - struct iomap_read_folio_ctx *ctx)
> + struct iomap_read_folio_ctx *ctx,
> + bool *cur_folio_owned)
> {
> int ret;
>
> while (iomap_length(iter)) {
> - if (ctx->cur_folio && !ctx->cur_folio_in_bio)
> + if (ctx->cur_folio && !*cur_folio_owned)
> folio_unlock(ctx->cur_folio);
> ctx->cur_folio = readahead_folio(ctx->rac);
> /*
> @@ -508,8 +514,8 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> */
> if (WARN_ON_ONCE(!ctx->cur_folio))
> return -EINVAL;
> - ctx->cur_folio_in_bio = false;
> - ret = iomap_read_folio_iter(iter, ctx);
> + *cur_folio_owned = false;
> + ret = iomap_read_folio_iter(iter, ctx, cur_folio_owned);
> if (ret)
> return ret;
> }
> @@ -542,15 +548,21 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
> struct iomap_read_folio_ctx ctx = {
> .rac = rac,
> };
> + /*
> + * If an external IO helper takes ownership of the folio, it is
> + * responsible for unlocking it when the read completes.
> + */
> + bool cur_folio_owned = false;
>
> trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
>
> while (iomap_iter(&iter, ops) > 0)
> - iter.status = iomap_readahead_iter(&iter, &ctx);
> + iter.status = iomap_readahead_iter(&iter, &ctx,
> + &cur_folio_owned);
>
> iomap_bio_submit_read(&ctx);
>
> - if (ctx.cur_folio && !ctx.cur_folio_in_bio)
> + if (ctx.cur_folio && !cur_folio_owned)
> folio_unlock(ctx.cur_folio);
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
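Condensed from the hunks above, the ownership rule patch 7 establishes
reduces to the following sketch (illustrative only; the iteration and
error handling are elided):

        /*
         * Whoever takes over the IO for a folio also takes over
         * unlocking it.
         */
        bool cur_folio_owned = false;

        /* for each non-uptodate range in the folio: */
        if (iomap_block_needs_zeroing(iter, pos)) {
                folio_zero_range(folio, poff, plen);
                iomap_set_range_uptodate(folio, poff, plen);
        } else {
                /* the IO helper now owns the folio */
                cur_folio_owned = true;
                iomap_bio_read_folio_range(iter, ctx, pos, plen);
        }

        /* after iterating: unlock only if no IO helper claimed it */
        if (!cur_folio_owned)
                folio_unlock(folio);

The claiming helper unlocks the folio from its completion path instead,
via folio_end_read().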
* Re: [PATCH v3 12/15] iomap: make iomap_read_folio() a void return
2025-09-16 23:44 ` [PATCH v3 12/15] iomap: make iomap_read_folio() a void return Joanne Koong
@ 2025-09-18 21:55 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 21:55 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:22PM -0700, Joanne Koong wrote:
> No errors are propagated in iomap_read_folio(). Change
> iomap_read_folio() to a void return to make this clearer to callers.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Yesssssss
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 9 +--------
> include/linux/iomap.h | 2 +-
> 2 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 72258b0109ec..be535bd3aeca 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -450,7 +450,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> return ret;
> }
>
> -int iomap_read_folio(const struct iomap_ops *ops,
> +void iomap_read_folio(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx)
> {
> struct folio *folio = ctx->cur_folio;
> @@ -477,13 +477,6 @@ int iomap_read_folio(const struct iomap_ops *ops,
>
> if (!cur_folio_owned)
> folio_unlock(folio);
> -
> - /*
> - * Just like mpage_readahead and block_read_full_folio, we always
> - * return 0 and just set the folio error flag on errors. This
> - * should be cleaned up throughout the stack eventually.
> - */
> - return 0;
> }
> EXPORT_SYMBOL_GPL(iomap_read_folio);
>
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 4a168ebb40f5..fa55ec611fff 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -340,7 +340,7 @@ static inline bool iomap_want_unshare_iter(const struct iomap_iter *iter)
> ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops,
> const struct iomap_write_ops *write_ops, void *private);
> -int iomap_read_folio(const struct iomap_ops *ops,
> +void iomap_read_folio(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx);
> void iomap_readahead(const struct iomap_ops *ops,
> struct iomap_read_folio_ctx *ctx);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-16 23:44 ` [PATCH v3 10/15] iomap: add bias for async read requests Joanne Koong
@ 2025-09-18 22:30 ` Darrick J. Wong
2025-09-19 18:34 ` Joanne Koong
2025-09-22 18:33 ` Christoph Hellwig
2025-09-22 23:19 ` Joanne Koong
1 sibling, 2 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 22:30 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:20PM -0700, Joanne Koong wrote:
> Non-block-based filesystems will be using iomap read/readahead. If they
> handle reading in ranges asynchronously and fulfill those read requests
> on an ongoing basis (instead of all together at the end), then there is
> the possibility that the read on the folio may be prematurely ended if
> earlier async requests complete before the later ones have been issued.
>
> For example, take a large folio and a readahead request for 16 pages in
> that folio. If reading those 16 pages is split into 4 async requests
> and the first request is sent off and completed before the second has
> been issued, then when the first request calls
> iomap_finish_folio_read(), ifs->read_bytes_pending drops to 0, which
> ends the read and unlocks the folio prematurely.
>
> To mitigate this, a "bias" is added to ifs->read_bytes_pending before
> the first range is forwarded to the caller and removed after the last
> range has been forwarded.
>
> iomap writeback does this with its async requests as well, to prevent
> prematurely ending writeback.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 55 ++++++++++++++++++++++++++++++++++++------
> 1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 561378f2b9bb..667a49cb5ae5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -420,6 +420,38 @@ const struct iomap_read_ops iomap_bio_read_ops = {
> };
> EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
>
> +/*
> + * Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
> + * being ended prematurely.
> + *
> + * Otherwise, if the ranges are read asynchronously and read requests are
> + * fulfilled on an ongoing basis, there is the possibility that the read on the
> + * folio may be prematurely ended if earlier async requests complete before the
> + * later ones have been issued.
> + */
> +static void iomap_read_add_bias(struct folio *folio)
> +{
> + iomap_start_folio_read(folio, 1);
I wonder, could you achieve the same effect by elevating
read_bytes_pending by the number of bytes that we think we have to read,
and subtracting from it as the completions come in or we decide that no
read is necessary?
(That might just be overthinking the plumbing though)
--D
> +}
> +
> +static void iomap_read_remove_bias(struct folio *folio, bool *cur_folio_owned)
> +{
> + struct iomap_folio_state *ifs = folio->private;
> + bool finished, uptodate;
> +
> + if (ifs) {
> + spin_lock_irq(&ifs->state_lock);
> + ifs->read_bytes_pending -= 1;
> + finished = !ifs->read_bytes_pending;
> + if (finished)
> + uptodate = ifs_is_fully_uptodate(folio, ifs);
> + spin_unlock_irq(&ifs->state_lock);
> + if (finished)
> + folio_end_read(folio, uptodate);
> + *cur_folio_owned = true;
> + }
> +}
> +
> static int iomap_read_folio_iter(struct iomap_iter *iter,
> struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
> {
> @@ -429,7 +461,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> loff_t delta;
> - int ret;
> + int ret = 0;
>
> if (iomap->type == IOMAP_INLINE) {
> ret = iomap_read_inline_data(iter, folio);
> @@ -441,6 +473,8 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> /* zero post-eof blocks as the page may be mapped */
> ifs_alloc(iter->inode, folio, iter->flags);
>
> + iomap_read_add_bias(folio);
> +
> length = min_t(loff_t, length,
> folio_size(folio) - offset_in_folio(folio, pos));
> while (length) {
> @@ -448,16 +482,18 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> &plen);
>
> delta = pos - iter->pos;
> - if (WARN_ON_ONCE(delta + plen > length))
> - return -EIO;
> + if (WARN_ON_ONCE(delta + plen > length)) {
> + ret = -EIO;
> + break;
> + }
> length -= delta + plen;
>
> ret = iomap_iter_advance(iter, &delta);
> if (ret)
> - return ret;
> + break;
>
> if (plen == 0)
> - return 0;
> + break;
>
> if (iomap_block_needs_zeroing(iter, pos)) {
> folio_zero_range(folio, poff, plen);
> @@ -466,16 +502,19 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> *cur_folio_owned = true;
> ret = ctx->ops->read_folio_range(iter, ctx, plen);
> if (ret)
> - return ret;
> + break;
> }
>
> delta = plen;
> ret = iomap_iter_advance(iter, &delta);
> if (ret)
> - return ret;
> + break;
> pos = iter->pos;
> }
> - return 0;
> +
> + iomap_read_remove_bias(folio, cur_folio_owned);
> +
> + return ret;
> }
>
> int iomap_read_folio(const struct iomap_ops *ops,
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
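In sketch form, the bias pattern this patch adds is the following
(condensed from the hunks above, assuming ifs has been allocated; the
per-range submission loop is elided):

        struct iomap_folio_state *ifs = folio->private;
        bool finished, uptodate;

        /* take the bias before issuing any async range reads */
        iomap_start_folio_read(folio, 1);

        /*
         * ... issue async reads for each non-uptodate range; each
         * completion calls iomap_finish_folio_read(), but
         * read_bytes_pending cannot reach 0 while the bias is held ...
         */

        /* drop the bias; only now may the folio read be ended */
        spin_lock_irq(&ifs->state_lock);
        ifs->read_bytes_pending -= 1;
        finished = !ifs->read_bytes_pending;
        if (finished)
                uptodate = ifs_is_fully_uptodate(folio, ifs);
        spin_unlock_irq(&ifs->state_lock);
        if (finished)
                folio_end_read(folio, uptodate);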
* Re: [PATCH v3 11/15] iomap: move buffered io bio logic into new file
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-17 21:40 ` kernel test robot
@ 2025-09-18 22:31 ` Darrick J. Wong
2025-09-19 15:33 ` Christoph Hellwig
2025-09-19 15:32 ` Christoph Hellwig
2 siblings, 1 reply; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 22:31 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:21PM -0700, Joanne Koong wrote:
> Move bio logic in the buffered io code into its own file and remove
> CONFIG_BLOCK gating for iomap read/readahead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
/me wonders how long until we end up doing the same thing with the
directio code but in the meantime it beats #ifdef everywhere
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/Makefile | 3 +-
> fs/iomap/bio.c | 90 ++++++++++++++++++++++++++++++++++++++++++
> fs/iomap/buffered-io.c | 90 +-----------------------------------------
> fs/iomap/internal.h | 12 ++++++
> 4 files changed, 105 insertions(+), 90 deletions(-)
> create mode 100644 fs/iomap/bio.c
>
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index f7e1c8534c46..a572b8808524 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -14,5 +14,6 @@ iomap-y += trace.o \
> iomap-$(CONFIG_BLOCK) += direct-io.o \
> ioend.o \
> fiemap.o \
> - seek.o
> + seek.o \
> + bio.o
> iomap-$(CONFIG_SWAP) += swapfile.o
> diff --git a/fs/iomap/bio.c b/fs/iomap/bio.c
> new file mode 100644
> index 000000000000..8a51c9d70268
> --- /dev/null
> +++ b/fs/iomap/bio.c
> @@ -0,0 +1,90 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2010 Red Hat, Inc.
> + * Copyright (C) 2016-2023 Christoph Hellwig.
> + */
> +#include <linux/iomap.h>
> +#include <linux/pagemap.h>
> +#include "internal.h"
> +#include "trace.h"
> +
> +static void iomap_read_end_io(struct bio *bio)
> +{
> + int error = blk_status_to_errno(bio->bi_status);
> + struct folio_iter fi;
> +
> + bio_for_each_folio_all(fi, bio)
> + iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> + bio_put(bio);
> +}
> +
> +static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
> +{
> + struct bio *bio = ctx->read_ctx;
> +
> + if (bio)
> + submit_bio(bio);
> +}
> +
> +static int iomap_bio_read_folio_range(const struct iomap_iter *iter,
> + struct iomap_read_folio_ctx *ctx, size_t plen)
> +{
> + struct folio *folio = ctx->cur_folio;
> + const struct iomap *iomap = &iter->iomap;
> + loff_t pos = iter->pos;
> + size_t poff = offset_in_folio(folio, pos);
> + loff_t length = iomap_length(iter);
> + sector_t sector;
> + struct bio *bio = ctx->read_ctx;
> +
> + iomap_start_folio_read(folio, plen);
> +
> + sector = iomap_sector(iomap, pos);
> + if (!bio || bio_end_sector(bio) != sector ||
> + !bio_add_folio(bio, folio, plen, poff)) {
> + gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> + gfp_t orig_gfp = gfp;
> + unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> +
> + if (bio)
> + submit_bio(bio);
> +
> + if (ctx->rac) /* same as readahead_gfp_mask */
> + gfp |= __GFP_NORETRY | __GFP_NOWARN;
> + bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
> + gfp);
> + /*
> + * If the bio_alloc fails, try it again for a single page to
> + * avoid having to deal with partial page reads. This emulates
> + * what do_mpage_read_folio does.
> + */
> + if (!bio)
> + bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> + if (ctx->rac)
> + bio->bi_opf |= REQ_RAHEAD;
> + bio->bi_iter.bi_sector = sector;
> + bio->bi_end_io = iomap_read_end_io;
> + bio_add_folio_nofail(bio, folio, plen, poff);
> + ctx->read_ctx = bio;
> + }
> + return 0;
> +}
> +
> +const struct iomap_read_ops iomap_bio_read_ops = {
> + .read_folio_range = iomap_bio_read_folio_range,
> + .submit_read = iomap_bio_submit_read,
> +};
> +EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
> +
> +int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len)
> +{
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
> + struct bio_vec bvec;
> + struct bio bio;
> +
> + bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
> + bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
> + bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
> + return submit_bio_wait(&bio);
> +}
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 667a49cb5ae5..72258b0109ec 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -8,6 +8,7 @@
> #include <linux/writeback.h>
> #include <linux/swap.h>
> #include <linux/migrate.h>
> +#include "internal.h"
> #include "trace.h"
>
> #include "../internal.h"
> @@ -352,74 +353,6 @@ void iomap_finish_folio_read(struct folio *folio, size_t off, size_t len,
> }
> EXPORT_SYMBOL_GPL(iomap_finish_folio_read);
>
> -#ifdef CONFIG_BLOCK
> -static void iomap_read_end_io(struct bio *bio)
> -{
> - int error = blk_status_to_errno(bio->bi_status);
> - struct folio_iter fi;
> -
> - bio_for_each_folio_all(fi, bio)
> - iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> - bio_put(bio);
> -}
> -
> -static void iomap_bio_submit_read(struct iomap_read_folio_ctx *ctx)
> -{
> - struct bio *bio = ctx->read_ctx;
> -
> - if (bio)
> - submit_bio(bio);
> -}
> -
> -static int iomap_bio_read_folio_range(const struct iomap_iter *iter,
> - struct iomap_read_folio_ctx *ctx, size_t plen)
> -{
> - struct folio *folio = ctx->cur_folio;
> - const struct iomap *iomap = &iter->iomap;
> - loff_t pos = iter->pos;
> - size_t poff = offset_in_folio(folio, pos);
> - loff_t length = iomap_length(iter);
> - sector_t sector;
> - struct bio *bio = ctx->read_ctx;
> -
> - iomap_start_folio_read(folio, plen);
> -
> - sector = iomap_sector(iomap, pos);
> - if (!bio || bio_end_sector(bio) != sector ||
> - !bio_add_folio(bio, folio, plen, poff)) {
> - gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL);
> - gfp_t orig_gfp = gfp;
> - unsigned int nr_vecs = DIV_ROUND_UP(length, PAGE_SIZE);
> -
> - iomap_bio_submit_read(ctx);
> -
> - if (ctx->rac) /* same as readahead_gfp_mask */
> - gfp |= __GFP_NORETRY | __GFP_NOWARN;
> - bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs), REQ_OP_READ,
> - gfp);
> - /*
> - * If the bio_alloc fails, try it again for a single page to
> - * avoid having to deal with partial page reads. This emulates
> - * what do_mpage_read_folio does.
> - */
> - if (!bio)
> - bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ, orig_gfp);
> - if (ctx->rac)
> - bio->bi_opf |= REQ_RAHEAD;
> - bio->bi_iter.bi_sector = sector;
> - bio->bi_end_io = iomap_read_end_io;
> - bio_add_folio_nofail(bio, folio, plen, poff);
> - ctx->read_ctx = bio;
> - }
> - return 0;
> -}
> -
> -const struct iomap_read_ops iomap_bio_read_ops = {
> - .read_folio_range = iomap_bio_read_folio_range,
> - .submit_read = iomap_bio_submit_read,
> -};
> -EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
> -
> /*
> * Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
> * being ended prematurely.
> @@ -623,27 +556,6 @@ void iomap_readahead(const struct iomap_ops *ops,
> }
> EXPORT_SYMBOL_GPL(iomap_readahead);
>
> -static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> - struct folio *folio, loff_t pos, size_t len)
> -{
> - const struct iomap *srcmap = iomap_iter_srcmap(iter);
> - struct bio_vec bvec;
> - struct bio bio;
> -
> - bio_init(&bio, srcmap->bdev, &bvec, 1, REQ_OP_READ);
> - bio.bi_iter.bi_sector = iomap_sector(srcmap, pos);
> - bio_add_folio_nofail(&bio, folio, len, offset_in_folio(folio, pos));
> - return submit_bio_wait(&bio);
> -}
> -#else
> -static int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> - struct folio *folio, loff_t pos, size_t len)
> -{
> - WARN_ON_ONCE(1);
> - return -EIO;
> -}
> -#endif /* CONFIG_BLOCK */
> -
> /*
> * iomap_is_partially_uptodate checks whether blocks within a folio are
> * uptodate or not.
> diff --git a/fs/iomap/internal.h b/fs/iomap/internal.h
> index d05cb3aed96e..7ab1033ab4b7 100644
> --- a/fs/iomap/internal.h
> +++ b/fs/iomap/internal.h
> @@ -6,4 +6,16 @@
>
> u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend);
>
> +#ifdef CONFIG_BLOCK
> +int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len);
> +#else
> +int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
> + struct folio *folio, loff_t pos, size_t len)
> +{
> + WARN_ON_ONCE(1);
> + return -EIO;
> +}
> +#endif /* CONFIG_BLOCK */
> +
> #endif /* _IOMAP_INTERNAL_H */
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
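One C-linkage detail in the internal.h hunk above: the #else branch
defines a full (non-inline) function body in a header, so a second
translation unit including internal.h without CONFIG_BLOCK would hit a
multiple-definition error at link time. The usual header-stub shape
would be something like this sketch:

        #ifdef CONFIG_BLOCK
        int iomap_bio_read_folio_range_sync(const struct iomap_iter *iter,
                        struct folio *folio, loff_t pos, size_t len);
        #else
        static inline int iomap_bio_read_folio_range_sync(
                        const struct iomap_iter *iter, struct folio *folio,
                        loff_t pos, size_t len)
        {
                WARN_ON_ONCE(1);
                return -EIO;
        }
        #endif /* CONFIG_BLOCK */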
* Re: [PATCH v3 14/15] fuse: use iomap for readahead
2025-09-16 23:44 ` [PATCH v3 14/15] fuse: use iomap for readahead Joanne Koong
@ 2025-09-18 22:35 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 22:35 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:24PM -0700, Joanne Koong wrote:
> Do readahead in fuse using iomap. This gives us granular uptodate
> tracking for large folios, which optimizes how much data needs to be
> read in. If some portions of the folio are already uptodate (e.g. through
> a prior write), we only need to read in the non-uptodate portions.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Looks generally ok to me,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/fuse/file.c | 220 ++++++++++++++++++++++++++++---------------------
> 1 file changed, 124 insertions(+), 96 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 4f27a3b0c20a..db0b1f20fee4 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -844,8 +844,65 @@ static const struct iomap_ops fuse_iomap_ops = {
>
> struct fuse_fill_read_data {
> struct file *file;
> +
> + /* Fields below are used if sending the read request asynchronously */
> + struct fuse_conn *fc;
> + struct fuse_io_args *ia;
> + unsigned int nr_bytes;
> };
>
> +/* forward declarations */
> +static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
> + unsigned len, struct fuse_args_pages *ap,
> + unsigned cur_bytes, bool write);
> +static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> + unsigned int count, bool async);
> +
> +static int fuse_handle_readahead(struct folio *folio,
> + struct readahead_control *rac,
> + struct fuse_fill_read_data *data, loff_t pos,
> + size_t len)
> +{
> + struct fuse_io_args *ia = data->ia;
> + size_t off = offset_in_folio(folio, pos);
> + struct fuse_conn *fc = data->fc;
> + struct fuse_args_pages *ap;
> + unsigned int nr_pages;
> +
> + if (ia && fuse_folios_need_send(fc, pos, len, &ia->ap, data->nr_bytes,
> + false)) {
> + fuse_send_readpages(ia, data->file, data->nr_bytes,
> + fc->async_read);
> + data->nr_bytes = 0;
> + data->ia = NULL;
> + ia = NULL;
> + }
> + if (!ia) {
> + if (fc->num_background >= fc->congestion_threshold &&
> + rac->ra->async_size >= readahead_count(rac))
> + /*
> + * Congested and only async pages left, so skip the
> + * rest.
> + */
> + return -EAGAIN;
> +
> + nr_pages = min(fc->max_pages, readahead_count(rac));
> + data->ia = fuse_io_alloc(NULL, nr_pages);
> + if (!data->ia)
> + return -ENOMEM;
> + ia = data->ia;
> + }
> + folio_get(folio);
> + ap = &ia->ap;
> + ap->folios[ap->num_folios] = folio;
> + ap->descs[ap->num_folios].offset = off;
> + ap->descs[ap->num_folios].length = len;
> + data->nr_bytes += len;
> + ap->num_folios++;
> +
> + return 0;
> +}
> +
> static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
> struct iomap_read_folio_ctx *ctx,
> size_t len)
> @@ -857,18 +914,40 @@ static int fuse_iomap_read_folio_range_async(const struct iomap_iter *iter,
> struct file *file = data->file;
> int ret;
>
> - /*
> - * for non-readahead read requests, do reads synchronously since
> - * it's not guaranteed that the server can handle out-of-order reads
> - */
> iomap_start_folio_read(folio, len);
> - ret = fuse_do_readfolio(file, folio, off, len);
> - iomap_finish_folio_read(folio, off, len, ret);
> + if (ctx->rac) {
> + ret = fuse_handle_readahead(folio, ctx->rac, data, pos, len);
> + /*
> + * If fuse_handle_readahead was successful, fuse_readpages_end
> + * will do the iomap_finish_folio_read, else we need to call it
> + * here
> + */
> + if (ret)
> + iomap_finish_folio_read(folio, off, len, ret);
> + } else {
> + /*
> + * for non-readahead read requests, do reads synchronously
> + * since it's not guaranteed that the server can handle
> + * out-of-order reads
> + */
> + ret = fuse_do_readfolio(file, folio, off, len);
> + iomap_finish_folio_read(folio, off, len, ret);
> + }
> return ret;
> }
>
> +static void fuse_iomap_read_submit(struct iomap_read_folio_ctx *ctx)
> +{
> + struct fuse_fill_read_data *data = ctx->read_ctx;
> +
> + if (data->ia)
> + fuse_send_readpages(data->ia, data->file, data->nr_bytes,
> + data->fc->async_read);
> +}
> +
> static const struct iomap_read_ops fuse_iomap_read_ops = {
> .read_folio_range = fuse_iomap_read_folio_range_async,
> + .submit_read = fuse_iomap_read_submit,
> };
>
> static int fuse_read_folio(struct file *file, struct folio *folio)
> @@ -930,7 +1009,8 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> }
>
> for (i = 0; i < ap->num_folios; i++) {
> - folio_end_read(ap->folios[i], !err);
> + iomap_finish_folio_read(ap->folios[i], ap->descs[i].offset,
> + ap->descs[i].length, err);
> folio_put(ap->folios[i]);
> }
> if (ia->ff)
> @@ -940,7 +1020,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
> }
>
> static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
> - unsigned int count)
> + unsigned int count, bool async)
> {
> struct fuse_file *ff = file->private_data;
> struct fuse_mount *fm = ff->fm;
> @@ -962,7 +1042,7 @@ static void fuse_send_readpages(struct fuse_io_args *ia, struct file *file,
>
> fuse_read_args_fill(ia, file, pos, count, FUSE_READ);
> ia->read.attr_ver = fuse_get_attr_version(fm->fc);
> - if (fm->fc->async_read) {
> + if (async) {
> ia->ff = fuse_file_get(ff);
> ap->args.end = fuse_readpages_end;
> err = fuse_simple_background(fm, &ap->args, GFP_KERNEL);
> @@ -979,81 +1059,20 @@ static void fuse_readahead(struct readahead_control *rac)
> {
> struct inode *inode = rac->mapping->host;
> struct fuse_conn *fc = get_fuse_conn(inode);
> - unsigned int max_pages, nr_pages;
> - struct folio *folio = NULL;
> + struct fuse_fill_read_data data = {
> + .file = rac->file,
> + .fc = fc,
> + };
> + struct iomap_read_folio_ctx ctx = {
> + .ops = &fuse_iomap_read_ops,
> + .rac = rac,
> + .read_ctx = &data
> + };
>
> if (fuse_is_bad(inode))
> return;
>
> - max_pages = min_t(unsigned int, fc->max_pages,
> - fc->max_read / PAGE_SIZE);
> -
> - /*
> - * This is only accurate the first time through, since readahead_folio()
> - * doesn't update readahead_count() from the previous folio until the
> - * next call. Grab nr_pages here so we know how many pages we're going
> - * to have to process. This means that we will exit here with
> - * readahead_count() == folio_nr_pages(last_folio), but we will have
> - * consumed all of the folios, and read_pages() will call
> - * readahead_folio() again which will clean up the rac.
> - */
> - nr_pages = readahead_count(rac);
> -
> - while (nr_pages) {
> - struct fuse_io_args *ia;
> - struct fuse_args_pages *ap;
> - unsigned cur_pages = min(max_pages, nr_pages);
> - unsigned int pages = 0;
> -
> - if (fc->num_background >= fc->congestion_threshold &&
> - rac->ra->async_size >= readahead_count(rac))
> - /*
> - * Congested and only async pages left, so skip the
> - * rest.
> - */
> - break;
> -
> - ia = fuse_io_alloc(NULL, cur_pages);
> - if (!ia)
> - break;
> - ap = &ia->ap;
> -
> - while (pages < cur_pages) {
> - unsigned int folio_pages;
> -
> - /*
> - * This returns a folio with a ref held on it.
> - * The ref needs to be held until the request is
> - * completed, since the splice case (see
> - * fuse_try_move_page()) drops the ref after it's
> - * replaced in the page cache.
> - */
> - if (!folio)
> - folio = __readahead_folio(rac);
> -
> - folio_pages = folio_nr_pages(folio);
> - if (folio_pages > cur_pages - pages) {
> - /*
> - * Large folios belonging to fuse will never
> - * have more pages than max_pages.
> - */
> - WARN_ON(!pages);
> - break;
> - }
> -
> - ap->folios[ap->num_folios] = folio;
> - ap->descs[ap->num_folios].length = folio_size(folio);
> - ap->num_folios++;
> - pages += folio_pages;
> - folio = NULL;
> - }
> - fuse_send_readpages(ia, rac->file, pages << PAGE_SHIFT);
> - nr_pages -= pages;
> - }
> - if (folio) {
> - folio_end_read(folio, false);
> - folio_put(folio);
> - }
> + iomap_readahead(&fuse_iomap_ops, &ctx);
> }
>
> static ssize_t fuse_cache_read_iter(struct kiocb *iocb, struct iov_iter *to)
> @@ -2084,7 +2103,7 @@ struct fuse_fill_wb_data {
> struct fuse_file *ff;
> unsigned int max_folios;
> /*
> - * nr_bytes won't overflow since fuse_writepage_need_send() caps
> + * nr_bytes won't overflow since fuse_folios_need_send() caps
> * wb requests to never exceed fc->max_pages (which has an upper bound
> * of U16_MAX).
> */
> @@ -2129,14 +2148,15 @@ static void fuse_writepages_send(struct inode *inode,
> spin_unlock(&fi->lock);
> }
>
> -static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> - unsigned len, struct fuse_args_pages *ap,
> - struct fuse_fill_wb_data *data)
> +static bool fuse_folios_need_send(struct fuse_conn *fc, loff_t pos,
> + unsigned len, struct fuse_args_pages *ap,
> + unsigned cur_bytes, bool write)
> {
> struct folio *prev_folio;
> struct fuse_folio_desc prev_desc;
> - unsigned bytes = data->nr_bytes + len;
> + unsigned bytes = cur_bytes + len;
> loff_t prev_pos;
> + size_t max_bytes = write ? fc->max_write : fc->max_read;
>
> WARN_ON(!ap->num_folios);
>
> @@ -2144,8 +2164,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> if ((bytes + PAGE_SIZE - 1) >> PAGE_SHIFT > fc->max_pages)
> return true;
>
> - /* Reached max write bytes */
> - if (bytes > fc->max_write)
> + if (bytes > max_bytes)
> return true;
>
> /* Discontinuity */
> @@ -2155,11 +2174,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, loff_t pos,
> if (prev_pos != pos)
> return true;
>
> - /* Need to grow the pages array? If so, did the expansion fail? */
> - if (ap->num_folios == data->max_folios &&
> - !fuse_pages_realloc(data, fc->max_pages))
> - return true;
> -
> return false;
> }
>
> @@ -2183,10 +2197,24 @@ static ssize_t fuse_iomap_writeback_range(struct iomap_writepage_ctx *wpc,
> return -EIO;
> }
>
> - if (wpa && fuse_writepage_need_send(fc, pos, len, ap, data)) {
> - fuse_writepages_send(inode, data);
> - data->wpa = NULL;
> - data->nr_bytes = 0;
> + if (wpa) {
> + bool send = fuse_folios_need_send(fc, pos, len, ap,
> + data->nr_bytes, true);
> +
> + if (!send) {
> + /*
> + * Need to grow the pages array? If so, did the
> + * expansion fail?
> + */
> + send = (ap->num_folios == data->max_folios) &&
> + !fuse_pages_realloc(data, fc->max_pages);
> + }
> +
> + if (send) {
> + fuse_writepages_send(inode, data);
> + data->wpa = NULL;
> + data->nr_bytes = 0;
> + }
> }
>
> if (data->wpa == NULL) {
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes
2025-09-16 23:44 ` [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
@ 2025-09-18 22:35 ` Darrick J. Wong
0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2025-09-18 22:35 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:25PM -0700, Joanne Koong wrote:
> Now that fuse is integrated with iomap for read/readahead, we can remove
> the workaround that was added in commit bd24d2108e9c ("fuse: fix fuseblk
> i_blkbits for iomap partial writes"), which was needed to avoid a race
> condition where an iomap partial write could be overwritten by a read
> when blocksize < PAGE_SIZE. With iomap read/readahead, uptodate state
> is tracked at block granularity, which protects against this race and
> makes the workaround unnecessary.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Oh goody!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/fuse/dir.c | 2 +-
> fs/fuse/fuse_i.h | 8 --------
> fs/fuse/inode.c | 13 +------------
> 3 files changed, 2 insertions(+), 21 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 5c569c3cb53f..ebee7e0b1cd3 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1199,7 +1199,7 @@ static void fuse_fillattr(struct mnt_idmap *idmap, struct inode *inode,
> if (attr->blksize != 0)
> blkbits = ilog2(attr->blksize);
> else
> - blkbits = fc->blkbits;
> + blkbits = inode->i_sb->s_blocksize_bits;
>
> stat->blksize = 1 << blkbits;
> }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index cc428d04be3e..1647eb7ca6fa 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -975,14 +975,6 @@ struct fuse_conn {
> /* Request timeout (in jiffies). 0 = no timeout */
> unsigned int req_timeout;
> } timeout;
> -
> - /*
> - * This is a workaround until fuse uses iomap for reads.
> - * For fuseblk servers, this represents the blocksize passed in at
> - * mount time and for regular fuse servers, this is equivalent to
> - * inode->i_blkbits.
> - */
> - u8 blkbits;
> };
>
> /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index fdecd5a90dee..5899a47faaef 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -292,7 +292,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
> if (attr->blksize)
> fi->cached_i_blkbits = ilog2(attr->blksize);
> else
> - fi->cached_i_blkbits = fc->blkbits;
> + fi->cached_i_blkbits = inode->i_sb->s_blocksize_bits;
>
> /*
> * Don't set the sticky bit in i_mode, unless we want the VFS
> @@ -1810,21 +1810,10 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> err = -EINVAL;
> if (!sb_set_blocksize(sb, ctx->blksize))
> goto err;
> - /*
> - * This is a workaround until fuse hooks into iomap for reads.
> - * Use PAGE_SIZE for the blocksize else if the writeback cache
> - * is enabled, buffered writes go through iomap and a read may
> - * overwrite partially written data if blocksize < PAGE_SIZE
> - */
> - fc->blkbits = sb->s_blocksize_bits;
> - if (ctx->blksize != PAGE_SIZE &&
> - !sb_set_blocksize(sb, PAGE_SIZE))
> - goto err;
> #endif
> } else {
> sb->s_blocksize = PAGE_SIZE;
> sb->s_blocksize_bits = PAGE_SHIFT;
> - fc->blkbits = sb->s_blocksize_bits;
> }
>
> sb->s_subtype = ctx->subtype;
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 11/15] iomap: move buffered io bio logic into new file
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-17 21:40 ` kernel test robot
2025-09-18 22:31 ` Darrick J. Wong
@ 2025-09-19 15:32 ` Christoph Hellwig
2 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2025-09-19 15:32 UTC (permalink / raw)
To: Joanne Koong
Cc: brauner, miklos, hch, djwong, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 04:44:21PM -0700, Joanne Koong wrote:
> Move bio logic in the buffered io code into its own file and remove
> CONFIG_BLOCK gating for iomap read/readahead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
If you want to add my signoff as the first one, it also needs me in
the From: line. I don't really care either way; we just need to be
consistent.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 11/15] iomap: move buffered io bio logic into new file
2025-09-18 22:31 ` Darrick J. Wong
@ 2025-09-19 15:33 ` Christoph Hellwig
0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2025-09-19 15:33 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Joanne Koong, brauner, miklos, hch, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc
On Thu, Sep 18, 2025 at 03:31:31PM -0700, Darrick J. Wong wrote:
> On Tue, Sep 16, 2025 at 04:44:21PM -0700, Joanne Koong wrote:
> > Move bio logic in the buffered io code into its own file and remove
> > CONFIG_BLOCK gating for iomap read/readahead.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> /me wonders how long until we end up doing the same thing with the
> directio code but in the meantime it beats #ifdef everywhere
So far the direct I/O code depends on CONFIG_BLOCK. It feels a bit
more tied to the block code to me, but if someone finds other uses
for it we'll get to it eventually.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 07/15] iomap: track read/readahead folio ownership internally
2025-09-18 21:49 ` Darrick J. Wong
@ 2025-09-19 18:14 ` Joanne Koong
0 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-19 18:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Thu, Sep 18, 2025 at 2:49 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Sep 16, 2025 at 04:44:17PM -0700, Joanne Koong wrote:
> > The purpose of "struct iomap_read_folio_ctx->cur_folio_in_bio" is to
> > track folio ownership to know who is responsible for unlocking it.
> > Rename "cur_folio_in_bio" to "cur_folio_owned" to better reflect this
> > purpose and so that this can be generically used later on by filesystems
> > that are not block-based.
>
> Hrmm, well if this is becoming a private variable, then I'd shorten it
> to "folio_owned" since it's not really in the ctx/cur anymore, but I
> might be bikeshedding now. :)
Oo, I'll do this for ->read_folio (and for the arg name in
iomap_read_folio_iter()) since there's only 1 folio there, which makes
it obvious which folio it is. For ->readahead I'll keep it as
"cur_folio_owned" to make it more obvious that the folio-owned state
refers to the folio in ctx->cur_folio.
>
> > Since "struct iomap_read_folio_ctx" will be made a public interface
> > later on when read/readahead takes in caller-provided callbacks, track
> > the folio ownership state internally instead of exposing it in "struct
> > iomap_read_folio_ctx" to make the interface simpler for end users.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > 1 file changed, 23 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 6c5a631848b7..587bbdbd24bc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -352,7 +352,6 @@ static void iomap_read_end_io(struct bio *bio)
> >
> > struct iomap_read_folio_ctx {
> > struct folio *cur_folio;
> > - bool cur_folio_in_bio;
> > void *read_ctx;
> > struct readahead_control *rac;
> > };
> > @@ -376,7 +375,6 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> > sector_t sector;
> > struct bio *bio = ctx->read_ctx;
> >
> > - ctx->cur_folio_in_bio = true;
> > if (ifs) {
> > spin_lock_irq(&ifs->state_lock);
> > ifs->read_bytes_pending += plen;
> > @@ -413,7 +411,7 @@ static void iomap_bio_read_folio_range(const struct iomap_iter *iter,
> > }
> >
> > static int iomap_read_folio_iter(struct iomap_iter *iter,
> > - struct iomap_read_folio_ctx *ctx)
> > + struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
> > {
> > const struct iomap *iomap = &iter->iomap;
> > loff_t pos = iter->pos;
> > @@ -450,6 +448,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> > folio_zero_range(folio, poff, plen);
> > iomap_set_range_uptodate(folio, poff, plen);
> > } else {
> > + *cur_folio_owned = true;
> > iomap_bio_read_folio_range(iter, ctx, pos, plen);
> > }
> >
> > @@ -472,16 +471,22 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> > struct iomap_read_folio_ctx ctx = {
> > .cur_folio = folio,
> > };
> > + /*
> > + * If an external IO helper takes ownership of the folio, it is
> > + * responsible for unlocking it when the read completes.
>
> Not sure what "external" means here -- I think for your project it means
> a custom folio read method supplied by a filesystem, but for the exist
> code (xfs submitting unaltered bios), that part is still mostly internal
> to iomap.
>
> If we were *only* refactoring code I would suggest s/external/async/
> because that's what the bio code does, but a filesystem supplying its
> own folio read function could very well fill the folio synchronously and
> it'd still have to unlock the folio.
>
> Maybe just get rid of the word external? The rest of the code changes
> look fine to me.
Sounds good, I'll get rid of "external".
Thanks for reviewing this patchset!
>
> --D
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-18 22:30 ` Darrick J. Wong
@ 2025-09-19 18:34 ` Joanne Koong
2025-09-22 18:33 ` Christoph Hellwig
1 sibling, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-19 18:34 UTC (permalink / raw)
To: Darrick J. Wong
Cc: brauner, miklos, hch, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Thu, Sep 18, 2025 at 3:30 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Sep 16, 2025 at 04:44:20PM -0700, Joanne Koong wrote:
> > Non-block-based filesystems will be using iomap read/readahead. If they
> > handle reading in ranges asynchronously and fulfill those read requests
> > on an ongoing basis (instead of all together at the end), then there is
> > the possibility that the read on the folio may be prematurely ended if
> > earlier async requests complete before the later ones have been issued.
> >
> > For example if there is a large folio and a readahead request for 16
> > pages in that folio, if doing readahead on those 16 pages is split into
> > 4 async requests and the first request is sent off and then completed
> > before we have sent off the second request, then when the first request
> > calls iomap_finish_folio_read(), ifs->read_bytes_pending would be 0,
> > which would end the read and unlock the folio prematurely.
> >
> > To mitigate this, a "bias" is added to ifs->read_bytes_pending before
> > the first range is forwarded to the caller and removed after the last
> > range has been forwarded.
> >
> > iomap writeback does this with their async requests as well to prevent
> > prematurely ending writeback.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/iomap/buffered-io.c | 55 ++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 47 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 561378f2b9bb..667a49cb5ae5 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -420,6 +420,38 @@ const struct iomap_read_ops iomap_bio_read_ops = {
> > };
> > EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
> >
> > +/*
> > + * Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
> > + * being ended prematurely.
> > + *
> > + * Otherwise, if the ranges are read asynchronously and read requests are
> > + * fulfilled on an ongoing basis, there is the possibility that the read on the
> > + * folio may be prematurely ended if earlier async requests complete before the
> > + * later ones have been issued.
> > + */
> > +static void iomap_read_add_bias(struct folio *folio)
> > +{
> > + iomap_start_folio_read(folio, 1);
>
> I wonder, could you achieve the same effect by elevating
> read_bytes_pending by the number of bytes that we think we have to read,
> and subtracting from it as the completions come in or we decide that no
> read is necessary?
>
This is an interesting idea and I think it works (eg we set
read_bytes_pending to the folio size, keep track of how many
non-uptodate bytes are read in, then at the end subtract
folio_size - bytes_read_in from read_bytes_pending). Personally I find
the increment/decrement-by-1 bias approach simpler and easier to read
and reason about, but maybe I'm just biased (pun intended, I guess :P).
I don't feel strongly about this, so if you do, I'm happy to change it.
> (That might just be overthinking the plumbing though)
>
> --D
>
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-18 22:30 ` Darrick J. Wong
2025-09-19 18:34 ` Joanne Koong
@ 2025-09-22 18:33 ` Christoph Hellwig
2025-09-22 20:54 ` Matthew Wilcox
1 sibling, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2025-09-22 18:33 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Joanne Koong, brauner, miklos, hch, hsiangkao, linux-block, gfs2,
linux-fsdevel, kernel-team, linux-xfs, linux-doc, Ritesh Harjani,
Matthew Wilcox
On Thu, Sep 18, 2025 at 03:30:18PM -0700, Darrick J. Wong wrote:
> > + iomap_start_folio_read(folio, 1);
>
> I wonder, could you achieve the same effect by elevating
> read_bytes_pending by the number of bytes that we think we have to read,
> and subtracting from it as the completions come in or we decide that no
> read is necessary?
Weren't we going to look into something like that anyway, to stop
the read code from building bios larger than the map, to support the
extN boundary conditions? I'm trying to find the details of that;
IIRC willy suggested it. Since we're touching this area for
non-trivial changes anyway, it might be a good idea to get that done,
or at least do the prep work.
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-22 18:33 ` Christoph Hellwig
@ 2025-09-22 20:54 ` Matthew Wilcox
2025-09-24 22:56 ` Joanne Koong
0 siblings, 1 reply; 40+ messages in thread
From: Matthew Wilcox @ 2025-09-22 20:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, Joanne Koong, brauner, miklos, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc, Ritesh Harjani
On Mon, Sep 22, 2025 at 11:33:54AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 18, 2025 at 03:30:18PM -0700, Darrick J. Wong wrote:
> > > + iomap_start_folio_read(folio, 1);
> >
> > I wonder, could you achieve the same effect by elevating
> > read_bytes_pending by the number of bytes that we think we have to read,
> > and subtracting from it as the completions come in or we decide that no
> > read is necessary?
>
> Weren't we going to look into something like that anyway to stop
> the read code from building bios larger than the map to support the
> extN boundary conditions? I'm trying to find the details of that,
> IIRC willy suggested it. Because once we touch this area for
> non-trivial changes it might be a good idea to get that done, or at
> least do the prep work.
Yes, I did suggest it. Basically, we would initialise read_bytes_pending
to folio_size(), then subtract from it either when a request comes in
or we decide to memset a hole. When it reaches zero, we have decided
on the fate of every byte in the folio.
It's fewer atomics for folios which contain no holes, which is the case
we should be optimising for anyway.
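In sketch form, that could look something like this (hypothetical code,
reusing the ifs->state_lock pattern from the patch above, and assuming
ifs has already been set up by ifs_alloc()):

        struct iomap_folio_state *ifs = folio->private;
        bool finished, uptodate;

        /* at read start: every byte's fate is still undecided */
        spin_lock_irq(&ifs->state_lock);
        ifs->read_bytes_pending = folio_size(folio);
        spin_unlock_irq(&ifs->state_lock);

        /*
         * On async read completion, or after deciding plen bytes need
         * no read (a memset hole), subtract those bytes; modeled on the
         * iomap_read_remove_bias() completion logic quoted earlier.
         */
        spin_lock_irq(&ifs->state_lock);
        ifs->read_bytes_pending -= plen;
        finished = !ifs->read_bytes_pending;
        if (finished)
                uptodate = ifs_is_fully_uptodate(folio, ifs);
        spin_unlock_irq(&ifs->state_lock);
        if (finished)
                folio_end_read(folio, uptodate);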
* Re: [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter()
2025-09-16 23:44 ` [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
2025-09-18 21:37 ` Darrick J. Wong
@ 2025-09-22 22:33 ` Joanne Koong
1 sibling, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-22 22:33 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc, Christoph Hellwig
On Tue, Sep 16, 2025 at 4:50 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Iterate over all non-uptodate ranges in a single call to
> iomap_readpage_iter() instead of leaving the partial folio iteration to
> the caller.
>
> This will be useful for supporting caller-provided async folio read
> callbacks (added in a later commit), because that will require tracking
> when the first and last async read requests for a folio are sent, in order
> to prevent premature read completion of the folio.
>
> This additionally makes the iomap_readahead_iter() logic a bit simpler.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/iomap/buffered-io.c | 69 ++++++++++++++++++++----------------------
> 1 file changed, 32 insertions(+), 37 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 2a1709e0757b..0c4ba2a63490 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -420,6 +420,7 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
> loff_t length = iomap_length(iter);
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> + loff_t count;
> int ret;
>
> if (iomap->type == IOMAP_INLINE) {
> @@ -431,39 +432,33 @@ static int iomap_readpage_iter(struct iomap_iter *iter,
>
> /* zero post-eof blocks as the page may be mapped */
> ifs_alloc(iter->inode, folio, iter->flags);
> - iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
> - if (plen == 0)
> - goto done;
>
> - if (iomap_block_needs_zeroing(iter, pos)) {
> - folio_zero_range(folio, poff, plen);
> - iomap_set_range_uptodate(folio, poff, plen);
> - } else {
> - iomap_bio_read_folio_range(iter, ctx, pos, plen);
> - }
> + length = min_t(loff_t, length,
> + folio_size(folio) - offset_in_folio(folio, pos));
> + while (length) {
> + iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff,
> + &plen);
>
> -done:
> - /*
> - * Move the caller beyond our range so that it keeps making progress.
> - * For that, we have to include any leading non-uptodate ranges, but
> - * we can skip trailing ones as they will be handled in the next
> - * iteration.
> - */
> - length = pos - iter->pos + plen;
> - return iomap_iter_advance(iter, &length);
> -}
> + count = pos - iter->pos + plen;
> + if (WARN_ON_ONCE(count > length))
> + return -EIO;
>
> -static int iomap_read_folio_iter(struct iomap_iter *iter,
> - struct iomap_readpage_ctx *ctx)
> -{
> - int ret;
> + if (plen == 0)
> + return iomap_iter_advance(iter, &count);
>
> - while (iomap_length(iter)) {
> - ret = iomap_readpage_iter(iter, ctx);
> + if (iomap_block_needs_zeroing(iter, pos)) {
> + folio_zero_range(folio, poff, plen);
> + iomap_set_range_uptodate(folio, poff, plen);
> + } else {
> + iomap_bio_read_folio_range(iter, ctx, pos, plen);
> + }
> +
> + length -= count;
> + ret = iomap_iter_advance(iter, &count);
> if (ret)
> return ret;
> + pos = iter->pos;
> }
> -
> return 0;
> }
>
> @@ -482,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.status = iomap_read_folio_iter(&iter, &ctx);
> + iter.status = iomap_readpage_iter(&iter, &ctx);
>
> iomap_bio_submit_read(&ctx);
>
> @@ -504,16 +499,16 @@ static int iomap_readahead_iter(struct iomap_iter *iter,
> int ret;
>
> while (iomap_length(iter)) {
> - if (ctx->cur_folio &&
> - offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
> - if (!ctx->cur_folio_in_bio)
> - folio_unlock(ctx->cur_folio);
> - ctx->cur_folio = NULL;
> - }
> - if (!ctx->cur_folio) {
> - ctx->cur_folio = readahead_folio(ctx->rac);
> - ctx->cur_folio_in_bio = false;
> - }
> + if (ctx->cur_folio && !ctx->cur_folio_in_bio)
> + folio_unlock(ctx->cur_folio);
> + ctx->cur_folio = readahead_folio(ctx->rac);
Unfortunately, this simplification doesn't work. The code still needs
the "offset_in_folio() == 0" check, because the iomap mapping may cover
only part of the folio, in which case the next round of iomap_iter()
should still operate on the same folio. I'll make this change in v4.
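Roughly, that means going back to something like the original loop
(a sketch of the v4 direction, not the final patch):

        while (iomap_length(iter)) {
                /*
                 * Only move on to the next folio once iter->pos crosses
                 * a folio boundary, so a mapping that covers only part
                 * of a folio leaves cur_folio in place for the next
                 * round of iomap_iter().
                 */
                if (ctx->cur_folio &&
                    offset_in_folio(ctx->cur_folio, iter->pos) == 0) {
                        if (!ctx->cur_folio_in_bio)
                                folio_unlock(ctx->cur_folio);
                        ctx->cur_folio = NULL;
                }
                if (!ctx->cur_folio) {
                        ctx->cur_folio = readahead_folio(ctx->rac);
                        if (WARN_ON_ONCE(!ctx->cur_folio))
                                return -EINVAL;
                        ctx->cur_folio_in_bio = false;
                }
                ret = iomap_readpage_iter(iter, ctx);
                if (ret)
                        return ret;
        }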
> + /*
> + * We should never in practice hit this case since the iter
> + * length matches the readahead length.
> + */
> + if (WARN_ON_ONCE(!ctx->cur_folio))
> + return -EINVAL;
> + ctx->cur_folio_in_bio = false;
> ret = iomap_readpage_iter(iter, ctx);
> if (ret)
> return ret;
> --
> 2.47.3
>
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-16 23:44 ` [PATCH v3 10/15] iomap: add bias for async read requests Joanne Koong
2025-09-18 22:30 ` Darrick J. Wong
@ 2025-09-22 23:19 ` Joanne Koong
1 sibling, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-22 23:19 UTC (permalink / raw)
To: brauner, miklos
Cc: hch, djwong, hsiangkao, linux-block, gfs2, linux-fsdevel,
kernel-team, linux-xfs, linux-doc
On Tue, Sep 16, 2025 at 4:50 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Non-block-based filesystems will be using iomap read/readahead. If they
> handle reading in ranges asynchronously and fulfill those read requests
> on an ongoing basis (instead of all together at the end), then the read
> on the folio may be ended prematurely if earlier async requests complete
> before the later ones have been issued.
>
> For example, take a large folio and a readahead request for 16 pages in
> that folio, where the readahead is split into 4 async requests. If the
> first request is sent off and completes before the second has been sent,
> then when the first request calls iomap_finish_folio_read(),
> ifs->read_bytes_pending would be 0, which would end the read and unlock
> the folio prematurely.
>
> To mitigate this, a "bias" is added to ifs->read_bytes_pending before
> the first range is forwarded to the caller and removed after the last
> range has been forwarded.
>
> iomap writeback does the same for its async requests to prevent
> prematurely ending writeback.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/iomap/buffered-io.c | 55 ++++++++++++++++++++++++++++++++++++------
> 1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 561378f2b9bb..667a49cb5ae5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -420,6 +420,38 @@ const struct iomap_read_ops iomap_bio_read_ops = {
> };
> EXPORT_SYMBOL_GPL(iomap_bio_read_ops);
>
> +/*
> + * Add a bias to ifs->read_bytes_pending to prevent the read on the folio from
> + * being ended prematurely.
> + *
> + * Otherwise, if the ranges are read asynchronously and read requests are
> + * fulfilled on an ongoing basis, there is the possibility that the read on the
> + * folio may be prematurely ended if earlier async requests complete before the
> + * later ones have been issued.
> + */
> +static void iomap_read_add_bias(struct folio *folio)
> +{
> + iomap_start_folio_read(folio, 1);
> +}
> +
> +static void iomap_read_remove_bias(struct folio *folio, bool *cur_folio_owned)
> +{
> + struct iomap_folio_state *ifs = folio->private;
> + bool finished, uptodate;
> +
> + if (ifs) {
> + spin_lock_irq(&ifs->state_lock);
> + ifs->read_bytes_pending -= 1;
> + finished = !ifs->read_bytes_pending;
> + if (finished)
> + uptodate = ifs_is_fully_uptodate(folio, ifs);
> + spin_unlock_irq(&ifs->state_lock);
> + if (finished)
> + folio_end_read(folio, uptodate);
> + *cur_folio_owned = true;
> + }
> +}
> +
> static int iomap_read_folio_iter(struct iomap_iter *iter,
> struct iomap_read_folio_ctx *ctx, bool *cur_folio_owned)
> {
> @@ -429,7 +461,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> struct folio *folio = ctx->cur_folio;
> size_t poff, plen;
> loff_t delta;
> - int ret;
> + int ret = 0;
>
> if (iomap->type == IOMAP_INLINE) {
> ret = iomap_read_inline_data(iter, folio);
> @@ -441,6 +473,8 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> /* zero post-eof blocks as the page may be mapped */
> ifs_alloc(iter->inode, folio, iter->flags);
>
> + iomap_read_add_bias(folio);
Same here: it's not guaranteed that the whole folio is processed at
this point, because the current iomap mapping may cover only part of
the folio. The bias needs to be added before the first iomap_iter()
call and removed after all iomap_iter() calls are complete. I'll make
this change for v4.
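Concretely, that would look roughly like this in iomap_read_folio()
(a sketch of the v4 direction, not the actual patch; assumes ifs has
been allocated before the bias is taken):

        iomap_read_add_bias(folio);

        while ((ret = iomap_iter(&iter, ops)) > 0)
                iter.status = iomap_read_folio_iter(&iter, &ctx,
                                &cur_folio_owned);

        /* dropped only after every iomap_iter() round has finished */
        iomap_read_remove_bias(folio, &cur_folio_owned);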
> +
> length = min_t(loff_t, length,
> folio_size(folio) - offset_in_folio(folio, pos));
> while (length) {
> @@ -448,16 +482,18 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> &plen);
>
> delta = pos - iter->pos;
> - if (WARN_ON_ONCE(delta + plen > length))
> - return -EIO;
> + if (WARN_ON_ONCE(delta + plen > length)) {
> + ret = -EIO;
> + break;
> + }
> length -= delta + plen;
>
> ret = iomap_iter_advance(iter, &delta);
> if (ret)
> - return ret;
> + break;
>
> if (plen == 0)
> - return 0;
> + break;
>
> if (iomap_block_needs_zeroing(iter, pos)) {
> folio_zero_range(folio, poff, plen);
> @@ -466,16 +502,19 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> *cur_folio_owned = true;
> ret = ctx->ops->read_folio_range(iter, ctx, plen);
> if (ret)
> - return ret;
> + break;
> }
>
> delta = plen;
> ret = iomap_iter_advance(iter, &delta);
> if (ret)
> - return ret;
> + break;
> pos = iter->pos;
> }
> - return 0;
> +
> + iomap_read_remove_bias(folio, cur_folio_owned);
> +
> + return ret;
> }
>
> int iomap_read_folio(const struct iomap_ops *ops,
> --
> 2.47.3
>
* Re: [PATCH v3 10/15] iomap: add bias for async read requests
2025-09-22 20:54 ` Matthew Wilcox
@ 2025-09-24 22:56 ` Joanne Koong
0 siblings, 0 replies; 40+ messages in thread
From: Joanne Koong @ 2025-09-24 22:56 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, Darrick J. Wong, brauner, miklos, hsiangkao,
linux-block, gfs2, linux-fsdevel, kernel-team, linux-xfs,
linux-doc, Ritesh Harjani
On Mon, Sep 22, 2025 at 1:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Sep 22, 2025 at 11:33:54AM -0700, Christoph Hellwig wrote:
> > On Thu, Sep 18, 2025 at 03:30:18PM -0700, Darrick J. Wong wrote:
> > > > + iomap_start_folio_read(folio, 1);
> > >
> > > I wonder, could you achieve the same effect by elevating
> > > read_bytes_pending by the number of bytes that we think we have to read,
> > > and subtracting from it as the completions come in or we decide that no
> > > read is necessary?
> >
> > Weren't we going to look into something like that anyway to stop
> > the read code from building bios larger than the map to support the
> > extN boundary conditions? I'm trying to find the details of that,
Doesn't the iomap code already do this, given that it uses the trimmed
iomap length (eg "loff_t length = iomap_length(iter)") in
iomap_readpage_iter() to decide how much to read in?
> > IIRC willy suggested it. Because once we touch this area for
> > non-trivial changes it might be a good idea to get that done, or at
> > least do the prep work.
>
> Yes, I did suggest it. Basically, we would initialise read_bytes_pending
> to folio_size(), then subtract from it either when a request comes in
> or we decide to memset a hole. When it reaches zero, we have decided
> on the fate of every byte in the folio.
>
> It's fewer atomics for folios which contain no holes, which is the case
> we should be optimising for anyway.
I think we can even skip subtracting when we encounter a hole and just
tally it all up at the end: keep track of how many bytes the caller
asynchronously reads in, then do read_bytes_pending -= folio_size() -
bytes_read_in to settle the difference. Actually, looking at this more,
I think it must be done this way, otherwise handling errors gets
tricky.
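In sketch form (hypothetical code and names, eg bytes_read_in; the
per-completion path stays as in the sketch upthread):

        size_t bytes_read_in = 0;
        bool finished, uptodate;

        /* read_bytes_pending was set to folio_size(folio) at read start */

        /* during the walk: count only bytes handed to the async reader;
         * holes and already-uptodate ranges are not tracked at all */
        bytes_read_in += plen;

        /* after the walk: settle everything we never needed to read */
        spin_lock_irq(&ifs->state_lock);
        ifs->read_bytes_pending -= folio_size(folio) - bytes_read_in;
        finished = !ifs->read_bytes_pending;
        if (finished)
                uptodate = ifs_is_fully_uptodate(folio, ifs);
        spin_unlock_irq(&ifs->state_lock);
        if (finished)
                folio_end_read(folio, uptodate);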
I had missed that this approach leads to fewer atomics, since it gets
rid of the caller having to increment read_bytes_pending for every
range read in. This approach does seem better. I'll make this change
for v5. We should probably do the same thing for writeback.
Thanks,
Joanne
Thread overview: 40+ messages
2025-09-16 23:44 [PATCH v3 00/15] fuse: use iomap for buffered reads + readahead Joanne Koong
2025-09-16 23:44 ` [PATCH v3 01/15] iomap: move bio read logic into helper function Joanne Koong
2025-09-18 21:27 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 02/15] iomap: move read/readahead bio submission " Joanne Koong
2025-09-18 21:27 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 03/15] iomap: store read/readahead bio generically Joanne Koong
2025-09-18 21:29 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 04/15] iomap: iterate over entire folio in iomap_readpage_iter() Joanne Koong
2025-09-18 21:37 ` Darrick J. Wong
2025-09-22 22:33 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 05/15] iomap: rename iomap_readpage_iter() to iomap_read_folio_iter() Joanne Koong
2025-09-16 23:44 ` [PATCH v3 06/15] iomap: rename iomap_readpage_ctx struct to iomap_read_folio_ctx Joanne Koong
2025-09-16 23:44 ` [PATCH v3 07/15] iomap: track read/readahead folio ownership internally Joanne Koong
2025-09-18 21:49 ` Darrick J. Wong
2025-09-19 18:14 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 08/15] iomap: add public start/finish folio read helpers Joanne Koong
2025-09-16 23:44 ` [PATCH v3 09/15] iomap: add caller-provided callbacks for read and readahead Joanne Koong
2025-09-16 23:44 ` [PATCH v3 10/15] iomap: add bias for async read requests Joanne Koong
2025-09-18 22:30 ` Darrick J. Wong
2025-09-19 18:34 ` Joanne Koong
2025-09-22 18:33 ` Christoph Hellwig
2025-09-22 20:54 ` Matthew Wilcox
2025-09-24 22:56 ` Joanne Koong
2025-09-22 23:19 ` Joanne Koong
2025-09-16 23:44 ` [PATCH v3 11/15] iomap: move buffered io bio logic into new file Joanne Koong
2025-09-17 21:40 ` kernel test robot
2025-09-18 22:31 ` Darrick J. Wong
2025-09-19 15:33 ` Christoph Hellwig
2025-09-19 15:32 ` Christoph Hellwig
2025-09-16 23:44 ` [PATCH v3 12/15] iomap: make iomap_read_folio() a void return Joanne Koong
2025-09-18 21:55 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 13/15] fuse: use iomap for read_folio Joanne Koong
2025-09-16 23:44 ` [PATCH v3 14/15] fuse: use iomap for readahead Joanne Koong
2025-09-18 22:35 ` Darrick J. Wong
2025-09-16 23:44 ` [PATCH v3 15/15] fuse: remove fc->blkbits workaround for partial writes Joanne Koong
2025-09-18 22:35 ` Darrick J. Wong
2025-09-17 8:30 ` [syzbot ci] Re: fuse: use iomap for buffered reads + readahead syzbot ci
2025-09-17 19:59 ` Joanne Koong
2025-09-18 15:48 ` Aleksandr Nogikh
2025-09-18 21:15 ` Joanne Koong