From: Christoph Hellwig <hch@lst.de>
To: Christian Brauner <brauner@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
Chandan Babu R <chandan.babu@oracle.com>,
Zhang Yi <yi.zhang@huaweicloud.com>,
Ritesh Harjani <ritesh.list@gmail.com>,
Jens Axboe <axboe@kernel.dk>,
Andreas Gruenbacher <agruenba@redhat.com>,
Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
Johannes Thumshirn <jth@kernel.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org, gfs2@lists.linux.dev
Subject: [PATCH 12/14] iomap: submit ioends immediately
Date: Thu, 7 Dec 2023 08:27:08 +0100
Message-ID: <20231207072710.176093-13-hch@lst.de>
In-Reply-To: <20231207072710.176093-1-hch@lst.de>
Currently the writeback code delays submitting ioends until we reach
the end of the folio.  The reason is that otherwise the end I/O
handler could clear the writeback bit before we've even finished
submitting all I/O for the folio.
Add a bias to ifs->write_bytes_pending while we are submitting I/O
for a folio so that it cannot reach zero before all I/O for the folio
has been submitted.  This prevents the premature clearing of the
writeback bit and makes the submit_list superfluous, so remove it.
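In essence, the per-folio walk now holds an extra reference on
write_bytes_pending for the duration of the submission.  A minimal
sketch of that pattern (simplified; the helper name is made up for
illustration, the real logic lives in iomap_writepage_map below):

	static void writeback_folio_sketch(struct iomap_folio_state *ifs,
			struct folio *folio)
	{
		/* bias: keep write_bytes_pending from reaching zero early */
		atomic_inc(&ifs->write_bytes_pending);

		/*
		 * ... walk the dirty blocks, add each one to an ioend and
		 * submit full ioends right away; each block adds its length
		 * to write_bytes_pending and the bio completion handler
		 * subtracts it again ...
		 */

		/* drop the bias; if all I/O already completed, end writeback */
		if (atomic_dec_and_test(&ifs->write_bytes_pending))
			folio_end_writeback(folio);
	}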
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/buffered-io.c | 162 +++++++++++++++++++----------------------
1 file changed, 76 insertions(+), 86 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 622764ed649d57..4339bc422b245d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1620,30 +1620,34 @@ static void iomap_writepage_end_bio(struct bio *bio)
* Submit the final bio for an ioend.
*
* If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback
- * and unlocked them. In this situation, we need to fail the bio instead of
- * submitting it. This typically only happens on a filesystem shutdown.
+ * the submission process has failed after we've marked pages for writeback.
+ * We cannot cancel the ioend directly in that case, so call the bio end I/O
+ * handler with the error status here so that the normal I/O completion runs,
+ * clears the writeback bit and lets the file system process the errors.
*/
-static int
-iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
- int error)
+static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
{
+ if (!wpc->ioend)
+ return error;
+
+ /*
+	 * Let the file systems prepare the I/O submission and hook in an I/O
+	 * completion handler.  This also needs to happen in the failure case
+	 * so that the file system end I/O handler gets called and can clean
+	 * up.
+ */
if (wpc->ops->prepare_ioend)
- error = wpc->ops->prepare_ioend(ioend, error);
+ error = wpc->ops->prepare_ioend(wpc->ioend, error);
+
if (error) {
- /*
- * If we're failing the IO now, just mark the ioend with an
- * error and finish it. This will run IO completion immediately
- * as there is only one reference to the ioend at this point in
- * time.
- */
- ioend->io_bio.bi_status = errno_to_blk_status(error);
- bio_endio(&ioend->io_bio);
- return error;
+ wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&wpc->ioend->io_bio);
+ } else {
+ submit_bio(&wpc->ioend->io_bio);
}
- submit_bio(&ioend->io_bio);
- return 0;
+ wpc->ioend = NULL;
+ return error;
}
static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
@@ -1697,19 +1701,28 @@ static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos)
/*
* Test to see if we have an existing ioend structure that we could append to
* first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly. Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
*/
-static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, loff_t pos, struct list_head *iolist)
+ struct inode *inode, loff_t pos)
{
struct iomap_folio_state *ifs = folio->private;
unsigned len = i_blocksize(inode);
size_t poff = offset_in_folio(folio, pos);
+ int error;
if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
new_ioend:
- if (wpc->ioend)
- list_add(&wpc->ioend->io_list, iolist);
+ error = iomap_submit_ioend(wpc, 0);
+ if (error)
+ return error;
wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
}
@@ -1720,12 +1733,12 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
atomic_add(len, &ifs->write_bytes_pending);
wpc->ioend->io_size += len;
wbc_account_cgroup_owner(wbc, &folio->page, len);
+ return 0;
}
static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio,
- struct inode *inode, u64 pos, unsigned *count,
- struct list_head *submit_list)
+ struct inode *inode, u64 pos, unsigned *count)
{
int error;
@@ -1742,8 +1755,9 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
case IOMAP_HOLE:
break;
default:
- iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list);
- (*count)++;
+ error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos);
+ if (!error)
+ (*count)++;
}
fail:
@@ -1819,35 +1833,21 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
return true;
}
-/*
- * We implement an immediate ioend submission policy here to avoid needing to
- * chain multiple ioends and hence nest mempool allocations which can violate
- * the forward progress guarantees we need to provide. The current ioend we're
- * adding blocks to is cached in the writepage context, and if the new block
- * doesn't append to the cached ioend, it will create a new ioend and cache that
- * instead.
- *
- * If a new ioend is created and cached, the old ioend is returned and queued
- * locally for submission once the entire page is processed or an error has been
- * detected. While ioends are submitted immediately after they are completed,
- * batching optimisations are provided by higher level block plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct folio *folio)
{
struct iomap_folio_state *ifs = folio->private;
struct inode *inode = folio->mapping->host;
- struct iomap_ioend *ioend, *next;
unsigned len = i_blocksize(inode);
unsigned nblocks = i_blocks_per_folio(inode, folio);
u64 pos = folio_pos(folio);
u64 end_pos = pos + folio_size(folio);
unsigned count = 0;
int error = 0, i;
- LIST_HEAD(submit_list);
+
+ WARN_ON_ONCE(!folio_test_locked(folio));
+ WARN_ON_ONCE(folio_test_dirty(folio));
+ WARN_ON_ONCE(folio_test_writeback(folio));
trace_iomap_writepage(inode, pos, folio_size(folio));
@@ -1857,12 +1857,27 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
}
WARN_ON_ONCE(end_pos <= pos);
- if (!ifs && nblocks > 1) {
- ifs = ifs_alloc(inode, folio, 0);
- iomap_set_range_dirty(folio, 0, end_pos - pos);
+ if (nblocks > 1) {
+ if (!ifs) {
+ ifs = ifs_alloc(inode, folio, 0);
+ iomap_set_range_dirty(folio, 0, end_pos - pos);
+ }
+
+ /*
+ * Keep the I/O completion handler from clearing the writeback
+ * bit until we have submitted all blocks by adding a bias to
+ * ifs->write_bytes_pending, which is dropped after submitting
+ * all blocks.
+ */
+ WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+ atomic_inc(&ifs->write_bytes_pending);
}
- WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
+ /*
+ * Set the writeback bit ASAP, as the I/O completion for the single
+	 * block per folio case can hit as soon as we submit the bio.
+ */
+ folio_start_writeback(folio);
/*
* Walk through the folio to find areas to write back. If we
@@ -1873,18 +1888,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
if (ifs && !ifs_block_is_dirty(folio, ifs, i))
continue;
error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
- &count, &submit_list);
+ &count);
if (error)
break;
}
if (count)
wpc->nr_folios++;
- WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
- WARN_ON_ONCE(!folio_test_locked(folio));
- WARN_ON_ONCE(folio_test_writeback(folio));
- WARN_ON_ONCE(folio_test_dirty(folio));
-
/*
* We can have dirty bits set past end of file in page_mkwrite path
* while mapping the last partial folio. Hence it's better to clear
@@ -1893,38 +1903,20 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
iomap_clear_range_dirty(folio, 0, folio_size(folio));
/*
- * If the page hasn't been added to the ioend, it won't be affected by
- * I/O completion and we must unlock it now.
+ * Usually the writeback bit is cleared by the I/O completion handler.
+ * But we may end up either not actually writing any blocks, or (when
+ * there are multiple blocks in a folio) all I/O might have finished
+ * already at this point. In that case we need to clear the writeback
+	 * bit ourselves right after unlocking the folio.
*/
- if (error && !count) {
- folio_unlock(folio);
- goto done;
- }
-
- folio_start_writeback(folio);
folio_unlock(folio);
-
- /*
- * Preserve the original error if there was one; catch
- * submission errors here and propagate into subsequent ioend
- * submissions.
- */
- list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
- int error2;
-
- list_del_init(&ioend->io_list);
- error2 = iomap_submit_ioend(wpc, ioend, error);
- if (error2 && !error)
- error = error2;
+ if (ifs) {
+ if (atomic_dec_and_test(&ifs->write_bytes_pending))
+ folio_end_writeback(folio);
+ } else {
+ if (!count)
+ folio_end_writeback(folio);
}
-
- /*
- * We can end up here with no error and nothing to write only if we race
- * with a partial page truncate on a sub-page block sized filesystem.
- */
- if (!count)
- folio_end_writeback(folio);
-done:
mapping_set_error(inode->i_mapping, error);
return error;
}
@@ -1952,9 +1944,7 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
wpc->ops = ops;
ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
- if (!wpc->ioend)
- return ret;
- return iomap_submit_ioend(wpc, wpc->ioend, ret);
+ return iomap_submit_ioend(wpc, ret);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
--
2.39.2
Thread overview: 22+ messages
2023-12-07 7:26 map multiple blocks per ->map_blocks in iomap writeback Christoph Hellwig
2023-12-07 7:26 ` [PATCH 01/14] iomap: clear the per-folio dirty bits on all writeback failures Christoph Hellwig
2023-12-07 7:26 ` [PATCH 02/14] iomap: treat inline data in iomap_writepage_map as an I/O error Christoph Hellwig
2023-12-07 7:26 ` [PATCH 03/14] iomap: move the io_folios field out of struct iomap_ioend Christoph Hellwig
2023-12-07 7:27 ` [PATCH 04/14] iomap: move the PF_MEMALLOC check to iomap_writepages Christoph Hellwig
2023-12-07 7:27 ` [PATCH 05/14] iomap: factor out a iomap_writepage_handle_eof helper Christoph Hellwig
2023-12-07 7:27 ` [PATCH 06/14] iomap: move all remaining per-folio logic into iomap_writepage_map Christoph Hellwig
2023-12-07 7:27 ` [PATCH 07/14] iomap: clean up the iomap_alloc_ioend calling convention Christoph Hellwig
2023-12-07 7:27 ` [PATCH 08/14] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend Christoph Hellwig
2023-12-07 7:27 ` [PATCH 09/14] iomap: don't chain bios Christoph Hellwig
2023-12-07 7:27 ` [PATCH 10/14] iomap: only call mapping_set_error once for each failed bio Christoph Hellwig
2023-12-07 7:27 ` [PATCH 11/14] iomap: factor out a iomap_writepage_map_block helper Christoph Hellwig
2023-12-07 7:27 ` Christoph Hellwig [this message]
2023-12-07 7:27 ` [PATCH 13/14] iomap: map multiple blocks at a time Christoph Hellwig
2023-12-07 13:39 ` Zhang Yi
2023-12-07 15:03 ` Christoph Hellwig
2023-12-08 7:33 ` Zhang Yi
2023-12-07 7:27 ` [PATCH 14/14] iomap: pass the length of the dirty region to ->map_blocks Christoph Hellwig
2023-12-11 10:45 ` map multiple blocks per ->map_blocks in iomap writeback Christian Brauner
2024-02-01 6:07 ` Christoph Hellwig
2024-02-01 6:18 ` Darrick J. Wong
2024-02-01 13:23 ` Christian Brauner