Linux-Fsdevel Archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, brauner@kernel.org, djwong@kernel.org,
	viro@zeniv.linux.org.uk, jack@suse.cz, akpm@linux-foundation.org,
	dchinner@redhat.com, tytso@mit.edu, hch@lst.de,
	martin.petersen@oracle.com, nilay@linux.ibm.com,
	ritesh.list@gmail.com, mcgrof@kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, ojaswin@linux.ibm.com, p.raghav@samsung.com,
	jbongio@google.com, okiselev@amazon.com
Subject: Re: [PATCH RFC 5/7] fs: iomap: buffered atomic write support
Date: Mon, 22 Apr 2024 16:03:56 +0100	[thread overview]
Message-ID: <ZiZ8XGZz46D3PRKr@casper.infradead.org> (raw)
In-Reply-To: <20240422143923.3927601-6-john.g.garry@oracle.com>

On Mon, Apr 22, 2024 at 02:39:21PM +0000, John Garry wrote:
> Add special handling of PG_atomic flag to iomap buffered write path.
> 
> To flag an iomap iter for an atomic write, set IOMAP_ATOMIC.
> 
> For a folio associated with a write which has IOMAP_ATOMIC set, set
> PG_atomic.
> 
> Otherwise, when IOMAP_ATOMIC is unset, clear PG_atomic.
> 
> This means that for an "atomic" folio which has not been written back, it
> loses it "atomicity". So if userspace issues a write with RWF_ATOMIC set
> and another write with RWF_ATOMIC unset and which fully or partially
> overwrites that same region as the first write, that folio is not written
> back atomically. For such a scenario to occur, it would be considered a
> userspace usage error.
> 
> To ensure that a buffered atomic write is written back atomically when
> the write syscall returns, RWF_SYNC or similar needs to be used (in
> conjunction with RWF_ATOMIC).
> 
> As a safety check, when getting a folio for an atomic write in
> iomap_get_folio(), ensure that the length matches the inode mapping folio
> order-limit.
> 
> Only a single BIO should ever be submitted for an atomic write. So modify
> iomap_add_to_ioend() to ensure that we don't try to write back an atomic
> folio as part of a larger mixed-atomicity BIO.
> 
> In iomap_alloc_ioend(), handle an atomic write by setting REQ_ATOMIC for
> the allocated BIO.
> 
> When a folio is written back, again clear PG_atomic, as it is no longer
> required. I assume it will not be needlessly written back a second time...

I'm not taking a position on the mechanism yet; need to think about it
some more.  But there's a hole here I also don't have a solution to,
so we can all start thinking about it.

In iomap_write_iter(), we call copy_folio_from_iter_atomic().  Through no
fault of the application, if the range crosses a page boundary, we might
partially copy the bytes from the first page, then take a page fault on
the second page, hence doing a short write into the folio.  And there's
nothing preventing writeback from writing back a partially copied folio.

Now, if it's not dirty, then it can't be written back.  So if we're
doing an atomic write, we could clear the dirty bit after calling
iomap_write_begin() (given the usage scenarios we've discussed, it should
always be clear ...)

We need to prevent the "fall back to a short copy" logic in
iomap_write_iter() as well.  But then we also need to make sure we don't
get stuck in a loop, so maybe go three times around, and if it's still
not readable as a chunk, -EFAULT?

  reply	other threads:[~2024-04-22 15:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-22 14:39 [PATCH RFC 0/7] buffered block atomic writes John Garry
2024-04-22 14:39 ` [PATCH RFC 1/7] fs: Rename STATX{_ATTR}_WRITE_ATOMIC -> STATX{_ATTR}_WRITE_ATOMIC_DIO John Garry
2024-04-22 14:39 ` [PATCH RFC 2/7] filemap: Change mapping_set_folio_min_order() -> mapping_set_folio_orders() John Garry
2024-04-25 14:47   ` Pankaj Raghav (Samsung)
2024-04-26  8:02     ` John Garry
2024-04-22 14:39 ` [PATCH RFC 3/7] mm: Add PG_atomic John Garry
2024-04-22 14:39 ` [PATCH RFC 4/7] fs: Add initial buffered atomic write support info to statx John Garry
2024-04-22 14:39 ` [PATCH RFC 5/7] fs: iomap: buffered atomic write support John Garry
2024-04-22 15:03   ` Matthew Wilcox [this message]
2024-04-22 16:02     ` John Garry
2024-04-22 14:39 ` [PATCH RFC 6/7] fs: xfs: buffered atomic writes statx support John Garry
2024-04-22 14:39 ` [PATCH RFC 7/7] fs: xfs: Enable buffered atomic writes John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiZ8XGZz46D3PRKr@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jbongio@google.com \
    --cc=john.g.garry@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mcgrof@kernel.org \
    --cc=nilay@linux.ibm.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=okiselev@amazon.com \
    --cc=p.raghav@samsung.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).