From: Chao Yu <chao@kernel.org>
To: Eric Biggers <ebiggers@kernel.org>, Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fsdevel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] f2fs: remove broken support for allocating DIO writes
Date: Fri, 20 Aug 2021 17:35:21 +0800
Message-ID: <c4e5c71d-1652-7174-fa36-674fab4e61df@kernel.org>
In-Reply-To: <YRsY6dyHyaChkQ6n@gmail.com>

On 2021/8/17 10:03, Eric Biggers wrote:
> On Mon, Aug 02, 2021 at 06:34:48PM -0700, Jaegeuk Kim wrote:
>> On 08/03, Chao Yu wrote:
>>> On 2021/8/3 2:23, Jaegeuk Kim wrote:
>>>> On 08/02, Chao Yu wrote:
>>>>> On 2021/8/2 12:39, Eric Biggers wrote:
>>>>>> On Fri, Jul 30, 2021 at 10:46:16PM -0400, Theodore Ts'o wrote:
>>>>>>> On Fri, Jul 30, 2021 at 12:17:26PM -0700, Eric Biggers wrote:
>>>>>>>>> Currently, non-overwrite DIO writes are fundamentally unsafe on f2fs as
>>>>>>>>> they require preallocating blocks, but f2fs doesn't support unwritten
>>>>>>>>> blocks and therefore has to preallocate the blocks as regular blocks.
>>>>>>>>> f2fs has no way to reliably roll back such preallocations, so as a
>>>>>>>>> result, f2fs will leak uninitialized blocks to users if a DIO write
>>>>>>>>> doesn't fully complete.
>>>>>>>
>>>>>>> There's another way of solving this problem which doesn't require
>>>>>>> supporting unwritten blocks.  What a file system *could* do is to
>>>>>>> allocate the blocks, but *not* update the on-disk data structures ---
>>>>>>> so the allocation happens in memory only, so you know that the
>>>>>>> physical blocks won't get used for other files, and then issue the
>>>>>>> data block writes.  On the block I/O completion, trigger a workqueue
>>>>>>> function which updates the on-disk metadata to assign physical blocks
>>>>>>> to the inode.
>>>>>>>
>>>>>>> That way if you crash before the data I/O has a chance to complete,
>>>>>>> the on-disk logical block -> physical block map hasn't been updated
>>>>>>> yet, and so you don't need to worry about leaking uninitialized blocks.
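The scheme quoted above can be sketched as a toy Python model. This is not f2fs code: the class `ToyFs`, its fields, and `crash_and_recover()` are hypothetical names made up for illustration, and the method names `dio_preallocate()` / `dio_end_io()` are simply borrowed from the wording used later in this thread. The sketch assumes the on-disk map update is atomic, which a real filesystem would have to guarantee through its own journaling or checkpointing.

```python
class ToyFs:
    """Toy model: in-memory block reservations vs. a persisted block map."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.free = set(range(nblocks))  # in-memory free-block set
        self.reserved = {}               # in memory only: (ino, lblk) -> pblk
        self.on_disk_map = {}            # stands in for persisted metadata

    def dio_preallocate(self, ino, lblk):
        # Reserve a physical block in memory so nobody else can use it,
        # but do NOT touch the persisted logical -> physical map yet.
        pblk = min(self.free)            # raises ValueError when full
        self.free.remove(pblk)
        self.reserved[(ino, lblk)] = pblk
        return pblk

    def dio_end_io(self, ino, lblk, ok):
        # Completion handler: only now is the mapping made persistent.
        pblk = self.reserved.pop((ino, lblk))
        if ok:
            self.on_disk_map[(ino, lblk)] = pblk
        else:
            self.free.add(pblk)          # failed write: just drop the reservation

    def crash_and_recover(self):
        # In-memory state is lost; the free set is rebuilt from what was persisted.
        self.reserved.clear()
        self.free = set(range(self.nblocks)) - set(self.on_disk_map.values())

    def read_mapping(self, ino, lblk):
        return self.on_disk_map.get((ino, lblk))
```

In this model, a crash between `dio_preallocate()` and `dio_end_io()` leaves no mapping behind, so a later read sees a hole rather than stale data from the reserved block.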
>>>>>
>>>>> Thanks for your suggestion, I think it makes sense.
>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> 					- Ted
>>>>>>
>>>>>> Jaegeuk and Chao, any idea how feasible it would be for f2fs to do this?
>>>>>
>>>>> First, note that the following metadata is touched during the DIO
>>>>> preallocation flow:
>>>>> - log header
>>>>> - sit bitmap/count
>>>>> - free seg/sec bitmap/count
>>>>> - dirty seg/sec bitmap/count
>>>>>
>>>>> One case we need to be concerned about: checkpoint() can be triggered
>>>>> at any point between dio_preallocate() and dio_end_io(). We should
>>>>> not persist any DIO-preallocation-related metadata during checkpoint();
>>>>> otherwise, a sudden power cut after the checkpoint will corrupt the
>>>>> filesystem.
>>>>>
>>>>> So we need to cleanly separate two kinds of metadata updates:
>>>>> a) those belonging to DIO preallocation
>>>>> b) everything else
>>>>>
>>>>> After that, the checkpoint() flow just flushes metadata b); other
>>>>> flows, such as GC and data/node allocation, need to query/update the
>>>>> combination of metadata a) and b).
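The a)/b) split described above can be illustrated with a toy block-usage bitmap in Python. This is only a sketch under stated assumptions: `SplitBitmap` and all of its methods are hypothetical names, and a single per-block flag stands in for the several structures listed earlier (log header, SIT, free and dirty segment bitmaps), which f2fs tracks per segment or section rather than per block.

```python
class SplitBitmap:
    """Toy model: block usage split into persistable and DIO-pending parts."""

    def __init__(self, nblocks):
        self.committed = [False] * nblocks    # metadata b): safe to persist
        self.dio_pending = [False] * nblocks  # metadata a): DIO preallocations in flight

    def in_use(self, blk):
        # View used by allocation and GC: the combination of a) and b).
        return self.committed[blk] or self.dio_pending[blk]

    def alloc(self, dio=False):
        # First-fit allocation against the combined view.
        for blk in range(len(self.committed)):
            if not self.in_use(blk):
                target = self.dio_pending if dio else self.committed
                target[blk] = True
                return blk
        raise RuntimeError("no free blocks")

    def dio_end_io(self, blk, ok):
        # On completion, move the block from a) to b), or release it on error.
        self.dio_pending[blk] = False
        if ok:
            self.committed[blk] = True

    def checkpoint(self):
        # Only metadata b) is handed to the checkpoint for persisting.
        return list(self.committed)
```

A checkpoint taken while a DIO preallocation is in flight therefore records the block as free, while the in-memory allocator still refuses to hand it out.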
>>>>>
>>>>> In addition, f2fs already has an in-memory log header framework;
>>>>> based on it, it would be easy to add a new in-memory log header
>>>>> for DIO preallocation.
>>>>>
>>>>> So it seems feasible to me so far...
>>>>>
>>>>> Jaegeuk, any other concerns about the implementation details?
>>>>
>>>> Hmm, I'm still trying to deal with this as a corner case where the writes
>>>> haven't completed due to an error. How about keeping the preallocated block
>>>> offsets and releasing them if we get an error? We need to handle EIO, right?
>>>
>>> What about a checkpoint (CP) followed by a sudden power-off (SPO) right
>>> after DIO preallocation? The user will encounter uninitialized blocks
>>> after recovery.
>>
>> I think buffered writes as a workaround can expose the last unwritten block as
>> well, if SPO happens right after block allocation. We may need to compromise
>> at some level?
>>
> 
> Freeing preallocated blocks on error would be better than nothing, although note
> that the preallocated blocks may have filled an arbitrary sequence of holes --
> so simply truncating past EOF would *not* be sufficient.
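The point about holes can be shown with a small Python sketch. It is a toy model with hypothetical helper names (`preallocate`, `truncate_past_eof`, `release`), where a file's block map is a plain dict from logical block to physical block, and it assumes the failed write lies entirely within the existing file size.

```python
def preallocate(mapping, start, count, next_free):
    """Fill holes in [start, start + count); return the logical blocks newly mapped."""
    newly = []
    for lblk in range(start, start + count):
        if lblk not in mapping:
            mapping[lblk] = next_free
            next_free += 1
            newly.append(lblk)
    return newly


def truncate_past_eof(mapping, size_blocks):
    """Drop every mapping at or beyond the file size (in blocks)."""
    for lblk in [l for l in mapping if l >= size_blocks]:
        del mapping[lblk]


def release(mapping, newly):
    """Undo exactly the mappings that a failed write had preallocated."""
    for lblk in newly:
        del mapping[lblk]


# A 4-block file with holes at logical blocks 1 and 2.
mapping = {0: 100, 3: 101}
newly = preallocate(mapping, start=0, count=4, next_free=200)

truncate_past_eof(mapping, size_blocks=4)
holes_still_mapped = sorted(l for l in newly if l in mapping)

release(mapping, newly)
```

After the truncate, the two blocks that filled the holes are still mapped, because they lie inside the file size; only the explicit `release()` of the recorded offsets restores the original map.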
> 
> But really filesystems need to be designed to never expose uninitialized data,
> even if I/O errors or a sudden power failure occurs.  It is unfortunate that
> f2fs apparently wasn't designed with that goal in mind.
> 
> In any case, I don't think we can proceed with any other f2fs direct I/O
> improvements until this data leakage bug can be solved one way or another.  If
> my patch to remove support for allocating writes isn't acceptable and the
> desired solution is going to require some more invasive f2fs surgery, are you or
> Chao going to work on it?  I'm not sure there's much I can do here.

I may have time to look into the implementation I proposed above. Maybe
just enable this in FSYNC_MODE_STRICT mode, for users who are concerned about
exposing unwritten data? Thoughts?

> 
> - Eric
> 



