From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [Patch v2 41/42] btrfs: fix the use-after-free bug in writeback subpage helper
Date: Wed, 12 May 2021 09:49:11 +0800
Message-ID: <fae358ed-8d14-8e14-2dc3-173637ec5e87@gmx.com>
In-Reply-To: <334a5fdd-28ee-4163-456c-adc4b2276d08@gmx.com>

Hi Ritesh,

The patchset has been updated and I am already running the tests; so far
so good.

The new head is:
commit cb81da05e7899b8196c3c5e0b122798da3b94af0
Author: Qu Wenruo <wqu@suse.com>
Date:   Mon May 3 08:19:27 2021 +0800

     btrfs: remove io_failure_record::in_validation

I may make some minor changes to the commit messages and comments while
preparing for the next submission, but the code shouldn't change any more.


Just one note: thanks to your report on btrfs/028, I even found a data
corruption bug in the relocation code.
Kudos (and of course Reported-by tags) to you!

New changes since v2 patchset:

- Fix metadata read path ASSERT() when last eb is already dereferenced
- Fix read repair related bugs
   * fix possible hang due to unreleased sectors after read error
   * fix double accounting in btrfs_subpage::readers

- Fix false alert when relocating data extent without csum
   This is really a false alert, the expected csum is always 0x00

- Fix a data corruption when relocating certain data extent layouts
   This is a real corruption, both relocation and scrub will report
   errors.
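
(To illustrate the btrfs_subpage::readers accounting mentioned above, here
is a minimal userspace sketch, not the kernel code. It assumes the counter
tracks the number of sectors with a read still in flight, and that ending a
read checks the same condition as the assertion in the quoted log,
atomic_read(&subpage->readers) >= nbits. The names subpage_model,
model_start_reader() and model_end_reader() are hypothetical.)

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTORSIZE_BITS 12u                    /* 4096-byte sectors, as in the log */
#define SECTORSIZE      (1u << SECTORSIZE_BITS)

/* Hypothetical stand-in for the per-page subpage state. */
struct subpage_model {
	atomic_int readers;                    /* sectors with a read in flight */
};

/* Convert a sector-aligned byte length into a number of sectors. */
static int range_to_nbits(uint32_t len)
{
	assert((len & (SECTORSIZE - 1)) == 0); /* ranges must be sector aligned */
	return (int)(len >> SECTORSIZE_BITS);
}

/* Account for a read being submitted over @len bytes of the page. */
static void model_start_reader(struct subpage_model *sp, uint32_t len)
{
	atomic_fetch_add(&sp->readers, range_to_nbits(len));
}

/*
 * Account for a read over @len bytes finishing.
 *
 * Returns false when more sectors are ended than were started; this is the
 * condition the ASSERT() in the quoted log guards against. On success,
 * *last is set to true if this call dropped the counter to zero.
 */
static bool model_end_reader(struct subpage_model *sp, uint32_t len, bool *last)
{
	int nbits = range_to_nbits(len);

	if (atomic_load(&sp->readers) < nbits)
		return false;                  /* double accounting detected */

	/* atomic_fetch_sub() returns the value held before the subtraction. */
	*last = (atomic_fetch_sub(&sp->readers, nbits) == nbits);
	return true;
}
```

In this model, starting a read over 8192 bytes (two sectors) and ending it
as two 4096-byte completions succeeds, with the second call reporting the
last reader. A third completion for the same range, i.e. double accounting,
returns false, which under the stated assumption is where the ASSERT() would
fire.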

Thanks and happy testing!
Qu

On 2021/5/11 7:15 PM, Qu Wenruo wrote:
>
>
> On 2021/5/11 6:48 PM, Ritesh Harjani wrote:
>> On 21/05/10 09:10PM, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/5/10 8:29 PM, Ritesh Harjani wrote:
>>>> On 21/05/10 04:38PM, Qu Wenruo wrote:
>>>>> Hi Ritesh,
>>>>>
>>>>> I guess no error report so far is a good thing?
>>>> Sorry about the delay in starting my testing. I have not been keeping
>>>> well since Friday, hence could not start the testing. (Feeling much
>>>> better now.)
>>>>
>>>> So -g quick passed w/o any fatal issues. But with -g auto I got a
>>>> kernel bug with btrfs/028. Below is the report.
>>>>
>>>>>
>>>>> Just to report what my result is, I ran my latest github branch for
>>>>> the full weekend, over 50 hours, and around 20 runs of full
>>>>> generic/auto without defrag groups.
>>>>>
>>>>> And I see no crash at all.
>>>>>
>>>>> But there is a special note: there is a new patch, introduced just
>>>>> before the weekend (Fri May 7 09:31:43 2021 +0800), titled "btrfs:
>>>>> fix a possible use-after-free race in metadata read path", which is
>>>>> a new fix for a bug I reproduced once locally.
>>>>
>>>> Yes, I already have this in my tree. This is the latest patch in my
>>>> tree which I am testing:
>>>> "btrfs: remove io_failure_record::in_validation"
>>>>
>>>>>
>>>>> The bug should only happen when read is slow and only happens for
>>>>> metadata read path.
>>>>>
>>>>> The details can be found in the commit message. Although it's rare to
>>>>> hit, I have hit this problem around 3 times in total.
>>>>>
>>>>> Hope you didn't hit any crash during your test.
>>>>
>>>> I am hitting the below bug_on(). Since I saw your email just now, I am
>>>> directly reporting this failure w/o analyzing it. Please let me know if
>>>> you need anything else from my end for this.
>>>>
>>>> I will halt the testing of "-g auto" for now. Once we have some
>>>> conclusion on this one, I will resume the testing.
>>>
>>> Thanks for the report. I was still just looping generic tests, and thus
>>> hadn't yet started testing the btrfs tests.
>>>
>>> But considering no new crash in generic tests, I guess it's time to move
>>> forward.
>>>
>>>>
>>>> btrfs/028 32s ...     [10:41:18][  780.104573] run fstests btrfs/028
>>>> at 2021-05-10 10:41:18
>>>>
>>>> [  780.732073] BTRFS: device fsid
>>>> be9b827d-28ee-4a5e-80a0-e19971061a58 devid 1 transid 5 /dev/vdc
>>>> scanned by mkfs.btrfs (21129)
>>>> [  780.759754] BTRFS info (device vdc): disk space caching is enabled
>>>> [  780.759848] BTRFS info (device vdc): has skinny extents
>>>> [  780.759888] BTRFS warning (device vdc): read-write for sector
>>>> size 4096 with page size 65536 is experimental
>>>> <...>
>>>> [  784.580404] BTRFS info (device vdc): found 21 extents, stage:
>>>> move data extents
>>>> [  784.878376] BTRFS info (device vdc): found 13 extents, stage:
>>>> update data pointers
>>>> [  785.175349] BTRFS info (device vdc): balance: ended with status: 0
>>>> [  785.367729] BTRFS info (device vdc): balance: start -d
>>>> [  785.400884] BTRFS info (device vdc): relocating block group
>>>> 2446327808 flags data
>>>> [  785.527858] btrfs_print_data_csum_error: 18 callbacks suppressed
>>>> [  785.527865] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 393216 csum 0x8941f998 expected csum 0x9439dda4 mirror 1
>>>
>>> Checking the test case btrfs/028, it shouldn't have any error when
>>> relocating the block groups, thus it's definitely something wrong in the
>>> balance code.
>>>
>>> Thanks for the report, I'll give you an update after finishing the local
>>> btrfs test groups.
>>>
>>> Thanks for your confirmation, really helps a lot!
>>
>> Hi Qu,
>>
>> FYI - I re-tested "-g auto" with btrfs/028 test excluded. I didn't
>> find any other failure.
>
> That's too kind of you.
>
> It's not a surprise for it to pass generic tests, but since I haven't
> yet run the full btrfs test group, it passing other btrfs tests is really
> good news.
>
>> Please let me know once you have a fix for btrfs/028, I can
>> re-test the whole tree again.
>
> Fix on the way, in fact btrfs/028 already shows several bugs I didn't
> expect at all, some spoilers:
>
> - The crash in btrfs_subpage_end_reader()
>    It turns out to be a bug in the read time refactor patches. ("btrfs:
>    submit read time repair only for each corrupted sector")
>    Fixed in the original patch.
>
> - Possible hang for certain data repair failure
>    The same cause as above bug.
>    Fixed in the original patch.
>
> - False alert for data reloc, with expected csum 0x00
>    A bug in btrfs_verify_data_csum(), which from the very beginning
>    didn't take subpage into consideration.
>    Fixed in a new patch.
>
> - False alert for data reloc, with random expected csum
>    Still debugging, hopefully the last bug in the series.
>
> Will give another update when the last bug gets solved.
>
> Thanks,
> Qu
>>
>> Thanks
>> ritesh
>>
>>
>>> Qu
>>>
>>>> [  785.528406] btrfs_dev_stat_print_on_error: 18 callbacks suppressed
>>>> [  785.528409] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 1, gen 0
>>>> [  785.528857] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 397312 csum 0x8941f998 expected csum 0x9439dda4 mirror 1
>>>> [  785.529166] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 2, gen 0
>>>> [  785.529412] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 401408 csum 0x8941f998 expected csum 0x667b7e1e mirror 1
>>>> [  785.529714] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 3, gen 0
>>>> [  785.530321] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 393216 csum 0x8941f998 expected csum 0x9439dda4 mirror 1
>>>> [  785.530637] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 4, gen 0
>>>> [  785.530882] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 397312 csum 0x8941f998 expected csum 0x9439dda4 mirror 1
>>>> [  785.531185] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 5, gen 0
>>>> [  785.531428] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 262 off 401408 csum 0x8941f998 expected csum 0x667b7e1e mirror 1
>>>> [  785.531719] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 6, gen 0
>>>> <...>
>>>> [  803.459877] BTRFS info (device vdc): relocating block group
>>>> 10499391488 flags data
>>>> [  803.776810] BTRFS info (device vdc): found 29 extents, stage:
>>>> move data extents
>>>> [  803.979572] BTRFS info (device vdc): found 18 extents, stage:
>>>> update data pointers
>>>> [  804.276370] BTRFS info (device vdc): balance: ended with status: 0
>>>> [  804.427621] BTRFS info (device vdc): balance: start -d
>>>> [  804.454527] BTRFS info (device vdc): relocating block group
>>>> 11036262400 flags data
>>>> [  804.623962] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 282 off 684032 csum 0x8941f998 expected csum 0x605aaa22 mirror 1
>>>> [  804.624147] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 15, gen 0
>>>> [  804.624277] BTRFS warning (device vdc): csum failed root -9 ino
>>>> 282 off 688128 csum 0x8941f998 expected csum 0xe90a7889 mirror 1
>>>> [  804.624435] BTRFS error (device vdc): bdev /dev/vdc errs: wr 0,
>>>> rd 0, flush 0, corrupt 16, gen 0
>>>> [  804.624682] assertion failed: atomic_read(&subpage->readers) >=
>>>> nbits, in fs/btrfs/subpage.c:203
>>>> [  804.624902] ------------[ cut here ]------------
>>>> [  804.624989] kernel BUG at fs/btrfs/ctree.h:3415!
>>>> cpu 0x1: Vector: 700 (Program Check) at [c000000007b47640]
>>>>       pc: c000000000af297c: assertfail.constprop.11+0x34/0x38
>>>>       lr: c000000000af2978: assertfail.constprop.11+0x30/0x38
>>>>       sp: c000000007b478e0
>>>>      msr: 800000000282b033
>>>>     current = 0xc000000007999800
>>>>     paca    = 0xc00000003fffee00     irqmask: 0x03     irq_happened:
>>>> 0x01
>>>>       pid   = 23, comm = kworker/u4:1
>>>> kernel BUG at fs/btrfs/ctree.h:3415!
>>>> Linux version 5.12.0-rc8-00160-gcd0da6627caa (root@ltctulc6a-p1)
>>>> (gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0, GNU ld (GNU Binutils for
>>>> Ubuntu) 2.30) #25 SMP Mon May 10 01:31:44 CDT 2021
>>>> enter ? for help
>>>> [c000000007b47940] c000000000aefdac btrfs_subpage_end_reader+0x5c/0xb0
>>>> [c000000007b47980] c000000000a379f0 end_page_read+0x1d0/0x200
>>>> [c000000007b479c0] c000000000a41554 end_bio_extent_readpage+0x784/0x9b0
>>>> [c000000007b47b30] c000000000b4a234 bio_endio+0x254/0x270
>>>> [c000000007b47b70] c0000000009f6178 end_workqueue_fn+0x48/0x80
>>>> [c000000007b47ba0] c000000000a5c960 btrfs_work_helper+0x260/0x8e0
>>>> [c000000007b47c40] c00000000020a7f4 process_one_work+0x434/0x7d0
>>>> [c000000007b47d10] c00000000020ae94 worker_thread+0x304/0x570
>>>> [c000000007b47da0] c0000000002173cc kthread+0x1bc/0x1d0
>>>> [c000000007b47e10] c00000000000d6ec ret_from_kernel_thread+0x5c/0x70
>>>>
>>>> -ritesh
>>>>
