DM-Devel Archive mirror
From: Xiao Ni <xni@redhat.com>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: zkabelac@redhat.com, agk@redhat.com, snitzer@kernel.org,
	mpatocka@redhat.com, dm-devel@lists.linux.dev, song@kernel.org,
	heinzm@redhat.com, neilb@suse.de, jbrassow@redhat.com,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2
Date: Mon, 4 Mar 2024 09:25:55 +0800
Message-ID: <CALTww2-BmudurHbsbbqBMq+KgZs+hokqOJnovS5KDGEidHqZzA@mail.gmail.com>
In-Reply-To: <35feaa54-db9e-f0d6-d5a5-a10a45bb90a5@huaweicloud.com>

On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/03/04 9:07, Yu Kuai wrote:
> > Hi,
> >
> > On 2024/03/03 21:16, Xiao Ni wrote:
> >> Hi all
> >>
> >> There is an error report from the lvm regression tests. The failing case
> >> is lvconvert-raid-reshape-stripes-load-reload.sh. I also saw this error
> >> when I tried to fix the dm-raid regression problems. In my patch set,
> >> after reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't
> >> register sync_thread for reshape directly), this problem doesn't appear.
> >

Hi Kuai
> > How often did you see this test fail? I've been running the tests for
> > over two days now, for 30+ rounds, and this test has never failed in my VM.

I ran it 5 times just now and it failed 2 times.
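
In case it helps anyone else reproduce this, here is a minimal sketch of
how such repeated runs can be counted. It assumes the `make check T=...`
invocation from the cover letter, run from the top of an lvm2 source tree,
and that `make check` exits non-zero when the test fails. The
count_failures helper is a made-up name for illustration, not part of lvm2.

```shell
#!/bin/sh
# Run a command repeatedly and print how many of the runs failed.
# Usage: count_failures <rounds> <command> [args...]
# (hypothetical helper, not part of lvm2 or its test suite)
count_failures() {
    rounds=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Discard the command's output; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}

# Example, assuming the current directory is an lvm2 source tree:
#   count_failures 5 make check \
#       T=shell/lvconvert-raid-reshape-stripes-load-reload.sh
```

Since the failure is intermittent, a count over several rounds is more
useful to compare between kernels than a single pass or fail.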

>
> After a quick look, there is still a path in raid10 where
> MD_RECOVERY_FROZEN can be cleared, so in theory this problem can be
> triggered. Can you test the following patch on top of this set?
> I'll keep running the test myself.

Sure, I'll give the result later.

Regards
Xiao
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index a5f8419e2df1..7ca29469123a 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
>          return 0;
>
>   abort:
> -       mddev->recovery = 0;
> +       if (mddev->gendisk)
> +               mddev->recovery = 0;
>          spin_lock_irq(&conf->device_lock);
>          conf->geo = conf->prev;
>          mddev->raid_disks = conf->geo.raid_disks;
>
> Thanks,
> Kuai
> >
> > Thanks,
> > Kuai
> >
> >>
> >> I put the log in the attachment.
> >>
> >> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>
> >>> From: Yu Kuai <yukuai3@huawei.com>
> >>>
> >>> link to part1:
> >>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
> >>>
> >>>
> >>> part1 contains fixes for deadlocks when stopping sync_thread.
> >>>
> >>> This set contains fixes for:
> >>>   - reshape can start unexpectedly and cause data corruption (patches 1, 5, 6);
> >>>   - deadlocks when reshape runs concurrently with IO (patch 8);
> >>>   - a lockdep warning (patch 9).
> >>>
> >>> I've been running the lvm2 tests with the following script for a few rounds now:
> >>>
> >>> for t in `ls test/shell`; do
> >>>          if cat test/shell/$t | grep raid &> /dev/null; then
> >>>                  make check T=shell/$t
> >>>          fi
> >>> done
> >>>
> >>> There are no deadlocks and no fs corruption now; however, there are
> >>> still four failed tests:
> >>>
> >>> ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
> >>> ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
> >>> ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
> >>> ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
> >>>
> >>> And the failure reason is the same for all of them:
> >>>
> >>> ## ERROR: The test started dmeventd (147856) unexpectedly
> >>>
> >>> I have no clue yet, and it seems other folks don't have this issue.
> >>>
> >>> Yu Kuai (9):
> >>>    md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
> >>>    md: export helpers to stop sync_thread
> >>>    md: export helper md_is_rdwr()
> >>>    md: add a new helper reshape_interrupted()
> >>>    dm-raid: really frozen sync_thread during suspend
> >>>    md/dm-raid: don't call md_reap_sync_thread() directly
> >>>    dm-raid: add a new helper prepare_suspend() in md_personality
> >>>    dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
> >>>      concurrent with reshape
> >>>    dm-raid: fix lockdep waring in "pers->hot_add_disk"
> >>>
> >>>   drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
> >>>   drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
> >>>   drivers/md/md.h      | 38 +++++++++++++++++-
> >>>   drivers/md/raid5.c   | 32 ++++++++++++++-
> >>>   4 files changed, 196 insertions(+), 40 deletions(-)
> >>>
> >>> --
> >>> 2.39.2
> >>>