DM-Devel Archive mirror
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Xiao Ni <xni@redhat.com>, Yu Kuai <yukuai1@huaweicloud.com>
Cc: zkabelac@redhat.com, agk@redhat.com, snitzer@kernel.org,
	mpatocka@redhat.com, dm-devel@lists.linux.dev, song@kernel.org,
	heinzm@redhat.com, neilb@suse.de, jbrassow@redhat.com,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2
Date: Mon, 4 Mar 2024 19:52:52 +0800
Message-ID: <c87f249a-2bfd-edd2-887d-87413bd044d7@huaweicloud.com>
In-Reply-To: <CALTww2_8B1XMaFEBtPeWae0Gse7ngqZuuRZMn32BdfW2-M8uYA@mail.gmail.com>

Hi,

On 2024/03/04 19:06, Xiao Ni wrote:
> On Mon, Mar 4, 2024 at 4:27 PM Xiao Ni <xni@redhat.com> wrote:
>>
>> On Mon, Mar 4, 2024 at 9:25 AM Xiao Ni <xni@redhat.com> wrote:
>>>
>>> On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2024/03/04 9:07, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2024/03/03 21:16, Xiao Ni wrote:
>>>>>> Hi all
>>>>>>
>>>>>> There is an error report from the lvm regression tests. The case is
>>>>>> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
>>>>>> tried to fix the dmraid regression problems too. In my patch set, after
>>>>>> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
>>>>>> sync_thread for reshape directly), this problem doesn't appear.
>>>>>
>>>
>>> Hi Kuai
>>>>> How often did you see this test fail? I've been running the tests for
>>>>> over two days now, for 30+ rounds, and this test has never failed in my VM.
>>>
>>> I ran 5 times and it failed 2 times just now.
>>>
>>>>
>>>> Taking a quick look, there is still a path in raid10 where
>>>> MD_RECOVERY_FROZEN can be cleared, and in theory this problem can be
>>>> triggered. Can you test the following patch on top of this set?
>>>> I'll keep running the test myself.
>>>
>>> Sure, I'll give the result later.
>>
>> Hi all
>>
>> This is not stable to reproduce. After applying this raid10 patch, it
>> failed once in 28 runs. Without the raid10 patch, it failed once in 30
>> runs, but it failed frequently this morning.
> 
> Hi all
> 
> After 152 runs with kernel 6.6, the problem can appear there too.
> So this set can return us to the state of 6.6. This patch set can make
> the problem appear more quickly.

I verified in my VM that, after 100+ test runs, this problem can be
triggered with both v6.6 and v6.8-rc5 + this set.
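
A run-count loop like the one below is one way to measure this kind of
failure rate. It is a minimal sketch, not the script used for the numbers
above: count_failures is a hypothetical helper, and it assumes that
"make check" exits non-zero when the test fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): run a command a given number of
# times and print how many of those runs exited with a non-zero status.
count_failures() {
    rounds=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, run from the top of an lvm2 source tree (assumed layout):
# count_failures 100 make check T=shell/lvconvert-raid-reshape-stripes-load-reload.sh
```

Running the same count on each kernel under comparison gives failure
numbers that can be compared directly.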

I think we can merge this patchset, and figure out why the test can fail
later.

Thanks,
Kuai


> 
> Best Regards
> Xiao
> 
> 
>>
>> Regards
>> Xiao
>>>
>>> Regards
>>> Xiao
>>>>
>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>> index a5f8419e2df1..7ca29469123a 100644
>>>> --- a/drivers/md/raid10.c
>>>> +++ b/drivers/md/raid10.c
>>>> @@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
>>>>           return 0;
>>>>
>>>>    abort:
>>>> -       mddev->recovery = 0;
>>>> +       if (mddev->gendisk)
>>>> +               mddev->recovery = 0;
>>>>           spin_lock_irq(&conf->device_lock);
>>>>           conf->geo = conf->prev;
>>>>           mddev->raid_disks = conf->geo.raid_disks;
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>>>
>>>>>> I put the log in the attachment.
>>>>>>
>>>>>> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>>
>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>
>>>>>>> link to part1:
>>>>>>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/
>>>>>>>
>>>>>>>
>>>>>>> part1 contains fixes for deadlocks when stopping sync_thread
>>>>>>>
>>>>>>> This set contains fixes:
>>>>>>>    - reshape can start unexpectedly and cause data corruption, patches 1, 5, 6;
>>>>>>>    - deadlocks when reshape runs concurrently with IO, patch 8;
>>>>>>>    - a lockdep warning, patch 9;
>>>>>>>
>>>>>>> I've been running the lvm2 tests with the following script for a few rounds now:
>>>>>>>
>>>>>>> for t in `ls test/shell`; do
>>>>>>>           if cat test/shell/$t | grep raid &> /dev/null; then
>>>>>>>                   make check T=shell/$t
>>>>>>>           fi
>>>>>>> done
>>>>>>>
>>>>>>> There are no deadlocks and no fs corruption now; however, there are still
>>>>>>> four failed tests:
>>>>>>>
>>>>>>> ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>>>>>> ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>>>>>> ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>>>>>> ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>>>>>
>>>>>>> And the failure reason is the same for all of them:
>>>>>>>
>>>>>>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>>>>>>
>>>>>>> I have no clue yet, and it seems other folks don't have this issue.
>>>>>>>
>>>>>>> Yu Kuai (9):
>>>>>>>     md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>>>>>>>     md: export helpers to stop sync_thread
>>>>>>>     md: export helper md_is_rdwr()
>>>>>>>     md: add a new helper reshape_interrupted()
>>>>>>>     dm-raid: really frozen sync_thread during suspend
>>>>>>>     md/dm-raid: don't call md_reap_sync_thread() directly
>>>>>>>     dm-raid: add a new helper prepare_suspend() in md_personality
>>>>>>>     dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>>>>>>>       concurrent with reshape
>>>>>>>     dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>>>>>
>>>>>>>    drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>>>>>>>    drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
>>>>>>>    drivers/md/md.h      | 38 +++++++++++++++++-
>>>>>>>    drivers/md/raid5.c   | 32 ++++++++++++++-
>>>>>>>    4 files changed, 196 insertions(+), 40 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.39.2
>>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
> 
> .
> 

