DM-Devel Archive mirror
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Yu Kuai <yukuai1@huaweicloud.com>, Xiao Ni <xni@redhat.com>
Cc: zkabelac@redhat.com, agk@redhat.com, snitzer@kernel.org,
	mpatocka@redhat.com, dm-devel@lists.linux.dev, song@kernel.org,
	heinzm@redhat.com, neilb@suse.de, jbrassow@redhat.com,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2
Date: Mon, 4 Mar 2024 09:23:56 +0800
Message-ID: <35feaa54-db9e-f0d6-d5a5-a10a45bb90a5@huaweicloud.com>
In-Reply-To: <0091f7d1-2273-16ff-8285-5fa3f7e2e0f7@huaweicloud.com>

Hi,

On 2024/03/04 9:07, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/03 21:16, Xiao Ni wrote:
>> Hi all
>>
>> There is an error report from the lvm regression tests. The case is
>> lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
>> tried to fix the dm-raid regression problems too. In my patch set, after
>> reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
>> sync_thread for reshape directly), this problem doesn't appear.
> 
> How often did you see this test fail? I've been running the tests for
> over two days now, 30+ rounds, and this test has never failed in my VM.

From a quick look, there is still a path in raid10 where
MD_RECOVERY_FROZEN can be cleared, so in theory this problem can still
be triggered. Can you test the following patch on top of this set?
I'll keep running the test myself.

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7ca29469123a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
         return 0;

  abort:
-       mddev->recovery = 0;
+       if (mddev->gendisk)
+               mddev->recovery = 0;
         spin_lock_irq(&conf->device_lock);
         conf->geo = conf->prev;
         mddev->raid_disks = conf->geo.raid_disks;

Thanks,
Kuai
> 
> Thanks,
> Kuai
> 
>>
>> I put the log in the attachment.
>>
>> On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> link to part1: 
>>> https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/ 
>>>
>>>
>>> part1 contains fixes for deadlocks when stopping sync_thread
>>>
>>> This set contains fixes:
>>>   - reshape can start unexpectedly and cause data corruption, patches 1, 5, 6;
>>>   - deadlocks when reshape runs concurrently with IO, patch 8;
>>>   - a lockdep warning, patch 9;
>>>
>>> I've been running the lvm2 tests with the following script for a few rounds now:
>>>
>>> for t in `ls test/shell`; do
>>>          if cat test/shell/$t | grep raid &> /dev/null; then
>>>                  make check T=shell/$t
>>>          fi
>>> done
>>>
>>> There are no deadlocks and no fs corruption now; however, there are still
>>> four failed tests:
>>>
>>> ###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>>> ###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>>> ###       failed: [ndev-vanilla] shell/lvextend-raid.sh
>>>
>>> And the failure reasons are the same:
>>>
>>> ## ERROR: The test started dmeventd (147856) unexpectedly
>>>
>>> I have no clue yet, and it seems other folks don't have this issue.
>>>
>>> Yu Kuai (9):
>>>    md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
>>>    md: export helpers to stop sync_thread
>>>    md: export helper md_is_rdwr()
>>>    md: add a new helper reshape_interrupted()
>>>    dm-raid: really frozen sync_thread during suspend
>>>    md/dm-raid: don't call md_reap_sync_thread() directly
>>>    dm-raid: add a new helper prepare_suspend() in md_personality
>>>    dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
>>>      concurrent with reshape
>>>    dm-raid: fix lockdep waring in "pers->hot_add_disk"
>>>
>>>   drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
>>>   drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
>>>   drivers/md/md.h      | 38 +++++++++++++++++-
>>>   drivers/md/raid5.c   | 32 ++++++++++++++-
>>>   4 files changed, 196 insertions(+), 40 deletions(-)
>>>
>>> -- 
>>> 2.39.2
>>>

