From: Zorro Lang <zlang@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: fstests@vger.kernel.org, kdevops@lists.linux.dev,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, willy@infradead.org,
david@redhat.com, linmiaohe@huawei.com, muchun.song@linux.dev,
osalvador@suse.de
Subject: Re: [PATCH] fstests: add fsstress + compaction test
Date: Sat, 20 Apr 2024 22:02:41 +0800
Message-ID: <20240420140241.wez2x3zoirzlmat6@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com>
In-Reply-To: <20240418001356.95857-1-mcgrof@kernel.org>
On Wed, Apr 17, 2024 at 05:13:56PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0]. The fix for that has been posted [1], but that patch
> is not yet in v6.9-rc4 and requires changes for v6.9.
>
> Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> between compaction and fsstress while running generic/476 on the
> following kdevops test sections [2]:
>
> * xfs_nocrc
> * xfs_nocrc_2k
> * xfs_nocrc_4k
>
> Analyzing the trace, I see the guest uses loopback block devices for the
> fstests TEST_DEV; the loopback files are sparse files on a btrfs
> partition. Based on the traces [3] [4], the contention seems to be that
> fsstress and compaction somehow race on folio_wait_bit_common().
>
> We have this happening:
>
> a) kthread compaction --> migrate_pages_batch()
> --> folio_wait_bit_common()
> b) workqueue on btrfs writeback wb_workfn --> extent_write_cache_pages()
> --> folio_wait_bit_common()
> c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
> --> folio_wait_bit_common()
> d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()
>
> I tried to reproduce but couldn't easily do so, so I wrote this test
> to help, and with this I have 100% failure rate so far out of 2 runs.
>
> Given we also have korg#218227 and that patch likely needing
> backporting, folks will want a reproducer for this issue. This should
> hopefully help with that case and this new separate issue.
>
> To reproduce with kdevops just:
>
> make defconfig-xfs_nocrc_2k -j $(nproc)
> make -j $(nproc)
> make fstests
> make linux
> make fstests-baseline TESTS=generic/733
> tail -f guestfs/*-xfs-nocrc-2k/console.log
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@suse.cz/T/#u
> [2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
> [3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
> [4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Note: kdevops uses its own fork of fstests which has this merged
> already, so the above should just work. If it's your first time using
> kdevops, be sure to read the README for first-time users:
>
> https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
>
> common/rc | 7 ++++++
> tests/generic/744 | 56 +++++++++++++++++++++++++++++++++++++++++++
> tests/generic/744.out | 2 ++
> 3 files changed, 65 insertions(+)
> create mode 100755 tests/generic/744
> create mode 100644 tests/generic/744.out
>
> diff --git a/common/rc b/common/rc
> index b7b77ac1b46d..d4432f5ce259 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -120,6 +120,13 @@ _require_hugepages()
> _notrun "Kernel does not report huge page size"
> }
>
> +# Requires CONFIG_COMPACTION
> +_require_compaction()
I'm not sure whether we should name this "_require_vm_compaction". Does Linux
have other kinds of "compaction", or only memory compaction?
> +{
> + if [ ! -f /proc/sys/vm/compact_memory ]; then
> + _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> + fi
> +}
> # Get hugepagesize in bytes
> _get_hugepagesize()
> {
> diff --git a/tests/generic/744 b/tests/generic/744
> new file mode 100755
> index 000000000000..2b3c0c7e92fb
> --- /dev/null
> +++ b/tests/generic/744
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test 744
> +#
> +# fsstress + compaction test
fsstress + memory compaction?
This case looks like a copy of g/476 with a memory compaction loop added.
That makes sense to me from the test side.
I'm a bit confused by your discussion of an old bug and a new bug(?)
you just found. It looks like you're reporting a bug and, along the way,
providing a test case to fstests@. Anyway, I take it there's no objection to
the test itself, right? And is this test for a known bug or not?
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> + $KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
This comment is useless, please remove it.
> +_supported_fs generic
> +
> +_require_scratch
> +_require_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background loop that triggers memory compaction every 15 seconds
> +runfile="$tmp.getfattr"
> +touch $runfile
> +while [ -e $runfile ]; do
> + echo 1 > /proc/sys/vm/compact_memory
> + sleep 15
> +done &
> +getfattr_pid=$!
I don't see "getfattr_pid" used anywhere else. Better to deal with
it in _cleanup().
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1
Better to do these things in the _cleanup() function, so that all background
processes are reliably stopped there even if the test is interrupted.
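As a rough illustration of that pattern, here is a minimal standalone sketch,
not the actual test. It assumes a few stand-ins so it can run unprivileged
outside fstests: "compact_trigger" replaces /proc/sys/vm/compact_memory,
"compact_pid" is a hypothetical replacement for the unused "getfattr_pid",
and a short sleep takes the place of the fsstress run. In the real test,
_cleanup() would also keep the existing "rm -f $tmp.*" and the killall of
fsstress.

```shell
#!/bin/bash
# Standalone sketch only: the names below are illustrative assumptions,
# not part of fstests or of the patch under review.
tmp=$(mktemp -d)
compact_trigger="$tmp/compact_memory"   # stand-in for /proc/sys/vm/compact_memory
runfile="$tmp/compaction.run"
compact_pid=""

_cleanup()
{
	cd /
	# Removing the runfile ends the loop at its next check; also kill the
	# background subshell so we do not wait out a pending sleep.
	rm -f "$runfile"
	if [ -n "$compact_pid" ]; then
		kill "$compact_pid" > /dev/null 2>&1 || true
		wait "$compact_pid" > /dev/null 2>&1 || true
	fi
	rm -rf "$tmp"
}
trap _cleanup EXIT

# Background loop that periodically triggers "compaction".
touch "$runfile"
while [ -e "$runfile" ]; do
	echo 1 > "$compact_trigger"
	sleep 1
done &
compact_pid=$!

# The foreground workload (fsstress in the real test) would run here.
sleep 1
```

With the loop torn down in _cleanup(), the test body no longer needs its own
"rm -f $runfile" and "wait" after fsstress returns, and an interrupted run
does not leave the loop behind.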
> +
> +status=0
> +exit
> diff --git a/tests/generic/744.out b/tests/generic/744.out
> new file mode 100644
> index 000000000000..205c684fa995
> --- /dev/null
> +++ b/tests/generic/744.out
> @@ -0,0 +1,2 @@
> +QA output created by 744
> +Silence is golden.
> --
> 2.43.0
>
>