From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: fstests@vger.kernel.org, kdevops@lists.linux.dev,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, david@redhat.com,
linmiaohe@huawei.com, muchun.song@linux.dev, osalvador@suse.de
Subject: Re: [PATCH] fstests: add fsstress + compaction test
Date: Thu, 18 Apr 2024 02:39:19 +0100
Message-ID: <ZiB5x-EKrmb1ZPuf@casper.infradead.org>
In-Reply-To: <20240418001356.95857-1-mcgrof@kernel.org>
On Wed, Apr 17, 2024 at 05:13:56PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0]. The fix for that has been posted [1], but that patch
> is not yet in v6.9-rc4 and the patch requires changes for v6.9.
It doesn't require changes, it just has prerequisites:
https://lore.kernel.org/all/ZgHhcojXc9QjynUI@casper.infradead.org/
> Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> between compaction and fsstress while running generic/476 on the
> following kdevops test sections [2]:
>
> * xfs_nocrc
> * xfs_nocrc_2k
> * xfs_nocrc_4k
>
> Analyzing the trace, I see the guest uses loopback block devices for the
> fstests TEST_DEV; the loopback files are sparse files on a btrfs
> partition. The contention, based on traces [3] [4], seems to be that we
> somehow have fsstress + compaction race on folio_wait_bit_common().
What do you mean by "race"? Here's what I see:
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kcompactd0:72 blocked for more than 120 seconds.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: Not tainted 6.9.0-rc4 #4
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: task:kcompactd0 state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00004000
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: Call Trace:
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: <TASK>
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: __schedule+0x3d9/0xaf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: schedule+0x26/0xf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: io_schedule+0x42/0x70
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: folio_wait_bit_common+0x123/0x370
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: ? __pfx_wake_page_function+0x10/0x10
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: migrate_pages_batch+0x69a/0xd70
But you didn't run the backtrace through scripts/decode_stacktrace.sh
so I can't figure out what we're waiting on.
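For reference, a minimal sketch of decoding a saved trace like the one above. It assumes you run it from the top of the kernel build tree, that vmlinux was built with debug info, and that the log was saved as hung.log (a hypothetical filename). The strip_journal_prefix helper is my own illustration, not a kernel script; stripping the syslog prefix may not be strictly necessary, it just keeps the input close to plain dmesg output.

```shell
#!/bin/bash
# Hypothetical helper (not part of the kernel tree): drop the
# "Mon DD HH:MM:SS host kernel: " prefix that syslog/journald prepends,
# leaving lines without that prefix unchanged.
strip_journal_prefix() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ kernel: ?//'
}

# Assumed layout: current directory is the kernel build tree, vmlinux has
# debug info, and hung.log holds the saved console/journal output.
if [ -x scripts/decode_stacktrace.sh ] && [ -f vmlinux ] && [ -f hung.log ]; then
    # decode_stacktrace.sh reads the trace on stdin and resolves
    # symbol+offset entries to source file and line.
    strip_journal_prefix < hung.log | scripts/decode_stacktrace.sh vmlinux
fi
```

With a decoded trace, an entry such as migrate_pages_batch+0x69a/0xd70 should come with a file and line, which is what tells us which folio bit is being waited on.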
> We have this happening:
>
> a) kthread compaction --> migrate_pages_batch()
> --> folio_wait_bit_common()
> b) workqueue on btrfs writeback wb_workfn --> extent_write_cache_pages()
> --> folio_wait_bit_common()
> c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
> --> folio_wait_bit_common()
> d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()
>
> I tried to reproduce but couldn't easily do so, so I wrote this test
> to help, and with this I have 100% failure rate so far out of 2 runs.
>
> Given we also have korg#218227 and that patch likely needing
> backporting, folks will want a reproducer for this issue. This should
> hopefully help with that case and this new separate issue.
>
> To reproduce with kdevops just:
>
> make defconfig-xfs_nocrc_2k -j $(nproc)
> make -j $(nproc)
> make fstests
> make linux
> make fstests-baseline TESTS=generic/733
> tail -f guestfs/*-xfs-nocrc-2k/console.log
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@suse.cz/T/#u
> [2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
> [3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
> [4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Note: kdevops uses its own fork of fstests which has this merged
> already, so the above should just work. If it's your first time using
> kdevops, be sure to read the README for first-time users:
>
> https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
>
> common/rc | 7 ++++++
> tests/generic/744 | 56 +++++++++++++++++++++++++++++++++++++++++++
> tests/generic/744.out | 2 ++
> 3 files changed, 65 insertions(+)
> create mode 100755 tests/generic/744
> create mode 100644 tests/generic/744.out
>
> diff --git a/common/rc b/common/rc
> index b7b77ac1b46d..d4432f5ce259 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -120,6 +120,13 @@ _require_hugepages()
> _notrun "Kernel does not report huge page size"
> }
>
> +# Requires CONFIG_COMPACTION
> +_require_compaction()
> +{
> + if [ ! -f /proc/sys/vm/compact_memory ]; then
> + _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> + fi
> +}
> # Get hugepagesize in bytes
> _get_hugepagesize()
> {
> diff --git a/tests/generic/744 b/tests/generic/744
> new file mode 100755
> index 000000000000..2b3c0c7e92fb
> --- /dev/null
> +++ b/tests/generic/744
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test 744
> +#
> +# fsstress + compaction test
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> + $KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch
> +_require_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden"
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background loop that triggers memory compaction every 15 seconds
> +runfile="$tmp.getfattr"
> +touch $runfile
> +while [ -e $runfile ]; do
> + echo 1 > /proc/sys/vm/compact_memory
> + sleep 15
> +done &
> +getfattr_pid=$!
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1
> +
> +status=0
> +exit
> diff --git a/tests/generic/744.out b/tests/generic/744.out
> new file mode 100644
> index 000000000000..205c684fa995
> --- /dev/null
> +++ b/tests/generic/744.out
> @@ -0,0 +1,2 @@
> +QA output created by 744
> +Silence is golden
> --
> 2.43.0
>
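A minimal sketch of the background compaction loop from the test above, with the loop's PID tracked so that it can be stopped explicitly rather than waiting out the sleep. The function names and the COMPACT_FILE / COMPACT_INTERVAL variables are hypothetical, introduced only so the sketch can be tried against a scratch file; by default it writes to the real /proc/sys/vm/compact_memory knob, which needs root.

```shell
#!/bin/bash
# Hypothetical helpers, not part of fstests.

# Start a background loop that triggers compaction at a fixed interval.
# The loop runs for as long as $runfile exists.
start_compaction_loop() {
    runfile=$(mktemp)
    while [ -e "$runfile" ]; do
        echo 1 > "${COMPACT_FILE:-/proc/sys/vm/compact_memory}"
        sleep "${COMPACT_INTERVAL:-15}"
    done &
    compaction_pid=$!
}

# Stop the loop: remove the run file, then signal and reap the loop
# process so we do not have to wait for the current sleep to finish.
stop_compaction_loop() {
    rm -f "$runfile"
    kill "$compaction_pid" 2>/dev/null || true
    wait "$compaction_pid" 2>/dev/null || true
}
```

In a test, start_compaction_loop would go before the fsstress invocation and stop_compaction_loop both after it and in _cleanup(), so an interrupted run does not leave the loop behind.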