* xfstests run to run variability

From: Jeff Moyer @ 2015-09-01 14:22 UTC
  To: fstests

Hi,

I typically use ./check -g auto to test for regressions in my patches.
However, I've noticed that there is some run-to-run variability in the
results, even for a single kernel.  Here are the tests that fail, either
reliably, or worse, intermittently:

reproducible failures: generic/042 generic/311 xfs/032 xfs/053 xfs/070 xfs/071
intermittent failures: generic/192 generic/247 generic/232 xfs/167

I don't have time to dig into this right now (and so I haven't
included logs here, since I'm not looking for help diagnosing the
issues).  What I'd like to know is, first, whether others also see
this, and second, whether someone else has the time and motivation to
look into the inconsistency.

In case it's interesting, I run my tests on a Micron P320h PCIe SSD as
the test device, and a regular sata disk as the scratch device.
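
For reference, that boils down to an xfstests local.config roughly like
the following (the device names here are just placeholders), plus the
usual check invocation:

    # xfstests local.config -- device names below are placeholders
    export TEST_DEV=/dev/rssda1        # the PCIe SSD, used as the test device
    export TEST_DIR=/mnt/test          # mount point for TEST_DEV
    export SCRATCH_DEV=/dev/sdb1       # the SATA disk, used as the scratch device
    export SCRATCH_MNT=/mnt/scratch    # mount point for SCRATCH_DEV

    # then run the auto group as usual
    ./check -g auto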

Cheers,
Jeff


* Re: xfstests run to run variability

From: Theodore Ts'o @ 2015-09-01 14:56 UTC
  To: Jeff Moyer; +Cc: fstests

On Tue, Sep 01, 2015 at 10:22:34AM -0400, Jeff Moyer wrote:
> Hi,
> 
> I typically use ./check -g auto to test for regressions in my patches.
> However, I've noticed that there is some run-to-run variability in the
> results, even for a single kernel.

I've certainly noticed this for ext4.  My response is to keep an
archive of test results and to keep an eye on the tests that are
known to be flaky.

I've considered keeping a database of tests known to be flaky, along
with a program that automates updating that list over time, but I
haven't gotten around to it yet.
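
As a rough sketch of the kind of thing I mean (assuming each run's
failures have been archived as a plain list of test names, one per
line -- that file format is made up for illustration):

    #!/bin/sh
    # flaky-report.sh RUN1 RUN2 ...
    # Each argument is a file listing the tests that failed in one run,
    # one test name per line (however you happen to archive results).
    # Tests that failed in every run are reported as reproducible,
    # everything else as flaky.
    runs=$#
    cat "$@" | sort | uniq -c | while read count test; do
        if [ "$count" -eq "$runs" ]; then
            echo "reproducible: $test"
        else
            echo "flaky ($count/$runs runs): $test"
        fi
    done

Anything such a report flags as flaky across a handful of archived
runs would be a candidate for the known-flaky list.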

At least for ext4, the flaky tests are often (I'd say at least 75% of
the time when I've investigated) a bug in the file system as opposed
to the test.  There are also a large number of bigalloc failures which
are due to the fact that xfstests doesn't support file systems where
the block size can be different from the cluster allocation size, but
Eric Whitney was going to work on patches to enhance xfstests to
support this.

That's probably not helpful for you, since you're looking at flaky
tests for xfs.  I can say that I don't see any intersection between
the flaky tests for ext4 and the flaky generic tests you've listed
for xfs:

> intermittent failures: generic/192 generic/247 generic/232 xfs/167

Cheers,

					- Ted


* Re: xfstests run to run variability

From: Dave Chinner @ 2015-09-01 21:19 UTC
  To: Jeff Moyer; +Cc: fstests

On Tue, Sep 01, 2015 at 10:22:34AM -0400, Jeff Moyer wrote:
> Hi,
> 
> I typically use ./check -g auto to test for regressions in my patches.
> However, I've noticed that there is some run-to-run variability in the
> results, even for a single kernel.  Here are the tests that fail, either
> reliably, or worse, intermittently:

What kernel, what xfsprogs version, xfs_info $TEST_DIR, etc.
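
Something like this covers it (with $TEST_DIR as set in your xfstests
config):

    uname -r                # running kernel version
    mkfs.xfs -V             # xfsprogs version
    xfs_info $TEST_DIR      # geometry/options of the test filesystem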

> 
> reproducible failures: generic/042 generic/311 xfs/032 xfs/053 xfs/070 xfs/071

generic/042 will fail for XFS - the test needs fixing IIRC. You can
ignore it.

generic/311 has been failing intermittently for me on XFS since
4.2-rc1, and I can't reproduce it reliably enough to triage it.
Failure mode is a hash mismatch:

    --- tests/generic/311.out   2014-01-20 16:57:33.000000000 +1100
    +++ /home/dave/src/xfstests-dev/results//generic/311.out.bad        2015-08-28 16:29:29.000000000 +1000
    @@ -166,7 +166,7 @@
     Running test 11 direct, normal suspend
     Random seed is 11
     1144c9b3147873328cf4e81d066cd3da
    -1144c9b3147873328cf4e81d066cd3da
    +95cbe2ba4a2ace65edc71ab9165ceed2
     Running test 11 buffered, nolockfs
     Random seed is 11
    ...
    (Run 'diff -u tests/generic/311.out /home/dave/src/xfstests-dev/results//generic/311.out.bad'  to see the entire diff)

I suspect another sync regression in the memcg-aware writeback
patches that landed in 4.2-rc1, but as yet I've been unable to
reproduce it reliably.

xfs/053 requires a TOT xfsprogs (i.e. 4.2.0-rcX) and a 4.2 kernel to
pass (recently found problem, new test, new fixes).

xfs/032, xfs/070 and xfs/071 haven't failed for me for a long, long
time, so without more info I can't really say anything about them.

> intermittent failures: generic/192 generic/247 generic/232 xfs/167

generic/192 should not fail - it's just an atime test. I don't
recall ever seeing it fail.

generic/247 throws warnings on XFS because it exercises mmap vs
direct IO to the same file, and we explicitly make XFS tell us
when an application is doing this and we hit a potential data
corruption event (e.g. invalidation fails during direct IO due to a
racing page fault in the invalidation range). It's a race condition,
so it occurs intermittently. The test fails when this happens since
the test harness grew generic dmesg warning detection. You can
ignore it.

generic/232 is an fsstress vs quota reporting test. The space usage
can vary slightly as fsstress does random operations, and when
there's unexpected extra metadata on disk (e.g. a directory btree
was split in an unusual way) the quota counts can be slightly higher
than expected and the test reports a failure. It's no big deal; it
happened a lot more with older kernels than it does now, and I
haven't seen it fail for months, so you can ignore it.

xfs/167 is doing the same thing to me as generic/311. It worked for a
long time, but since 4.2-rc1 it's failed a couple of times, though not
often enough to be able to debug the failures.

> In case it's interesting, I run my tests on a Micron P320h PCIe SSD as
> the test device, and a regular sata disk as the scratch device.

Shouldn't make any difference - I test on all sorts of different
speed block devices, from ram disks to local sata to iscsi.  Results
are pretty consistent for me, regardless of the backing store.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

