From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Subject: Re: [BUG] generic/475 recovery failure(s)
Date: Fri, 11 Jun 2021 15:02:39 -0400
Message-ID: <YMOzT1goreWVgo8S@bfoster>
In-Reply-To: <YMIsWJ0Cb2ot/UjG@bfoster>

On Thu, Jun 10, 2021 at 11:14:32AM -0400, Brian Foster wrote:
> Hi all,
> 
> I'm seeing what looks like at least one new generic/475 failure on
> current for-next. (I've seen one related to an attr buffer that seems to
> be older and harder to reproduce.) The test devices are a couple of ~15GB
> lvm devices formatted with mkfs defaults. I'm still trying to establish
> reproducibility, but so far a failure seems fairly reliable within ~30
> iterations.
> 
> The first [1] looks like log recovery failure processing an EFI. The
> second variant [2] looks like it passes log recovery, but then fails the
> mount in the COW extent cleanup stage due to a refcountbt problem. I've
> also seen one that looks like the same free space corruption error as
> [1], but triggered via the COW recovery codepath in [2], so these could
> very well be related. A snippet of the dmesg output for each failed
> mount is appended below.
> 
...

A couple of updates...

First (as noted on IRC), the generic/475 failure is not new, as I was
able to reproduce it on vanilla 5.13.0-rc4. I'm not quite sure how far
back it goes, but Dave noted he's seen it on occasion for some
time.
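
(For reference, "iterating the test" just means looping it via the
fstests ./check script until it trips; a rough sketch, where the
xfstests path is a placeholder and TEST_DEV/SCRATCH_DEV in local.config
point at the lvm devices described above:)

  # loop generic/475 until the recovery failure reproduces
  cd /path/to/xfstests    # placeholder for the local checkout
  for i in $(seq 1 30); do
          echo "iteration $i"
          ./check generic/475 || break
  done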

The generic/019 failure I'm seeing does appear to be new, as I cannot
reproduce it on 5.13.0-rc4. This failure looks more like silent fs
corruption: neither the test nor log recovery explicitly fails, but the
post-test xfs_repair check detects corruption. Example xfs_repair
output is appended below (note that 'xfs_repair -n' actually crashes,
while destructive repair seems to work). Since this reproduces fairly
reliably on for-next, I bisected it (while also navigating an unmount
hang that I don't otherwise have data on) down to facd77e4e38b ("xfs:
CIL work is serialised, not pipelined"). From a quick glance at that
commit I'm not quite sure what the problem is, just that the failure
doesn't occur prior to it.
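
(FWIW, the bisect itself was just the standard workflow; roughly the
following sketch, with v5.13-rc4 as the known-good endpoint per the
above and each candidate verified by looping the test as before:)

  git bisect start
  git bisect bad              # current for-next HEAD reproduces
  git bisect good v5.13-rc4   # cannot reproduce here
  # at each bisection point: build and boot the kernel, loop
  # generic/019, then mark the result and repeat until git names
  # the first bad commit (facd77e4e38b in this case):
  #   git bisect good         # if the test survives the loop
  #   git bisect bad          # if repair detects corruption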

Brian

--- 8< ---

019.full:

*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad btree key (is 2222043, should be 2222328) in inode 2201
		data fork, btree block 3411717
bad nblocks 54669 for inode 2201, would reset to 54667
bad nextents 54402 for inode 2201, would reset to 54400
out-of-order bmap key (file offset) in inode 2202, data fork, fsbno 1056544
bad data fork in inode 2202
would have cleared inode 2202
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
entry "stress_dio_aio_activity.2.0" in shortform directory 128 references free inode 2202
        - agno = 1
would have junked entry "stress_dio_aio_activity.2.0" in directory inode 128
bad btree key (is 2222043, should be 2222328) in inode 2201
		data fork, btree block 3411717
bad nblocks 54669 for inode 2201, would reset to 54667
bad nextents 54402 for inode 2201, would reset to 54400
out-of-order bmap key (file offset) in inode 2202, data fork, fsbno 1056544
would have cleared inode 2202
xfs_repair: rmap.c:1267: fix_inode_reflink_flags: Assertion `(irec->ino_is_rl & irec->ir_free) == 0' failed.
Incorrect reference count: saw (3/3386) len 1 nlinks 29; should be (3/4370) len 1 nlinks 2
*** end xfs_repair output
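
(To be clear about the -n note above: the read-only dry run is what
asserts and crashes, while a destructive run completes; a sketch, with
the device path a placeholder for my scratch device:)

  xfs_repair -n /dev/mapper/test-scratch   # no-modify check: crashes
  xfs_repair /dev/mapper/test-scratch      # destructive repair: works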

