From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS: read error corrected: ino 1 off 226840576 (dev /dev/mapper/dshelf1 sector 459432)
Date: Thu, 18 Jun 2015 04:32:17 +0000 (UTC) [thread overview]
Message-ID: <pan$e509f$9bb669c$8cfbb039$6d6b7135@cox.net> (raw)
In-Reply-To: 20150617161936.GK16468@merlins.org
Marc MERLIN posted on Wed, 17 Jun 2015 09:19:36 -0700 as excerpted:
> On Wed, Jun 17, 2015 at 01:51:26PM +0000, Duncan wrote:
>> > Also, if my actual data got corrupted, am I correct that btrfs will
>> > detect the checksum failure and give me a different error message of
>> > a read error that cannot be corrected?
>> >
>> > I'll do a scrub later, for now I have to wait 20 hours for the raid
>> > rebuild first.
>>
>> Yes again.
>
> Great, thanks for confirming.
> Makes me happy to know that checksums and metadata DUP are helping me
> out here. With ext4 I'd have been worse off for sure.
>
>> One thing I'd strongly recommend. Once the rebuild is complete and you
>> do the scrub, there may well be both read/corrected errors, and
>> unverified errors. AFAIK, the unverified errors are a result of bad
>> metadata blocks, so missing checksums for what they covered. So once
>> you
>
> I'm slightly confused here. If I have metadata DUP and checksums, how
> can metadata blocks be unverified?
> Data blocks being unverified, I understand, it would mean the data or
> checksum is bad, but I expect that's a different error message I haven't
> seen yet.
Backing up a bit to better explain what I'm seeing here...
What I'm getting here, when the sectors go unreadable on the (slowly)
failing SSD, is actually a SATA level timeout, which btrfs (correctly)
interprets as a read error. But it wouldn't really matter whether it was
a read error or a corruption error, btrfs would respond the same --
because both data and metadata are btrfs raid1 here, it would fetch and
verify the other copy of the block from the raid1 mirror device, and
assuming it verified (which it should since the other device is still in
great condition, zero relocations), rewrite it over the one it couldn't
read.
Back on the failing device, the rewrite triggers a sector relocation, and
assuming it doesn't fall in the bad area too, that block is now clean.
(If it does fall in the defective area, I simply have to repeat the scrub
another time or two, until there are no more errors.)
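That repair loop can be sketched, very loosely, in Python. Everything here -- the Device class, the sector numbers, sha256 standing in for btrfs's actual crc32c -- is a toy model of the behavior described above, not btrfs code:

```python
import hashlib

def checksum(data: bytes) -> str:
    # btrfs defaults to crc32c; sha256 here is purely for illustration.
    return hashlib.sha256(data).hexdigest()

class Device:
    """Toy block device: sector -> bytes; None simulates an unreadable sector."""
    def __init__(self, sectors):
        self.sectors = dict(sectors)
    def read(self, sector):
        data = self.sectors.get(sector)
        if data is None:
            raise IOError(f"read error at sector {sector}")
        return data
    def write(self, sector, data):
        self.sectors[sector] = data

def scrub_block(sector, expected_sum, primary, mirror):
    """Read a block; on read error or checksum mismatch, repair from the mirror."""
    try:
        data = primary.read(sector)
        if checksum(data) == expected_sum:
            return data, "ok"
    except IOError:
        pass
    # Fetch the other raid1 copy and verify it before trusting it.
    good = mirror.read(sector)
    assert checksum(good) == expected_sum, "both copies bad: uncorrectable"
    # On real hardware this rewrite is what triggers the sector relocation.
    primary.write(sector, good)
    return good, "corrected"
```

So a scrub of a sector that times out on the failing device comes back "corrected", and a second scrub of the same (now relocated) sector comes back "ok" -- which is exactly the repeat-until-clean pattern above.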
But, and this is what I was trying to explain earlier but skipped a step I
figured was more obvious than it apparently was, btrfs works with trees,
including a metadata tree. Each block of metadata that holds checksums
covering actual data is in turn itself checksummed by a metadata block one
step closer to the metadata root block, and that chain runs multiple
levels deep.
I should mention here that this is my non-coder understanding. If a dev
says it works differently...
It's these multiple metadata levels and the chained checksums for them,
that I was referencing. Suppose it's a metadata block that fails, not a
data block. That metadata block will be checksummed, and will in turn
contain checksums for other blocks, which might be either data blocks, or
other metadata blocks, a level closer to the data (and further from the
root) than the failed block.
Because the metadata block failed (either checksum failure or read error;
it shouldn't matter at this point), whatever checksums it contained,
whether for data or for other metadata blocks, will be unverified. If the
affected metadata block is close to the root of the tree, the effect could
in theory domino thru several further levels.
These checksum unverified blocks (because the block containing the
checksums failed) will show up as unverified errors, and whatever that
checksum was supposed to cover, whether other metadata blocks or data
blocks, won't be checked in that scrub round, because the level above it
can't be verified.
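A toy model of that cascade (again nothing here is btrfs code; it just mimics the "checksums live in the parent" rule described above, with a scrub pass that can't check anything below a block that failed):

```python
# Toy metadata tree: each node's checksum is assumed to live in its parent,
# so a failed node leaves everything below it unverified for that pass.
class Node:
    def __init__(self, name, children=(), bad=False):
        self.name, self.children, self.bad = name, list(children), bad

def walk(children):
    """Yield the names of every descendant, depth-first."""
    for c in children:
        yield c.name
        yield from walk(c.children)

def scrub_pass(node, results):
    """One scrub pass: fix bad blocks, but skip anything below them."""
    if node.bad:
        results["corrected"].append(node.name)
        node.bad = False                    # repaired from the raid1 mirror
        # The checksums this block held can't be trusted yet, so its whole
        # subtree goes unchecked this round.
        results["unverified"].extend(walk(node.children))
        return
    results["ok"].append(node.name)
    for child in node.children:
        scrub_pass(child, results)
```

With a good root, one bad mid-level block, and two leaves under it, the first pass reports the mid block corrected and both leaves unverified; a second pass then checks the leaves normally.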
Given a checksum-verified raid1 copy on the mirror device, the original
failed block will be rewritten. But if it's metadata, whatever checksums
it in turn contained will still not be verified in that scrub round.
Again, these show up as unverified errors.
By running scrub repeatedly, however, now that the first error has been
fixed by the rewrite from the good copy, the checksums it contained can in
turn be checked. If they all verify, great. If not, another rewrite will
be triggered, fixing them, but if those checksums were in turn for other
metadata blocks, now /those/ will need to be checked, and will show up as
unverified.
So depending on where the bad metadata block was located in the metadata
tree, a second, third, possibly even fourth scrub may be needed, in order
to correct all the errors at all levels of the metadata tree, fixing in
turn each level of unverified errors exposed as the level above it (closer
to root) was fixed.
Of course, if your scrub listed only corrected errors (metadata, since
it's raid1 in your case) or uncorrectable errors (data, since it's single
in your case, or metadata with both copies bad), with no unverified
errors, then at least in theory a second scrub shouldn't find any further
errors to correct. Only if you see unverified errors should it be
necessary to
repeat the scrub, but then you might need to repeat it several times as
each run will expose another level to checksum verification that was
previously unverified.
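The pass count is bounded by the depth of the tree. A back-of-the-envelope sketch (a made-up model of a single root-to-leaf path, with the rule above that a pass can only reach the shallowest still-bad block, since everything below it is unverified until it's repaired):

```python
def scrub_passes(bad):
    """How many scrub passes a path of bad metadata blocks takes to clean.

    bad[i] is True if the block at depth i (0 = nearest the root) is bad.
    Each pass can only detect and fix the shallowest remaining bad block,
    because the checksums below it stay unverified until it is repaired.
    """
    passes = 0
    while any(bad):
        first = bad.index(True)   # shallowest bad block reachable this pass
        bad[first] = False        # rewritten from the good raid1 copy
        passes += 1
    return passes
```

Three bad blocks at consecutive depths take three passes, one per level, while a clean path takes zero extra passes -- which matches the "second, third, possibly even fourth scrub" worst case above. (A real scrub covers the whole tree each pass, of course, but the depth bound is the same idea.)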
Of course, an extra scrub run shouldn't hurt anything in any case. It'll
just have nothing it can fix, and will only cost time. (Tho on multi-TB
spinning rust that time could be significant!)
Hopefully it makes more sense now, given that I've included the critical
information about multi-level metadata trees that I had skipped as
obvious, the first time. Again, this is my understanding as a btrfs
using admin and list regular, not a coder. If a dev says the code
doesn't work that way, he's most likely correct.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman