From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS: read error corrected: ino 1 off 226840576 (dev /dev/mapper/dshelf1 sector 459432)
Date: Wed, 17 Jun 2015 13:51:26 +0000 (UTC)
Message-ID: <pan$d7e40$f0fd83dc$6357f1e5$a552946e@cox.net>
In-Reply-To: <20150617071654.GI16468@merlins.org>

Marc MERLIN posted on Wed, 17 Jun 2015 00:16:54 -0700 as excerpted:

> I had a few power offs due to a faulty power supply, and my mdadm raid5
> got into fail mode after 2 drives got kicked out since their sequence
> numbers didn't match due to the abrupt power offs.
> 
> I brought the swraid5 back up by force assembling it with 4 drives (one
> was really only a few sequence numbers behind), and it's doing a full
> parity rebuild on the 5th drive that was farther behind.
> 
> So I can understand how I may have had a few blocks that are in a bad
> state.
> I'm getting a few (not many) of those messages in syslog.
> BTRFS: read error corrected: ino 1 off 226840576 (dev
> /dev/mapper/dshelf1 sector 459432)
> 
> Filesystem looks like this:
> Label: 'btrfs_pool1'  uuid: 6358304a-2234-4243-b02d-4944c9af47d7
>         Total devices 1 FS bytes used 8.29TiB
>         devid    1 size 14.55TiB used 8.32TiB path /dev/mapper/dshelf1
> 
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=8.29TiB, used=8.28TiB
> System, DUP: total=8.00MiB, used=920.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=14.00GiB, used=10.58GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Kernel 3.19.8.
> 
> Just to make sure I understand, do those messages in syslog mean that my
> metadata got corrupted a bit, but because I have 2 copies, btrfs can fix
> the bad copy by using the good one?

Yes.  Despite the confusion between btrfs raid5 and mdraid5, Hugo was 
correct there.  It's just the 3.19-kernel caveat that he got wrong, since 
he was thinking of btrfs raid.  Btrfs dup mode should be good going back 
many kernels.
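
If you want to watch those corrections add up, btrfs also keeps 
cumulative per-device error counters.  A quick sketch (your mountpoint; 
the counts shown are made up, and the output format varies a bit across 
btrfs-progs versions):

  # Per-device error counters, cumulative since mkfs (reset with -z).
  # corruption_errs should tick up alongside the "read error corrected"
  # syslog lines as bad dup-metadata copies are repaired from the good one.
  btrfs device stats /mnt/btrfs_pool1
  # [/dev/mapper/dshelf1].write_io_errs    0
  # [/dev/mapper/dshelf1].read_io_errs     0
  # [/dev/mapper/dshelf1].flush_io_errs    0
  # [/dev/mapper/dshelf1].corruption_errs  5
  # [/dev/mapper/dshelf1].generation_errs  0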

> Also, if my actual data got corrupted, am I correct that btrfs will
> detect the checksum failure and give me a different error message of a
> read error that cannot be corrected?
> 
> I'll do a scrub later, for now I have to wait 20 hours for the raid
> rebuild first.

Yes again.
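
When you get to it, the usual invocation is just this (a sketch; -B 
keeps the scrub in the foreground, -d gives per-device stats):

  # Foreground scrub with per-device stats; drop -B to background it
  # and poll with "btrfs scrub status" instead.
  btrfs scrub start -Bd /mnt/btrfs_pool1
  btrfs scrub status /mnt/btrfs_pool1

Corrupted single-mode data should then show up in the uncorrectable 
count there, rather than as "read error corrected".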

As I mentioned in a different thread a few hours ago, I have an SSD that 
is slowly going bad, relocating sectors as it goes.  It's at 200-some 
relocations by raw value; the "cooked" value of that attribute dropped to 
100 on the first relocation and is now at 98, against a threshold of 36, 
so I figure it should be good for a few thousand relocations if I let it 
go that far.  But it's in a btrfs raid1 with a reliable paired SSD (no 
relocations yet), I've been able to scrub-fix the errors so far, and I 
have things backed up with a replacement ready to insert when I decide 
it's time, so I'm able to watch in more or less morbid fascination as the 
thing slowly dies, a sector at a time.  
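
For the curious, that's the standard smartmontools view (the device name 
here is a placeholder, the numbers are my drive's, and the attribute 
name varies by vendor):

  # Raw vs. "cooked" view of the relocations.
  smartctl -A /dev/sdb | grep -i reallocated
  #  5 Reallocated_Sector_Ct 0x0033  098  098  036  Pre-fail Always - 216
  # VALUE 098 against THRESH 036: normalized health still well above
  # the failure threshold despite 200-some raw relocations.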

The interesting thing is that with btrfs' checksumming and data integrity 
feature, I can continue to use the drive in raid1 even tho it's 
definitely bad enough to be all but unusable with ordinary filesystems.

Anyway, as a result of that, I'm getting lots of experience with scrubs 
and corrected errors.

One thing I'd strongly recommend.  Once the rebuild is complete and you 
do the scrub, there may well be both read/corrected errors and unverified 
errors.  AFAIK, the unverified errors are the result of bad metadata 
blocks: the checksums those blocks held are lost, so whatever they 
covered can't be verified.  So once the first scrub finishes and has 
corrected most of the metadata block errors, do another scrub, and repeat 
until you have no more unverified errors; at that point everything is 
either corrected (the dup metadata) or uncorrectable (the single data).  
That's what I'm doing here, with both data and metadata as raid1 and thus 
correctable, tho in some instances the device triggers a new relocation 
on the second and occasionally (once?) a third scrub, so I'm having to do 
more scrubs than I would if the problem were entirely in the past, as it 
sounds like yours is, or will be once the mdraid rebuild is done, anyway.
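
If you'd rather not babysit the repeats, here's a minimal sketch, 
assuming scrub status reports an "unverified errors:" line the way 
btrfs-progs of this vintage does (the grep string may need adjusting):

  # Re-scrub until a pass reports zero unverified errors.
  while :; do
      btrfs scrub start -Bd /mnt/btrfs_pool1
      btrfs scrub status /mnt/btrfs_pool1 | \
          grep -q 'unverified errors: [1-9]' || break
  done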

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

