* How to replace a failed drive in btrfs RAID 1 filesystem
@ 2018-03-09 16:02 Paul Richards
2018-03-09 16:43 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 6+ messages in thread
From: Paul Richards @ 2018-03-09 16:02 UTC
To: linux-btrfs
Hello there,
I have a 3 disk btrfs RAID 1 filesystem, with a single failed drive.
Before I attempt any recovery I’d like to ask what is the recommended
approach? (The wiki docs suggest consulting here before attempting
recovery[1].)
The system is powered down currently and a replacement drive is being
delivered soon.
Should I use “replace”, or “add” and “delete”?
Once replaced should I rebalance and/or scrub?
I believe that the recovery may involve mounting in degraded mode. If
I do this, how do I later get out of degraded mode, or if it’s
automatic how do I determine when I’m out of degraded mode?
The system is running Ubuntu 16.04 LTS with kernel 4.4.0.
1: “it may be helpful to consult the mailing list or irc channel
before attempting recovery”. -
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
* Re: How to replace a failed drive in btrfs RAID 1 filesystem
2018-03-09 16:02 How to replace a failed drive in btrfs RAID 1 filesystem Paul Richards
@ 2018-03-09 16:43 ` Austin S. Hemmelgarn
[not found] ` <CAMoswegyGSote6U3z+aE3fJ+ihPbsXLqUwY9K3GnmtjGSF7o0g@mail.gmail.com>
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2018-03-09 16:43 UTC
To: Paul Richards, linux-btrfs
On 2018-03-09 11:02, Paul Richards wrote:
> Hello there,
>
> I have a 3 disk btrfs RAID 1 filesystem, with a single failed drive.
> Before I attempt any recovery I’d like to ask what is the recommended
> approach? (The wiki docs suggest consulting here before attempting
> recovery[1].)
>
> The system is powered down currently and a replacement drive is being
> delivered soon.
>
> Should I use “replace”, or “add” and “delete”?
>
> Once replaced should I rebalance and/or scrub?
>
> I believe that the recovery may involve mounting in degraded mode. If
> I do this, how do I later get out of degraded mode, or if it’s
> automatic how do I determine when I’m out of degraded mode?
>
It won't automatically mount degraded; you either have to explicitly ask
it to, or you have to have an option to do so in your default mount
options for the volume in /etc/fstab (which is dangerous for multiple
reasons).
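For reference, an explicit degraded mount looks something like this (the
device name and mount point here are placeholders, not taken from your
setup):

  # any still-present member of the volume can be named on the command line
  mount -o degraded /dev/sdb /mnt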
Now, as to what the best way to go about this is, there are three things
to consider:
1. Is the failed disk still usable enough that you can get good data off
of it in a reasonable amount of time? If you're replacing the disk
because of a lot of failed sectors, you can still probably get data off
of it, while something like a head crash isn't worth trying to get data
back.
2. Do you have enough room in the system itself to add another disk
without removing one?
3. Is the replacement disk at least as big as the failed disk?
If the answer to all three is yes, then just put in the new disk, mount
the volume normally (you don't need to mount it degraded if the failed
disk is working this well), and use `btrfs replace` to move the data.
This is the most efficient option in terms of time, and it is also
generally the safest (and I personally always over-spec drive-bays in
systems we build where I work specifically so that this approach can be
used).
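As a rough sketch of that case, with placeholder device names (assume the
failing-but-still-readable disk is /dev/sdd and the new disk is /dev/sde):

  # copy everything from the old device to the new one in the background
  btrfs replace start /dev/sdd /dev/sde /mnt
  # check progress until it reports the replace has finished
  btrfs replace status /mnt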
If the answer to the third question is no, put in the new disk (removing
the failed one first if the answer to the second question is no), mount
the volume (mount it degraded if one of the first two questions is no,
normally otherwise), then add the new disk to the volume with `btrfs
device add` and remove the old one with `btrfs device delete` (using the
'missing' option if you had to remove the failed disk). This is needed
because the replace operation requires the new device to be at least as
big as the old one.
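A sketch of that add-then-delete sequence, again with placeholder names:

  btrfs device add /dev/sde /mnt
  # if the failed disk is still attached and readable:
  btrfs device delete /dev/sdd /mnt
  # or, if it had to be pulled and the volume is mounted degraded:
  btrfs device delete missing /mnt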
If the answer to either one or two is no but the answer to three is yes,
pull out the failed disk, put in a new one, mount the volume degraded,
and use `btrfs replace` as well (you will need to specify the device ID
for the now missing failed disk, which you can find by calling `btrfs
filesystem show` on the volume). In the event that the replace
operation refuses to run in this case, instead add the new disk to the
volume with `btrfs device add` and then run `btrfs device delete
missing` on the volume.
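Sketch of that case, assuming (purely for illustration) that the missing
disk turns out to be device ID 3 and the new disk is /dev/sde:

  # list the devices the volume knows about; the missing disk's ID is the
  # one that is not shown (see the follow-ups below for caveats)
  btrfs filesystem show /mnt
  # replace the missing device (devid 3 here) with the new disk
  btrfs replace start 3 /dev/sde /mnt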
If you follow any of the above procedures, you don't need to balance
(the replace operation is equivalent to a block level copy and will
result in data being distributed exactly the same as it was before,
while the delete operation is a special type of balance), and you
generally don't need to scrub the volume either (though it may still be
a good idea). As far as getting back from degraded mode, you can just
remount the volume to do so, though I would generally suggest rebooting.
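If you do decide to scrub, and then want out of degraded mode, something
like this should do it (mount point and device name are placeholders):

  # optional integrity check of the rebuilt volume
  btrfs scrub start /mnt
  btrfs scrub status /mnt
  # leave degraded mode by mounting again without the degraded option
  umount /mnt
  mount /dev/sdb /mnt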
Note that there are three other possible approaches to consider as well:
1. If you can't immediately get a new disk _and_ all the data will fit
on the other two disks, use `btrfs device delete` to remove the failed
disk anyway, and run with just the two until you can get a new disk.
This is exponentially safer than running the volume degraded until you
get a new disk, and is the only case you realistically should delete a
device before adding the new one. Make sure to balance the volume after
adding the new device.
2. Depending on the situation, it may be faster to just recreate the
whole volume from scratch using a backup than it is to try to repair it.
This is actually the absolute safest method of handling this
situation, as it makes sure that nothing from the old volume with the
failed disk causes problems in the future.
3. If you don't have a backup, but have some temporary storage space
that will fit all the data from the volume, you could also use `btrfs
restore` to extract files from the old volume to temporary storage,
recreate the volume, and copy the data back in from the temporary
storage (roughly as sketched below).
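A minimal sketch of that last option, assuming the old volume's surviving
member is /dev/sdb, the temporary space is /srv/rescue, and the recreated
volume uses the same three drives (all of these names are placeholders):

  # btrfs restore works on the unmounted filesystem
  btrfs restore -v /dev/sdb /srv/rescue
  # recreate the volume and copy the data back
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd
  mount /dev/sdb /mnt
  cp -a /srv/rescue/. /mnt/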
* Re: How to replace a failed drive in btrfs RAID 1 filesystem
[not found] ` <CAMoswegyGSote6U3z+aE3fJ+ihPbsXLqUwY9K3GnmtjGSF7o0g@mail.gmail.com>
@ 2018-03-09 16:58 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2018-03-09 16:58 UTC
To: Paul Richards; +Cc: linux-btrfs
On 2018-03-09 11:53, Paul Richards wrote:
> Fantastic response! Thank you.
>
> I haven’t investigated how broken the failed drive is, I just shut down
> as soon as I noticed.
>
> The 3 drives were 8, 8 and 2 TB. The 2TB one failed and I’m replacing
> it with a new 8TB. So the new drive is indeed larger. If I do a
> “replace” I’ll end up with the same block distribution as before, so
> would likely want to balance afterwards.
Yes, you probably do, but you'll also need to resize the device first
(which I forgot to mention in my reply), as replace doesn't expand that
part of the volume to fill the new device.
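Something like this, assuming the replacement ends up as device ID 3 (the
ID and mount point are only examples):

  # grow the replaced device's slice of the volume to use the full 8TB
  btrfs filesystem resize 3:max /mnt
  # then rebalance to spread existing chunks across all three drives
  btrfs balance start /mnt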
>
> I think, but I’ll need to confirm, that I have enough free space to do a
> mount degraded, delete, remount non-degraded again, then add, and
> rebalance. This will leave me in degraded mode for the shortest time if
> my understanding is correct.
Assuming you can fit all the data on the two 8TB drives, then yes this
will result in the shortest amount of time running degraded (although,
if the failed drive is mostly working, you may not need to mount
degraded at all to do this), though keep in mind that this will also
result in significant load on the other disks and will give you degraded
performance for the longest amount of time.
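That sequence would look roughly like this, with placeholder device names
(/dev/sda as one of the surviving 8TB drives, /dev/sdd as the new one):

  mount -o degraded /dev/sda /mnt
  # shrink the volume onto the two working drives
  btrfs device delete missing /mnt
  # back out of degraded mode
  umount /mnt
  mount /dev/sda /mnt
  # once the new drive arrives
  btrfs device add /dev/sdd /mnt
  btrfs balance start /mnt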
>
> Thanks again for your notes, they should be on the wiki. :)
I've been meaning to add it for a while actually, I just haven't gotten
around to it yet.
>
>
>
> On Fri, 9 Mar 2018 at 16:43, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> On 2018-03-09 11:02, Paul Richards wrote:
> > Hello there,
> >
> > I have a 3 disk btrfs RAID 1 filesystem, with a single failed drive.
> > Before I attempt any recovery I’d like to ask what is the recommended
> > approach? (The wiki docs suggest consulting here before attempting
> > recovery[1].)
> >
> > The system is powered down currently and a replacement drive is being
> > delivered soon.
> >
> > Should I use “replace”, or “add” and “delete”?
> >
> > Once replaced should I rebalance and/or scrub?
> >
> > I believe that the recovery may involve mounting in degraded mode. If
> > I do this, how do I later get out of degraded mode, or if it’s
> > automatic how do I determine when I’m out of degraded mode?
> >
> It won't automatically mount degraded, you either have to explicitly ask
> it to, or you have to have an option to do so in your default mount
> options for the volume in /etc/fstab (which is dangerous for multiple
> reasons).
>
> Now, as to what the best way to go about this is, there are three things
> to consider:
>
> 1. Is the failed disk still usable enough that you can get good data off
> of it in a reasonable amount of time? If you're replacing the disk
> because of a lot of failed sectors, you can still probably get data off
> of it, while something like a head crash isn't worth trying to get data
> back.
> 2. Do you have enough room in the system itself to add another disk
> without removing one?
> 3. Is the replacement disk at least as big as the failed disk?
>
> If the answer to all three is yes, then just put in the new disk, mount
> the volume normally (you don't need to mount it degraded if the failed
> disk is working this well), and use `btrfs replace` to move the data.
> This is the most efficient option in terms of time, and it is also
> generally the safest (and I personally always over-spec drive-bays in
> systems we build where I work specifically so that this approach can be
> used).
>
> If the answer to the third question is no, put in the new disk (removing
> the failed one first if the answer to the second question is no), mount
> the volume (mount it degraded if one of the first two questions is no,
> normally otherwise), then add the new disk to the volume with `btrfs
> device add` and remove the old one with `btrfs device delete` (using the
> 'missing' option if you had to remove the failed disk). This is needed
> because the replace operation requires the new device to be at least as
> big as the old one.
>
> If the answer to either one or two is no but the answer to three is yes,
> pull out the failed disk, put in a new one, mount the volume degraded,
> and use `btrfs replace` as well (you will need to specify the device ID
> for the now missing failed disk, which you can find by calling `btrfs
> filesystem show` on the volume). In the event that the replace
> operation refuses to run in this case, instead add the new disk to the
> volume with `btrfs device add` and then run `btrfs device delete
> missing` on the volume.
>
> If you follow any of the above procedures, you don't need to balance
> (the replace operation is equivalent to a block level copy and will
> result in data being distributed exactly the same as it was before,
> while the delete operation is a special type of balance), and you
> generally don't need to scrub the volume either (though it may still be
> a good idea). As far as getting back from degraded mode, you can just
> remount the volume to do so, though I would generally suggest rebooting.
>
> Note that there are three other possible approaches to consider as well:
>
> 1. If you can't immediately get a new disk _and_ all the data will fit
> on the other two disks, use `btrfs device delete` to remove the failed
> disk anyway, and run with just the two until you can get a new disk.
> This is exponentially safer than running the volume degraded until you
> get a new disk, and is the only case you realistically should delete a
> device before adding the new one. Make sure to balance the volume after
> adding the new device.
> 2. Depending on the situation, it may be faster to just recreate the
> whole volume from scratch using a backup than it is to try to repair it.
> This is actually the absolute safest method of handling this
> situation, as it makes sure that nothing from the old volume with the
> failed disk causes problems in the future.
> 3. If you don't have a backup, but have some temporary storage space
> that will fit all the data from the volume, you could also use `btrfs
> restore` to extract files from the old volume to temporary storage,
> recreate the volume, and copy the data back in from the temporary
> storage.
>
* Re: How to replace a failed drive in btrfs RAID 1 filesystem
2018-03-09 16:43 ` Austin S. Hemmelgarn
[not found] ` <CAMoswegyGSote6U3z+aE3fJ+ihPbsXLqUwY9K3GnmtjGSF7o0g@mail.gmail.com>
@ 2018-03-10 9:37 ` waxhead
2018-03-10 10:27 ` Andrei Borzenkov
2 siblings, 0 replies; 6+ messages in thread
From: waxhead @ 2018-03-10 9:37 UTC
To: Austin S. Hemmelgarn, Paul Richards, linux-btrfs
Austin S. Hemmelgarn wrote:
> On 2018-03-09 11:02, Paul Richards wrote:
>> Hello there,
>>
>> I have a 3 disk btrfs RAID 1 filesystem, with a single failed drive.
>> Before I attempt any recovery I’d like to ask what is the recommended
>> approach? (The wiki docs suggest consulting here before attempting
>> recovery[1].)
>>
>> The system is powered down currently and a replacement drive is being
>> delivered soon.
>>
>> Should I use “replace”, or “add” and “delete”?
>>
>> Once replaced should I rebalance and/or scrub?
>>
>> I believe that the recovery may involve mounting in degraded mode. If
>> I do this, how do I later get out of degraded mode, or if it’s
>> automatic how do I determine when I’m out of degraded mode?
>>
> It won't automatically mount degraded, you either have to explicitly ask
> it to, or you have to have an option to do so in your default mount
> options for the volume in /etc/fstab (which is dangerous for multiple
> reasons).
>
> Now, as to what the best way to go about this is, there are three things
> to consider:
>
> 1. Is the failed disk still usable enough that you can get good data off
> of it in a reasonable amount of time? If you're replacing the disk
> because of a lot of failed sectors, you can still probably get data off
> of it, while something like a head crash isn't worth trying to get data
> back.
> 2. Do you have enough room in the system itself to add another disk
> without removing one?
> 3. Is the replacement disk at least as big as the failed disk?
>
> If the answer to all three is yes, then just put in the new disk, mount
> the volume normally (you don't need to mount it degraded if the failed
> disk is working this well), and use `btrfs replace` to move the data.
> This is the most efficient option in terms of time, and it is also
> generally the safest (and I personally always over-spec drive-bays in
> systems we build where I work specifically so that this approach can be
> used).
>
> If the answer to the third question is no, put in the new disk (removing
> the failed one first if the answer to the second question is no), mount
> the volume (mount it degraded if one of the first two questions is no,
> normally otherwise), then add the new disk to the volume with `btrfs
> device add` and remove the old one with `btrfs device delete` (using the
> 'missing' option if you had to remove the failed disk). This is needed
> because the replace operation requires the new device to be at least as
> big as the old one.
>
> If the answer to either one or two is no but the answer to three is yes,
> pull out the failed disk, put in a new one, mount the volume degraded,
> and use `btrfs replace` as well (you will need to specify the device ID
> for the now missing failed disk, which you can find by calling `btrfs
> filesystem show` on the volume). In the event that the replace
> operation refuses to run in this case, instead add the new disk to the
> volume with `btrfs device add` and then run `btrfs device delete
> missing` on the volume.
>
> If you follow any of the above procedures, you don't need to balance
> (the replace operation is equivalent to a block level copy and will
> result in data being distributed exactly the same as it was before,
> while the delete operation is a special type of balance), and you
> generally don't need to scrub the volume either (though it may still be
> a good idea). As far as getting back from degraded mode, you can just
> remount the volume to do so, though I would generally suggest rebooting.
>
> Note that there are three other possible approaches to consider as well:
>
> 1. If you can't immediately get a new disk _and_ all the data will fit
> on the other two disks, use `btrfs device delete` to remove the failed
> disk anyway, and run with just the two until you can get a new disk.
> This is exponentially safer than running the volume degraded until you
> get a new disk, and is the only case you realistically should delete a
> device before adding the new one. Make sure to balance the volume after
> adding the new device.
> 2. Depending on the situation, it may be faster to just recreate the
> whole volume from scratch using a backup than it is to try to repair it.
> This is actually the absolute safest method of handling this
> situation, as it makes sure that nothing from the old volume with the
> failed disk causes problems in the future.
> 3. If you don't have a backup, but have some temporary storage space
> that will fit all the data from the volume, you could also use `btrfs
> restore` to extract files from the old volume to temporary storage,
> recreate the volume, and copy the data back in from the temporary storage.
I did a quick scan of the wiki just to see, but I did not find any good
info about how to recover a "RAID"-like set once it is degraded. Information
about how to recover, and which profiles can be recovered from, would be
good to have (with examples) in a separate "how-to" on the wiki.
* Re: How to replace a failed drive in btrfs RAID 1 filesystem
2018-03-09 16:43 ` Austin S. Hemmelgarn
[not found] ` <CAMoswegyGSote6U3z+aE3fJ+ihPbsXLqUwY9K3GnmtjGSF7o0g@mail.gmail.com>
2018-03-10 9:37 ` waxhead
@ 2018-03-10 10:27 ` Andrei Borzenkov
2018-03-11 0:08 ` Duncan
2 siblings, 1 reply; 6+ messages in thread
From: Andrei Borzenkov @ 2018-03-10 10:27 UTC
To: Austin S. Hemmelgarn, Paul Richards, linux-btrfs
On 09.03.2018 19:43, Austin S. Hemmelgarn wrote:
>
> If the answer to either one or two is no but the answer to three is yes,
> pull out the failed disk, put in a new one, mount the volume degraded,
> and use `btrfs replace` as well (you will need to specify the device ID
> for the now missing failed disk, which you can find by calling `btrfs
> filesystem show` on the volume).
I do not see it, and I do not remember ever seeing the device ID of a
missing device.
10:/home/bor # blkid
/dev/sda1: UUID="ce0caa57-7140-4374-8534-3443d21f3edc" TYPE="swap"
PARTUUID="d2714b67-01"
/dev/sda2: UUID="cc072e56-f671-4388-a4a0-2ffee7c98fdb"
UUID_SUB="eaeb4c78-da94-43b3-acc7-c3e963f1108d" TYPE="btrfs"
PTTYPE="dos" PARTUUID="d2714b67-02"
/dev/sdb1: UUID="e4af8f3c-8307-4397-90e3-97b90989cf5d"
UUID_SUB="f421f1e7-2bb0-4a67-a18e-cfcbd63560a8" TYPE="btrfs"
PARTUUID="875525bf-01"
10:/home/bor # mount /dev/sdb1 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error.
10:/home/bor # mount -o degraded /dev/sdb1 /mnt
10:/home/bor # btrfs fi sh /mnt
Label: none  uuid: e4af8f3c-8307-4397-90e3-97b90989cf5d
        Total devices 2 FS bytes used 256.00KiB
        devid    2 size 1023.00MiB used 212.50MiB path /dev/sdb1
        *** Some devices missing
10:/home/bor # btrfs fi us /mnt
Overall:
    Device size:                   2.00GiB
    Device allocated:            425.00MiB
    Device unallocated:            1.58GiB
    Device missing:             1023.00MiB
    Used:                        512.00KiB
    Free (estimated):            912.62MiB      (min: 912.62MiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:               16.00MiB      (used: 0.00B)

Data,RAID1: Size:102.25MiB, Used:128.00KiB
   /dev/sdb1     102.25MiB
   missing       102.25MiB

Metadata,RAID1: Size:102.25MiB, Used:112.00KiB
   /dev/sdb1     102.25MiB
   missing       102.25MiB

System,RAID1: Size:8.00MiB, Used:16.00KiB
   /dev/sdb1       8.00MiB
   missing         8.00MiB

Unallocated:
   /dev/sdb1     810.50MiB
   missing       810.50MiB
10:/home/bor # rpm -q btrfsprogs
btrfsprogs-4.15-2.1.x86_64
10:/home/bor # uname -a
Linux 10 4.15.7-1-default #1 SMP PREEMPT Wed Feb 28 12:40:23 UTC 2018
(a36e160) x86_64 x86_64 x86_64 GNU/Linux
10:/home/bor #
And "missing" is not the answer because I obviously may have more than
one missing device.
* Re: How to replace a failed drive in btrfs RAID 1 filesystem
2018-03-10 10:27 ` Andrei Borzenkov
@ 2018-03-11 0:08 ` Duncan
0 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2018-03-11 0:08 UTC
To: linux-btrfs
Andrei Borzenkov posted on Sat, 10 Mar 2018 13:27:03 +0300 as excerpted:
> And "missing" is not the answer because I obviously may have more than
> one missing device.
"missing" is indeed the answer when using btrfs device remove. See the
btrfs-device manpage, which explains that if there's more than one device
missing, either just the first one described by the metadata will be
removed (if missing is only specified once), or missing can be specified
multiple times.
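In other words, something like this (the mount point is only a placeholder),
per the manpage behaviour described above:

  # remove the first missing device recorded in the metadata
  btrfs device remove missing /mnt
  # or name it twice to remove two missing devices in one go
  btrfs device remove missing missing /mnt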
raid6 with two devices missing is the only normal candidate for that
presently, though on-list we've seen aborted-add cases where it still
worked as well: while the metadata listed the new device, it didn't
actually hold any data yet when it became apparent it was bad and thus
needed to be removed again.
Note that because btrfs raid1 and raid10 only do two-way mirroring
regardless of the number of devices, and because of the per-chunk (as
opposed to per-device) nature of btrfs raid10, those modes can only
expect successful recovery with a single missing device, although as
mentioned above we've seen on-list at least one case where an aborted
device-add of a device found to be bad after the add didn't actually have
anything on it, so it could still be removed along with the device it was
originally intended to replace.
Of course the N-way-mirroring mode, whenever it eventually gets
implemented, will allow up to N-1 missing devices, and an N-way-parity
mode, if it's ever implemented, similarly. However, N-way-mirroring was
scheduled for after raid56 mode so it could make use of some of the same
code, and raid56 has of course taken years upon years to get merged and
stabilize. There's still no sign of N-way-mirroring patches, which based
on the raid56 case could take years to stabilize and debug after the
original merge, so the still somewhat iffy raid6 mode is likely to remain
the only normal use of multiple missing devices for years yet.
For btrfs replace, the manpage says the device ID is the only way to
refer to a missing device, but getting that ID, as you've indicated,
could be difficult. For
filesystems with only a few devices that haven't had any or many device
config changes, it should be pretty easy to guess (a two device
filesystem with no changes should have IDs 1 and 2, so if only one is
listed, the other is obvious, and a 3-4 device fs with only one or two
previous device changes, likely well remembered by the admin, should
still be reasonably easy to guess), but as the number of devices and the
number of device adds/removes/replaces increases, finding/guessing the
missing one becomes far more difficult.
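Taking the output earlier in this thread as an example: a two-device
filesystem where only devid 2 is still listed leaves devid 1 as the obvious
guess, so the replace would look something like this (the new device name
is a placeholder):

  btrfs replace start 1 /dev/sdc /mnt
  btrfs replace status /mnt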
Of course the sysadmin's first rule of backups states, in simple form,
that not having one == defining the value of the data as trivial, not
worth the trouble of a backup. That in turn means that at some point,
before there are /too/ many device change events, it's likely going to be
less trouble (particularly after factoring in reliability) to restore
from backups to a fresh filesystem than it is to do yet another device
change. Together with the current practical limits btrfs imposes on the
number of missing devices, that tends to impose /some/ limit on the
possibilities for missing device IDs, so the situation, while not ideal,
isn't yet /entirely/ out of hand either, because a successful guess based
on available information should be possible without /too/ many attempts.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman