* replacing a disk in a btrfs multi disk array with raid10
@ 2020-08-03 5:26 Norbert Preining
2020-08-03 6:15 ` Chris Murphy
0 siblings, 1 reply; 4+ messages in thread
From: Norbert Preining @ 2020-08-03 5:26 UTC
To: linux-btrfs
Hi all
(please Cc)
I am running Linux 5.7 or 5.8 on a btrfs array of 7 disks, with metadata
and data both on raid1, which contains the complete system.
(btrfs balance start -dconvert=raid1 -mconvert=raid1 /)
Although btrfs device stats / doesn't show any errors, SMART warns about
one disk (reallocated sector count property) and I was pondering
replacing the device.
What is the currently suggested method given that I cannot plug in
another disk into the computer, all slots are used up (thus a btrfs
replace will not work as far as I understand).
Do I need to:
- shutdown
- physically replace disk
- reboot into rescue system
- mount in degraded mode
- add the new device
- resize the file system (new disk would be bigger)
- start a new rebalancing
(for the rebalance, do I need to give the
same -dconvert=raid1 -mconvert=raid1 arguments?)
Thanks for any guidance (and please Cc)
All the best
Norbert
--
PREINING Norbert https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: replacing a disk in a btrfs multi disk array with raid10
2020-08-03 5:26 replacing a disk in a btrfs multi disk array with raid10 Norbert Preining
@ 2020-08-03 6:15 ` Chris Murphy
2020-08-03 7:47 ` Norbert Preining
2020-10-09 4:20 ` Norbert Preining
0 siblings, 2 replies; 4+ messages in thread
From: Chris Murphy @ 2020-08-03 6:15 UTC
To: Norbert Preining; +Cc: Btrfs BTRFS
On Sun, Aug 2, 2020 at 11:51 PM Norbert Preining <norbert@preining.info> wrote:
>
> Hi all
>
> (please Cc)
>
> I am running Linux 5.7 or 5.8 on a btrfs array of 7 disks, with metadata
> and data both on raid1, which contains the complete system.
> (btrfs balance start -dconvert=raid1 -mconvert=raid1 /)
>
> Although btrfs device stats / doesn't show any errors, SMART warns about
> one disk (reallocated sector count property) and I was pondering
> replacing the device.
Some of these are considered normal. I suggest making sure each
drive's SCT ERC value is less than the SCSI command timer. You want
the drive to give up on reading a sector before the kernel considers
the command "overdue" and does a link reset - losing the contents of
the command queue. Upon read error, the drive reports the sector LBA
so that Btrfs can automatically do a fixup.
More info here. It applies to mdadm, lvm, and Btrfs raid.
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
Once you've done that, do a btrfs scrub.
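That check can be sketched like this (the drive name and values are examples, not from this thread; smartctl comes from smartmontools):

```shell
# Hypothetical drive /dev/sda.
# Query the drive's SCT ERC setting (reported in deciseconds):
#   smartctl -l scterc /dev/sda
# Set read/write ERC to 7.0 seconds:
#   smartctl -l scterc,70,70 /dev/sda
# The kernel's SCSI command timer, in seconds (default is usually 30):
#   cat /sys/block/sda/device/timeout

# The invariant to maintain: ERC (deciseconds) below the command timer.
erc_ds=70      # 7.0 s, as set above
timeout_s=30   # kernel default
if [ $((erc_ds / 10)) -lt "$timeout_s" ]; then
    echo "ok: drive gives up before the kernel resets the link"
else
    echo "mismatch: raise the command timer or lower SCT ERC"
fi
```

If a drive does not support SCT ERC at all, the usual workaround from the wiki page is the opposite direction: raise the command timer (e.g. echo 180 > /sys/block/sda/device/timeout) so the kernel outlasts the drive's internal retries.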
>
> What is the currently suggested method given that I cannot plug in
> another disk into the computer, all slots are used up (thus a btrfs
> replace will not work as far as I understand).
btrfs replace will work whether the drive is present or not. It's just
safer to do it with the drive present because you don't have to mount
degraded.
> Do I need to:
> - shutdown
> - physically replace disk
> - reboot into rescue system
> - mount in degraded mode
> - add the new device
Use 'btrfs replace'
> - resize the file system (new disk would be bigger)
Currently 'btrfs replace' does require a separate resize step. 'device
add' doesn't; the resize is implied by the command.
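For reference, the replace-then-resize sequence looks roughly like this (the devid and device name are hypothetical; find the real devid with 'btrfs filesystem show /'):

```shell
new_dev=/dev/sdg   # hypothetical replacement disk
old_devid=3        # hypothetical devid of the failing disk
# -r reads from the other mirror copies where possible instead of
# the flaky source device:
#   btrfs replace start -r "$old_devid" "$new_dev" /
#   btrfs replace status /
# The new device inherits the old devid, but replace only uses the
# old device's size, so grow the bigger disk afterwards:
#   btrfs filesystem resize "${old_devid}:max" /
```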
> - start a new rebalancing
> (for the rebalance, do I need to give the
> same -dconvert=raid1 -mconvert=raid1 arguments?)
Not necessary. But it's worth checking 'btrfs fi us -T' and making
sure everything is raid1 as you expect.
--
Chris Murphy
* Re: replacing a disk in a btrfs multi disk array with raid10
2020-08-03 6:15 ` Chris Murphy
@ 2020-08-03 7:47 ` Norbert Preining
2020-10-09 4:20 ` Norbert Preining
1 sibling, 0 replies; 4+ messages in thread
From: Norbert Preining @ 2020-08-03 7:47 UTC
To: Btrfs BTRFS
Hi Chris,
thanks for your answer, that is very much appreciated.
On Mon, 03 Aug 2020, Chris Murphy wrote:
> Some of these are considered normal. I suggest making sure each
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
Thanks, will read up on that.
> Once you've done that, do a btrfs scrub.
Happening regularly, but I will kick one off anyway.
> btrfs replace will work whether the drive is present or not. It's just
> safer to do it with the drive present because you don't have to mount
> degraded.
Ok.
I wasn't sure whether I could mount without -o degraded, because all
the metadata and data are on raid1. And then, I don't know what the
Debian initramfs would do - that is probably the more interesting
surprise.
> > - add the new device
>
> Use 'btrfs replace'
Thanks, noted.
> Currently 'btrfs replace' does require a separate resize step. 'device
> add' doesn't, resize is implied by the command.
This is a logical approach, I agree.
> > - start a new rebalancing
> > (for the rebalance, do I need to give the
> > same -dconvert=raid1 -mconvert=raid1 arguments?)
>
> Not necessary. But it's worth checking 'btrfs fi us -T' and making
> sure everything is raid1 as you expect.
Thanks, good to know.
Again, thanks a lot for all the details - I couldn't deduce most of them
from the wiki page on multiple devices. Your email is extremely helpful!
All the best
Norbert
--
PREINING Norbert https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
* Re: replacing a disk in a btrfs multi disk array with raid10
2020-08-03 6:15 ` Chris Murphy
2020-08-03 7:47 ` Norbert Preining
@ 2020-10-09 4:20 ` Norbert Preining
1 sibling, 0 replies; 4+ messages in thread
From: Norbert Preining @ 2020-10-09 4:20 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
Hi Chris,
(please Cc)
sorry for the late reply - real life.
It turned out that the disk I use is well known to misreport this
attribute, so the warning can be ignored.
But I had to deal with (temporary) loss of one disk. Fortunately,
Debian's initramfs dropped me into a proper shell where I could mount
the array in degraded mode and just remove the device.
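The rescue-shell steps described here amount to something like the following (device names are examples, not from this thread):

```shell
member=/dev/sda1   # hypothetical: any surviving member of the array
mnt=/mnt           # wherever the rescue shell mounts the array
# Mount read-write despite the absent device:
#   mount -o degraded "$member" "$mnt"
# 'missing' tells btrfs to drop the absent device; with raid1 it then
# restores redundancy from the remaining copies:
#   btrfs device remove missing "$mnt"
```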
Just one hiccup: **after** some time I could re-connect the one disk
from the array that had been missing (I needed an x1 NVMe extender,
which I didn't have at the beginning). I thought reconnecting would be
as simple as
btrfs device add -f /dev/nvme0n1p1 /
but it turned out that, because the disk had been part of the array, it
was rejected. Even the -f option did not work. In the end I had to
fdisk the drive and trash the partition table and btrfs signature to
get it ready to be re-added.
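For what it's worth, wipefs would likely have done the same job with less effort; a sketch (the device name is hypothetical, and wipefs -a is destructive, so triple-check the target):

```shell
dev=/dev/nvme0n1p1   # hypothetical: the stale, returning disk
# List the old filesystem signatures that make the re-add fail:
#   wipefs "$dev"
# Erase them; afterwards 'btrfs device add' treats the disk as new:
#   wipefs -a "$dev"
#   btrfs device add "$dev" /
# ',soft' only rebalances chunks not already in the target profile:
#   btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /
```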
Full story https://www.preining.info/blog/2020/09/dealing-with-lost-disks-in-a-btrfs-array/
Anyway, all surprisingly smooth. Thanks to all of you.
Best
Norbert
On Mon, 03 Aug 2020, Chris Murphy wrote:
> On Sun, Aug 2, 2020 at 11:51 PM Norbert Preining <norbert@preining.info> wrote:
> >
> > Hi all
> >
> > (please Cc)
> >
> > I am running Linux 5.7 or 5.8 on a btrfs array of 7 disks, with metadata
> > and data both on raid1, which contains the complete system.
> > (btrfs balance start -dconvert=raid1 -mconvert=raid1 /)
> >
> > Although btrfs device stats / doesn't show any errors, SMART warns about
> > one disk (reallocated sector count property) and I was pondering
> > replacing the device.
>
> Some of these are considered normal. I suggest making sure each
> drive's SCT ERC value is less than the SCSI command timer. You want
> the drive to give up on reading a sector before the kernel considers
> the command "overdue" and does a link reset - losing the contents of
> the command queue. Upon read error, the drive reports the sector LBA
> so that Btrfs can automatically do a fixup.
>
> More info here. It applies to mdadm, lvm, and Btrfs raid.
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>
> Once you've done that, do a btrfs scrub.
>
> >
> > What is the currently suggested method given that I cannot plug in
> > another disk into the computer, all slots are used up (thus a btrfs
> > replace will not work as far as I understand).
>
> btrfs replace will work whether the drive is present or not. It's just
> safer to do it with the drive present because you don't have to mount
> degraded.
>
>
> > Do I need to:
> > - shutdown
> > - physically replace disk
> > - reboot into rescue system
> > - mount in degraded mode
> > - add the new device
>
> Use 'btrfs replace'
>
> > - resize the file system (new disk would be bigger)
>
> Currently 'btrfs replace' does require a separate resize step. 'device
> add' doesn't; the resize is implied by the command.
>
>
> > - start a new rebalancing
> > (for the rebalance, do I need to give the
> > same -dconvert=raid1 -mconvert=raid1 arguments?)
>
> Not necessary. But it's worth checking 'btrfs fi us -T' and making
> sure everything is raid1 as you expect.
--
PREINING Norbert https://www.preining.info
Accelia Inc. + IFMGA ProGuide + TU Wien + JAIST + TeX Live + Debian Dev
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13