* Resurrecting a Dirty RAID-5
From: jtroan @ 2015-07-08 12:14 UTC
  To: linux-raid


I've accidentally toasted a software RAID-5 on one of my VM hosts, taking a
small number of VMs with it.  (Fortunately, no unique data was lost as
every server has a "twinned" system or the data can be regenerated from
elsewhere via script.)

Short background -- the system is RHEL 6.5 and has a three-drive RAID-5
with most of the space.  (/boot is a RAID-1 and it seems to be fine in a
degraded state.)  The system locked up when (presumably) one of the drives
died.  Upon reboot, I saw errors for "ata2", so I naturally forgot that the
drives started at ata0 and unplugged the drive at "SATA2" (DOH!).  The
system threw errors and halted its boot when the root file system was no
longer available.  I then switched the cables around and brought up ata0
and ata1, but the RAID won't start because the array is dirty and degraded.

I booted from the RHEL 6.5 DVD and entered rescue mode.  Using information
from the Linux RAID wiki [1], I was able to confirm that my sda and sdb
drives are alive and have all three partitions I had originally deployed to
them -- /boot, /, and swap space.
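
(For anyone retracing this: per the wiki, the checks amount to something
like the following -- device names assumed:

mdadm --examine /dev/sda2 /dev/sdb2
cat /proc/mdstat

which shows whether the md superblocks on each member are still intact.)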

Following the advice on the wiki [1], I'm asking for some help in using
mdadm from rescue mode to get this dirty RAID-5 back on its feet with the
surviving two drives.  (I've got a third drive already on hand that I'll
then use to bring the RAID back to full robustness.)

[1] https://raid.wiki.kernel.org/index.php/RAID_Recovery


Many thanks for any assistance that can be provided.....

=======================================================================
John M. Troan  <jtroan@jt-sw.com>
Maintainer: Football Site @ JT-SW.com
  http://www.jt-sw.com/football
Chief of Computer Operations
  U.S.S. Kitty Hawk / NCC-1659
=======================================================================



* Re: Resurrecting a Dirty RAID-5
From: Phil Turmel @ 2015-07-08 13:19 UTC
  To: jtroan, linux-raid

Good morning John,

On 07/08/2015 08:14 AM, jtroan@jt-sw.com wrote:

[trim /]

> I booted from the RHEL 6.5 DVD and entered rescue mode.  Using information
> from the Linux RAID wiki [1], I was able to confirm that my sda and sdb
> drives are alive and have all three partitions I had originally deployed to
> them -- /boot, /, and swap space.
> 
> Following the advice on the wiki [1], I'm asking for some help in using
> mdadm from rescue mode to get this dirty RAID-5 back on its feet with the
> surviving two drives.  (I've got a third drive already on hand that I'll
> then use to bring the RAID back to full robustness.)

From this report, you should only need to do a forced assembly from the
rescue environment with the good devices.  Like so (substituting actual
names):

mdadm -Afv /dev/md2 /dev/sda2 /dev/sdb2

If that fails, paste the verbose output in your reply here.

If the above succeeds, you may shut down, plug in your new drive, and
boot into your normal environment (still degraded, but should be
bootable).  Then add the new drive's partitions to each array.
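
For example, if the new drive shows up as /dev/sdc (name assumed; partition
it to match the other two first):

mdadm /dev/md1 --add /dev/sdc1   # the /boot mirror
mdadm /dev/md2 --add /dev/sdc2   # the RAID-5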

Don't do *anything* else.

HTH,

Phil


* Re: Resurrecting a Dirty RAID-5
From: jtroan @ 2015-07-09  2:57 UTC
  To: Phil Turmel; +Cc: linux-raid

Many, many thanks for the solution.  I was able to run a pair of mdadm -A
commands to stitch both RAID devices back together, and the system now
boots completely on the two drives.
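
(Those were Phil's forced assembly, once per array -- along the lines of
the following, with the real device and md names:

mdadm -Afv /dev/md1 /dev/sda1 /dev/sdb1
mdadm -Afv /dev/md2 /dev/sda2 /dev/sdb2
)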

As expected, it did note the degraded md1 since the third drive was still
missing.  I'll install it when I have a few more spare moments, and have
left the system powered down for now.

I'm also thinking about adding a fourth drive and trying to configure it as
a hot spare, giving me some extra margin for failure.

=======================================================================
John M. Troan  <jtroan@jt-sw.com>
Maintainer: Football Site @ JT-SW.com
  http://www.jt-sw.com/football
Chief of Computer Operations
  U.S.S. Kitty Hawk / NCC-1659
=======================================================================




* Re: Resurrecting a Dirty RAID-5
From: Mikael Abrahamsson @ 2015-07-12  5:57 UTC
  To: jtroan; +Cc: linux-raid

On Wed, 8 Jul 2015, jtroan@jt-sw.com wrote:

> I'm also thinking about adding a fourth drive and trying to configure it
> as a hot spare, giving me some extra margin for failure.

I would recommend you turn your raid5 into raid6 instead of having a 
raid5+spare. This can be done with a fairly simple command, without 
downtime.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Resurrecting a Dirty RAID-5
From: jtroan @ 2015-07-12 15:01 UTC
  To: Mikael Abrahamsson; +Cc: linux-raid

> On Wed, 8 Jul 2015, jtroan@jt-sw.com wrote:
>
> > I'm also thinking about adding a fourth drive and trying to configure it
> > as a hot spare, giving me some extra margin for failure.
>
> I would recommend you turn your raid5 into raid6 instead of having a
> raid5+spare. This can be done with a fairly simple command, without
> downtime.

I like the idea of using RAID-6 for / (for all the VMs under /var).  (I
figure I'll probably still need a spare on my RAID-1 device for /boot.)

What's the (mdadm?) command to convert an MD from RAID-5 to RAID-6?

Thanks.....

=======================================================================
John M. Troan  <jtroan@jt-sw.com>
Maintainer: Football Site @ JT-SW.com
  http://www.jt-sw.com/football
Chief of Computer Operations
  U.S.S. Kitty Hawk / NCC-1659
=======================================================================



* Re: Resurrecting a Dirty RAID-5
From: Adam Goryachev @ 2015-07-12 22:33 UTC
  To: jtroan, Mikael Abrahamsson; +Cc: linux-raid

On 13/07/15 01:01, jtroan@jt-sw.com wrote:
>> On Wed, 8 Jul 2015, jtroan@jt-sw.com wrote:
>>
>>> I'm also thinking about adding a fourth drive and trying to configure it
>>> as a hot spare, giving me some extra margin for failure.
>> I would recommend you turn your raid5 into raid6 instead of having a
>> raid5+spare. This can be done with a fairly simple command, without
>> downtime.
>>
> I like the idea of using RAID-6 for / (for all the VMs under /var).  (I
> figure I'll probably still need a spare on my RAID-1 device for /boot.)
>
> What's the (mdadm?) command to convert an MD from RAID-5 to RAID-6?
>
I think the standard option would be to have a hot spare, and then 
something like:
mdadm /dev/mdX --grow --level=raid6
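
Spelled out a bit more (the /dev/sdd2 name is just an example, and mdadm
may insist on a --backup-file for the critical section of the reshape):

mdadm /dev/mdX --add /dev/sdd2          # new disk joins as a spare
mdadm --grow /dev/mdX --level=6 --raid-devices=4 \
      --backup-file=/root/md-grow.bak   # spare is absorbed by the reshape

The reshape then runs in the background while the array stays online.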

Also, for your raid1, never have a hot spare, just do this:
mdadm /dev/mdX --grow --raid-devices=3
Then you will always have all your data replicated on all three drives, 
so again, no hot spare required.
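
(Same caveat on names -- the new member has to be added before the array
can grow onto it:

mdadm /dev/mdX --add /dev/sdd1
mdadm --grow /dev/mdX --raid-devices=3

and the third copy then resyncs in the background.)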

Also remember to install grub (or whatever boot loader you use) on the new
drive....
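
On RHEL 6 that's GRUB legacy, so something like (drive name assumed):

grub-install /dev/sdd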

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au


* Re: Resurrecting a Dirty RAID-5
From: Can Jeuleers @ 2015-07-13  6:31 UTC
  To: linux-raid

On 13/07/15 00:33, Adam Goryachev wrote:
> Also, for your raid1, never have a hot spare, just do this:
> mdadm /dev/mdX --grow --raid-devices=3
> Then you will always have all your data replicated on all three drives,
> so again, no hot spare required.

Never say never, as there are valid use cases for having hot spares in a
RAID1 set.

My own use case is that I want to be reasonably assured that my spare
won't fail at around the same time as the active disks (due to having
the same age and having been subjected to exactly the same workload).


* Re: Resurrecting a Dirty RAID-5
From: Adam Goryachev @ 2015-07-13  7:15 UTC
  To: Can Jeuleers, linux-raid

On 13/07/15 16:31, Can Jeuleers wrote:
> On 13/07/15 00:33, Adam Goryachev wrote:
>> Also, for your raid1, never have a hot spare, just do this:
>> mdadm /dev/mdX --grow --raid-devices=3
>> Then you will always have all your data replicated on all three drives,
>> so again, no hot spare required.
> Never say never, as there are valid use cases for having hot spares in a
> RAID1 set.
>
> My own use case is that I want to be reasonably assured that my spare
> won't fail at around the same time as the active disks (due to having
> the same age and having been subjected to exactly the same workload).

Yes, this is true too... though with physical HDDs, I would expect there
are enough differences between drives that they will fail at different
times; even SSDs should have enough variance.

Of course, the other option is to purchase the drives at different times
(eg, one month apart) so they are also from different batches, as well as
having a month or two of difference in how they are used.

Of course, if both drives in the RAID1 fail at the exact same moment, how
is a hot spare better than a three-drive raid1?  Wouldn't the chance of
three drives failing at the same critical moment be less than the chance
of two drives failing at the same time (or one drive failing, and then the
second failing under the increased load of a resync)?
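
(Back-of-the-envelope, assuming -- unrealistically -- independent failures
with probability p during the critical window:

awk 'BEGIN { p = 0.01; printf "2 of 2 fail: %g   3 of 3 fail: %g\n", p^2, p^3 }'

so the three-way mirror is roughly a factor of 1/p less likely to lose
everything, under that naive model.)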

I really have no idea about the actual statistical numbers/chances, but it
sounds like a valid question to me...

PS: of course, you should never say never, so I do still agree with you;
someone, somewhere might have a reason to do it differently.  However, if
they really do, then they should know better than me.

Regards,
Adam

-- 
Adam Goryachev
Website Managers
P: +61 2 8304 0000                    adam@websitemanagers.com.au
F: +61 2 8304 0001                     www.websitemanagers.com.au

