[rescue] Drive Replacement Question

Dan Sikorski me at dansikorski.com
Fri Sep 7 16:54:16 CDT 2007


Ahmed Ewing wrote:
> I came across a great one where a customer couldn't figure out why he
> couldn't restore his RAID5 metadevice from its "Needs Maintenance"
> state after a proper replacement of the offending disk. At first
> glance at the metastat, I missed it too. Turns out, the guy's
> predecessor saw it fit to use *two slices per disk* to make the RAID5,
> so when a single disk failed the RAID5 lost two members and all data
> was immediately lost. There was a hot spare pool configured, but they
> never associated it with any volumes, so it sat idly by. Not that it
> would have been able to do anything, unless there were read/write
> errors on only one of the two slices and the reconstruction had time
> to complete before errors were detected on the other slice... A
> further check of servers at the site showed that several others were
> configured with this ticking timebomb configuration... it was the
> definition of a hot sticky mess.
>   
Amazing what you find in some places... I inherited a mail server that
had six hard drives on a RAID controller. I was horrified when I
figured out months later that the single volume the controller
presented to the host was a concatenated volume. Not only did it have no
redundancy, it didn't even stripe the data. Murphy's law failed
on that one, because none of the disks did.
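
For what it's worth, the two-slices-per-disk trap described in the
quoted message is easy to check for mechanically. A rough sketch in
Python, assuming Solaris-style cXtYdZsN slice names; the device names
and both helper functions are made up for illustration, and the member
list would have to be copied out of the metastat output by hand:

```python
import re
from collections import defaultdict


def disk_of(slice_name):
    """Drop a trailing slice suffix (s0, s1, ...) to get the disk name.

    Assumes Solaris-style names such as c1t0d0s0 -> c1t0d0.
    """
    return re.sub(r"s\d+$", "", slice_name)


def shared_disks(members):
    """Return {disk: [slices]} for each disk backing more than one member."""
    by_disk = defaultdict(list)
    for member in members:
        by_disk[disk_of(member)].append(member)
    return {disk: slices for disk, slices in by_disk.items() if len(slices) > 1}


# Hypothetical member lists: two slices per disk vs. one slice per disk.
bad = ["c1t0d0s0", "c1t0d0s1", "c1t1d0s0", "c1t1d0s1", "c1t2d0s0", "c1t2d0s1"]
good = ["c1t0d0s0", "c1t1d0s0", "c1t2d0s0"]

for name, members in (("bad", bad), ("good", good)):
    problems = shared_disks(members)
    if problems:
        print(name, "- one disk failure takes out several members:", problems)
    else:
        print(name, "- every member is on its own disk")
```

Any disk that shows up in the result would take two RAID5 members with
it when it dies, which is more than single parity can survive.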

    -Dan Sikorski


