[rescue] Drive Replacement Question

Ahmed Ewing aewing at gmail.com
Thu Sep 6 20:32:34 CDT 2007


On 9/6/07, Brian Deloria <bdeloria at gmail.com> wrote:
> I was mistaken when I originally said that the metadb's are only on the root
> devices.  The guy that set this up put one on each drive.  2x36 and 2x300
> drives.

That's fine as long as you have enabled the /etc/system flag I gave
earlier, or you don't mind the single-user bootup game occasionally.

For more than 2 disks, my recommendation would be to set it up so
that no matter how many replicas you decide to use, the root disks
together account for more than 50% of the total. Consider an
example host with two root submirrors, and an attached D130 for data,
so 5 disks total with a metadb on each... removing your D130, or even
a connectivity hiccup on it during boot, means you fail to achieve
quorum (2/5 or 40% replicas remaining). I'd double up on each of the
root disks so that it's 7 total, and the loss of 3 doesn't bring
everything down.
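
To make the arithmetic concrete, here is a small Python sketch of the
replica counting in the example above. It is only an illustration: the
function names and the disk labels are made up for this message and are
not part of any Solaris Volume Manager tool.

```python
# Sketch of the replica-majority arithmetic from the example above.
# All names here are hypothetical; nothing below talks to a real metadb.

def surviving_percent(replicas_per_disk, lost_disks):
    """Return the percentage of replicas left after losing some disks.

    replicas_per_disk maps a disk label to its replica count.
    lost_disks is the set of disk labels that are unavailable.
    """
    total = sum(replicas_per_disk.values())
    left = sum(n for disk, n in replicas_per_disk.items()
               if disk not in lost_disks)
    return 100.0 * left / total


def has_majority(replicas_per_disk, lost_disks):
    """True when strictly more than half of all replicas survive."""
    total = sum(replicas_per_disk.values())
    left = sum(n for disk, n in replicas_per_disk.items()
               if disk not in lost_disks)
    return 2 * left > total


# Two root disks plus three D130 disks, one replica each (5 total).
one_each = {"root0": 1, "root1": 1, "d130_0": 1, "d130_1": 1, "d130_2": 1}
# Same host with the replicas doubled up on the root disks (7 total).
doubled = {"root0": 2, "root1": 2, "d130_0": 1, "d130_1": 1, "d130_2": 1}
d130 = {"d130_0", "d130_1", "d130_2"}

for label, layout in (("one each", one_each), ("doubled roots", doubled)):
    pct = surviving_percent(layout, d130)
    print(f"{label}: {pct:.0f}% left, majority={has_majority(layout, d130)}")
```

With one replica per disk, losing the D130 leaves 2 of 5; with the root
disks doubled up, it leaves 4 of 7, which is still a majority.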

This is akin to Phil's suggestion, except that the number of replicas
on the root disks would float so that together they are never
outnumbered by the collective external storage. 3 apiece for each
6-slot Multipack, 6 apiece for each 12-slot D1000, etc. Even at the new
larger size of 4 MB per replica, the space used is still negligible on
today's cavernous disks.

But I digress...

> The drive wasn't marked for maintenance; however, it was complaining
> about needing to be replaced in dmesg.  I did a metadetach to the
> submirrors, cleared the metadb state information from the drive.  I then
> tried using cfgadm replace_device.  It bitched a whole lot about the
> drive erroring and wanted me to remove the other submirrors (good) from the
> mirror.  I just laughed, halted the machine, powered it off, and replaced the
> drive.

I would have done the same. No point in diddling with the hotswap
stuff if the hardware doesn't require it (like the T3) and you have
the luxury of being able to bring the box down.

> Booted it back up, dumped the vtoc from the good drive to the new drive,
> added the metadb information to it, then I tried using metareplace and
> metattach to put the submirrors back on to the mirror. It bitched that the
> device was not found.  Ran drvconfig, devlinks, disks, tried it again, no
> dice.  Went back to the man pages and the walkthroughs on Sun's site, found
> no mention of metadevadm.  Eventually found it with Google, updated the
> metadevice information on the drive, and the mirrors attached just fine.

Great! Glad it worked out, one way or another.

> It seems that I could have more easily just halted the machine pulled the
> disk put the new one in there and ran metareplace and been fine although I
> suspect I would have had to run metadevadm and then metareplace.  Anyone
> want to comment on this being easier / more dangerous?

Honestly, I know my procedure seemed long-winded vs. the other option
(hence my original disclaimer), but in my experience with the
metareplace -e method there's *much* more trouble than meets the
eye. One of the pitfalls is that the box can fail the disk on the next
reboot if the DevIDs get screwed up and it reads the new one but
expects the old one.

Of course, I'm all about hands on experimentation though, so next time
you have a box being staged for production and you're ahead of
schedule with time to play, pull a disk then use the metareplace -e /
metadevadm -u after inserting a different disk in its place. Use
"metastat | tail" to track the DevIDs before the swap, after the swap,
after a metadevadm, and after a reboot, and let us know how it goes.
Heck, try the metadevadm before the metareplace while you're at it.
But from my stance in a support role, Sun appears to only suggest the
metareplace -e for replacement of RAID-5 components... so when in Rome
with $FORTUNE100CORP's data availability at stake, I do as the Romans
do. All I know is, it seems to make for much shorter conference calls
on average :-)
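
If you do run that experiment, one way to compare the saved "metastat |
tail" output from each stage is a plain text diff. The Python sketch
below assumes you have already captured each stage's output as a string;
the sample lines and ID strings are invented placeholders, not real
DevIDs, and the helper name is hypothetical.

```python
import difflib


def diff_snapshots(before, after, before_label="before", after_label="after"):
    """Return the unified-diff lines between two saved text snapshots."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_label,
        tofile=after_label,
        lineterm="",
        n=0,  # no context lines; show only what changed
    ))


# Placeholder text standing in for saved "metastat | tail" output.
# The device name and ID strings are invented for illustration.
before_swap = "c0t1d0   Yes   id1,sd@EXAMPLE_OLD_DISK"
after_metadevadm = "c0t1d0   Yes   id1,sd@EXAMPLE_NEW_DISK"

for line in diff_snapshots(before_swap, after_metadevadm,
                           "before swap", "after metadevadm"):
    print(line)
```

An empty result between two stages means the recorded text did not
change; any "-" / "+" pair shows exactly which line changed.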

If you have access to someone with a SunSolve account, take a look at
InfoDoc 73132, which now appears to endorse the steps I gave you
above, including the necessity for metadevadm.

Hope that helps,

-A


