[rescue] Disksuite 4.2.1 help

Carl R. Friend crfriend at rcn.com
Tue Jul 19 15:27:16 CDT 2011


    Greetings Rescuers,

    Here's one that is rescue-worthy, and still toiling along faith-
fully after all these years -- a Sun Netra NFS (a rebadged Enterprise
150).  This box has seen a boatload of upgrades since beginning her
association with me in the mid 1990s and two jobs ago, including more
memory, replacing most of the 4-Gig disks with 36es, and jumping from
Solaris 2.5.1 with the Netra NFS custom bits to the latest Solaris 8
that there was to be had.

    What's biting me in the backside now is an array problem.  Drives
die every so often (I need to go in and re-crimp all the connectors
on the SCSI chain), so I need to have a hot-spare at all times.  The
thing kicked a drive out again a while ago, allocated the hot-spare,
recalced the RAID, then ran silently for about three weeks before
another drive ran out of bad-block reallocation sectors and put the
array back into a Maintenance state.

    Here's the configuration:  The OS disk is a 4 Gigger at c0t0d0
with a mirror in the lower bay (c1t15d0); there used to be a hot-
spare for that set in c1t14d0 but that was removed and the hot-spare
pool destroyed.  The main array runs from c1t2d0 through c1t12d0 with
a hot-spare in c1t13d0.  I now want to add another 36 Gig drive in
c1t14d0, and when I try to add it to the hot-spare pool I get this:

bash-2.03# metahs -a hsp001 c1t14d0s0
metahs: arachne: c1t14d0s0: overlaps with device in d127
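
    I don't believe any state-database replicas ever lived on that
slice, but I can post the replica listing too if anyone suspects the
old OS-mirror hot-spare left something behind there; that's just the
stock command:

bash-2.03# metadb -i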

In this case, d127 is the main RAID in which there are several soft
partitions that contain filesystems:

bash-2.03# metastat d127
d127: RAID
     State: Okay
     Hot spare pool: hsp001
     Interlace: 32 blocks
     Size: 536888871 blocks
Original device:
     Size: 536889088 blocks
         Device      Start Block  Dbase State        Hot Spare
         c1t2d0s0         330     No    Okay
         c1t3d0s0         330     No    Okay
         c1t4d0s0         330     No    Okay
         c1t5d0s0         330     No    Okay
         c1t8d0s0         330     No    Okay
         c1t9d0s0         330     No    Okay
         c1t10d0s0        330     No    Okay
         c1t11d0s0        330     No    Okay
         c1t12d0s0        330     No    Okay

bash-2.03# metastat hsp001
hsp001: 1 hot spare
         c1t13d0s0       Available       67111470 blocks
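
    Nothing in d127's component list mentions c1t14d0, as you can see,
so the only other place I can think to look is the concise dump.  I can
post the whole thing if it helps, but a quick grep boils it down to any
metadevice or soft partition still claiming that slice (metastat -p is
the md.tab-style output, assuming I'm reading the 4.2.1 man page right):

bash-2.03# metastat -p | grep c1t14d0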

    There must be some remnant memory of something that's causing the
error, and I can't for the life of me, nor with Google's help, find
anything about it.  If
need be, I can back the array up, destroy it -- and the rest of the
Disksuite configuration -- then recreate from scratch and restore it,
but I'd really rather not go to that length because it takes about
20 hours just to initialise the main RAID set (resyncing one of the
36 Giggers takes about 16 hours).
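
    Before going that route I'm tempted to poke at the disk itself, on
the theory that the "remnant" is either a stale soft-partition watermark
on the slice or a VTOC that overlaps something.  These are the checks
I'd try first; prtvtoc is plain Solaris, and the metarecover dry-run
syntax is my reading of the man page rather than something I've actually
exercised on this box:

bash-2.03# prtvtoc /dev/rdsk/c1t14d0s2
bash-2.03# metarecover -n -v c1t14d0s0 -p -d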

    Is there any "magic bullet" for this, or am I condemned to spend
several days' worth of time rebuilding from scratch?

    Ideas are welcome.  Thanks in advance.  Suggestions to "upgrade to
Linux" will go to NULL:  ;-)

    Cheers!

+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)            | West Boylston       |
| Minicomputer Collector / Enthusiast            | Massachusetts, USA  |
| mailto:crfriend at rcn.com                     +---------------------+
| http://users.rcn.com/crfriend/museum           | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+

