[rescue] Sun 420r and D1000 Array Problems

Ahmed Ewing aewing at gmail.com
Mon Jul 16 00:32:26 CDT 2007


On 7/15/07, Chris Brandstetter <sirloxelroy at gmail.com> wrote:
> Dear All,
>      I have never really used a Sun D1000 system before and seem to have a
> problem.  I installed Sol 10 on the 420r.  It has a Symbios dual channel

The A1000 (D1000 with hardware RAID controller) is no longer supported
in Solaris 10. Don't know about the D1000 exactly, but since it's a
JBOD there really is no reason why it wouldn't work with Sol10. So
that's good.

> differential scsi controller in it.  The D1000 has 12 9.1GB Hard Drives in
> it.  The dip switches on the D1000 are up up down down up.  I have also
> tried up down down down up.

Heh, I hate D1000s sometimes.

I assume you are reading them from left to right while the SCSI
interface board is installed in the chassis. However, the board slides
in upside down (i.e., solder-side up, component-side down), and while
I don't have one in front of me right now, I'm pretty sure the dips
are labeled 5-4-3-2-1 accordingly. This would make your current config
correspond to the following:

5 - UP - Reserved (default is down)
4 - UP - 12sec x ID# delay start (default)
3 - DOWN - Use dip 4 setting for start behavior (default)
2 - DOWN - Left half of array uses IDs 0-5 (default)
1 - UP - Right half of array uses IDs 8-13 (default)

I would definitely get that reserved switch back to the default
position before proceeding, to remove that variable from the equation.
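
As a quick sanity check on a switch reading, here is a minimal shell sketch. It assumes the 5-4-3-2-1 left-to-right labeling and the default positions exactly as recalled in this message (neither verified against the D1000 manual), and `check_dips` is a made-up helper name, not a Sun tool.

```shell
# Compare five dip positions, given as seen left to right with the
# interface board installed, against the defaults listed in this message.
# Assumption: leftmost switch is 5, rightmost is 1.
check_dips() {
    if [ "$#" -ne 5 ]; then
        echo "usage: check_dips POS POS POS POS POS (each up or down)" >&2
        return 2
    fi
    label=5
    status=0
    # Defaults for switches 5, 4, 3, 2, 1 in that order (from memory).
    for default in down up down down up; do
        actual=$1
        shift
        if [ "$actual" = "$default" ]; then
            echo "switch $label: $actual (default)"
        else
            echo "switch $label: $actual (default is $default)"
            status=1
        fi
        label=$((label - 1))
    done
    # Non-zero return means at least one switch is off its default.
    return $status
}
```

Running `check_dips up up down down up` for the current config should flag only switch 5, while `check_dips up down down down up` (the other config tried) should flag switches 5 and 4.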

> When I startup the sun do a stop a and go into the NVRAM.  I can do a
> probe-scsi-all and it sees and identifies all the drives in the array.  I
> boot up Sol 10 and try and use SMC to setup the drives I get a generic
> CIM_ERR_FAILED when clicking on disks.  I then halt from the console which
> drops me back to the NVRAM and do a probe-scsi-all and I get bus errors on
> the 2nd chain that is not hooked up off the Symbios card.  I have also

Issuing "probe-scsi-all" anytime after kernel load is not a good idea
and yields unpredictable results, even hard hangs. Later OBP revs warn
you against doing this before a "reset-all" is issued and request a
confirmation; newer Sun Blade / Fire systems actually issue the
"reset-all" no matter what.

> included the dmesg output here.
>
> [snip]
>
> Sorry for the logs, but I figure they might help.  Any help would be greatly
> appreciated.

They do help, by proving that all disks in the D1000 are present and
accounted for from the Solaris dev tree perspective (it probably
wouldn't hurt to make sure that format sees them all properly with no
hiccups too). After breaking out the dev paths, the D1000 appears to
be in single-bus cable configuration:

420R -> PCI Slot 1 -> Dual HVD adapter port A (upper) -> D1000

I also assume that the D1000 itself is connected with the host cable
on the interface board's far left or right HD-68 port, a jumper cable
between the two inner ports, and an HVD terminator on the remaining
outside port. All this looks ok, and would suggest that there's no
physical configuration issue.

The only suggestions I have off the bat would be:

1) Take out the middleman, so to speak, by setting SMC aside and using
CLI commands to manipulate the disks for the duration of the
troubleshooting effort. You never did go into detail with what kind of
"setup" was being performed, but the CLI equivalent of whatever you're
attempting will serve as a control in the experiment and help
determine whether SMC itself is the part that isn't playing nice with
the storage.

2) If CLI commands work to perform the desired tasks but SMC doesn't,
check for the presence of the Solaris 10 SMC patch:

# showrev -p | grep 121308

The latest and greatest is -09. It's publicly available from SunSolve:
http://sunsolve8.sun.com/search/document.do?assetkey=1-21-121308-09-1

3) If neither SMC nor CLI approaches work, take note of the CLI error,
which will be much less generic than the "CIM_ERR_FAILED" that SMC
returned, since it'll come from a specific command instead of an
abstraction layer.

4) Also, try addressing only one half of the D1000 at a time by
connecting the host cable to an "outer" port, then placing the HVD
terminator on a directly adjacent "inner" port, with nothing on the
other two ports on the other half of the interface board. This will
effectively bisect the array's disk backplane and help identify a
possible hardware issue with either a disk or a backplane segment.
After testing one side, repeat for the other. If problems persist
while addressing the left half and not the right, or vice versa,
you'll have a much better basis for continuing the troubleshooting
effort.
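
For suggestion 2, a plain grep for 121308 can also match lines where other patches list it under Obsoletes or Requires. The sketch below pulls out only the installed revision. It assumes `showrev -p` prints one line per patch beginning with "Patch: <id>-<rev>", and `patch_rev` is a hypothetical helper name.

```shell
# Print the highest installed revision of the given patch ID, reading
# "showrev -p" style output on stdin. Prints nothing if it is absent.
# Assumption: each installed patch appears as "Patch: <id>-<rev> ...".
patch_rev() {
    grep "^Patch: $1-" |
        sed "s/^Patch: $1-\([0-9]*\).*/\1/" |
        sort -n |
        sed -n '$p'
}
```

On the 420R this would be run as `showrev -p | patch_rev 121308`, and the result compared against the -09 revision mentioned above.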

Hope that helps,

-A


