[SunRescue] fun SCSI errors

Greg A. Woods rescue at sunhelp.org
Fri May 18 16:09:33 CDT 2001


[ On Friday, May 18, 2001 at 14:06:32 (-0500), Dan Debertin wrote: ]
> Subject: [SunRescue] fun SCSI errors
>
> I have a known-to-be-pretty-good Seagate drive that Solaris 2.5 just
> hates. Here are some of the boot messages:
> 
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@f,81000/esp@f,80000 (esp0):
>         Connected command timeout for Target 5.0
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@f,81000/esp@f,80000 (esp0):
>         Target 5.0 reducing sync. transfer rate
> 
> That is followed by SCSI timeouts for almost every disk on the bus.
> When I remove this disk from the bus, everything works fine. It's a
> Seagate ST51080N; the enclosure is centronics-50, which I convert to
> narrow SCSI2 to plug into my 600MP.
> 
> The drive works fine on a NetBSD/i386 box, but maybe Solaris is more
> sensitive or something. Termination is not a problem, and I have carefully
> audited the SCSI ID's.

How do you know that "termination is not a problem"?  Have you measured
the impedance of the bus and the termination power at both ends of the
bus?

The only way to be absolutely sure would be to take the exact same
physical bus that's in the Sun, including every device except the host
adapter, and attach it to the i386 box, and then vice versa.

Obviously this might not be possible in the strictest sense, but you
should endeavour to get at least the same mix.  Moving one device from
one bus to another and then blaming the problem on the host is simply
not a valid test except in very particular circumstances.

You may have termination-power problems, bus-length problems, bad
connectors or cables, and so on.  You may even have power-supply
problems, depending on where you're getting the power for this device
when you add it to the Sun's bus.

"Solaris" is not more sensitive to SCSI errors than anything else
(though it may be better at reporting them).  However, many Sun boxes
are more exacting about requiring that your bus and the devices on it
meet the necessary specifications.  I don't know exactly why this is,
but it seems that their bus interface electronics are either less (or
more?) sensitive, and/or report errors to the SCSI chip more readily.
Modern high-speed adapters on PCs can sometimes be just as discerning,
though, and indeed many Solaris boxes now use the very same Adaptec PCI
host adapters.

I would suggest making sure that the drive is supplying termination
power (but is not itself terminating the bus), and then trying
different cables and terminators, and even taking other devices off the
Sun's bus for testing.
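
To put rough numbers on why a stray extra terminator matters, here is a
minimal Python sketch.  It assumes classic passive terminators (220 ohms
to TERMPWR and 330 ohms to ground on each signal line) and an example
TERMPWR of 4.75 V.  The function names are made up for illustration, and
active terminators are not modelled at all.

```python
# Sketch only: models passive 220/330-ohm SCSI terminators, not active ones.

def parallel(*resistances):
    """Equivalent resistance of resistors in parallel, in ohms."""
    return 1.0 / sum(1.0 / r for r in resistances)


def bus_termination(n_terminators, termpwr_volts=4.75,
                    r_up=220.0, r_down=330.0):
    """Return (equivalent ohms, idle line volts) for one signal line.

    Each passive terminator pulls the line up to TERMPWR through r_up
    and down to ground through r_down.  termpwr_volts is an assumed
    example value, not a measurement.
    """
    if n_terminators < 1:
        raise ValueError("need at least one terminator")
    per_terminator = parallel(r_up, r_down)
    # Identical terminators on the same line combine in parallel.
    equivalent = per_terminator / n_terminators
    # The divider ratio is the same no matter how many terminators share it.
    idle_volts = termpwr_volts * r_down / (r_up + r_down)
    return equivalent, idle_volts


if __name__ == "__main__":
    for n in (1, 2, 3):
        ohms, volts = bus_termination(n)
        print(f"{n} terminator(s): {ohms:.1f} ohm, idle line at {volts:.2f} V")
```

Under those assumptions a properly terminated bus (one terminator at
each end) presents about 66 ohms per line.  A drive left with its own
terminators enabled in the middle of the bus brings that down to about
44 ohms, so every driver has to sink more current to pull a line low.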

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods at acm.org>     <woods at robohack.ca>
Planix, Inc. <woods at planix.com>;   Secrets of the Weird <woods at weird.com>


