[SunHELP] 'Bad Slots' on Enterprise 6500 and 3500

David Froman dfroman at cray-powered.org
Fri Jul 21 21:38:08 CDT 2006


I have one each of the Enterprise 3500 and Enterprise 6500.

Originally I configured the 6500 with 16 400MHz processors, filling slots
0-12 in the front and slot 3 in the rear with mem/proc boards.  The 6500 ran
fine like this for several months.  I then decided to add 12 more processors
into slot 14 in the front and slots 5-13 in the rear.  This is where things
went south.  I powered off the server and installed the new boards.  When I
rebooted I received POST memory failures for every board that had memory
installed.  After much troubleshooting I was able to get the server to boot up
with 20 processors, using slots 0,2,4,10,12,14 in the front and slots 3,
7,9,13 in the rear.  I can use any board in any of these slots, but I cannot
use boards in slots 6,8,5,11,15, even if they work in other slots.

The story continues.

After running the 6500 for a few months the electric bill was getting a tad
steep, so I decided to have a 3500 take over for the 6500.  The 3500 was
running eight 336MHz processors so I figured I would replace them with 400s
from the 6500.  The 3500 had been up for several months as well and had no
previous issues.  I pulled boards from slots 0,2,4 and 10 on the 6500 and
installed them in the 3500, which had been online 5 minutes earlier with no
problems.  When I booted up I received memory errors on every board.  I
tracked the problem with the 3500 down to slot 7.  Any board in slot 7
produces errors for every board.  At first I figured that maybe just all the
boards in the 6500 were 'flaky' so I reverted to the original 336MHz
processors still in their original boards.  But the problem remains.  Slot 7
produces errors for every board.

I've dug around in the Sun guides ever since this problem first presented on
the 6500, but now it's become a major issue.  The 3500 needs to manage a
database that's large enough that I want to maximize the memory.  If this
problem is correctable I would give most anything to learn how to clear it.

Once the memory errors have run through, though, POST hangs at (PRIV)
Privileged Code, (TO) Time Out Error.  OBP does not launch, so I never get to
a point to attempt any corrective measures via the console.

So my questions are:

Has anyone seen anything like this before?

Are these slots really fried?

Any help or ideas would be greatly appreciated.

Thanks,
Dave


