[rescue] weird Opteron 865 / Tyan 4882 problem

Patrick Giagnocavo patrick at zill.net
Tue Jul 14 21:20:31 CDT 2009


Jonathan Patschke wrote:
> On Tue, 14 Jul 2009, Patrick Giagnocavo wrote:
> 
>> My thinking is that either:
>>
>> 1.  CPU1 is partially busted (specifically its memory controller) and
>> should be replaced
> 
> Swap CPU0 and CPU1 to see if the problem follows the CPU.

Each CPU is connected by a heat pipe to another one, i.e.

CPU0 <--> CPU2
CPU1 <--> CPU3

Looks like I may have to hold off on deployment until I can fully debug
this sucker.

>> 2.  There is a problem with the DIMM slots themselves, such as a bad
>> resistor or some other electrical problem on the memory channel.
> 
> Swap the memory between CPU0 and CPU1 to see if the problem follows the
> memory.

I have swapped the memory, and it works in the other slot.

>> Any ideas?  Suggestions for further testing?
> 
> Same as it ever was: set up two scenarios identical but for one variable
> (or a matrix with multiple changes) until you find the one variable that
> wiggles.

Yeah, that sucks :-)

Cordially,

Patrick


