[geeks] Sun to adopt newest Intel Xeon chips for upcoming servers (link)

Charles Shannon Hendrix shannon at widomaker.com
Tue Jan 23 19:47:07 CST 2007


Tue, 23 Jan 2007 @ 14:41 -0500, Patrick Giagnocavo said:

> Amazingly however, I have found a challenge with non-Sun memory in the 
> V20z, even though the same RAM works in HP and IBM systems.  I would 
> have thought that the memory controller being integrated would mean 
> that the same RAM would work with every Opteron.

Assuming all three systems are fairly close in architecture, there is no
reason for the memory to fail in one and work in two others with AMD
CPUs.  They all must be nearly identical since the CPU dictates the
design of the memory bus.

DIMMs have an SPD ROM which tells the system what memory timing to use.
If that timing is too aggressive for the Sun, that would cause problems
for you.

Likewise, if the Sun is incapable of memory configuration changes (it
has a fixed speed and latency setting), then a DIMM that is too slow
could cause a problem. However, any modern DIMM that is worth a damn
should be more than fast enough for a Sun. Most of the time Sun is
conservative on memory configuration.
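
To make the "too aggressive" versus "too slow" cases concrete, here is a
rough sketch in Python. Everything in it is an assumption for
illustration: the DimmTiming record, the fits_fixed_setting helper and
the numbers are made up, not read from a real SPD ROM or from a V20z. It
also simplifies by treating the supported CAS latencies as independent
of clock speed, which real DIMMs do not quite do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimmTiming:
    """Hypothetical summary of what a DIMM's SPD ROM advertises."""
    min_cycle_ns: float       # shortest clock period the DIMM is rated for
    cas_latencies: frozenset  # CAS latencies the DIMM claims to support


def fits_fixed_setting(dimm, board_cycle_ns, board_cas):
    """Check a DIMM against a board that cannot change its memory settings.

    The DIMM is usable only if it is rated to run at least as fast as the
    board's fixed clock period and supports the board's fixed CAS latency.
    """
    fast_enough = dimm.min_cycle_ns <= board_cycle_ns
    cas_supported = board_cas in dimm.cas_latencies
    return fast_enough and cas_supported


# Illustrative values only: a board fixed at a 6.0 ns cycle and CL 2.5.
fast_dimm = DimmTiming(5.0, frozenset({2.5, 3.0}))
slow_dimm = DimmTiming(7.5, frozenset({2.0, 2.5}))
print(fits_fixed_setting(fast_dimm, 6.0, 2.5))
print(fits_fixed_setting(slow_dimm, 6.0, 2.5))
```

The first DIMM is rated faster than the board needs, so it fits; the
second is rated only for a longer cycle time, so it does not.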

If the memory timing is OK, then it seems to me there is something else
wrong.

Even with an integrated controller, the motherboard and some misc
components still affect memory.

The obvious stuff:

	- bad memory bus traces on the motherboard
	- poor quality or damaged DIMM slots
	- poor quality or damaged DIMM voltage input
	- poor quality or damaged power supply
	- overloaded power supply
	- using a mix of DIMMs with different timings or different numbers
	  of ranks, or different maximum speed ratings
	- a damaged CPU

If all of the above is OK, then that leaves the CPU.

You can do a fairly reliable test of the CPU with the Mersenne prime
number "heat test".  It's a prime number program which causes almost any
CPU to generate a large amount of heat, and it pushes the CPU pretty hard.

Most of the time the heat test can detect a bad CPU (or bad
configuration of a CPU and memory) in a few minutes or a couple of
hours.  If possible, it is better to run for a full 24 hours, as that
finds almost all problems.

Rarely, it can take as long as a week for some CPUs to show an error,
but usually 2-24 hours is enough.
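
For the curious, the principle behind that kind of test can be sketched
in a few lines of Python. This is not the actual heat test program and
it will not load a CPU anywhere near as hard; it only shows the idea, as
I understand it, of repeating a Lucas-Lehmer primality check on
Mersenne numbers and comparing each result with a known answer, so that
any mismatch points at hardware. The names lucas_lehmer, stress_pass and
TEST_EXPONENTS are my own.

```python
# Exponents p up to 1279 for which 2**p - 1 is known to be prime
# (p = 2 is left out because the test below assumes an odd prime p).
KNOWN_MERSENNE_EXPONENTS = {3, 5, 7, 13, 17, 19, 31, 61, 89,
                            107, 127, 521, 607, 1279}

# Odd prime exponents to exercise; some give primes, some do not.
TEST_EXPONENTS = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                  61, 89, 107, 127, 521, 607, 1279]


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def stress_pass(exponents):
    """Run one pass and return the exponents whose result was wrong."""
    mismatches = []
    for p in exponents:
        expected = p in KNOWN_MERSENNE_EXPONENTS
        if lucas_lehmer(p) != expected:
            mismatches.append(p)
    return mismatches


if __name__ == "__main__":
    for pass_number in range(1, 4):
        bad = stress_pass(TEST_EXPONENTS)
        print("pass", pass_number, "mismatches:", bad)
```

On healthy hardware every pass should report an empty list. A real
stress run would loop for hours on much larger exponents.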

If all of that checks out... then maybe you aren't holding your mouth
right.




-- 
shannon "AT" widomaker.com -- ["I want this Perl software checked for
viruses.  Use Norton Antivirus." -- Charlie Kirkpatrick]


