[geeks] computer room gallery 8-)

Peter L. Wargo geeks at sunhelp.org
Fri Jan 4 21:50:57 CST 2002


On Fri, 4 Jan 2002, Eric Dittman wrote:

> The E10K was the one with the faulty cache design that Sun denied
> for over a year (and fixed only if the customer signed an NDA),
> wasn't it?

I must respond, being somewhat sensitive about this issue...

The problem was not E10K-specific, and Sun did not require an NDA to fix
it.  The issue was that the processor modules were originally designed
with much smaller caches than the 4 & 8 MB used in the later units. Since
they were small when originally designed, the decision was made not to use
ECC on the cache memory. (IMO, a dumb idea)  When the cache size grew, the
density increased, and the SRAM became vulnerable to cosmic radiation and
other particles.

Sun kept sites it was investigating under NDA until they were sure what
the problem was.  (A factor that added to the confusion was that SRAM from
one manufacturer seemed much more stable than that of another.)  Once they
knew what the issue really was, they produced a revised module, and have
swapped them at any sites that evidence a problem. (Some sites seem more
problematic than others, especially those with bad datacenter conditions.
Also, I believe (not sure) that altitude can be an aggravating factor.)

I think it was a wake-up call for Sun, as they have always been good as a
reactive company, but not very proactive.  I know that they are now
putting a real effort into RAS in product design.  

-Pete


