[rescue] ECC [was: Re: WOT: Ebay changes to IBM from Sun E10

Stephen D. B. Wolthusen stephen at wolthusen.com
Thu Jul 10 10:42:35 CDT 2003


On 10-Jul-2003 Dave McGuire wrote:

>    Properly-implemented ECC is the right thing to do.  Yes, it's more 
> expensive, but it's the right thing to do.  A memory error that would 
> otherwise bring a system to its knees will simply cause an entry into 
> an error log with ECC...turning unexpected downtime into scheduled 
> downtime.

Even if the system stops because the ECC can't correct the error (which happens
quite often, since some failure modes result in burst errors that are not
correctable), that's still more desirable: ECC gives you an indication that
something went wrong in the first place (bad data, anyone?) and tells you which
module to replace if it wasn't a transient error. Ideally, though, an
uncorrectable error shouldn't bring down the entire system, only the affected
processes, and should permit you to hot-swap whatever component self-destructed.
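
To make the correctable vs. detectable distinction concrete, here is a toy
sketch of a single-error-correcting, double-error-detecting (SECDED) code: an
extended Hamming code over 4 data bits. This is an illustration only. Real
memory ECC works on much wider words, and the names encode() and decode() are
made up for this example, not taken from any vendor's implementation.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # overall parity makes the total even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treated as a single flipped bit at index `syndrome`
        # (index 0 is the overall parity bit itself), so flip it back.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but nonzero syndrome: two bits flipped. Detected, not fixable.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[5] ^= 1                            # one flipped bit: logged and repaired
print(decode(word))

word = encode(0b1011)
word[2] ^= 1
word[6] ^= 1                            # two flipped bits: detected only
print(decode(word))
```

The first case is the "entry in an error log" situation, the second is the
"system stops, but at least you know why" situation. A burst that flips three
or more bits in one word is outside what this kind of code guarantees, and
decode() may then report "corrected" while returning wrong data.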
 
>    With this "I like my stuff to work all the time" attitude, am I just 
> being too intolerant?
> 
>          -Dave

No, just mention the words `ECC' and `Ecache' to some customers of a certain
company, sit back, relax, and enjoy the flying spittle. 


-- 

        later,
        Stephen

Stephen Wolthusen (stephen at wolthusen.com)


