[geeks] memtest86 question (are correctable ECC errors, errors?)

Jonathan C. Patschke jp at celestrion.net
Wed Sep 26 01:05:17 CDT 2007


On Wed, 26 Sep 2007, Patrick Giagnocavo wrote:

> I have been running memtest86 against a 16GB RAM, dual Opteron 248
> CPU system for almost 5 hours.
>
> During that time, there have been no errors, but, there have been 215
> ECC errors, all of which were corrected.

That's an -awful- lot for only 5 hours of testing time.  You should see
approximately zero.  $ork has a couple of 64GB systems, and they take a
large chunk of a day to do a full run of memtest86.  The goal of burn-in
for those systems is zero errors over the course of a week.

> Can someone who knows more than me about memory architecture explain
> whether that means the RAM is bad or is it OK?

Think of it like RAID.  You're having to rely on your parity data a lot
more than you'd really like.

If it were non-ECC memory, you'd have an extremely unreliable system.
Since you have ECC, you have a system that is hopefully reliable, but
whose memory is failing.  It just hasn't failed enough to eat through
the redundancy yet.
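
To make the "redundancy" point concrete, here is a toy sketch of
single-error correction using a Hamming(7,4) code.  This is only an
illustration: real ECC DIMMs typically protect 64-bit words with a wider
code that also detects double-bit errors, and the function names below
are made up for this example.

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Illustration only; not the code any particular memory controller uses.

def encode(data):
    """Encode 4 data bits (list of 0/1) into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(word):
    """Return (data_bits, syndrome). A nonzero syndrome is the 1-based
    position of the bit that was flipped back before extracting data."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w[syndrome - 1] ^= 1   # "correctable error": repair silently
    return [w[2], w[4], w[5], w[6]], syndrome

original = [1, 0, 1, 1]
stored = encode(original)

# One flipped bit: the data survives, and the event can be counted.
one_bad = list(stored)
one_bad[4] ^= 1
recovered, syndrome = decode(one_bad)
print(recovered == original, syndrome)

# Two flipped bits in the same word: the redundancy is used up, and this
# toy code "corrects" the wrong bit and hands back bad data.
two_bad = list(stored)
two_bad[0] ^= 1
two_bad[1] ^= 1
recovered2, _ = decode(two_bad)
print(recovered2 == original)
```

Each corrected error is the first case; the worry is that a module
throwing this many of them will eventually produce the second.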

Try re-seating the memory first.  Mac Pros, for instance, tend to throw
very large numbers of parity errors if the memory modules aren't seated
absolutely perfectly (I've seen upwards of 4000/hr) but will run without
other ill effects.

If anything, this illustrates why ECC should be a mandatory requirement
for anything other than a gaming/media-playback system.

-- 
Jonathan Patschke     )  "So far, 99% of illegal activity has been caused
Elgin, TX            (    by criminals."
USA                   )                                    --David Willis


