[SunRescue] Logging memory errors

Paul Khoury rescue at sunhelp.org
Tue Dec 5 02:17:24 CST 2000


On Mon, 04 Dec 2000 21:12:14 -0800, Paul Theodoropoulos wrote:

>That's controlled by your /etc/syslog.conf. just specify that you 
>want whatever was sent to console (should also be listed in your 
>syslog.conf) to go to a specific logfile. I have it go to 
>/var/adm/messages.
>
>I've been getting the following in my messages log on one of my 
>e4500's for months now -
>
>Dec  4 20:40:31 e4500a unix: CPU0 CE Error: AFSR 
>0x00000000.00100000 AFAR 0x00000000.7f755c10 UDBH MemMod Board 0 
>J3800
>Dec  4 20:40:31 e4500a unix:    Syndrome 0xf8 Size 3 Offset 0 UPA 
>MID 0
>Dec  4 20:40:31 e4500a unix: Softerror: Persistent ECC Memory Error
>Dec  4 20:40:31 e4500a unix:  Corrected MemMod Board 0 J3800
>Dec  4 20:40:31 e4500a unix:    ECC Data Bit 11 was corrected
>
>Haven't had time to swap out the module. Just keeps running and 
>running, doesn't bat an eyelash.
>
>I refuse to use anything but SPARC running Solaris for core 
>infrastructure. Nothing is as reliable.
>

I agree.  I was contemplating shutting down the machine and swapping in RAM
that's known good, but why should I when the machine has been running 68 days? =)

How do the memory errors work, BTW?  Does Solaris just map around them in realtime?
I'm sure Linux would have a fit if it encountered that.
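
For keeping an eye on how often a module throws corrected errors, something
like the following might do as a rough sketch. It assumes each syslog entry
sits on a single line in the messages file (the quoted lines above are wrapped
by the mailer) and that the "Corrected MemMod Board <n> <J-number>" wording
matches what is quoted. The function name count_corrected_errors and the
sample list are made up for illustration; they are not part of Solaris.

```python
import re
from collections import Counter

# Assumed pattern: matches the "Corrected MemMod Board <n> <J-number>" line
# as it appears in the quoted log. Other kernel releases may word it differently.
CORRECTED_RE = re.compile(r"unix:\s+Corrected MemMod Board (\d+) (J\d+)")


def count_corrected_errors(lines):
    """Tally corrected ECC errors per (board, module) pair."""
    counts = Counter()
    for line in lines:
        match = CORRECTED_RE.search(line)
        if match:
            board, module = match.groups()
            counts[(int(board), module)] += 1
    return counts


if __name__ == "__main__":
    # Sample entries copied from the quoted log, joined onto single lines.
    sample = [
        "Dec  4 20:40:31 e4500a unix: Softerror: Persistent ECC Memory Error",
        "Dec  4 20:40:31 e4500a unix:  Corrected MemMod Board 0 J3800",
        "Dec  4 20:40:31 e4500a unix:    ECC Data Bit 11 was corrected",
    ]
    for (board, module), n in count_corrected_errors(sample).items():
        print(f"board {board} module {module}: {n} corrected error(s)")
```

To run it against a real log, pass an open file instead of the sample list,
e.g. count_corrected_errors(open("/var/adm/messages")), assuming that is where
your syslog.conf sends kernel messages.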

Paul




