[SunHELP] SUN machine crash

Raghuraj, Ajay Ajay.Raghuraj at blr.hpsglobal.com
Sat Dec 20 01:22:08 CST 2003


Hi Everyone,

I have a SUN E450 machine with 4 GB of memory and 4 CPUs. With kernel patch
108526-03 it was rebooting quite often. I ran a crash dump analysis and found
nothing, and I could not send the dump to SUN because my support contract has
expired. This is the error I found in /var/adm/messages after an automatic
reboot:

Nov 26 16:53:01 SPARKLE savecore: [ID 570001 auth.error] reboot after panic: UE Error: AFSR 0x00000000.00200000 AFAR 0x00000000.89438000 Id 0 Inst 0 MemMod 190x


I applied the latest kernel patch from SUN, 108526-26, and now I am
seeing this:

Dec  1 14:53:05 SPARKLE SUNW,UltraSPARC-II: [ID 747708 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x00009ffc.55a1242f
Dec  1 14:53:05 SPARKLE     AFSR 0x00000000.00100000<CE> AFAR 0x00000000.daee7118
Dec  1 14:53:05 SPARKLE     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1012782fc
Dec  1 14:53:05 SPARKLE     UDBL Syndrome 0xd6 Memory Module 1901
Dec  1 14:53:05 SPARKLE SUNW,UltraSPARC-II: [ID 725962 kern.info] [AFT0] errID 0x00009ffc.55a1242f Corrected Memory Error on 1901 is Intermittent
Dec  1 14:53:05 SPARKLE SUNW,UltraSPARC-II: [ID 240962 kern.info] [AFT0] errID 0x00009ffc.55a1242f ECC Data Bit 16 was in error and corrected
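
To compare the two AFSR values in the log excerpts, here is a minimal Python sketch. The bit meanings are an assumption inferred only from these messages: 0x00200000 was reported with "UE" (the panic) and 0x00100000 with "<CE>" (the corrected error). The function name decode_afsr is hypothetical, and other AFSR bits are deliberately ignored.

```python
# Bit masks inferred from the two log entries (an assumption, not taken
# from a processor manual): the panic reported UE with 0x00200000, the
# later message reported <CE> with 0x00100000.
AFSR_FLAGS = {
    0x00200000: "UE (uncorrectable error)",
    0x00100000: "CE (corrected error)",
}


def decode_afsr(text):
    """Decode an AFSR string such as '0x00000000.00100000'."""
    high, low = text.split(".")
    # int(..., 16) accepts an optional '0x' prefix on the high word.
    value = (int(high, 16) << 32) | int(low, 16)
    return [name for mask, name in AFSR_FLAGS.items() if value & mask]


print(decode_afsr("0x00000000.00200000"))  # value from the Nov 26 panic
print(decode_afsr("0x00000000.00100000"))  # value from the Dec 1 message
```

Under that assumption, the Nov 26 panic carried the UE flag, while the Dec 1 entry carries only the CE flag, which matches its "ECC Data Bit 16 was in error and corrected" line.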

The machine has not rebooted by itself since I applied the patch, and I want
to know whether the patch has solved my problem. I read somewhere that the
latest patch offers better readability of error messages, but it also says
that ECC does not correct double-bit errors. Am I now seeing double-bit
errors? Is my understanding of the problem on this machine correct? Please
help.
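
One way to watch whether corrected errors keep recurring on the same module is to tally them from /var/adm/messages. This is only a sketch: it assumes each log entry sits on a single line (the excerpts in this mail are wrapped) and that the wording "Corrected Memory Error on <module> is <status>" matches the entry shown earlier. The function name tally_corrected_errors is hypothetical.

```python
import re
from collections import Counter

# Wording taken from the Dec 1 entry quoted in this mail; other Solaris
# patch levels may phrase the message differently (assumption).
PATTERN = re.compile(r"Corrected Memory Error on (\S+) is (\w+)")


def tally_corrected_errors(lines):
    """Count corrected-error messages per (memory module, status) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "Dec  1 14:53:05 SPARKLE SUNW,UltraSPARC-II: [ID 725962 kern.info] "
    "[AFT0] errID 0x00009ffc.55a1242f Corrected Memory Error on 1901 "
    "is Intermittent",
]
print(tally_corrected_errors(sample))
```

On the real system, the list could be replaced by the lines read from /var/adm/messages.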

Ajay Raghuraj
SSG-ITS
Extn: 2152

"The hand that rocks the cradle rules the world"


