[SunHELP] CPU Memory error causes reboot
Raymond Wong
wong.raymond at pansar.com.my
Thu Apr 14 01:41:58 CDT 2005
Hi,
Our Sun server rebooted itself this morning.
A search suggests that this is an ecache problem, but ours
involves 3 CPUs.
Please help analyze the log entries and shed light on the nature of
the problem.
Extract from /var/adm/messages:
~~~~~~~
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 868172
kern.warning] WARNING: [AFT1] WP event on CPU3, errID
0x0000046c.737b999f
Apr 14 08:47:04 machine-name AFSR 0x00000000.00800002<WP> AFAR
0x000001ff.f1500000
Apr 14 08:47:04 machine-name AFSR.PSYND 0x0002(Score 95) AFSR.ETS
0x00 Fault_PC 0x17fff6c
Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
0x0000 UDBL.ESYND 0x00
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
access at TL=0, errID 0x0000046c.890e8026
Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
AFAR 0x00000000.077c8568
Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
0x00 Fault_PC 0x1002533c
Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
0x0203<UE> UDBL.ESYND 0x03
Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359128
kern.warning] WARNING: [AFT1] errID 0x0000046c.890e8026 Syndrome 0x3
indicates that this may not be a memory module problem
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 127160 kern.info]
[AFT2] errID 0x0000046c.890e8026 PA=0x00000000.077c8568
Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
Exclusive E$parity 0x0e
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x00): 0x03021764.030238bc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x08): 0x0000004c.0300952c
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x10): 0x0302c2a4.00000060
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x18): 0x03025968.0303cfcc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x20): 0x0000008c.02f78564
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
[AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x30): 0x02f78570.02f78570
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x38): 0x0000010c.0302e44c
Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
Scheduling clearing of error on page 0x00000000.077c8000
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 937065 kern.info]
[AFT3] errID 0x0000046c.890e8026 Above Error detected by protected
Kernel code
Apr 14 08:47:04 machine-name that will try to clear error from
system
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 387418
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
access at TL=0, errID 0x0000046c.8cc25fd7
Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
AFAR 0x00000000.077c8568
Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
0x00 Fault_PC 0x1002533c
Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
0x0203<UE> UDBL.ESYND 0x03
Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 752377
kern.warning] WARNING: [AFT1] errID 0x0000046c.8cc25fd7 Syndrome 0x3
indicates that this may not be a memory module problem
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 999817 kern.info]
[AFT2] errID 0x0000046c.8cc25fd7 PA=0x00000000.077c8568
Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
Exclusive E$parity 0x0e
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x00): 0x03021764.030238bc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x08): 0x0000004c.0300952c
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x10): 0x0302c2a4.00000060
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x18): 0x03025968.0303cfcc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x20): 0x0000008c.02f78564
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
[AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x30): 0x02f78570.02f78570
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x38): 0x0000010c.0302e44c
Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
Scheduling clearing of error on page 0x00000000.077c8000
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 676056 kern.info]
[AFT3] errID 0x0000046c.8cc25fd7 Above Error detected by protected
Kernel code
Apr 14 08:47:04 machine-name that will try to clear error from
system
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
access at TL=0, errID 0x0000046c.94ea7880
Apr 14 08:47:04 machine-name AFSR 0x00000000.00200000<UE> AFAR
0x00000000.077c8568
Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
0x00 Fault_PC 0x1880064
Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
0x0203<UE> UDBL.ESYND 0x03
Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 307235
kern.warning] WARNING: [AFT1] errID 0x0000046c.94ea7880 Syndrome 0x3
indicates that this may not be a memory module problem
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 243430 kern.info]
[AFT2] errID 0x0000046c.94ea7880 PA=0x00000000.077c8568
Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
Exclusive E$parity 0x0e
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x00): 0x03021764.030238bc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x08): 0x0000004c.0300952c
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x10): 0x0302c2a4.00000060
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x18): 0x03025968.0303cfcc
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x20): 0x0000008c.02f78564
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
[AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x30): 0x02f78570.02f78570
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
[AFT2] E$Data (0x38): 0x0000010c.0302e44c
Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
Scheduling clearing of error on page 0x00000000.077c8000
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 370242 kern.info]
[AFT3] errID 0x0000046c.94ea7880 Above Error is in User Mode
Apr 14 08:47:04 machine-name and is fatal: will reboot
Apr 14 08:47:04 machine-name unix: [ID 855177 kern.warning] WARNING:
[AFT1] initiating reboot due to above error in pid 1470 (oracle)
Apr 14 08:47:20 machine-name unix: [ID 221039 kern.notice] NOTICE:
Previously reported error on page 0x00000000.077c8000 cleared
Apr 14 08:49:04 machine-name pseudo: [ID 129642 kern.info]
pseudo-device: tod0
Apr 14 08:49:04 machine-name genunix: [ID 936769 kern.info] tod0 is
/pseudo/tod at 0
Apr 14 08:49:04 machine-name syslogd: going down on signal 15
Apr 14 08:49:33 machine-name genunix: [ID 672855 kern.notice] syncing
file systems...
Apr 14 08:49:33 machine-name genunix: [ID 904073 kern.notice] done
~~~~~~~
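In case it helps with the analysis, here is a rough sketch of how the
[AFT1] entries in an extract like the one above could be tallied per
CPU and per AFAR. It is not a Sun tool: the names summarize_aft1,
ue_cpus_by_afar and SAMPLE are made up for illustration, the regular
expression only assumes the message layout visible in this extract,
and SAMPLE holds just three entries copied from it.

```python
import re
from collections import defaultdict

# Three [AFT1] entries copied from the extract above, with the e-mail
# line wrapping left in place (hypothetical sample, not the full log).
SAMPLE = """\
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 868172
kern.warning] WARNING: [AFT1] WP event on CPU3, errID
0x0000046c.737b999f
Apr 14 08:47:04 machine-name AFSR 0x00000000.00800002<WP> AFAR
0x000001ff.f1500000
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
access at TL=0, errID 0x0000046c.890e8026
Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
AFAR 0x00000000.077c8568
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
access at TL=0, errID 0x0000046c.94ea7880
Apr 14 08:47:04 machine-name AFSR 0x00000000.00200000<UE> AFAR
0x00000000.077c8568
"""

# 64-bit values are printed as two 8-digit hex halves joined by a dot.
HEX64 = r"0x[0-9a-f]{8}\.[0-9a-f]{8}"

# Assumed layout: event kind and CPU, then errID, then the next AFAR.
EVENT_RE = re.compile(
    r"\[AFT1\] (WP event|Uncorrectable Memory Error) on CPU(\d+)"
    r".*?errID (" + HEX64 + r")"
    r".*?AFAR (" + HEX64 + r")"
)


def summarize_aft1(text):
    """Return a list of (kind, cpu, err_id, afar) tuples in log order."""
    # Collapse the wrapped lines so each entry can be matched in one pass.
    flat = re.sub(r"\s+", " ", text)
    return [(kind, int(cpu), err_id, afar)
            for kind, cpu, err_id, afar in EVENT_RE.findall(flat)]


def ue_cpus_by_afar(events):
    """Map each AFAR to the sorted CPUs that reported a UE at it."""
    cpus = defaultdict(set)
    for kind, cpu, _err_id, afar in events:
        if kind == "Uncorrectable Memory Error":
            cpus[afar].add(cpu)
    return {afar: sorted(found) for afar, found in cpus.items()}


if __name__ == "__main__":
    events = summarize_aft1(SAMPLE)
    for kind, cpu, err_id, afar in events:
        print(f"CPU{cpu}  {kind:<26}  errID {err_id}  AFAR {afar}")
    print(ue_cpus_by_afar(events))
```

Grouping this way makes it easy to see that every Uncorrectable
Memory Error report in the extract carries the same AFAR
(0x00000000.077c8568), while the WP event on CPU3 has a different one.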
Thanks,
Raymond Wong
System & Network Engineer
Pan Sarawak Company Sdn Bhd