[SunHELP] CPU Memory error causes reboot

Raymond Wong wong.raymond at pansar.com.my
Thu Apr 14 01:41:58 CDT 2005


   Hi,
   Our Sun server rebooted itself this morning.
   A search suggests that this is an ecache (E$) problem, but ours
   involves 3 CPUs (CPU3, CPU2 and CPU0).
   Please help analyze the log entries and shed light on the nature of
   the problem.
   Extract of /var/adm/messages:
   ~~~~~~~
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 868172
   kern.warning] WARNING: [AFT1] WP event on CPU3, errID
   0x0000046c.737b999f
   Apr 14 08:47:04 machine-name     AFSR 0x00000000.00800002<WP> AFAR
   0x000001ff.f1500000
   Apr 14 08:47:04 machine-name     AFSR.PSYND 0x0002(Score 95) AFSR.ETS
   0x00 Fault_PC 0x17fff6c
   Apr 14 08:47:04 machine-name     UDBH 0x0000 UDBH.ESYND 0x00 UDBL
   0x0000 UDBL.ESYND 0x00
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
   kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
   access at TL=0, errID 0x0000046c.890e8026
   Apr 14 08:47:04 machine-name     AFSR 0x00000000.80200000<PRIV,UE>
   AFAR 0x00000000.077c8568
   Apr 14 08:47:04 machine-name     AFSR.PSYND 0x0000(Score 05) AFSR.ETS
   0x00 Fault_PC 0x1002533c
   Apr 14 08:47:04 machine-name     UDBH 0x0000 UDBH.ESYND 0x00 UDBL
   0x0203<UE> UDBL.ESYND 0x03
   Apr 14 08:47:04 machine-name     UDBL Syndrome 0x3 Memory Module 190x
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359128
   kern.warning] WARNING: [AFT1] errID 0x0000046c.890e8026 Syndrome 0x3
   indicates that this may not be a memory module problem
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 127160 kern.info]
   [AFT2] errID 0x0000046c.890e8026 PA=0x00000000.077c8568
   Apr 14 08:47:04 machine-name     E$tag 0x00000000.1cc000ef E$State:
   Exclusive E$parity 0x0e
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x00): 0x03021764.030238bc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x08): 0x0000004c.0300952c
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x10): 0x0302c2a4.00000060
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x18): 0x03025968.0303cfcc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x20): 0x0000008c.02f78564
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
   [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x30): 0x02f78570.02f78570
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x38): 0x0000010c.0302e44c
   Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
   Scheduling clearing of error on page 0x00000000.077c8000
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 937065 kern.info]
   [AFT3] errID 0x0000046c.890e8026 Above Error detected by protected
   Kernel code
   Apr 14 08:47:04 machine-name     that will try to clear error from
   system
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 387418
   kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
   access at TL=0, errID 0x0000046c.8cc25fd7
   Apr 14 08:47:04 machine-name     AFSR 0x00000000.80200000<PRIV,UE>
   AFAR 0x00000000.077c8568
   Apr 14 08:47:04 machine-name     AFSR.PSYND 0x0000(Score 05) AFSR.ETS
   0x00 Fault_PC 0x1002533c
   Apr 14 08:47:04 machine-name     UDBH 0x0000 UDBH.ESYND 0x00 UDBL
   0x0203<UE> UDBL.ESYND 0x03
   Apr 14 08:47:04 machine-name     UDBL Syndrome 0x3 Memory Module 190x
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 752377
   kern.warning] WARNING: [AFT1] errID 0x0000046c.8cc25fd7 Syndrome 0x3
   indicates that this may not be a memory module problem
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 999817 kern.info]
   [AFT2] errID 0x0000046c.8cc25fd7 PA=0x00000000.077c8568
   Apr 14 08:47:04 machine-name     E$tag 0x00000000.1cc000ef E$State:
   Exclusive E$parity 0x0e
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x00): 0x03021764.030238bc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x08): 0x0000004c.0300952c
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x10): 0x0302c2a4.00000060
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x18): 0x03025968.0303cfcc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x20): 0x0000008c.02f78564
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
   [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x30): 0x02f78570.02f78570
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x38): 0x0000010c.0302e44c
   Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
   Scheduling clearing of error on page 0x00000000.077c8000
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 676056 kern.info]
   [AFT3] errID 0x0000046c.8cc25fd7 Above Error detected by protected
   Kernel code
   Apr 14 08:47:04 machine-name     that will try to clear error from
   system
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
   kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
   access at TL=0, errID 0x0000046c.94ea7880
   Apr 14 08:47:04 machine-name     AFSR 0x00000000.00200000<UE> AFAR
   0x00000000.077c8568
   Apr 14 08:47:04 machine-name     AFSR.PSYND 0x0000(Score 05) AFSR.ETS
   0x00 Fault_PC 0x1880064
   Apr 14 08:47:04 machine-name     UDBH 0x0000 UDBH.ESYND 0x00 UDBL
   0x0203<UE> UDBL.ESYND 0x03
   Apr 14 08:47:04 machine-name     UDBL Syndrome 0x3 Memory Module 190x
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 307235
   kern.warning] WARNING: [AFT1] errID 0x0000046c.94ea7880 Syndrome 0x3
   indicates that this may not be a memory module problem
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 243430 kern.info]
   [AFT2] errID 0x0000046c.94ea7880 PA=0x00000000.077c8568
   Apr 14 08:47:04 machine-name     E$tag 0x00000000.1cc000ef E$State:
   Exclusive E$parity 0x0e
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x00): 0x03021764.030238bc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x08): 0x0000004c.0300952c
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x10): 0x0302c2a4.00000060
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x18): 0x03025968.0303cfcc
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x20): 0x0000008c.02f78564
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
   [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x30): 0x02f78570.02f78570
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
   [AFT2] E$Data (0x38): 0x0000010c.0302e44c
   Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
   Scheduling clearing of error on page 0x00000000.077c8000
   Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 370242 kern.info]
   [AFT3] errID 0x0000046c.94ea7880 Above Error is in User Mode
   Apr 14 08:47:04 machine-name     and is fatal: will reboot
   Apr 14 08:47:04 machine-name unix: [ID 855177 kern.warning] WARNING:
   [AFT1] initiating reboot due to above error in pid 1470 (oracle)
   Apr 14 08:47:20 machine-name unix: [ID 221039 kern.notice] NOTICE:
   Previously reported error on page 0x00000000.077c8000 cleared
   Apr 14 08:49:04 machine-name pseudo: [ID 129642 kern.info]
   pseudo-device: tod0
   Apr 14 08:49:04 machine-name genunix: [ID 936769 kern.info] tod0 is
   /pseudo/tod at 0
   Apr 14 08:49:04 machine-name syslogd: going down on signal 15
   Apr 14 08:49:33 machine-name genunix: [ID 672855 kern.notice] syncing
   file systems...
   Apr 14 08:49:33 machine-name genunix: [ID 904073 kern.notice]  done
   ~~~~~~~
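
   To summarize the entries above, here is a rough Python sketch that
   pulls out each [AFT1] CPU event and groups the reporting CPUs by
   fault address (AFAR). It is not part of any Sun tool: the names
   parse_aft1_events and cpus_by_afar are made up for illustration, and
   the regular expression assumes only the message layout seen in this
   extract. SAMPLE is an abbreviated copy of the log lines above.

```python
import re
from collections import defaultdict

# One [AFT1] event header, followed (possibly after line wraps) by its AFAR.
# Assumption: the layout matches the /var/adm/messages extract in this post.
EVENT_RE = re.compile(
    r"\[AFT1\] ([A-Za-z ]+?) on CPU(\d+).*?"
    r"errID (0x[0-9a-f]+\.[0-9a-f]+).*?"
    r"AFAR (0x[0-9a-f]+\.[0-9a-f]+)"
)


def parse_aft1_events(text):
    """Return (cpu, kind, errid, afar) tuples for each [AFT1] CPU event."""
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    return [
        (int(cpu), kind, errid, afar)
        for kind, cpu, errid, afar in EVENT_RE.findall(flat)
    ]


def cpus_by_afar(events):
    """Map each fault address (AFAR) to the sorted CPUs that reported it."""
    grouped = defaultdict(set)
    for cpu, _kind, _errid, afar in events:
        grouped[afar].add(cpu)
    return {afar: sorted(cpus) for afar, cpus in grouped.items()}


# Abbreviated copy of the log above (headers and AFSR/AFAR lines only).
SAMPLE = """
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 868172
kern.warning] WARNING: [AFT1] WP event on CPU3, errID
0x0000046c.737b999f
Apr 14 08:47:04 machine-name     AFSR 0x00000000.00800002<WP> AFAR
0x000001ff.f1500000
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
access at TL=0, errID 0x0000046c.890e8026
Apr 14 08:47:04 machine-name     AFSR 0x00000000.80200000<PRIV,UE>
AFAR 0x00000000.077c8568
Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
access at TL=0, errID 0x0000046c.94ea7880
Apr 14 08:47:04 machine-name     AFSR 0x00000000.00200000<UE> AFAR
0x00000000.077c8568
"""

if __name__ == "__main__":
    for afar, cpus in cpus_by_afar(parse_aft1_events(SAMPLE)).items():
        print(afar, "reported by CPUs", cpus)
```

   In the log above, the Uncorrectable Memory Error events on CPU2 and
   CPU0 all carry the same AFAR (0x00000000.077c8568), while the WP
   event on CPU3 reports a different address (0x000001ff.f1500000).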
   Thanks,
   Raymond Wong
   System & Network Engineer
   Pan Sarawak Company Sdn Bhd


