[SunHELP] CPU Memory error causes reboot
Michael Horton
Michael.Horton at acntv.com
Thu Apr 14 09:06:37 CDT 2005
Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x0000046c.890e8026
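As a rough aid for reading entries like the one above, here is a small sketch that decodes the AFSR and UDB error-register values Solaris prints next to it (for this errID the AFSR is 0x00000000.80200000<PRIV,UE> and UDBL is 0x0203<UE>). The bit positions are my reading of the UltraSPARC-II User's Manual and should be checked against it. Only PRIV, WP and UE are cross-checked here, because the log prints those tags beside the raw values. The function names are hypothetical.

```python
# Hypothetical helpers for decoding UltraSPARC-II error registers as printed
# by Solaris in /var/adm/messages. Bit positions are an assumption taken from
# the UltraSPARC-II User's Manual; verify before relying on them.
# Only PRIV, WP and UE are cross-checked against the tags shown in this log.

AFSR_BITS = {
    32: "ME",    # multiple errors of the same type
    31: "PRIV",  # error occurred in privileged mode
    30: "ISAP",
    29: "ETP",
    28: "IVUE",
    27: "TO",
    26: "BERR",
    25: "LDP",
    24: "CP",
    23: "WP",    # writeback parity error
    22: "EDP",
    21: "UE",    # uncorrectable ECC error
    20: "CE",    # correctable ECC error
}


def parse_split_hex(text: str) -> int:
    """Convert Solaris's 'hi.lo' notation (two 32-bit halves) to one int."""
    hi, lo = text.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def decode_afsr(value: int) -> dict:
    """Return set flags (high bit first), AFSR.ETS (bits 19:16), AFSR.PSYND (bits 15:0)."""
    flags = [name for bit, name in sorted(AFSR_BITS.items(), reverse=True)
             if (value >> bit) & 1]
    return {"flags": flags, "ets": (value >> 16) & 0xF, "psynd": value & 0xFFFF}


def decode_udb(value: int) -> dict:
    """Decode a UDBH/UDBL error register: bit 9 = UE, bit 8 = CE, bits 7:0 = ESYND."""
    flags = [name for bit, name in ((9, "UE"), (8, "CE")) if (value >> bit) & 1]
    return {"flags": flags, "esynd": value & 0xFF}


if __name__ == "__main__":
    # Values copied from the log quoted below.
    print(decode_afsr(parse_split_hex("0x00000000.80200000")))  # CPU2 UE entry
    print(decode_afsr(parse_split_hex("0x00000000.00800002")))  # CPU3 WP entry
    print(decode_udb(0x0203))                                    # UDBL for the UE
```

With the values from the log, the CPU2 entry decodes to PRIV and UE with PSYND 0, the CPU3 entry to WP with PSYND 0x0002, and UDBL 0x0203 to UE with ESYND 0x03, which matches what Solaris printed.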
Raymond Wong wrote:
> Hi,
> Our Sun server rebooted itself this morning.
> A search suggests that it's an ecache problem, but ours
> involves 3 CPUs.
> Please help analyze the log entries and shed light on the nature of
> the problem.
> Extract of /var/adm/messages
> ~~~~~~~
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 868172
> kern.warning] WARNING: [AFT1] WP event on CPU3, errID
> 0x0000046c.737b999f
> Apr 14 08:47:04 machine-name AFSR 0x00000000.00800002<WP> AFAR
> 0x000001ff.f1500000
> Apr 14 08:47:04 machine-name AFSR.PSYND 0x0002(Score 95) AFSR.ETS
> 0x00 Fault_PC 0x17fff6c
> Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
> 0x0000 UDBL.ESYND 0x00
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
> access at TL=0, errID 0x0000046c.890e8026
> Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
> AFAR 0x00000000.077c8568
> Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
> 0x00 Fault_PC 0x1002533c
> Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
> 0x0203<UE> UDBL.ESYND 0x03
> Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359128
> kern.warning] WARNING: [AFT1] errID 0x0000046c.890e8026 Syndrome 0x3
> indicates that this may not be a memory module problem
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 127160 kern.info]
> [AFT2] errID 0x0000046c.890e8026 PA=0x00000000.077c8568
> Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
> Exclusive E$parity 0x0e
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x00): 0x03021764.030238bc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x08): 0x0000004c.0300952c
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x10): 0x0302c2a4.00000060
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x18): 0x03025968.0303cfcc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x20): 0x0000008c.02f78564
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
> [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x30): 0x02f78570.02f78570
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x38): 0x0000010c.0302e44c
> Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
> Scheduling clearing of error on page 0x00000000.077c8000
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 937065 kern.info]
> [AFT3] errID 0x0000046c.890e8026 Above Error detected by protected
> Kernel code
> Apr 14 08:47:04 machine-name that will try to clear error from
> system
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 387418
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
> access at TL=0, errID 0x0000046c.8cc25fd7
> Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
> AFAR 0x00000000.077c8568
> Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
> 0x00 Fault_PC 0x1002533c
> Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
> 0x0203<UE> UDBL.ESYND 0x03
> Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 752377
> kern.warning] WARNING: [AFT1] errID 0x0000046c.8cc25fd7 Syndrome 0x3
> indicates that this may not be a memory module problem
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 999817 kern.info]
> [AFT2] errID 0x0000046c.8cc25fd7 PA=0x00000000.077c8568
> Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
> Exclusive E$parity 0x0e
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x00): 0x03021764.030238bc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x08): 0x0000004c.0300952c
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x10): 0x0302c2a4.00000060
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x18): 0x03025968.0303cfcc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x20): 0x0000008c.02f78564
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
> [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x30): 0x02f78570.02f78570
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x38): 0x0000010c.0302e44c
> Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
> Scheduling clearing of error on page 0x00000000.077c8000
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 676056 kern.info]
> [AFT3] errID 0x0000046c.8cc25fd7 Above Error detected by protected
> Kernel code
> Apr 14 08:47:04 machine-name that will try to clear error from
> system
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
> access at TL=0, errID 0x0000046c.94ea7880
> Apr 14 08:47:04 machine-name AFSR 0x00000000.00200000<UE> AFAR
> 0x00000000.077c8568
> Apr 14 08:47:04 machine-name AFSR.PSYND 0x0000(Score 05) AFSR.ETS
> 0x00 Fault_PC 0x1880064
> Apr 14 08:47:04 machine-name UDBH 0x0000 UDBH.ESYND 0x00 UDBL
> 0x0203<UE> UDBL.ESYND 0x03
> Apr 14 08:47:04 machine-name UDBL Syndrome 0x3 Memory Module 190x
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 307235
> kern.warning] WARNING: [AFT1] errID 0x0000046c.94ea7880 Syndrome 0x3
> indicates that this may not be a memory module problem
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 243430 kern.info]
> [AFT2] errID 0x0000046c.94ea7880 PA=0x00000000.077c8568
> Apr 14 08:47:04 machine-name E$tag 0x00000000.1cc000ef E$State:
> Exclusive E$parity 0x0e
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x00): 0x03021764.030238bc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x08): 0x0000004c.0300952c
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x10): 0x0302c2a4.00000060
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x18): 0x03025968.0303cfcc
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x20): 0x0000008c.02f78564
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 989652 kern.info]
> [AFT2] E$Data (0x28): 0x02f78564.000020ec *Bad* PSYND=0x00ff
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x30): 0x02f78570.02f78570
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 359263 kern.info]
> [AFT2] E$Data (0x38): 0x0000010c.0302e44c
> Apr 14 08:47:04 machine-name unix: [ID 321153 kern.notice] NOTICE:
> Scheduling clearing of error on page 0x00000000.077c8000
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 370242 kern.info]
> [AFT3] errID 0x0000046c.94ea7880 Above Error is in User Mode
> Apr 14 08:47:04 machine-name and is fatal: will reboot
> Apr 14 08:47:04 machine-name unix: [ID 855177 kern.warning] WARNING:
> [AFT1] initiating reboot due to above error in pid 1470 (oracle)
> Apr 14 08:47:20 machine-name unix: [ID 221039 kern.notice] NOTICE:
> Previously reported error on page 0x00000000.077c8000 cleared
> Apr 14 08:49:04 machine-name pseudo: [ID 129642 kern.info]
> pseudo-device: tod0
> Apr 14 08:49:04 machine-name genunix: [ID 936769 kern.info] tod0 is
> /pseudo/tod at 0
> Apr 14 08:49:04 machine-name syslogd: going down on signal 15
> Apr 14 08:49:33 machine-name genunix: [ID 672855 kern.notice] syncing
> file systems...
> Apr 14 08:49:33 machine-name genunix: [ID 904073 kern.notice] done
> ~~~~~~~
> Thanks,
> Raymond Wong
> System & Network Engineer
> Pan Sarawak Company Sdn Bhd
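Since the question is why three CPUs show up, it may help to group the uncorrectable-error reports by physical address. Below is a minimal sketch with hypothetical names, assuming an 8 KB base page (as on Solaris/SPARC). It strips mail-quote markers and collapses whitespace first, so wrapped entries like those in the extract above still match. The sample is the three UE headline/AFSR pairs copied from that extract; the CPU3 WP event is a different error type and is not matched.

```python
import re
from collections import defaultdict

PAGE_SIZE = 8 * 1024  # assumption: 8 KB base page size

# AFT1 headline, its errID, and the AFAR that follows on the AFSR line.
UE_RE = re.compile(
    r"Uncorrectable Memory Error on CPU(\d+) .*?"
    r"errID (0x[0-9a-f]+\.[0-9a-f]+) .*?"
    r"AFAR (0x[0-9a-f]+\.[0-9a-f]+)",
    re.IGNORECASE,
)


def split_hex(text):
    """Convert Solaris's 'hi.lo' notation (two 32-bit halves) to one int."""
    hi, lo = text.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def flatten(log_text):
    """Drop leading '>' quote markers and rejoin wrapped entries."""
    words = []
    for line in log_text.splitlines():
        words.extend(line.lstrip("> ").split())
    return " ".join(words)


def group_ue_by_page(log_text):
    """Map page address -> list of (cpu, errID, physical address)."""
    pages = defaultdict(list)
    for cpu, err_id, afar in UE_RE.findall(flatten(log_text)):
        pa = split_hex(afar)
        pages[pa & ~(PAGE_SIZE - 1)].append((int(cpu), err_id, pa))
    return dict(pages)


SAMPLE = """\
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 315841
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
> access at TL=0, errID 0x0000046c.890e8026
> Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
> AFAR 0x00000000.077c8568
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 387418
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data
> access at TL=0, errID 0x0000046c.8cc25fd7
> Apr 14 08:47:04 machine-name AFSR 0x00000000.80200000<PRIV,UE>
> AFAR 0x00000000.077c8568
> Apr 14 08:47:04 machine-name SUNW,UltraSPARC-II: [ID 612880
> kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data
> access at TL=0, errID 0x0000046c.94ea7880
> Apr 14 08:47:04 machine-name AFSR 0x00000000.00200000<UE> AFAR
> 0x00000000.077c8568
"""

if __name__ == "__main__":
    for page, hits in group_ue_by_page(SAMPLE).items():
        print(f"page 0x{page:016x}: {len(hits)} UE report(s)")
        for cpu, err_id, pa in hits:
            print(f"  CPU{cpu} errID {err_id} PA 0x{pa:016x}")
```

All three reports land on PA 0x077c8568 in page 0x077c8000, the same page the kernel schedules for clearing. That is consistent with one bad location being touched by more than one CPU, but on its own it does not say whether the DIMM, the data path, or something else is at fault; the kernel itself notes that syndrome 0x3 may not indicate a memory module problem.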
This message originates from Jewelry Television (TM).