[SunHELP] Sun M/c REbooted
Lund, Dennis
sunhelp at sunhelp.org
Tue Sep 4 12:04:03 CDT 2001
Check for a vmcore.# file. If you have savecore enabled, you may have a core
file. In most cases, though, when you see "fatal error FATAL" in the
messages file, you will not get a vmcore file. Also check for any "panic"
entries in the messages file.
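The two checks above can be sketched as a small script. This is only an illustration: the default paths (/var/adm/messages and /var/crash) and the function names are assumptions, so pass the paths that apply to your system on the command line.

```python
import glob
import os
import sys

# Substrings worth flagging, matched case-insensitively.
KEYWORDS = ("panic", "fatal error")


def scan_messages(lines):
    """Return (line_number, text) pairs for lines mentioning a panic or fatal error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hits.append((number, line.rstrip("\n")))
    return hits


def find_vmcore_files(crash_dir):
    """Return a sorted list of vmcore.# files in crash_dir (empty if none exist)."""
    return sorted(glob.glob(os.path.join(crash_dir, "vmcore.*")))


if __name__ == "__main__":
    # Both defaults are assumptions; override them as arguments.
    messages_path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
    crash_dir = sys.argv[2] if len(sys.argv) > 2 else "/var/crash"

    try:
        with open(messages_path, errors="replace") as handle:
            for number, text in scan_messages(handle):
                print(f"{messages_path}:{number}: {text}")
    except OSError as err:
        print(f"could not read {messages_path}: {err}")

    cores = find_vmcore_files(crash_dir)
    if cores:
        for path in cores:
            print(f"core file: {path}")
    else:
        print(f"no vmcore.# files under {crash_dir}")
```

If the script prints matching log lines but no core files, that fits the case described above, where the failure was too severe for a dump to be written.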
If this occurred on a sun4u system, check prtdiag -v for clues.
"fatal error FATAL" errors, in my short experience, indicate a hardware
problem severe enough that a core dump could not be created. These
failures are very hard to diagnose.
You could enable the deadman timer to try to capture these events.
# This is how to turn on "deadman timeout". This is useful when
# you have a system that seems to crash without leaving any evidence
# of the crash, and hangs in a power-on state that leaves you without
# a boot prompt and no way into the system (login or console).
1) Edit /etc/system and insert the following line:
set snooping=1
2) Boot the system in kadb mode:
ok> boot kadb
3) When the next hang occurs, hopefully the deadman
timer will be triggered, and the system will drop into kadb:
# ~stopped at 0xfbd01028: ta 0x7d
kadb[0]:
At this point, any specific debugger commands can be run to examine
the current state of the system. Of particular interest are:
$r dump the registers
$c dump the current stack backtrace
$C dump the
freemem/D see how much memory is free
4) When kadb debugging is complete, attempt to take
a core dump by doing:
kadb[0]: $q
ok> sync
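Before rebooting under kadb, it may help to confirm that the line from step 1 is really active in /etc/system and not commented out. The sketch below is one way to check. It assumes that lines starting with * or # are comments in that file, and the function name is made up for illustration.

```python
import re
import sys

# Matches an active "set snooping=<value>" line, allowing spaces around "=".
SET_SNOOPING = re.compile(r"^set\s+snooping\s*=\s*(\S+)")


def snooping_enabled(system_text):
    """Return True if any uncommented line sets snooping to 1."""
    for raw in system_text.splitlines():
        line = raw.strip()
        # Assumption: "*" or "#" at the start of a line marks a comment.
        if not line or line[0] in "*#":
            continue
        match = SET_SNOOPING.match(line)
        if not match:
            continue
        try:
            # Base 0 accepts both decimal ("1") and hex ("0x1") values.
            if int(match.group(1), 0) == 1:
                return True
        except ValueError:
            continue
    return False


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/system"
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as err:
        sys.exit(f"could not read {path}: {err}")
    print("snooping=1 is set" if snooping_enabled(text) else "snooping=1 is NOT set")
```

This only inspects the file. The setting takes effect at the next boot, which is why step 2 follows the edit.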
Also check out Sun INFO DOC 12936 and 13039.
Dennis L. Lund
-----Original Message-----
From: darshan pai [mailto:darshanps at visto.com]
Sent: Tuesday, September 04, 2001 11:38 AM
To: sunhelp at sunhelp.org
Subject: [SunHELP] Sun M/c REbooted
Hi,
My Sun machine rebooted by itself 2 days ago. It's running fine now, but
I wanted to find out what caused it to reboot.
This is what /var/adm/messages shows:
Kern.notice:- System booting after fatal error FATAL...
What causes this message?
And what should I do to ensure it doesn't happen again?
Thanks,
DPAI
_______________________________________________
SunHELP maillist - SunHELP at sunhelp.org
http://www.sunhelp.org/mailman/listinfo/sunhelp