Select one of the Recovery Subsystem components below to find out more information about that component.
This option allows you to initiate a reboot from the browser. You will be warned before the system allows you to continue the rebooting process. The following reboot options are available.
Warm Reboot initiates the device's normal boot sequence.
Reboot to Utilities loads Compaq Utilities.
Cold Reboot shuts down the system without shutting down the operating system. This option is only available if Compaq Insight Manager is communicating directly with a Remote Insight board. This option should only be used if you are unable to gracefully shut down the operating system.
To reboot the device, select a reboot option and click Reboot. A text page displays notifying you that the reboot was successfully requested.
NOTE: The reboot option is not available for all devices.
This section provides Automatic Server Recovery (ASR) configuration information, tells you when the server was last reset, and allows you to modify pager settings. You can modify the Status, ASR Reset Boot Option, Pager Status, Pager Dial String, and Pager Message settings.
The following items display on this window.
Status displays the status of ASR. The possible values are:
Enabled - ASR is enabled for this server.
Disabled - ASR is disabled for this server. To change this status, run the Compaq System Configuration Utility or perform a set on this item.
Not Available - ASR is not available for this server or your driver is not loaded. ASR is available only on operating systems using the ASR software support provided by Compaq.
Unknown - You may need to upgrade your support software and/or Server Agent(s). The Server Agent cannot determine the status.
Last Reset displays how the last server reset was performed. The following values are possible:
ASR - The last reset was performed by ASR. Check the Critical Error Log to determine what may have caused ASR.
ASR-Cleared - The last reset was performed by ASR. The degraded condition caused by the ASR reset has been cleared. Degraded ASR conditions can be cleared by selecting the Clear ASR button on the Auto Server Recovery window.
Manual - The last reset was performed manually.
Unknown - You may need to upgrade your driver software and/or Server Agents. The Server Agent cannot determine the status of the device.
If the last reset was an ASR reset, the ASR condition will be degraded.
Timeout displays how many minutes ASR will wait before initiating a recovery process. ASR depends on the software support to routinely notify the ASR hardware that the server is operating properly.
To change the timeout setting, use the Compaq System Configuration Utility. The time you specify for this field should be a prudent period of time before resetting the system and activating the recovery process after a fault occurs. If the timeout period is set too low on a heavily utilized server, the timeout could occur before the software support has time to service the timer.
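The interaction between the timeout and the software support can be pictured as a watchdog timer. The following Python sketch is an illustrative model only: the class name AsrWatchdog and its methods are hypothetical and do not correspond to any Compaq driver or agent interface.

```python
class AsrWatchdog:
    """Illustrative watchdog model (hypothetical names, not a Compaq API)."""

    def __init__(self, timeout_minutes):
        self.timeout_seconds = timeout_minutes * 60
        self.last_refresh = 0.0

    def refresh(self, now):
        # The software support calls this routinely to report a healthy server.
        self.last_refresh = now

    def expired(self, now):
        # True once the timer has gone unrefreshed for the full timeout period.
        return now - self.last_refresh >= self.timeout_seconds


watchdog = AsrWatchdog(timeout_minutes=10)
watchdog.refresh(now=0)
print(watchdog.expired(now=300))  # False: refreshed 5 minutes ago
print(watchdog.expired(now=600))  # True: 10 minutes without a refresh
```

In this model, a timeout that is too short on a busy server makes expired() return True simply because refresh() was called late, which is the situation the guidance above warns against.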
ASR Hardware Version displays the version of the hardware supporting ASR. Use this information for identification purposes.
Reset Boot Option displays what the server will boot after an ASR reset occurs. When the recovery process is initiated, ASR will reset the server, test all memory, de-allocate any bad memory blocks, and page you (if a modem is present in the server and paging is enabled).
ASR Reset Limit displays the number of consecutive times that ASR will attempt recovery. The Automatic Server Recovery (ASR) feature can restart a server after a critical hardware or software error occurs. ASR will attempt the recovery process a limited number of consecutive times. You cannot change this number. If the server continues to experience hardware or software errors and the number of recovery cycles exceeds this limit, the server will log an error to the Critical Error Log and continue to boot the Compaq Utilities from the hard drive.
Use the ASR Reset Limit feature in conjunction with the ASR Reset Count feature in the same window. The ASR Reset Count feature displays the number of times that ASR has rebooted the server. If the ASR Reset Count is approaching the reset limit, immediately investigate the server for problems by checking the Critical Error Log and running Compaq Diagnostics.
ASR Reset Count displays how many times the ASR feature has rebooted the server. ASR will reboot (or reset) the server a limited number of times. If the ASR Reset Count is incremented, complete the following:
Check the Critical Error Log to determine if a serious problem exists.
If you suspect a software problem, consult your operating system documentation.
If you suspect a hardware problem, run Compaq Diagnostics to determine if a problem exists.
This count is reset to 0 when the system is reset manually.
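The way the ASR Reset Count is read against the ASR Reset Limit can be sketched as a small check. This is a minimal Python illustration, not part of Compaq Insight Manager: the function name, the returned labels, and the warn_margin parameter (how close to the limit counts as "approaching") are all assumptions.

```python
def asr_reset_assessment(reset_count, reset_limit, warn_margin=1):
    """Classify the ASR Reset Count against the ASR Reset Limit.

    warn_margin is an assumed value for "approaching the limit".
    """
    if reset_count < 0 or reset_limit < 0:
        raise ValueError("count and limit must be non-negative")
    if reset_count >= reset_limit:
        return "limit reached"
    if reset_count > 0 and reset_count >= reset_limit - warn_margin:
        return "approaching limit"
    if reset_count > 0:
        # Any increment is a reason to check the Critical Error Log.
        return "investigate"
    return "ok"


for count in range(4):
    print(count, asr_reset_assessment(count, reset_limit=3))
```

Any result other than "ok" corresponds to the steps listed above: check the Critical Error Log, then consult the operating system documentation or run Compaq Diagnostics.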
Pager Status displays the status of the pager. If a modem is installed in the server and paging is enabled, ASR can send an alarm to a pager when a critical error occurs.
The status can be:
Enabled - Paging will occur.
Disabled - Paging will not occur.
Unknown - The Server Agent cannot determine the status of this pager. You may need to upgrade your support software or Server Agents.
Pager Dial String displays the pager dial string that the server will dial when an alarm occurs. If a modem is installed in the server and paging is enabled, ASR will send an alarm to a pager and deliver a pager message.
Pager Message displays the pager message sent when an ASR occurs. The pager message is a numeric value of up to seven digits (characters must be 0 through 9) that identifies the server experiencing the hardware or software failure. There is an additional space for a pound sign (#), which many pagers require for ending a sequence. The numbers are chosen to uniquely identify the server so you know which server experienced a problem.
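The format rule for the pager message can be expressed as a short validation check. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that at least one digit is required and that the pound sign, when present, comes last.

```python
import re

# One to seven digits (0 through 9), optionally followed by a single pound sign.
_PAGER_MESSAGE = re.compile(r"[0-9]{1,7}#?")


def is_valid_pager_message(message):
    """Return True if the message fits the numeric pager format described above."""
    return _PAGER_MESSAGE.fullmatch(message) is not None


print(is_valid_pager_message("1234567#"))  # True
print(is_valid_pager_message("12345678"))  # False: more than seven digits
print(is_valid_pager_message("12A4"))      # False: non-numeric character
```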
Serial Port displays the communication port that is enabled for use with the ASR feature. For example, this port might be Serial Port 1. ASR will use this port to page the system administrator, and the administrator will use this port when dialing into the device. You can set the Serial Port value.
The Critical Error Log records non-correctable memory errors, as well as catastrophic hardware and software errors that cause a system to fail. This information helps you quickly identify and correct the problem, minimizing downtime.
This section displays a description of critical errors. The date and time of each error is followed by a brief description of the error. The time shown is rounded to the nearest hour.
If critical errors are marked with an exclamation point (!), indicating corrective action is required, the log condition is degraded. To remove the exclamation point and indicate that an entry has been corrected, select the entries you wish to clear and click the Correct Marked Entries button or run Compaq Diagnostics on the device. An asterisk (*) indicates the log entry to which the Last Failure Message applies.
IMPORTANT: Agents must have sets enabled and you must have the correct SNMP Community string to be able to mark entries as corrected.
The following list displays errors that may be logged. If you receive any of these errors, run Compaq Diagnostics on your system or consult your software documentation.
Abnormal Program Termination - The device has detected a fatal software error resulting in a device failure.
ASR Base Memory Parity Error - The system detected a data error in base memory following a reset due to an ASR timeout.
ASR Extended Memory Parity Error - The system detected a data error in extended memory following a reset due to an ASR timeout.
ASR Memory Parity Error - The system ROM was unable to allocate enough memory to create a stack. It was unable to put a message on the screen or continue booting the server.
ASR Reset Limit Reached - The maximum number of system resets has been reached. The Compaq Utilities will be loaded.
ASR Reset Occurred - No error data is logged.
ASR Test Event - An ASR Test event was generated by the user through the system utilities. No action is required since the event was user-generated to test the ASR configuration.
ASR Timeout NMI - The server has generated an ASR NMI because the ASR timer has not been refreshed. This generally indicates a driver has not relinquished control of the processor causing a server failure. The resulting ASR NMI was generated to log this event. Note the module that was executing.
CPU Internal Corrected Error Threshold Exceeded - The system has detected that a CPU has exceeded the threshold for the number of internal ECC cache errors.
CPU Processor Power Module Failed - The system has detected that a processor's power module has failed.
Critical Temperature - The system's critical temperature has been exceeded and auto shutdown has been initiated.
Error Detected On Bootup - The system detected an error during the Power-On Self-Test.
Exception - The processor has detected a critical exception resulting in a device failure.
Fan Failure - The system or processor fan failed.
NMI - CPU Local Error - The processor experienced a fatal error resulting in a device failure.
NMI - Expansion Board Error - A board on the expansion bus indicated an error condition causing a device failure.
NMI - Expansion Bus Arbitration Error - Memory refresh cycles were delayed, potentially leading to data loss. The error results in a system failure.
NMI - Expansion Bus Master Time-out - A bus master expansion board in the indicated slot did not release the bus after its maximum time resulting in a device failure.
NMI - Expansion Bus Slave Time-out - A board on the expansion bus delayed a bus cycle beyond the maximum time resulting in a device failure.
NMI - Failsafe Timer Expiration - The software was unable to reset the system failsafe timer, resulting in a system failure.
NMI - Processor Address Error 1 - A processor internal address parity checking error occurred, resulting in a device failure.
NMI - Processor Address Error 2 - The processor detected an address parity error during an inquire cycle.
NMI - Processor Cache Parity Error - A data error occurred within the processor cache, resulting in a system failure.
NMI - Processor Internal Error 1 - A processor internal parity error occurred, resulting in a device failure.
NMI - Processor Internal Error 2 - The processor detected an internal parity error or a functional redundancy error.
NMI - Processor Parity Error - The processor detected a data error resulting in a device failure.
NMI - Software Generated Interrupt - Software indicated a system error resulting in a system failure.
NMI - System Concurrency Error - A potential error condition was detected within the Data Flow Manager, resulting in a system failure.
NMI - Uncorrectable Memory Error - The device experienced an uncorrectable memory parity error resulting in a device failure.
NMI - Unknown Error Type - The device driver does not recognize this NMI. You may need to upgrade your health driver.
Processor Failure - The processor failed during the Power-On Self-Test.
Server Manager Failure - An error occurred in the server interface with the Server Manager/R.
UPS A/C Line Failure/Shutdown or Battery Low - The device has initiated a UPS or operating system shutdown, or the battery is almost depleted after an AC line failure.
The Last Failure Message on this window displays the last failure message associated with a critical error.
This section displays the Power-On messages logged when the device was turned on. Refer to your device documentation for a listing of possible Power-On error messages and their meanings. Click the Clear Power-On Message button to clear the power-on message log. This button is only available if there are messages to clear.
This alarm indicates that a block of memory has failed or is failing and may need to be replaced. This condition is generally non-critical since the memory controller can correct the problem. However, this type of error indicates that a memory component is failing or has failed in the system issuing the alarm. The system continues to correct any errors it can.
Memory errors are corrected by the ECC memory subsystem when they occur. If you notice an increase in these errors, correct the problems as soon as possible. Further degradation of the memory components may occur, and then errors may no longer be correctable.
This section displays details on the device environment. The following information is available.
Degraded Action allows you to designate what action will be taken when the device environment becomes degraded. The options are:
Continue - The health or wellness driver will signal the operating system to continue functioning in situations where the temperature is too high or too low. In more serious temperature situations, the device shuts down automatically.
Shut Down - The health or wellness driver will signal the operating system to shut down in situations where the temperature is too high or too low. In more serious temperature situations, the device shuts down automatically.
Unknown - The Server Agent cannot determine the status of the device. You may need to upgrade your driver software or Server Agents.
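The effect of the Degraded Action setting can be summarized as a small decision function. This Python sketch is an illustration under assumptions: the function name and return strings are hypothetical, and it equates the "more serious temperature situations" mentioned above with a temperature condition of Failed.

```python
def thermal_response(temperature, degraded_action):
    """Return the response to a temperature condition for a Degraded Action."""
    if degraded_action not in ("Continue", "Shut Down"):
        raise ValueError("degraded_action must be 'Continue' or 'Shut Down'")
    if temperature == "Failed":
        # Assumed mapping: the serious case shuts down regardless of the setting.
        return "device shuts down automatically"
    if temperature == "Degraded":
        if degraded_action == "Shut Down":
            return "operating system is signaled to shut down"
        return "operating system is signaled to continue"
    return "no action"


print(thermal_response("Degraded", "Continue"))
print(thermal_response("Degraded", "Shut Down"))
print(thermal_response("Failed", "Continue"))
```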
Temperature displays the current temperature condition of the system or client PC. This value can be:
OK - The temperature is within normal operating range.
Degraded - The temperature is above the normal operating range. Check for airflow obstructions and make sure that the cover is on.
CAUTION: Do not operate the system with the cover removed. Proper airflow is possible only when the cover is in place and properly secured.
Failed - The temperature is outside the normal operating range and could permanently damage the system. The system will automatically shut down to prevent damage to hardware or data loss.
NOTE: A Failed condition will not occur in a client PC since the power supply for the client will be cut off in the event the thermal condition reaches a permanently damaging level.
Unknown - The Server Agent cannot determine the status of the device. You may need to upgrade your driver software or Server Agents. If you are managing a client with an unknown temperature status, the client may not support thermal detection.
Fans displays an entry for each of the device or system processor fans. The status of each fan can be:
OK - The fan is operational.
Failed - The fan has failed. The device will shut down automatically to prevent damage to hardware or data loss. Replace the fan.
Unknown - The Server Agent cannot determine the status of this setting. You may need to upgrade your driver software or Server Agents.
This section displays information about the power supplies.
The following entries may be displayed:
Location displays the bay where the power supply is located.
Status displays the status of the power supply. The following values are possible:
OK - A power supply is installed and operating normally.
Failed - A power supply is installed and is no longer operating. Replace the power supply.
Not Installed - Nothing is installed in this power supply bay.
Unknown - The Server Agent is unable to determine if this storage system power supply bay is occupied.
Serial Number displays the serial number of the power supply. This information can be used for identification purposes.
Firmware Revision displays the firmware revision of the power supply.
This section displays information about the power converters. The following entries may be displayed:
Slot and Socket displays the location of the power converter.
Status displays the status of the power converter. The following values are possible.
OK - A power converter is installed and operating normally.
Degraded - A power converter is installed and is operating in a degraded state. Replace the power converter.
Failed - A power converter is installed and is no longer operating. Replace the power converter.
Unknown - The Server Agent is unable to determine the status of this power converter.
This section displays details about the status of the Integrated Remote Console (IRC) and the Rapid Recovery communications configuration.
The following fields display.
Integrated Remote Console
Status indicates whether the IRC is supported and enabled. Possible values include Not Supported, Enabled, and Disabled.
If IRC is not present on this device, the field displays Not Supported.
If IRC is present and enabled, the field displays Enabled.
If IRC is present but disabled, the field displays Disabled. Three things can cause IRC to be disabled even though you enabled it:
The COM port for which IRC is configured does not exist.
The COM port for which IRC is configured is a PCI device.
The IRQ for which IRC is configured does not match the COM port for which IRC is configured.
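The rules that determine the displayed IRC status can be sketched as a single function. The Python below is a minimal illustration, not the agent's actual logic: the function and parameter names are hypothetical.

```python
def irc_status(present, enabled_by_user, com_port_exists,
               com_port_is_pci, irq_matches_port):
    """Return the Integrated Remote Console status string for a configuration."""
    if not present:
        return "Not Supported"
    if not enabled_by_user:
        return "Disabled"
    # Any of the three configuration problems disables IRC even when enabled.
    if not com_port_exists or com_port_is_pci or not irq_matches_port:
        return "Disabled"
    return "Enabled"


print(irc_status(True, True, True, False, True))   # Enabled
print(irc_status(True, True, True, True, True))    # Disabled: COM port is a PCI device
print(irc_status(False, True, True, False, True))  # Not Supported
```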
Network Access displays the status of the ASR Network Remote Console feature. The following values may display in this field.
Enabled - Remote Console network access is enabled. If the server ASR reboots to Compaq Utilities (see Reset Boot Option in Automatic Server Recovery Window) or if you reboot to Compaq Utilities from Compaq Insight Manager by pressing the Reboot button in the Device View Window, then network remote access is enabled. You may access Compaq Utilities through Remote Console.
Disabled - Remote Console network access is not enabled.
Unknown - The Server Agent cannot determine the status of this setting. You may need to upgrade your driver software or Server Agents.
Dial In Status displays whether the ASR feature will put the modem into auto-answer mode after an ASR reboot.
The following values may be displayed in this field.
Enabled - Remote Console dial-in access is enabled by putting the modem into auto-answer mode. If the server ASR reboots to Compaq Utilities (see Reset Boot Option in Automatic Server Recovery Window) or if you reboot to Compaq Utilities from Compaq Insight Manager by pressing the Reboot button in the Device View Window, then modem remote access is enabled. You may access Compaq Utilities via Remote Console using a modem connection.
If you have enabled Dial-Out Status, a dial-out connection will be attempted first. If that connection fails, then dial-in access is enabled. If the dial-out connection is successful, then dial-in is enabled after that connection is terminated.
Disabled - This feature is not enabled. ASR will not put the modem in auto-answer mode.
Unknown - The Server Agent cannot determine the status of this setting. You may need to upgrade your driver software or Server Agents.
Dial-Out Status
After the ASR feature has attempted to deliver an alarm by means of the pager, if the Dial-Out Status is enabled and a proper Dial-Out String has been provided, ASR will dial a remote PC. When a session is established, the server administrator can use a third-party terminal emulation program to run the Compaq Utilities to diagnose the problem.
Possible values are:
Enabled - ASR will dial the Dial-Out String and attempt to set up a connection to a remote PC. ASR will attempt the connection five times. If a connection is not established and the Dial-In Status is enabled, ASR will put the modem into auto-answer mode so that the server administrator can dial in.
Disabled - This feature is not enabled. ASR will not attempt a remote connection. However, if the Dial-In Status is enabled, ASR will put the modem into auto-answer mode so that the server administrator can dial in.
Unknown - The Server Agent cannot determine the status of this setting. You may need to upgrade your driver software or Server Agents.
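The order in which dial-out and dial-in access are tried after an ASR reboot can be sketched as follows. This Python is a simplified illustration only: the function name, the step labels, and the connect callable (a stand-in for the modem establishing a dial-out connection) are hypothetical, and the sketch assumes a proper Dial-Out String has already been provided.

```python
def remote_access_steps(dial_out_enabled, dial_in_enabled, connect, max_attempts=5):
    """Return the ordered steps ASR would take, per the description above.

    connect is a hypothetical callable returning True when a dial-out
    connection to the remote PC is established.
    """
    steps = []
    if dial_out_enabled:
        for attempt in range(1, max_attempts + 1):
            steps.append(f"dial-out attempt {attempt}")
            if connect():
                steps.append("dial-out session")
                break
    if dial_in_enabled:
        # Reached after a failed dial-out, after a dial-out session ends,
        # or immediately when dial-out is disabled.
        steps.append("modem auto-answer")
    return steps


print(remote_access_steps(True, True, connect=lambda: False))
print(remote_access_steps(True, True, connect=lambda: True))
print(remote_access_steps(False, True, connect=lambda: False))
```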
Dial-Out String
After the ASR feature has attempted to deliver an alarm by means of the pager, if the Dial-Out Status is enabled and a proper Dial-Out String is provided in this field, ASR will attempt to dial a remote PC. When a session is established, the system administrator can use a third-party terminal emulation program to run the Compaq Utilities to diagnose the problem.
Serial Port displays the communication port that is enabled for use with the ASR feature. For example, this port might be Serial Port 1. ASR will use this port to page the system administrator, and the administrator will use this port when dialing in to the device.
The Integrated Management Log records system events, critical errors, power-on message errors, and memory errors. The log also records catastrophic hardware and software errors that typically cause a system to fail. This information helps to quickly identify and correct the problem and minimize downtime.
Each event log entry has a status to identify the severity of the event:
Informational - General information about a system event.
Repaired - Indication that this entry has been repaired. Users must mark entries as repaired.
Caution - Indication that a non-fatal error condition has occurred.
Critical - A component of the system has failed.
If any events in the log have a condition of Caution, the overall log condition will be marked as degraded. If Critical events exist in the log, the overall log condition will be marked as failed.
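The rule for deriving the overall log condition from the entry statuses can be sketched in a few lines. This Python is illustrative only: the function name is hypothetical, and the "ok" result for a log with no Caution or Critical entries is an assumption, since only the degraded and failed conditions are described here.

```python
def overall_log_condition(entry_statuses):
    """Derive the overall log condition from the statuses of its entries."""
    statuses = set(entry_statuses)
    if "Critical" in statuses:
        return "failed"
    if "Caution" in statuses:
        return "degraded"
    return "ok"  # assumed label when nothing needs attention


print(overall_log_condition(["Informational", "Caution"]))  # degraded
print(overall_log_condition(["Caution", "Critical"]))       # failed
print(overall_log_condition(["Informational", "Repaired"])) # ok
```

In this model, marking an entry as repaired changes its status to Repaired, so it no longer contributes to a degraded or failed condition.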
To clear a degraded or failed event log, mark the log entry as repaired after you have repaired the condition that caused a log entry to be generated. Perform the following steps.
Highlight the log entries in the Integrated Management Log.
Click the Mark Repaired button. This button is located at the top of the window.
IMPORTANT: Agents must have sets enabled and you must have the correct SNMP Community string to be able to mark log entries as corrected.
The description column gives a brief description of the error or event. The update time column contains the last time this log was updated. The status column contains the status of the log entry.
Refer to the Compaq Integrated Management Log User Guide for more information.
Select the Remote Insight entry from the Recovery list to display a submenu containing separate entries for General Information, Network Interface Card, Event Log, and a link to the Remote Insight Board Web Interface.
The General Information section displays the following information about the Remote Insight board. Not all of the listed fields are supported on every model of Remote Insight Board.
Model displays the Remote Insight board model name.
Serial Number displays the Remote Insight board serial number.
ROM Version displays the Remote Insight firmware version and date.
Video displays the status of the Remote Insight video feature (enabled or disabled).
Mouse indicates if the mouse cable is connected to or disconnected from the Remote Insight board.
Keyboard Cable indicates if the keyboard cable is connected to or disconnected from the Remote Insight board.
Interface Status displays the interface status of the Remote Insight board. The status values include:
OK - The host operating system is able to communicate with the Remote Insight board.
Not Responding - The host operating system is unable to communicate with the Remote Insight board.
Alarm Status indicates if alerts are enabled at the Remote Insight board. This is a global flag and governs all users. If alerts are disabled, alarms will not be sent.
Pending Alarm indicates if any alerts are remaining to be sent from the Remote Insight Board.
Battery Status indicates the status of the Remote Insight board battery. When the Remote Insight board battery is enabled and there is a host power failure, the Remote Insight board battery provides a minimum of 30 minutes of operation. This enables the Remote Insight board to send alerts to the users that were specified during configuration.
Battery Condition displays the condition of the battery. The following values are possible.
OK - The battery is charged and functional.
Failed - The battery needs to be replaced.
Disconnected - The battery has been disconnected.
Battery Charge displays the percentage of the charge in the Remote Insight board battery.
External Power Cable displays if the external power cable is connected or disconnected.
Virtual Power Cable displays if the virtual power cable is connected or disconnected.
Modem/COM Port Settings A series of tables is displayed providing information about any Remote Insight Board modems or communication ports. The first table title indicates the type of device described, such as "Internal Modem", "External Port", "External Modem", "External Direct Connect", or "External XonXoff". The tables may contain the following:
Model displays the model of the communication device.
Data Settings This table contains the data settings for the communication device including:
Alarms displays if the Remote Insight firmware will use this communication device to deliver traps.
Non-PPP displays if non-PPP connections are allowed on this port.
Baud Rate displays the baud rate to be used on this port.
Data Bits displays the number of data bits to be used on this port.
Stop Bits displays the number of stop bits to be used on this port.
Parity displays the type of parity to be used on this port.
Pager Settings displays pager settings for the communication device.
Alarms displays if the Remote Insight Board firmware will use this port to deliver pages.
Message displays the message that will be in the body of the page.
Baud Rate displays the baud rate to use for pager messages.
Data Bits displays the number of data bits to use for pager messages.
Stop Bits displays the number of stop bits to use for pager messages.
Parity displays the type of parity to use for pager messages.
Modem Control Strings This table contains the modem control strings for the communication device including:
Reset displays the string used to reset the modem.
Initialize displays the string used to initialize the modem.
Dial Prefix displays the string prepended to phone numbers before dialing.
Self Test Results displays the results of all the self-tests available for this Remote Insight Board.
The NIC section displays the following information about the NIC in the Remote Insight Board. Not all fields are supported by all models of Remote Insight Board and/or NIC.
Model displays the NIC model.
DNS Name displays the fully qualified DNS name assigned to this Remote Insight Board.
Type displays if the NIC is embedded or PCMCIA and whether it is Ethernet or Token Ring.
IP Address displays the IP address for this NIC.
Subnet Mask displays the subnet mask for this NIC.
Gateway displays the default gateway configured for this NIC.
Status displays if this NIC is enabled or disabled.
Physical Address displays the MAC address for this NIC.
Duplex displays if the controller is in half duplex, full duplex or does not support a duplex state.
Speed displays the speed of the NIC.
Max Packet Size displays the maximum packet size of the NIC.
Transmit/Receive Statistics displays the following set of statistics for the NIC:
Bytes displays the number of bytes transmitted/received.
Total Packets displays the number of packets transmitted/received.
Unicast Packets displays the number of unicast packets transmitted/received.
Non-Unicast Packets displays the number of non-unicast packets transmitted/received.
Discarded Packets displays the number of packets discarded during transmit/receive.
Error Packets displays the number of error packets found during transmit/receive.
Unknown Protocols displays the number of unknown protocol packets received.
Queue Length displays the number of outstanding packets in the transmit queue.
The Event Log section displays the list of events stored in the Remote Insight Board event log. These events can be cleared by a user with appropriate authority. Each event includes the following information:
Index displays a numeric index for each event.
Time of Event displays the time the event occurred.
Description displays a text description of the event.
This link launches a new browser window that will contain the web interface to the Remote Insight Board. This link is only present for models of Remote Insight Board that support this functionality.