This section displays general and status information about Fibre Channel storage systems. Select a Fibre Channel Storage System entry from the Mass Storage list to display a submenu containing separate entries for storage systems, fibre array controllers, physical drives, and logical drives. The following items display:
Fibre Channel Storage System Information
Fibre Array Controller Information
Accelerator Information
Physical Drive Information
Logical Drive Information
RA-8000 Storage Systems
Fibre Channel Connections
Select the Storage System Information entry from the Mass Storage submenu to display the following information about a Fibre Channel storage system:
Name displays the user-defined name (or serial number, if preferred) of this storage system chassis.
Connection displays the type of connection between the server and the box. The following values are possible:
Fibre Attached - This chassis is attached to the server through Fibre Channel.
Unknown - The Storage Agents are unable to determine the type of this chassis.
Serial Number displays the storage system chassis serial number, which is normally displayed on the front panel. Use this information for identification purposes.
IO Slots displays whether a Fibre Channel Array controller is installed.
Firmware Revision displays the revision level of the storage system backplane.
Drive Bays displays the Compaq Storage System Backplane Drive Bays. This is the number of bays on this storage system backplane.
Duplex displays the Compaq Storage System Backplane Duplex Option. The following values are possible:
NotDuplexed - This storage system is not duplexed.
DuplexTop - This is the top portion of a duplexed storage system.
DuplexBottom - This is the bottom portion of a duplexed storage system.
Unknown - The Storage Agents are unable to determine if this storage system is duplexed.
Board displays the type of board (system, power, or SCSI).
Serial Number displays the serial number of the board.
Board Revision displays the revision number of the board.
Location displays the bay location of the power supply. The following values are possible:
Power Bay 1 - The power supply is installed in the first power supply bay.
Power Bay 2 - The power supply is installed in the second power supply bay.
Unknown - The Storage Agents do not recognize the bay. You may need to upgrade your software.
Status displays the status of the power supply. The following values are possible:
OK - A power supply is installed and operating normally.
Failed - A power supply is installed and is no longer operating. Replace the power supply.
Not Installed - Nothing is installed in this power supply bay.
Unknown - The Storage Agents are unable to determine if this storage system power supply bay is occupied.
UPS State displays the status of the UPS attached to the power supply. The following values are possible:
OK - A UPS is attached to the power supply and is operating normally.
Battery Low - A UPS is attached to the power supply, the AC power has failed, and the UPS battery is low.
Power Failed - A UPS is attached to the power supply and the AC power has failed.
No Ups - No UPS is attached to the power supply.
Unknown - The Storage Agents are unable to determine if this power supply is attached to an Uninterruptible Power Supply (UPS).
Serial Number displays the serial number of the power supply. Use this information for identification purposes.
Board Revision displays the board revision of the power supply.
Firmware Revision displays the firmware revision of the power supply.
Location displays the location of the temperature sensor. The following values are possible:
Fan Bay - This temperature sensor is located on the fan module in the fan bay.
Backplane - This temperature sensor is located on the SCSI drive backplane.
Unknown - The Management Agent is unable to determine the location of this storage system temperature sensor.
Status displays the status of the temperature sensor. The following values are possible:
OK - The temperature is OK.
Degraded - The temperature is degraded.
Failed - The temperature is failed.
Unknown - The Storage Agents are unable to determine the storage system temperature sensor status.
Current Value displays the current temperature value.
Limit Value displays the threshold value of the temperature sensor.
Location displays the location of the fan module. The following values are possible:
Fan Bay - This fan module is installed in the fan bay.
Unknown - The Storage Agents are unable to determine the location of this storage system fan module.
Status displays the status of the fan module. The following values are possible:
OK - The fan module is installed and operating normally.
Degraded - The fan module is degraded.
Failed - The fan module has failed. Replace the fan module.
Not Installed - The fan module is not installed.
Unknown - The Storage Agent is unable to determine if this storage system fan module is installed.
Serial Number displays the serial number for the fan module. Use this information for identification purposes.
Board Revision displays the board revision of the fan module.
This section displays the following information about fibre array controllers that are installed in a storage system.
Model displays the model type of the controller card. The valid types are:
Fibre Array - Compaq StorageWorks RAID Array 4000 Controller.
Unknown - You may need to upgrade your driver software or Storage Agents. You have an array controller in the storage box that the Storage Agents do not recognize.
Firmware Version displays the version of the controller's firmware.
Serial Number displays the serial number of the controller. Use this information for identification purposes.
Product Revision displays the product revision of the controller. Use this value to further identify a particular revision of the controller model.
WorldWide Name displays the unique Fibre Channel name for the controller. Use this value to further identify a particular controller.
Controller Status displays the status of the controller hardware. The following values are valid:
OK - The controller is operating normally.
Failed - The controller has failed and is no longer operating.
Offline - The controller is offline.
Unknown - Indicates that the Storage Agents are unable to determine the status of the controller. You may need to upgrade the Storage Agents.
Current Role displays the Compaq Array Controller current role for duplexed array controllers. The following values are valid:
Not Duplexed - This array controller is not duplexed.
Active - This duplexed array controller is the active controller.
Backup - This duplexed array controller is the backup controller.
Unknown - Indicates that the Storage Agents are unable to determine the role of the controller. You may need to upgrade the Storage Agents.
Redundancy Type displays the type of redundant configuration. The following values are valid:
Not Redundant - This array controller is not in a redundant configuration.
Firmware Active/Standby - The array controller is using an active/standby algorithm implemented in the controller firmware and the operating system driver.
Firmware Primary/Secondary - The array controller is using a primary/secondary algorithm implemented in the controller firmware and the operating system driver.
Unknown - Indicates that the Storage Agents are unable to determine the type of redundancy for the controller. You may need to upgrade the Storage Agents.
Redundancy Error displays the redundancy error for the controller. The following values are valid:
No Failure - No failures have been detected.
No Redundant Controller - No redundant controller is installed.
Different Hardware - The other controller indicates a different hardware model.
No Link - A link to the other controller could not be established.
Different Firmware - The other controller indicates a different firmware version.
Different Cache - The other controller indicates a different cache size.
Other Cache Failure - The other controller indicates a cache failure.
No Drives - This controller cannot see any attached drives, but the other controller can.
Other No Drives - This controller can see the attached drives, but the other controller cannot.
Unsupported Drives - One or more attached drives have been determined to be incapable of properly supporting redundant controller operation.
Expand in Progress - An expand operation is in progress. Redundant operation is not supported until the expand operation is complete.
Unknown - Indicates that the Storage Agents are unable to determine the redundancy error for the controller. You may need to upgrade the Storage Agents.
This section displays the following information about the Array Accelerator on a fibre array controller that is installed in a storage system.
Status displays the status of the Fibre Channel Array Accelerator (FCAA). The status can be one of the following:
Enabled - Cache operations are currently configured and enabled for at least one logical drive.
Temporarily Disabled - Cache operations have been temporarily disabled. Check the Array Accelerator Error Code for the monitored item to determine why the cache operations have been temporarily disabled.
Permanently Disabled - Cache operations have been permanently disabled. Check the Array Accelerator Error Code for the monitored item to determine why the cache operations have been permanently disabled.
Invalid - Cache operations have been disabled and are invalid.
Battery Status displays the status of the battery pack on the Array Accelerator. The battery pack can recharge only when the system is powered on. The status can be one of the following:
OK - The battery pack is fully charged.
Failed - The battery pack is below a sufficient voltage level and has not fully recharged within the maximum 36 hours. Your board should be serviced as soon as possible.
Charging - The battery power is less than 75%. The Compaq Array Controller is attempting to recharge the battery pack. A battery pack can take as long as 36 hours to fully recharge. If the battery pack has not recharged after 36 hours, it is considered failed.
Degraded - The battery pack is still operating but one of the batteries in the pack has failed to recharge properly. Your board should be serviced as soon as possible.
Not Present - The battery pack is not present. (Some controllers do not have a battery backed cache.)
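The recharge rule described above can be illustrated with a minimal Python sketch. The function name `classify_battery` and its parameters are hypothetical and are not part of the Storage Agents; the sketch covers only the OK, Charging, Failed, and Not Present values and assumes the caller already knows whether the pack is fully charged.

```python
MAX_RECHARGE_HOURS = 36  # maximum recharge time stated in the text


def classify_battery(present, fully_charged, hours_charging):
    """Return a Battery Status string for a hypothetical battery reading.

    present        -- True if a battery pack is installed
    fully_charged  -- True if the pack has reached full charge
    hours_charging -- hours the pack has been recharging so far
    """
    if not present:
        return "Not Present"
    if fully_charged:
        return "OK"
    # A pack that has not recharged after 36 hours is considered failed.
    if hours_charging > MAX_RECHARGE_HOURS:
        return "Failed"
    return "Charging"


print(classify_battery(True, False, 10))
print(classify_battery(True, False, 40))
```

Under these assumptions, a pack that has been recharging for 10 hours is reported as Charging, and one that is still not full after 40 hours is reported as Failed.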
Bad Data indicates possible data loss due to a battery problem when the system was powered on. The following values are valid:
Possible - At power on, the battery pack was not sufficiently charged. The Array Accelerator has not retained any data that may have been stored in the cache because the battery pack did not retain sufficient charge when the system resumed power. If no data was in the cache, no data was lost. Several situations may have caused this condition, including the following:
If the system was without power for 8 days and the battery pack was on (the battery pack activates only if the system loses power unexpectedly), any data that may have been stored was lost.
There may be a problem with the battery pack. See the Battery Status monitored item for more information.
The Array Accelerator board has been replaced with a new board that has a discharged battery pack. No data has been lost in this case and posted reads and writes will automatically be enabled when the battery pack reaches full charge.
None - No data loss occurred. At power on, the battery pack was properly charged.
Read Errors displays the total number of read memory parity errors that were detected while reading from the Array Accelerator. If a memory parity error occurs, the mirrored copy of data in the write cache can be accessed to obtain correct data.
Memory parity errors occur when the system detects that information has not been transferred correctly. A parity bit is included for each byte of information stored in memory. When the microprocessor reads or writes data, the system counts the value of the bits in each byte. If the total does not match the system's expectations, a parity error occurs. A bad memory chip, memory corruption, or lack of memory refresh may cause memory parity errors.
Write Errors displays the total number of write memory parity errors that were detected while writing to the Array Accelerator.
Write parity errors occur when the system detects that information has not been transferred to the Array Accelerator correctly. A parity bit is included for each byte of information stored in memory. When the microprocessor reads or writes data, the system counts the value of the bits in each byte. If the total does not match the system's expectations, a parity error occurs.
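The per-byte parity check described above can be sketched in a few lines of Python. This is an illustration only: it assumes even parity (the text does not say whether the hardware uses even or odd parity), and the function names are hypothetical.

```python
def parity_bit(byte):
    """Return the even-parity bit for one byte (0..255).

    The bit is 1 when the byte contains an odd number of 1 bits, so that
    the byte plus its parity bit always holds an even number of 1 bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    return bin(byte).count("1") % 2


def has_parity_error(byte, stored_parity):
    """Return True if the byte no longer matches its stored parity bit."""
    return parity_bit(byte) != stored_parity


data = 0b10110010              # four 1 bits, so the parity bit is 0
stored = parity_bit(data)      # parity recorded when the byte is written
corrupted = data ^ 0b00000100  # the same byte with a single bit flipped

print(has_parity_error(data, stored))       # False
print(has_parity_error(corrupted, stored))  # True
```

A single parity bit detects any odd number of flipped bits in a byte but cannot tell which bit changed, which is why the correct data must come from another copy.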
Total Memory displays the total amount of accelerator memory in megabytes, including both battery-backed and non-battery-backed memory.
Write Cache displays the amount of memory allocated for the write cache in megabytes. The actual amount of usable memory is half the amount shown because data is kept in duplicate (mirrored).
Read Cache displays the memory allocated for the read cache in megabytes.
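Because write cache data is mirrored, the usable write cache is half the displayed value. The following Python sketch shows the arithmetic; the function name and the 16 MB figure are hypothetical examples, not values taken from a real controller.

```python
def usable_write_cache_mb(write_cache_mb):
    """Return the usable write cache in megabytes.

    Data in the write cache is kept in duplicate (mirrored), so only half
    of the displayed Write Cache value holds unique data.
    """
    if write_cache_mb < 0:
        raise ValueError("write cache size cannot be negative")
    return write_cache_mb / 2


print(usable_write_cache_mb(16))  # 8.0
```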
Serial Number displays the serial number of the accelerator board. Use this value to further identify the cache controller.
Error Code displays the status of the cache operations. The status can be one of the following:
Invalid - Write cache operations are currently configured and enabled for at least one logical drive. No write cache errors have occurred.
Bad Configuration - Write cache operations are temporarily disabled. The Array Accelerator board was configured for a different controller. This error could be caused if boards were switched from one system to another. Rerun the Compaq EISA Configuration Utility and ensure that the board has been properly configured for this system.
NOTE: If data from another system was stored on the board, rerunning EISA Configuration will cause the data to be lost.
Low Battery Power - Write cache operations are temporarily disabled due to insufficient battery power. Please view the Battery Status object instance for more information.
Disable Command Issued - Write cache operations are temporarily disabled. The device driver issues this command when the server is taken down. This condition should not exist when the system regains power.
No Resources Available - Write cache operations are temporarily disabled. The controller does not have sufficient resources to perform write cache operations. For example, when a replaced drive is being rebuilt, there will not be sufficient resources. After the operation that requires the resources has completed, this condition will clear and write cache operations will resume.
Board Not Connected - Write cache operations are temporarily disabled. The Array Accelerator board has been configured but is not currently attached to the controller. Check the alignment of the board and connections.
Bad Mirror Data - Write cache operations have been permanently disabled. The Array Accelerator board stores mirrored copies of all data. If data exists on the board when the system is first powered up, the board performs a data compare test between the mirrored copies. If the data does not match, an error has occurred. Data may have been lost. Your board may need servicing.
Read Failure - Write cache operations have been permanently disabled. The Array Accelerator board stores duplicate copies of all data. While reading the data from the board, memory parity errors have occurred so both copies were corrupted and cannot be retrieved. Data has been lost. Have the board serviced.
Write Failure - Write cache operations have been permanently disabled. This error occurs when an unsuccessful attempt was made to write data to the Array Accelerator board. Data could not be written to write cache memory in duplicate due to the detection of parity errors. This error does not indicate data loss. Have the Array Accelerator board serviced.
Config Command - Write cache operations have been permanently disabled. The configuration of the logical drives has changed. Reconfigure the Array Accelerator board.
Expand in Progress - Cache operations are temporarily disabled due to an expand of a logical drive. When the expand operation completes, the accelerator will be enabled.
Snapshot in Progress - Cache operations are temporarily disabled due to a snapshot operation that is queued up or in progress. When the snapshot operation completes, the accelerator will be enabled.
Redundant Low Battery - Cache operations are temporarily disabled. The redundant controller has insufficient cache battery power.
Redundant Size Mismatch - Cache operations are temporarily disabled. The cache sizes on the redundant controllers do not match.
Redundant Cache Failure - Cache operations are temporarily disabled. The cache on the redundant controller has failed.
Excessive ECC Errors - Cache operations have been permanently disabled. The number of cache lines experiencing excessive ECC errors has reached a preset limit.
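The Error Code values listed above fall into three groups: no error, temporarily disabled, and permanently disabled. The following Python sketch collects that grouping into a lookup table. The table is transcribed from the descriptions above; the names `ERROR_CODE_EFFECT` and `cache_effect` are hypothetical and do not correspond to any Storage Agents interface.

```python
ENABLED = "Enabled"
TEMPORARY = "Temporarily Disabled"
PERMANENT = "Permanently Disabled"

# Effect of each Error Code on cache operations, as described in the text.
ERROR_CODE_EFFECT = {
    "Invalid": ENABLED,  # no write cache errors have occurred
    "Bad Configuration": TEMPORARY,
    "Low Battery Power": TEMPORARY,
    "Disable Command Issued": TEMPORARY,
    "No Resources Available": TEMPORARY,
    "Board Not Connected": TEMPORARY,
    "Bad Mirror Data": PERMANENT,
    "Read Failure": PERMANENT,
    "Write Failure": PERMANENT,
    "Config Command": PERMANENT,
    "Expand in Progress": TEMPORARY,
    "Snapshot in Progress": TEMPORARY,
    "Redundant Low Battery": TEMPORARY,
    "Redundant Size Mismatch": TEMPORARY,
    "Redundant Cache Failure": TEMPORARY,
    "Excessive ECC Errors": PERMANENT,
}


def cache_effect(error_code):
    """Return the effect of an Error Code, or "Unknown" if it is not listed."""
    return ERROR_CODE_EFFECT.get(error_code, "Unknown")


print(cache_effect("Low Battery Power"))
print(cache_effect("Bad Mirror Data"))
```

A table like this makes it easy to separate conditions that clear on their own, such as Expand in Progress, from conditions that call for reconfiguration or service, such as Bad Mirror Data.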
This section provides an overview of all disk drives attached to the controller. Each physical drive is listed as a separate entry in the Mass Storage submenu. The information displayed next to the physical drive includes the condition, location of the drive (port and drive number) and drive size. Select any of the physical drives from the Mass Storage submenu to display more information about the drive. The following information can be displayed:
Status displays the status of the physical drive. The following values are possible:
OK - The drive is functioning properly.
Unconfigured - Indicates the drive is present, but is not part of any logical drive configuration.
Threshold Exceeded - Indicates that the drive has a threshold exceeded error and should be replaced.
Predictive Failure - Indicates that the drive has a predictive failure error and should be replaced.
Failed - The drive is no longer operating and should be replaced.
Unknown - The physical drive cannot be monitored at this time. This may be due to:
The device driver for this drive may have been unloaded.
The logical drive may have failed and been deactivated by the operating system. In this case, the last known status was OK.
The Storage Agent does not recognize the drive. You may need to upgrade your software.
Action displays the action that is required for this drive. The following values are valid:
Replace Drive - Replace this drive. If the drive condition is Failed, check the Predictive Indicators, Problem Indicators, and Failure Indicators for a possible cause of the failure.
Replace S.M.A.R.T. Drive - The S.M.A.R.T. hard drive predicts imminent failure. Schedule replacement of the drive before an actual failure occurs.
No Action Required - The drive is operating normally and no action is required.
Capacity displays the size of the physical drive in megabytes. For example, 120 indicates that the physical drive is 120 megabytes.
Model displays a description of the physical drive. The text depends on the manufacturer of the drive and the drive type. For example, you might see: Compaq 210MB CP3201.
If a drive fails, note the model to identify the type of drive necessary for replacement.
Firmware Version displays the physical drive firmware version number. Make sure that you have the most recent version of the firmware because older versions may not support all of the newest features.
Serial Number displays the serial number assigned to the SMART physical drive. This value is based upon the serial number as returned by the SCSI inquiry command but may be modified due to space limitations. This item can be used for identification purposes.
Service Hours displays the current number of hours of service (the number of hours that a physical drive has been spinning) since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.
For example, if the Current Service Hours value is 604, the drive has been operating for 604 hours. If an error occurred at 499 Service Hours, it occurred after 499 hours of service.
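The service-hour values in the example above can also be used to work out how long the drive has run since an error was recorded. The Python sketch below shows the subtraction; the function name is hypothetical.

```python
def hours_since_error(current_service_hours, error_service_hours):
    """Return how many service hours have elapsed since an error was logged."""
    if error_service_hours > current_service_hours:
        raise ValueError("an error cannot be logged after the current service hours")
    return current_service_hours - error_service_hours


# Values from the example above: the drive is at 604 hours and the
# error occurred at 499 hours.
print(hours_since_error(604, 499))  # 105
```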
S.M.A.R.T. indicates whether or not the SCSI physical drive supports S.M.A.R.T. The possible values are:
Available - This drive supports predictive failure monitoring.
Not available - Predictive failure monitoring is not available for this drive.
Unknown - The Storage Agents cannot determine if the drive supports predictive failure monitoring. You may need to upgrade your driver or Storage Agents.
NOTE: A value of Unknown indicates that the agents are unable to determine this information from the physical drive.
Current Width displays the Physical Drive Current Width. The following values are possible:
Narrow - The negotiated data transfer width for this drive is narrow (8 data bits).
Wide16 - The negotiated data transfer width for this drive is wide (16 data bits).
Unknown - The Storage Agents are unable to determine the current negotiated data transfer width for this drive.
Current Speed displays the Physical Drive Current Data Transfer Speed. The following values are possible:
Asynchronous - The negotiated data transfer speed for this drive is asynchronous.
Fast - The negotiated data transfer speed for this drive is 10 million transfers per second.
Ultra - The negotiated data transfer speed for this drive is 20 million transfers per second.
Unknown - The agent is unable to determine the current negotiated data transfer speed for this drive.
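Taken together, Current Width and Current Speed determine the peak data rate on the bus: each transfer moves one unit of the negotiated width, so the rate in megabytes per second is the number of million transfers per second multiplied by the width in bytes. The Python sketch below shows this calculation. The function and table names are hypothetical, and the result is a theoretical peak (with 1 MB taken as one million bytes), not a measured throughput.

```python
WIDTH_BITS = {"Narrow": 8, "Wide16": 16}          # data bits per transfer
SPEED_MEGATRANSFERS = {"Fast": 10, "Ultra": 20}   # million transfers per second


def peak_transfer_rate_mb_s(width, speed):
    """Return the peak synchronous transfer rate in MB/s, or None.

    None is returned for Asynchronous or Unknown values, because the text
    gives no transfer rate for them.
    """
    bits = WIDTH_BITS.get(width)
    megatransfers = SPEED_MEGATRANSFERS.get(speed)
    if bits is None or megatransfers is None:
        return None
    return megatransfers * bits // 8


print(peak_transfer_rate_mb_s("Narrow", "Fast"))   # 10
print(peak_transfer_rate_mb_s("Wide16", "Ultra"))  # 40
```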
Select one of the listed logical drives to see more information about the drive.
Use the Predictive Indicators to predict when a drive, which is now operating normally, may need to be replaced. The numerical data associated with these items displays after the item name. For example, Used Realloc: 122 means that there are 122 used reallocation sectors for this drive.
The status of these items can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.
The Predictive Indicators are:
Functional Test 1, 2, and 3 provide information about a series of tests that indicate how well a physical drive works. The Status of these items can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.
These tests compare the way the physical drive currently operates when performing various tasks with the way it worked when it was new.
Used Realloc shows the number of sectors of the reallocation area that have been used by the physical drive. The Status of this item can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.
Because of the nature of magnetic disks, certain sectors on a drive may have media defects. The reallocation area is part of the drive that the drive manufacturer sets aside to compensate for these defects. The array controller writes information addressed from the unusable sectors to available sectors in the reallocation area. If too many sectors have been reallocated, there may be a problem with the drive.
Spinup Time displays the time it takes for a physical drive to spin up to full speed. The Status of this item can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.
Drives require time to gain momentum and reach operating speed. As cars are tested to go from 0 mph to 60 mph in x number of seconds, drive manufacturers have preset expectations for the time it takes the drive to spin to full speed. Drives that do not meet these expectations may have problems.
The value is shown in tenths of a second. Thus, if the drive took 12 seconds to spin up, the value would be 120.
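The conversion between the displayed Spinup Time value and seconds can be sketched in Python as follows. The function names are hypothetical.

```python
def spinup_seconds(displayed_value):
    """Convert a displayed Spinup Time (tenths of a second) to seconds."""
    return displayed_value / 10


def spinup_displayed(seconds):
    """Convert a spin-up time in seconds to the displayed value."""
    return round(seconds * 10)


print(spinup_seconds(120))   # 12.0
print(spinup_displayed(12))  # 120
```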
Use the Problem Indicators to determine when a drive failure has occurred that may be correctable without replacing the drive. The Problem Indicators are:
Fail Recov Reads shows the number of read errors that occurred while Automatic Data Recovery was being performed from this physical drive to another drive. If a read error occurs, Automatic Data Recovery stops.
If Automatic Data Recovery stops repeatedly and this counter is incremented on a drive in the recovering volume, you may be able to correct the problem. Follow the steps below:
CAUTION: Do not replace this drive without first performing the following steps or data loss will occur.
Back up the system data, if possible. Otherwise, revert to a previous backup.
Run Compaq Diagnostics 8.18 or later on the drive exhibiting these errors. Perform a Surface Analysis by following these steps:
Insert the Diagnostics diskette into the diskette drive. Reboot the system.
Select Computer Checkup (Test) at the first menu.
At the Main Diagnostics menu, select Prompted Diagnostics.
At the Test Options screen, select Interactive Testing (Single Device).
At the Device Selection menu, select Fixed Disks.
At the Fixed Disk Test Selection menu, select the Format menu. If more than one drive is available, select the drive you wish to test.
At the Format menu, select Surface Analysis. This test remaps any bad sectors. This test will indicate further problems with the drive, if any.
Restore data from the backup.
If these errors repeat, replace the drive.
Other Timeouts shows the number of times the drive did not respond with an interrupt within a controller-defined period of time after a command had been issued. This monitored item does not include Data Request (DRQ) timeouts.
If the count is not zero and the drive has failed, you may be able to correct the problem without replacing the drive. Follow the steps below:
Ensure that all system and storage system cables are intact and seated properly. You may need to replace the cables.
Ensure that a Compaq ProLiant Storage System is plugged in and powered on. Make sure the power supply is functioning.
IMPORTANT: Never turn off a ProLiant Storage System when the attached system is still turned on.
Check the physical proximity of the system to other electrical devices. Since electrical noise may cause this error, check the AC circuit for other electrical devices.
For Compaq IDA systems, contact your local Compaq Service Provider to verify that the COMPAQ IDA Controller is at a minimum revision level. Refer the provider to Service Bulletin 102A.
Timeouts can be caused when two or more drives are set to the same SCSI ID. Ensure that the ProLiant and system SCSI IDs do not conflict.
On a Compaq ProLiant Storage System, check the SCSI ID cable on the drive tray. If the cable is damaged or incorrectly installed, SCSI Timeouts can occur. See the documentation accompanying the Hot Plug Drive Tray Service Spare Kit.
Ensure that the system temperature is within specified limits. Ensure that the fans are operating and are not blocked.
In some instances, drive failure can cause Timeouts. If you continue to receive many of these errors, replace the drive.
You can reset Other Timeouts using Compaq Diagnostics. Follow these steps for Compaq Diagnostics 8.19 or later:
Reboot the system with the Compaq Diagnostics diskette in drive A.
Press Enter at the Welcome screen.
At the Main menu, select (Computer Checkup) Test.
Select Continue at the Note: screen.
Select Prompted Diagnostics at the next screen. Select Continue at any Warning panels that may display.
At the Test Options screen, select Interactive Testing (single device).
At the Device Selection menu, select the type of drive that indicated Other Timeouts.
At the Test Selection menu, select Drive Monitoring Diagnostics Test.
If the next screen offers you a choice of logical drives, select the logical drive associated with the physical drive indicating Other Timeouts or select Test All Drives.
Diagnostics will display the 1736-22 error if Other Timeouts are discovered. Press Enter.
Select Yes at the next screen to reset Other Timeouts.
SCSI Bus Faults displays the number of times that SCSI bus parity, overrun, or underrun errors have been detected on the SCSI bus. Since the controller will retry the operation, SCSI bus faults can cause a drop in performance, or, in some cases, data corruption.
If the count is not zero and the drive has failed, you may be able to correct the problem without replacing the drive. Follow the steps below:
Ensure that all system and storage system cables are intact and seated properly. You may need to replace the cables.
Check the physical proximity of the system to other electrical devices. Since electrical noise may cause a Bus Fault error, check the AC circuit for other electrical devices.
Ensure that the system temperature is within specified limits. Ensure that fans are operating and are not blocked.
SCSI Bus Faults can be caused when two or more drives are set to the same SCSI ID. Ensure that ProLiant and system SCSI IDs do not conflict.
In some instances, drive failure can cause SCSI Bus Faults. If you continue to receive many of these errors, replace the drive.
NOTE: If the drive has not failed, the above counts simply provide a cumulative record of past errors that have been corrected.
Use the Failure Indicators to determine the cause of a drive failure. Typically, the number of failures is zero when the drive is operating normally. If a counter is not zero and the drive has not failed, there could be an intermittent problem that may require the drive to be replaced. The Failure Indicators are:
Spinup Errors - When the physical drive fails due to the failure of a spin-up command, a Spinup Error occurs. If the count is not zero and the drive has failed, replace the drive.
If the failure count is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Aborted Commands - The Aborted Commands counter records the number of times that a physical SCSI drive returned an Aborted Command status when a SCSI command was attempted. This error count indicates unsuccessful termination of the SCSI command. When the physical drive is failed due to aborted commands that could not be retried successfully, Aborted Commands errors occur. If the count is not zero and the drive has failed, replace the drive.
If the number of aborted commands is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Format Errors - When a format operation fails because the controller was unable to remap a bad sector, a Format Error occurs.
If the number of format errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Hardware Errors - The Hardware Errors counter records the number of times that a physical SCSI drive returned a Hardware Error status when a SCSI command was attempted. This error status indicates unsuccessful termination of the SCSI command. The controller typically retries this command several times before failing the drive. If the count is not zero and the drive has failed, replace the drive.
If the number of hardware errors is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Not Ready Errors - When a physical drive returns a "not ready" status when it should be ready, a Drive Not Ready Error occurs. This error could occur if a drive spins down unexpectedly, or if the drive never becomes ready after the spin up command is issued.
If the number of not ready errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Bad Target Errors - When a physical drive performs an action that does not conform to the SCSI-2 port protocol, the SCSI port is reset. If the count is not zero and the drive has failed, replace the drive.
If the number of bad target errors is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Fail Recov Writes - The Fail Recov Writes counter records the number of write errors that occurred while Automatic Data Recovery was being performed on this physical drive. If a write error occurs, Automatic Data Recovery stops. These errors indicate that the physical drive has failed.
If the number of fail recov writes is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
Media Failures - A Media Failure occurs when this physical drive is failed due to unrecoverable media errors.
If the number of media failure errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
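The replacement guidance repeated for each counter above reduces to one small decision rule. The following Python sketch is illustrative only: the function name recommend_action and its parameters are hypothetical and are not part of the Storage Agents or any Compaq utility.

```python
def recommend_action(count, drive_failed, previous_count=None):
    """Suggest an action for one error counter of a physical drive.

    count          -- current counter value (for example, Aborted Commands)
    drive_failed   -- True if the drive has failed, False if the drive is OK
    previous_count -- an earlier reading of the same counter, if one was kept
    """
    if count == 0:
        # This counter gives no reason to act.
        return "no action"
    if drive_failed:
        # Non-zero count and a failed drive: replace the drive.
        return "replace drive"
    if previous_count is not None and count > previous_count:
        # Drive is OK, but the count is increasing over time.
        return "replace drive"
    # Drive is OK and the count is not known to be increasing:
    # possibly an intermittent problem, so keep watching the counter.
    return "monitor"


print(recommend_action(3, drive_failed=False))
print(recommend_action(5, drive_failed=False, previous_count=3))
```

The sketch treats any increase between two readings as "increasing over time"; how far apart the readings should be is left to the administrator.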
This section displays statistics about a specific drive array controller physical drive. You can use the run-time statistics to monitor the health of a specific drive. The following information displays.
Sectors Read shows the total number of sectors read from the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.
Hard Read Errors displays the number of read errors that could not be recovered by a physical drive's Error Correction Code (ECC) algorithm or through retries. Over time, a drive may produce these errors. If you receive these errors, a problem may exist with your drive.
The severity of these errors depends on whether the managed system is running in a fault tolerant mode. With fault tolerance, the controller can remap data to eliminate the problems caused by these errors.
Recovered Read Errors displays the number of read errors corrected through physical drive retries. Over time, all drives produce these errors. If you notice a rapid increase in the value for Recovered Read Errors or Hard Read Errors, a problem may exist with the drive. Expect more errors for this monitored item than for Hard Read Errors.
Total Seeks displays the total number of seek operations during seek tests performed by the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.
During normal reads and writes to the drive, the drive does implied seeks to the location where data resides. These are not included in this count.
Seek Errors displays the number of seek errors that a physical drive detects. A seek error is a seek that failed. Over time, a drive usually produces these errors. If you notice a rapid increase in the value shown for Seek Errors, this physical drive may be failing. Only an unusually rapid increase in these errors indicates a problem.
Sectors Written displays the total number of sectors written to the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.
Hard Write Errors displays the number of write errors that could not be recovered by a physical drive. Over time, a drive may produce these errors. If you notice an increase in the value shown for Hard Write Errors or Recovered Write Errors, a problem may exist with the drive. On average, these errors should occur less frequently than read errors.
Recovered Write Errors displays the number of write errors corrected through physical drive retries or recovered by a physical drive on a monitored system. Over time, a drive may produce these errors. If you notice an increase in the value shown for Recovered Write Errors or Hard Write Errors, a problem may exist with the drive.
The value increases every time the physical drive detects and corrects an error. Only an unusually rapid increase in these errors indicates a problem. On average, these errors should occur less frequently than read errors.
Hot-Plug Count indicates the number of times this physical drive was removed via a hot-plug event from a Compaq ProLiant Storage System since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.
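Because all drives accumulate some recovered errors over time, it can help to compare two readings of a counter against the amount of work the drive did in between. The following Python sketch shows one possible way to do this. Normalizing by Sectors Read is an assumption made for illustration, not a method described by the Storage Agents, and the function name is hypothetical.

```python
def errors_per_million_sectors(prev_errors, curr_errors,
                               prev_sectors, curr_sectors):
    """Return new errors per million sectors transferred between two readings.

    Returns None when no sectors were transferred between the readings,
    because no rate can be computed in that case.
    """
    delta_sectors = curr_sectors - prev_sectors
    if delta_sectors <= 0:
        return None
    delta_errors = curr_errors - prev_errors
    return delta_errors * 1_000_000 / delta_sectors


# Two readings of Recovered Read Errors and Sectors Read (made-up values).
rate = errors_per_million_sectors(10, 12, 1_000_000, 3_000_000)
print(rate)
```

What counts as an unusually rapid increase depends on the drive model and workload, so no threshold is suggested here.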
A list of logical drives associated with the controller displays in the Mass Storage submenu. Each logical drive in the list displays the condition, the logical drive number and the fault tolerance of that logical drive. Select one of the logical drive entries to display the following information.
Status displays the status of the logical drive. The logical drive can be in one of the following states:
OK - The logical drive is in normal operation mode.
Failed - More physical drives have failed than the fault tolerance mode of the logical drive can handle without data loss.
Unconfigured - The logical drive is not configured.
Interim recovery - The logical drive is using Interim Recovery Mode. In Interim Recovery Mode, at least one physical drive has failed, but the logical drive's fault tolerance mode lets the drive continue to operate with no data loss.
Ready for rebuild - The logical drive is ready for Automatic Data Recovery. The physical drive that failed has been replaced, but the logical drive is still operating in Interim Recovery Mode.
Rebuilding - The logical drive is currently doing Automatic Data Recovery. During Automatic Data Recovery, fault tolerance algorithms restore data to the replacement drive.
Wrong drive - The wrong physical drive was replaced after a physical drive failure.
Bad connect - A physical drive is not responding.
Overheating - The drive array enclosure that contains the logical drive is overheating. The array is still functioning, but should be shut down.
Shutdown - The drive array enclosure that contains the logical drive has overheated. The logical drive is no longer functioning.
Expanding - The logical drive is currently doing Automatic Data Expansion. During Automatic Data Expansion, fault tolerance algorithms redistribute logical drive data to the newly added physical drive.
Not available - The logical drive is currently unavailable. If a logical drive is expanding and the new configuration frees additional disk space, this free space can be configured into another logical volume. If this is done, the new volume will be set to Not Available.
Queued for expansion - The logical drive is ready for Automatic Data Expansion. The logical drive is in the queue for expansion.
Fault Tolerance displays the fault tolerance mode of the logical drive. To change the fault tolerance mode, run the Compaq System Configuration Utility.
The following values are valid for the Logical Drive Fault Tolerance:
None - Fault tolerance is not enabled (referred to as RAID 0). If a physical drive reports an error, the data cannot be recovered by the Compaq Drive Array.
Mirroring - For each physical drive, there is a second physical drive containing identical data (also known as RAID 1). If a drive fails, the data can be retrieved from the mirrored drive.
Data Guarding - One of the physical drives is used as a data guard drive and contains the exclusive OR of the data on the remaining drives (also known as RAID 4). If a failure is detected, the Compaq Drive Array rebuilds the data using the data guard information plus information from the other drives.
Distributed Data Guarding - Distributed data guarding (sometimes referred to as RAID 5) is similar to data guarding, but instead of storing the parity information on one drive, the information is distributed across all of the drives. If a failure is detected, the Drive Array Controller rebuilds the data using the data guard information from all the drives.
Unknown - You may need to upgrade your software.
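The exclusive OR relationship behind Data Guarding and Distributed Data Guarding can be illustrated with a few bytes. The following Python sketch is a simplified model, not the controller's actual implementation; the block contents and the function name xor_blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per data drive (made-up contents).
data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0f"]

# The data guard (parity) block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive and rebuild its block from the
# surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same arithmetic applies whether the parity is kept on one dedicated drive or distributed across all of the drives; only the placement of the parity blocks differs.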
Capacity displays the size of the logical drive in megabytes. For example, 120 indicates that the logical drive is 120 megabytes. Use this data to determine whether the drive will be large enough to accommodate your needs.
The capacity utility defines a megabyte as 1,048,576 bytes. The capacity value shown may differ from the stated size of the drive due to different definitions of a megabyte. Many hardware manufacturers use the value of 1,000,000 for megabyte instead of 1,048,576.
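The difference between the two megabyte definitions can be worked out directly. The following Python sketch is for illustration only; the function name and the example drive size are assumptions.

```python
BINARY_MB = 1_048_576   # bytes per megabyte as used for the Capacity value
DECIMAL_MB = 1_000_000  # bytes per megabyte as used by many manufacturers


def decimal_to_binary_mb(decimal_mb):
    """Convert a size stated in 1,000,000-byte megabytes to 1,048,576-byte megabytes."""
    return decimal_mb * DECIMAL_MB / BINARY_MB


# A drive advertised as 4300 MB (hypothetical size) shows a smaller Capacity value.
print(round(decimal_to_binary_mb(4300), 1))
```

For this hypothetical 4300 MB drive, the displayed capacity would be roughly 4100 megabytes, a difference of a little under 5 percent.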
Percent Rebuild Complete displays the percent complete of the rebuild. When the value reaches 100, the rebuilding process is complete. The drive array continues to operate in interim recovery mode while a drive is rebuilding.
When a logical volume is expanding, the drive must redistribute the logical volume data across the physical drives. This value shows how many blocks of data still need to be redistributed. When the value reaches 100, the expand process is complete. The array continues to operate normally while the drive is expanding.
This value is only valid if the Logical Drive Status is Rebuilding or Expanding.
Accelerator indicates whether the logical drive has an Array Accelerator board configured and enabled. The following values are valid:
Enabled - The Array Accelerator board is configured and enabled for this logical drive. Run the Compaq System Configuration Utility to change this value.
Disabled - The Array Accelerator board is configured but not enabled for this logical drive. Run the Compaq System Configuration Utility to change this value.
Unavailable - There is no Array Accelerator board configured for this logical drive.
Unknown - The Storage Agents do not recognize the Array Accelerator board. You may need to upgrade your software.
Stripe Size displays the size of a logical drive stripe in kilobytes.
Select one of the listed physical drives to see more information about the drive.
This section provides additional information about the spare drive, including its status and the number of physical drives it is replacing, if any. This section is available only if a spare drive is configured for the selected logical drive. The following information is available.
Status displays the status of the on-line spare drive. The following values are possible:
Building - A physical drive has failed. Automatic Data Recovery is in progress to recover data to the on-line spare.
Active - A physical drive has failed. Automatic Data Recovery is complete. The system is using the on-line spare as a replacement for the failed drive.
Failed - The on-line spare has failed and is no longer available for use.
Inactive - The monitored system has an on-line spare configured, but the spare is not currently in use.
Unknown - You may need to upgrade your software.
Spare Drive ID indicates which physical drive functions as a spare. This value represents the physical drive ID. If you have a SCSI Managed Array Technology (SMART or SMART-2) Controller installed, this item will show the port that the spare is attached to, followed by the physical drive ID.
Replaced Drive ID identifies a failed physical drive by its drive ID number. A value displays when one of the physical drives has failed and the spare drive is now operating in place of the failed drive. For the SCSI Managed Array Technology (SMART or SMART-2) Controller, the port number of the replaced drive displays, followed by the drive ID number.
Use this monitored item to identify the failed drive and replace that drive as soon as possible.
If N/A displays, the spare has not begun operating in place of the failed drive.
Rebuild Percentage displays the percent complete of the rebuild. When the value reaches 100, the rebuilding process is complete. The drive array continues to operate in interim recovery mode while a drive is rebuilding.
When a logical volume is expanding, the drive must redistribute the logical volume data across the physical drives. This value shows how many blocks of data still need to be redistributed. When the value reaches 100, the expand process is complete. The array continues to operate normally while the drive is expanding.
This value is only valid if the Logical Drive Status is Rebuilding or Expanding.
Select the RA-8000 RAID Array Storage Systems item from the Mass Storage submenu to display the following information for each storage system:
Name identifies the type of storage system for identification purposes.
Status displays the current status of the storage system. The following values are valid:
Good - Indicates that the system is working properly.
Warning - Indicates that at least one component of the system has failed.
Agent Not Running - Indicates that the StorageWorks Agent is not running. You need to restart the StorageWorks Agent.
Communication Loss - Indicates that the storage system has a communication or cable problem. Please check all cable connections to the host server.
Unknown - Indicates that the Storage Agent does not recognize the state of the storage system. You may need to upgrade the Storage Agents.
Controller 1 Serial # is the serial number of the storage system's first controller, which can be used for identification purposes.
Controller 2 Serial # is the serial number of the storage system's second controller, which can be used for identification purposes.
Select the Fibre Channel Connections item from the Mass Storage menu to display the following information:
Host Controller displays the condition and model name for the controller.
WorldWide Name displays the unique Fibre Channel name for this controller.
Status displays the status for this controller. The following values are valid:
OK - Indicates the host controller is operating normally.
Failed - Indicates the host controller has failed and should be replaced.
Shutdown - Indicates the host controller has been shut down.
Loop Degraded - Indicates the Fibre Channel connection is degraded.
Loop Failed - Indicates the Fibre Channel connection has failed.
Unknown - Indicates the Storage Agents cannot determine the status of the host controller.
Slot displays the physical slot where the host controller resides in the system. For example, if this value is three, the controller is located in slot three of your computer.
Attached Storage Systems displays all storage systems and Fibre Channel tape controllers attached to the selected Fibre Channel controller. Select a storage system entry to display the related Storage System Information or select a Fibre Channel tape controller entry to display the related tape controller information.