Fibre Channel Storage Systems

This section displays general and status information about Fibre Channel storage systems. Select a Fibre Channel Storage System entry from the Mass Storage list to display a submenu containing separate entries for storage systems, fibre array controllers, physical drives, and logical drives. The following items display:

Fibre Channel Storage System Information

Fibre Array Controller Information

Accelerator Information

Physical Drive Information

Logical Drive Information

RA-8000 RAID Array Storage Systems

Fibre Channel Connections


Fibre Channel Storage System Information

Select the Storage System Information entry from the Mass Storage submenu to display the following information about a Fibre Channel storage system:

System Information

Name displays the user-defined name (or serial number, if preferred) of this storage system chassis.

Connection displays the type of connection between the server and the storage system. The following values are possible:

Serial Number displays the storage system chassis serial number, which is normally displayed on the front panel. Use this information for identification purposes.

IO Slots displays whether a Fibre Channel Array controller is installed.

Backplane Information

Firmware Revision displays the revision level of the storage system backplane.

Drive Bays displays the number of drive bays on this storage system backplane.

Duplex displays the Compaq Storage System Backplane Duplex Option. The following values are possible:

Asset Information

Board displays the type of board (system, power, or SCSI).

Serial Number displays the serial number of the board.

Board Revision displays the revision number of the board.

Power Supply Information

Location displays the bay location of the power supply. The following values are possible:

Status displays the status of the power supply. The following values are possible:

UPS State displays the status of the UPS attached to the power supply. The following values are possible:

Serial Number displays the serial number of the power supply. Use this information for identification purposes.

Board Revision displays the board revision of the power supply.

Firmware Revision displays the firmware revision of the power supply.

Temperature Information

Location displays the location of the temperature sensor. The following values are possible:

Status displays the status of the temperature sensor. The following values are possible:

Current Value displays the current temperature value.

Limit Value displays the threshold value of the temperature sensor.

Fan Information

Location displays the location of the fan module. The following values are possible:

Status displays the status of the fan module. The following values are possible:

Serial Number displays the serial number for the fan module. Use this information for identification purposes.

Board Revision displays the board revision of the fan module.


Fibre Array Controller Information

This section displays the following information about fibre array controllers that are installed in a storage system.

Model displays the model type of the controller card. The valid types are:

Firmware Version displays the version of the controller's firmware.

Serial Number displays the serial number of the controller. Use this information for identification purposes.

Product Revision displays the product revision of the controller. Use this value to further identify a particular revision of the controller model.

WorldWide Name displays the unique Fibre Channel name for the controller. Use this value to further identify a particular controller.

Controller Status displays the status of the controller hardware. The following values are valid:

Current Role displays the current role of this controller when array controllers are duplexed. The following values are valid:

Redundancy Type displays the type of redundant configuration. The following values are valid:

Redundancy Error displays the redundancy error for the controller. The following values are valid:


Accelerator Information

This section displays the following information about the Array Accelerator of a fibre array controller that is installed in a storage system.

Status displays the status of the Fibre Channel Array Accelerator (FCAA). The status can be one of the following:

Battery Status displays the status of the battery pack on the Array Accelerator. The battery pack can recharge only when the system is powered on. The status can be one of the following:

Bad Data indicates possible data loss due to a battery problem when the system was powered on. The following values are valid:

Read Errors displays the total number of read memory parity errors that were detected while reading from the Array Accelerator. If a memory parity error occurs, the mirrored copy of data in the write cache can be accessed to obtain correct data.

Memory parity errors occur when the system detects that information has not been transferred correctly. A parity bit is included for each byte of information stored in memory. When the microprocessor reads or writes data, the system counts the value of the bits in each byte. If the total does not match the system’s expectations, a parity error occurs. A bad memory chip, memory corruption, or lack of memory refresh may cause memory parity errors.
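The per-byte parity check described above can be sketched as follows. This is a minimal illustration only: it assumes even parity, which the text does not specify, and the function names are hypothetical rather than part of any Compaq tool.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity bit for one byte (0-255). Even parity is an assumption."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0-255")
    # The parity bit is 1 when the byte holds an odd number of 1 bits, so that
    # the data bits plus the parity bit always contain an even number of 1s.
    return bin(byte).count("1") % 2


def has_parity_error(byte: int, stored_parity: int) -> bool:
    """Recompute parity when the byte is read and compare it with the stored bit."""
    return parity_bit(byte) != stored_parity


stored = 0b1011_0010                # four 1 bits, so the stored parity bit is 0
check = parity_bit(stored)
corrupted = stored ^ 0b0000_1000    # a single bit flips in memory

print(has_parity_error(stored, check))     # False
print(has_parity_error(corrupted, check))  # True
```

A single parity bit per byte detects any odd number of flipped bits in that byte; it cannot say which bit changed, which is why the mirrored copy in the write cache is needed to recover the correct data.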

Write Errors displays the total number of write memory parity errors that were detected while writing to the Array Accelerator.

Write parity errors occur when the system detects that information has not been transferred to the Array Accelerator correctly. A parity bit is included for each byte of information stored in memory. When the microprocessor reads or writes data, the system counts the value of the bits in each byte. If the total does not match the system’s expectations, a parity error occurs.

Total Memory displays the total amount of accelerator memory in megabytes, including both battery-backed and non-battery-backed memory.

Write Cache displays the amount of memory allocated for the write cache in megabytes. The actual amount of usable memory is half the amount shown because data is kept in duplicate (mirrored).
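As a quick worked example of the mirroring rule above, the usable write cache can be derived from the displayed value. The function name and the 16 MB figure are hypothetical and serve only to illustrate the arithmetic.

```python
def usable_write_cache_mb(displayed_write_cache_mb: float) -> float:
    """Return the usable write cache, given the value shown for Write Cache.

    Data is kept in duplicate (mirrored), so only half of the displayed
    amount is available for unique data.
    """
    return displayed_write_cache_mb / 2


print(usable_write_cache_mb(16))  # 8.0
```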

Read Cache displays the memory allocated for the read cache in megabytes.

Serial Number displays the serial number of the accelerator board. Use this value to further identify the cache controller.

Error Code displays the status of the cache operations. The status can be one of the following:

NOTE: If data from another system was stored on the board, rerunning EISA Configuration will cause the data to be lost.


Physical Drive Information

This section provides an overview of all disk drives attached to the controller. Each physical drive is listed as a separate entry in the Mass Storage submenu. The information displayed next to the physical drive includes the condition, location of the drive (port and drive number) and drive size. Select any of the physical drives from the Mass Storage submenu to display more information about the drive. The following information can be displayed:

Status displays the status of the physical drive. The following values are possible:

Action displays the action that is required for this drive. The following values are valid:

Capacity displays the size of the physical drive in megabytes. For example, 120 indicates that the physical drive is 120 megabytes.

Model displays a description of the physical drive. The text depends on the manufacturer of the drive and the drive type. For example, you might see: Compaq 210MB CP3201.

If a drive fails, note the model to identify the type of drive necessary for replacement.

Firmware Version displays the physical drive firmware version number. Make sure that you have the most recent version of the firmware because older versions may not support all of the newest features.

Serial Number displays the serial number assigned to the SMART physical drive. This value is based upon the serial number as returned by the SCSI inquiry command but may be modified due to space limitations. This item can be used for identification purposes.

Service Hours displays the current number of hours of service (the number of hours that a physical drive has been spinning) since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.

For example, if the Service Hours value is 604, the drive has been operating for 604 hours. If an error occurred at 499 Service Hours, it occurred after 499 hours of service.

S.M.A.R.T. indicates whether or not the SCSI physical drive supports S.M.A.R.T. The possible values are:

NOTE: A value of “Unknown” indicates that the agents are unable to determine this information from the physical drive.

Current Width displays the Physical Drive Current Width. The following values are possible:

Current Speed displays the Physical Drive Current Data Transfer Speed. The following values are possible:

Logical Drive Information

Select one of the listed logical drives to see more information about the drive.

Predictive Indicators

Use the Predictive Indicators to predict when a drive, which is now operating normally, may need to be replaced. The numerical data associated with these items displays after the item name. For example, Used Realloc: 122 means that there are 122 used reallocation sectors for this drive.

The status of these items can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future. The Predictive Indicators are:

Functional Test 1, 2, and 3 provide information about a series of tests that indicate how well a physical drive works. The Status of these items can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.

These tests compare the way the physical drive currently operates when performing various tasks with the way it worked when it was new.

Used Realloc shows the number of sectors of the reallocation area that have been used by the physical drive. The Status of this item can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.

Because of the nature of magnetic disks, certain sectors on a drive may have media defects. The reallocation area is part of the drive that the drive manufacturer sets aside to compensate for these defects. The array controller writes information addressed from the unusable sectors to available sectors in the reallocation area. If too many sectors have been reallocated, there may be a problem with the drive.

Spinup Time displays the time it takes for a physical drive to spin up to full speed. The Status of this item can be OK or Replace Drive. If the status is Replace Drive, replace the drive, or an actual drive failure may occur in the future.

Drives require time to gain momentum and reach operating speed. Just as cars are tested on how many seconds they take to go from 0 mph to 60 mph, drive manufacturers have preset expectations for the time it takes the drive to spin to full speed. Drives that do not meet these expectations may have problems.

The value is shown in tenths of a second. Thus, if the drive took 12 seconds to spin up, the value would be 120.
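The unit conversion above can be expressed as a one-line helper. This is a sketch only; the function name is hypothetical.

```python
def spinup_seconds(displayed_value: int) -> float:
    """Convert a displayed Spinup Time (tenths of a second) to seconds."""
    return displayed_value / 10


print(spinup_seconds(120))  # 12.0
```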

Problem Indicators

Use the Problem Indicators to determine when a drive failure has occurred that may be correctable without replacing the drive. The Problem Indicators are:

Fail Recov Reads shows the number of read errors that occurred while Automatic Data Recovery was being performed from this physical drive to another drive. If a read error occurs, Automatic Data Recovery stops.

If Automatic Data Recovery stops repeatedly and this counter is incremented on a drive in the recovering volume, you may be able to correct the problem. Follow the steps below:

CAUTION: Do not replace this drive without first performing the following steps or data loss will occur.

  1. Back up the system data, if possible. Otherwise, revert to a previous backup.

  2. Run Compaq Diagnostics 8.18 or later on the drive exhibiting these errors. Perform a Surface Analysis by following these steps:

    1. Insert the Diagnostics diskette into the diskette drive. Reboot the system.

    2. Select Computer Checkup (Test) at the first menu.

    3. At the Main Diagnostics menu, select Prompted Diagnostics.

    4. At the Test Options screen, select Interactive Testing (Single Device).

    5. At the Device Selection menu, select Fixed Disks.

    6. At the Fixed Disk Test Selection menu, select the Format menu. If more than one drive is available, select the drive you wish to test.

    7. At the Format menu, select Surface Analysis. This test remaps any bad sectors. This test will indicate further problems with the drive, if any.

  3. Restore data from the backup.

  4. If these errors repeat, replace the drive.

Other Timeouts shows the number of times the drive did not respond with an interrupt within a controller-defined period of time after a command had been issued. This monitored item does not include Data Request (DRQ) timeouts.

If the count is not zero and the drive has failed, you may be able to correct the problem without replacing the drive. Follow the steps below:

  1. Ensure that all system and storage system cables are intact and seated properly. You may need to replace the cables.

  2. Ensure that a Compaq ProLiant Storage System is plugged in and powered on. Make sure the power supply is functioning.

IMPORTANT: Never turn off a ProLiant Storage System when the attached system is still turned on.

  3. Check the physical proximity of the system to other electrical devices. Since electrical noise may cause this error, check the AC circuit for other electrical devices.

  4. For Compaq IDA systems, contact your local Compaq Service Provider to verify that the COMPAQ IDA Controller is at a minimum revision level. Refer the provider to Service Bulletin 102A.

  5. Timeouts can be caused when two or more drives are set to the same SCSI ID. Ensure that the ProLiant and system SCSI IDs do not conflict.

  6. On a Compaq ProLiant Storage System, check the SCSI ID cable on the drive tray. If the cable is damaged or incorrectly installed, SCSI Timeouts can occur. See the documentation accompanying the Hot Plug Drive Tray Service Spare Kit.

  7. Ensure that the system temperature is within specified limits. Ensure that the fans are operating and are not blocked.

  8. In some instances, drive failure can cause Timeouts. If you continue to receive many of these errors, replace the drive.

You can reset Other Timeouts using Compaq Diagnostics. Follow these steps for Compaq Diagnostics 8.19 or later:

  1. Reboot the system with the Compaq Diagnostics diskette in drive A.

  2. Press Enter at the Welcome screen.

  3. At the Main menu, select Computer Checkup (Test).

  4. Select Continue at the Note: screen.

  5. Select Prompted Diagnostics at the next screen. Select Continue at any Warning panels that may display.

  6. At the Test Options screen, select Interactive Testing (Single Device).

  7. At the Device Selection menu, select the type of drive that indicated Other Timeouts.

  8. At the Test Selection menu, select Drive Monitoring Diagnostics Test.

  9. If the next screen offers you a choice of logical drives, select the logical drive associated with the physical drive indicating Other Timeouts or select Test All Drives.

  10. Diagnostics will display the 1736-22 error if Other Timeouts are discovered. Press Enter.

  11. Select Yes at the next screen to reset Other Timeouts.

SCSI Bus Faults displays the number of times that SCSI bus parity, overrun, or underrun errors have been detected on the SCSI bus. Since the controller will retry the operation, SCSI bus faults can cause a drop in performance, or, in some cases, data corruption.

If the count is not zero and the drive has failed, you may be able to correct the problem without replacing the drive. Follow the steps below:

  1. Ensure that all system and storage system cables are intact and seated properly. You may need to replace the cables.

  2. Check the physical proximity of the system to other electrical devices. Since electrical noise may cause a Bus Fault error, check the AC circuit for other electrical devices.

  3. Ensure that the system temperature is within specified limits. Ensure that fans are operating and are not blocked.

  4. SCSI Bus Faults can be caused when two or more drives are set to the same SCSI ID. Ensure that ProLiant and system SCSI IDs do not conflict.

  5. In some instances, drive failure can cause SCSI Bus Faults. If you continue to receive many of these errors, replace the drive.

NOTE: If the drive has not failed, the above counts simply provide a cumulative record of past errors that have been corrected.

Failure Indicators

Use the Failure Indicators to determine the cause of a drive failure. Typically, the number of failures is zero when the drive is operating normally. If a counter is not zero and the drive has not failed, there could be an intermittent problem that may require the drive to be replaced. The Failure Indicators are:

If the failure count is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of aborted commands is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of format errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of hardware errors is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of not ready errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of bad target errors is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of fail recov writes is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.

If the number of media failure errors is not zero and the drive has failed, replace the drive. If the counter is not zero and the drive is OK (has not failed), there may be an intermittent problem that requires drive replacement. If you observe that the count is increasing over time, replace the drive.
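The guidance above follows one recurring pattern, which can be sketched as a small decision function. This is an illustrative sketch, not part of the product: the function name and return strings are hypothetical, and the "drive has failed" branch is stated in the text only for format errors, not ready errors, fail recov writes, and media failure errors.

```python
from typing import Optional


def assess_failure_counter(count: int, drive_failed: bool,
                           previous_count: Optional[int] = None) -> str:
    """Suggest an action for one Failure Indicator counter (hypothetical helper)."""
    if count == 0:
        # Typically the number of failures is zero when the drive operates normally.
        return "normal"
    if drive_failed:
        # Non-zero counter on a failed drive: replace the drive.
        return "replace drive"
    if previous_count is not None and count > previous_count:
        # Drive is OK, but the count is increasing over time.
        return "replace drive (count increasing)"
    # Drive is OK and no increase has been observed yet.
    return "possible intermittent problem; keep monitoring"


print(assess_failure_counter(0, drive_failed=False))
print(assess_failure_counter(3, drive_failed=True))
print(assess_failure_counter(5, drive_failed=False, previous_count=2))
print(assess_failure_counter(2, drive_failed=False, previous_count=2))
```

Passing the previously observed count makes the "increasing over time" condition explicit; how often to sample the counter is left to the administrator.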

Statistics

This section displays statistics about a specific drive array controller physical drive. You can use the run-time statistics to monitor the health of a specific drive. The following information displays.

Sectors Read shows the total number of sectors read from the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.

Hard Read Errors displays the number of read errors that could not be recovered by a physical drive's Error Correction Code (ECC) algorithm or through retries. Over time, a drive may produce these errors. If you receive these errors, a problem may exist with your drive.

The severity of these errors depends on whether the managed system is running in a fault tolerant mode. With fault tolerance, the controller can remap data to eliminate the problems caused by these errors.

Recovered Read Errors displays the number of read errors corrected through physical drive retries. Over time, all drives produce these errors. If you notice a rapid increase in the value for Recovered Read Errors or Hard Read Errors, a problem may exist with the drive. Expect more errors for this monitored item than for Hard Read Errors.

Total Seeks displays the total number of seek operations during seek tests performed by the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.

During normal reads and writes to the drive, the drive does implied seeks to the location where data resides. These are not included in this count.

Seek Errors displays the number of seek errors that a physical drive detects. A seek error is a seek that failed. Over time, a drive usually produces these errors. If you notice a rapid increase in the value shown for Seek Errors, this physical drive may be failing. Only an unusually rapid increase in these errors indicates a problem.

Sectors Written displays the total number of sectors written to the physical drive since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.

Hard Write Errors displays the number of write errors that could not be recovered by a physical drive. Over time, a drive may produce these errors. If you notice an increase in the value shown for Hard Write Errors or Recovered Write Errors, a problem may exist with the drive. On average, these errors should occur less frequently than read errors.

Recovered Write Errors displays the number of write errors corrected through physical drive retries or recovered by a physical drive on a monitored system. Over time, a drive may produce these errors. If you notice an increase in the value shown for Recovered Write Errors or Hard Write Errors, a problem may exist with the drive.

The value increases every time the physical drive detects and corrects an error. Only an unusually rapid increase in these errors indicates a problem. On average, these errors should occur less frequently than read errors.

Hot-Plug Count indicates the number of times this physical drive was removed via a hot-plug event from a Compaq ProLiant Storage System since the drive was stamped. The drive was stamped when it left the factory or when you ran Compaq Diagnostics on your new drive.


Logical Drive Information

A list of logical drives associated with the controller displays in the Mass Storage submenu. Each logical drive in the list displays the condition, the logical drive number and the fault tolerance of that logical drive. Select one of the logical drive entries to display the following information.

Status displays the status of the logical drive. The logical drive can be in one of the following states:

Fault Tolerance displays the fault tolerance mode of the logical drive. To change the fault tolerance mode, run the Compaq System Configuration Utility.

The following values are valid for the Logical Drive Fault Tolerance:

Capacity displays the size of the logical drive in megabytes. For example, 120 indicates that the logical drive is 120 megabytes. Use this data to determine whether the drive will be large enough to accommodate your needs.

The capacity utility defines a megabyte as 1,048,576 bytes. The capacity value shown may differ from the stated size of the drive due to different definitions of a megabyte. Many hardware manufacturers use the value of 1,000,000 for megabyte instead of 1,048,576.
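The difference between the two megabyte definitions can be worked through as follows. This is a minimal sketch; the function name and the 1000 MB figure are hypothetical.

```python
BINARY_MB = 1_048_576    # bytes per megabyte as used for the displayed Capacity
DECIMAL_MB = 1_000_000   # bytes per megabyte as used by many hardware manufacturers


def displayed_capacity_mb(advertised_mb: float) -> float:
    """Convert a manufacturer-stated size (decimal MB) to the displayed value."""
    return advertised_mb * DECIMAL_MB / BINARY_MB


print(round(displayed_capacity_mb(1000), 1))  # 953.7
```

A drive advertised as 1000 megabytes would therefore appear as roughly 954 megabytes, even though no space is missing.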

Percent Rebuild Complete displays the percent complete of the rebuild. When the value reaches 100, the rebuilding process is complete. The drive array continues to operate in interim recovery mode while a drive is rebuilding.

When a logical volume is expanding, the drive must redistribute the logical volume data across the physical drives. In that case, this value shows the percentage of data that has been redistributed. When the value reaches 100, the expand process is complete. The array continues to operate normally while the drive is expanding.

This value is only valid if the Logical Drive Status is Rebuilding or Expanding.
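A monitoring script that reads this value might guard against the invalid case like this. The sketch assumes the status is available as the strings "Rebuilding" and "Expanding" named above; the function name and the "OK" status used in the example are hypothetical.

```python
from typing import Optional

# Statuses for which the percentage is meaningful, per the text above.
VALID_STATUSES = {"Rebuilding", "Expanding"}


def rebuild_progress(status: str, percent: int) -> Optional[int]:
    """Return the percent complete, or None when the value is not valid."""
    if status not in VALID_STATUSES:
        return None
    return percent


print(rebuild_progress("Rebuilding", 42))  # 42
print(rebuild_progress("OK", 42))          # None
```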

Accelerator indicates whether the logical drive has an Array Accelerator board configured and enabled. The following values are valid:

Stripe Size displays the size of a logical drive stripe in kilobytes.

Physical Drives

Select one of the listed physical drives to see more information about the drive.

Spare Drives

This section provides additional information about the spare drive, including its status and the number of physical drives it is replacing, if any. This section is available only if a spare drive is configured for the selected logical drive. The following information is available.

Status displays the status of the on-line spare drive. The following values are possible:

Spare Drive ID indicates which physical drive functions as a spare. This value represents the physical drive ID. If you have a SCSI Managed Array Technology (SMART or SMART-2) Controller installed, this item will show the port that the spare is attached to, followed by the physical drive ID.

Replaced Drive ID identifies, by its drive ID number, the failed physical drive that the spare drive is now operating in place of. For the SCSI Managed Array Technology (SMART or SMART-2) Controller, the port number of the replaced drive displays, followed by the drive ID number.

Use this monitored item to identify the failed drive and replace that drive as soon as possible.

If N/A displays, the spare has not begun operating in place of the failed drive.

Rebuild Percentage displays the percent complete of the rebuild. When the value reaches 100, the rebuilding process is complete. The drive array continues to operate in interim recovery mode while a drive is rebuilding.

When a logical volume is expanding, the drive must redistribute the logical volume data across the physical drives. In that case, this value shows the percentage of data that has been redistributed. When the value reaches 100, the expand process is complete. The array continues to operate normally while the drive is expanding.

This value is only valid if the Logical Drive Status is Rebuilding or Expanding.


RA-8000 RAID Array Storage Systems

Select the RA-8000 RAID Array Storage Systems item from the Mass Storage submenu to display the following information for each storage system:

Name identifies the type of storage system for identification purposes.

Status displays the current status of the storage system. The following values are valid:

Controller 1 Serial # displays the serial number of the storage system's first controller. Use this information for identification purposes.

Controller 2 Serial # displays the serial number of the storage system's second controller. Use this information for identification purposes.


Fibre Channel Connections

Select the Fibre Channel Connections item from the Mass Storage menu to display the following information:

Host Controller displays the condition and model name for the controller.

WorldWide Name displays the unique Fibre Channel name for this controller.

Status displays the status for this controller. The following values are valid:

Slot displays the physical slot where the host controller resides in the system. For example, if this value is three, the controller is located in slot three of your computer.

Attached Storage Systems displays all storage systems and Fibre Channel tape controllers attached to the selected Fibre Channel controller. Select a storage system entry to display the related Storage System Information or select a Fibre Channel tape controller entry to display the related tape controller information.