[rescue] IBM disk FRU (was "Re: playing with the new E250 ... disk performance vs. old PeeCee")

Wed Dec 20 15:43:21 CST 2006

Dan Duncan wrote:

> The last time this happened to me was during a blizzard.
> 
> Today I got another call about a couple of failed disks in another IBM server.
> "Aha," I said to myself, "I wrote the FRU down last time!" and retrieved
> my notes, only to find that it's a different-sized drive this time.
> 
> Today's CNN headline:  "Blizzard threatens to bury Colorado, Plains"
> 
> Shit.
> 
> It took me over an hour to drive home, too.  All 7 miles worth.  I'm just
> glad I left when I did.

Don't you have the IBM-provided tools installed on your servers?

Every day, each of my servers emails me its RAID status, which
includes in verbose detail the logical and physical composition of every
RAID volume.  Looking at today's report from a randomly selected
server in my farm: if SCSI device 0 went tits-up, I'd have its
FRU and serial number handy right in the report (along with a lot of other
info most people won't know what to do with).
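
For anyone who would rather not eyeball the whole report, the two fields
can be pulled out for a single device with a little awk.  This is only a
sketch: the function name fru_for_device is made up for illustration, and
the labels it matches ("Device #N", "FRU", "Serial number") are assumptions
about how the report is laid out, so check them against a real report
before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print the FRU and serial number of one physical
# device from a saved report file.
# ASSUMPTION: each device section starts with a "Device #N" line and holds
# "FRU : value" and "Serial number : value" lines. Verify on your system.
fru_for_device() {
    local dev="$1" report="$2"
    awk -v dev="${dev}" '
        # A "Device #N" header starts a new section; note if it is ours
        /^[ \t]*Device #[0-9]+/ { in_dev = ($2 == "#" dev) }

        # Inside our section, print the two fields as "key: value"
        in_dev && /^[ \t]*(FRU|Serial number)[ \t]*:/ {
            key = $0
            sub(/[ \t]*:.*$/, "", key)
            sub(/^[ \t]+/, "", key)
            val = $0
            sub(/^[^:]*:[ \t]*/, "", val)
            print key ": " val
        }
    ' "${report}"
}
```

Usage would be along the lines of `fru_for_device 0 /tmp/aacraid_report.txt`
once the daily report has been written.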

The script below runs on a RHEL4 host.  If you're running AIX, the provided
tools may be somewhat different, but I bet they exist on your support CD.

If you have the tools installed, save this script and add it to your
crontab:

#!/bin/bash
#################################################################
#
# aacraid_report.sh
#
# monitors serveraid 8i
#
# v0.1 3/23/06  this script is born
#
#################################################################

#-----------------------------------------------------------------
# function HANDLE_TRAP
# abnormal exit routine
# executed when the script receives a kill signal
#-----------------------------------------------------------------

function HANDLE_TRAP ( )
    {
    # Interrupted by a signal: leave without mailing a partial report
    exit 1
    }

#######################################################################
#######################################################################
##
##                       Main
##                       MAIN
##                       main
##
#######################################################################
#######################################################################

VERSION="v 0.1"
SCRIPTNAME="${0}  ${VERSION}"
HMS=$(date +%X)
TODAY=$(date +%d%b%y)
SYSTEMNAME=$(uname -n)
STATUSFILE='/tmp/aacraid_report.txt'
MAILTO='magnus@yonderway.com'   # report recipient; change to your own address

#------------------------------------------
# turn on trap handling
#------------------------------------------

trap "HANDLE_TRAP" 1 2 3 14 15

echo -e "============================================="   > ${STATUSFILE}
echo -e "Executing ${SCRIPTNAME}"                         >> ${STATUSFILE}
echo -e "At ${HMS} on ${TODAY}"                           >> ${STATUSFILE}
echo -e "on system [${SYSTEMNAME}]"                       >> ${STATUSFILE}
echo -e "=============================================\n" >> ${STATUSFILE}

/usr/RaidMan/arcconf getconfig 1 >> ${STATUSFILE}
STATUS=$(awk '/Status of logical drive/ {print $6}' "${STATUSFILE}")

mail -s "SR8i ${STATUS} - ${SYSTEMNAME}" "${MAILTO}" < "${STATUSFILE}"

exit
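
For the crontab side, a daily entry could be built roughly as follows.
This is a sketch under assumptions: the cron_line helper, the 06:00 run
time, and the /usr/local/sbin install path are all made up for
illustration and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs a script once a day.
# Crontab field order is: minute hour day-of-month month day-of-week command
cron_line() {
    local hour="$1" minute="$2" script="$3"
    printf '%s %s * * * %s\n' "${minute}" "${hour}" "${script}"
}

# To install (deliberately left commented out), append the line to the
# current user's crontab. The path is an assumed install location.
# ( crontab -l 2>/dev/null; cron_line 6 0 /usr/local/sbin/aacraid_report.sh ) | crontab -
```

Running `crontab -l` afterwards should show the new line alongside any
existing entries.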