[rescue] Looking for I/O performance metrics

Derrick D. Daugherty rescue at sunhelp.org
Mon Nov 26 13:20:26 CST 2001


Coming in late to the discussion, but here are my four ha'pennies...

It's rumored that around Fri, Nov 23, 2001 at 05:28:21PM -0000
Chris Byrne <chris at chrisbyrne.com> wrote:
> Patrick,
> 
> Everything you said makes perfect sense, I just wish it were true ;-)
> 
> Here's the situation
> 
> The problem is occurring under light load or, more precisely, essentially
> no load.
> It's only occurring on the Sun systems attached to the SAN not the windows
> systems.

Does `sync` take a while to return?  It sounds like the interfaces
aren't happy.

> They are using Veritas to manage UFS filesystems on virtual mount points
> created out of a single large LUN being presented to them from the SAN.

If winblows doesn't have a problem, it's more than likely something in
/etc/system, sd.conf, or the HBA driver config (lpfc.conf or
fca[w,l].conf).

I'm guessing the sd_io timeouts and buffer sizes should be looked at.
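
A quick way to see which sd/ssd driver tunables are already set is to
grep /etc/system for them.  This is only a sketch: the function name
show_sd_tunables is made up for illustration, and it assumes the
settings use the usual "set sd:..." or "set ssd:..." form.

```sh
#!/bin/sh
# Sketch: list the sd/ssd driver tunables currently set in /etc/system.
# Comment lines in /etc/system start with '*', so anchoring the match on
# "set" at the start of the line skips anything that is commented out.
show_sd_tunables() {
    file="${1:-/etc/system}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Plain basic regex so the stock /usr/bin/grep can handle it.
    grep '^ *set  *s*sd:' "$file"
}

show_sd_tunables /etc/system || echo "no sd/ssd tunables found"
```

Whatever this prints can then be compared against the values the array
vendor recommends.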

some questions...

o what sort of storage is this?

o is this switched or hub?

o what vendor(s)?

o loop or fabric?

o is IP enabled/used on the HBAs?


Just as we've all had problems with Ethernet auto-negotiation, check
the interfaces on both the hosts and the switch.  netstat -k will have
stats for the HBAs.  If you paste that info I'll have a better idea.

Also, the best way to troubleshoot is always with output from the
kernel, via iostat, vmstat, mpstat and vxstat.  I'll attach a script
that runs all but the vxstat, which has similar syntax:
vxstat -g <diskgroup> -c <count> -i <interval>, so for example
vxstat -g oradg -c 20 -i 10.  The -i may need to come before -c; I
never bother to remember.  The actual disk layout on the array would
also be helpful (vxprint -ht).  That could make all the difference in
the world.
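
The vxstat and vxprint collection could be scripted in the same style
as the attached script.  This is a rough sketch, not part of the
attachment: the /usr/sbin paths are assumed install locations,
collect_vx is a made-up helper name, and oradg is just the placeholder
disk group from the example above.  It puts -i before -c, given the
uncertainty about option order.

```sh
#!/bin/sh
# Assumed locations; adjust if VxVM is installed elsewhere.
VXSTAT="${VXSTAT:-/usr/sbin/vxstat}"
VXPRINT="${VXPRINT:-/usr/sbin/vxprint}"
HOST=`uname -n`

# collect_vx DISKGROUP COUNT INTERVAL
# Writes the vxstat samples to $HOST-vxstat and the volume layout to
# $HOST-vxprint in the current directory (stdout and stderr together).
collect_vx() {
    $VXSTAT -g "$1" -i "$3" -c "$2" > "$HOST-vxstat" 2>&1
    $VXPRINT -ht > "$HOST-vxprint" 2>&1
}

# Example: 20 samples, 10 seconds apart, for the placeholder group oradg
# collect_vx oradg 20 10
```

The two output files can be sent along with the stats file the attached
script produces.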

That of course is only how VxVM will see it, so the hardware config is
just as important, if not more.  Is this striped, concatenated, or
what?  These things don't really sound like a light-load problem,
though.  I'm leaning more toward the SAN config on that one (interface
configs).

> Yes I know that these are screwy
> 
> I've already recommended that they restructure their filesystems and put
> their data on Oracle raw but I need to give them hard numbers on the current
> upgefucked setup.

Restructuring the fs from one big LUN to many would be very beneficial;
doing it all on one LUN is just lazy.  But nowadays you can run Oracle
on a modern filesystem and not take a performance penalty.  There's a
Sun BluePrint with all the 'hard' data to back this.  I won't bore you.

> The system is basically set up as if it was on a JBOD array and they were
> trying to reduce spindle contention, but with a large scale storage array
> the array itself handles the contention and resource management issues so
> anything you do on the filesystem side will just make things worse.

This sounds like a Symmetrix sales pitch... ya know, rather than proper
architecture up front, they'll sell you db tuner if you want :)

If this is a Symm then they have recommended settings for the HBA conf
files; make sure those are set accordingly, and likewise in /etc/system.

If you want to send the output of the script, the vxstat, and the
netstat -k to me instead of spamming the list, I'd love to look it
over.  I love this stuff.  Oh yeah, and vxprint -ht too.

Feels good to use this part of my brain again; I've only been using the
'playstation' and 'scotch' lobes :D

HTH
^D

[attachment: sysload.sh]

#!/bin/sh 
# contact derrick at blinky-lights.org for any questions

VMSTAT="/usr/bin/vmstat"
MPSTAT="/usr/bin/mpstat"
IOSTAT="/usr/bin/iostat"
PRTDIAG="/usr/platform/`uname -m`/sbin/prtdiag"
DATE="/usr/bin/date"
TODAY=`$DATE +%m.%d.%Y`
HOSTNAME=`/usr/bin/hostname`

# clear out leftovers from any previous run
/usr/bin/rm -f $HOSTNAME-prtdiag $HOSTNAME-vmstat $HOSTNAME-iostat \
$HOSTNAME-mpstat $HOSTNAME-stats-$TODAY

echo
echo "gathering architecture info..."
echo

# send stdout and stderr through one descriptor so they cannot clobber each other
$PRTDIAG -v > $HOSTNAME-prtdiag 2>&1

echo "gathering system load info..."
echo

# 10 samples, 20 seconds apart, collected in the background
$VMSTAT 20 10 > $HOSTNAME-vmstat 2>&1 &
p1=$!

echo "gathering disk load info..."
echo 

$IOSTAT 20 10 > $HOSTNAME-iostat 2>&1 &
p2=$!

echo "gathering cpu load info..."
echo 

$MPSTAT 20 10 > $HOSTNAME-mpstat 2>&1 &
p3=$!

echo "waiting....."
echo 

# with no arguments, wait blocks until every background job has finished
wait

#/usr/bin/wc -l $HOSTNAME*

# assemble the report, using one redirect for the whole group
SEP="----------------------------------------------------------------"
{
    /usr/bin/uname -a
    /usr/bin/uptime
    /usr/bin/echo "$SEP"
    /usr/bin/echo "vmstat:"
    /usr/bin/cat $HOSTNAME-vmstat
    /usr/bin/echo "$SEP"
    /usr/bin/echo "iostat:"
    /usr/bin/cat $HOSTNAME-iostat
    /usr/bin/echo "$SEP"
    /usr/bin/echo "mpstat:"
    /usr/bin/cat $HOSTNAME-mpstat
    /usr/bin/echo "$SEP"
    /usr/bin/df -kl
    /usr/bin/echo "$SEP"
    /usr/bin/cat $HOSTNAME-prtdiag
} > $HOSTNAME-stats-$TODAY

# remove the intermediate files; everything is now in the stats file
/usr/bin/rm -f $HOSTNAME-prtdiag $HOSTNAME-vmstat $HOSTNAME-iostat \
$HOSTNAME-mpstat

echo
echo "Done.  Please send $HOSTNAME-stats-$TODAY"
