[rescue] rescue Digest, Vol 48, Issue 7

Dan Duncan dand at pcisys.net
Sun Nov 12 10:44:38 CST 2006


On Sun, 12 Nov 2006, William Kirkland wrote:
> The difficulty with that definition, and mine has problems too, is
> that management typically wants personnel to cover for lack of
> equipment and design.

Yes, management would often rather piss and moan and gnash teeth
and tell you how much the downtime is costing them per minute
than spend a fraction of that up front to prevent the outage in
the first place.[1]  Ounce of prevention, etc.  I have better redundancy
on much of my hardware at home than I have had on past jobs.  My current
job has massive redundancy but it's primarily a side effect of load
balancing for peak capacity.  It's definitely nice!  There are no single
points of failure at any given site, and if enough goes wrong at a site
to take it down, we just bypass it.  We have over a thousand servers in
more than a dozen data centers (and 4 full-time SAs).  It's a LOT of work,
but it's almost all 9x5.  We even get to do upgrades and maintenance on
a 9x5 schedule because we just take a site offline and upgrade it in
a day.  I don't know who the architect on the project was, but I'm glad they designed it this way.

-DanD

[1]  Of course, a smoothly running operation makes them look like they
aren't doing anything so maybe the occasional outage is good for
management's image because they can look like they fixed something?
After all, their own management doesn't know they could have prevented it.

-- 
#  Dan Duncan (kd4igw)  dand at pcisys.net  http://pcisys.net/~dand
# I guess we were all guilty, in a way. We all shot him, we all skinned him,
# and we all got a complimentary bumper sticker that said, "I helped skin Bob."
#                  -DEEP THOUGHTS
