[geeks] Solaris 10 / OpenSolaris bits to be in next version of OSX

Phil Stracchino phil.stracchino at speakeasy.net
Wed Aug 9 20:41:47 CDT 2006


Bill Bradford wrote:
> There was a weblog post a week or two ago about one of the guys at Sun
> who had a bad disk and didn't notice - because RAID-Z / ZFS kept the 
> data intact...  I'll see if I can dig it up.

That reminds me of a story my first wife told me, out of Stratus.

For those not familiar with Stratus, Stratus did high-availability.
Their principal competitor was Tandem.  Stratus used its own custom OS
called FT/OS for a long time, then finally migrated to what they called
Fault Tolerant UNIX ... but that's another story.  This story concerns
the hardware.

EVERYTHING was replicated in Stratus hardware.  Redundant power
supplies.  Redundant RAM.  Redundant disks.  Redundant CPUs.  Redundant
controllers.  Redundant NICs.  Redundant I/O boards.  Even the backplane
had redundant busses.  The system constantly monitored every piece of
hardware in the box, and if a component failed it would seamlessly switch
over to that component's good twin.  What's more, everything in the box was
hot-pluggable.  You could plug in a new, uninitialized disk and it would
initialize the disk, partition it, create a filesystem on it according
to the defaults set up in its configuration, and ask where you wanted it
mounted.  (OK, so we're briefly back to the OS again.)

So one day, there's this trade show, and Stratus has a box there, with a
couple of engineers doing their standard fault tolerance demo, which was
to just open up the front of the cabinet, reach into the running
machine, and yank something out at random.  And on one of these demos,
they had an interested audience of about a dozen people, which happened
to include two engineers from Tandem.  So one guy reaches into the box,
grabs a random board without looking, and casually yanks it out.

Unbeknownst to him, the twin of that board had already failed.  So the
moment he pulled the board, the machine went down, resulting in some
smirks and snickers from the two Tandem guys.  Embarrassed, but
immediately realizing the problem, the Stratus engineer re-inserted the
board as he explained what had happened, pointing out that even while
running on only one board of that pair, the machine's performance hadn't
been noticeably degraded.  As soon as he plugged the board back in, the
machine's continuous firmware self-test detected that it had a complete,
operable set of hardware, and it automatically started booting.


This is where the two Tandem guys suddenly looked like someone had just
run over their favorite puppy with a lawnmower.  One of them finally got
out, in heartbroken tones,

"Your machine *automatically reboots itself* after a failure...?"

Apparently, Tandem's machines weren't robust enough to do that.  ;)



-- 
 Phil Stracchino                     Landline: 603-886-3518
 phil.stracchino at speakeasy.net         Mobile: 603-216-7037
 Renaissance Man, Unix generalist, Perl hacker, Free Stater


