[geeks] sun ultra 24

Lionel Peterson lionel4287 at verizon.net
Thu Oct 18 11:08:13 CDT 2007


>From: Shannon Hendrix <shannon at widomaker.com>
>Date: 2007/10/17 Wed PM 10:01:40 CDT
>To: The Geeks List <geeks at sunhelp.org>
>Subject: Re: [geeks] sun ultra 24

>On Oct 17, 2007, at 7:35 PM, Dan Sikorski wrote:
<snip>

>> Or if downtime just plain can't happen.  If my life depended on the
>> operation of a computer (i.e. some sort of life support system), I
>> would want such features.
>
>True, but my point was that even IBM discovered that many times it  
>was more reliable and cost effective to have N machines rather than  
>one that 'cannot fail'.

I fully understand redundant systems (telco/mainframe background); my point was that simply doubling up on RAM in case one DIMM fails really increases system cost:

For example, if a given system board has only 8 DIMM slots, redundancy takes you down to 4 slots in use and 4 as spares. Fair enough, but as a further hypothetical, assume the working set of applications requires 8 Gigs of RAM. Rather than buying 8x 1 Gig DIMMs, you have to buy 4x 2 Gig DIMMs (at more than double the per-DIMM cost), and then to gain the redundancy desired, you have to buy 4 more 2 Gig DIMMs. Your cost for RAM is then more than 2x what it would have been if the 8 DIMM slots were populated with lower-density RAM (and 4x if a 2 Gig DIMM runs four times the price of a 1 Gig one).
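
To put rough numbers on the DIMM hypothetical above, here is a small sketch of the arithmetic. The $50 price for a 1 Gig DIMM and the price multipliers are made up purely for illustration; only the slot counts and DIMM sizes come from the example. With a 2 Gig DIMM at exactly twice the price of a 1 Gig DIMM the redundant setup costs 2x the baseline, and it reaches 4x only if the 2 Gig part costs four times as much.

```python
# Hypothetical price for one 1 Gig DIMM; not a real 2007 quote.
PRICE_1GB = 50.0


def ram_cost(dimm_count, price_per_dimm):
    """Total cost of populating dimm_count slots at a given per-DIMM price."""
    return dimm_count * price_per_dimm


def redundant_ratio(price_multiplier):
    """Cost of 4 active + 4 spare 2 Gig DIMMs relative to 8x 1 Gig DIMMs.

    price_multiplier is how many times more a 2 Gig DIMM costs than a
    1 Gig DIMM (an assumed value, since density premiums vary).
    """
    baseline = ram_cost(8, PRICE_1GB)          # 8 Gigs, no redundancy
    price_2gb = PRICE_1GB * price_multiplier
    active = ram_cost(4, price_2gb)            # 8 Gigs in 4 slots
    spare = ram_cost(4, price_2gb)             # 4 matching spares
    return (active + spare) / baseline


if __name__ == "__main__":
    for multiplier in (2.0, 2.5, 4.0):
        print(multiplier, redundant_ratio(multiplier))
```

Since all 8 slots end up holding 2 Gig parts, the ratio works out to the per-DIMM price multiplier itself, so the penalty tracks whatever premium the higher-density DIMMs carry.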
 
>It's often difficult to tell which is the best solution.

Building duplicate machines in one chassis, with a third "arbitrator" to determine when a part has failed, is likely only valuable when there are unique I/O requirements whose cost far exceeds that of the hardware itself (for example, a phone switch: it is more effective to have a redundant system because the physical phone line connections are expensive to route to multiple machines).

>I've been in an IBM shop when the mainframe died.  Everything...  
>stopped...

I've been in a mainframe shop when the A/C went out - everything stopped. It is oddly quiet and fun, for about 3 seconds, then your world turns upside-down for the rest of the shift...

>If we'd had multiple machines instead, we'd have been able to keep  
>running.

Unless the A/C goes out ;^)

>Of course, I've also been in a shop that used redundant hardware and  
>fail-over setups where a crash in one machine led to cascade failure.
>
>All of the redundant backups were bug compatible... :)
>
>The best one of all though: in one shop we had two machines in fail- 
>over mode, and when it came time to upgrade them, management decided  
>instead to just keep loading up both machines.

I worked at a large oil company and they had two data centers, each running 1/2 the production load, with the other half dedicated to development and non-essential work. The idea was that if one center was hit, the other could carry the entire work load *immediately*. Good in theory, but as it was tested, the importance of having DASD with the same capacity and addresses was soon discovered...

Lionel


