[geeks] New Itanium machines from SGI

Chris Byrne chris at chrisbyrne.com
Wed Jan 8 22:53:20 CST 2003


I realize the priorities of the market. However, as processor speeds
level off, and as computers become less and less processor bound (and
they're already there in a major way; there's no way you can say a 2 GHz
P4 is ten times as fast as a 200 MHz PPro), we'll need to find a way to
increase 3, or we won't be able to do 2, making 1 irrelevant.
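
To put rough numbers on it (all made up, just to show the shape of the
argument): throughput is roughly clock rate times effective IPC, so if
IPC falls while the clock climbs, the speedup is nowhere near linear.

/* toy model: throughput ~= clock (MHz) * effective IPC.
   Every number here is invented for illustration. */
#include <stdio.h>

int main(void)
{
    double ppro_mhz = 200.0,  ppro_ipc = 0.9;  /* assumed */
    double p4_mhz   = 2000.0, p4_ipc   = 0.5;  /* assumed */

    /* 10x the clock, but nowhere near 10x the throughput */
    printf("speedup: %.1fx\n",
           (p4_mhz * p4_ipc) / (ppro_mhz * ppro_ipc));
    return 0;
}

With those (invented) IPC figures you get about 5.6x out of a 10x clock
increase.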

So if we do hit a clock speed ceiling, we're gonna have to do one (or
both) of two things: increase effective IPC, or decrease CPU dependence.


The only way we've been able to hit these ridiculous clock speeds is by
vastly increasing pipeline lengths. That drastically reduces IPC (to the
point where it's more like CPI in some cases), and even more drastically
reduces the effective IPC, because of the extremely high penalty for a
branch mis-prediction or a cache miss.
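
Here's a back-of-the-envelope sketch of why depth hurts (the branch
frequency and miss rate below are assumptions, not measurements; the
flush penalty just scales with pipeline depth):

/* toy model: effective CPI ~= base CPI
   + branch_freq * miss_rate * flush_penalty,
   where the flush penalty grows with pipeline depth */
#include <stdio.h>

static double eff_ipc(double base_cpi, double branch_freq,
                      double miss_rate, int pipe_stages)
{
    return 1.0 / (base_cpi + branch_freq * miss_rate * pipe_stages);
}

int main(void)
{
    /* ~20% branches, 5% mispredicted -- assumed */
    printf("10-stage pipe: %.2f effective IPC\n",
           eff_ipc(1.0, 0.20, 0.05, 10));
    printf("30-stage pipe: %.2f effective IPC\n",
           eff_ipc(1.0, 0.20, 0.05, 30));
    return 0;
}

And that's before you even count the cache misses.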

So in order to win that IPC back, we need to radically reduce our
pipeline lengths, or radically increase our OOE capabilities without
proportionally increasing our branch miss penalty. We don't know how to
do either of these things. We think we can increase our OOE capabilities
by writing massively parallel code, but it's only been a few years since
we started experimenting with this, and it needs a hell of a lot more
work before it's ready for prime time. Not only that, but massively
parallel code requires even longer pipelines, which offsets some of the
advantage that it provides.
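
To make the "write parallel code" part concrete, here's the kind of
transformation involved, in plain C rather than anything EPIC-specific:
the first loop is one long dependency chain, while the second hands the
scheduler (or an explicitly-parallel compiler) four independent chains
to fill its slots with.

/* toy illustration of exposing instruction-level parallelism */
#include <stdio.h>
#include <stddef.h>

double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];                  /* each add depends on the last */
    return s;
}

double sum_parallel(const double *a, size_t n)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {    /* four independent chains */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

int main(void)
{
    double a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("%g %g\n", sum_serial(a, 8), sum_parallel(a, 8));
    return 0;
}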

Ok so we know that this is hard. No one's really done it before in a
major way. This is what Intel and HP are trying to do with the Itanium
family. So far they haven't done too great a job, but I don't know if
it's the code or the hardware, and since there's no convenient way of
separating them I won't even try.

Ok so what's left? Reduce the dependence on the CPU for performance. Now
how do we do that? 

Well the first step is to improve memory systems and caching systems.
Massively increase the bandwidth going to memory, including multiple
bank interleaving with redundancy and coherency (imagine something like
RAID 0+1 for RAM), and either multiple caches or a new large scale cache
architecture (or more likely both). That'll give us a good few
percentage points. Honestly I expect that if we do it right we can
double the performance of our systems just by messing with the memory.
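
Something like this, address-map-wise (the layout here is invented,
just to show the RAID 0+1 analogy: stripe across banks for bandwidth,
mirror each bank for redundancy):

/* toy address map for interleaved + mirrored RAM banks */
#include <stdio.h>
#include <stdint.h>

#define LINE_BITS 6  /* 64-byte lines, assumed */
#define N_BANKS   4  /* striped banks, each with a mirror */

static unsigned bank_of(uint64_t addr)
{
    return (unsigned)((addr >> LINE_BITS) % N_BANKS);  /* striping */
}

int main(void)
{
    for (uint64_t a = 0; a < 512; a += 64)
        printf("addr %3llu -> bank %u (mirror %u)\n",
               (unsigned long long)a, bank_of(a),
               bank_of(a) + N_BANKS);  /* redundant copy */
    return 0;
}

Consecutive cache lines land in different banks, so a streaming read
hits all four banks at once, and every line exists twice.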

But ultimately we are going to need a dedicated "master controller" that
acts as a kind of director for the system. It will direct requests to
coprocessor subsystems with their own cache and memory. It can manage
all the things that don't need general logic calculations through some
mechanism similar to bus mastering, where the request goes directly from
I/O into the proper subsystem. It's going to need an extremely high
speed transport, either bus or serial; personally I'm voting for some
kind of switched bus with direct bursting and pipelining. Then we'll
need a dedicated I/O subsystem (or subsystems) and possibly a dedicated
video subsystem, with the associated memory, cache, and data transport.

That also means coming up with new high bandwidth peripheral and
component interconnects. PCI-X and probably even InfiniBand aren't gonna
cut it.

We may even build in subsystems to handle everything other than direct
logic calculations. Stuff like drawing windows and screen widgets, for
example, could be handled entirely in the video subsystem. Same thing
for playing a sound: the command for the sound to be played goes into
the I/O subsystem and never uses a CPU cycle. It just gets forwarded on
to the sound subsystem, which handles all the decoding and playing.
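
Something like this is what I have in mind for the descriptors the
master controller would route (the format and subsystem IDs are all
invented, obviously -- this is a sketch of the idea, not anyone's actual
hardware interface):

/* sketch of a command descriptor an I/O front-end could forward
   straight to a coprocessor subsystem, bypassing the CPU */
#include <stdint.h>

enum subsystem { SUB_VIDEO, SUB_SOUND, SUB_STORAGE, SUB_NET };

struct cmd_desc {
    uint8_t  target;    /* which coprocessor handles this */
    uint8_t  opcode;    /* e.g. "decode and play this buffer" */
    uint16_t flags;
    uint64_t buf_addr;  /* data sitting in the subsystem's own memory */
    uint32_t buf_len;
    uint32_t tag;       /* completion notification token */
};

/* the front-end routes purely on 'target' -- no general-purpose
   CPU cycles spent looking inside the payload */
static enum subsystem route(const struct cmd_desc *d)
{
    return (enum subsystem)d->target;
}

int main(void)
{
    struct cmd_desc play = { SUB_SOUND, 1, 0, 0x10000000, 65536, 42 };
    return route(&play) == SUB_SOUND ? 0 : 1;
}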

The whole point of all that being: the CPU is free to handle nothing but
logic calculations.

But the thing is, those things are relatively easy to do, versus
changing the way we program and the way our processors work, which are
both rather difficult. Basically what I've described is similar in
concept to the way a mainframe works, only taken to its hopefully
logical extreme.

It also creates a decentralized system where the CPU itself is no longer
the most important factor in performance. Which was kind of the point
;-)

Chris Byrne



> -----Original Message-----
> From: geeks-bounces at sunhelp.org 
> [mailto:geeks-bounces at sunhelp.org] On Behalf Of Dan Sikorski
> Sent: Wednesday, January 08, 2003 20:17
> To: The Geeks List
> Subject: RE: [geeks] New Itanium machines from SGI
> 
> 
> On Wed, 2003-01-08 at 21:58, Chris Byrne wrote:
> > Why don't we design our subsystems take on more of the 
> subsidiary loads
> > of computing much in the way that mainframes do. Dedicated 
> subsystems
> 
> If your posing this question with regard to PC's, my guess would be
> because that would cost money.  Priority 1: price. Priority 
> 2: marketing
> Priority 3: performance.
> 
> 	-Dan Sikorski
> _______________________________________________
> GEEKS:  http://www.sunhelp.org/mailman/listinfo/geeks

