[geeks] New Itanium machines from SGI

Chris Byrne chris at chrisbyrne.com
Wed Jan 8 20:58:15 CST 2003


> -----Original Message-----
> From: geeks-bounces at sunhelp.org 
> [mailto:geeks-bounces at sunhelp.org] On Behalf Of Dave McGuire
>
> 
>    While I agree with what you're saying, and I'm sure you already
> know this, I should point out that the processors on these video
> cards are *far* from general-purpose floating point processors.
> 

Of course not; they are very specialized floating point processors. And
it's probably horribly inefficient to use them for general purpose
floating point work, but I'm guessing it's possible. Even if it isn't
with today's video hardware, why isn't it a consideration for
tomorrow's, or, given the speed of product cycles in the video market,
for hardware seven or eight product cycles down the road?
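
(To be concrete about what I mean: here's a toy C sketch of the
programming model, not actual GPU code. The grid size and the little
per-cell function are made up; the point is that the per-cell function
is the sort of thing that could become a pixel/vertex program, the loop
is what the card would run in parallel over the whole grid, and you'd
read the results back out of the framebuffer afterwards.)

    /* Toy "fragment kernel": one small pure-FP function applied to
       every cell of a 2D grid.  On the CPU it's just a loop; the idea
       is that the card runs the same per-cell function across the
       whole grid in parallel and you read the results back. */
    #define W 256
    #define H 256

    static float kernel_fn(float x, float y)
    {
        return x * x + y * y;           /* stand-in for real FP work */
    }

    static void run_over_grid(float out[H][W])
    {
        for (int j = 0; j < H; j++)
            for (int i = 0; i < W; i++)
                out[j][i] = kernel_fn((float)i / W, (float)j / H);
    }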

Why don't we design our subsystems to take on more of the subsidiary
loads of computing, much the way mainframes do? Dedicated subsystems
that put little to no load on the main logic processors, so those can
get on with their core functions more efficiently. Modern computers
spend up to 98% of their time in I/O wait states during normal usage.
That's pretty silly if you ask me. As it is, we're still dealing with
generally very primitive interrupt handling. Why don't we have I/O
subprocessors to handle all of this? We could literally have some kind
of controller at each peering point to manage the flow efficiently.
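
(For what it's worth, you can already fake a little of this in
software. A rough C sketch using POSIX aio, which just hands the read
off so the "main logic" can keep working instead of blocking; the
filename and buffer size are made up, and a real I/O subprocessor would
obviously do far more than this.)

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[4096];
        int fd = open("datafile", O_RDONLY);   /* made-up filename */
        if (fd < 0)
            return 1;

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        aio_read(&cb);                  /* hand the request off */

        while (aio_error(&cb) == EINPROGRESS) {
            /* the main logic keeps doing useful work here instead of
               sitting in an I/O wait state */
        }

        printf("read %ld bytes\n", (long)aio_return(&cb));
        close(fd);
        return 0;
    }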

Responding also to Josh's post:

> That is very non-trivial.  You might be able to get away with it on a
> machine like the Onyx or Octane, assuming that you can figure out how
> [0], but it is not easy for PC cards.
> 
> First, you need to figure out how to make the video card do a bunch of
> work before reading it back or else you will severely saturate AGP on
> readbacks.  So, you have some hope with pixel and vertex shaders, except
> those are very limited as to program size.
>  

I know this. What I'm asking is WHY don't we design these things to
either spread the loads out better or use them effectively. Oh, and I
know for a fact you can do this with the high end SGI boxen, because
I've seen it done for cryptoprocessing. I couldn't tell you exactly how
it was done, I'm not much of a coder, but they said they got a 20-40%
boost in their crunching.

> 
> I'm not aware of much crypto work being done with 4x4 matrices.  But,
> I'm not well schooled on crypto.
>

You can break MANY, though certainly not all, large matrices into
several sequential smaller matrices through several different
transforms, but none of them are what you would call efficient or easy
to calculate. There's probably more overhead in the transform than in
calculating the matrix itself, though when you think about it that's a
lot of what multiple plane filtering, lighting, and occlusion culling
are doing.
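
(I'm not much of a coder, but the basic blocking idea looks something
like this in C. The sizes are made up, and this is only the easy,
regular case, not the messier transforms I mentioned; the point is just
that a big multiply decomposes into lots of little 4x4 multiplies,
which is exactly the shape of work a transform engine is built for.)

    #define N   16  /* big matrix size, assumed a multiple of BLK */
    #define BLK  4  /* block size -- the 4x4 shape graphics hardware likes */

    /* c += a * b, computed one 4x4 block at a time.  Each innermost
       triple loop is a small dense multiply-accumulate, i.e. the kind
       of work you could hand off to a 4x4 transform unit. */
    static void blocked_multiply(const double a[N][N],
                                 const double b[N][N],
                                 double c[N][N])
    {
        for (int bi = 0; bi < N; bi += BLK)
            for (int bj = 0; bj < N; bj += BLK)
                for (int bk = 0; bk < N; bk += BLK)
                    for (int i = bi; i < bi + BLK; i++)
                        for (int j = bj; j < bj + BLK; j++)
                            for (int k = bk; k < bk + BLK; k++)
                                c[i][j] += a[i][k] * b[k][j];
    }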

> > Same thing for the super high end sound cards. Why don't we use
> > those super fast DSP's for something.
> 
> I don't know why we even bother with putting them in frankly, 
> and don't
> know if there is any way to read data back from the DSPs on 
> these cards.
>  

There is. They are pretty much totally controllable from software.
Honestly, a lot of them are getting more and more general purpose as
they become more reprogrammable. The same thing applies to video cards
with their programmable shaders, etc.

> Frankly, I think that C++ is part of the problem here.  Good libraries
> help, but I see lots of people writing their own matrix code 
> rather than
> using Apple's Altivec library.  But, then, if they did use it, they
> wouldn't be able to port as easily to linux.  On linux, there are
> some things that use SSE2, but no simple to use library like Apple's
> altivec library.  
> 

I agree wholeheartedly. It's not so much the core of the language as
the programming practices that have been ingrained in our programmers.
There's really one basic problem: ever since the 66 MHz era we have
decided that machine time is far cheaper than developer time. So people
aren't writing new libraries or making the older ones more efficient,
they're just reusing the same old code. And as you point out, even if
they wanted to, they have too many tasks on their plates to make them
all cross platform, and it takes too much budget to do separate
implementations for each.
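
(Case in point: even something this small never seems to make it into
shared code. A rough sketch of what using the vector unit directly
looks like with SSE intrinsics in C -- the function name is mine, and
it assumes n is a multiple of 4 and the arrays are 16-byte aligned just
to keep it short.)

    #include <xmmintrin.h>      /* SSE intrinsics */

    /* y = a*x + y over n floats, four at a time on the vector unit. */
    static void saxpy4(float a, const float *x, float *y, int n)
    {
        __m128 va = _mm_set1_ps(a);
        for (int i = 0; i < n; i += 4) {
            __m128 vx = _mm_load_ps(x + i);     /* aligned loads */
            __m128 vy = _mm_load_ps(y + i);
            _mm_store_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
        }
    }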

Chris Byrne

