[geeks] New Itanium machines from SGI

Joshua D Boyd jdboyd at cs.millersville.edu
Wed Jan 8 20:24:58 CST 2003


On Wed, Jan 08, 2003 at 05:15:20PM -0700, Chris Byrne wrote:

> Oh and why aren't we using our video cards' spare processing power when
> they aren't being used for display processing? You would think it would
> be relatively easy to write software and a driver that uses the onboard
> geometry engines to do floating point work. Hell, a GeForce Ti 4600 has
> more processing power than my whole computer did three years ago. 

That is very non-trivial.  You might be able to get away with it on a
machine like the Onyx or Octane, assuming that you can figure out how
[0], but it is not easy for PC cards.

First, you need to figure out how to make the video card do a bunch of
work before reading it back or else you will severely saturate AGP on
readbacks.  So you have some hope with pixel and vertex shaders, except
those are very limited as to program size.
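To put rough numbers on the readback problem, here is a toy cost model
in C.  Every bandwidth and FLOP figure in it is a placeholder I made up
for illustration, not a measurement:

/* Offload only pays when the compute time saved exceeds the cost of
 * pushing data over AGP and, worse, reading results back.  All the
 * rates below are hypothetical placeholders. */
#include <stdio.h>

int main(void)
{
    double bytes = 16.0 * 1024 * 1024;  /* working set: 16 MB */
    double flops = 4.0e9;               /* work to do: 4 GFLOP */

    double agp_down = 800e6;  /* host -> card, bytes/sec (made up) */
    double agp_up   = 100e6;  /* readback, bytes/sec (made up, far worse) */
    double gpu_rate = 8.0e9;  /* card FLOP/sec (made up) */
    double cpu_rate = 1.0e9;  /* host FLOP/sec (made up) */

    double t_gpu = bytes / agp_down + flops / gpu_rate + bytes / agp_up;
    double t_cpu = flops / cpu_rate;

    printf("card: %.3fs  host: %.3fs\n", t_gpu, t_cpu);
    printf(t_gpu < t_cpu ? "offload wins\n" : "readback eats the gain\n");
    return 0;
}

The point is just that the readback term dominates unless you can queue
up a lot of work per transfer.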
 
> We've basically got a massive floating point calculation engine sitting
> in every one of our systems. If you look at the high end cryptographic
> accelerators and modern graphics cards, a lot of what they do is very
> similar. Why don't we use that power?

I'm not aware of much crypto work being done with 4x4 matrices.  But,
I'm not well schooled on crypto.
 
> Same thing for the super high end sound cards. Why don't we use those
> super-fast DSPs for something?

Frankly, I don't know why we even bother putting them in, and I don't
know whether there is any way to read data back from the DSPs on these cards.
 
> But the most fundamental problem is most code is still designed to run
> with a single thread or a small number of threads. Why? Because it's a
> lot harder to write heavily threaded code with a lot of parallelism in
> it. And it's a lot harder to design a compiler to deal with that code
> efficiently. 
> 
> The whole point of the Itanium (and to a lesser extent the P4, especially
> with hyperthreading) is that it's supposed to use highly parallelized
> code and the compiler is supposed to make massive parallelization
> optimizations at compile time. Only it doesn't work. Or at least it
> doesn't work well yet. It's only been a few years, and they need a lot
> more dev and debug time before it's anything but a dog

Frankly, I think that C++ is part of the problem here.  Good libraries
help, but I see lots of people writing their own matrix code rather than
using Apple's AltiVec library.  But then, if they did use it, they
wouldn't be able to port as easily to Linux.  On Linux, there are
some things that use SSE2, but no simple-to-use library like Apple's
AltiVec library.

For SSE2, there are some application-specific libraries [1], but no
general ones.  The same appears to be true for VIS on UltraSPARCs.
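The intrinsics themselves aren't hard to call from C; the missing piece
is a general library around them.  A minimal sketch using the SSE2
intrinsics from emmintrin.h (gcc or icc), adding arrays of doubles two
at a time:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdio.h>

/* c[i] = a[i] + b[i], two doubles per instruction.  n must be even
 * here; a real library would handle the odd tail element. */
static void add_pd(const double *a, const double *b, double *c, int n)
{
    int i;
    for (i = 0; i < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(c + i, _mm_add_pd(va, vb));
    }
}

int main(void)
{
    double a[4] = { 1, 2, 3, 4 }, b[4] = { 10, 20, 30, 40 }, c[4];
    add_pd(a, b, c, 4);
    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}

Wrapping that kind of thing up for every operation, with the alignment
and tail cases handled, is exactly the library nobody seems to have
written.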

Of course, these are just my complaints about the SIMD units on various
chips; none of it explains the difficulties in threading.  I don't
know how to make threading easy, but I think it would be nice if
there were a nicer way to deal with semaphores automatically, portable
over most platforms, and a nicer way to keep threads balanced.  On a
recent project, we had a terrible time on single-processor P4s keeping
an IO thread from hogging the whole system; when we added a sleep to
it, it ran too slowly the first several tries, and the length of the
sleep seemed to need changing for every different CPU it ran on.  It
probably just means I'm an incompetent idiot who shouldn't be allowed
near computers, rather than that this is an inherently annoying task.
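What we really wanted, I think, was the classic bounded-buffer pattern:
a counting semaphore that throttles the IO thread instead of a
hand-tuned sleep.  A rough POSIX threads sketch (the actual buffer
handling is elided; the sem_wait/sem_post pairing is the point):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define SLOTS 8          /* how far the IO thread may run ahead */

static sem_t free_slots; /* IO thread waits on this */
static sem_t full_slots; /* consumer waits on this */

static void *io_thread(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < 32; i++) {
        sem_wait(&free_slots);  /* blocks instead of hogging the CPU */
        /* ... read one block into a queue slot (elided) ... */
        sem_post(&full_slots);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    int i;

    sem_init(&free_slots, 0, SLOTS);
    sem_init(&full_slots, 0, 0);
    pthread_create(&t, NULL, io_thread, NULL);

    for (i = 0; i < 32; i++) {
        sem_wait(&full_slots);  /* wait for a block to be ready */
        /* ... process the block (elided) ... */
        sem_post(&free_slots);
        printf("consumed block %d\n", i);
    }
    pthread_join(t, NULL);
    return 0;
}

Because the IO thread blocks in sem_wait whenever it gets SLOTS blocks
ahead of the consumer, no sleep length has to be tuned per CPU.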

-- 
Joshua D. Boyd

[0] I'm interested in knowing, but never quite enough to do the
research to find out, how hard it would be to take large matrix
operations and break them down into lots of 4x4 matrices.  See the
sketch below for the decomposition I mean.

[1] Mesa3D, Sun's imaging library, Intel's imaging library, etc.
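
Here is that sketch for [0], in plain C: a big row-major multiply
decomposed so the inner kernel only ever touches 4x4 blocks, i.e. the
shape of job a geometry engine natively chews on.  To keep it short it
assumes n is a multiple of 4:

#include <stdio.h>

/* acc += a * b, all 4x4 blocks, row-major with row stride ld */
static void mul4x4_acc(const float *a, const float *b, float *acc, int ld)
{
    int i, j, k;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            for (k = 0; k < 4; k++)
                acc[i * ld + j] += a[i * ld + k] * b[k * ld + j];
}

/* C = A * B for n x n matrices, n a multiple of 4.  Each call to
 * mul4x4_acc is a self-contained 4x4 job. */
static void matmul_blocked(const float *A, const float *B, float *C, int n)
{
    int i, j, k;
    for (i = 0; i < n * n; i++)
        C[i] = 0.0f;
    for (i = 0; i < n; i += 4)
        for (j = 0; j < n; j += 4)
            for (k = 0; k < n; k += 4)
                mul4x4_acc(&A[i * n + k], &B[k * n + j], &C[i * n + j], n);
}

int main(void)
{
    enum { N = 8 };
    float A[N * N], B[N * N], C[N * N];
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i * N + j] = (float)(i + j);
            B[i * N + j] = (i == j) ? 1.0f : 0.0f;  /* identity */
        }
    matmul_blocked(A, B, C, N);
    printf("C[3][5] = %g (expect %g)\n", C[3 * N + 5], A[3 * N + 5]);
    return 0;
}

The decomposition itself is easy; the part I've never researched is
whether the per-block dispatch overhead on real hardware would swamp
the arithmetic.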

