[geeks] examples of vector processors (vs scalar

Mon Aug 5 16:18:41 CDT 2002

jdboyd at cs.millersville.edu writes:

>Now (and this is from listening to others, rather than personal
>experience), Altivec allows adding 2 sets of 4 floats in one instruction
>like above (this part I know personally from reading the docs, and
>reviewing OS X code that used the units).  Crays (and here is where I
>get to the hearsay) can add 2 sets of 4 floats together like that in one
>instruction, but they can also add 2 sets of a few hundred floats
>together in one instruction as well.

If I understand the Crays correctly, the fact that one instruction
operates on a large array does *not* mean that all the elements
are actually calculated simultaneously. It's actually just a
really, really optimized loop, initiated by a single instruction.
Obviously that lets you tune the hardware to keep all the stages
of the pipeline running optimally, but it isn't what would be called
"SIMD" in the MPP world.

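To make that distinction concrete, here is a toy timing model in Python. It
is only a sketch of the idea, not a description of any real machine: the
function names, the 6-stage adder depth, the 4-lane width and the 1-cycle
SIMD latency are all assumptions picked for illustration.

```python
def pipelined_vector_add(a, b, depth=6):
    """Model one vector instruction feeding a pipelined adder.

    Element i enters the adder on cycle i and its sum comes out
    `depth` cycles later, so results appear one per cycle rather
    than all at once. `depth` is an assumed pipeline length.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    results, finish = [], []
    for i, (x, y) in enumerate(zip(a, b)):
        results.append(x + y)
        finish.append(i + depth)  # cycle on which element i is done
    return results, finish


def lockstep_simd_add(a, b, lanes=4, latency=1):
    """Model a lane-parallel add: every lane in a group finishes together.

    `lanes` and `latency` are assumed values; one instruction is
    issued per group of `lanes` elements.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    results, finish = [], []
    for start in range(0, len(a), lanes):
        group_done = (start // lanes + 1) * latency
        for x, y in zip(a[start:start + lanes], b[start:start + lanes]):
            results.append(x + y)
            finish.append(group_done)  # same cycle for the whole group
    return results, finish


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    print(pipelined_vector_add(a, b))
    print(lockstep_simd_add(a, b))
```

Both functions produce the same sums. The only difference is the finish
cycle recorded for each element: staggered in the pipelined model, identical
within a group in the lockstep model.
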
The post-Seymour CDC supers (the Cyber 205, ETA) actually had memory-to-memory
vector instructions, so the memory interface was part of that specially
tuned & optimized loop too, which gave stunning results.
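
A rough way to picture the difference, again as a Python toy rather than
anything authoritative: it assumes the register-style machine has 64-element
vector registers (as on the Cray-1) and has to split a long array into
strips, each needing two loads, an add and a store. The function names and
the instruction counting are made up for the example, and scalar loop
overhead is ignored.

```python
VLEN = 64  # assumed vector register length (Cray-1 style)


def register_vector_add(a, b):
    """Strip-mined add on a register-based vector machine (toy model).

    Returns the sums and the number of vector instructions issued.
    """
    out, instructions = [], 0
    for start in range(0, len(a), VLEN):
        va = a[start:start + VLEN]            # vector load
        vb = b[start:start + VLEN]            # vector load
        vc = [x + y for x, y in zip(va, vb)]  # vector add
        out.extend(vc)                        # vector store
        instructions += 4
    return out, instructions


def memory_to_memory_vector_add(a, b):
    """One instruction streams both operands from memory and writes
    the result back to memory (toy model, no length limit assumed)."""
    return [x + y for x, y in zip(a, b)], 1


if __name__ == "__main__":
    a = [float(i) for i in range(200)]
    b = [1.0] * 200
    print(register_vector_add(a, b)[1])
    print(memory_to_memory_vector_add(a, b)[1])
```

The instruction count is only a stand-in for how much of the loop the
hardware gets to manage on its own. It says nothing about actual speed,
which depends on memory bandwidth and startup time.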

------ David Fischer ------- dave at cca.org ------- http://www.cca.org -------
---------- "Anything Jesus can do, I can do better." - The Locust ----------