[rescue] PIXAR Box...

Nathan Raymond nate at portents.com
Thu Mar 17 13:10:39 CST 2005


On Thu, 17 Mar 2005, Paul Hortiatis wrote:

> And as far as I know this is really only one big video card. I believe there is 
> a host machine that is needed; for some reason I think it was a Sun machine, 
> but I'm not sure. I'm trying to find the picture I had before that had the 
> host next to it, but I may not be remembering it correctly and could be 
> completely off on that. There is very little real information about them on 
> the web, or I have yet to come across it.

I can help with that.  This page has some useful diagrams outlining how it 
worked:

http://www.specktech.com/PixarImageComputer.html

And here is a good Byte article by the same guy describing the 
architecture in more detail:

BYTE Magazine > Features > 1999 > November

From Pixar To Velocity Engine
By Glen Speckert
November 10, 1999

In my recent Byte.com article, Dawn of the Desktop Supercomputers, we 
learned the Single Instruction Multiple Data (SIMD) architecture is alive 
again, when Steve Jobs announced the arrival of Apple's Power Mac G4 and 
its Velocity Engine. My background flows through Pixar, which evolved the 
Image Computer, Steve's previous SIMD architecture machine. Understanding 
what was learned in the Pixar evolution may be useful in understanding the 
emerging Velocity Engine. The change in scale over a decade is also 
interesting.

The Pixar Image Computer (PIC) was built around a channel processor, or 
Chap, which could perform the same instruction on four data channels, such 
as red, green, blue, and alpha. Imaging applications do this often. Each 
channel was 12 bits deep, allowing for photographic quality from the fully 
anti-aliased Pixar software suite. A Chap could also perform tile-based 
algorithms four-way parallel in "accordion mode" on a 12-bit monochrome 
channel. The video frame store processor (FSP) included four channels of 
2-megapixel memory, a 10-bit-in/10-bit-out lookup table (LUT) and 10-bit 
digital to analog converters (DACs) for high-quality monitors.

Many people had not heard of an alpha channel back then, which is very 
useful when compositing layers and dealing with transparency. The PIC was 
aimed at the digital-image compositing problem, and its roots go back to 
LucasFilm, long before the days of After Effects and Final Cut Pro.
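
To make the role of the alpha channel concrete, here is a minimal sketch of 
the standard "over" compositing operator applied to pixels with four 12-bit 
channels. It assumes premultiplied alpha and integer rounding; the names 
MAX, over, half_red and opaque_blue are illustrative only and do not come 
from the article or from any Pixar software.

```python
# Sketch only: the "over" operator on premultiplied RGBA pixels.
# Assumption: 12-bit channels (0..4095), alpha premultiplied into RGB.
MAX = (1 << 12) - 1  # 4095, the largest 12-bit channel value

def over(fg, bg):
    """Composite pixel fg over pixel bg; each is an (r, g, b, a) tuple."""
    inv_alpha = MAX - fg[3]  # how much of the background shows through
    # The same arithmetic is applied to all four channels, which is the
    # kind of work a four-channel SIMD processor does in one instruction.
    return tuple(f + (b * inv_alpha + MAX // 2) // MAX for f, b in zip(fg, bg))

half_red = (2048, 0, 0, 2048)    # roughly 50% transparent red
opaque_blue = (0, 0, MAX, MAX)   # fully opaque blue
print(over(half_red, opaque_blue))  # (2048, 0, 2047, 4095)
```

A fully transparent foreground leaves the background unchanged, and a fully 
opaque foreground replaces it.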

Pixar's Chap was a card instead of a chip. The standard card size of the 
time, VME, was somewhat larger than today's motherboards. The FSP, or 
video card, also required a full card, and could change output resolution 
under software control.

Another card was an "Off Screen Memory" card, or an OSM (pronounced 
reverently as "Awesome!"). The OSM was packed very tightly with the 
highest-density memory chips, packaged edgewise, covering the whole board. 
Channels were 12 bits deep, pixels were four channels wide, and this 
awesome OSM held a whopping 32 megapixels, or a mere 48 Mbytes.
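
The storage arithmetic can be checked with a few lines. This is a sketch 
under stated assumptions: "mega" is taken as 2^20, samples are packed with 
no padding, and the helper name mbytes is hypothetical. The article does not 
say whether "megapixels" counts single-channel samples or four-channel 
pixels, so both readings are computed; the 48 Mbyte figure corresponds to 32M 
single-channel 12-bit samples.

```python
# Sketch only: storage needed for 12-bit samples, assuming tight packing.
BITS_PER_CHANNEL = 12
CHANNELS_PER_PIXEL = 4
MEGA = 1 << 20  # assumption: binary mega (2**20)

def mbytes(samples, bits_per_sample=BITS_PER_CHANNEL):
    """Return the storage in Mbytes for a number of packed samples."""
    return samples * bits_per_sample / 8 / MEGA

# One four-channel pixel: 4 * 12 bits = 48 bits = 6 bytes.
bytes_per_pixel = BITS_PER_CHANNEL * CHANNELS_PER_PIXEL // 8

print(bytes_per_pixel)                         # 6
print(mbytes(32 * MEGA))                       # 48.0  (32M single-channel samples)
print(mbytes(32 * MEGA * CHANNELS_PER_PIXEL))  # 192.0 (32M four-channel pixels)
```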

The Chap was programmed by a 96-bit instruction word, compared with the 
Velocity Engine's 128-bit instructions. The Chap hardware was a four-way 
parallel network of multipliers, scratchpads, arithmetic units, and pixel 
I/O buffers. The software controlled the routing of information through 
the hardware components. The key to performance was to establish a 
pipeline, whose depth was a function of the creativity of the software 
developer, limited by the sum of the hardware. Clever software could 
perform an operation on multiple RGBA pixels at different stages of their 
transformation each hardware clock cycle.

The Hurricane

I joined Pixar at the beginning of the Hurricane campaign, as we set out 
to build the super image computer of the pre-dawn. We scaled the chassis 
up to hold nine cards, allowing for four of the SIMD Chap processors to 
operate on the same bus with two or three OSM memory cards, two or one 
video cards, and an overlay board. The Hurricane was architecturally 
similar to a quad processor G4 system.

The overlay board was a new development, and was the tip of the iceberg, 
where the hardware and software met. The Chaps wrote to regions of memory 
on the FSPs, and a windowing system ran on a graphics processor on the 
overlay board. FSP digital video was intercepted after the LUTs and before 
the DACs and threaded through the overlay board frame buffers, where the 
windowing system graphics were superimposed. The overlay board included 
video output DACs for dual-monitor configurations, and connected to two 
FSPs.

Pixar licensed the Network Extensible Windowing System (NeWS) from Sun, 
which used Adobe's Postscript technology. We "ported" NeWS to the overlay 
board, debugging as we went. We added imaging extensions for image 
processing, roam, zoom, and multi-window operations. These extensions 
controlled the operation of the quad Chap imaging content areas, while 
Postscript controlled the appearance of the non-transparent desktop pixel 
areas. Applications written in Postscript could control both the windowing 
system and the underlying image-processing capabilities from an integrated 
framework.

The rumors of OS X's windowing system based around Portable Document 
Format (PDF) technology strike a resonance in terms of harnessing the 
Velocity Engine capability seamlessly for the application developer. The 
degree of integration between the windowing system and the multi-processor 
Velocity Engine will be a key differentiator for a G4 running OS X.

The Hurricane system could associate image-transformation methods between 
windows, which could be assembled in real time by the user, dynamically 
seeing the image pipeline results. Today's media streaming technologies 
could be used to extend software image pipeline support much further, 
especially in the area of dual-stereo window functionality.

Lesson Learned

The key thing we learned was the relationship between performance and 
keeping the pipeline full. The integrated roam pipeline code was written 
by none other than Loren Carpenter, the senior scientist of Pixar. Loren's 
SIMD programming skills were legendary. He would calculate the minimum 
number of hardware cycles needed to perform an operation on a block of 
pixels, and add the cost to fill and flush the pipeline. He compared his 
coded results with the theoretical minimum, and continued to bend his 
solution until his approach met the theoretical optimum. This concept was 
difficult to incorporate into SIMD compilers.
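
The comparison described above can be sketched with the usual textbook 
pipeline model: the steady-state cycles for a block of pixels plus the 
latency to fill and flush the pipeline. This is an illustration, not Loren 
Carpenter's actual method; the function names, parameters, and sample 
numbers are all assumptions.

```python
# Sketch only: a textbook model of pipelined throughput.
def ideal_cycles(n_pixels, pipeline_depth, pixels_per_cycle=1):
    """Minimum cycles: fill/flush latency plus steady-state work."""
    steady = -(-n_pixels // pixels_per_cycle)  # ceiling division
    return (pipeline_depth - 1) + steady

def efficiency(measured_cycles, n_pixels, pipeline_depth, pixels_per_cycle=1):
    """Fraction of the theoretical optimum that a measured run achieves."""
    return ideal_cycles(n_pixels, pipeline_depth, pixels_per_cycle) / measured_cycles

# Hypothetical numbers: a 1024-pixel block through an 8-stage pipeline.
print(ideal_cycles(1024, 8))                     # 1031
print(round(efficiency(1100, 1024, 8), 3))       # 0.937
```

The larger the block relative to the pipeline depth, the less the fill and 
flush cost matters, which is why keeping the pipeline full dominates 
performance.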

A lot of supporting technology is needed to feed a multiprocessor SIMD. We 
worked with very early RAIDs, and added a hardware decompression 
daughtercard to a high-speed, high-interface card, and optimized drivers 
for high-speed writing to the OSM memory, which was shared by the four 
Chaps. The G4 Sawtooth motherboard has ATA/66 for RAID I/O and high-speed 
memory pathways. The value of these increases as multi-processor Velocity 
Engines share memory and disk access. A full rack of spinning disks held 
3 Gbytes of RAID, which cost more than today's 3-Tbyte RAIDs in about the 
same footprint.

The G4 and the Pixar Image Computer share much in common. Keeping the 
pipeline full can be difficult, but when you are able to do it 
successfully, the results are nothing short of "Different." Pixar's 
imaging applications were head and shoulders above all but the 
custom-hardware, million-dollars-per-seat solutions. The Velocity Engine 
accelerates applications ranging from Photoshop and Final Cut Pro to 
real-time compression for videoconferencing applications over the Web. The 
G4 is enabling the use of video as a desktop data type. Any traditional 
supercomputer SIMD application, and there are a lot of them, can be made 
to run well on the G4.

Oh yes, one more thing we learned: Even when the machine runs beautifully, 
you've got to have a next-generation machine in the wings, or customers 
won't buy into the architecture. The Hurricane PII-9 was the end of a 
line, and we disassembled a fine team as Steve took Pixar out of the 
hardware and imaging business to focus on the rendering business. But the 
G4 Velocity Engine family is just beginning. While the early 500-MHz G4s 
were indeed made of unobtanium, IBM is coming to the party with their 
unmatched manufacturing capability. Even faster G4s will flow like a 
river by next summer, with longer hardware pipelines. Dual-processor 
Velocity Engines can be envisioned sitting in dual channel Sawtooth 
motherboards clustered around RAID farms performing many diverse tasks, 
possibly including computing Toy Story 3. But one thing is clear: the SIMD 
architecture is back. May it live long and prosper.

------------------------------------------------------------------------

Glen Speckert has been involved with imaging and video for most of his 
22-year career at LLNL, Pixar, TASC, and as an independent consultant. He 
is also the author of the original interactive dog Frisbee training CD, 
Dog, Disc, and Wind, which is now available in English and Japanese and can 
be previewed at: http://www.DogDisc.com/


