[rescue] PIXAR Box...
Nathan Raymond
nate at portents.com
Thu Mar 17 13:10:39 CST 2005
On Thu, 17 Mar 2005, Paul Hortiatis wrote:
> And as far as I know this is only really one big video card. I believe there
> is a host machine that is needed; for some reason I think it was a Sun
> machine, but I'm not sure. I'm trying to find the picture I had before that
> had the host next to it, but I may not be remembering it correctly, so I
> could be completely off on that. There is very little real information about
> them on the web, or I have yet to come across it.
I can help with that. This page has some useful diagrams outlining how it
worked:
http://www.specktech.com/PixarImageComputer.html
And here is a good Byte article by the same guy describing the
architecture in more detail:
BYTE Magazine > Features > 1999 > November
From Pixar To Velocity Engine
By Glen Speckert
November 10, 1999
In my recent Byte.com article, Dawn of the Desktop Supercomputers, we
learned that the Single Instruction Multiple Data (SIMD) architecture is
alive again, as Steve Jobs announced the arrival of Apple's Power Mac G4 and
its Velocity Engine. My background flows through Pixar, which evolved the
Image Computer, Steve's previous SIMD architecture machine. Understanding
what was learned in the Pixar evolution may be useful in understanding the
emerging Velocity Engine. The change in scale over a decade is also
interesting.
The Pixar Image Computer (PIC) was built around a channel processor, or
Chap, which could perform the same instruction on four data channels, such
as red, green, blue, and alpha. Imaging applications do this often. Each
channel was 12 bits deep, allowing for photographic quality from the fully
anti-aliased Pixar software suite. A Chap could also perform tile-based
algorithms four-way parallel in "accordion mode" on a 12-bit monochrome
channel. The video frame store processor (FSP) included four channels of
2-megapixel memory, a 10-bit-in/10-bit-out lookup table (LUT), and 10-bit
digital-to-analog converters (DACs) for high-quality monitors.
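
As a rough illustration, here is the four-channels-one-instruction idea in
plain C; the struct layout and the 12-bit clamp are illustrative stand-ins,
not the Chap's actual instruction set:

    #include <stdint.h>

    /* Illustrative only: one RGBA pixel with 12-bit channels in 16-bit words. */
    typedef struct { uint16_t chan[4]; } Pixel12;      /* R, G, B, A */

    /* Apply the same gain to all four channels, clamping to the 12-bit range,
       the way a Chap applied one instruction across its four data channels. */
    static Pixel12 gain4(Pixel12 p, uint32_t num, uint32_t den)
    {
        for (int c = 0; c < 4; c++) {              /* conceptually one step */
            uint32_t v = (uint32_t)p.chan[c] * num / den;
            p.chan[c] = (uint16_t)(v > 4095 ? 4095 : v);
        }
        return p;
    }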
Back then, many people had not heard of an alpha channel, which is very
useful when compositing layers and dealing with transparency. The PIC was
aimed at the digital-image compositing problem, and its roots go back to
LucasFilm, long before the days of After Effects and Final Cut Pro.
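
For readers who have not met it, the basic compositing operation an alpha
channel enables is the Porter-Duff "over" operator; a minimal sketch in C,
using floats for clarity rather than the PIC's 12-bit fixed point:

    /* Porter-Duff "over" on premultiplied-alpha pixels: the foreground is
       laid over the background, which shows through by (1 - fg alpha). */
    typedef struct { float r, g, b, a; } PixelF;   /* premultiplied, 0..1 */

    static PixelF over(PixelF fg, PixelF bg)
    {
        float k = 1.0f - fg.a;                     /* background visibility */
        PixelF out = { fg.r + k * bg.r, fg.g + k * bg.g,
                       fg.b + k * bg.b, fg.a + k * bg.a };
        return out;
    }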
Pixar's Chap was a card instead of a chip. The standard card size of the
time, VME, was somewhat larger than today's motherboards. The FSP, or
video card, also required a full card, and could change output resolution
under software control.
Another card was an "Off Screen Memory" card, or an OSM (pronounced
reverently as "Awesome!"). The OSM was packed very tightly with the
highest-density memory chips, packaged edgewise, covering the whole board.
Channels were 12 bits deep, pixels were four channels wide, and this
awesome OSM held a whopping 32 megapixels, or a mere 48 Mbytes (32 million
12-bit samples at 1.5 bytes each).
The Chap was programmed by a 96-bit instruction word, compared with the
Velocity Engine's 128-bit instructions. The Chap hardware was a four-way
parallel network of multipliers, scratchpads, arithmetic units, and pixel
I/O buffers. The software controlled the routing of information through
the hardware components. The key to performance was to establish a
pipeline, whose depth was a function of the creativity of the software
developer, limited by the sum of the hardware. Clever software could
perform an operation on multiple RGBA pixels at different stages of their
transformation each hardware clock cycle.
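
For comparison, a four-wide operation on the Velocity Engine itself looks
something like this, using the AltiVec C intrinsics (this assumes a PowerPC
toolchain, e.g. gcc with -maltivec; the values are arbitrary):

    #include <altivec.h>
    #include <stdio.h>

    int main(void)
    {
        /* One 128-bit register holds four 32-bit floats: four lanes. */
        vector float pix  = (vector float){0.2f, 0.4f, 0.6f, 1.0f};  /* RGBA */
        vector float gain = (vector float){1.5f, 1.5f, 1.5f, 1.0f};
        vector float zero = (vector float){0.0f, 0.0f, 0.0f, 0.0f};

        /* vec_madd multiplies and adds all four lanes in one instruction. */
        vector float out = vec_madd(pix, gain, zero);

        float f[4] __attribute__((aligned(16)));
        vec_st(out, 0, f);
        printf("%g %g %g %g\n", f[0], f[1], f[2], f[3]);
        return 0;
    }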
The Hurricane
I joined Pixar at the beginning of the Hurricane campaign, as we set out
to build the super image computer of the pre-dawn. We scaled the chassis
up to hold nine cards, allowing for four of the SIMD Chap processors to
operate on the same bus with two or three OSM memory cards, one or two
video cards, and an overlay board. The Hurricane was architecturally
similar to a quad processor G4 system.
The overlay board was a new development, and was the tip of the iceberg,
where the hardware and software met. The Chaps wrote to regions of memory
on the FSPs, and a windowing system ran on a graphics processor on the
overlay board. FSP digital video was intercepted after the LUTs and before
the DACs and threaded through the overlay board frame buffers, where the
windowing system graphics were superimposed. The overlay board included
video output DACs for dual-monitor configurations, and connected to two
FSPs.
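
In pseudocode terms, that merge is a simple per-pixel selection; this sketch
assumes a "0 means transparent" convention, which is an illustration rather
than the board's actual keying scheme:

    #include <stdint.h>
    #include <stddef.h>

    /* Where the windowing system drew an opaque pixel, it replaces the FSP
       video; elsewhere the video passes through untouched. */
    static void overlay_merge(const uint32_t *video, const uint32_t *overlay,
                              uint32_t *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = overlay[i] ? overlay[i] : video[i];
    }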
Pixar licensed the Network extensible Window System (NeWS) from Sun,
which used Adobe's Postscript technology. We "ported" NeWS to the overlay
board, debugging as we went. We added imaging extensions for image
processing, roam, zoom, and multi-window operations. These extensions
controlled the operation of the quad Chap imaging content areas, while
Postscript controlled the appearance of the non-transparent desktop pixel
areas. Applications written in Postscript could control both the windowing
system and the underlying image-processing capabilities from an integrated
framework.
The rumors that OS X's windowing system will be based around Portable
Document Format (PDF) technology strike a resonance in terms of harnessing
the
Velocity Engine capability seamlessly for the application developer. The
degree of integration between the windowing system and the multi-processor
Velocity Engine will be a key differentiator for a G4 running OS X.
The Hurricane system could associate image-transformation methods between
windows; the user could assemble these in real time, dynamically seeing
the image-pipeline results. Today's media streaming technologies
could be used to extend software image pipeline support much further,
especially in the area of dual-stereo window functionality.
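
One way to picture such a user-assembled pipeline is as an ordered chain of
per-sample transforms; the stage functions here are invented examples, not
Hurricane's actual method set:

    #include <stddef.h>
    #include <stdint.h>

    typedef uint16_t (*Transform)(uint16_t);   /* one 12-bit sample in/out */

    static uint16_t invert12(uint16_t v) { return 4095 - v; }
    static uint16_t halve(uint16_t v)    { return v >> 1; }

    /* Run every sample through the stages in order, as if each window's
       output were wired into the next window's input. */
    static void run_pipeline(uint16_t *chan, size_t n,
                             const Transform *stages, size_t nstages)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t s = 0; s < nstages; s++)
                chan[i] = stages[s](chan[i]);
    }

    /* Usage: Transform stages[] = { invert12, halve };
              run_pipeline(image, nsamples, stages, 2);  */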
Lesson Learned
The key thing we learned was the relationship between performance and
keeping the pipeline full. The integrated roam pipeline code was written
by none other than Loren Carpenter, the senior scientist of Pixar. Loren's
SIMD programming skills were legendary. He would calculate the minimum
number of hardware cycles needed to perform an operation on a block of
pixels, and add the cost to fill and flush the pipeline. He compared his
coded results with the theoretical minimum, and continued to bend his
solution until his approach met the theoretical optimum. This concept was
difficult to incorporate into SIMD compilers.
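
The accounting behind that method is standard pipeline arithmetic: a
pipeline of depth d that retires one result per cycle needs n + d - 1 cycles
for n items, so the fill/flush overhead is d - 1 cycles. A trivial
calculation (with made-up numbers) shows how quickly a full pipeline
approaches the theoretical minimum:

    #include <stdio.h>

    int main(void)
    {
        long n = 512 * 512;      /* pixels in a block (arbitrary) */
        long d = 12;             /* pipeline depth (arbitrary)    */
        long cycles = n + d - 1; /* n results plus d-1 fill/flush */
        printf("cycles = %ld, efficiency = %.6f\n",
               cycles, (double)n / (double)cycles);
        return 0;
    }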
A lot of supporting technology is needed to feed a multiprocessor SIMD. We
worked with very early RAIDs, added a hardware decompression
daughtercard to a high-speed interface card, and optimized drivers
for high-speed writing to the OSM memory, which was shared by the four
Chaps. The G4 Sawtooth motherboard has ATA/66 for RAID I/O and high-speed
memory pathways. The value of these increases as multi-processor Velocity
Engines share memory and disk access. A full rack of spinning disks held
3 Gbytes of RAID, which cost more than today's 3-Tbyte RAIDs in about the
same footprint.
The G4 and the Pixar Image Computer share much in common. Keeping the
pipeline full can be difficult, but when you are able to do it
successfully, the results are nothing short of "Different." Pixar's
imaging applications were head and shoulders above all, but they were
custom-hardware, million-dollars-per-seat solutions. The Velocity Engine
accelerates applications ranging from Photoshop and Final Cut Pro to
real-time compression for videoconferencing applications over the Web. The
G4 is enabling the use of video as a desktop data type. Any traditional
supercomputer SIMD application, and there are a lot of them, can be made
to run well on the G4.
Oh yes, one more thing we learned: Even when the machine runs beautifully,
you've got to have a next-generation machine in the wings, or customers
won't buy into the architecture. The Hurricane PII-9 was the end of a
line, and we disassembled a fine team as Steve took Pixar out of the
hardware and imaging business to focus on the rendering business. But the
G4 Velocity Engine family is just beginning. While the early 500-MHz G4s
were indeed made of unobtanium, IBM is coming to the party with its
unmatched manufacturing capability. Even faster G4s will flow like a
river by next summer, with longer hardware pipelines. Dual-processor
Velocity Engines can be envisioned sitting in dual-channel Sawtooth
motherboards clustered around RAID farms performing many diverse tasks,
possibly including computing Toy Story 3. But one thing is clear: the SIMD
architecture is back. May it live long and prosper.
------------------------------------------------------------------------
Glen Speckert has been involved with imaging and video for most of his
22-year career at LLNL, Pixar, TASC, and as an independent consultant. He
is also the author of the original interactive dog Frisbee training CD,
Dog, Disc, and Wind, which is now available in English and Japanese and
can be previewed at: http://www.DogDisc.com/