[rescue] Sun 711

Thu May 2 14:34:41 CDT 2002

On Thu, 2 May 2002, David Passmore wrote:

> Look at the specs on RAMBUS. I dare you.

I have.  The bus specs do not take overhead into account; the STREAM
numbers do (as they are "real world" tests).  I am unaware of other
benchmarks that supersede them in the bandwidth arena.  The fastest clock
in commodity PC RAM tech is 133MHz.  RAMBUS does sync triggering in the
middle of the clock pulses, most recently at 4:1.

The real reason that I considered memory bandwidth an issue was the
assumption that at least some parsing (even just extracting TIFF data) was
going to occur as part of the display process.  This is almost certain to
involve either a copy or a data alignment change per image, which drives
bandwidth requirements through the roof if the buffer busts the cache.
Without any processing, there is probably sufficient margin in a fast
system.  Add it in (a slight change in many people's minds) and you are
pushing the limits.
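
A rough sketch of that arithmetic, in Python.  The frame size below is a
made-up placeholder (it is not a figure from this thread), and the model
only counts passes over a frame that is already sitting in RAM: one pass to
get it out to the display, plus a read and a write for every extra copy or
realignment.

```python
def memory_traffic_mb_s(frame_bytes, fps, copies_per_frame):
    """Rough memory-bus traffic in MB/s for pushing frames to a display."""
    # One pass to move the frame out to the display, plus one read and one
    # write for each extra copy or alignment change made along the way.
    passes = 1 + 2 * copies_per_frame
    return frame_bytes * fps * passes / 1e6


if __name__ == "__main__":
    # Hypothetical frame: 2048 x 1536 pixels at 4 bytes per pixel.
    frame_bytes = 2048 * 1536 * 4
    for copies in (0, 1, 2):
        rate = memory_traffic_mb_s(frame_bytes, 24, copies)
        print(f"{copies} copies per frame: {rate:.0f} MB/s")
```

Under these assumptions a single copy per frame triples the traffic, which
is the kind of jump I mean by "pushing the limits".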

> Since Shawn did not specify anything more specific than the fact that it
> must show up on the screen at 24fps, I would imagine you can take advantage
> of any kinds of extensions your framebuffer subsystem has to offer.

This is precisely why I asked the question.  Without knowledge of how the
display engine must function, it's impossible to tell if its behaviour
will impact performance.

I seem to remember an initial system requirement of x86/Windows, but that
may have fallen by the wayside.  The ideal situation would be outputting
to a buffer for the raster in one stroke.  Due to framesync requirements
this may or may not be possible.

The video architecture could be very significant here.  Everyone has
focused on disk I/O, but the video system has to handle a comparable
amount of data in the same time, and has much tighter timing constraints.
The disk subsystem can be buffered ahead for several seconds, but the
framebuffer needs the bytes NOW.  Double buffering is critical here.
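
For anyone who has not run into it, a minimal sketch of double buffering in
Python.  The class and method names are illustrative only; a real display
engine would swap on vertical sync through its own driver interface, which
is not modelled here.

```python
class DoubleBuffer:
    """Two equal-sized buffers: one being displayed, one being filled."""

    def __init__(self, size):
        self._buffers = [bytearray(size), bytearray(size)]
        self._front = 0  # index of the buffer currently on display

    @property
    def front(self):
        """Buffer the display is reading from."""
        return self._buffers[self._front]

    @property
    def back(self):
        """Buffer the loader may safely write the next frame into."""
        return self._buffers[1 - self._front]

    def swap(self):
        """Flip roles once the back buffer holds a complete frame."""
        self._front = 1 - self._front


if __name__ == "__main__":
    db = DoubleBuffer(4)
    db.back[:] = b"abcd"   # fill the hidden buffer
    db.swap()              # now it becomes the visible one
    print(bytes(db.front))
```

The point is that the display never reads a half-written frame: the loader
only ever touches the back buffer, and the swap is a cheap index flip.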

> I would think you would know better than to try and pull the 'i've done X
> therefore I must be correct' trump card on this list. Someone somewhere has
> done much larger implementations and they might be the person you're arguing
> with.

You misunderstand.  I do not claim knowledge above and beyond any other
list member, merely actual experience in this area.

> I fail to see how the lack of DVMA will significantly hamper the performance
> of this task, perhaps you can enlighten me. The amount of CPU intervention
> needed to initiate these DMA transfers (to/from PCI and AGP) I would think
> to be minimal. What do you mean by DMA transfers 'locking everything up'?

You are moving data around, but you have to move it around in such a way
that you meet a synchronous deadline.  If you have very large DMA
transfers, you will tax the I/O interconnects to the point that you may
not be able to service a display update in time.  Once your DMA starts,
you have committed yourself.  Without DVMA you are doing virtual-to-physical
translation in software at best (requiring contiguous mappings on both
sides), or buffer bouncing at worst.  Bounce that buffer and your bandwidth
requirements double instantly.
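
To put numbers on the deadline problem, here is a back-of-the-envelope
sketch in Python.  The bus rate and frame size are assumptions chosen for
illustration (133 MB/s is roughly the theoretical peak of 32-bit/33MHz PCI;
the frame size is invented), and the model ignores arbitration and setup
overhead, so real figures would be worse.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1e3 / fps


def bytes_moved(nbytes, bounced=False):
    """Bytes that actually cross memory; a bounce buffer doubles them."""
    return nbytes * 2 if bounced else nbytes


def bus_hold_ms(nbytes, bus_mb_s):
    """How long one transfer occupies a bus of the given peak rate."""
    return nbytes / (bus_mb_s * 1e6) * 1e3


if __name__ == "__main__":
    bus_mb_s = 133.0                 # assumed peak rate, see above
    frame_bytes = 2048 * 1536 * 4    # hypothetical frame size
    deadline = frame_period_ms(24)
    for bounced in (False, True):
        held = bus_hold_ms(bytes_moved(frame_bytes, bounced), bus_mb_s)
        print(f"bounced={bounced}: bus busy {held:.1f} ms, "
              f"frame period {deadline:.1f} ms")
```

Whenever the "bus busy" figure approaches the frame period, a single
committed transfer can hold off the display update past its deadline.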

With every DMA block, you are guaranteed at least 4 context switches, each
of which will dump cache (which hurts more when memory load is high),
require a register load/save, spill the pipeline, and so on.  No, you won't bring
the system down just due to interrupt load, but you will induce a sizeable
high priority load potentially capable of throwing off your display
routines.  I would spec 2 CPUs for safety margin.
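
A quick way to estimate that load, sketched in Python.  The 4 switches per
block is the lower bound from the paragraph above; the frame size and DMA
block size are placeholders I picked for the example, not measured values.

```python
SWITCHES_PER_DMA_BLOCK = 4  # lower bound quoted above


def context_switches_per_second(frame_bytes, fps, dma_block_bytes):
    """Minimum context switches per second caused by DMA for the frames."""
    # Ceiling division: a partial block still costs a full set of switches.
    blocks_per_frame = -(-frame_bytes // dma_block_bytes)
    return blocks_per_frame * fps * SWITCHES_PER_DMA_BLOCK


if __name__ == "__main__":
    frame_bytes = 2048 * 1536 * 4   # hypothetical frame size
    for block in (64 * 1024, 1024 * 1024):  # hypothetical DMA block sizes
        rate = context_switches_per_second(frame_bytes, 24, block)
        print(f"{block // 1024} KiB blocks: at least {rate} switches/s")
```

Larger blocks cut the switch count but hold the bus longer per transfer,
which is exactly the trade-off against the display deadline.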

> As for the analogue to large database servers (I assume you're talking about
> OLTP-type applications), the dataset is very different and so the movement
> of that data within the system is different. We're talking about very large
> sequential transfers.

The issue here isn't where the transfers come from, it's the loading of the
buses that bring the data in.  I have to admit, I was thinking more along
the lines of engines like Texis than your typical transactional DB.

> Sure. But your typical mainframe doesn't have a framebuffer, either. :)

I only wish I had my AT/370 up so I could offer a counterexample.  :)

Totally off-the-wall solution: use a cyclops-style video combiner a la SGI.
Use 2 PCs, each running at 12fps.  Bandwidth requirements for each are half
those of a single machine, and almost certainly doable with cheap commodity
hardware.  Even factoring in the cost of the video box this solution is
probably cheaper.  Thoughts?
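
A sketch of how the frames would be split, in Python.  The function names
are made up for illustration, and this only shows the interleaving and the
per-machine rate; how the combiner keeps the two outputs in sync is not
modelled.

```python
def assign_frames(num_frames, num_machines=2):
    """Round-robin frame numbers across machines (machine 0 gets 0, 2, ...)."""
    return [list(range(m, num_frames, num_machines))
            for m in range(num_machines)]


def per_machine_fps(total_fps, num_machines=2):
    """Frame rate each machine must sustain for the combined output rate."""
    return total_fps / num_machines


if __name__ == "__main__":
    for machine, frames in enumerate(assign_frames(8)):
        print(f"machine {machine}: frames {frames}")
    print(f"each machine runs at {per_machine_fps(24):.0f} fps")
```

Each box then only has to pull half the frames off disk and push half of
them to its framebuffer in any given second.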

-James