[geeks] Well, THAT was a setback

der Mouse mouse at Rodents-Montreal.ORG
Tue Jan 19 20:29:56 CST 2010


>>> I hate applying band-aids - especially complex, labor-intensive
>>> band-aids - that don't actually address the underlying problem.
>> Ah, but what _is_ the underlying problem?  That's a serious
>> question.  Decide what you want to do before you try to do it.  If
>> your goal is to stop shoeshining, for example, then this does fine.
> Well, "stop shoeshining" is a part of it, but the shoeshining is
> really a symptom.  The real goal is "move data to the drive as close
> as possible to its maximum sustained data rate, so that full backups
> complete as fast as possible."

Well, I suspect you don't actually mean all those "as XYZ as possible";
they probably should be something more like "as XYZ as possible while
spending less than $MONEY and taking less than HOURS of staff time".
Otherwise, you'll end up with "solutions" like using a single shared
SCSI bus for everything so the disks can send their data directly to
the tape drive - it'll work, but so much hackery will be needed (to,
for example, teach the hosts' OSes to coexist on a shared SCSI bus)
that you'd actually be better off with something else.

It is, however, a good point that if the drive streams 40MB/sec, you
need well over 100Mb/sec incoming bandwidth to keep it busy - 40MB/sec
is about 320Mb/sec on the wire, before protocol overhead.  Four 100Mb
interfaces might just barely do, if they're shared in sufficiently
clever ways, but unless there's some reason to avoid a gigabit-capable
switch, you'd probably be better off with gigabit.

Well, assuming your backup host's I/O subsystem can handle it.
(Hmm, 40MB/sec to the tape, plus 40MB/sec incoming, plus network
overhead - you'll need something like 100MB/sec of backplane
throughput.)
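
For anyone who wants to fiddle the numbers, here's that arithmetic as
a runnable scrap of Python; the 25% protocol-overhead factor is my
guess, not a measurement:

    # Back-of-the-envelope numbers from the paragraphs above; the 25%
    # protocol-overhead figure is a guess, not a measurement.
    tape = 40.0                  # MB/sec the drive wants, sustained
    wire = tape * 8              # = 320 Mb/sec of payload on the network
    need = wire * 1.25           # = 400 Mb/sec with guessed TCP/framing overhead
    print("network in: %.0f Mb/sec (gigabit = 1000)" % need)

    host = (tape + tape) * 1.25  # tape write + network read, plus slop
    print("host I/O:   %.0f MB/sec through the backplane" % host)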

But that holds only if you have to stream the data to the tape at the
same time as you receive it from the network.  If you can buffer
enough that you don't need to write to tape in parallel with network
reception, there might be no need for new networking infrastructure at
all.  (Disks that can hold some two or three times your tape's
capacity are down in the $100 range these days.  Their reliability is
poor, as disks go, but for a backup holding area, reliability matters
less than it does for many other applications.)
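
Sizing the staging area is equally quick arithmetic.  In the sketch
below, the 400GB tape capacity is purely an illustrative assumption -
substitute your drive's real figure:

    # Holding-area sizing; the 400GB tape capacity is an illustrative
    # assumption, not a figure from this thread.
    tape_gb = 400                    # assumed native capacity of one tape
    holding_gb = tape_gb * 3         # "two or three times" capacity, above
    drain_hr = tape_gb * 1024.0 / (40 * 3600)   # hours to stream one tape
    print("%d GB of staging disk; %.1f hours/tape at 40MB/sec"
          % (holding_gb, drain_hr))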

If your metric is "time from first dumper start to tape popping out of
the drive", then it sounds as though you can't afford that delay.  But
is that really your metric?  If it's really more like "length of the
nightly cronjob run", then you could perhaps buffer on disk for one
night, so that while tonight's dumps are dumping to disk, last night's
dumps are being copied from disk to tape.  This lets you overlap
dumping and writing just about perfectly - at the price of one day's
delay between a dump run and its bits being on a real tape.  Endless
options, depending on exactly what you care about.
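
In cron terms the one-night double buffer is just two jobs sharing
date-stamped holding directories.  A minimal Python sketch, where the
paths, the tape device, and the run-dumpers command are all
hypothetical placeholders for whatever your setup actually uses:

    # One-night double buffer, sketched; paths, tape device, and the
    # dump command are placeholders - substitute your own.
    import datetime, pathlib, subprocess

    HOLD = pathlib.Path("/holding")                 # assumed staging disk
    today = HOLD / datetime.date.today().isoformat()
    yesterday = HOLD / (datetime.date.today()
                        - datetime.timedelta(days=1)).isoformat()
    today.mkdir(parents=True, exist_ok=True)

    # Drain last night's dumps to tape at full drive speed...
    writer = subprocess.Popen(["tar", "-cf", "/dev/nst0",
                               "-C", str(yesterday), "."])

    # ...while tonight's dumpers fill today's directory over the net.
    subprocess.run(["sh", "-c",
                    "run-dumpers --output " + str(today)])  # hypothetical

    writer.wait()   # both phases done; tomorrow the roles rotate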

Another aspect: what does it mean for backups to "complete"?  That is,
what exactly is the time period you're measuring when you talk about
backups completing fast (or not)?  For some environments, spending
three hours ferociously dumping to holding disk, followed by thirty
hours sorting it all out and writing it to tape, counts as three hours.
In others, that would be thirty-three hours; in still others, thirty
hours.  And the case mentioned above, where you buffer for a
day, would add a day by some people's measurements but not by others'.
Each one pushes things in different directions when you try to optimize
time spent.  And there are some where, rather than a big burst of data
at backup time, it's better to trickle changes over as they occur and
then do the writing to tape from snapshots or moral equivalent - by
some measuring rules, that would mean that each week's backups take a
full week.
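
For the record, here are the three readings of that hypothetical run,
spelled out:

    # The same hypothetical run, measured three ways (figures from above).
    dump_hr, write_hr = 3, 30
    print("until dumpers finish:   %d hours" % dump_hr)
    print("until bits are on tape: %d hours" % (dump_hr + write_hr))
    print("tape-drive busy time:   %d hours" % write_hr)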

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse at rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


