[geeks] Cheap Dell Servers

Francois Dion francois.dion at gmail.com
Thu Feb 7 15:03:26 CST 2008


On Feb 7, 2008 2:52 PM, Sridhar Ayengar <ploopster at gmail.com> wrote:
> Francois Dion wrote:
> > is available. But even before that, my real number 1 reason was ZFS. I
> > think I've mentioned this before but I've lost a ton of vinyl records
> > I ripped to disk to silent corruption. I've also experienced silent
> > data corruption on Raid-5 at work. That's only once in 10 years (that
> I honestly haven't thought about which filesystem to use.  Does ZFS use
> log-structures?

ZFS is more than a filesystem: at a minimum it is a pooled storage
volume manager plus a filesystem. It is a copy-on-write, transactional
filesystem, so the on-disk format is always consistent: every change
is an object-based transaction (all or nothing), and there is no need
for a journal. Basically, you have an uberblock pointing to a block of
pointers, and the uberblock is not updated until both the data and the
indirect blocks are written. Checksums are kept in the uberblock and
the indirect blocks, that is, in the parent of the block they cover.
This of course makes it trivial to implement snapshots and clones
(HUGELY powerful), which ZFS has. And the checksums allow detection of
errors, and correction as long as a good copy can be rebuilt from a
mirrored block or from raidz parity. Resilvering takes care of that.
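
To make the copy-on-write idea above a bit more concrete, here is a
toy sketch in Python. It is only an illustration, not how ZFS is
actually implemented: the ToyPool class, its method names, the JSON
indirect block and the use of SHA-256 are all assumptions made up for
this example. It shows three things: blocks are never overwritten in
place, each pointer carries the checksum of the block it points to,
and the uberblock is switched only after everything else is written.

```python
import hashlib
import json


class ToyPool:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}       # address -> bytes, never overwritten
        self.next_addr = 0
        self.uberblock = None  # pointer to the current indirect block

    def _alloc(self, payload):
        # Always write to a fresh address and return a pointer that
        # carries the checksum of the block it points to.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = payload
        return [addr, hashlib.sha256(payload).hexdigest()]

    def _read(self, ptr):
        addr, checksum = ptr
        payload = self.blocks[addr]
        if hashlib.sha256(payload).hexdigest() != checksum:
            raise IOError(f"checksum mismatch at block {addr}")
        return payload

    def commit(self, files):
        """Write data blocks, then the indirect block, then the uberblock."""
        pointers = {name: self._alloc(data) for name, data in files.items()}
        indirect = json.dumps(pointers, sort_keys=True).encode()
        root_ptr = self._alloc(indirect)
        self.uberblock = root_ptr  # last step: all or nothing
        return root_ptr

    def read(self, name, root_ptr=None):
        """Read a file via the current uberblock or an older root pointer."""
        if root_ptr is None:
            root_ptr = self.uberblock
        pointers = json.loads(self._read(root_ptr).decode())
        return self._read(pointers[name])


pool = ToyPool()
snapshot = pool.commit({"a.txt": b"v1"})  # keeping the old root = snapshot
pool.commit({"a.txt": b"v2"})
print(pool.read("a.txt"))            # current version
print(pool.read("a.txt", snapshot))  # version as of the snapshot
```

A "snapshot" here is nothing more than holding on to an old root
pointer, which is why snapshots come almost for free in a design like
this. If a block is silently changed behind the pool's back, the
checksum in its parent pointer no longer matches and the read fails
instead of returning bad data.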

You can pull the plug and come back up, no fsck etc. And you don't
have to deal with storage fragmented across separate volumes, since
all storage (including any disks you add later) is part of one pool.
You don't have to use a single pool, though: I've typically left the 2
boot disks as their own syspool and the rest as datapool. There's also
compression and NFSv4 ACLs, and encryption is in the works.

As far as raidz itself, all writes are full-stripe writes and each
logical block is its own stripe. It doesn't need NVRAM (RAID-5 needs
it to close the write hole on partial-stripe updates). There's also a
raidz2 where you can suffer the loss of 2 disks and still be chugging
along.
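
As a rough illustration of the full-stripe idea, here is a small
Python sketch of single-parity striping. It is a simplification under
stated assumptions: real raidz uses variable-width stripes and its own
on-disk layout, and the function names below (xor_blocks,
write_full_stripe, rebuild) are invented for this example. The point
is that parity is computed from the whole logical block at write time,
so there is never a read-modify-write of an existing stripe, and any
one lost chunk can be rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_full_stripe(data, n_data_disks):
    """Split one logical block into n_data_disks chunks plus one parity chunk."""
    chunk = -(-len(data) // n_data_disks)  # ceiling division
    padded = data.ljust(chunk * n_data_disks, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(n_data_disks)]
    return chunks + [xor_blocks(chunks)]   # parity goes last


def rebuild(stripe, lost):
    """Recover the chunk at index `lost` by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


stripe = write_full_stripe(b"hello raidz!", 4)  # 4 data chunks + 1 parity
assert rebuild(stripe, 2) == stripe[2]          # "disk" 2 died, data is back
```

With a second, independent parity chunk (as in raidz2) two lost chunks
can be recovered, but that needs more than plain XOR and is left out
here.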

And then there is the whole ease of administration. It almost
obsoletes NAS-specific distributions, IMHO. Besides CIFS and NFS, you
can also present ZFS volumes (pieces of a pool) as iSCSI devices to
other systems. With dual gigabit, you might think the disk is really
local on your workstation...

There's also the DMU, which reorders transactions for better
performance, and if you have lots of RAM, ZFS uses it to great effect
with things like intelligent prefetch. It is multipath aware.

There are some limitations. When dealing with mirrors, you can replace
a dead drive with one of the exact same capacity or bigger, but not
smaller, even if you are not using the whole capacity. You also can't
currently remove a drive from a raidz setup. For example, if you set
up a 5-disk raidz and decide later to go to a 4-disk raidz, you'll
have to move the data elsewhere, recreate a raidz pool with only 4
drives, etc. So if you want to attach drives temporarily to move stuff
over, you would instead bring them in as a new pool with zpool import,
not add them to the existing pool with zpool attach. Move stuff, then
zpool export, because detaching a disk would not shrink the raidz.
Shrinking pools is, however, a planned feature for 2008.

Francois


