[geeks] Impressive...

Francois Dion francois.dion at gmail.com
Mon Mar 9 11:21:41 CDT 2009


On Sat, Mar 7, 2009 at 10:21 AM, velociraptor <velociraptor at gmail.com> wrote:
> On Fri, Mar 6, 2009 at 3:04 PM, velociraptor <velociraptor at gmail.com>
wrote:
>
>> I need to educate myself more, obviously.  I didn't realize you could
>> not grow the pools transparently.

If they had that, can you imagine their market share? Even the new Sun
Fishworks appliances don't do it.

Actually, I do want to mention something. You can, somewhat. You can
add another disk to an already created pool as a spare, or add two
mirrored drives to a pool. You don't have to create a new pool per
pair; you can keep adding to the existing one. But it doesn't
redistribute the existing data across all the drives in a striped /
mirrored way, as it would if the pool had been created like that from
the start.
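
For example, something along these lines (the pool and device names
here are made up, use whatever your system presents):

    # add a hot spare to an existing pool
    zpool add tank spare c2t3d0
    # or add another mirrored pair as a new top-level vdev
    zpool add tank mirror c2t4d0 c2t5d0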

There is also a trick I've been using for the past few years, though.
zfs properties like compression (and the on-disk endianness) only
apply to new writes, so if you move the files around you force a
rewrite and get some of that restructuring. Let me explain:

Create a zpool and a zfs filesystem with compression off, then copy
files onto it. Then enable compression and move the files between two
zfs filesystems on the same pool (both share the total disk space set
by the zpool; afterwards you unmount the original FS and set the
second FS's mount point to the original mount point). Since the new
writes are compressed, you end up with more free space.
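
Roughly, as a sketch (the pool and filesystem names are just
placeholders):

    zpool create tank c1t1d0
    zfs create tank/data                      # compression is off by default
    # ... copy your files into /tank/data ...
    zfs set compression=on tank               # only affects blocks written from now on
    zfs create tank/data2
    (cd /tank/data && tar cf - .) | (cd /tank/data2 && tar xpf -)
    zfs destroy -r tank/data                  # once you've verified the copy
    zfs set mountpoint=/tank/data tank/data2  # second FS takes over the old path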

I quickly mention compression here:
http://solarisdesktop.blogspot.com/2007/02/stick-to-zfs-or-laptop-with-mirrored.html
(there are all kinds of hacks you can do with zfs that are not
officially supported or much documented)

Similarly, create a zpool on an x86 machine and copy files to it.
Then export the zpool from the x86 machine and import it on a sparc
machine. Again, move the files from one zfs FS to the other; the
rewritten blocks are now in the native endianness, so no translation
is done on the fly anymore. I've done that to move a large Oracle
database from sparc to x86, and later took that SATA disk from the
x86 box, loaded it on a sparc box, shuffled the files to get native
endianness, and finally added a mirror after the fact.
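
As a sketch (again, the pool and filesystem names are placeholders):

    # on the x86 box
    zpool export tank
    # move the disk(s) to the sparc box, then
    zpool import tank
    zfs create tank/native
    (cd /tank/data && tar cf - .) | (cd /tank/native && tar xpf -)
    # the rewritten blocks are now in sparc-native byte order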

This same approach should work to take advantage of all the drives in
an expanded zpool, but I'm not sure how to test it. It would be a
good question for the zfs list. That part is theory, as I've not
tried it yet, and like I said, there's no easy way to verify that the
data really is striped across all the drives.

>> My original plan was to take two
>> new disks and stripe them, move the data to those, then use the
>> existing 3 disks to start the zpool, transfer the data to zfs, then
>> put the two new disks into the zpool and grow it.  This will have to
>> be revised, and I'm going to need to reconsider the disks I use for
>> the project.
>
> Looking at my data, I think I can still work this with the planned
> hard drives, though the data transfer may be a bit hairy since I'll be
> using a non-redundant zfs pool for the temporary storage.

Just to be safe, run a scrub on the new pool once the data is on it,
and make sure it completes without errors before destroying the
original data...
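
Something like:

    zpool scrub newpool
    zpool status -v newpool   # repeat until it shows the scrub completed with 0 errors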

> One question I do have--and I'll search some more after I get some
> solid sleep (fscking on-call pager did that just the right interval to
> not really sleep thing this evening :-( ):  If I create a zpool over
> the top of a hw raid array, will zfs see the space when that raid
> device gets bigger?

No. This is easily tested with slices. You would have to add that
extra space as a new LUN, not grow the same LUN. I wouldn't use
hardware raid5 at all. Even if you did, you would need a minimum of 2
LUNs so you can mirror them with zfs; otherwise zfs can detect
corruption (which the hardware raid wouldn't catch most of the time)
but not repair it. The nice feature of zfs is end-to-end data
integrity.
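
For example (the LUN device names are made up):

    # two LUNs from the array, mirrored by zfs so it can repair bad blocks itself
    zpool create tank mirror c3t0d0 c3t1d0
    # extra capacity later has to show up as new LUNs, added as another vdev
    zpool add tank mirror c3t2d0 c3t3d0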

Does your raid controller have a battery-backed cache?

> I know it's not any more "space efficient" than mirroring two raidz
> zpools (if that is even possible).

The equivalent of raid 0+1 or raid 1+0 is possible. You could also
use zfs to mirror two hardware raid5 LUNs, or raidz a set of hardware
mirrors, or mirror hardware stripes, etc.
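
The 1+0 style, for example, is just a stripe of zfs mirrors (disk
names are placeholders):

    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0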

> I see a couple of mentions of
> using raid controllers in jbod mode, and given what zfs does it's
> obvious why you'd do that.

Even SANs. You can use LUNs, but again you need to mirror them with
zfs, or you could raidz 3 or more LUNs.
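
e.g. (again, made-up device names):

    # three SAN LUNs in a single raidz vdev
    zpool create tank raidz c4t0d0 c4t1d0 c4t2d0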

> I'm just wondering about zfs over things
> like LUNs, which I don't see much talk about other than some
> operational "here's how you add a LUN to a zpool" kind of thing.
> Pointers to more "enterprise-y" info appreciated.

LUNs add complexity with no real benefit in real life.

BTW, you can test anything in zfs with files instead of actual devices.

If I get a chance I'll try the dual-FS trick, add more "disks"
(files), and see if I can't use dtrace or something to monitor which
"disks" are accessed for a given file. Since this pool would be quiet
except for my direct operations, zpool iostat -v should be enough to
give at least a definite no or a maybe. If it's a definite no, there's
no point in writing dtrace code for nothing.
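
The test would look something like this (sizes and paths are
arbitrary):

    mkfile 128m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4
    zpool create testpool /var/tmp/d1 /var/tmp/d2
    # ... create the two FS, copy data in ...
    zpool add testpool /var/tmp/d3 /var/tmp/d4
    # ... shuffle the files between the two FS ...
    zpool iostat -v testpool 5   # per-"disk" I/O; watch whether d3/d4 take writes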

Francois


