[geeks] Growing a ZFS pool, a workaround (was Re: Impressive...)

Francois Dion francois.dion at gmail.com
Mon Mar 9 12:14:11 CDT 2009


On Mon, Mar 9, 2009 at 12:21 PM, Francois Dion <francois.dion at gmail.com> wrote:
> On Sat, Mar 7, 2009 at 10:21 AM, velociraptor <velociraptor at gmail.com> wrote:
>> On Fri, Mar 6, 2009 at 3:04 PM, velociraptor <velociraptor at gmail.com> wrote:
>>
>>> I need to educate myself more, obviously.  I didn't realize you could
>>> not grow the pools transparently.
>
> If they had that, can you imagine their market share? Even the new Sun
> FISHworks appliances don't do it.
>
> Actually, I do want to mention something. You can, somewhat. You can
> add another disk to an already created pool as a spare, or add two
> mirrored drives to a pool. You don't have to create a new pool per
> pair; you can keep adding to the existing one. But it doesn't
> redistribute the existing data across all the drives in a striped /
> mirrored way as if the pool had been created like that from the start.
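>
> Something like this (off the top of my head; the device names are just
> placeholders):
>
> zpool add mypool spare c1t4d0              # add a disk as a hot spare
> zpool add mypool mirror c1t2d0 c1t3d0      # add another mirrored pair as a new vdev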
>
> There is also a trick I've been using for the past few years, though.
> Because of zfs features like compression and endianness handling,
> moving the files will do some of the restructuring for you. Let me
> explain:
>
> Create a zpool and create a zfs filesystem with compression off. Copy
> files onto it. Then enable compression and move the files between two
> zfs filesystems on the same pool (both share the total disk space set
> by the zpool; afterwards you'd unmount the original FS and set the
> second FS's mountpoint to the original mountpoint). You now have more
> free space, since the data gets compressed as it is rewritten.
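>
> In commands it looks roughly like this (a sketch, untested as written,
> with made-up device names):
>
> zpool create mypool mirror c1t0d0 c1t1d0
> zfs create mypool/fs1                      # compression off, copy the files here
> zfs set compression=on mypool              # new writes are compressed from now on
> zfs create mypool/fs2                      # inherits compression=on
> mv /mypool/fs1/* /mypool/fs2/              # rewriting the data compresses it
> zfs set mountpoint=none mypool/fs1
> zfs set mountpoint=/mypool/fs1 mypool/fs2  # take over the original mountpoint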
>
> I quickly mention compression here:
>
> http://solarisdesktop.blogspot.com/2007/02/stick-to-zfs-or-laptop-with-mirrored.html
> (there are all kinds of hacks you can do with zfs that are not
> officially supported or much documented)
>
> Similarly, create a zpool on an x86 machine and copy files to it. Then
> export the zpool from the x86 machine and import it on a sparc
> machine. Again, move the files from one zfs FS to the other. The
> blocks are now in the native endianness, so no translation has to be
> done on the fly anymore. I've done that to move a large Oracle
> database from sparc to x86, and then later took that SATA disk out of
> the x86 box and loaded it into a sparc box, shuffled the files to get
> native endianness, and finally added a mirror after the fact.
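>
> The moves themselves are just the standard commands (again a sketch,
> and the device names for the attach are made up):
>
> zpool export mypool                        # on the x86 box
> zpool import mypool                        # on the sparc box, after moving the disk
> zfs create mypool/fs2
> mv /mypool/fs1/* /mypool/fs2/              # rewrites the blocks in native endianness
> zpool attach mypool c0t1d0 c0t2d0          # add a mirror after the fact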
>
> This same approach should work to take advantage of all the drives in
> an expanded zpool, but I'm not sure how to test this; it would be a
> good question for the zfs list. That part is theory, as I've not tried
> it yet, and like I said, there is no easy way to verify that the data
> is really striped across all the drives.
>
>>> My original plan was to take two
>>> new disks and stripe them, move the data to those, then use the
>>> existing 3 disks to start the zpool, transfer the data to zfs, then
>>> put the two new disks into the zpool and grow it.  This will have to
>>> be revised, and I'm going to need to reconsider the disks I use for
>>> the project.
>>
>> Looking at my data, I think I can still work this with the planned
>> hard drives, though the data transfer may be a bit hairy since I'll be
>> using a non-redundant zfs pool for the temporary storage.
>
> Just to be safe, make sure you do a scrub on the new pool once the
> data is on it, and that it completes without error, before destroying
> the original data...
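>
> i.e. something like (with "newpool" being whatever you called it):
>
> zpool scrub newpool
> zpool status -v newpool     # wait for the scrub to finish with no errors reported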
>
>> One question I do have--and I'll search some more after I get some
>> solid sleep (fscking on-call pager did that just-the-right-interval-
>> to-not-really-sleep thing this evening :-( ): if I create a zpool
>> over the top of a hw raid array, will zfs see the space when that
>> raid device gets bigger?
>
> No. This is easily tested with slices. You would have to add that
> extra space as a new LUN, not grow the same LUN. I wouldn't use
> hardware raid5 at all. Even if you did, you would need a minimum of 2
> LUNs mirrored by zfs, so that zfs is able to recover from any
> corruption (which the hardware raid wouldn't detect most of the time).
> The nice feature of zfs is end-to-end data integrity.
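>
> e.g. (made-up LUN names):
>
> zpool create tank mirror c2t0d0 c3t0d0     # two hardware raid5 LUNs, mirrored by zfs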
>
> Does your raid controller have battery backed up cache?
>
>> I know it's not any more "space efficient" than mirroring two raidz
>> zpools (if that is even possible).
>
> The equivalent of raid 0+1 or raid 1+0 is possible. You could also use
> zfs to mirror two hardware raid5 LUNs, or raidz a set of hardware
> mirrors, or mirror hardware stripes, etc.
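>
> For example, the raid 1+0 equivalent (made-up device names):
>
> zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
>
> or raidz over three hardware-mirror LUNs:
>
> zpool create tank raidz c4t0d0 c4t1d0 c4t2d0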
>
>> I see a couple of mentions of
>> using raid controllers in jbod mode, and given what zfs does it's
>> obvious why you'd do that.
>
> Even SANs. You can use LUNs, but again you need to mirror them with
> zfs or you could raidz 3+ LUNs.
>
>> I'm just wondering about zfs over things
>> like LUNs, which I don't see much talk about other than some
>> operational "here's how you add a LUN to a zpool" kind of thing.
>> Pointers to more "enterprise-y" info appreciated.
>
> LUNs add complexity with no real benefit in real life.
>
> BTW, you can test anything zfs with files instead of actual devs.
>
> If I get a chance I'll try the dual-fs trick, add more "disks"
> (files), and see if I can use dtrace or something to monitor which
> "disks" are accessed for a given file. Since this pool would be quiet
> except for my direct operations, zpool iostat -v should be at least
> enough to give a definite no or a maybe. If it's a definite no,
> there's no point in writing dtrace code for nothing.
>
> Francois

I edited the subject since we are going off on a bit of a tangent...

So, as to the compression/endian trick applied to growing a pool with
another mirror, it does appear to work, viz:

root at wel:/ # mkdir zfs_test
root at wel:/ # cd zfs_test
root at wel:/zfs_test # mkfile 128M disk1
root at wel:/zfs_test # mkfile 128M disk2
root at wel:/zfs_test # mkfile 128M disk3
root at wel:/zfs_test # mkfile 128M disk4
root at wel:/zfs_test # zpool create mypool mirror /zfs_test/disk1 /zfs_test/disk2
root at wel:/zfs_test # zfs create mypool/fs1
root at wel:/zfs_test # ls /mypool/
fs1
root at wel:/zfs_test # cp /usb/mp3/f/Francois\ Dion/IDM-4011\ -\ Test\ Tones\ I/* /mypool/fs1
root at wel:/zfs_test # zpool add mypool mirror /zfs_test/disk3 /zfs_test/disk4
root at wel2500:/zfs_test # zfs create mypool/fs2
root at wel:/zfs_test # cd /mypool/fs1
root at wel:/mypool/fs1 # ls
00 - IDM-4011 - Francois Dion - Test Tones I.m3u
00 - Test Tones I.m3u
01 - 2600.mp3
02 - Nynex.mp3
03 - KP.mp3
04 - override.mp3
05 - ST.mp3
06 - Gibson.mp3
07 - DCC.mp3
08 - phrack.mp3
09 - 0062.mp3
Readme.rtf
TestToneI.cl5
http---www.cimastudios.com-fdion-.URL
test_tones.png

In another window I'm doing zpool iostat -v mypool 5:


                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0      0      0
  mirror             33.7M  89.3M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror               54K   123M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M     31     84  3.77M  4.34M
  mirror             33.7M  89.3M     31     47  3.77M  2.06M
    /zfs_test/disk1      -      -     19     24  2.43M  2.06M
    /zfs_test/disk2      -      -     11     23  1.35M  2.06M
  mirror               54K   123M      0     36      0  2.28M
    /zfs_test/disk3      -      -      0     23      0  2.28M
    /zfs_test/disk4      -      -      0     23      0  2.28M
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               41.5M   205M      2     66   167K  2.31M
  mirror             30.0M  93.0M      2     25   167K  1019K
    /zfs_test/disk1      -      -      0     14  77.5K  1020K
    /zfs_test/disk2      -      -      1     13  89.6K  1020K
  mirror             11.4M   112M      0     41      0  1.31M
    /zfs_test/disk3      -      -      0     20      0  1.31M
    /zfs_test/disk4      -      -      0     19      0  1.31M
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.9M   212M      0      0      0      0
  mirror             16.0M   107M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0    817
    /zfs_test/disk2      -      -      0      0      0    817
  mirror             17.9M   105M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0    817
    /zfs_test/disk4      -      -      0      0      0    817
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.9M   212M      0      0      0      0
  mirror             16.0M   107M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror             17.9M   105M      0      0      0      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----


Then I did:
root at wel:/mypool/fs1 # mv * ../fs2
root at wel:/mypool/fs1 # cd ../fs2

I waited a bit more and then did:
root at wel:/mypool/fs2 # mplayer 02\ -\ Nynex.mp3

while running zpool iostat again:

                       capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  51.1K      0
  mirror             15.4M   108M      0      0  25.5K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  25.5K      0
  mirror             18.5M   105M      0      0  25.5K      0
    /zfs_test/disk3      -      -      0      0      0      0
    /zfs_test/disk4      -      -      0      0  25.5K      0
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  76.6K      0
  mirror             15.4M   108M      0      0  51.1K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  51.1K      0
  mirror             18.5M   105M      0      0  25.5K      0
    /zfs_test/disk3      -      -      0      0  25.5K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      0      0  76.6K      0
  mirror             15.4M   108M      0      0      0      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0      0      0
  mirror             18.5M   105M      0      0  76.6K      0
    /zfs_test/disk3      -      -      0      0  76.6K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      1      0   204K      0
  mirror             15.4M   108M      0      0  76.6K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0  76.6K      0
  mirror             18.5M   105M      0      0   128K      0
    /zfs_test/disk3      -      -      0      0   128K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               33.8M   212M      1      0   255K      0
  mirror             15.4M   108M      0      0   128K      0
    /zfs_test/disk1      -      -      0      0      0      0
    /zfs_test/disk2      -      -      0      0   128K      0
  mirror             18.5M   105M      0      0   128K      0
    /zfs_test/disk3      -      -      0      0   128K      0
    /zfs_test/disk4      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----


Now, this is just with mirrors. With raidz, if you try to add a single
disk, you'll end up with a pool made of one raidz1 vdev plus one lone
disk, so your pool as a whole is not redundant. You would really need
to add "mirror disk4 disk5". Let me try this.
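
For reference, the setup for that test was along these lines (a sketch
from memory, not a paste of the terminal, and the cp source is just
whatever test files are handy):

root at wel:/zfs_test # mkfile 128M disk5
root at wel:/zfs_test # zpool destroy mypool
root at wel:/zfs_test # zpool create mypool raidz1 /zfs_test/disk1 /zfs_test/disk2 /zfs_test/disk3
root at wel:/zfs_test # zfs create mypool/fs1
root at wel:/zfs_test # cp /some/files/* /mypool/fs1
root at wel:/zfs_test # zpool add mypool mirror /zfs_test/disk4 /zfs_test/disk5
root at wel:/zfs_test # zfs create mypool/fs2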

Yep, on a mv from fs1 to fs2 you get:

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               50.7M   442M     25     77  3.08M  4.36M
  raidz1             50.6M   319M     25     44  3.08M  2.16M
    /zfs_test/disk1      -      -     16     17  1.03M  1.08M
    /zfs_test/disk2      -      -     15     16  1000K  1.08M
    /zfs_test/disk3      -      -     17     16  1.09M  1.08M
  mirror               60K   123M      0     33      0  2.20M
    /zfs_test/disk4      -      -      0     22      0  2.20M
    /zfs_test/disk5      -      -      0     22      0  2.20M
-------------------  -----  -----  -----  -----  -----  -----

Reads come from disks 1, 2 and 3, and writes go to disks 1 through 5.

and using mplayer, I see:

                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
-------------------  -----  -----  -----  -----  -----  -----
mypool               42.5M   451M      1      0   204K      0
  raidz1             25.8M   344M      0      0   102K      0
    /zfs_test/disk1      -      -      0      0  51.1K      0
    /zfs_test/disk2      -      -      0      0      0      0
    /zfs_test/disk3      -      -      0      0  51.1K      0
  mirror             16.7M   106M      0      0   102K      0
    /zfs_test/disk4      -      -      0      0   102K      0
    /zfs_test/disk5      -      -      0      0      0      0
-------------------  -----  -----  -----  -----  -----  -----

So that would work, it appears. It is just a bit of a pain to do. And
of course don't wait until your raidz1 is nearly full to try this.

Final point, a warning: I think I've mentioned it before, but you
cannot shrink a pool. While you can use my trick to get around the
grow-the-pool problem, there is absolutely no way to move the data off
of a specific disk to remove it from the pool. Once a disk is added,
you can only replace it with a spare.
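
For example, to swap a disk for another one of at least the same size
(made-up device names):

zpool replace mypool c1t2d0 c1t5d0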

Francois


