[geeks] Understanding ZFS performance (maybe)

Patrick Giagnocavo patrick at zill.net
Sun Aug 9 16:56:08 CDT 2009


OK, I have been doing some testing with Solaris ZFS, NFS, etc.

Comments, suggestions, etc. welcome ...

Setup:

snv_118 (to get the latest COMSTAR fixes)

Fileserver: dual 2.8GHz Xeons, 2x 1TB 7200 rpm Seagate drives for data
(OS on a different disk)

VMware box: VMware ESXi running on a DL360 G6 (dual Xeon 5550s, 12GB RAM)

iSCSI::
Once I updated to snv_118, all the iSCSI stuff works great.  Even after
a reboot of the fileserver, the VMs didn't miss a beat once the machine
came back up.
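
For anyone wiring this up, the COMSTAR side is roughly the following;
the zvol name and size are just examples, and you will want to check
sbdadm list-lu for the actual GUID:

    zfs create -V 100G vmshare/esx-lun0
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    sbdadm create-lu /dev/zvol/rdsk/vmshare/esx-lun0
    stmfadm add-view <GUID from sbdadm list-lu>
    itadm create-target

Then point ESXi's software iSCSI initiator at the fileserver.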

The problem with iSCSI is that the VMFS file system is proprietary, and
I worry that if something goes sideways with it, I will be screwed in
terms of recovering the data.

iSCSI advantages: very fast on both reads and writes, and lower CPU
usage on the server (because of less overhead).

NFS::
So I tried NFS mounts instead (mount the share via NFS inside VMware,
then store the VMDKs on it).
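
The ZFS side of that is simple enough - something like the lines below,
where the dataset name and the ESXi hostname are placeholders.  The
root= option matters because ESX writes to the datastore as root:

    zfs create vmshare/nfs
    zfs set sharenfs='rw,root=esxhost' vmshare/nfs

On the ESXi side add it as an NFS datastore, either in the vSphere
client or with something like esxcfg-nas -a -o fileserver -s
/vmshare/nfs nfsds from the console.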

Here is the problem with NFS write speeds - they are very slow because
VMware issues a COMMIT (fsync) after each block.  NFS reads are fast,
even faster than iSCSI.

Writes are about 5MB/second, which is painfully slow and especially
apparent when the CPUs are fast.
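
A quick and dirty way to see this (not a benchmark, just a sanity
check) is to time a write from the ESXi console onto the NFS datastore,
assuming it is mounted as "nfsds":

    time dd if=/dev/zero of=/vmfs/volumes/nfsds/ddtest bs=1024k count=100

100MB divided by the elapsed time gives the rough MB/s number.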

I can't use zil_disable because this is a production server and it is
unsafe in any case.
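
(For reference, the knob in question is the zil_disable tunable in
/etc/system - shown here only so you know what NOT to set on a box
whose data you care about:)

    * /etc/system - disables the ZIL pool-wide, breaks NFS COMMIT semantics
    set zfs:zil_disable = 1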

First I broke the 1TB mirror, leaving one drive for data, and turned
the other 1TB drive into a dedicated log device (e.g. zpool add vmshare
log c7t2d0s0).  This does work and does improve writes by 15% or more,
at the expense of reduced read speeds (fewer spindles to pull data off
of).
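
For the record, the sequence is roughly this - c7t2d0s0 being the drive
pulled out of the mirror, so adjust the device names for your setup:

    zpool detach vmshare c7t2d0s0        # split the mirror
    zpool add vmshare log c7t2d0s0       # re-add it as a dedicated log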

However, going from a 5MB/s peak to maybe an 8MB/s peak is still not
enough; after all, these are per-host peak speeds, so if more than one
VM starts writing files at the same time, performance will really
degrade.

Now I know why NetApp pushes storing VMware VMs on NFS: their systems
return immediately from a COMMIT because they use NVRAM to guarantee
stable storage of NFS writes.  They must know that companies trying to
use regular fileservers are going to bump into this problem.

Plan::

I went on Amazon.com and ordered a Gigabyte GC-RAMDISK for $99.

This is a PCI card with 4 slots for DDR RAM (up to 4GB total), a
rechargeable Li-ion battery, and a SATA port - in short, it looks like
a SATA drive but uses RAM instead of disk.

Since I already have some DDR RAM, it ends up cheaper than a SATA SLC
SSD.  I have at least 3x 256MB sticks, for a total of 768MB.

I will re-attach the second 1TB drive to the pool to once again have a
mirror, then set up the RAMDISK as the log device.
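
In zpool terms that should look something like the lines below; the
first device is whatever the surviving mirror half is, and the i-RAM
will show up as some new SATA device, so both of those names are made
up:

    zpool remove vmshare c7t2d0s0            # drop the disk-based log
    zpool attach vmshare c7t1d0s0 c7t2d0s0   # re-mirror the data drives
    zpool add vmshare log c8t0d0             # i-RAM as the new log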

Learned:

While setting this all up, I learned that SSDs or other fast drives can
be used to accelerate ZFS in two different ways:

ZFS log devices - used to increase WRITE speeds.  You don't need a lot
to make a difference: under 1GB is enough in many cases, and even a
large server might only need 4 or 8GB.  You can use any fast disk (a
15K SCSI disk, say), not just an SSD.

ZFS cache devices - are used to increase READ speeds - the more the
better; right now I am not investigating this too much because my read
speeds are not the problem.
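
For completeness, adding a cache device is just another one-liner
(device name made up):

    zpool add vmshare cache c9t0d0
    zpool iostat -v vmshare 5        # watch per-device traffic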

BTW, many of the OpenSolaris man pages are outdated in warning that
log devices cannot be removed - anything after about snv_90ish can in
fact have a log device fail or be removed, and logging reverts to the
drives in the rest of the pool.  Since ZFS cache devices only speed up
reads, they too can fail without incident.
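
In other words, backing either one out should just be a zpool remove
(using the made-up device names from above) - cache devices can go at
any time, and on those newer builds the log can too:

    zpool remove vmshare c9t0d0      # cache device
    zpool remove vmshare c8t0d0      # log device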

Hope this helps someone else setting up a ZFS or NFS fileserver.

--Patrick


