FreeBSD and Multipath

I didn’t find any blog posts or discussions on FreeBSD and multipath (for storage) that weren’t a man page.

That means it is up to me to write about it :)

Hardware

CPU

Machine class:	amd64
CPU Model:	Intel(R) Xeon(R) CPU           E5530  @ 2.40GHz
No. of Cores:	16

Memory

Total real memory available:	65511 MB
Logically used memory:		3945 MB
Logically available memory:	61565 MB

Storage

The storage is a large ~90TB enterprise-class Fibre Channel array, a DataDirect Networks S2A9900. Connected to that are two dual-port QLogic 2532 8Gb HBAs. We also have two SSD drives (configured as a RAID1 device) for the ZFS Intent Log.

The storage array was configured from 120 1TB, 7200RPM Hitachi drives. It has 12 volumes in total, each composed of 10 of the SATA drives (1 parity, 1 spare), or ~7TB per volume.
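As a rough sanity check on the ~7TB figure, here is a small sketch of the arithmetic. It assumes the parity and spare drives both come out of the 10 drives in each volume (leaving 8 data drives) and that the array reports capacity in binary units; the helper name usable_tib is made up for illustration.

```shell
#!/bin/sh
# usable_tib DRIVES PARITY SPARE SIZE_TB
# Prints the usable capacity of one volume in TiB (hypothetical helper).
# Assumption: the parity and spare drives are counted inside DRIVES.
usable_tib() {
    awk -v d="$1" -v p="$2" -v s="$3" -v tb="$4" \
        'BEGIN { printf "%.1f\n", (d - p - s) * tb * 1e12 / 2^40 }'
}

usable_tib 10 1 1 1    # one volume: 8 data drives of 1TB each
```

Under those assumptions this works out to about 7.3 per volume, or roughly 87 across all 12 volumes, which lines up with the ~7TB and ~90TB figures.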

The S2A9900 has two controllers: one is responsible for LUNs 1-6, the other for LUNs 7-12. Every LUN is presented to all four Fibre Channel ports. This got a little messy; trying to sort out 48 raw disk devices takes some patience and a decent attention span…

Yeah, I did make a few typos here and there; thankfully, creating and clearing disk labels is easy.

# camcontrol devlist|grep lun\ 0
                at scbus0 target 0 lun 0 (pass0,da0)
                at scbus1 target 0 lun 0 (pass6,da6)
                at scbus4 target 0 lun 0 (pass24,da24)
                at scbus5 target 0 lun 0 (pass30,da30)
# camcontrol inquiry da0 -S
108EA1B10001
# camcontrol inquiry da6 -S
108EA1B10001
# camcontrol inquiry da24 -S
108EA1B10001
# camcontrol inquiry da30 -S
108EA1B10001
# gmultipath label -v DDN-v00 /dev/da0 /dev/da6 /dev/da24 /dev/da30
Done.
# gmultipath status
             Name  Status  Components
multipath/DDN-v00     N/A  da0
                           da6
                           da24
                           da30

Now, to do that 11 more times…

Whew, hard work!
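Grouping 48 devices by hand is where my typos crept in, so here is a minimal sketch of how the pairing could be scripted instead. It reads "device serial" pairs and prints one gmultipath label command per unique serial, so the commands can be reviewed before anything is run. The function name group_paths and the DDN-vNN numbering by order of first appearance are my assumptions; check that the order matches your LUN numbering.

```shell
#!/bin/sh
# group_paths reads "device serial" lines on stdin and prints one
# "gmultipath label" command per unique serial number. Labels are
# numbered DDN-v00, DDN-v01, ... in order of first appearance.
# (Hypothetical helper; it only prints commands, it does not run them.)
group_paths() {
    awk '
        !($2 in paths) { order[n++] = $2 }          # remember first-seen order
        { paths[$2] = paths[$2] " /dev/" $1 }       # collect every path to this LUN
        END {
            for (i = 0; i < n; i++)
                printf "gmultipath label -v DDN-v%02d%s\n", i, paths[order[i]]
        }'
}

# On the FreeBSD host, the pairs could be produced like this:
#   for d in da0 da6 da24 da30; do
#       echo "$d $(camcontrol inquiry "$d" -S)"
#   done | group_paths
```

Printing the commands first, rather than executing them, makes it easy to spot a device that landed in the wrong group.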

Now, to create a simple ZFS pool across all 12 LUNs:

# zpool create zfs multipath/DDN-v00 multipath/DDN-v01 multipath/DDN-v02 multipath/DDN-v03 multipath/DDN-v04 multipath/DDN-v05 multipath/DDN-v06 multipath/DDN-v07 multipath/DDN-v08 multipath/DDN-v09 multipath/DDN-v10 multipath/DDN-v11 log mfid1

# zpool status
  pool: zfs
 state: ONLINE
 scrub: none requested
config:

	NAME                 STATE     READ WRITE CKSUM
	zfs                   ONLINE       0     0     0
	  multipath/DDN-v00  ONLINE       0     0     0
	  multipath/DDN-v01  ONLINE       0     0     0
	  multipath/DDN-v02  ONLINE       0     0     0
	  multipath/DDN-v03  ONLINE       0     0     0
	  multipath/DDN-v04  ONLINE       0     0     0
	  multipath/DDN-v05  ONLINE       0     0     0
	  multipath/DDN-v06  ONLINE       0     0     0
	  multipath/DDN-v07  ONLINE       0     0     0
	  multipath/DDN-v08  ONLINE       0     0     0
	  multipath/DDN-v09  ONLINE       0     0     0
	  multipath/DDN-v10  ONLINE       0     0     0
	  multipath/DDN-v11  ONLINE       0     0     0
	logs                 ONLINE       0     0     0
	  mfid1              ONLINE       0     0     0

errors: No known data errors
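Typing twelve multipath names is another place to slip, so here is a small sketch that builds the device list instead. The helper name vdev_list is hypothetical, and it assumes the labels are numbered DDN-v00 upward with no gaps.

```shell
#!/bin/sh
# vdev_list N prints multipath/DDN-v00 ... multipath/DDN-v(N-1) on one line.
# (Hypothetical helper; assumes contiguous, zero-padded label numbers.)
vdev_list() {
    i=0
    list=""
    while [ "$i" -lt "$1" ]; do
        list="$list $(printf 'multipath/DDN-v%02d' "$i")"
        i=$((i + 1))
    done
    echo "${list# }"    # drop the leading space
}

# Print the command for review rather than running it directly:
echo "zpool create zfs $(vdev_list 12) log mfid1"
```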

Results

These results were obtained from two similar servers. The other server uses a Winchester Systems storage array and has 24GB of system memory. The Winchester storage is ~40TB of 2TB SATA disks:

Another RAID array, just for a comparison

I used IOzone for the test (iozone -a). The default iozone test uses file sizes from 64KB to 512MB, and since I’m trying to see how the server might actually react in the real world, I’m okay with this (i.e., I fully understand that a LOT of caching is taking place, and I want that for right now).

[Charts: IOzone results for Forward Re-Write, Forward Re-Read, Forward Read, Backwards Read, Random Read, Re-Read, Rec? Re-Write, Write, and Strided Read]

The S2A9900 is a pretty nice device. Although you have to use TELNET (yeesh, couldn’t they spend a few more bucks on a small ARM processor and use ssh?), the controllers have a decent command line environment with HELP pages. What is also nice is that the company provides the documentation for their products for free, and no registration is required. Good job!

As far as raw read and write speeds, that is hard to nail down. I’ve been using IOzone, and when I run it and watch ‘zpool iostat 1’, the ZFS pool stays at a constant 200MB/sec for writes. I’ve seen it pop up higher, to 250-500MB/sec, but 200MB/sec seems to be the sustained ceiling. I’ve tested with and without a dedicated log device, with and without gmultipath, and finally with the SSD RAID1 as an L2ARC cache device. All results are nearly identical. Reads are pretty crazy, though: with 64GB of system memory, reading a file runs at nearly 1GB/sec.
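Rather than eyeballing the scrolling output, the write ceiling can be pulled out of a captured run. This is a rough sketch that assumes the usual seven-column zpool iostat layout (pool, used, avail, read ops, write ops, read bandwidth, write bandwidth) with K/M/G suffixes; the function name peak_write is made up.

```shell
#!/bin/sh
# peak_write reads `zpool iostat` output on stdin and prints the highest
# write bandwidth seen, in MB/sec. (Hypothetical helper.)
# Assumption: data rows have 7 columns and column 7 is write bandwidth.
peak_write() {
    awk '
        function to_mb(v,    n, u) {
            u = substr(v, length(v))        # trailing unit letter, if any
            n = v + 0                       # numeric prefix of the value
            if (u == "K") return n / 1024
            if (u == "M") return n
            if (u == "G") return n * 1024
            return n / 1048576              # no suffix: plain bytes
        }
        NF == 7 && $7 ~ /^[0-9]/ {          # skips header and separator rows
            mb = to_mb($7)
            if (mb > max) max = mb
        }
        END { printf "%.0f\n", max }'
}

# Example on the server: sample once a second for a minute while iozone runs.
#   zpool iostat zfs 1 60 | peak_write
```

Giving zpool iostat a count (60 here) makes the pipeline finish on its own, so the peak is printed without having to interrupt it.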

Samba 3.0.28a vs 3.3.3 on FreeBSD 7.1

!!! UPDATE on 12/29/2009!!!
Since this blog post seems to get a good amount of hits from Google, if you are reading this, please see my updated post: http://www.mywushublog.com/2009/12/freebsd-8-0-a-great-nas-server/ which has some additional information about FreeBSD 8.0
EOF

Lately at work, I’ve been involved with a very large file system that is being exported from Solaris 10/ZFS to Windows and OS X users via Samba. Even with a very large Sun server (T5220), a lot of users are complaining about slow performance. I’m not going to go into details, but it has prompted me to look into what I use at home (FreeBSD + Samba) for my network storage needs and see if I can improve its performance.

Well, I was checking out the very useful What’s cooking for FreeBSD 8, unrelated to my Samba needs, when I noticed this post on Ivan Voras’ blog. Ivan has some really cool information there, and with this new knowledge I began my update from Samba 3.0.28a,2 to 3.3.3,0 (the number after the comma is the port epoch).

Hvala Ivan!

I’ve discussed my new FreeBSD environment here before; here are some quick details:

  • OS: FreeBSD 7.1
  • Intel Core 2 Duo E6750 (2.66GHz, 4MB cache)
  • Intel S975XBX2 workstation motherboard
  • AMCC 3Ware 9650SE 4 port SATA RAID controller (4x PCI-e)
    • Battery backup for the 3Ware so I can enable cached writes
  • 2GB ECC Crucial Memory Kit
  • ASUS EN6200 LE 16x PCI-e nVidia GFX card
  • 4 Western Digital 1TB Drives

The four 1TB drives form a nice 2.6TB (RAID5) array, which I use Samba to share out to my 4 other systems in the house (a mix of Windows XP and Vista; sorry, no OS X). I do a lot with this array: any work that Michele and I do, like photo editing, Word documents, media files, etc., gets saved to this volume. Needless to say, this volume is accessed A LOT, and if the house ever caught on fire, I’d save the server before the family (Hey, I could still look at their pictures and videos…).

The Old Configuration

samba 3.0.28a:

[root@server samba3]> make showconfig
===> The following configuration options are available for samba-3.0.34,1:
     LDAP=on "With LDAP support"
     ADS=on "With Active Directory support"
     CUPS=on "With CUPS printing support"
     WINBIND=on "With WinBIND support"
     ACL_SUPPORT=on "With ACL support"
     AIO_SUPPORT=on "With Asyncronous IO support"
     FAM_SUPPORT=on "With File Alteration Monitor"
     SYSLOG=on "With Syslog support"
     QUOTAS=off "With Disk quota support"
     UTMP=on "With UTMP accounting support"
     PAM_SMBPASS=off "With PAM authentication vs passdb backends"
     CLUSTER=off "With experimental cluster support"
     DNSUPDATE=off "With dynamic DNS update(require ADS)"
     EXP_MODULES=off "With experimental modules"
     POPT=on "With system-wide POPT library"
     PCH=on "With precompiled headers optimization"
     MAX_DEBUG=off "With maximum debugging"
     SMBTORTURE=off "With smbtorture"
===> Use 'make config' to modify these settings

The New Configuration

samba 3.3 config:

[root@server samba33]> make showconfig
===> The following configuration options are available for samba-3.3.3:
     LDAP=on "With LDAP support"
     ADS=off "With Active Directory support"
     CUPS=on "With CUPS printing support"
     WINBIND=on "With WinBIND support"
     SWAT=off "With SWAT WebGUI"
     ACL_SUPPORT=on "With ACL support"
     AIO_SUPPORT=on "With Asyncronous IO support"
     FAM_SUPPORT=off "With File Alteration Monitor"
     SYSLOG=off "With Syslog support"
     QUOTAS=off "With Disk quota support"
     UTMP=off "With UTMP accounting support"
     PAM_SMBPASS=on "With PAM authentication vs passdb backends"
     DNSUPDATE=off "With dynamic DNS update(require ADS)"
     DNSSD=off "With DNS service discovery support"
     EXP_MODULES=off "With experimental modules"
     SHARED_LIBS=off "With shared libraries"
     POPT=on "With system-wide POPT library"
     MAX_DEBUG=off "With maximum debugging"
     SMBTORTURE=off "With smbtorture"
===> Use 'make config' to modify these settings

If you have never dealt with FreeBSD’s Ports system, setting these compile-time options is a breeze with ‘make config’. These options are presented in an ncurses interface (which is great for ssh or other terminal-based sessions) like this:

FreeBSD's make config screen


The notable options that I chose are AIO_SUPPORT=on and ADS=off. I’ve always compiled in ADS support, thinking that I would eventually get around to configuring Samba as an Active Directory-like server. But you know, there are really only two active users here, myself and Michele, and I don’t see the benefit right now. It could also slow down Samba with the extra system calls, so again, I’m leaving it out. AIO, asynchronous I/O, is new, and it is reported to increase size (oh wait, wrong advertisement), I mean, network I/O.

smb.conf options

The performance-related options in smb.conf are here:

        socket options=SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY IPTOS_LOWDELAY
        min receivefile size=16384
        aio read size = 16384
        aio write size = 16384
        aio write behind = true

Again, I got these from Ivan’s post, plus what I’ve used for the past 8 years.

Benchmarking with IOzone

I ran a simple

iozone -Ra -b samba3.0.28.xls

for both versions of Samba. I’d create some nice 3D charts, but I really don’t know Excel well enough. So, I’ll just link these here and you can see for yourself what the substantial differences are. From some quick glances while the tests ran, I saw around 50MB/sec for Samba 3.0.28 here and there, topping out at ~60MB/sec. This was only with files of 256MB and above. Smaller files always stayed around 5-10MB/sec.

Samba 3.3.3 with AIO enabled started up fine, but iozone crashed after writing a few small 64KB files. This was a little disappointing; however, I did continue the benchmark with the new send and receive sizes.

UPDATE:
After doing a little more reading, I found out I was supposed to load the aio kernel module. After running:

$ kldload aio

I restarted Samba with the AIO options enabled, re-ran iozone, and it all worked.
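A kldload only lasts until the next reboot, so here is a minimal sketch of making the module load persistent. The helper name ensure_line is made up, and I’m assuming kldstat on your release supports the -q and -m flags; aio_load="YES" follows the usual loader.conf naming for kernel modules.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line
# already exists. (Hypothetical helper, named here for illustration.)
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On the server, as root, the steps would look roughly like this:
#   kldstat -q -m aio || kldload aio                  # load the module now if missing
#   ensure_line /boot/loader.conf 'aio_load="YES"'    # and load it at every boot
```

Because the helper checks for an exact matching line first, it is safe to run more than once without piling up duplicate entries.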

ufs2-3ware-raid5-freebsd71

samba3028

samba333