I need to share this. When I google for “Samba performance”, I never see real numbers, real configuration files, or real hardware environments. All I read are anecdotal recollections, and that is not good enough. I like numbers, and I’ll let the numbers speak for themselves:
> netstat -I em0 -w 1
            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
     90166     0   98762637      95363     0    5332847     0
     18131     0   24713156      20042     0    1123684     0
         4     0        310          1     0        178     0
         8     0        518          1     0        178     0
     10153     0   10952920      10696     0     598129     0
     92990     0  102837002      98476     0    5514994     0
     92025     0  102680574      97277     0    5439496     0
     92080     0  101799874      97403     0    5448637     0
     75348     0   90861608      80972     0    4537737     0
     90895     0  100323946      95781     0    5360948     0
     89313     0   97371154      94364     0    5278618     0
     81363     0   89229738      85861     0    4803589     0
         2     0        126          3     0        286     0
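To put the busiest one-second sample in perspective, a quick conversion of the peak input byte counter to line rate (the byte count is taken from the sample above):

```shell
# Peak one-second input from the netstat sample: 102,837,002 bytes.
# Convert bytes/s to Mbit/s to compare against gigabit line rate.
echo 102837002 | awk '{printf "%.0f Mbit/s\n", $1 * 8 / 1000000}'
```

That works out to roughly 823 Mbit/s — a healthy fraction of a gigabit link for CIFS traffic.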
I was so shocked that I had to use gstat and zpool iostat to verify the information:
dT: 1.002s  w: 1.000s  filter: da0
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   35   1476      0      0    0.0   1476 188421   23.7  100.0| da0

> zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        5.68T  4.32T      1     81   250K  10.1M
tank        5.68T  4.32T      0  1.37K      0   175M
tank        5.68T  4.32T      0  1.44K      0   184M
tank        5.68T  4.32T      0  1.44K      0   184M
tank        5.68T  4.32T      0  1.44K      0   184M
tank        5.68T  4.32T      0  1.44K      0   184M
tank        5.68T  4.32T      0  1.44K      0   184M
tank        5.68T  4.32T      0  1.44K      0   184M
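The gstat line is internally consistent, too: dividing the write throughput by the write operation rate gives the average write size going to da0 (numbers from the sample above):

```shell
# 188421 kBps across 1476 writes/s, from the gstat sample above.
echo "188421 1476" | awk '{printf "%.1f KB average write\n", $1 / $2}'
```

About 128 KB per write — which happens to be the default ZFS recordsize.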
All of this traffic went through Samba (3.3.9); there was no local work being done. Unfortunately, I hadn’t configured MRTG correctly, so it built a malformed graph while all this happened. A picture of all of this would have been nice.
The underlying storage is a SATABoy2 RAID6 array, with a simple “flat” ZFS filesystem (version 13) on top. As cheap as the SATABoys are (and come on, they have a terrible IIS web interface), they can at least keep up with the current load.
I have long felt that if you are going to use ZFS, you should let it manage the RAID rather than putting it on top of a hardware RAID controller. The hardware RAID may be faster, but ZFS’s ability to self-correct bad blocks is a great feature despite the performance setback. That said, RAID6 is pretty good in itself, and its dual parity should reduce the risk of a bad block being detrimental.
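For comparison, here is what the ZFS-managed equivalent of that dual-parity setup would look like: a raidz2 pool built from raw disks. This is only a sketch, not my setup — the device names are hypothetical, and in my case ZFS just sees the single RAID6 LUN (da0) that the SATABoy exports:

```shell
# Hypothetical: give ZFS the raw disks and let raidz2 provide dual parity,
# so checksum errors can be repaired from redundancy rather than just detected.
zpool create tank raidz2 da2 da3 da4 da5 da6 da7

# Scrub periodically so the self-healing actually gets exercised,
# and check the repair counters afterwards.
zpool scrub tank
zpool status -v tank
```

With a hardware RAID6 LUN underneath, ZFS can still detect corruption via checksums, but it has no second copy to heal from — that is the trade-off being made here.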
One thing I noticed is that Samba is not a threaded daemon: it forks a separate smbd process per client connection. When I run top(1) with -H, there are only 2-3 smbd processes, and one of them is running at around 30%. I don’t really know how well Samba scales out; this environment only has about 10 users, and I would like to see how it reacts with a couple hundred active users. For that matter, how does a native Windows server handle a couple hundred users? It may handle them a little better, but I don’t think I would enjoy watching NTFS handle a multi-terabyte volume… it would be like watching a stroke victim eat a bowl of soup. I admit I am biased, and I have no working experience with Windows as a large file server; most of the ones I have worked on are horribly limited and underpowered, and no one seems to care whether they perform well or not.
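You can watch the one-process-per-connection model directly. smbstatus ships with Samba and lists each connected client along with the PID of the smbd serving it (a sketch; the output obviously depends on who is connected at the time):

```shell
# One smbd per client: brief listing of PID, username, and client machine.
smbstatus -b

# Cross-check against the process table.
ps ax -o pid,pcpu,command | grep '[s]mbd'
```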
Machine class:   amd64
CPU Model:       Dual Core AMD Opteron(tm) Processor 285
No. of Cores:    4
Cores per CPU:
Memory information from dmidecode(8)
    Maximum Capacity: 8 GB
    Number Of Devices: 4
    Maximum Capacity: 8 GB
    Number Of Devices: 4
INFO: Run `dmidecode -t memory` to see further information.

System memory summary
    Total real memory available:   8048 MB
    Logically used memory:         2876 MB
    Logically available memory:    5172 MB

Swap information
Device          1K-blocks     Used    Avail Capacity
/dev/da1s1b       8373844      28K     8.0G     0%
Available hard drives:
    cd0: Removable CD-ROM SCSI-0 device
    cd0: 1.000MB/s transfers
    da2: Fixed Direct Access SCSI-5 device
    da2: 300.000MB/s transfers
    da2: Command Queueing enabled
    da2: 140009MB (286739329 512 byte sectors: 255H 63S/T 17848C)
    da1: Fixed Direct Access SCSI-2 device
    da1: 300.000MB/s transfers
    da1: Command Queueing enabled
    da1: 69618MB (142577664 512 byte sectors: 255H 63S/T 8875C)
    da0: Fixed Direct Access SCSI-5 device
    da0: 200.000MB/s transfers
    da0: Command Queueing enabled
    da0: 10491861MB (21487333120 512 byte sectors: 255H 63S/T 1337524C)

Raid controllers:
    umass-sim0:
    mpt0: vendor='LSI Logic (Was: Symbios Logic, NCR)' device='SAS 3000 series, 4-port with 1064 -StorPort'
    isp0: vendor='QLogic Corporation' device='QLA6322 Fibre Channel Adapter'

Currently mounted filesystems:
    /dev/da1s1a on /
    devfs on /dev
    tank on /tank
    /dev/ufs/EXPORT on /export

I/O statistics:
       tty             da0              da1              da2             cpu
  tin  tout  KB/t  tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
    0    40 63.61  167 10.36  16.53   2  0.03  61.65   0  0.00   1  0  4  0 94
INFO: Run iostat(8) or gstat(8) to see live statistics.

Disk usage:
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/da1s1a         58G    3.4G     50G     6%    /
devfs              1.0K    1.0K      0B   100%    /dev
tank               9.8T    5.7T    4.1T    58%    /tank
/dev/ufs/EXPORT    126G    148K    116G     0%    /export
FreeBSD 8.0-RELEASE-p1 FreeBSD 8.0-RELEASE-p1 amd64
samba-3.3.9 A free SMB and CIFS client and server for UNIX
Samba 3.3.9 Compile-Time Config
> make showconfig
===> The following configuration options are available for samba-3.3.9:
     LDAP=on "With LDAP support"
     ADS=on "With Active Directory support"
     CUPS=off "With CUPS printing support"
     WINBIND=on "With WinBIND support"
     SWAT=off "With SWAT WebGUI"
     ACL_SUPPORT=on "With ACL support"
     AIO_SUPPORT=on "With Asyncronous IO support"
     FAM_SUPPORT=on "With File Alteration Monitor"
     SYSLOG=on "With Syslog support"
     QUOTAS=on "With Disk quota support"
     UTMP=off "With UTMP accounting support"
     PAM_SMBPASS=on "With PAM authentication vs passdb backends"
     DNSUPDATE=off "With dynamic DNS update(require ADS)"
     DNSSD=off "With DNS service discovery support"
     EXP_MODULES=on "With experimental modules"
     POPT=on "With system-wide POPT library"
     MAX_DEBUG=off "With maximum debugging"
     SMBTORTURE=off "With smbtorture"
===> Use 'make config' to modify these settings
I enabled device polling and took the debugging options out of my custom kernel (Sanders, get it! Mmm, I’m hungry…):
> diff /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/SANDERS
33d32
< makeoptions     DEBUG=-g        # Build kernel with gdb(1) debug symbols
78c77
<
---
> options         DEVICE_POLLING
loader.conf

ispfw_load="YES"
kern.hz="2000"
aio_load="YES"
sysctl.conf

kern.coredump=0
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=32768
kern.maxfiles=65536
kern.maxfilesperproc=32768
kern.maxvnodes=800000
net.inet.tcp.delayed_ack=0
net.inet.tcp.inflight.enable=0
net.inet.tcp.path_mtu_discovery=0
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.sendspace=65536
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=65536
net.local.stream.recvspace=65536
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.mssdflt=9142
rc.conf (em0 flags)
I want to thank Zilla (see post comments) for the sysctl.conf help.
ifconfig_em0="inet xxx.xxx.xxx.xxx netmask 255.255.255.0 polling tso mtu 9194"
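One note on how the net.inet.tcp.mssdflt=9142 in sysctl.conf relates to the mtu 9194 here: it is presumably the jumbo MTU minus the 20-byte IP header, the 20-byte TCP header, and the 12-byte TCP timestamp option. A quick check of the arithmetic:

```shell
# net.inet.tcp.mssdflt (9142) appears to be the jumbo MTU minus
# the 20-byte IP header, 20-byte TCP header, and 12-byte timestamp option.
mtu=9194
echo $((mtu - 20 - 20 - 12))
```

This prints 9142, matching the sysctl value above.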
smb.conf (performance options)

min receivefile size = 131072
aio read size = 1
aio write size = 1
use sendfile = yes
lock directory = /var/run/samba/
keepalive = 300
I’m also using LDAP users and groups. I wasn’t sure whether there would be a noticeable performance difference between local users and LDAP users; there doesn’t seem to be one.
We use Active Directory, and since Quest/Vintela still won’t make a FreeBSD client for the Quest Authentication Services (a sales rep once told me, “There are just too many versions of BSD…”), I have to use the open source utilities: OpenSSL, the OpenLDAP client, and Kerberos. I don’t mind doing it, but it is always nice if you can maintain one standard process across ALL systems, and we have a lot more Linux and Solaris systems than FreeBSD. I’m the odd one.
That aside, I use the latest OpenSSL in FreeBSD 8.0, OpenLDAP 2.4.20, and the built-in version of Heimdal Kerberos.
I get similar performance from NFS; however, most desktop users are on either Windows or OS X, and CIFS seems to be the unifying network storage protocol.
One thing I have yet to really figure out is configuring Samba to use proper NT ACLs. But if you can live with UNIX-style permissions, a setup like this is pretty good at serving out lots and lots of data. Maybe that will be next.