Underwhelming Samba metadata performance

Hello all

We recently started using TrueNAS SCALE for our storage serving needs, migrating from an old Proxmox server plus a separate HPE MSA-based system. After some testing, we got nearly everything running satisfactorily.

The one remaining problem is underwhelming Samba metadata performance. To investigate, we picked some test data (real data we needed to copy around for backup purposes) and ran various tests with it. In the end, the problem is well demonstrated with a simple “dir /s” from the Windows command prompt.

The TrueNAS system has an Epyc 7302 CPU, 256 GB RAM and a 2 x 10 Gbps LACP-bonded network (plus a separate 2 x 10 Gbps for SAN usage; it offers iSCSI from another set of disks, but that isn’t relevant for Samba). The Samba shares are on 10 x 18 TB Seagate EXOS disks in a single RAIDZ2 VDEV (we also ran tests with a smaller set of disks and various NVME SSD caching schemes, but the results were within the error margin, as expected. More details below).

The test data contains 109 GB in 179591 files and 29057 folders. There are a couple of largish folders (one with 26963 files, two others with about 12 and 11 thousand; the rest are smaller), but otherwise it’s a set of relatively small files with a couple of larger ones, distributed across many folders and sub-folders.
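For anyone wanting to reproduce this kind of measurement without depending on “dir /s” or “ls” output formatting, here is a minimal Python sketch of a recursive listing that reads metadata for every entry and reports counts and wall time. The function name `walk_and_stat` is made up for illustration, and this is not the tool we used; it only approximates what a recursive directory listing does.

```python
import os
import time


def walk_and_stat(root):
    """Recursively list `root` and read metadata for every file.

    Returns (file_count, folder_count, elapsed_seconds).
    """
    files = 0
    folders = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    folders += 1
                    stack.append(entry.path)
                else:
                    # Ask for size and timestamps, as a long listing would.
                    entry.stat(follow_symlinks=False)
                    files += 1
    elapsed = time.perf_counter() - start
    return files, folders, elapsed
```

Pointing it at a mounted share (for example a mapped drive letter or a mount point) and comparing the elapsed time against the same tree on local disk should show whether the per-entry cost comes from the share or from the data itself.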

We started with TrueNAS SCALE 23.10.1 and updated to 24.04.0 a couple of days ago, hoping it would help (it didn’t). Here are some results for running “dir /s” on the test data from a Windows 11 workstation:

SCALE 23.10.1: 295 s
SCALE 24.04.0: 445 s

For comparison:

Debian 10[1]: 60 s
Proxmox[2]: 81 s
Windows 11[3]: 85 s
Local NVME: 3 s

[1] i5-6600K, 32 GB, WD 6 TB HDDs
[2] Xeon E5-2660 v4, Samba running in Debian 9 container (32 GB memory), disks in separate HPE MSA, accessed via iSCSI
[3] i9-7900X, 128 GB, NVME SSD

The computer doing dir /s was an i9-12900K running Windows 11. We also did a run on Manjaro after mounting the share, where ls -lahR took about 2500 seconds.
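Since the listing is dominated by per-entry round trips rather than throughput, it can help to look at the numbers as average time per listed entry. The sketch below only does arithmetic on the figures quoted above (179591 files plus 29057 folders, and the measured wall times); the helper names are invented for illustration.

```python
# Figures from the post: entries in the test data and "dir /s" wall times.
ENTRIES = 179591 + 29057  # files + folders

TIMINGS_S = {
    "SCALE 23.10.1": 295,
    "SCALE 24.04.0": 445,
    "Debian 10": 60,
    "Proxmox": 81,
    "Windows 11": 85,
    "Local NVME": 3,
}


def per_entry_ms(total_seconds, entries=ENTRIES):
    """Average wall time per listed entry, in milliseconds."""
    return total_seconds * 1000.0 / entries


def report(timings=TIMINGS_S, baseline="Debian 10"):
    """Return (name, ms per entry, slowdown relative to baseline) rows."""
    base = timings[baseline]
    return [(name, per_entry_ms(s), s / base) for name, s in timings.items()]


for name, ms, ratio in report():
    print(f"{name:15s} {ms:6.2f} ms/entry  {ratio:5.1f}x vs Debian 10")
```

By this measure SCALE 24.04.0 spends roughly 2 ms per entry against about 0.3 ms for the Debian 10 box, which points at per-request overhead on the server rather than disks or bandwidth.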

We are not 100 % sure where the problem lies, but it’s not the underlying file system or network throughput. In fact, for this simple test everything runs from cache and there’s no disk activity at all. Running “ls -lahR” on the test data locally on the server takes just a couple of seconds. For large file copies we can pretty much saturate a 10 Gbps network interface (we don’t really have a good test setup for more). Googling turns up plenty of people having trouble with Samba throughput, but that’s not the problem here.

Also, NFS performance is much less of a problem. For testing, we shared the same test dataset over NFS and got 148 seconds for ls -lahR. Funnily, re-sharing that NFS mount from a Linux computer over Samba and running the dir /s test from a Windows computer completed in 189 seconds.

When we first noticed the issue, we were suspicious of the ZFS cache and tried various setups with NVME SSD caching. That didn’t help, and once we actually managed to extract the ARC stats, it was obvious that the cache was working just fine. On the other hand, smbd CPU usage hovers close to 100 %. We wondered whether some bad ACL settings could be the issue, but couldn’t come up with anything that performed better. The current setup just uses “Default share parameters” for Samba and NFSv4 for the dataset. None of us are experts on Samba, ACLs or Samba ACLs in particular, so it’s possible we still failed to disable something.
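For reference, one way to check the ARC hit ratio on a Linux OpenZFS system is to read the kernel counters in /proc/spl/kstat/zfs/arcstats. The sketch below assumes that file’s usual three-column “name type data” layout and its `hits` and `misses` counters; the function names and the sample text are made up for illustration, and this is not necessarily how we extracted our numbers.

```python
def parse_arcstats(text):
    """Parse arcstats-style text into a {name: integer value} dict.

    Header lines are skipped because they do not have exactly three
    columns with a numeric third column.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def hit_ratio(stats):
    """Fraction of ARC lookups that were hits (0.0 if there were none)."""
    hits = stats.get("hits", 0)
    misses = stats.get("misses", 0)
    total = hits + misses
    return hits / total if total else 0.0


# Invented sample in the assumed layout, not real measurements.
SAMPLE = """\
13 1 0x01 2 544 1000 2000
name                            type data
hits                            4    990
misses                          4    10
"""

print(f"ARC hit ratio: {hit_ratio(parse_arcstats(SAMPLE)):.1%}")
```

On a live system the text would come from reading the file itself, for example `open("/proc/spl/kstat/zfs/arcstats").read()`. A ratio near 100 % during the listing, together with no disk activity, is what ruled out caching for us.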

We are especially mystified by what happened between 23.10.1 and 24.04.0, as the release notes hinted at better Samba performance. A couple of weeks passed between the tests, though, so there’s some possibility of that being normal variance.

We would appreciate any insights on what to try next. Although this is a production system, we are in a happy position in that we can make more shares or even add disks to test whatever anyone can come up with.

-Jukka Larja / Frozenbyte

Cyberjock covered this about 10 years ago, and @awalkerix may have some updates re: best practices since then.

I found a persistent, metadata-only L2ARC to be a very good option if an sVDEV is out of the question. You have plenty of RAM, so the ARC should not be starved by adding an L2ARC. Pretty much any fast SSD will do (it just has to be big enough to hold all the metadata). Should the SSD fail, the pool will take over. The L2ARC will take some time to fill up (i.e. get “hot”) as it notices which files it is missing. In my use case, it took 3 rsync passes for the L2ARC to get hot.

An sVDEV will offer even faster performance because all metadata is stored in the sVDEV. The downside is that once you add an sVDEV, it can no longer be removed, and if the sVDEV goes, so does the pool. So you want plenty of redundancy for your sVDEV (I use a 4-way mirror). Good luck!

Constantin, thank you for your suggestion. After your post and a few clicks through Cyberjock’s thread and Google, we ended up taking a look at the “Samba Performance Tuning” chapter from samba.org. Rather embarrassingly, that led us to notice that we had the Samba log level set to debug. After fixing that, the dir /s test completed in 75 seconds. We are running other tests, but I’m pretty confident that we found the problem.
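In case it helps someone else catch the same mistake, here is a small Python sketch that scans smb.conf-style text for `log level` (or its synonym `debuglevel`) and reports the highest level found. It assumes the usual `log level = 3 auth:10` syntax; the function name is made up, and on TrueNAS the setting is changed in the SMB service settings rather than by editing the file. If I read the Samba performance chapter correctly, levels above 2 are where the slowdown becomes significant.

```python
def max_log_level(smb_conf_text):
    """Return the highest numeric level on any `log level` line, or None."""
    highest = None
    for raw in smb_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # section header or malformed line
        # Compare parameter names ignoring case and spaces.
        if key.replace(" ", "").lower() not in ("loglevel", "debuglevel"):
            continue
        for token in value.split():
            # Tokens look like "3", "auth:10" or "auth:10@/some/file".
            level = token.rpartition(":")[2].partition("@")[0]
            if level.isdigit():
                highest = max(highest or 0, int(level))
    return highest


# Invented example configuration, not our real one.
EXAMPLE = """\
[global]
    workgroup = EXAMPLE
    log level = 3 auth:10
"""

level = max_log_level(EXAMPLE)
if level is not None and level > 2:
    print(f"log level {level} is high enough to hurt performance")
```

The text to check can be the output of `testparm -s`, which prints the configuration Samba actually loaded.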

Sorry for the false alarm, as far as TrueNAS is concerned.

-Jukka Larja / Frozenbyte