Bad SMB write/read performance with 4 drives in 2x mirror configuration (RAID10)

Hi,

I have the following HW:

  • CPU: Intel Xeon CPU E5620 @ 2.40 GHz
  • RAM: 64 GiB ECC
  • Boot drive: 256 GB Samsung SSD (SATA)
  • Data drives: 4x 20TB Seagate EXOS X20 (SATA, CMR)
  • Network: 10 Gbit/s
  • It’s a rather old machine, probably from around 2013 (IIRC it doesn’t support AVX instructions, for example)

To give some background:

  • Until last week, I had this server configured with 2x 20TB drives in STRIPE configuration, running the latest TrueNAS CORE. I had an SMB share configured and got very decent sustained read and write performance to the NAS from a Win10 machine via the 10 Gbit/s connection. Sustained write performance was around 450-500 MB/s, see the screenshot further below.
  • This week I upgraded the server to the latest TrueNAS SCALE (with a fresh ISO install) and installed two more of the same 20 TB drives, bringing the total to 4x 20TB data drives. My idea was basically to mirror my original 2x 20TB stripe, which, if I understand correctly, should give me the same write performance as before and double my read performance. From what I read online, the correct way to do this is to configure 2 VDEVs with 2x 20TB drives in each of them. TrueNAS then automatically stripes both VDEVs together in the pool, giving me the ~40TB of usable space I want while every block is mirrored on two drives. I configured an SMB share again but noticed that the read/write performance now seemed worse than before, which is not what I expected.
  • See the below screenshot showing the difference in write performance:

Some additional notes/screenshots:

  • Pool is encrypted (before and also now)
  • In TrueNAS CORE I used the following SMB aux. parameters (to force SMB3 and encryption):
server min protocol = SMB3_11
server smb encrypt = required
server signing = required
client min protocol = SMB3_11
client smb encrypt = required
client signing = required
  • In TrueNAS SCALE I pushed the same SMB parameters via CLI using the following command (because the aux. parameters field seems to have been disabled in the web UI):
cli
service smb update smb_options="server min protocol = SMB3_11\nserver smb encrypt = required\nserver signing = required\nclient min protocol = SMB3_11\nclient smb encrypt = required\nclient signing = required"

If I understand correctly, under this configuration the write speed should normally be the same as with my previous configuration and the read speed should be ~2x faster even.

Any ideas what could be causing the worse read/write performance? Is the configuration I have set up correct for what I want to achieve, and is it the best layout for that?
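In case it helps, this is roughly how the layout can be checked from a TrueNAS shell (the pool name tank below is just a placeholder for my actual pool name):

# Show the pool topology; for the intended layout there should be two
# mirror VDEVs, each containing two of the 20TB disks
zpool status tank

# Show capacity and how data is spread across the two mirror VDEVs
zpool list -v tank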

Thanks in advance for any feedback.

How’s the performance if you don’t push the smb_options?

@volts stupid question, but any idea how I can get back to the original SMB default settings to do that? I don’t know what Samba used by default for the six parameters I changed.

You are copying files from Windows using SMB, so these should be async writes, but the dataset is defined as synchronous.

You can try setting sync to disabled for the dataset and see if that makes a difference.
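If you prefer the shell, something along these lines should do it (the dataset path is just an example, use your actual pool/dataset):

# Check what the dataset currently uses (standard / always / disabled)
zfs get sync tank/share

# Temporarily disable sync writes for the test
zfs set sync=disabled tank/share

# Put it back afterwards
zfs set sync=standard tank/share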


What does this show currently?

cli
service smb config

I don’t think there’s anything set by default; I think this will set it back to defaults. I’ll be curious if anybody knows better!

service smb update smb_options=""

Transport encryption can be configured in the GUI, but I’m curious to get back to defaults to see how the performance is:

I see Standard in the screenshot?

[screenshot: dataset settings showing Sync set to Standard]
So SMB from Windows should stay as asynchronous, but there may be a bug.

I’ve quickly spun up a TrueNAS SCALE VM (to be clear: the main machine this topic is about runs on bare metal) just to check what the default SMB settings are with a fresh install. For anyone else who wants to know, these are the default SMB global parameters (as of writing this):

[global]
bind interfaces only = Yes
disable spoolss = Yes
dns proxy = No
load printers = No
logging = file
max log size = 5120
passdb backend = tdbsam:/var/run/samba-cache/private/passdb.tdb
printcap name = /dev/null
registry shares = Yes
restrict anonymous = 2
server multi channel support = No
server string = TrueNAS Server
winbind request timeout = 2
idmap config * : range = 90000001 - 100000000
fruit:nfs_aces = false
fruit:zero_file_id = false
rpc_server:mdssvc = disabled
rpc_daemon:mdssd = disabled
idmap config * : backend = tdb
create mask = 0775
directory mask = 0775

To push these back for the SMB service to use (and overwrite any old custom config), this is the command to run in the console (restart the SMB service after setting this):

cli
service smb update smb_options="bind interfaces only = Yes\ndisable spoolss = Yes\ndns proxy = No\nload printers = No\nlogging = file\nmax log size = 5120\npassdb backend = tdbsam:/var/run/samba-cache/private/passdb.tdb\nprintcap name = /dev/null\nregistry shares = Yes\nrestrict anonymous = 2\nserver multi channel support = No\nserver string = TrueNAS Server\nwinbind request timeout = 2\nidmap config * : range = 90000001 - 100000000\nfruit:nfs_aces = false\nfruit:zero_file_id = false\nrpc_server:mdssvc = disabled\nrpc_daemon:mdssd = disabled\nidmap config * : backend = tdb\ncreate mask = 0775\ndirectory mask = 0775"
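To double-check what Samba actually ended up with after the restart, I believe testparm can dump the effective configuration from a TrueNAS shell (if it complains about the config path, point it at the generated config file, e.g. /etc/smb4.conf on SCALE):

# Print the effective configuration Samba is running with
testparm -s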

So, with this set, I did another write speed test with the same files as in the screenshots above. It seems, however, that Windows (even though I rebooted both the Windows machine and the TrueNAS server) is sticky and still wants to use encryption. I’m not sure how to get it to stop using encryption, apart from forcing the opposite (no encryption) in the config.
The performance is maybe a little better (?), but still far off from what I’d expect given the sustained 450 MB/s write performance I previously saw on the striped TrueNAS CORE system.

[screenshot: write speed graph with default SMB settings]
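As far as I can tell, one way to check whether a client session is actually encrypted is to run smbstatus on the TrueNAS side while a copy is in progress (the session listing should include Protocol Version, Encryption and Signing columns):

# List current SMB sessions and their negotiated protocol/encryption/signing
smbstatus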

For a complete picture, here is also a screenshot of the current read performance from the NAS to the Windows machine. It also seems close to single-disk performance, rather than the ~4x single-disk read performance I think one would expect with my current config?
[screenshot: read speed graph]

FYI, it seems the option to set SMB encryption was removed from the web UI. I am on Dragonfish-24.04.2.2 and that setting is not there:
[screenshot: SMB service settings in Dragonfish-24.04.2.2 without an encryption option]

To clarify, that’s not the option to enable SMB encryption. It’s the option to force its use. SMB encryption has been enabled for ages in TrueNAS, but its use is left up to the client to negotiate.

The UI option was added in 24.10.

Forcing encrypted transport can have a negative effect on performance (in our internal testing this was mostly bottlenecked client-side).


Thanks for the clarification, sorry I wasn’t aware this was added in 24.10. I understand that version is not stable yet, that’s why I didn’t install it.

FYI, I’ve managed to do the write speed test from another machine which definitely didn’t use SMB encryption, but the performance is pretty much the same as what I posted in the screenshot. So that didn’t solve the issue.

I’ve temporarily disabled Sync, restarted the SMB service, and did another write test (I’m assuming I don’t need to restart the entire TrueNAS server). But the result seems unaffected; unfortunately, the issue remains.

That’s OK - it was set to Standard so SMB from Windows should have used asynchronous writes already - but we have now ruled that out.

I think we need to talk a little about how asynchronous writes work, and how to conduct meaningful benchmark tests.

How asynchronous writes work
With asynchronous writes, the data is NOT written to disk immediately and, most importantly, it is not written to disk before ZFS asks for the next block of data. So (to start off with, anyway) the write speed looks the same as the network speed.

Imagine this as a bottle with a large (network-speed) inlet pipe and a smaller (disk-speed) outlet pipe. The bottle starts empty, and you can flow water into it very fast; whilst it isn’t empty, water also flows out, but more slowly. If water flows in for a while and then stops, the bottle will eventually empty, but if you keep water flowing in, it eventually fills completely and then water can only flow in as fast as it is also flowing out. And of course, by the time it does fill up, the amount of water that has gone in is not only the size of the bottle but also the water that has already flowed out again.

So, if you are only measuring the network speed (the input flow to the bottle), but actually want to measure your disk-write speed (i.e. the outflow from the bottle) then you need to send a lot of data (water) until the memory is full (bottle is full) and only then will the network speed (water input flow) be an indicator of the disk-write speed (outflow).

Whether 30GB of data is enough will depend on how much memory is available to queue asynchronous writes and how long it takes to fill this memory given that as you try to fill it, some data is being written out to disk.

Some of your graphs show a steady performance, and some show a burst at the start and then a slow down.

Internet research says that the default memory for queuing writes is the minimum of 10% of memory and 4GB, but this is tuneable and the parameter settings in CORE and SCALE may be different.
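If you want to see what that limit actually is on the SCALE box, the OpenZFS dirty-data tunables should be readable directly (values are in bytes; on CORE the equivalents are vfs.zfs sysctls):

# Maximum amount of dirty (queued asynchronous) data allowed in memory
cat /sys/module/zfs/parameters/zfs_dirty_data_max

# The percentage-of-RAM cap the above is derived from (default 10)
cat /sys/module/zfs/parameters/zfs_dirty_data_max_percent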

Consistent benchmarks
To complicate things, if you want accurate measurements you have to start each repeated measurement from the same steady state.

Some of this is fairly obvious: make sure that nothing else is running on your network or your PC or the TrueNAS server that might compete for these resources.

But for Write & Read tests there are other things you need to do:

For write tests, you need to let TN empty out its write buffers, i.e. wait until your disks stop chuntering.
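A rough way to watch for this (assuming the pool is called tank) is to leave zpool iostat running and wait until the write column drops back to roughly zero before starting the next test:

# Print per-VDEV I/O statistics every 2 seconds
zpool iostat -v tank 2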

For read tests, you need to clear the ARC cache (which isn’t as easy as running a command) in order for TN to read the data afresh from disk rather than using what it already has in memory. Internet research suggests that you need to export and reimport a pool to clear its ARC entries!
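Something like the following should do it, though the shares briefly disappear, so stop anything using the pool first (the pool name is a placeholder; on TrueNAS it may be cleaner to do the export/import from the web UI so the middleware stays in sync):

# Export and re-import the pool to drop its cached data from the ARC
zpool export tank
zpool import tank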

Also for read tests, the size of the files matters, because ZFS tries to read ahead for large files (so you will have fewer seeks), whereas many small files will get less read-ahead and far more seeks…

Unfortunately, I haven’t time to research expected speeds for your configuration - I just know (from years of experience) that performance testing has way more unexpected variables than you might think and that results can be weird until you understand and eliminate them.


Hi, thanks for the feedback. I agree that this can be a lot more complicated than what we can see in a simple Windows SMB copy graph. I have copied much larger batches of files (TBs of data) to this NAS in both the previous configuration and the current one, and the general impression is still that the read/write speeds are worse than before. And that is while the read speeds should, in principle, have improved thanks to the two additional drives.

I am more than happy to test more things to troubleshoot this, even if that means going back to CORE and trying the same mirror setup there, or making another stripe array and doing more testing. I am happy to be corrected, but my feeling is that something is fishy here and it’s worth the effort for me to troubleshoot this.

It would indeed also be interesting if someone could comment on what speeds I should expect with this RAID10 layout and these disks. On that note, if there is reason to swap RAID10 for something better, I’m happy to consider that too. But from my research this seemed like the best choice if I want:

  • 40TB total space
  • At least one drive allowed to fail
  • Easy drive replacement in case of failure
  • Future possibility to easily expand the pool
  • Improved read performance (and the same great write performance I used to see with the simple stripe before)

Did some iperf3 tests to the NAS from an Ubuntu VM and a Windows 10 VM, both connected via 10 Gbit/s. The Win10 machine is the same one used for the file copy test screenshots shown earlier.

Not sure what is going on, but the performance doesn’t seem too great here already. The Win10 connection has no retries, but the Ubuntu connection has quite a few retries. The speed generally isn’t maxed out either, although if I increase -P to 2 or 3 (parallel streams) then I pretty much max out the whole 10 Gbit/s. Not sure if this is already a telling sign that something might be fishy here (any chance the NIC driver is different in SCALE vs CORE?).
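For reference, the tests were along the lines of the following (the server address is just a placeholder; iperf3 needs to run in server mode on the NAS first):

# On the TrueNAS box: start iperf3 in server mode
iperf3 -s

# On the client: a single stream, then 3 parallel streams
iperf3 -c 192.168.1.10
iperf3 -c 192.168.1.10 -P 3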


The performance tests you are running will only be a good indicator of the performance you can expect in reality if your normal workload is the same reading and writing of 30GB of data.

I suspect that in reality your workload will (like most people’s) be very different to this.

My own real-life usage is a mixture of the following:

  1. Most of my data is almost permanently at rest. I have TBs of video files and I watch at most a few GB per night on the TV over several hours. I need to stream these videos, and my disk read performance from a 5x HDD RAIDZ1 is easily adequate for that.

  2. I need to move new video files from my PC to TN - but this is a batch operation and so long as it finishes in a timescale measured in minutes not hours that is fine. (With a Gb LAN they typically take a minute or two.)

  3. Several forms of backup - again batch operations at night.

  4. Reading and writing small-ish files (of a few KB or MB) where the primary copy is on the NAS.

Overall my experience has been that the LAN has been the speed limitation and upgrading the key parts of my LAN from 100Mb to 1Gb was the easiest and best way to improve performance, and…

As you can see from the specs in my signature, my own NAS is low powered, and my HDDs are in RAIDZ1. Yet for files of a few GB I can read and write at Gb speeds, limited by my LAN rather than the disks (thanks to sequential read-ahead and ARC - and I get an ARC hit ratio of > 99.9% despite only having 10GB of memory!!)

(If I wanted to run VMs or saturate a 10Gb network, I would probably need a more powerful system. But I genuinely don’t have a need for these, and I am happy with the system I have. My only regret is not building a 5xHDD RAIDZ2 rather than RAIDZ1.)

That said, things are different for apps and VMs running on TN, and for those you may well need SSDs at a minimum, if not NVMe. This is particularly true for VM system drives, which are usually zvols doing random reads and synchronous writes, where performance is much more critical. (But this is NOT what you are currently measuring.)


Edit: I misread the post about VMs.

So ignore the blurred stuff and read below.

I see from your most recent comment that the Windows machine may have been a VM running under TN. If your previous Windows tests were from such a VM rather than from a separate machine across a network, that might be useful information. If so please confirm how your Windows “disks” are configured in TN so we can understand the interaction with the native reads and writes.

iperf3 is a network-only performance test - this seems to suggest that the issue is a networking one rather than a disk issue. Would you agree with that?

I think you should try to separate possible network issues from possible pool issues.

The iperf results suggest that you might have network issues, which (among other things) might mean there is a bottleneck somewhere, and it is unclear where exactly. Some Ethernet adapters reportedly have performance issues of their own which can end up limiting your network speeds. But again, we do not have enough data to say whether the issue is related to your TrueNAS install, the switch hardware, or the workstation on the other end of iperf. You might try configuring iperf with multiple streams and see what happens.

Also, in your case, note that your network speeds are still well above your actual SMB read/write results, so your network speed should not have an impact on read/write (which, as others mentioned, should be async anyway).
This kind of iperf result (assuming the Ethernet hardware is configured OK) might indicate packet size issues, for example that one side expects or sends jumbo frames and the other side needs them translated into smaller packets.
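If you want to rule out an MTU mismatch, a quick check is a do-not-fragment ping at jumbo size (addresses are placeholders; 8972 is a 9000-byte MTU minus IP/ICMP headers):

# From a Linux host towards the NAS
ping -M do -s 8972 192.168.1.10

# From a Windows host towards the NAS
ping -f -l 8972 192.168.1.10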

There are a number of posts here about measuring pool speeds on TrueNAS itself. You should try that too. Also, bear in mind that if you flood your TrueNAS with data, at some point it will try to write to the ZIL and to the data VDEVs at the same time (if the ZIL is not separated onto a completely different device), so in that kind of scenario (if I understand ZFS correctly) you might be splitting the total write speed into two streams, one to your pool and one to the ZIL, and in that case your write speeds are (roughly) within expectations. If possible, try to add a SLOG device and check what happens. It’s unclear why the same thing didn’t happen on your CORE install, except to say that CORE might be better optimized.
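If fio is available on your install, a minimal on-box sequential test could look something like this (the dataset path is a placeholder; pick a size that comfortably exceeds your RAM and the dirty-data limit so you are not just measuring cache):

# Sequential write test directly against the pool, bypassing SMB and the network
fio --name=seqwrite --directory=/mnt/tank/benchmark --rw=write --bs=1M --size=50G --group_reporting

# Sequential read test (ideally after an export/import so the ARC is cold)
fio --name=seqread --directory=/mnt/tank/benchmark --rw=read --bs=1M --size=50G --group_reporting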

P.S. I am a motivated superuser learning the platform (at best), so take that into account when evaluating these ideas and suggestions.