Weird performance using excessive hardware

OK, this one is bugging me to the point that I am making a post to see if anyone has ideas, as it really makes no sense.

At a high level, my performance is ridiculously slow relative to the hardware involved. I am only able to achieve just under 600 MB/s with any setting or configuration I can think of. I have gone as far as nuking the TrueNAS install and the pool, but got the same results.

System Overview

  • TrueNAS SCALE Version: 25.04.2.1
  • Storage Pool: 16× NVMe drives (currently in a striped vdev for testing; previously RAIDZ1)
  • Network Interfaces:
    • 25 GbE on Windows 11 test client (AMD 7950X)
    • 100 GbE on Linux test client
  • Test Clients:
  1. Windows 11 machine with 7950X and 25 GbE NIC
  2. Linux box with 100 GbE NIC

Full hardware and software details are in my forum signature.

Problem Description

Despite eliminating any vdev bottlenecks (using a full stripe), SMB throughput tops out just under 600 MB/s. This same ceiling appeared when the pool was configured as RAIDZ1, and persists across fresh OS and pool rebuilds.

Steps Taken to Troubleshoot

  1. Reinstalled TrueNAS SCALE and recreated the pool from scratch
  2. Converted the pool from RAIDZ1 to a single stripe of 16 drives
  3. Verified network performance with iperf3:
  • 25 GbE link saturates at ~24 Gb/s
  • 100 GbE link reaches ~86 Gb/s bidirectionally
  4. Ran fio on the NAS itself to isolate the storage stack: ~2,500 MB/s (appears single-thread-bound); rough command sketch after this list
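
Roughly what those checks look like, for reference (the client IP and exact flags here are placeholders, not the exact invocations; the actual fio command from a later read test is pasted further down):

iperf3 -s                              # on the NAS
iperf3 -c 192.0.2.10 -P 4 -t 30        # from each client; -P runs parallel streams
fio --name=localtest --bs=1M --size=200G --runtime=60s --readwrite=write --numjobs=1   # run from a directory on the pool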

Observed Performance

  • SMB writes (Windows → NAS): ~580 – 600 MB/s
  • SMB reads (Windows ← NAS): ~580 – 600 MB/s
  • iSCSI writes/reads (Windows): ~580 – 600 MB/s (identical to SMB)
  • iperf3 (network-only): ~24 Gb/s on the 25 GbE link and ~86 Gb/s on the 100 GbE link
  • fio (local, random/sequential tests): up to ~2,500 MB/s

I am open to any suggestions, and I have the system torn apart for testing, so I have no issues with destructive testing.

I would expect your beefy infrastructure to be more than capable of saturating those links. Between the on-NAS tests confirming high throughput (with random data, right?) and the network showing very high throughput, I wonder if there is a misconfiguration going on somewhere.

For example, is sync=always set or required on the SMB dataset? I would not think it would have a significant impact on an all-flash system… but maybe it does? Also, did you confirm that your flash drives aren't using a fast cache up front followed by slow media in the rear, which chokes under sustained loads?
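
It is a quick check either way; something like the following shows the current value and relaxes it for a test (pool/dataset is a placeholder):

zfs get sync pool/dataset            # always / standard / disabled
zfs set sync=standard pool/dataset   # default behavior, for comparison
zfs set sync=disabled pool/dataset   # test only; not safe for data you care about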

Also, is there some way of making two flows happen at once? (I.e., add a third machine to the mix, then hammer the server with two SMB sessions of incompressible data at the same time.) The reason I ask is that each SMB session is essentially served by a single thread on the server.

If a single CPU thread cannot keep up, you might see the aggregate transfer rate double (2× 600 MB/s), since everything else in the environment seems to be performant. Given your system specs, that seems a somewhat remote possibility, however.
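
A rough sketch of that idea, assuming the share is mounted at /mnt/smbtest on two separate Linux clients (mount point, size, and runtime are placeholders):

# run on each client at the same time, then add the two reported bandwidths;
# fio fills buffers with pseudo-random (effectively incompressible) data by default
fio --name=flow --directory=/mnt/smbtest --rw=write --bs=1M --size=20G --runtime=60s --numjobs=1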

Thanks for the reply. I don't think it is SMB-related, as I have also tested iSCSI and NFS and both are the same. As for single-thread overload, I tested fio and could get about 2,500 MB/s before maxing a single thread out. For all the iSCSI, SMB, and NFS tests the system is idling: no cores over 5%, and the disks are at ~30 MB/s each.

Still no closer to solving this, but I did encounter an interesting result. I have been trying to remove anything between the array and the fio test (SLOG, L2ARC, ARC) and got something interesting. When doing writes (primarycache=none) I am getting the expected ~2,500 MB/s, but when testing reads under the same conditions I am getting a much lower number: ~610 MB/s. Per-drive utilization falls down into the ~30 MB/s range, acting much like the external testing, but now on the local array. As before, the CPU is fine, with no single thread ever maxing out. Honestly this just confuses me more, as reads should, in theory, be faster than writes.
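
For reference, the cache was taken out of the picture with something along these lines (dataset name is assumed from the path in the output below):

zfs set primarycache=none Stripe/NFS     # keep data and metadata out of ARC for this test
zfs get primarycache Stripe/NFS          # confirm the setting
zfs inherit primarycache Stripe/NFS      # revert to the inherited default afterwards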

root@truenas[/mnt/Stripe/NFS]# fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=200G --runtime=60s --readwrite=read --name=testfile
testfile: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.33
Starting 1 process
testfile: Laying out IO file (1 file / 204800MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=478MiB/s][r=478 IOPS][eta 00m:00s]
testfile: (groupid=0, jobs=1): err= 0: pid=45586: Thu Aug 28 09:41:42 2025
read: IOPS=582, BW=582MiB/s (610MB/s)(34.1GiB/60030msec)
bw ( KiB/s): min=182272, max=860160, per=100.00%, avg=596106.19, stdev=226910.09, samples=120
iops : min= 178, max= 840, avg=582.01, stdev=221.63, samples=120
cpu : usr=0.15%, sys=50.57%, ctx=35086, majf=0, minf=37
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=34942,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=582MiB/s (610MB/s), 582MiB/s-582MiB/s (610MB/s-610MB/s), io=34.1GiB (36.6GB), run=60030-60030msec
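
For comparison, the same read test with more parallelism would look roughly like this (job count and queue depth are arbitrary placeholders, not something from the run above); if the aggregate scales with the job count, the ceiling is per-stream rather than per-pool:

fio --ramp_time=5 --gtod_reduce=1 --numjobs=4 --ioengine=libaio --iodepth=16 --bs=1M --size=50G --runtime=60s --readwrite=read --group_reporting --directory=/mnt/Stripe/NFS --name=parallelread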

Also, I am just now seeing a typo in the main post but I can't find a way to edit it: the 16× 16 TB drives in the array are NOT NVMe but mechanical. The SLOG, L2ARC, and all clients are pure NVMe. This is important, as an NVMe array of that size would definitely be maxing out the CPU, but mechanical is just fine.

I simply cannot figure this one out. Since my last post I went down a rabbit trail of thinking it was the HBA, so I swapped in a much older RAID card to test, but I still get the same results. Without any cache enabled the array does a minimum of 2.5 GB/s read and write, which makes me think the array is good and the HBA was fine. I have tested a few endpoints, and no matter whether it is SMB, NFS, or iSCSI, my results are always 400-600 MB/s with cache enabled or disabled. Each of the endpoints has native disk speeds in the 6-7 GB/s range, so I don't think that is the bottleneck. Additionally, I did a point-to-point connection between an endpoint and the server (at a lower speed of 10 GbE) and still got the same ~400 MB/s.

I feel like each part in isolation is working but I am missing some critical bit of information somewhere.

I’m not a specialist, but it looks like a synchronous operation.


Also, I would suggest:

  1. Directly connecting your Windows and/or Linux box to the TrueNAS machine, eliminating the possibility of switch misconfiguration.
  2. Running SMB reads or writes simultaneously on both Windows and Linux and measuring the total throughput. (Perhaps you would need the switch for this again.)

Some questions:

  1. How exactly do you measure SMB performance on Linux and Windows? (Something along the lines of the sketch below?)
  2. Is the client storage fast enough to saturate more than 600 MB/s? 600 MB/s looks suspiciously close to the max bandwidth of a SATA SSD…
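
By "measure" I mean, for example, mounting the share over CIFS on Linux and timing a large sequential transfer; server name, share, mount point, and credentials below are placeholders:

mount -t cifs //truenas/share /mnt/smbtest -o username=user,vers=3.1.1
dd if=/dev/zero of=/mnt/smbtest/testfile bs=1M count=20480 conv=fdatasync status=progress   # write test (zeros compress, so treat as an upper bound)
dd if=/mnt/smbtest/testfile of=/dev/null bs=1M status=progress                              # read test (client page cache can inflate a repeat run)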

Unfortunately the thread is a bit of a mess. To answer some of the questions already posted.

  • Direct connection has the same speed.
  • All client devices' storage exceeds 7 GB/s RW
  • Have tested SMB, NFS, and iSCSI; all have the same speed
  • Tested an alternate HBA, but performance was identical
  • When testing file transfers over the network, per-disk usage is around 20 MB/s (a rough way to watch this is sketched after this list)
  • When testing locally, per-disk usage is around 250 MB/s
  • CPU is not taxed in any way, even on a single thread
  • Multiple clients receive the same individual speed

I remain perplexed on this one. Even pulling directly from L1 (ARC) the speed is unchanged.

And what about combined speed?

Silly question, but do you happen to have a sata or sas L2ARC drive in there?

This is interesting, as I have similar caps on my system as well, with a 4-drive NVMe setup I use as an NFS share for VMs (my original thread is on the old forums).

Negative, the L2ARC is pure NVMe. All four of the striped NVMe drives are identical PCIe 4.0 disks.

If I am reading your post correctly, your speed vastly exceeds what I am getting on my setup. For local-only storage everything seems to be working, i.e., the mechanical array is in line, and if allowed to test ARC it delivers ARC speeds. But the moment it moves over the network, something breaks, badly. I just can't figure it out, as the two tests, iperf3 and fio, show no issues, which indicates that the network is good and the storage is good. My current idea is to swap the 100 GbE network card for a lower-speed card on the off chance there is a bug in TrueNAS with this card; past that I have nothing left.