Very slow SMB speed vs competitors

I’m experiencing slow SMB read speed from a Windows 11 client to TrueNAS SCALE 24.10.2 over a 10GbE connection.

I really hope someone can give me a hand here, as I’m completely lost as to what else I can change or try.

My setup:

VM: TrueNAS SCALE 24.10.2 in Proxmox (q35, VirtIO SCSI single)
8 cores of an EPYC 7402
96GB RAM to TrueNAS
2 x NVMe PM983, PCIe passthrough
1 x Mellanox ConnectX-4 Lx 25GbE, PCIe passthrough
I have a two-wide ZFS mirror of the 2 x PM983, exported as an SMB Multichannel share.

The problem:

I get ~450MB/s read speed from that share in TrueNAS.

I get ~1.15GB/s read speed from an Ubuntu-based LVM RAID-1 with Exos X18 SATA spinning disks and an NVMe read cache in front.

(see screenshots below)

TrueNAS 24.10.2 runs Samba 4.20, which is pretty modern. It recognizes RSS support and reads the interface speed correctly, but regardless of that I also tried explicit options in the CLI:

interfaces = "x.x.x.x;capabilities=RSS,speed=25....lots of zeros" via service smb update smb_options= with a subsequent smbd restart

I also tried these options (although, as I read, they are deprecated in modern Samba):

aio read size = 1 or 16 * 1024
use sendfile = yes

The ZFS mirror is fast; fio reports 6.6GB/s for sequential reads:

# fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=read --size=1G --bs=4M --numjobs=1 --time_based --runtime=30

Run status group 0 (all jobs):
   READ: bw=4365MiB/s (4577MB/s), 4365MiB/s-4365MiB/s (4577MB/s-4577MB/s), io=128GiB (137GB), run=30001-30001msec

The client-to-TrueNAS network handles 10Gbps easily; iperf3 stats:

Accepted connection from 192.168.1.16, port 60898
[ 5] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60899
[ 8] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60900
[ 10] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60901
[ 12] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60902

[ 5] 0.00-1.00 sec 299 MBytes 2.51 Gbits/sec
[ 8] 0.00-1.00 sec 299 MBytes 2.51 Gbits/sec
[ 10] 0.00-1.00 sec 292 MBytes 2.45 Gbits/sec
[ 12] 0.00-1.00 sec 292 MBytes 2.45 Gbits/sec
[SUM] 0.00-1.00 sec 1.15 GBytes 9.91 Gbits/sec
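A quick back-of-envelope on the units (my own arithmetic, not from the thread): the iperf3 SUM of 9.91 Gbits/sec converts to roughly 1.24 GBytes/sec, so the ~1.15GB/s copies from the Ubuntu NAS are already close to the wire ceiling of a 10GbE link.

```python
# Convert iperf3's SUM figure (bits/sec) to bytes/sec to see the 10GbE ceiling.
sum_gbits = 9.91                 # [SUM] line from the iperf3 run above
ceiling_gbytes = sum_gbits / 8   # 8 bits per byte
print(f"~{ceiling_gbytes:.2f} GB/s usable")  # ~1.24 GB/s; 1.15 GB/s copies are near the wire limit
```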

A super basic setup, as you can see; everything is fast and should be fast for a client.

But it is not. The max I get copying files from the SMB share is ~470MB/s:

(screenshot)

Client: Windows 11 Pro (Mellanox ConnectX-4 Lx running at 10Gbps)

SMB Multichannel is 100% enabled and in use (I can even confirm this by looking at the SMB2 negotiation packets in Wireshark: the server offers multichannel and the client accepts it).

Windows client :

PS C:\Windows\System32> Get-SmbConnection
ServerName ShareName UserName Credential Dialect NumOpens

---------- --------- -------- ---------- ------- --------
192.168.1.40 video_fast KORESH\admin KORESH\otec 3.1.1 2

PS C:\Windows\System32> Get-SmbMultichannelConnection
Server Name Selected Client IP Server IP Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- --------- --------- ---------------------- ---------------------- ------------------ -------------------
192.168.1.40 True 192.168.1.16 192.168.1.40 6 2 True False

That’s it. I tried numerous sysctl changes, but there’s no point, as iperf3 shows perfect saturation of the 10Gbps channel.

As I mentioned before, the same Windows 11 client reads 1.15GB/s from the Ubuntu-based NAS running Samba 4.15 over a single 10GbE RJ45 Marvell AQtion NIC.

(screenshot)

Tests are 100% reproducible.

I did try TrueNAS SCALE version 22 and even the 25.10 nightly, but the speed remains the same.

TrueNAS runs inside Proxmox while the Ubuntu NAS is bare metal, but I don’t think that affects anything.

fio and iperf3 run on the TrueNAS VM report excellent speeds; the NVMe disks and NIC are PCIe passthrough into the VM.

Am I missing something, or is Samba in TrueNAS somehow not properly built?

I’m thinking of compiling Samba from source myself but haven’t had time to figure out all the dev dependencies on TrueNAS.

It’s good that you’ve run local tests, but can you show how you did them?

I recommend you abandon the idea of compiling Samba from source to use on a TrueNAS server; that path leads to nothing but pain.


I did a few with different sizes:

cd into the ZFS dataset:

# fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=read --size=100M --bs=4M --numjobs=1 --time_based --runtime=60


fio_test: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 thread
Jobs: 1 (f=1): [R(1)][100.0%][r=7272MiB/s][r=1818 IOPS][eta 00m:00s]
fio_test: (groupid=0, jobs=1): err= 0: pid=92060: Sat Feb  1 13:09:39 2025
  read: IOPS=1818, BW=7275MiB/s (7628MB/s)(426GiB/60001msec)
    slat (usec): min=502, max=934, avg=548.39, stdev=14.04
    clat (usec): min=2, max=10821, avg=8247.32, stdev=90.31
     lat (usec): min=553, max=11719, avg=8795.71, stdev=93.96
    clat percentiles (usec):
     |  1.00th=[ 8094],  5.00th=[ 8160], 10.00th=[ 8160], 20.00th=[ 8225],
     | 30.00th=[ 8225], 40.00th=[ 8225], 50.00th=[ 8225], 60.00th=[ 8291],
     | 70.00th=[ 8291], 80.00th=[ 8291], 90.00th=[ 8291], 95.00th=[ 8356],
     | 99.00th=[ 8455], 99.50th=[ 8455], 99.90th=[ 8848], 99.95th=[ 8979],
     | 99.99th=[ 9372]
   bw (  MiB/s): min= 7096, max= 7376, per=100.00%, avg=7278.45, stdev=41.66, samples=119
   iops        : min= 1774, max= 1844, avg=1819.61, stdev=10.42, samples=119
  lat (usec)   : 4=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.99%, 20=0.01%
  cpu          : usr=0.28%, sys=99.67%, ctx=423, majf=0, minf=35
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=109126,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=7275MiB/s (7628MB/s), 7275MiB/s-7275MiB/s (7628MB/s-7628MB/s), io=426GiB (458GB), run=60001-60001msec
# fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=read --size=1G --bs=4M --numjobs=1 --time_based --runtime=30

fio_test: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 thread
fio_test: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=4272MiB/s][r=1068 IOPS][eta 00m:00s]
fio_test: (groupid=0, jobs=1): err= 0: pid=92083: Sat Feb  1 13:10:22 2025
  read: IOPS=1091, BW=4365MiB/s (4577MB/s)(128GiB/30001msec)
    slat (usec): min=514, max=1430, avg=914.42, stdev=104.30
    clat (usec): min=2, max=18974, avg=13741.99, stdev=1491.89
     lat (usec): min=956, max=20167, avg=14656.40, stdev=1586.53
    clat percentiles (usec):
     |  1.00th=[ 8094],  5.00th=[ 8225], 10.00th=[13960], 20.00th=[13960],
     | 30.00th=[14091], 40.00th=[14091], 50.00th=[14091], 60.00th=[14091],
     | 70.00th=[14222], 80.00th=[14222], 90.00th=[14353], 95.00th=[14353],
     | 99.00th=[14615], 99.50th=[15533], 99.90th=[17957], 99.95th=[18482],
     | 99.99th=[18744]
   bw (  MiB/s): min= 4176, max= 5216, per=100.00%, avg=4366.92, stdev=283.16, samples=59
   iops        : min= 1044, max= 1304, avg=1091.73, stdev=70.79, samples=59
  lat (usec)   : 4=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=6.34%, 20=93.65%
  cpu          : usr=0.25%, sys=99.73%, ctx=68, majf=0, minf=35
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=32737,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=4365MiB/s (4577MB/s), 4365MiB/s-4365MiB/s (4577MB/s-4577MB/s), io=128GiB (137GB), run=30001-30001msec

One more, with the size of the file that I copy from the SMB share (~70G):

# fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=read --size=70G --bs=4M --numjobs=1 --time_based --runtime=30

fio_test: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 thread
fio_test: Laying out IO file (1 file / 71680MiB)
Jobs: 1 (f=0): [f(1)][100.0%][r=2900MiB/s][r=725 IOPS][eta 00m:00s]
fio_test: (groupid=0, jobs=1): err= 0: pid=92136: Sat Feb  1 13:12:29 2025
  read: IOPS=729, BW=2919MiB/s (3061MB/s)(85.5GiB/30001msec)
    slat (usec): min=928, max=12343, avg=1365.66, stdev=259.31
    clat (usec): min=11, max=55580, avg=20539.57, stdev=2493.73
     lat (usec): min=1406, max=59484, avg=21905.24, stdev=2620.73
    clat percentiles (usec):
     |  1.00th=[17957],  5.00th=[18220], 10.00th=[18482], 20.00th=[19268],
     | 30.00th=[20055], 40.00th=[20317], 50.00th=[20579], 60.00th=[20841],
     | 70.00th=[20841], 80.00th=[21103], 90.00th=[21103], 95.00th=[21627],
     | 99.00th=[31851], 99.50th=[39584], 99.90th=[49546], 99.95th=[51643],
     | 99.99th=[55313]
   bw (  MiB/s): min= 2120, max= 3232, per=99.99%, avg=2919.19, stdev=162.27, samples=59
   iops        : min=  530, max=  808, avg=729.80, stdev=40.57, samples=59
  lat (usec)   : 20=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.02%, 20=28.05%, 50=71.84%
  lat (msec)   : 100=0.08%
  cpu          : usr=0.39%, sys=97.45%, ctx=1527, majf=0, minf=35
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=21896,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=2919MiB/s (3061MB/s), 2919MiB/s-2919MiB/s (3061MB/s-3061MB/s), io=85.5GiB (91.8GB), run=30001-30001msec

Since you are on Proxmox, I’m posting links so you can make sure it’s set up correctly and don’t lose pools, etc. Make sure everything is isolated from Proxmox.

Virtualize TrueNAS

Thanks! Here is my setup (SCALE):

(ignore the 32G of RAM; when I actively tested it was 96G, I’ve since reduced it)

1 virtualized boot disk

2 x PM983 PCIe passthrough
1 x Mellanox ConnectX 4 Lx (one port)
(the other PCIe device is an LSI controller, but I don’t use it in these tests)

It is as simple as it gets.

Again, I don’t think Proxmox is the reason for slow Samba speeds.

Running individual tests with fio (on the ZFS pool) and iperf3 shows that the VM is FAST: it can read fast and it can send/receive data over the NIC fast.

The usual Proxmox advice is to pass through an entire controller with all its disks, something like an LSI 9300 HBA. The devices (controller and boot) also have to be ‘blacklisted’ to protect them from Proxmox, since it can do ZFS too. The links cover that. Passing through individual disks is not recommended.


Your advice is sound, but he’s passing through two NVMe devices and mirroring them, so he’s not using raw disk passthrough of SAS or SATA devices.

I am not sure what the potential pitfalls of doing that are.

I’m not using SATA devices here at all.

NVMe devices are PCIe devices, and they are being passed through.

A few more things I discovered recently.

Copying from the TrueNAS SMB share mounted on another Linux server seems to get >1GB/s throughput:

mount -t cifs //192.168.1.40/stash remote -o user=xxxx

$ pv remote/big_file.dat > /dev/null
21.5GiB 0:00:19 [1.13GiB/s] [=========================================>                                                                                                                     ]  27% ETA 0:00:51

I tried to replicate the same on Windows 11 by creating a virtual disk in ImDisk; no luck, stuck at the same ~450MB/s.

It feels like there is some kind of funny stuff happening with TCP window sizes when SMB specifically is used, but I can’t prove it.
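A back-of-envelope sketch on the window theory (my own numbers; the ~0.2 ms LAN round-trip time is an assumption, not measured in this thread): if a TCP window were the cap, throughput = window / RTT, and a ~450MB/s ceiling would imply an implausibly small effective window.

```python
# If a window limit were the cap: throughput = window / RTT.
rtt_s = 0.2e-3                # assumed LAN round-trip time (~0.2 ms, not measured)
observed_bytes_s = 450e6      # the ~450 MB/s SMB read speed seen from Windows
window_bytes = observed_bytes_s * rtt_s
print(f"implied window: {window_bytes / 1024:.0f} KiB")  # ~88 KiB, far below modern defaults
```

Since iperf3 already saturates the link with ordinary TCP defaults, a window that small is unlikely, which fits the conclusion that window sizing is not the culprit.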

Because iperf3 from Windows to TrueNAS and in reverse saturates 10Gbps just fine.

TrueNAS SCALE has these sysctl settings:

I also tried dctcp congestion control, but no change. Again, I think it’s all irrelevant, as iperf3 works perfectly fine and, as shown above, TrueNAS SMB to a Linux client works fine too.

One thing that I discovered the other day:

smbd spins at 100% CPU when data is being copied to the Windows client (~450-500MB/s),

but if I copy from a Linux client it hovers around 30-40% and I get the 1.15GB/s speed.
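Plugging those htop observations into a rough efficiency comparison (my arithmetic, using the approximate figures above) shows how much more CPU the Windows copy path burns per byte:

```python
# Throughput delivered per fully-busy core, from the htop figures above.
win_mb_s, win_cpu = 450, 1.00    # Windows client: ~450 MB/s with smbd pegged at 100%
lin_mb_s, lin_cpu = 1150, 0.35   # Linux client: ~1.15 GB/s with smbd around 30-40%
ratio = (lin_mb_s / lin_cpu) / (win_mb_s / win_cpu)
print(f"Linux path moves ~{ratio:.1f}x more data per unit of CPU")
```

A gap that large points at something CPU-bound in the per-byte path (encryption, signing, or memory copies), not at the disks or the network.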

I initially thought this could be something to do with SMB encryption, but it is explicitly disabled in the TrueNAS CLI:

[nas]> service smb update smb_options="use sendfile = yes\naio read size = 1\naio write size = 1\ninterfaces=\"192.168.1.3;speed=25000000000,capability=RSS\"\nserver smb encrypt = off"
[nas]> service smb config
+-------------------+-----------------------------------------------------------+
|       smb_options | use sendfile = yes                                        |
|                   | aio read size = 1                                         |
|                   | aio write size = 1                                        |
|                   | interfaces="192.168.1.3;speed=25000000000,capability=RSS" |
|                   | server smb encrypt = off                                  |
|            bindip | 192.168.1.3                                               |
+-------------------+-----------------------------------------------------------+

OK, I finally figured out what it was, and unfortunately it’s a side effect of Proxmox. The clue came from the htop observation above, where smbd spins at 100% on a single transfer yet I get only ~450MB/s.

The TrueNAS VM was set up with the QEMU x86-64-v2-AES CPU type.

Switching this to the host CPU type fixes the slow reads. I guess it adds support for missing CPU instructions that Samba clearly uses.
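For context (general x86 facts, not specific to this thread): QEMU’s x86-64-v2-AES model exposes the x86-64-v2 feature baseline (SSE4.2, POPCNT, etc.) plus AES-NI, while host passthrough on an EPYC 7402 additionally exposes AVX/AVX2 and other newer extensions. A small sketch of checking which flags a guest actually sees:

```python
# Check which SIMD/crypto flags appear in a cpuinfo-style flags string.
def has_flags(cpuinfo_flags: str, wanted=("aes", "avx", "avx2")):
    present = set(cpuinfo_flags.split())
    return {f: f in present for f in wanted}

# Illustrative (abbreviated) flag set for an x86-64-v2-AES guest:
v2_aes = "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt cx16 aes"
print(has_flags(v2_aes))  # AES present, AVX/AVX2 absent under x86-64-v2-AES
```

On a real guest you would feed it the `flags` line from /proc/cpuinfo and compare against the same output under the host CPU type.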

I’m back at 10Gbps saturation with Samba 4.20 and TrueNAS SCALE 24.10.2.

(screenshot)

Thanks all, hopefully AI will read this thread soon.


Is this with or without ksmbd enabled (i.e., did you need that to solve the Proxmox issue)?

If you did, ksmbd support as a new feature would be something I would vote for, as enabling developer mode isn’t really a sustainable option for many of us.

This is without ksmbd, with Samba. I just needed to change the VM CPU type to host.

I gave up on TrueNAS and Linux in general.

Windows Server 2025 out of the box, fresh install, gives me 2.5-2.8GB/s straight away with RDMA.

Well, as someone who used to work on the Windows Server team (I left in 2010), I’m glad to hear that MS is finally starting to give Windows Server some love again!

Yeah, I noticed that for cards like NVIDIA’s, the drivers in TrueNAS are a bit lacking with respect to RDMA, etc.

Mellanox drivers are fine.

RDMA needs to be supported by Samba itself to work.

It will come, but not this year.
