Hi,
in short, this is my situation with my prosumer-ish setup: RAIDZ1 on SCALE, 7 wide, 1.82 TiB drives (all flash, SATA, no dedicated HBA).
- Writing to SMB share: ~1.1 Gbit/s
- Reading from SMB share: ~2.4 Gbit/s
- Interface capability on both ends, connected directly: 2.5 Gbit/s
Expectation: SMB write speed closer to interface capability
Read/write speeds show basically no variation between tests or during a transfer; it looks like a hard cap from start to finish.
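(In MB/s terms: 2.5 Gbit/s is roughly 310 MB/s on the wire, the ~1.1 Gbit/s writes are about 140 MB/s, and the ~2.4 Gbit/s reads are about 300 MB/s.)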
iperf with truenas as server, my PC as client:
Accepted connection from 192.168.0.98, port 58595
[ 5] local 192.168.0.99 port 5201 connected to 192.168.0.98 port 58596
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 138 MBytes 1.16 Gbits/sec
[ 5] 1.00-2.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 2.00-3.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 3.00-4.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 4.00-5.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 5.00-6.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 6.00-7.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 7.00-8.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 8.00-9.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 9.00-10.00 sec 139 MBytes 1.17 Gbits/sec
[ 5] 10.00-10.02 sec 2.99 MBytes 1.15 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.02 sec 1.36 GBytes 1.17 Gbits/sec receiver
iperf with my PC as server, truenas as client:
Connecting to host 192.168.0.98, port 5201
[ 5] local 192.168.0.99 port 58808 connected to 192.168.0.98 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 264 MBytes 2.21 Gbits/sec 145 503 KBytes
[ 5] 1.00-2.00 sec 284 MBytes 2.38 Gbits/sec 0 716 KBytes
[ 5] 2.00-3.00 sec 282 MBytes 2.37 Gbits/sec 0 776 KBytes
[ 5] 3.00-4.00 sec 282 MBytes 2.37 Gbits/sec 0 787 KBytes
[ 5] 4.00-5.00 sec 284 MBytes 2.38 Gbits/sec 0 790 KBytes
[ 5] 5.00-6.00 sec 282 MBytes 2.37 Gbits/sec 0 793 KBytes
[ 5] 6.00-7.00 sec 282 MBytes 2.37 Gbits/sec 0 797 KBytes
[ 5] 7.00-8.00 sec 284 MBytes 2.38 Gbits/sec 0 801 KBytes
[ 5] 8.00-9.00 sec 282 MBytes 2.37 Gbits/sec 0 803 KBytes
[ 5] 9.00-10.00 sec 282 MBytes 2.37 Gbits/sec 0 810 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.74 GBytes 2.36 Gbits/sec 145 sender
[ 5] 0.00-10.00 sec 2.74 GBytes 2.35 Gbits/sec receiver
iperf Done.
This matches my experience when doing regular file transfers.
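One thing I still want to try is the same run with parallel streams, to see whether the ~1.17 Gbit/s in the PC-to-NAS direction is a per-connection limit or a hard interface/driver cap - something like this from the PC, with iperf3 -s running on the NAS:
iperf3 -c 192.168.0.99 -P 4 -t 30
iperf3 -c 192.168.0.99 -R -t 30
(the second one just to confirm the fast direction over the exact same connection setup).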
For testing I removed all hardware between server and client, but adding my switch and the server's Intel X520 (2x SFP+ via LACP) back in changes nothing (I disconnected all other network cabling from the server).
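(For what it's worth, I assume a single SMB/TCP connection will only ever use one LACP member anyway, so I did not expect the bond to change single-client numbers; if it matters, the negotiated mode and hash policy should show up in /proc/net/bonding/<bond interface> on the server.)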
Copying within the server (from the boot pool to the storage pool, for example) seems fine, I guess (probably skewed by CPU compression? anyway…):
copy:
admin@truenas[~]$ time ( cp /home/admin/test /mnt/sata-ssd-01/mix/test3 ; sync )
( cp /home/admin/test /mnt/sata-ssd-01/mix/test3; sync; ) 0.00s user 4.24s system 37% cpu 11.162 total
admin@truenas[~]$ ls -lh /home/admin/test
-rw-r--r-- 1 admin admin 9.8G Jun 18 23:48 /home/admin/test
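If I am doing the math right, that is 9.8 GiB in ~11.2 s including the sync, i.e. roughly 0.88 GiB/s or ~7.5 Gbit/s, so a local write to the pool already beats the 2.5 Gbit/s link by a wide margin.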
fio (no idea what I'm doing - any help is appreciated):
admin@truenas[/mnt/sata-ssd-01/mix]$ fio --filename=testthrough --direct=1 --rw=randrw --randrepeat=0 --rwmixread=100 --iodepth=128 --numjobs=12 --runtime=60 --group_reporting --name=4ktest --ioengine=psync --size=4G --bs=1MB
4ktest: (g=0): rw=randrw, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=128
...
fio-3.33
Starting 12 processes
4ktest: Laying out IO file (1 file / 4096MiB)
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1 (printed once per job, 12x)
Jobs: 12 (f=12): [r(12)][-.-%][r=21.9GiB/s][r=22.5k IOPS][eta 00m:00s]
4ktest: (groupid=0, jobs=12): err= 0: pid=57402: Sat Jun 22 13:54:40 2024
read: IOPS=22.5k, BW=22.0GiB/s (23.6GB/s)(48.0GiB/2184msec)
clat (usec): min=66, max=20228, avg=522.09, stdev=381.78
lat (usec): min=66, max=20228, avg=522.18, stdev=381.78
clat percentiles (usec):
| 1.00th=[ 212], 5.00th=[ 289], 10.00th=[ 318], 20.00th=[ 334],
| 30.00th=[ 351], 40.00th=[ 388], 50.00th=[ 594], 60.00th=[ 619],
| 70.00th=[ 627], 80.00th=[ 635], 90.00th=[ 652], 95.00th=[ 668],
| 99.00th=[ 807], 99.50th=[ 2409], 99.90th=[ 5997], 99.95th=[ 8291],
| 99.99th=[ 9634]
bw ( MiB/s): min=21502, max=23978, per=100.00%, avg=22612.05, stdev=90.23, samples=48
iops : min=21501, max=23975, avg=22610.00, stdev=90.19, samples=48
lat (usec) : 100=0.05%, 250=3.45%, 500=41.31%, 750=54.01%, 1000=0.35%
lat (msec) : 2=0.30%, 4=0.15%, 10=0.39%, 20=0.01%, 50=0.01%
cpu : usr=1.03%, sys=93.71%, ctx=1708, majf=4, minf=111
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=49152,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
READ: bw=22.0GiB/s (23.6GB/s), 22.0GiB/s-22.0GiB/s (23.6GB/s-23.6GB/s), io=48.0GiB (51.5GB), run=2184-2184msec
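Looking at that run again, it is read-only (rwmixread=100) and all twelve jobs hammer the same 4 GiB file, so the 22 GiB/s is presumably coming out of ARC rather than off the disks. If a local write test is more meaningful here, I would try something along these lines (parameters are just my guess, happy to be corrected):
fio --name=seqwrite --directory=/mnt/sata-ssd-01/mix --rw=write --bs=1M --size=8G --numjobs=4 --ioengine=psync --end_fsync=1 --group_reporting
and look at the WRITE bw= line; end_fsync should keep the result from being pure RAM caching.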
I went through the Microsoft recommendations on SMB performance and, apart from SMB Multichannel (which I cannot enable with my client hardware as far as I can tell), none of the suggestions changed anything.
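(One more thing on my list: checking with sudo smbstatus on the server whether the sessions actually negotiate signing or encryption, since as far as I know that shows up as Signing/Encryption columns per connection and could cost CPU during transfers.)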
I wondered if the onboard SATA controller is just not up to the task, since I do not use an HBA and there have been some worrying reports on Reddit (albeit about a much higher cap), but internal copies seem fine.
I wondered if any of the network hardware is just garbage, but I think I ruled that out on the server side by testing all of its interfaces. I do not know what to do about the probably less-than-ideal Realtek NIC on the client side.
It does not look like a caching issue to me - test files have been ~10 GB, free RAM is available, the disks should be faster than 1 Gbit/s on all sides, and the transfer speed is pretty constant.
It does not really look like a network issue to me - reads seem fine.
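What I have not checked yet is whether a single core gets pegged on the server during a write (as far as I understand smbd is largely one process per client), so I plan to watch it during a transfer with something like:
top -H -p $(pgrep -d, smbd)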
Questions:
- Is my expectation just unrealistic? Am I wrong to expect >2 Gbit/s write performance? Is this just ZFS being ZFS (i.e. not optimized for performance)?
- How can I find out if this is a network/client/server/SMB/ZFS issue?
I am feeling a little bit lost. Any help or directions would be appreciated!
Hardware and TrueNAS config:
Client:
Intel Core i5-13600K
ASUS TUF GAMING B760M-PLUS D4
2x 16 GB DDR4 RAM
WD Red SN700 4 TB
Server:
TrueNAS Scale Dragonfish-24.04.1.1
Intel Core i5-12500
ASUS Pro WS W680-ACE
2x 32 GB ECC RAM
Samsung SSD 980 PRO 500GB
8x WD_Red_SA500_2.5_2TB
Config:
- Storage:
Data VDEVs
1 x RAIDZ1 | 7 wide | 1.82 TiB
Metadata VDEVs
VDEVs not assigned
Log VDEVs
VDEVs not assigned
Cache VDEVs
VDEVs not assigned
Spare VDEVs
1 x 1.82 TiB
Dedup VDEVs
VDEVs not assigned
- 1 dataset with a child SMB share (Dataset preset: Generic, child dataset preset: SMB, Purpose: default share parameters)
- Network:
--- testing: 2.5 Gbit/s interface of server connected directly to 2.5 Gbit/s interface of client, static IPs
--- production: switch TP-Link SG3210X (2x SFP+ 10 Gbit/s, 8x RJ45 2.5 Gbit/s) connected to the Intel X520 on the server (LACP)
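In case the dataset properties are relevant, I would pull them with something like (assuming sata-ssd-01/mix is the dataset backing the share, which is where I ran the tests):
zfs get recordsize,compression,sync,atime,xattr sata-ssd-01/mix
and can post the output if that helps.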