Hello,
I just set up my first NAS using TrueNAS, and the performance is below my expectations. I’d like to ask about potential problems, as online searches didn’t give me a clear answer.
Environment
The following is my environment:
- CPU: E5-2686 v4
- Memory: 64GB ECC DDR4
- TrueNAS version: 25.10.1
- 2.5Gb Ethernet connection.
For storage, I have five 14TB Seagate dual-actuator drives (Mach.2). Because the interface is SATA, I can’t create the pool in the GUI if I want to utilize the full performance (the first actuator serves the first half of the drive’s LBA range and the second actuator serves the second half, so I need to use partitions to build the vdevs). I used the script mentioned here to partition each drive into two equal parts and created the pool on the command line.
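For reference, the partitioning step boils down to something like this (a minimal sketch, not the exact script I used; it only prints the parted commands so they can be reviewed before anything touches a disk):

```shell
#!/bin/sh
# Sketch of the partitioning step (not the exact script I used).
# Each drive gets two equal GPT partitions: the first half of the LBA
# range is served by actuator 0, the second half by actuator 1.
# The function only PRINTS the parted commands for review.
plan_partitions() {
    for disk in "$@"; do
        echo "parted -s $disk mklabel gpt mkpart act0 0% 50% mkpart act1 50% 100%"
    done
}

plan_partitions /dev/disk/by-id/ata-ST14000NM0121_ZKL2QA73 \
                /dev/disk/by-id/ata-ST14000NM0121_ZKL2QVNV
```

Piping the output through `sh` (after checking it) would apply the layout to each listed drive.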
My pool structure looks like this:
zpool create zp0 \
raidz2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QA73-part1 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QVNV-part1 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QZXM-part1 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2R3HC-part1 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2RJ1V-part1 \
raidz2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QA73-part2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QVNV-part2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2QZXM-part2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2R3HC-part2 \
/dev/disk/by-id/ata-ST14000NM0121_ZKL2RJ1V-part2 \
cache \
/dev/disk/by-id/nvme-HUSMR7638BDP3Y1_SDM000079040-part1
Basically, the first partition of each drive goes into the first raidz2 vdev, and the second partition of each drive goes into the second raidz2 vdev. There’s also an SSD L2ARC (256GB).
The idea is that the first vdev only touches the first actuator of each drive and the second vdev only touches the second, so the two vdevs should be able to stream in parallel and reach higher performance.
Theoretical performance
According to this article, the theoretical performance of a single raidz vdev can be calculated as follows:
N-wide RAIDZ, parity level p:
- Read IOPS: Read IOPS of single drive
- Write IOPS: Write IOPS of single drive
- Streaming read speed: (N – p) * Streaming read speed of single drive
- Streaming write speed: (N – p) * Streaming write speed of single drive
- Storage space efficiency: (N – p)/N
- Fault tolerance: p disks per vdev (1 for Z1, 2 for Z2, 3 for Z3)
Adding vdevs increases both IOPS and streaming read/write speed. Since my application mostly depends on streaming read/write speed, I’ll focus only on that part.
I have verified with fio that each independent partition of the Mach.2 drive can reach a streaming read/write speed of 200~250MB/s. I’ll use the lower end of 200MB/s to simplify the calculation.
For each raidz2 vdev, the theoretical streaming speed should be (5 - 2) * 200 = 600MB/s, and since there are two vdevs, the overall streaming speed should be 600 * 2 = 1200MB/s.
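The same arithmetic as a quick shell sanity check (the per-drive speed is the 200MB/s lower bound measured above):

```shell
#!/bin/sh
# Theoretical streaming speed of an N-wide raidz vdev with parity p:
# (N - p) * per-drive streaming speed, summed over all vdevs.
N=5; p=2; drive_mbps=200; vdevs=2
per_vdev=$(( (N - p) * drive_mbps ))   # MB/s per raidz2 vdev
pool=$(( per_vdev * vdevs ))           # MB/s for the whole pool
echo "per-vdev: ${per_vdev} MB/s, pool: ${pool} MB/s"
```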
P.S. The fio parameters I used for testing:
[global]
direct=1
time_based=1
runtime=30
ramp_time=3
thread=1
group_reporting=0
ioengine=io_uring
[seqwrite_act0]
rw=write
bs=4m
iodepth=2
size=50%
[seqwrite_act1]
rw=write
bs=4m
iodepth=2
offset=50%
size=50%
[seqread_act0]
stonewall=1
rw=read
bs=4m
iodepth=2
size=50%
[seqread_act1]
rw=read
bs=4m
iodepth=2
offset=50%
size=50%
Testing
SMB
I set up an SMB share on a dataset with record size = 4M; everything else is default.
Sending a single large file reaches ~285MB/s, which is pretty much the upper limit of a 2.5Gb network (to rule out the source drive as a bottleneck, the files are sent from an SSD).
However, sending a folder of RAW (~20MB) and JPG (~5-10MB) files only reaches around 220-230MB/s.
This speed is much lower than the theoretical prediction, so I started to investigate further.
TN-Bench
I read this post and tried out the benchmarking tool (it’s designed really well). Here are my results:
############################################################
# Testing Pool: zp0 #
############################################################
* Creating test dataset for pool: zp0
✓ Dataset zp0/tn-bench created successfully.
============================================================
Space Verification
============================================================
* Available space: 38435.71 GiB
* Space required: 720.00 GiB (20 GiB/thread × 36 threads)
✓ Sufficient space available - proceeding with benchmarks
============================================================
Testing Pool: zp0 - Threads: 1
============================================================
* Running DD write benchmark with 1 threads...
* Run 1 write speed: 288.37 MB/s
✓ Average write speed: 288.37 MB/s
* Running DD read benchmark with 1 threads...
* Run 1 read speed: 6916.52 MB/s
✓ Average read speed: 6916.52 MB/s
============================================================
Testing Pool: zp0 - Threads: 9
============================================================
* Running DD write benchmark with 9 threads...
* Run 1 write speed: 651.72 MB/s
✓ Average write speed: 651.72 MB/s
* Running DD read benchmark with 9 threads...
* Run 1 read speed: 2094.20 MB/s
✓ Average read speed: 2094.20 MB/s
============================================================
Testing Pool: zp0 - Threads: 18
============================================================
* Running DD write benchmark with 18 threads...
* Run 1 write speed: 618.07 MB/s
✓ Average write speed: 618.07 MB/s
* Running DD read benchmark with 18 threads...
* Run 1 read speed: 2142.09 MB/s
✓ Average read speed: 2142.09 MB/s
Note that, since I didn’t set zfs_arc_max to 1 to effectively disable the ARC as instructed by TN-Bench, the read speeds are inflated by caching and should be ignored.
As you can see, the maximum write speed occurs at 9 threads, but it is still only about 650MB/s.
Also, in the single-thread run the write speed drops to ~290MB/s, and I don’t know why that is happening.
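One diagnostic I’d run while a benchmark is in flight (I haven’t captured this yet) is per-vdev throughput, to check whether both raidz2 vdevs, i.e. both actuators, are actually being written in parallel:

```shell
# Print per-vdev and per-partition throughput for zp0 every 5 seconds;
# if the two actuators really work in parallel, both raidz2 vdevs
# should show similar write bandwidth during a streaming write.
zpool iostat -v zp0 5
```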
fio
The configuration:
[global]
directory=/mnt/zp0/media
filename=fio_testfile
size=50g
time_based=1
runtime=30
ramp_time=3
ioengine=io_uring
direct=1
thread=1
numjobs=1
stonewall=1
[seqwrite_4m_q8]
rw=write
bs=4m
iodepth=8
[seqread_4m_q8]
rw=read
bs=4m
iodepth=8
[randwrite_4k_q16]
rw=randwrite
bs=4k
iodepth=16
[randread_4k_q16]
rw=randread
bs=4k
iodepth=16
The result:
seqwrite_4m_q8: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=io_uring, iodepth=8
seqread_4m_q8: (g=1): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=io_uring, iodepth=8
randwrite_4k_q16: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
randread_4k_q16: (g=3): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
fio-3.33
Starting 4 threads
Jobs: 1 (f=1): [_(3),r(1)][59.6%][r=2932KiB/s][r=733 IOPS][eta 01m:30s]
seqwrite_4m_q8: (groupid=0, jobs=1): err= 0: pid=268715: Sun Jan 25 20:43:00 2026
write: IOPS=55, BW=224MiB/s (235MB/s)(6756MiB/30132msec); 0 zone resets
slat (usec): min=36, max=192, avg=103.81, stdev=23.43
clat (msec): min=132, max=291, avg=142.82, stdev=21.28
lat (msec): min=133, max=291, avg=142.92, stdev=21.28
clat percentiles (msec):
| 1.00th=[ 134], 5.00th=[ 134], 10.00th=[ 136], 20.00th=[ 136],
| 30.00th=[ 136], 40.00th=[ 136], 50.00th=[ 136], 60.00th=[ 138],
| 70.00th=[ 138], 80.00th=[ 142], 90.00th=[ 150], 95.00th=[ 190],
| 99.00th=[ 253], 99.50th=[ 257], 99.90th=[ 275], 99.95th=[ 292],
| 99.99th=[ 292]
bw ( KiB/s): min=147456, max=246252, per=99.99%, avg=229572.70, stdev=21295.49, samples=60
iops : min= 36, max= 60, avg=56.03, stdev= 5.20, samples=60
lat (msec) : 250=99.35%, 500=1.07%
cpu : usr=0.62%, sys=0.02%, ctx=1689, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1682,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
seqread_4m_q8: (groupid=1, jobs=1): err= 0: pid=268772: Sun Jan 25 20:43:00 2026
read: IOPS=703, BW=2814MiB/s (2951MB/s)(82.5GiB/30015msec)
slat (nsec): min=1025, max=50005, avg=2990.44, stdev=3541.67
clat (usec): min=248, max=423912, avg=11371.81, stdev=19383.19
lat (usec): min=250, max=423933, avg=11374.80, stdev=19384.41
clat percentiles (usec):
| 1.00th=[ 255], 5.00th=[ 273], 10.00th=[ 281], 20.00th=[ 285],
| 30.00th=[ 293], 40.00th=[ 310], 50.00th=[ 482], 60.00th=[ 2409],
| 70.00th=[ 14353], 80.00th=[ 22938], 90.00th=[ 35390], 95.00th=[ 42730],
| 99.00th=[ 81265], 99.50th=[109577], 99.90th=[173016], 99.95th=[206570],
| 99.99th=[299893]
bw ( MiB/s): min= 737, max=12776, per=100.00%, avg=2815.51, stdev=3316.43, samples=60
iops : min= 184, max= 3194, avg=703.80, stdev=829.09, samples=60
lat (usec) : 250=0.13%, 500=50.04%, 750=3.17%, 1000=0.34%
lat (msec) : 2=5.02%, 4=2.16%, 10=3.69%, 20=10.68%, 50=21.35%
lat (msec) : 100=2.83%, 250=0.58%, 500=0.04%
cpu : usr=0.32%, sys=0.40%, ctx=20902, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=21112,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
randwrite_4k_q16: (groupid=2, jobs=1): err= 0: pid=268858: Sun Jan 25 20:43:00 2026
write: IOPS=97, BW=391KiB/s (400kB/s)(11.5MiB/30129msec); 0 zone resets
slat (nsec): min=1234, max=41558, avg=4846.25, stdev=3295.09
clat (msec): min=40, max=821, avg=164.03, stdev=87.42
lat (msec): min=40, max=821, avg=164.04, stdev=87.42
clat percentiles (msec):
| 1.00th=[ 72], 5.00th=[ 88], 10.00th=[ 99], 20.00th=[ 110],
| 30.00th=[ 121], 40.00th=[ 129], 50.00th=[ 138], 60.00th=[ 148],
| 70.00th=[ 163], 80.00th=[ 192], 90.00th=[ 284], 95.00th=[ 342],
| 99.00th=[ 535], 99.50th=[ 676], 99.90th=[ 785], 99.95th=[ 785],
| 99.99th=[ 818]
bw ( KiB/s): min= 104, max= 584, per=99.75%, avg=390.60, stdev=132.44, samples=60
iops : min= 26, max= 146, avg=97.63, stdev=33.09, samples=60
lat (msec) : 50=0.07%, 100=11.67%, 250=75.39%, 500=12.32%, 750=0.96%
lat (msec) : 1000=0.10%
cpu : usr=0.08%, sys=0.14%, ctx=4133, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2930,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
randread_4k_q16: (groupid=3, jobs=1): err= 0: pid=269031: Sun Jan 25 20:43:00 2026
read: IOPS=707, BW=2831KiB/s (2899kB/s)(83.1MiB/30074msec)
slat (nsec): min=1081, max=22709, avg=2397.51, stdev=1599.04
clat (usec): min=6, max=501254, avg=22636.54, stdev=37739.07
lat (usec): min=7, max=501256, avg=22638.93, stdev=37739.08
clat percentiles (usec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 15],
| 30.00th=[ 18], 40.00th=[ 22], 50.00th=[ 24], 60.00th=[ 15795],
| 70.00th=[ 28443], 80.00th=[ 42730], 90.00th=[ 67634], 95.00th=[ 94897],
| 99.00th=[168821], 99.50th=[204473], 99.90th=[295699], 99.95th=[350225],
| 99.99th=[413139]
bw ( KiB/s): min= 1288, max= 3424, per=100.00%, avg=2836.90, stdev=326.69, samples=60
iops : min= 322, max= 856, avg=709.17, stdev=81.64, samples=60
lat (usec) : 10=0.10%, 20=34.92%, 50=20.70%, 100=0.21%, 250=0.04%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.02%
lat (msec) : 2=0.12%, 4=0.18%, 10=1.05%, 20=5.55%, 50=21.00%
lat (msec) : 100=11.73%, 250=4.18%, 500=0.24%, 750=0.01%
cpu : usr=0.28%, sys=0.36%, ctx=21123, majf=0, minf=0
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=21271,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=224MiB/s (235MB/s), 224MiB/s-224MiB/s (235MB/s-235MB/s), io=6756MiB (7084MB), run=30132-30132msec
Run status group 1 (all jobs):
READ: bw=2814MiB/s (2951MB/s), 2814MiB/s-2814MiB/s (2951MB/s-2951MB/s), io=82.5GiB (88.6GB), run=30015-30015msec
Run status group 2 (all jobs):
WRITE: bw=391KiB/s (400kB/s), 391KiB/s-391KiB/s (400kB/s-400kB/s), io=11.5MiB (12.1MB), run=30129-30129msec
Run status group 3 (all jobs):
READ: bw=2831KiB/s (2899kB/s), 2831KiB/s-2831KiB/s (2899kB/s-2899kB/s), io=83.1MiB (87.2MB), run=30074-30074msec
The result is only 235MB/s for streaming write. My understanding is that ioengine=io_uring with a larger iodepth such as 8 should allow more concurrency, yet the result is even worse than the single-thread performance measured with TN-Bench.
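For what it’s worth, a fio job that mimics TN-Bench’s multi-threaded dd pattern, several threads each streaming to its own file, might be a fairer comparison (a sketch I haven’t run yet; with no filename= set, fio creates one file per job under directory=):

```ini
[global]
directory=/mnt/zp0/media
size=10g
time_based=1
runtime=30
ramp_time=3
ioengine=io_uring
direct=1
thread=1

[seqwrite_4m_9jobs]
rw=write
bs=4m
iodepth=1
numjobs=9
group_reporting=1
```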
My Questions
- Why is the streaming write speed significantly lower than the theoretical value? Did I set up the pool the wrong way? Or are there any other bottlenecks in my system, like poor CPU single-core performance?
- In TN-Bench, why does running dd in multiple threads increase the performance? I thought that, for writing a single large file, this shouldn’t matter that much.
- If concurrency is needed to reach a better streaming write speed, why is a larger iodepth not helpful when testing with fio? (I tested iodepth=1 and the result is 233MB/s.)
Thank you very much for reading this! Any suggestions or recommendations of resources/articles are helpful.