Another Slow NVMe Write Speed Thread (Slower than SATA HDDs)

Relatively new to TrueNAS, so please pardon any oversights. I’ve read a number of threads regarding slow write speeds on NVMe drives, but most of them seemed to involve networking. The issue I’m facing is directly on TrueNAS Community Edition (25.04.0) itself, verified with fio.

First, a bit about the setup:

  • CPU: Intel i7-8700K
  • Motherboard: ASRock Z370 Extreme4
  • RAM: 4 * Corsair Vengeance LPX 16GB DDR4 3200MHz
  • NVMe: 2 * 500GB M.2 NVMe drives (populated in the motherboard M.2 slots)
  • HDD: 6 * Seagate Exos X18 16TB 7.2K RPM SATA 6Gb/s 3.5in hard drives

The 6 SATA HDDs are plugged into an LSI Logic 9305-16i 16-port SAS 12Gb/s PCIe 3.0 host bus adapter in the PCIe x16 slot (where a graphics card would typically reside).

All SATA ports on the motherboard are unpopulated:

If an M.2 drive is installed in M.2_1, the SATA ports 3_0 and 3_1 are disabled, and if an M.2 drive is installed in M.2_2, the SATA ports 3_4 and 3_5 are disabled.

TrueNAS is natively installed (not virtualized).

The 2 NVMe drives are:

  • Samsung 960 EVO 500GB (PCIe 3.0 x4, NVMe)
    • Expected Performance:
      • Sequential Read: Up to ~3200 MB/s
      • Sequential Write: Up to ~1800 MB/s
      • Random Read/Write: ~330K/300K IOPS
  • Crucial P3 Plus 500GB (PCIe 4.0 x4, NVMe):
    • Expected Performance:
      • Sequential Read: Up to ~5000 MB/s
      • Sequential Write: Up to ~3600 MB/s
      • Random Read/Write: ~650K/800K IOPS

There are 2 pools: boot-pool and nfs. The boot-pool is a mirror comprised of the 2 500GB NVMe drives:

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 24 03:45:05 2025
config:

	NAME           STATE     READ WRITE CKSUM
	boot-pool      ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    nvme1n1p3  ONLINE       0     0     0
	    nvme0n1p3  ONLINE       0     0     0

The second pool, nfs, is a raidz2 comprised of all 6 Seagate drives mentioned above.

Here are the results of an fio test on the NVMe pool, with read and write speeds of 440MB/s:

root@truenas[~]# fio --bs=128k --direct=1 --directory=/mnt/nfs/nvme/downloads/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based 
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.33
Starting 16 processes
Jobs: 16 (f=16): [m(16)][100.0%][r=222MiB/s,w=223MiB/s][r=1772,w=1784 IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=16): err= 0: pid=1205172: Sun Jun 29 13:27:42 2025
  read: IOPS=3351, BW=419MiB/s (440MB/s)(24.7GiB/60320msec)
   bw (  KiB/s): min=162577, max=1007210, per=100.00%, avg=430880.11, stdev=10956.70, samples=1920
   iops        : min= 1269, max= 7865, avg=3364.80, stdev=85.62, samples=1920
  write: IOPS=3354, BW=420MiB/s (440MB/s)(24.7GiB/60320msec); 0 zone resets
   bw (  KiB/s): min=213446, max=1017983, per=100.00%, avg=431312.76, stdev=10284.48, samples=1920
   iops        : min= 1665, max= 7947, avg=3368.17, stdev=80.37, samples=1920
  cpu          : usr=0.41%, sys=0.13%, ctx=90940, majf=0, minf=585
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=72.9%, 32=26.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=3.7%, 16=2.0%, 32=0.3%, 64=0.0%, >=64=0.0%
     issued rwts: total=202154,202367,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=419MiB/s (440MB/s), 419MiB/s-419MiB/s (440MB/s-440MB/s), io=24.7GiB (26.5GB), run=60320-60320msec
  WRITE: bw=420MiB/s (440MB/s), 420MiB/s-420MiB/s (440MB/s-440MB/s), io=24.7GiB (26.6GB), run=60320-60320msec

Confirmation that /mnt/nfs/nvme/downloads/ is in fact on the NVMe drives:

# zfs list boot-pool/nvme/downloads
NAME                       USED  AVAIL  REFER  MOUNTPOINT
boot-pool/nvme/downloads  16.4M   400G  16.4M  /mnt/nfs/nvme/downloads

Here’s the output of the same test on the raidz2 HDD pool (1000MB/s ??):

root@truenas[~]# fio --bs=128k --direct=1 --directory=/mnt/nfs/userdata/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based 
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.33
Starting 16 processes
Jobs: 16 (f=16): [m(16)][100.0%][r=1203MiB/s,w=1209MiB/s][r=9622,w=9674 IOPS][eta 00m:00s] 
randrw: (groupid=0, jobs=16): err= 0: pid=1214483: Sun Jun 29 13:30:16 2025
  read: IOPS=7681, BW=961MiB/s (1007MB/s)(56.4GiB/60114msec)
   bw (  KiB/s): min=379745, max=1717750, per=100.00%, avg=988377.85, stdev=17970.56, samples=1904
   iops        : min= 2966, max=13418, avg=7720.12, stdev=140.40, samples=1904
  write: IOPS=7666, BW=959MiB/s (1005MB/s)(56.3GiB/60114msec); 0 zone resets
   bw (  KiB/s): min=420359, max=1683082, per=100.00%, avg=986025.17, stdev=17259.12, samples=1904
   iops        : min= 3282, max=13148, avg=7701.59, stdev=134.85, samples=1904
  cpu          : usr=0.72%, sys=0.20%, ctx=225568, majf=0, minf=584
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=3.5%, 16=72.6%, 32=23.7%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.5%, 8=2.7%, 16=2.0%, 32=0.8%, 64=0.0%, >=64=0.0%
     issued rwts: total=461770,460882,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=961MiB/s (1007MB/s), 961MiB/s-961MiB/s (1007MB/s-1007MB/s), io=56.4GiB (60.6GB), run=60114-60114msec
  WRITE: bw=959MiB/s (1005MB/s), 959MiB/s-959MiB/s (1005MB/s-1005MB/s), io=56.3GiB (60.4GB), run=60114-60114msec

Needless to say, ~440MB/s seems abysmally slow for NVMe drives; I would expect closer to ~1200-1800MB/s given that the 960 EVO is the laggard of the two.

I’ve turned compression and atime off on boot-pool/nvme/downloads and set sync to standard. As far as I can tell the i7-8700K is barely breaking a sweat, so a CPU bottleneck seems unlikely, especially considering the HDDs are supposedly reaching 1000MB/s?
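
For reference, the property changes were roughly along these lines (just a sketch; sync was simply confirmed as standard):

sudo zfs set compression=off boot-pool/nvme/downloads
sudo zfs set atime=off boot-pool/nvme/downloads
sudo zfs get compression,atime,sync,recordsize boot-pool/nvme/downloads   # verify the settings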

Here’s another fio test on the NVMe drives with different parameters that shows 5092MB/s reads and 302MB/s writes:

root@truenas[~]# cat nvme-seq.fio 
[global]
name=nvme-seq-test
time_based
ramp_time=5
runtime=30
ioengine=libaio
direct=1
bs=1M
iodepth=64
group_reporting=1
size=10G

[read-test]
readwrite=read
filename=/mnt/nfs/nvme/downloads/test.img
numjobs=1

[write-test]
readwrite=write
filename=/mnt/nfs/nvme/downloads/test.img
numjobs=1
root@truenas[~]# fio nvme-seq.fio 
read-test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
write-test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
fio-3.33
Starting 2 processes
read-test: Laying out IO file (1 file / 10240MiB)
Jobs: 2 (f=1): [f(1),W(1)][100.0%][r=5949MiB/s,w=229MiB/s][r=5949,w=229 IOPS][eta 00m:00s]
read-test: (groupid=0, jobs=2): err= 0: pid=1639560: Sun Jun 29 15:23:41 2025
  read: IOPS=4853, BW=4856MiB/s (5092MB/s)(142GiB/30001msec)
    slat (usec): min=57, max=4848, avg=204.39, stdev=312.79
    clat (usec): min=3, max=292916, avg=12977.56, stdev=18802.01
     lat (usec): min=139, max=297490, avg=13181.95, stdev=19087.15
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    7], 40.00th=[   14], 50.00th=[   14], 60.00th=[   14],
     | 70.00th=[   14], 80.00th=[   14], 90.00th=[   15], 95.00th=[   15],
     | 99.00th=[   40], 99.50th=[  209], 99.90th=[  279], 99.95th=[  288],
     | 99.99th=[  292]
   bw (  MiB/s): min=  222, max= 8726, per=99.58%, avg=4835.47, stdev=1979.97, samples=59
   iops        : min=  222, max= 8726, avg=4835.46, stdev=1979.99, samples=59
  write: IOPS=285, BW=288MiB/s (302MB/s)(8634MiB/30001msec); 0 zone resets
    slat (usec): min=2273, max=48652, avg=3492.68, stdev=720.93
    clat (usec): min=3, max=292816, avg=219586.38, stdev=32207.07
     lat (msec): min=4, max=297, avg=223.07, stdev=32.64
    clat percentiles (msec):
     |  1.00th=[  153],  5.00th=[  169], 10.00th=[  178], 20.00th=[  186],
     | 30.00th=[  213], 40.00th=[  218], 50.00th=[  220], 60.00th=[  224],
     | 70.00th=[  230], 80.00th=[  253], 90.00th=[  262], 95.00th=[  271],
     | 99.00th=[  288], 99.50th=[  288], 99.90th=[  292], 99.95th=[  292],
     | 99.99th=[  292]
   bw (  KiB/s): min=227328, max=407552, per=99.66%, avg=293701.64, stdev=41667.91, samples=59
   iops        : min=  222, max=  398, avg=286.71, stdev=40.68, samples=59
  lat (usec)   : 4=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=30.54%, 20=62.67%, 50=0.54%
  lat (msec)   : 100=0.15%, 250=4.60%, 500=1.56%
  cpu          : usr=1.22%, sys=47.23%, ctx=13257, majf=0, minf=75
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=145619,8570,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=4856MiB/s (5092MB/s), 4856MiB/s-4856MiB/s (5092MB/s-5092MB/s), io=142GiB (153GB), run=30001-30001msec
  WRITE: bw=288MiB/s (302MB/s), 288MiB/s-288MiB/s (302MB/s-302MB/s), io=8634MiB (9053MB), run=30001-30001msec
root@truenas[~]# 

What am I missing here? Why are the NVMe drives performing so poorly?

First, you need to run your test with datasets larger than your RAM size (64GB); otherwise you are hitting the RAM cache (ZFS ARC).
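
If in doubt about how much the ARC can soak up, a quick sanity check is something like this (standard OpenZFS tooling, just a sketch):

free -h                              # total system RAM
arc_summary | grep -i "arc size"     # current ARC size vs. target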

Any reason for not using the on-board SATA ports?

Pardon my ignorance, I’m confused. Are you suggesting that not hitting the RAM cache will improve NVMe write performance? I would think DDR4-3200 RAM would perform far better than 400MB/s; DDR4-3200 has a peak transfer rate of 25,600 MB/s. Is what I’m dealing with a RAM problem? I hadn’t considered that, since the HDD tests came in at around 1000MB/s.

So I ran this fio test, which created 16 * 8GB files for a total of 128GB, and the performance was even worse:

sudo fio --bs=128k --direct=1 --directory=/mnt/nfs/nvme/downloads/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=8192M --time_based
   READ: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=12.6GiB (13.6GB), run=61730-61730msec
  WRITE: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=12.7GiB (13.6GB), run=61730-61730msec

Yes.

If an M.2 drive is installed in M.2_1, the SATA ports 3_0 and 3_1 are disabled, and if an M.2 drive is installed in M.2_2, the SATA ports 3_4 and 3_5 are disabled.

With the 2 NVMe drives populated, only 2 of the 6 SATA ports on the motherboard are available. There are a total of 2 NVMe drives and 6 HDDs, so the 16-port PCIe SAS HBA seemed like a decent solution, particularly considering future expansion. I’ve disabled the motherboard SATA controller and forced the M.2 slots to “M.2” in the BIOS (as opposed to leaving them on “Auto”), but this failed to yield any performance improvement.
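
For what it’s worth, here’s roughly how the negotiated PCIe link speed/width could be double-checked after those BIOS changes (a sketch; the 01:00.0 bus address is a placeholder, to be taken from the first command’s output):

lspci | grep -i "non-volatile"                        # list the NVMe controllers and their bus addresses
sudo lspci -vv -s 01:00.0 | grep -E "LnkCap|LnkSta"   # capable vs. negotiated link (e.g. 8GT/s x4 for PCIe 3.0 x4)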

I think we need to have you run some commands. It sounds like you are using the boot-pool for more than just booting.
You only list two pools above, nfs and boot-pool.
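
In the meantime, a quick way to see everything that lives on the boot pool would be something like:

sudo zfs list -r -o name,used,mountpoint boot-pool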

Thanks to Protopia for the following.
‘I have a standard set of commands I ask people to run to provide a detailed breakdown of the hardware, so please run these and post the output here (with the output of each command inside a separate </> preformatted text box) so that we can all see the details:’

lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
sudo zpool status -v
sudo zpool import
lspci
sudo storcli show all
sudo sas2flash -list
sudo sas3flash -list