I recently got a bunch of used Intel NVMe SSDs (DC P4510) to use with ZFS and TrueNAS. I had tried Samsung 980 Pros before with very disappointing results, which I put down to the drives being old, second-hand, or non-enterprise. As a test, I created a 10x Intel stripe in TrueNAS with all default settings except compression, which I turned off. The results on a Dell R740xd server with 24 NVMe bays were very disappointing, so I tried a newer Gigabyte server but got much the same results. I also tried Btrfs on Debian to rule out a ZFS-specific issue, but got similar results.
In all cases and on all machines I tested, direct speeds on a single NVMe drive without ZFS were much faster than a 10x ZFS or Btrfs stripe. This appears to be an NVMe-specific issue, as I got very respectable speeds (R: 209k / W: 89.7k IOPS) with a 10x stripe on a 12G SAS JBOD shelf using 5x PM1643a and 5x WD SC550 SAS SSDs behind an HBA.
Results:
| System / Configuration | OS / Mode | Read IOPS | Read BW | Write IOPS | Write BW | Avg Latency |
|---|---|---|---|---|---|---|
| Dell R740xd (10x Stripe) | TrueNAS | 12.5k | 97 MiB/s | 5.3k | 42 MiB/s | 28.6 ms |
| Dell R740xd (Single Disk) | Debian Raw | 227.0k | 1771 MiB/s | 97.2k | 759 MiB/s | 1.4 ms |
| G293-S42-AAP1 (2x Stripe) | TrueNAS | 13.5k | 105 MiB/s | 5.7k | 45 MiB/s | 26.6 ms |
| G293-S42-AAP1 (Single Disk) | TrueNAS Raw | 199.0k | 1553 MiB/s | 85.2k | 666 MiB/s | 1.8 ms |
And the specs:
| Feature | Dell PowerEdge R740xd | Gigabyte G293-S42-AAP1 |
|---|---|---|
| CPU | Dual Intel Xeon Gold 6130 (Skylake) | Single Intel Xeon Gold 6538Y+ (Emerald Rapids) |
| Cores/Threads | 16 Cores / 32 Threads | 32 Cores / 64 Threads |
| Base Clock | 2.10 GHz | 2.20 GHz |
| Memory Capacity | 16 GB (2x 8 GB) | 32 GB (1x 32 GB) |
| Memory Type | DDR4-2133 ECC | DDR5-4800 ECC |
| Storage Tier | NVMe (10-Disk Stripe) | NVMe (2-Disk Stripe) |
| OS Environment | TrueNAS / Debian | TrueNAS / Debian |
I tried all sorts of things, but everything made only a negligible difference, including changing the block size and turning sync off. The fio command I used for all results, including the SAS JBOD, was:
Can anyone shed any light on this or point me in the right direction for getting the best speeds from these NVMe drives? I can’t find much online specifically about ZFS and NVMe.
What do you mean when you say TrueNAS vs. TrueNAS raw?
You make no mention of virtualisation, but specifically calling one thing raw implies that something else isn’t. If that’s an accurate understanding, please elaborate on how the virtualisation is configured, given that your single-disk TrueNAS raw result is within ~10% of your single-disk Debian raw result.
No virtualisation, just straight on the TrueNAS machine via SSH or the shell in the UI. I didn’t want virtualisation or network overheads to confuse the results. By raw I mean running against /dev/nvme1n1 directly, rather than against the pool, which was /mnt/nvme/testfile. Hope that makes sense.
In that case my guess would be that it’s some oddity in how the drives are configured on the 24-slot NVMe backplane. For example, the system may benefit from populating specific slots in order to control which CPU(s) get the load.
I had considered that, which is why I tried a 2x stripe as well. I also tried moving the drives so they were all on PCIe lanes from one CPU. The Gigabyte server only has one CPU installed, so only 2 of the 4 NVMe bays work anyway.
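For anyone wanting to verify PCIe/NUMA locality rather than physically moving drives, Linux exposes it through sysfs. A small sketch using standard sysfs paths (no NVMe-specific tooling assumed):

```shell
# For each NVMe controller, print which NUMA node its PCI device sits on.
# A value of -1 means the platform reports no NUMA affinity for that device.
for c in /sys/class/nvme/nvme*; do
  [ -e "$c" ] || continue   # glob didn't match: no NVMe controllers present
  echo "$(basename "$c") -> NUMA node $(cat "$c/device/numa_node")"
done
```

If the stripe members are split across nodes while fio runs pinned to one socket, cross-node traffic could show up as extra latency, so it is worth ruling out.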
When I tried the 980 Pros a while back I was using virtualisation, and I tried different NUMA settings, but none of it made a difference. That was on an old R730xd, which still performed very well with SAS SSDs. Until I saw the performance of the SAS SSDs, I assumed ZFS was just not well suited to SSDs.
It’s not something I had really looked at, but Gemini sent me down a rabbit hole of high core count vs. faster single-core performance. So I did check a couple of times: CPU usage was 6% on Debian and I think 23% with TrueNAS on one of the tests I looked at. I only checked on the Dell R740xd. I wouldn’t imagine NVMe storage is more CPU-intensive than SAS, though.
One of the things that sticks out is the latency difference, which seems quite high and is presumably bad for IOPS.
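The latency and IOPS numbers in the results table actually move in lockstep. By Little’s Law (I/Os in flight = IOPS × average latency), every run works out to roughly the same number of outstanding I/Os (all in the ~320–360 range), which would suggest the high latency on the stripes is queueing delay at a fixed total queue depth, not device latency. A quick check in Python, with the numbers taken from the table:

```python
# Little's Law: I/Os in flight = IOPS * average latency (in seconds).
# Read-side numbers taken from the results table in the post.
runs = {
    "R740xd 10x stripe (TrueNAS)":     (12_500, 28.6e-3),
    "R740xd single disk (Debian raw)": (227_000, 1.4e-3),
    "G293 2x stripe (TrueNAS)":        (13_500, 26.6e-3),
    "G293 single disk (TrueNAS raw)":  (199_000, 1.8e-3),
}

def in_flight(iops: float, latency_s: float) -> float:
    """Average number of outstanding I/Os implied by IOPS and latency."""
    return iops * latency_s

for name, (iops, lat) in runs.items():
    print(f"{name}: ~{in_flight(iops, lat):.0f} I/Os in flight")
```

If the queue depth really is constant across runs, the stripe isn’t adding latency so much as delivering far fewer IOPS, so each queued I/O simply waits longer.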
IMO, you should repeat the test with compression disabled anyway. Also, some sources state that database datasets should be tuned with logbias=throughput. OTOH, that would probably make latency even worse…
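For reference, that tuning is a single property set. A sketch, assuming the dataset is named `nvme` to match the OP’s `/mnt/nvme` mount path (substitute the real pool/dataset name):

```shell
# logbias=throughput biases synchronous writes toward throughput over latency.
zfs set logbias=throughput nvme

# While you're at it, confirm the properties the thread has discussed so far.
zfs get logbias,compression,recordsize,sync nvme
```

`logbias` mainly changes how sync writes use the ZIL, so for an async fio workload with sync already off it may well do nothing.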