For private use I have a powerful NAS: 96 GB memory, a 16-core AM4 processor, and a ConnectX-4 NIC running at 10G.
However, for reasons I do not understand, I cannot reach 10G speeds both up and down to the NAS when using iSCSI or NVMe-oF over TCP (RDMA is not supported in the community edition), even on my 'superfast' NVMe pool.
With both protocols I can write at about 5G towards the NAS and read at about 9G towards my PC. Not extremely bad, of course, but why, oh why, not 10G up and down!
(Testing with SMB, which is less complicated and needs fewer writes, does provide nearly 10G up and down.)
I found two interesting posts on this forum:
- one complaining about NVMe-oF performance: https://forums.truenas.com/t/nvme-over-tcp-device-slower-than-expected/59019. That one seems to have been solved by connecting to the subsystem with --nr-io-queues 4 added (I have no idea how to do that)
- one about testing pool speed: Performance test | TrueNAS Community
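For reference, on a Linux initiator the queue count can be set with nvme-cli when connecting. A minimal sketch, assuming an NVMe/TCP target; the IP address and subsystem NQN below are placeholders, not values from my setup:

```shell
# Disconnect first if the subsystem is already connected
# (NQN is a placeholder; list yours with: sudo nvme list-subsys)
sudo nvme disconnect --nqn=nqn.2011-06.com.truenas:example-subsys

# Reconnect over TCP, requesting 4 I/O queues instead of the default.
# --traddr is the NAS IP (placeholder), --trsvcid 4420 is the standard
# NVMe/TCP port.
sudo nvme connect \
    --transport=tcp \
    --traddr=192.168.1.10 \
    --trsvcid=4420 \
    --nqn=nqn.2011-06.com.truenas:example-subsys \
    --nr-io-queues=4
```

After connecting you can verify the queue count in `/sys/class/nvme/<device>/` or via `sudo nvme list-subsys`.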
So I decided to run the performance test from that second post.
Run a 60-second test of 1 MB writes, answering "how fast could I copy a large, multi-gigabyte video to the system?" If you plan to use it for smaller files, adjust the bs value.
cd to a dataset: /mnt//
sudo fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=100G --runtime=60s --readwrite=write --name=testfile
Test results
Here are the results of the test on three different dataset types:
• SATA SSD
WRITE: bw=448MiB/s (470MB/s), 448MiB/s-448MiB/s (470MB/s-470MB/s), io=26.3GiB (28.3GB), run=60130-60130msec
• PCIe 4 NVMe SSD
WRITE: bw=773MiB/s (811MB/s), 773MiB/s-773MiB/s (811MB/s-811MB/s), io=45.4GiB (48.8GB), run=60162-60162msec
• RAIDZ1 pool (4 drives + 2x NVMe special VDEVs)
WRITE: bw=413MiB/s (433MB/s), 413MiB/s-413MiB/s (433MB/s-433MB/s), io=24.2GiB (26.0GB), run=60013-60013msec
What really surprised me is how small the performance differences are! The NVMe SSD should outperform the SATA SSD and the RAIDZ1 pool by a very big margin, which is not the case.
I can imagine that the RAM cache is skewing the test.
If so, is there a better test?
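If ZFS's RAM cache (the ARC) is indeed masking the real pool speed, a variant of the same fio run that forces the data to stable storage might give more honest numbers. A sketch, assuming the same dataset directory: --end_fsync=1 makes fio issue an fsync at the end so the reported bandwidth includes flushing everything from RAM to disk; --direct=1 would bypass the cache entirely, but O_DIRECT is only honored on newer OpenZFS versions, so it may be rejected:

```shell
# Same 1M sequential write test as above, but fio now waits for the
# final fsync, so buffered-but-unwritten data cannot inflate the result.
sudo fio --ramp_time=5 --gtod_reduce=1 --numjobs=1 --bs=1M --size=100G \
    --runtime=60s --readwrite=write --end_fsync=1 --name=testfile
```

Alternatively, picking --size well above the 96 GB of RAM forces the test past what the ARC can absorb.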
Bottom line
The bottom line is, of course, that I expect better performance from the NVMe-based pool in combination with the powerful CPU.
So I wonder what the bottleneck is and how to get better performance.
My second question is how to specify --nr-io-queues, and whether that is a good idea.