Slow write speed with high-end config

Hi,

My current configuration with TrueNAS Scale 25.04.1 includes:

  • Supermicro CSE-847 X10DRI-T4+ 4U server (36-bay)
  • 2 x Intel Xeon E5-2673 v4 SR2KE 20C. Peaks at 8% CPU during intensive writes.
  • 128GB ECC memory
  • LSI SAS9400-16i controller
  • Fujitsu MCX455A-ECAT 100GbE Single Port QSFP28 PCIe ConnectX-4 VPI NIC CA05950 network card
  • Network switch is a Mikrotik XQ+85MP01D Compatible 100GBASE-SR4 QSFP28.
  • 18 Toshiba 22TB Enterprise drives configured as RAIDZ3 with 3 parity drives

Initially, I tested dRAID3, but that led to too much capacity loss compared to RAIDZ3.

I am getting only 600 MB/s write speed, which goes up to 800 MB/s when adding a 1TB NVMe metadata SSD.

I tried a direct connection to the PC and raised the MTU to 9000, but that did not change the speed at all.
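
A quick sanity check that the 9000 MTU actually holds end to end is a don't-fragment ping from the TrueNAS shell towards the PC; the address below is just a placeholder:

ping -M do -s 8972 192.168.1.50

If that errors out, some link in between is still at 1500.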

Looking at Reporting → Disks, I see that the average I/O for each drive is only 50 MiB/s, constantly fluctuating and dropping as low as 10 MiB/s.

Any idea what can be the bottleneck and how to solve it?

Also, I tried a Mellanox ConnectX-2 10G, and the speed dropped to 113MB/s!

I am also using a QNAP NAS, and write speed with these 10G cards is 700 MB/s with a 9-disk array.

How are you testing speeds?

Have you used fio?

I copy large files (20 to 80 GB) from a Gen 5 NVMe SSD to the TrueNAS SMB share using FreeFileSync. That gives me real-life speeds.

CrystalDiskMark gives 1140 MB/s for small 100 MB chunks, which is not realistic.

With 8 GB chunks, it gives 600 MB/s, which is consistent with the large-file copy results.

Note that at the moment, I don’t use an SSD metadata dev, I’m waiting for an order to arrive.

I have never used fio. At the moment, my main concern is single-user transfers of large files, and FreeFileSync is perfect for that.

It looks like your performance is about normal, going by the results of a test over at calomel.org.
They do a nice job of explaining how they tested, and they even have results for HDDs and SSDs.

https://calomel.org/zfs_raid_speed_capacity.html

They show a write speed of 567 MB/s for an array similar to mine, but with old 4TB drives that are 4 times slower than mine.

I should be getting around 2000 MB/s instead of 800 (with a metadata vdev).

I also tried to test with bonnie++, but that’s not installed, and TrueNAS makes it very difficult to install this kind of software; they don’t recommend using the CLI, since from what I read it can break things and lead to unpredictable results.

Have you tried setting different Record sizes for your Dataset and testing?

Yes, setting the record size to 1 MB gives a bit more speed. Going above that actually decreases it.
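
For reference, the same thing can be checked and set from the shell; the dataset name below is just a placeholder, and a new record size only applies to data written afterwards:

zfs get recordsize tank/media
zfs set recordsize=1M tank/media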

What happens if an NVMe SLOG device is attached? I.e., are ZIL writes slowing it down?

A Metadata VDEV speeds things up. A Log VDEV (is that what you mean by an SLOG device?) does not change much, if I remember correctly.
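
If I understand correctly, plain SMB copies are normally asynchronous, so a SLOG would not be expected to help much anyway. Whether sync writes are even in play can be checked quickly; the dataset name below is a placeholder:

zfs get sync tank/media

sync=standard means only explicitly requested sync writes go through the ZIL.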

When I look at the flashing HDD activity LEDs, they are not lit continuously. Even during large writes, they seem to be on only about 50% of the time.

I tried this:

fio --name=test --filename=/mnt/pool/testfile --rw=write --bs=1M --size=10G --numjobs=4 --iodepth=32 --direct=1

gives:

WRITE: bw=3522MiB/s (3693MB/s), 880MiB/s-882MiB/s (923MB/s-925MB/s), io=40.0GiB (42.9GB), run=11604-11630msec

The 3693 MB/s is really good, and what I was expecting. But I cannot achieve that with a network copy to the SMB share, which peaks at 600 MB/s.

There is a bottleneck somewhere that prevents copies from running at the maximum available speed.

A clue might be that the speed with a 10G adapter drops to 133 MB/s, which is barely 1G speed.

Might I suggest that you test your network for the bottleneck? It could be a piece of hardware that you don’t suspect. At a minimum, it would prove that the network works as it should.

I would also power down, disconnect all drives, and boot from something like an Ubuntu live CD for the testing. Run something like iperf3 and you will likely get some solid, reliable results. This takes TrueNAS and your pool out of the picture. If you find the network is at full speed, then it is TrueNAS troubleshooting time. I don’t think your pool is the bottleneck.
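
Something like this is a reasonable starting point, with the server side on the NAS (or the live environment) and the client on your PC; the address, stream count, and duration are just examples:

iperf3 -s
iperf3 -c 192.168.1.50 -P 4 -t 30

If you only reach line rate with several parallel streams, that points at single-stream limits (CPU, tuning) rather than the pool.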

Are you running on bare metal or on top of a hypervisor?

Good hunting, hope you solve this problem and then post the results of your new significantly faster speed test.

Yes, you’re 100% right: the bottleneck is my ConnectX-4 card, which runs outdated, 7-year-old firmware. I can’t even find a working WinOF-2 version that could update these cards.
Buying several ConnectX-5 cards might be the solution, but that’s too expensive at the moment. I’ll stick with 10G for now.

Still, it would be nice to at least make the 10G card work properly with TrueNAS; it does work at 10G on Windows.
I would not need to use a 100G switch just to get 10G speed!
It’s a ConnectX-2 card; any ideas?

Do you by any chance have an unreasonable compression setting on your dataset?

No, I set up no compression at all.
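
For what it's worth, that is easy to confirm from the shell; the dataset name is a placeholder:

zfs get compression,compressratio tank/media

compression off and a compressratio of 1.00x would rule it out.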

Does Debian Bookworm have the driver to make this work?

Why do I ask? I think outside the box all too often. My thought is, if the driver works on Bookworm (the version SCALE is currently based on), then you can possibly install it yourself. What I do not know is whether you would need to rebuild the kernel.

It is something you could look into. I built smartmontools 7.5 and installed it on my CORE and SCALE systems. I had to build it twice as these are different operating systems, but it was very easy. I did not need to rebuild the kernel.
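
For smartmontools it was roughly the usual autotools routine (the tarball name below is just an example, and you need gcc/make available somewhere to do the build):

tar xf smartmontools-7.5.tar.gz
cd smartmontools-7.5
./configure
make
make install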

DISCLAIMER Do at your own risk.

Thanks, I just have no idea about Bookworm.

I just received a new PCIe → NVMe adapter, and with a 1TB metadata VDEV plus some final tweaks, I am getting between 1 and 1.2 GB/s when copying a large 100GB file.

That’s far from the ~3.5 GB/s internal array speed, but it’s not too shabby either.

I would probably get a bit less with the 10G card now even if it worked, so I’ll stick with the not-yet-optimized 100G one.

If I can optimize it further, I’ll post the results.

18-wide single vdev, or two 9-wide vdevs striped?
The former option is wider than many would consider or recommend.

Do you mean “a metadata L2ARC” (which should be fine with your RAM), or a non-redundant :scream: “special vdev”?

What’s the firmware version?
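
On SCALE you can read it from the shell; the interface name below is just a placeholder:

ethtool -i enp65s0f0

The firmware-version line is the one of interest.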

18-wide single vdev.

You’re right, it’s a bit much, but after testing, resilvering at 80% capacity should take 3 days, which seems acceptable with 3 parity drives. Also, all my data is backed up on LTO tapes.

I first tried a dRAID3 pool, which is more suitable for large arrays, but the overhead is too large; it’s like having 2 fewer drives.

About the VDEV, it’s what the GUI shows as:

Metadata VDEVs 1 x DISK | 1 wide | 953.87 GiB

I might add another SSD to mirror it.

I don’t know the controller firmware version. How do I check it?

Anyway, the fio test shows a 3693 MB/s write speed, roughly consistent with 15 data drives × ~200 MB/s each, so the bottleneck is the network.

I think that falls under the following warning:

Important: the pool depends on the sVDEV to function, so if the sVDEV dies, your pool dies also. Take the same care with sVDEVs regarding redundancy and resilience as you did with the other VDEVs! Do not use crummy SSDs for the sVDEV unless you don’t care about pool life expectancy.

Special VDEV (sVDEV) Planning, Sizing, and Considerations

If that is an sVDEV, you had better have good faith in your backups. Lose the sVDEV, lose the pool too.
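
Mirroring it is cheap insurance. From the shell that is roughly a zpool attach of a second SSD to the existing special-vdev disk (pool and device names below are placeholders), although on TrueNAS it is safer to do the equivalent through the GUI:

zpool attach tank nvme0n1p2 nvme1n1p2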

I wouldn’t rely on an 18-wide VDEV either. In a home system a Z2 VDEV is likely redundant enough, and two 9-wide Z2s would only cost you one drive’s worth of space (14 data disks instead of 15) while doubling IOPS. Add two more drives and you can have Z3s or more space.

As for the transfer speeds, I’d look into what is slowing the whole chain down; there is a weak link in there, and it’s quite possible that the 18-wide array is part of it. Did you test your writes with random, incompressible data?
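
Something along the lines of the earlier fio run, but forcing fresh random buffers on every write, would rule that out; the path below is a placeholder:

fio --name=randbuf --filename=/mnt/pool/testfile --rw=write --bs=1M --size=10G --numjobs=4 --iodepth=32 --direct=1 --refill_buffers

With compression off on the dataset it should not change much, but it removes one variable.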

I’d pay special attention to network cards and hardware settings, especially if auto-negotiation is left on. I’ve had some unhappy results with equipment like MikroTik switches and Intel cards in TrueNAS that I’m convinced can be traced back to MikroTik. It’s likely better to verify each link along the way and permanently set the speed, flow control, and duplex settings.