It looks like your performance is about normal, going by the results of a test over at calomel.org.
They do a nice job of explaining how they tested, and they even have results for both HDDs and SSDs.
They show a write speed of 567 MB/s for an array similar to mine, but with old 4TB drives that are 4 times slower than mine.
I should be getting around 2000 MB/s instead of 800 (with a metadata vdev).
I also tried to test with bonnie++, but it's not installed, and TrueNAS makes it very difficult to install this kind of software; they don't recommend using the CLI. From what I've read, using it can break things and lead to unpredictable results.
Might I suggest that you test your network for the bottleneck? It could be a piece of hardware that you don't suspect. At a minimum it would prove that the network works as it should.
I would also power down, disconnect all drives, and boot from something like an Ubuntu live CD for the testing. Run something like iperf3 and you will likely get some solid, reliable results. This takes TrueNAS and your pool out of the picture. If the network tests at full speed, then it is TrueNAS troubleshooting time. I don't think your pool is the bottleneck.
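As a rough sketch (the server address and stream count below are just examples), running iperf3 in both directions between the client and the TrueNAS box will show the raw network throughput:

    # on the TrueNAS (or live CD) side: start a listener
    iperf3 -s

    # on the client: 30-second test with 4 parallel streams
    iperf3 -c 192.168.1.100 -P 4 -t 30

    # add -R to test the reverse direction
    iperf3 -c 192.168.1.100 -P 4 -t 30 -R

If either direction falls well short of line rate, the problem is in the network path rather than the pool.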
Are you running on bare metal or on top of a hypervisor?
Good hunting, hope you solve this problem and then post the results of your new significantly faster speed test.
Yes, you're 100% right, the bottleneck is my ConnectX-4 card, which is running outdated, 7-year-old firmware. I can't even find a working WinOF-2 version that could update these cards.
Buying several ConnectX-5 cards might be the solution, but it's too expensive at the moment. I'll stick with 10G for now.
Still, it would be nice to at least make the 10G card work with TrueNAS; it does work at 10G on Windows.
I would not need to use a 100G switch just to get 10G speed!
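On the Linux side, though, the open-source mstflint tool might be able to query (and in principle flash) the firmware without WinOF-2. I haven't tried it yet; the PCI address and firmware file name here are placeholders:

    # find the card's PCI address
    lspci | grep -i mellanox

    # query the current firmware version and PSID
    mstflint -d 81:00.0 query

    # if a matching image for that PSID can be found, it could in principle be flashed:
    # mstflint -d 81:00.0 -i fw-ConnectX4.bin burn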
It's a ConnectX-2 card; any ideas?
Does Debian Bookworm have the driver to make this work?
Why do I ask? I think outside the box all too often. My thought is, if the driver is available for Bookworm (the Debian version SCALE currently uses), then you could possibly install it yourself. What I do not know is whether you would need to rebuild the kernel.
It is something you could look into. I built smartmontools 7.5 and installed it on my CORE and SCALE systems. I had to build it twice as these are different operating systems, but it was very easy. I did not need to rebuild the kernel.
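A quick way to check whether the stock Bookworm/SCALE kernel already covers the ConnectX-2 (which normally uses the mlx4 driver family) might be something like this; the module names are what I'd expect, not something I've verified on that exact card:

    # see which kernel driver, if any, is bound to the Mellanox card
    lspci -nnk | grep -iA3 mellanox

    # check that the mlx4 modules ship with the running kernel
    modinfo mlx4_core mlx4_en | grep -E '^(filename|version)'

If the modules are already there, no kernel rebuild should be needed; it would just be a matter of the card being recognized.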
I just received a new PCIe → NVMe adapter, and with a 1TB metadata VDEV plus some final tweaks, I am getting between 1 and 1.2 GB/s when copying a large 100GB file.
That's far from the array's ~3.5 GB/s internal speed, but it's not too shabby either.
I probably would get a bit less with the 10G card now even if it worked, so I'll stick with the not-yet-optimized 100G one.
If I can optimize it further, I’ll post the results.
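In case anyone wants to reproduce the copy test: it's just timing the transfer of one large, incompressible file. On a Linux client with the share mounted, something along these lines would do (paths and mount point are examples):

    # create a ~100 GiB file of incompressible data
    dd if=/dev/urandom of=testfile.bin bs=1M count=102400 status=progress

    # copy it to the mounted share and time it
    time cp testfile.bin /mnt/nas/testfile.bin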
You're right, it's a bit much, but after testing, resilvering at 80% capacity should take about 3 days, which seems acceptable with 3 parity drives. Also, all my data is backed up on LTO tapes.
I first tried a dRAID3 pool, which is more suited to large arrays, but the overhead is too large; it's like having 2 fewer drives.
About the VDEV, it’s what the GUI shows as:
Metadata VDEVs 1 x DISK | 1 wide | 953.87 GiB
I might add another SSD to mirror it.
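From what I understand, turning the single-disk special vdev into a mirror is just a matter of attaching a second SSD to the existing one; a rough sketch with example pool and device names (TrueNAS normally references disks by partition UUID, so the real names will differ):

    # check the current layout and note the special vdev's device name
    zpool status tank

    # attach a second SSD to the existing special-vdev disk to form a mirror
    zpool attach tank nvme0n1p1 nvme1n1p1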
I don’t know the controller firmware version. How do I check it?
Anyway, the fio test shows a 3693 MB/s write speed, roughly consistent with 15 drives × 200 MB/s, so the bottleneck is the network.
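For anyone who wants to run a similar check, a sequential-write fio job along these lines should do (directory, file size, and job count are just examples; fio's buffers are incompressible by default, and end_fsync keeps cached writes from inflating the number too much):

    fio --name=seqwrite --directory=/mnt/tank/fiotest \
        --rw=write --bs=1M --size=20G --numjobs=4 \
        --end_fsync=1 --group_reporting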
Important: the pool depends on the sVDEV to function, so if the sVDEV dies, your pool dies also. Take the same care with sVDEVs regarding redundancy and resilience as you did with the other VDEVs! Do not use crummy SSDs for the sVDEV unless you don't care about pool life expectancy.
Special VDEV (sVDEV) Planning, Sizing, and Considerations
If that is an sVDEV, you'd better have good faith in your backups. Lose the sVDEV, lose the pool too.
I wouldn't rely on an 18-wide VDEV either. In a home system a Z2 VDEV is likely redundant enough, and two 9-wide Z2s would only cost you one drive's worth of space while doubling IOPS. Add two more drives and you can have Z3s, or more space.
As for the transfer speeds, I'd look into what is slowing the whole chain down; there is a weak link in there, and it's quite possible that the 18-wide array is part of it. Did you test your writes with random, incompressible data?
I'd pay special attention to network cards and hardware settings, especially if auto-negotiation is left on. I've had some unhappy results around equipment like MikroTik and Intel cards in TrueNAS that I'm convinced can be traced back to MikroTik. It's likely better to verify each link along the way and permanently set the speed, flow control, and duplex settings.
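On the Linux side, ethtool is the usual way to check and pin those settings; the interface name and values below are examples, and whether forcing them sticks depends on the NIC and driver (some 10G+ links need auto-negotiation left on):

    # show the current link settings
    ethtool enp3s0

    # pin speed/duplex and turn off auto-negotiation (driver permitting)
    ethtool -s enp3s0 speed 10000 duplex full autoneg off

    # disable flow-control negotiation and pause frames
    ethtool -A enp3s0 autoneg off rx off tx off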