New NAS, 3 VDEVs in RAIDZ2, 40MB/s Transfer Speeds—Noob Needs Help

niknor · August 27, 2024, 7:00pm

Hello, I’m trying to troubleshoot why my new NAS is only getting 40MB/s transfer speeds. I’ve tried everything I could think of, plus a bunch of suggestions from the internet, but no luck so far… I’m hoping a direct approach asking here might do the trick!

The data (large files) transfers start at 170-200MB/s and quickly drop to about 40MB/s.

Hardware

Intel i5-11400 @ 2.60GHz
32GiB Crucial Ballistix 3600 DDR4
Asus PRIME B560-PLUS AC-HES
LSI 9300-i16 (with an 80x80 fan in a 3D-printed mount, in IT Mode, per Amazon description)
Intel x540-AT2 10GbE (with a 120mm fan next to it)

Disks

OS: 2x Crucial SATA SSD 250GB (CT240BX500) (connected to the motherboard’s SATA ports)
L2ARC: 1x Kingston NVMe SSD 500GB (SKC3000S) (installed in the motherboard’s M.2 slot)
SLOG: 2x Intel Optane 16GB SSD (INTEL MEMPEK1W016GA) (1 in the motherboard’s slot, 1 in a PCIe adapter)
Data:
- 9x Seagate Barracuda Compute 6TB (ST6000DM003)
- 6x WD RED 6TB (WD60EFAX)
- All connected to the HBA.

Software

TrueNAS Scale Dragonfish-24.04.2
Plex running in a VM.

Topology

What I have tried:

Reinstalled TrueNAS Scale.
Changing from 2 VDEVs in RAIDZ2 (7-wide) to 3 VDEVs in RAIDZ2 (5-wide). With no improvement.
Tried copying from different SSDs and from 2 SSDs simultaneously, but the transfer speed gets halved.
Ran SMART Short tests on all disks—passed. Long tests on the 9 Seagate HDDs—also passed.
iPerf3 from a Linux WSL inside Win11 shows a sustained speed of about 3 Gbit/s (not the 10GbE I expected, but still better than 40MB/s).
Tested direct PC → NAS connection with different cables to rule out the switch or network issues.
Checked the HBA temperature, which reports around 52°C (confirmed with FLIR photos). This sounds ok to me.
Checked and the SSD is at 5% utilization when copying.
Checked transfering from another PC (linux with 1gbe and got also 40MB/s)
Did a fio test: READ: bw=2189MiB/s (2295MB/s) - WRITE: bw=2193MiB/s
Did a sas3flash -list and I see my HBA firmware being old (07.00.01.00), not sure if this has anything to do with my problems. ChatGPT thinks this is old.

I read that the Seagate Barracuda Compute drives are not intended for NAS use, but I had them unused.

I’ve also seen mixed advice on using an L2ARC cache, but I added it just in case.

Any help or tips would be appreciated. I’m really at a loss on what else to try or test to get to the bottom of this.

Protopia · August 27, 2024, 8:17pm

Insufficient memory for L2ARC. Remove it.
SMB from Windows is asynchronous so SLOG will do nothing. Remove it.
Both the WD RED drives and the Seagate Baracuda drives are SMR and so entirely unsuitable for TrueNAS / ZFS, and have very significant bulk write performance limitations.
Unclear what direction the file copies were being done.
Unclear what the network topology and link speeds are
Unclear what size the files were.
Unclear exactly what process was used for the benchmarks.
Thus unclear where and what the bottleneck could be.

etorix · August 27, 2024, 8:26pm

Entirely typical with SMR…
The excessively large L2ARC with insufficient RAM is not helping either. (The SLOG does nothing, including not hurting. If it were doing something, however, these 16 GB Optane M10 would somewhat lack in endurance and throughput—I use these drives… for boot.)

Ouch! That cannot help either… but upgrading to P16.00.12.00 will not solve the SMR issue.

You’re too modest. Point 3 does it.

Protopia · August 27, 2024, 8:40pm

If we assume that this was a write test rather than a read test, then point 3 would explain it. But it is unclear which direction it was going.

Also, async writes are cached in ARC with a limit of (I have a feeling) 4GB.

So async writes will go at network speed for the first 4GB and then go at disk speed thereafter (assuming network speed > disk speed).

So I stand by my statement that there is insufficient data to diagnose the cause with absolutely certainty.

But equally, it would probably have been better if the OP had asked here about which disks to buy (or not to buy) and whether to use L2ARC or SLOG and how to undertake benchmarking in order to avoid having issues, rather than buying and building and configuring and testing and then asking how to fix it.

Stux · August 27, 2024, 9:23pm

Did you run the iPerf in both directions?

niknor · August 27, 2024, 10:14pm

1) & 2) I will try if I need to reinstall.

3) I did not buy the Seagate HDDs for this, and I had the WD Red working in another NAS with a RaidZ1 x6 wide and they worked a lot better. Where I live, the options at the time were WD RED, WD Blue and Some WD Purple ones. I just wanted more space, so, using the HDDs I had, build this new NAS.

4) Sorry i forgot that part, I ran a few tests, from Windows to the NAS, and from the NAS to a Linux PC (NFS). Both tests were similar…

5) I have both in my windows pc and the NAS a 10gbe Intel Nic (Intel x540-AT2). and a “10gbe” Trendnet switch (My guess is that switch is what is limiting the connection to 3gb/s). The Link speed reported by windows is “10/10 (Gbps)”.

6) The sizes were all big files (2 to 10 GiB), mp4 video mostly.

7) My benchmark was trying to copy data to the NAS and from the NAS. I know is not the most scientific way of doing it. But the Previous NAS (with some of the drives), worked a lot faster.

Yes, I used --bidir

I am about to update the firmware, (another rabbit hole)… Hoping it will do something…

Protopia · August 27, 2024, 10:32pm

I apologise if I did not make myself clear. These drives are absolutely, completely, utterly and totally unsuitable for use in any form of ZFS redundant vDev regardless of whether you happened to have them spare in-hand. Why? Because if you ever have a drive fail you will not be able to resilver in any reasonable timescale. THEY ARE UNSUITABLE. (I don’t think I can be any more clear than this.)

Protopia · August 27, 2024, 10:36pm

@etorix

On reflection I am not sure that this is the case. I believe that the CMR cache on each SMR drive is likely to be c. 30GB (I couldn’t be bothered to go and research the exact sizes for the two drive models). So across 9x drives (excluding parity) this would be c. 270GB.

Copying a single big file of 10GiB (or even several of these) should not fill up the CMR cache.

Of course if you run the benchmarks several times in a row without allowing the SMR drives to catch up and go to idle, then you are not running a consistent benchmark, but I suppose you might actually fill the 270GB cache size.

niknor · August 28, 2024, 7:14pm

You are being very clear, my options are those drives, or nothing… I have another copy of my critical data… and, like I said before, I had them working just “fine” before… at least the WD REDs.

So, what you are saying, if I understood correctly (my English is not very good), is that my issues may not only be that my disks are SMR?

You were right, it did not do much…

If the problem is indeed just the crappy disks… is there any way to make them run faster?, I do not care if the pool dies when 1 fail…