TrueNAS SCALE: Write performance over SMB drops from 113MB/s to ~20-30MB/s and back to 113MB/s

Hi,

I recently installed TrueNAS SCALE v24.1.0 on the following hardware to replace my old NAS:
RAM: Corsair 2x16GB
CPU: AMD Ryzen 5 2600
Mainboard: Asus Prime B450M-A II (BIOS is up2date)
Storage: 2x2TB SSD (Crucial BX500, Emtec ECSSD2TX150) connected via SATA (6Gbps), configured as a mirror. No VDEV for metadata, log, cache, spare, or dedup

So far, there are no apps or VMs running.

It's connected via Gbit Ethernet to my workstation. When I copy data to it (testing with one big file), after copying a few GB (I think it depends on what I copy; sometimes after 7GB, sometimes after 15GB) the transfer rate drops from 113MB/s to around 20-40MB/s for 8-12 seconds, and then goes back up to 113MB/s.

When reading data there is no such drop in transfer speed.

Running dd against the dataset I get around 700MB/s; I used the following parameters (about 100GB of zeroes):
dd if=/dev/zero of=/mnt/SSD/data_fast/testfile1 bs=1M count=100000 oflag=sync

Could this be because ZFS is writing the contents of its cache to the disks? With htop I see (random?) writes to disk, but I cannot correlate the disk writes with the dropping transfer speed.

What do you think could cause this? I'm aware that I'm using consumer SSDs; it's actually not a big problem, but it would be good to know what you think about this behaviour.

Thank you
Felix

I'm guessing you're doing a sustained sync write operation, and consumer SSDs are notoriously bad at those. The really cheap ones can even be slower than high-quality HDDs once the SLC write cache fills up.
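If you want to confirm whether these are actually sync writes, you can read the dataset's sync property (dataset name taken from the dd path in the first post; adjust to yours):

zfs get sync SSD/data_fast

With the default sync=standard, a plain SMB copy is mostly asynchronous, so if that's what it reports, sync semantics may not be the whole story.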

The "rubberbanding" effect you see is probably the write operation flushing the buffer: the CPU has to wait for those I/O sync acks before it can continue.
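A sketch of how one could watch those flushes in real time, assuming the pool is named SSD as in the dd path above:

# one-second samples of per-disk throughput; look for write bursts lining up with the SMB dips
zpool iostat -v SSD 1

# on Linux, the transaction group flush interval (in seconds) is visible here
cat /sys/module/zfs/parameters/zfs_txg_timeout

If the dips recur roughly every zfs_txg_timeout seconds under sustained load, transaction-group flushing is a likely suspect.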

You will notice this problem gets much worse if you do heavy writes in a VM hosted on the pool.


It doesn't even have to be sync writes to see this behavior. Less-expensive consumer SSDs like the BX500 and that Emtec often employ pseudo-SLC caching, and when that internal cache runs out or needs to be flushed, you see the drag on performance:


[Image: benchmark graph showing sequential write speed collapsing once the drive's pSLC cache is exhausted]
Image source: Benchmarking cheap SSDs for fun, no profit (be warned)
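One way to expose a pSLC cache directly is a sustained sequential write comfortably larger than the cache. A minimal fio sketch, assuming the dataset mount path from the first post (the file and job names are just placeholders):

fio --name=pslc-test --filename=/mnt/SSD/data_fast/fio-testfile --rw=write --bs=1M --size=30G --ioengine=libaio --end_fsync=1 --log_avg_msec=1000 --write_bw_log=pslc-test

Unlike /dev/zero, fio's default data pattern isn't trivially compressible, and the per-second bandwidth log (pslc-test_bw.1.log) should show the same cliff once the cache runs out.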

Just to be clear - dedup is not in use at all here?
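For reference, a quick way to check (pool name assumed to be SSD):

zfs get dedup SSD
zpool status -D SSD   # prints dedup table statistics if dedup was ever enabled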

This is ZFS compression in action: because you're sourcing from /dev/zero, ZFS will compress that down to practically nothing. Create a new dataset with compression off and re-test against that path, and you will get a much more accurate picture of real performance.
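Something along these lines, reusing the pool name from the earlier dd command (the nocomp dataset name is just an example):

zfs create -o compression=off SSD/nocomp
dd if=/dev/zero of=/mnt/SSD/nocomp/testfile1 bs=1M count=100000 oflag=sync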


I just updated to TrueNAS SCALE 24.10 and have noticed a similar issue on my NAS. I have a 10Gbit fiber connection on the server, but due to CPU limitations I normally get around 360MB/s (~2.8Gbit/s) in consistent transfers. Since the update I get 350MB/s, then drop to 0MB/s for a second or two, and then back to 350MB/s; when the graph updates it looks like it dropped to 50MB/s or so, but it truly was just stopping. As the prior post advised, compression might have something to do with it. I thought I had set it to off, but it was LZ4; I changed my compression to off and now my transfers are far more consistent, around 380MB/s +/-, with no more pauses.

My desktop Asus TUF B450M motherboard has a 1Gbit Realtek RTL8111H NIC, so I'm thinking the Prime series has the same. Contributors to this forum have said that Realtek NICs have unreliable performance in TrueNAS, so that may be part of what's causing the transfer drop-off.
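Easy enough for the original poster to verify, e.g.:

lspci | grep -i -e ethernet -e network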

Out of curiosity, what is the sector size you are using? 4K?
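Both are easy to read off if you're unsure (pool name assumed to be SSD):

lsblk -o NAME,PHY-SEC,LOG-SEC   # physical/logical sector sizes of the disks
zpool get ashift SSD            # 12 means 4K allocation blocks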

ZFS Record Size: ZFS stores data in records, 128KB by default, and this can impact write performance when dealing with large or small files. If the record size is a poor match for your workload, performance suffers.

Compression: Are you using compression on ZFS? If so, depending on the type and how much data you’re writing, compression could either help or hinder write performance, especially on smaller files.

ZIL (ZFS Intent Log): If you’re writing a lot of small files or doing synchronous writes, check if your ZIL (write-ahead log) is on a fast device, like an SSD. If the ZIL is on spinning disks, it can slow down write performance.

Cache Devices: A separate write log device (SLOG) can improve sustained synchronous write performance for write-heavy workloads; a dedicated cache device (L2ARC) accelerates reads rather than writes.

SSD Endurance and Health: Since SSDs have limited write endurance, keeping an eye on their health (via SMART stats, for example) and making sure you're not hitting their write limits is also worth considering. The commands sketched below show how to check most of the points above.
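A rough checklist, assuming the dataset from the first post (SSD/data_fast) and that the SSDs show up as /dev/sda and /dev/sdb:

zfs get recordsize,compression,sync SSD/data_fast
zpool iostat -v SSD 1     # watch whether one mirror side lags the other
smartctl -a /dev/sda      # repeat for /dev/sdb; check the wear and error attributes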

Hi,

dedup is not used.

The sector size of the SSDs is 512 bytes; the pool uses 4KB (ashift=12).

I created a new dataset without compression and ran dd again. Watching htop, I first see a throughput of ~520MB/s, then it drops to ~20MB/s, then back to ~520MB/s, and so on. So the behaviour is the same as over SMB. The average throughput is around 82MB/s. Looks like this behaviour is down to the consumer SSDs I'm using.
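As a rough sanity check of those numbers: if the transfer simply alternates between a fast phase at ~520MB/s and a slow phase at ~20MB/s, then an 82MB/s average implies

520*f + 20*(1-f) = 82  =>  f ≈ 0.12

i.e. only about 12% of the time is spent in the fast phase, which fits a small pSLC cache that drains much more slowly than it fills.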

Thank you!


What CPU are you using? LZ4 is an incredibly lightweight compression algorithm, so I'd be surprised to find a system in the intersection of "10Gbps network connection" and "too slow for LZ4". There might be some weird edge cases for files that trip up LZ4, but usually it's in the opposite direction (a compressible file fakes it out and gets stored uncompressed).
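If anyone wants to rule the CPU out, the lz4 command-line tool has a built-in benchmark mode; any large local file will do (the path here is just a placeholder):

lz4 -b1 /tmp/somefile   # prints compression/decompression throughput at level 1

The single-core throughput it reports gives a rough ceiling to compare against your network speed.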

I have that server set up on a Dell R720 with dual Intel(R) Xeon(R) E5-2650 v2 CPUs @ 2.60GHz, and it pegs the CPU a lot, but since I'm running a RAIDZ1 of mechanical HDDs I doubt I could get much more speed out of them anyway.

I believe I have the same issue: at some random point in time, SMB writes/reads dropped to extremely low levels (also to ~23-25MB/s, though I don't know if that is a coincidence). In my case it has nothing to do with the ZFS dataset itself (i.e. I don't have dedup enabled) and I am using Seagate IronWolf Pro CMR drives designed for NAS.

Basically everything was working as expected (I have been running TrueNAS SCALE for ~6 months) and then suddenly, 2-3 days ago, read/write speeds dropped. I am going to restart the machine tomorrow to see if that helps. In my case I have an Intel network card (specifically an Intel Corporation Ethernet Controller I225-V (rev 03)).
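Before restarting, I'll check the NIC's error counters first, something like this (my interface name will differ):

ip -s link show enp1s0              # look at the rx/tx error and drop columns
ethtool -S enp1s0 | grep -i -e err -e drop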