If you have trashy SSDs then chances are they could fail before a rebuild completes; by your own definition they are trashy. You likely mean “moderately priced” rather than “trashy”. If you were buying any old crap off AliExpress then I’d argue your stats might not pan out.
5 of the 12 drives are from one brand, 7 from another. Batches are generally more of an issue than brands, and the Backblaze stats can help with what not to buy. I have drives at 70k hours, some at 90k and some at 3k, across two brands (there really isn’t much choice any more), bought at different points in time from different vendors to diversify the batches. A secondary instance backs up the primary, and any “cannot lose” data is also replicated to Backblaze B2 as stated previously, because RAID is just redundancy, never a backup.
For me to lose data I’d need to lose 3 drives per VDEV across one primary and one secondary instance and for Backblaze to utterly shit itself.
My pool will always be more at risk than your made-up mirror, as mine actually exists.
I still maintain that a lot of homelabbers chase complicated setups they simply don’t need, and the added complication puts their data at extra risk. That’s fine for test rigs but not really for your main data store.
As the OP stated that “data safety” was a priority, along with throughput for Plex and something simple to manage long term, I’d argue a simplified setup and the old 3-2-1 rule matter more than IOPS-chasing funk.
I’m not sure whether this is relevant here, but isn’t there a trick whereby you can use an include statement in your app YAML in the UI and thereby effectively use Docker Compose with more flexibility? Might have been a TechnoTim video.
I only host my containers in a VM because when I moved from Core, SCALE still had Kubernetes, and I’m wary of where they’ll end up with the app system. The include-statement trick is nice because you get the monitoring in the UI.
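If it’s the Compose include feature I’m thinking of, the gist is roughly this; the paths and service below are placeholders and I’m not certain the UI box accepts every top-level element, so treat it as a sketch rather than the exact TrueNAS syntax:

```yaml
# Compose YAML pasted into the custom app / "install via YAML" box (assumption:
# the box accepts the top-level include: element from the Compose spec).
include:
  - /mnt/tank/apps/compose/media-stack.yaml   # placeholder path
  - /mnt/tank/apps/compose/monitoring.yaml    # placeholder path

services:
  # Anything defined here is merged with the included files and still shows
  # up under the one app in the UI, so you keep the built-in monitoring.
  whoami:
    image: traefik/whoami
    restart: unless-stopped
```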
You are right, trashy is the wrong word. For me it means Crucial M500, Samsung 850 EVO, ADATA SSDs. AliExpress I won’t even consider.
Fair
But why is a special vdev complicated?
Also, I would argue that unlike you, most homelabbers buy eight of one single drive from the same batch, whichever happens to be on sale.
And a RAIDZ2 that is 8 drives wide from one batch is, 8 years down the road, more at risk than a mirror of two different brands, IMHO.
BLUF: @sara, you are right that 10 GbE will cap a single large sequential read to one client; the L2ARC still helps by cutting seek latency, improving mixed workloads and concurrency, and accelerating server-side tasks that never touch the NIC.
A few concrete cases where L2ARC > RAIDZ HDDs even on 10 GbE:
Mixed or random reads. HDD seeks are ~8–12 ms; NVMe read latency is ~80–150 µs. Directory walks, small reads, thumbnails, DB queries, and Plex scans feel much snappier from L2ARC, even if the wire cap is 10 Gb/s.
Many clients or parallel streams. Ten 1 GbE or four 2.5 GbE consumers can saturate disks with seeks; serving hot chunks from L2ARC keeps tail‑latency low and avoids thrash.
Server‑local consumers. Containers and VMs on the same box read at PCIe speeds; these bypass the 10 GbE cap entirely and benefit directly from NVMe.
Special vdev + L2ARC complement. With special_small_blocks=16K, metadata and ≤16 K data live on NVMe; L2ARC can still hold frequently re‑read 128 K–1 M record blocks from large files. That explains why I see ~700 GB resident in L2ARC (rough commands for this layout are sketched after this list).
Avoiding spin‑ups and head movement. Even if throughput is the same on the wire, serving hot data from NVMe reduces disk wakeups; lower noise and power, faster “time‑to‑first‑byte.”
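For reference, a minimal sketch of that special vdev + L2ARC layout, assuming a pool called tank with a media dataset and three spare NVMe devices (pool, dataset, and device paths are placeholders, adjust to your own hardware):

```sh
# Special vdev holds metadata and small blocks; it is pool-critical,
# so mirror it: losing the special vdev loses the pool.
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Send metadata plus data blocks of 16K and smaller to the special vdev.
zfs set special_small_blocks=16K tank/media

# L2ARC is only a read cache, safe to lose, so a single device is fine.
zpool add tank cache /dev/nvme2n1

# Check how much has landed on the NVMe devices.
zpool iostat -v tank
```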
If you want to double-check whether it is helping in practice, a few quick measurements:
Watch ARC and L2ARC hits while doing your “real” workload: arcstat 1
Focus on hit% and l2hit%, plus mfu/mru movement.
See whether the disks are being spared work: zpool iostat -v <pool> 1
Compare HDD read IOPS with and without L2ARC populated.
Measure interactive latency rather than only throughput: time a Plex library scan, a Nextcloud thumbnail view, or find over a large tree; repeat after warming cache.
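A quick way to make that last one repeatable, with placeholder paths; the first pass is your “cold” number, the second your “warm” one:

```sh
# Pass 1: cold-ish, most metadata and data has to come off the spindles
time find /mnt/tank/media -type f | wc -l

# Pass 2: warm, repeated once ARC/L2ARC hold the hot blocks
time find /mnt/tank/media -type f | wc -l

# In a second shell, watch hit rates and HDD IOPS while the passes run
arcstat 1
zpool iostat -v tank 1
```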
Rule of thumb I use:
One significant sequential transfer to a single 10 GbE client: L2ARC rarely increases peak MB/s; it can still reduce start‑up latency.
Anything with many small reads, concurrent users, or local containers/VMs: L2ARC materially helps; special vdev handles the tiny stuff, L2ARC keeps hot bigger blocks off the spindles.
Happy to compare notes on specific numbers if you share an arcstat snapshot during a Plex scan and a multi‑stream read test.