Is the "80% problem" in ZFS still relevant

Hi everyone, hope you’re all doing well!

I’ve been revisiting some older articles and videos about ZFS and came across the famous “80% rule.” The idea is that once your pool reaches about 80% capacity, performance starts to degrade because ZFS shifts into a “space-saving mode,” making writes slower as free space becomes tighter. You can read more about it in this article:

https://www.45drives.com/community/articles/zfs-80-percent-rule/

However, this content is about five years old, and ZFS has seen continuous development since then. I’d like to ask:

  1. Is this “80% rule” still something we should take seriously in 2024?
  2. In practice, at what point do you typically notice performance degradation in ZFS pools?
  3. If this issue has been mitigated in recent versions, are there any official recommendations for maximizing pool usage without compromising performance?

I’d really appreciate your insights and experiences. Thanks in advance!

It is not a hard limit, but it is as relevant as ever. Depending on your workload, you may notice degradation much earlier; the recommendation for block storage is to stay below 50%.

2 Likes

Iirc, ZFS used to switch its allocation strategy at 90%, and now it’s 95%.

The article confirms that the performance cliff is at “about 94%”.

Thus, the more accurate advice (and yes it’s still relevant) is to begin planning your capacity upgrade when you hit 80% and have it completed before you hit 90%.

Ergo, nothing has changed just because 5 years have gone by, but there was never an “80% problem” to begin with.

The 50% thing for block storage is to avoid excessive fragmentation.
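
If you want a quick way to keep an eye on both capacity and fragmentation, here’s a minimal sketch. It assumes the standard OpenZFS `zpool` CLI is installed and on the PATH, and the 80/90 thresholds are just the planning guidance above, not anything enforced by ZFS:

```python
#!/usr/bin/env python3
"""Print capacity and fragmentation per pool, flagging the 80%/90% planning points.

Assumes the standard OpenZFS `zpool` CLI is installed and on PATH.
"""
import subprocess

PLAN_AT = 80    # start planning the capacity upgrade here
FINISH_BY = 90  # have the upgrade finished before this point

# -H: script-friendly output (no headers, tab-separated fields)
out = subprocess.run(
    ["zpool", "list", "-H", "-o", "name,capacity,fragmentation"],
    check=True, capture_output=True, text=True,
).stdout

for line in out.splitlines():
    name, cap, frag = line.split("\t")
    cap_pct = int(cap.rstrip("%"))  # capacity prints as e.g. "80%"
    print(f"{name}: {cap_pct}% full, fragmentation {frag}")
    if cap_pct >= FINISH_BY:
        print(f"  -> past {FINISH_BY}%: the upgrade should already be done")
    elif cap_pct >= PLAN_AT:
        print(f"  -> past {PLAN_AT}%: start planning the upgrade")
```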

1 Like

Where would this 50% recommendation be documented?

Well, 96% is where the SPA goes from first-fit to best-fit allocation. Even prior to that, it prefers other metaslabs if the first-fit candidate is more than 70% free-space fragmented, and it won’t write to a metaslab that’s 95% allocated unless all the others are similarly allocated. If you’re using spinning disks, it will also bias towards lower LBAs, since those are on the outer tracks of the disks and therefore have a higher transfer rate, angular velocity being constant … :nerd_face:
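
If you’re curious where those knobs live on your own system, this is a rough sketch for Linux with OpenZFS, where they surface as module parameters. The parameter names are taken from recent OpenZFS releases and may differ on your version; FreeBSD exposes the equivalents as vfs.zfs.* sysctls instead:

```python
#!/usr/bin/env python3
"""Peek at the OpenZFS allocator tunables behind those thresholds (Linux only).

Parameter names are from recent OpenZFS releases and may not exist on older or
non-Linux builds; treat missing entries as "check your platform's docs".
"""
from pathlib import Path

PARAMS = [
    "metaslab_df_free_pct",                  # below this % free, first-fit switches to best-fit
    "zfs_metaslab_fragmentation_threshold",  # metaslabs more fragmented than this get skipped
    "zfs_mg_fragmentation_threshold",        # metaslab groups past this are avoided (unless all are)
    "metaslab_lba_weighting_enabled",        # bias towards lower LBAs on rotational media
]

base = Path("/sys/module/zfs/parameters")
for name in PARAMS:
    path = base / name
    value = path.read_text().strip() if path.exists() else "not present on this version"
    print(f"{name}: {value}")
```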

It’s not so much a singular “cliff” point where “beyond this point be dragons” as it is a general deceleration as you fill up the space; and workload is highly relevant to the point at which you decide performance has “slowed.”

Ref. old forum post of mine:

As with so many things in ZFS, especially when performance is being discussed, the phrase “results may vary” applies strongly to how full you can make your pool. Contributing factors include:

  • Hardware resources available (total size of pool, amount of memory, speed of vdevs)
  • Granularity of data access both in recordsize and actual client-facing I/O
  • Volume and ratio of reads to writes
  • Nature of the workload in terms of CRUD (Create, Read, Update, Delete)
  • Your barometer of “reasonable” or “acceptable” performance

One user may be able to fill right up to 99.9% full because they’re just archiving endless amounts of questionably-ethically-acquired media, and never deleting any of it. When they run out of space, they add more vdevs, JBODs, etc. They don’t care about the performance because they’re never accessing it at rates beyond whatever an H.265 Blu-Ray rip comes out at these days.

Another user might wire together a few dozen 2TB SAS drives and never allow it to fill beyond 25% capacity because they want to carve out LUNs for a devops team to screw around with, resulting in a bunch of random I/O and overwrites.

There’s been a fair amount of, as @jgreco aptly described it, “fossilized knowledge” spread around re: ZFS, some of it accurate for its time and some of it out to lunch to begin with. I have no doubt that some of the things I’ve said in the past, am saying now, and will say in the future will be inaccurate, and I hope that anyone coming upon such a statement will look at the context surrounding it as well. (Although I’m reserving the right to be an old fogey about SMR drives for an indeterminate period.) So I’m glad to see that when a conversation comes up about some of the “sacred cows” of ZFS, we can all actually have debates and discussion over it and bring up relevant points like changes in technology (affordable NAND, the IOPS/TB issue HDDs face, which vendors are trying to beat back with multi-actuator drives) rather than just throwing ad hominems and le downvotes like another site that may start with R and end with Eddit.

4 Likes

NOTE: non-spinning-disk storage perhaps obviates this.

2 Likes

Is the content specifically related to configuring and optimizing ZFS pools for iSCSI usage, particularly in virtualization and block storage environments?

That’s it: zvols for virtual machines, and iSCSI (block storage, for whatever reason).

1 Like

Mitigates, but does not obviate entirely.

SSDs obviously sidestep the physics of read/write heads needing to be aligned over the track you want to read - all NAND cells on an SSD are effectively the “same physical distance” away, so LBA weighting gets disabled for media that properly identifies itself as non-rotating.

However, SSDs introduce their own fun challenges regarding NAND page programming and erasure. Similarly to how ZFS works at its fastest and most efficient when it has a clear stretch of empty space to write new data into, NAND works best when fully programming a completely empty page - but SSDs can write at a much smaller granularity than they can erase.

I’ve made some posts on this in the past, mostly in the context of SSD overprovisioning, but TRIM also makes a showing here as a contributor.

SSD partitions for L2ARC, SLOG, Metadata and Boot
SSD / Optane overprovisioning
How to over-provision NVMe disks?
The path to success for block storage
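
On the TRIM point, here’s a small hedged sketch for checking whether a pool is relying on autotrim or needs a scheduled manual trim. It assumes the OpenZFS `zpool` CLI, and “tank” is just a placeholder pool name:

```python
#!/usr/bin/env python3
"""Check whether autotrim is enabled on a pool (OpenZFS `zpool` CLI assumed)."""
import subprocess

POOL = "tank"  # placeholder pool name -- substitute your own

# -H/-o value: print just the property value, no headers
value = subprocess.run(
    ["zpool", "get", "-H", "-o", "value", "autotrim", POOL],
    check=True, capture_output=True, text=True,
).stdout.strip()

print(f"autotrim on {POOL}: {value}")
if value != "on":
    # A periodic manual `zpool trim <pool>` (e.g. from cron) is the usual alternative.
    print("Consider `zpool set autotrim=on` or scheduling `zpool trim` runs.")
```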