ZFSisms that are not true, or no longer true

This statement makes me think that you’re assuming activity = writes only, which is obviously not the case.

How exactly are you replacing the disks simultaneously? In a RAIDZ2, I guess you could replace 2 at once, but then you’d be left with no parity. I’m pretty sure the majority of people replace just 1 disk at a time.

Oh yeah, sure you can. 90% of people don’t do that. The majority of people barely have enough money (and complain a lot) to even buy ECC gear, let alone buy a full disk shelf just to upgrade.

I actually hold the same opinion here, and I have never had multiple disks fail in quick succession. But I do know a lot of people who fret over disk failures. You can see that fear in how rarely anyone around here recommends RAIDZ1.

I’ve also definitely noticed pool performance drop quite significantly while resilvering, and my use case is a bit different from most people’s, as I actually use my pool for block storage and need every drop of IOPS. Finally, I’d also rather have the flexibility to upgrade my pool by simply buying 2 disks at a time rather than x-wide disks at a time.

Yes, but I need details of “misconceptions” for the items, not discussions of the various tunables that various people use (unless one changed noticeably or was itself a misconception).

Plus, my goal was a short paragraph per item, not 20 lines…

@Arwen I would maybe add a few other common misconceptions I’ve personally seen pretty frequently.

People tend to equate regular RAID levels with ZFS:

  • RAIDZ1 → RAID 5
  • Mirror → RAID 10

Another one I see maybe less frequently is that some people mistake RAIDZ expansion for being able to change the type of your vdev, i.e. upgrade RAIDZ1 to RAIDZ2.

Another one that’s also semi-common, and I think isn’t explained well even in the documentation: redundancy exists at the zpool level rather than the VDEV level.

Well, none of those were ever true and have not changed.

Maybe the RAID-Zx expansion entry should note that it does not add parity.

Wrong. It is the opposite. The vDevs have all the redundancy, except “copies=2/3”.

Yes, I know it’s wrong, but many people understand it the other way, which is why I listed that.

To be fair… the type of activity wasn’t specified, but I’ve never seen regular use tank a scrub/resilver into taking days or weeks.

Just above this quote I made the assumption that, since you were talking about each disk being read 7 times, you were talking about upgrading the entire VDEV with bigger disks via replacement, which is one of the only times that would normally happen. The rest of what I said is based on that.

Which is why I called it an online replacement; online replacements offer additional safety by not degrading the pool during the replacement.

RAIDZ1 really only makes sense for smaller pools… let’s say 5-wide at most, and as long as you are OK with the compromise… if you are building bigger pools, it makes more sense to have a VDEV twice as wide with twice the redundancy, since that increases the self-healing potential with the same storage efficiency. (Bad cables and controllers are a significant concern.)
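
A quick sketch of the “same storage efficiency” arithmetic (my own illustration, not from the post):

```python
# Raw storage efficiency of a RAIDZ vdev is (width - parity) / width,
# ignoring allocation/padding overhead, which varies with recordsize.
def raidz_efficiency(width, parity):
    return (width - parity) / width

print(raidz_efficiency(5, 1))   # 5-wide RAIDZ1  -> 0.80, survives 1 failure
print(raidz_efficiency(10, 2))  # 10-wide RAIDZ2 -> 0.80, survives 2 failures
```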

Consider adjusting this memory value if your scrubs/resilvers are slow… the default for sequential scanning is 5% of RAM, which (probably) isn’t enough. This literally halves the scrub time on my 72-wide pool.
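
The exact tunable isn’t quoted here, but the “5% of RAM” default matches the OpenZFS module parameter zfs_scan_mem_lim_fact (a divisor of RAM, default 20), so assuming that’s the knob in question, a minimal sketch:

```python
import os

# Assumption: the knob being referenced is zfs_scan_mem_lim_fact. It divides
# total RAM to cap the memory the sequential scan (scrub/resilver) code may
# use, and the OpenZFS default of 20 is where the 5% figure comes from.
def scan_mem_limit_gib(ram_gib, lim_fact):
    return ram_gib / lim_fact

param = "/sys/module/zfs/parameters/zfs_scan_mem_lim_fact"
if os.path.exists(param):
    with open(param) as f:
        print("current divisor:", f.read().strip())

# On a 64 GiB machine: default divisor 20 -> ~3.2 GiB, divisor 10 -> ~6.4 GiB.
print(scan_mem_limit_gib(64, 20), scan_mem_limit_gib(64, 10))
```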

Also, these are absolutely valid reasons to use mirrors.

Oh. Yes, I will add that one.

I’m happy to start up another one to discuss it in more detail.

ZFSisms that are no longer true…

“You can’t remove a VDev after adding it”

“You can’t expand a Raidz VDev”

“You can’t build a hybrid SSD/hd pool”

Partially Debunked: The presence of a raidz type vdev in your pool prevents vdev removal from working. :wink:

Fair enough. The “ZFS-isms” about L2ARC would be:

  • Bump RAM to 64 GB or more before even considering L2ARC.
  • L2ARC should be between 5 times and, at the utmost, 10 times RAM (ARC); see the quick numbers below.
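
Purely for concreteness, the numbers the old guidance implies, using the 64 GB figure from the first bullet (arithmetic only, not a recommendation):

```python
def old_l2arc_rule_gib(ram_gib):
    # The old rule of thumb taken literally: L2ARC between 5x and 10x RAM/ARC.
    return 5 * ram_gib, 10 * ram_gib

print(old_l2arc_rule_gib(64))  # (320, 640) GiB of L2ARC for a 64 GiB machine
```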

As I understand it, the point of this thread would be to ask whether any of this guidance (not “rules”) should be revised.

Depends. Two drives failing in the same vdev could bring down my pool. But also, 3 drives failing, each from a different vdev, would not bring down the pool.

Not even three, but just two drives could bring down the 2-way mirror example I made.
But yeah, that is exactly my point :slight_smile:
I say that the WD and the Seagate from the same vdev failing at the same time is less likely to happen than 3 WDs failing in a 6-wide RAIDZ2.
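
To make the “which failure combinations are fatal” comparison concrete, here is a small enumeration (my own sketch, assuming the failures land within one rebuild window) for 6 drives arranged as three 2-way mirrors versus one 6-wide RAIDZ2:

```python
from itertools import combinations

DRIVES = range(6)
MIRRORS = [{0, 1}, {2, 3}, {4, 5}]  # the same 6 drives as three 2-way mirrors

def mirror_pool_dies(failed):
    # The mirror pool is lost if any vdev loses both of its drives.
    return any(vdev <= failed for vdev in MIRRORS)

def raidz2_pool_dies(failed):
    # A single 6-wide RAIDZ2 vdev survives any 2 failures, dies on 3 or more.
    return len(failed) > 2

for k in (2, 3):
    combos = [set(c) for c in combinations(DRIVES, k)]
    m = sum(mirror_pool_dies(c) for c in combos)
    z = sum(raidz2_pool_dies(c) for c in combos)
    print(f"{k} failures: fatal for mirrors in {m}/{len(combos)} combinations, "
          f"for RAIDZ2 in {z}/{len(combos)}")
```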

Agreed :slight_smile:

Agreed again.
But for the more realistic example of 6 SATA ports and 2-way mirrors, RAIDZ2 has:

  • worse reliability
  • way worse performance
  • only 33% more storage (absolute best case without any padding overhead, so you’d better not use zvols; see the quick math after this list)
  • 5 years down the line you need to replace all 6 drives before you get any more storage, while with mirrors you can replace two drives with larger ones and already get more storage
  • can’t remove a special vdev later (vdev removal doesn’t work once a RAIDZ vdev is in the pool)
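
The 33% figure above is just the ratio of usable drives in the best case (my arithmetic, before any RAIDZ padding overhead):

```python
# Usable capacity from 6 equal drives, best case (no RAIDZ padding overhead):
drives = 6
mirror_usable = drives // 2   # three 2-way mirrors -> 3 drives' worth of space
raidz2_usable = drives - 2    # one 6-wide RAIDZ2  -> 4 drives' worth of space
print(raidz2_usable / mirror_usable - 1)  # ~0.33, i.e. "only 33% more storage"
```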

I’ve just pulled both my CACHE devices; I’m zeroing them out and I’ll add them back after I drop to 16 GB RAM. 690 GB total L2ARC.

I’ll run this configuration for one week and report back with numbers. For science.

I think a reliability section could be added (“mirrors are not always more reliable”), but I’m not sure how pervasive that belief is. @jro has an excellent resource to calculate this:
https://jro.io/r2c2/

I’ve actually seen this fairly frequently lately. I’m confused by the “2” though, since it cannot be set to 2?

https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#l2arc-mfuonly

I take issue with this as a general recommendation for a whole host of reasons, primarily because setting this tunable basically means you no longer have an Adaptive Replacement Cache, and instead you have a Most Frequently Used cache.

Good ones.

I’ll add the RAID-Zx caveat for the vDev removal.

I’ve added that the old “rule” of L2ARC size being between 5 and 10 times the size of RAM no longer holds true. And sort of never was true.

I have started an L2ARC Resource page:

Everyone please take the L2ARC discussions to that page.

That calculation has one huge flaw.

p, or the “probability of a single drive failure”, is treated as static over time.

But drives don’t fail that way. It ignores two things (the second is sketched below):

  • the bathtub curve
  • the bad batch problem
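
A toy Monte Carlo sketch of the bad-batch point (entirely my own illustration with made-up per-drive probabilities, not something from the linked calculator):

```python
import random

def raidz2_loss_rate(p_normal, p_batch, batch_size, trials=200_000):
    """Chance that a 6-wide RAIDZ2 loses more than 2 drives in one window."""
    losses = 0
    for _ in range(trials):
        failed = sum(
            random.random() < (p_batch if i < batch_size else p_normal)
            for i in range(6)
        )
        losses += failed > 2  # RAIDZ2 survives any 2 failures, dies on a 3rd
    return losses / trials

# Same average-looking drives, but three of them come from a bad batch:
print("independent failures  :", raidz2_loss_rate(0.02, 0.02, 0))
print("three bad-batch drives:", raidz2_loss_rate(0.02, 0.20, 3))
```

The exact numbers are invented; the point is just that correlated failures blow well past what a static, independent p would predict.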