Questions about Fusion Pools (special/metadata vdevs / sVDEVs)

The special vdev feature that enables Fusion Pools was added to ZFS after I last did all my learning and testing.

I never did play with it.

So, I have some questions :slight_smile:

What sort of size is required?

From reading the manual, it seems like a set of three M.2 NVMe drives would be a good idea: two in a mirror plus a spare. (Why not a three-way mirror instead, then?)

What sort of drive performance is needed? Crappy QLC? TLC? Optane?

I assume that if the drives begin to fill up, the effects are noticeable, and it would be possible to just replace them with larger ones to increase the vdev size.

Is the gain just from having metadata on SSD? Metadata is written synchronously, right? So a SLOG would help with that, meaning the device doesn’t really need Optane-like performance?

What sort of gains do people see? In what workloads?

Would you still bother on an all-flash pool?

Is there any guidance on when to consider their use?

Anything else to consider? Perhaps this topic can be a good long-running discussion :wink:

LOL, I just started a long reply, so I will recycle some of that content here.

Planning / Preparation
A sVDEV requires a lot more planning, but it can consistently speed up operations on small files and metadata, see here. For sVDEV use, TrueNAS will split your drive into two halves: one for small files, the other for metadata. Unlike L2ARC, if your sVDEV goes, so does your pool, so I use a three-way mirror of enterprise-grade SSDs. You will have to look into how much room your small files take up in order to plan the proper SSD capacity for a sVDEV.

To plan for a sVDEV, you need to review the size distribution of small files in your pool, both current and expected. ATM, I cannot recall the CLI commands that compile that information, apologies. If the small-file side of your sVDEV overflows, small files will be written to the slower main pool and the small-file benefit of the sVDEV will be diminished.
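
For reference, the block-size histogram that `zdb` prints is a common way to get that distribution; a minimal sketch, assuming a pool named `tank` (hypothetical name, and note that this walks all block pointers, so it can take a while on a large pool):

```
# Print block statistics, including a block-size histogram.
# -L skips leak detection (much faster), the repeated -b flags add
# detail, and -s reports I/O statistics while it runs.
zdb -Lbbbs tank
```

The histogram's cumulative size columns show how much data sits at or below any candidate small-block cutoff.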

For my use case, a 1.6TB sVDEV seems to be sufficient (50TB pool). However, I made a point of nuking every small file I could by consolidating them into sparsebundles and similar archives on the server. My small-file size limit is 32kB, but your limit will vary with your use case and should be chosen carefully.
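
For context, that cutoff is the per-dataset `special_small_blocks` property; a minimal sketch, with `tank/archive` as a hypothetical dataset:

```
# Route blocks of 32K or smaller to the special vdev
# (0, the default, means only metadata goes there).
zfs set special_small_blocks=32K tank/archive

# Careful: if this value is >= the dataset's recordsize,
# every block counts as "small" and lands on the sVDEV.
zfs get special_small_blocks,recordsize tank/archive
```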

Similarly, you should determine how much metadata your pool needs. This helpful post has the CLI command for that. Remember, only half of your sVDEV capacity will go towards metadata, so plan accordingly (i.e. allow sufficient room based on how big you expect your pool to grow). The rule of thumb (and it will vary as the use case dictates) is that metadata will be about 0.3% of your pool size.
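
As a back-of-the-envelope check of that rule of thumb (numbers purely illustrative):

```
# metadata ≈ 0.3% of pool size; for a 50 TB pool:
echo "50 * 1000 * 0.003" | bc    # ≈ 150 GB of metadata
# Following the 50/50 split described above, half of a 1.6 TB
# sVDEV (~800 GB) would leave ample headroom for growth.
```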

Performance
I found Finder / directory traversal / rsync performance to be better than with a persistent, metadata-only L2ARC, but not by that much. Small-file performance is incomparably better than that of a single-vdev HDD Z3 pool. I would only use enterprise-quality SSDs, in at least a three-way mirror. I keep cold, qualified spares on hand (Intel S3610 @ 1.6TB, IIRC).

What happens if the sVDEV fills up?
Unfortunately, the GUI still gives the admin zero insight into how full a sVDEV is, on either the metadata or the small-file side. So you will have to brush up on your CLI skills and check occasionally, especially if your pool is undergoing major changes, such as the addition of a busy database. If the partitions fill up, performance will likely become very uneven as some files end up in the main pool rather than on the sVDEV SSDs. I have a SLOG in my system (P4801X) and it’s likely far faster than the older S3610s. So if a fast SLOG helps, I have one.
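
Until the GUI exposes it, a quick CLI check is possible; a sketch, again with a hypothetical pool name:

```
# List capacity per vdev; the special mirror shows up as its own
# row with its own ALLOC/FREE/CAP figures, separate from the data vdevs.
zpool list -v tank
```

That gives the overall fill level of the sVDEV, though not the split between metadata and small files.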

When to use it
Besides the standard Edmund Hillary / George Leigh Mallory response (“because it’s there!”), I’d argue sVDEVs are very useful in pools holding a lot of small, frequently changing data. Unlike an L2ARC, a sVDEV does not need to get “hot”, as the metadata already lives on fast SSDs. Similarly, HDDs, and ZFS in particular, suffer when writing small files due to all the latency involved in waiting for write confirmations.

I use a sVDEV simply for the consistent speed it provides when browsing directories or making backups. Rsync backups just blow along at ridiculous speeds as unchanged files are compared. However, at least the metadata part can be emulated to a large extent with a properly sized L2ARC (persistent and metadata-only), and the better ARC implementation that is coming may speed things along further.
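
For anyone who wants to try that emulation today, a minimal sketch (pool name hypothetical; on OpenZFS 2.0+ the L2ARC is persistent by default via the `l2arc_rebuild_enabled` module parameter):

```
# Cache only metadata in the L2ARC for this pool:
zfs set secondarycache=metadata tank

# Confirm persistent L2ARC is enabled (1 is the default on OpenZFS 2.0+):
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled
```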

Anything else?
I have found sVDEVs to be a great tool if handled carefully. For example, DO NOT believe ZFS if it tells you that a previously 100% healthy pool is suddenly dead without reason. It could be something as simple as your sVDEV SSDs having lost a power connection, or the TrueNAS update process causing your SSDs to soft-brick themselves, necessitating a power cycle. Check the cabling and ignore bad advice from the GUI and the ZFS command line until you have exhausted all other options.

Found the original Intel/ZFS presentation on “segregated vdevs”

Went down the Wendell hole

Seems like if you have a bunch of VM zvols, you can avoid having a separate mirrored flash pool for the VMs; instead, their small volblocksize would make their blocks count as small blocks.

This basically means I could integrate my planned “VM pool” into the special vdev, giving a unified VM and large-file pool.

That seems neat.
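
A minimal sketch of that idea, with hypothetical names; the point is that a zvol whose `volblocksize` is at or below the `special_small_blocks` cutoff will have all of its blocks allocated on the sVDEV:

```
# A VM zvol whose 16K blocks all qualify as "small":
zfs create -V 100G -o volblocksize=16K tank/vm-disk1
zfs set special_small_blocks=32K tank/vm-disk1

# Large-recordsize datasets in the same pool keep their bulk data
# on the HDDs, so one pool serves both roles.
```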

Just need to keep an eye on the capacity of the sVDEV.

Reddit discussion

https://www.reddit.com/r/zfs/comments/cm594b/why_is_nobody_talking_about_the_newly_introduced/

Yep.

Early information about special vdevs quite vehemently states that they can’t be removed once added, so get it right…

Later information says they can be, thanks to vdev removal, which is a newer pool feature.

Is this true? Does it work well? What is the permanent cost of removing a special vDEV, if you can?

Same rules as other vdev removals: possible if the pool consists entirely of mirrors, not possible if any raidz vdev is involved. Of course, special SSD vdevs are most useful for speeding up big bulk storage on HDD raidz# data vdevs.
If you can remove it, the main cost is going back to the performance of a regular pool; ZFS also keeps a small, permanent indirect-mapping table to redirect reads to the relocated blocks.
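
If those preconditions are met, the removal itself is a one-liner; a sketch with hypothetical names (check `zpool status` for the actual vdev label, e.g. `mirror-1`):

```
# Evacuate the special mirror; its data and metadata are copied
# back onto the remaining vdevs in the background.
zpool remove tank mirror-1

# Watch evacuation progress:
zpool status tank
```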

I added some content at the bottom of my Suggestion to beef up sVDEVs thread. I hope it’s a good starting point for a community-based resource page that goes into the how and why of sVDEVs, how to implement them effectively, and hopefully some illustrations of how they affect real use cases.
