Hello Guys,
Does metadata have any impact on performance, in terms of transfer speeds, when copying to or from a NAS?
Metadata in your RAM (i.e., the ARC) will improve your file listings, browsing, and overall snappiness.
More RAM = more room for ARC = more metadata can be loaded and stored in your ARC
Adding a “metadata drive” (by which I think you mean a special vdev? Or an L2ARC vdev?) should only be entertained if you cannot get better performance by upgrading your RAM or adjusting a tunable.
Yes, the special vdev.
Then the above applies. Keep in mind that you cannot remove a special vdev after it’s already been added.[1]
With the exception of a pool made up only of mirrors. ↩︎
Plus, it better have the same level of redundancy as the rest of the pool. If you add a single drive as special vdev to an otherwise redundant pool, and that drive fails, the data in your pool is toast. It is (like so many things in ZFS) not a cache.
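If you do go that route, attach the special vdev with redundancy that matches the pool. A hedged sketch, where the pool name “tank” and the device paths are assumptions, not your actual layout:

```shell
# Add the special vdev as a mirror so its redundancy matches the
# rest of the pool. "tank" and the device names are assumptions.
zpool add tank special mirror /dev/sdx /dev/sdy

# Verify it shows up under its own "special" class in the layout:
zpool status tank
```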
I use it with Samba servers for a larger (~20) group of Apple Time Machine users and the general perception is that it works well. I have not done any “hard” benchmarks.
Yes, yes. Aware of that.
Besides directory traversal and browsing, does it really make any improvement to file transfers?
Oh, yes. You’re absolutely right. I think a 2-way mirror is good. Can we call it a safe choice? Or say 2×2-way mirrors, a total of four disks. If one dies, a spare drive can be added to it.
Nice. I think for servers holding TM backups, it’s really necessary unless you’re on SSD data disks.
Lately, I’ve been noticing that it takes too much time to browse, preview files, or calculate file sizes, so I plan to implement that. So, what kind of drive is recommended for a metadata/special vdev? Like PLP, etc.? In addition, how do I determine what disk capacity I need for the metadata/special vdev? I’m open to Intel/Solidigm, SK Hynix, Micron. Would prefer Gen4. Just don’t want Samsung due to the issues I have read about on the internet.
Also, is a special vdev/metadata drive still required in the case of a flash-based NAS, like SATA SSD or NVMe/U.2?
And at last, for pools where data already exists, if I want to implement a metadata/special vdev, will all the metadata be moved to this new special vdev, or just the new data’s metadata? If the latter is the case, is there any way to move all the metadata safely to this new special vdev?
Rather 3-way mirror. Redundancy is more important than capacity.
For browsing, a persistent L2ARC for metadata does the trick without additional constraints on redundancy, and is reversible. Special vdev further speeds up writes.
Nothing fancy here. PLP not needed. But redundancy, redundancy and redundancy. (Obviously, high endurance cannot hurt.) Preferably no QLC as the drive will do lots of small writes.
What would you try to achieve by speeding up what is already fast? The whole point of a special vdev is to be faster than the rest of the storage.
Only new data. To populate the special vdev, destroy and restore from backup or run a rebalancing script.
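The rebalancing idea can be sketched in a few lines of shell: rewrite each file in place so ZFS allocates fresh blocks, whose metadata then lands on the special vdev. This is a bare-bones illustration and `rebalance_dir` is a hypothetical helper, not the community script; real rebalancing scripts also preserve attributes and verify checksums before replacing anything.

```shell
# Sketch of in-place rebalancing: copying a file forces new block
# (and metadata) allocations; moving the copy over the original
# completes the rewrite. rebalance_dir is a hypothetical helper.
rebalance_dir() {
    find "$1" -type f ! -name '*.rebalance.tmp' |
    while IFS= read -r f; do
        cp -p "$f" "$f.rebalance.tmp"    # copy allocates fresh blocks
        mv "$f.rebalance.tmp" "$f"       # replace the original
    done
}
```

Usage would be something like `rebalance_dir /mnt/tank/dataset`. Note that snapshots keep the old blocks referenced, so the original allocations are not freed until those snapshots are destroyed.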
A metadata L2ARC would be populated over time just by reading the data.
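To make “reversible” concrete: a cache vdev can be detached at any time without touching pool data, unlike a special vdev. A hedged sketch, with the pool name and device path assumed:

```shell
# Add an NVMe device as L2ARC (a "cache" vdev). "tank" and the
# device path are assumptions.
zpool add tank cache /dev/nvme0n1

# Changed your mind? Cache vdevs can be removed at any time;
# the pool's data is unaffected.
zpool remove tank /dev/nvme0n1
```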
Hello @etorix
So sorry for the late reply. I was busy with work!
Okay, so four drives, two in each mirror: would that setup work? Would that be considered safe?
I’m just so scared of setting up a metadata drive from what I’ve read on the forum. But then, browsing is really slow with such large-capacity disks.
Wait, L2ARC and the metadata drive are two separate drives, right? Is L2ARC a different type? I thought the metadata drive is a special vdev and has nothing to do with L2ARC. Can you please explain?
Also, what do you mean by “is reversible”? From what I’ve read on the forum, if I lose the metadata drive, I lose the pool, so how is that reversible? Or are you talking about the L2ARC?
On a spare NAS, I tested the metadata vdev and it really makes the system faster, but not the L2ARC or SLOG. So, are you sure I need to set up an L2ARC? Is L2ARC one thing and “L2ARC for metadata” a different thing?
And what kind of form factor? Do you guys recommend using U.2, or will standard M.2 2280/22110 work? I have a couple of spare SK Hynix PC801 (Gen4) drives. Or does it have to be Optane? Can I reuse those for the metadata purpose? I can install them in an ASUS AIC or MSI Xpander (both are Gen4). Would it work with the help of an AIC?
Also, are you sure I don’t need the PLP feature on the special vdev (metadata) drive? Secondly, what should the capacity be? My drives are 16×16 TB, configured as two 8-wide RAID-Z2 vdevs. That’s a fine configuration for the data vdevs, right? I still have redundancy of four drives in total, two per vdev, and the NAS would still be operational, yeah?
Cool, I get you. Sorry, I’m just new to this whole NAS world.
Gotcha. Can you link me to the script, please?
Have a look at the sVDEV resource, it might answer most of your questions, including how to rebalance.
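For the capacity question, one hedged way to estimate how large a special vdev needs to be is to ask ZFS how much metadata the pool already holds. The pool name is an assumption, and a block-statistics walk can take a very long time on a pool this size:

```shell
# Print per-category block statistics, including total metadata size.
# "tank" is an assumption; expect a long runtime on a 16x16TB pool.
# On some TrueNAS versions you may need to point zdb at the system's
# zpool.cache file with -U.
zdb -bb tank
```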
Indirectly, it can improve file transfer performance, if the transfer operation involves many files or a large crawl. It’s especially noticeable for rsync.
L2ARC (or “Cache” vdev) is supplemental to your ARC in RAM. Think of it as increasing the capacity of your ARC, yet data that resides in the L2ARC is still slower than data that lives in the ARC (RAM).
If you suddenly lose power, and hence the ARC in RAM vanishes, your pool is still intact. Just reboot and you’re back in business. This is true for the L2ARC, as well. The only difference is that if your L2ARC device fails, you just lose out on its (possible) benefits, since it no longer exists.
The reason the term “metadata” causes confusion between an L2ARC and a Special vDev is because they can each be configured to boost metadata performance.
By default, the L2ARC can hold data and metadata, yet it can be set to hold only metadata (so that you assure the L2ARC is used exclusively for metadata, while data is cached in the ARC alone, never the L2ARC).
This is a dataset property called secondarycache. You can change it from secondarycache=all to secondarycache=metadata.
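For instance, assuming a hypothetical dataset named tank/media:

```shell
# Keep only metadata in the L2ARC for this dataset.
# "tank/media" is an assumed dataset name.
zfs set secondarycache=metadata tank/media

# Confirm the property took effect:
zfs get secondarycache tank/media
```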
A Special vDev is different. It is not only considered integral to the pool’s health, but it is also written to. Even if it only receives (and stores) metadata writes, you cannot afford for it to fail. Losing it would be as devastating as losing a storage vdev beyond its redundancy protection.
I really recommend seeing if you can boost metadata performance by adjusting a tunable and/or increasing your RAM before committing to a metadata or special vdev. It’s much, much, much safer, and you might be surprised that you can gain the same benefits without any drastic changes to the pool.
Yup, an L2ARC can be a great choice, with no risk, to try out and see if speeding up metadata is helpful. It’s important to have 64 GB+ of RAM to prevent ARC starvation, but other than that, set the L2ARC to metadata=only as well as persistent (persistent is the default on SCALE, but not Core), then let the L2ARC “warm up” and see if you like the system better with the L2ARC enabled.
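On SCALE those settings look roughly like this; the pool name is an assumption, and l2arc_rebuild_enabled is the OpenZFS module parameter behind “persistent”:

```shell
# Metadata-only L2ARC at the pool's root dataset (inherited below).
# "tank" is an assumed pool name.
zfs set secondarycache=metadata tank

# Persistent L2ARC is governed by this OpenZFS module parameter;
# it defaults to enabled (1) on recent releases.
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled

# After the warm-up period, check ARC/L2ARC hit rates:
arc_summary
```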
See here for some comparisons of L2ARC vs. sVDEV when it comes to a lot of directory traversals. That article also links back to older L2ARC testing. The TL;DR is that a good sVDEV will beat an L2ARC on speed, but it also carries significant risks.