Clarifications for Metadata VDEV

Hi everyone,

I’m planning a new server for my company’s motion department, and I think I’m well on my way to building the optimal machine.

The machine will roughly consist of the following:
EPYC or dual Xeon with ~16 cores combined, at the highest possible clock (SMB is single-threaded per connection as far as I know, so high clock = high performance per user? Max. ~10 users, so a couple of cores to spare)
256 or 384GB DDR5 RAM
10 or 15 x KIOXIA CD8-V in 5-wide RAIDZ1 vdevs (rough layout sketch below)
Probably a 100 Gbit NIC to the switch, which then distributes to the clients (all 10 Gbit)
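
Roughly what I have in mind for the pool layout, as a sketch only (device names are placeholders, and on TrueNAS I would of course build this through the UI rather than the shell):

```
# Sketch of the intended layout: three 5-wide RAIDZ1 vdevs of NVMe drives
# (placeholder device names; in practice the pool is created via the TrueNAS UI)
zpool create tank \
  raidz1 nvme0n1 nvme1n1 nvme2n1 nvme3n1 nvme4n1 \
  raidz1 nvme5n1 nvme6n1 nvme7n1 nvme8n1 nvme9n1 \
  raidz1 nvme10n1 nvme11n1 nvme12n1 nvme13n1 nvme14n1
```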

The goal of this machine is to eliminate storage and network bottlenecks, and of course to future-proof the setup.

As far as I can tell, the disks (in said config) should be able to saturate the 100 Gbit connection, all other variables aside.
If I’m not mistaken, L2ARC and the like won’t really help in this configuration, since the disks will already be so fast. But I was thinking of configuring the system so that all metadata gets kept in RAM for fast folder loading times. Is this a valid thought in this scenario? And my actual question: what happens to the metadata when the server restarts? Will it get flushed to disk, or does the server need to rebuild the metadata “tree” when it’s back up and running again, and how would this affect performance?
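
In case it matters, the knobs I’ve been looking at so far are the ARC size cap and the per-dataset cache policy. A sketch on SCALE/Linux; the pool and dataset names are made up and the value is just an example:

```
# Let the ARC grow large enough to hold all metadata (value in bytes, example: 320 GiB)
echo 343597383680 > /sys/module/zfs/parameters/zfs_arc_max

# Per-dataset: cache only metadata in ARC (note: this trades away data caching for that dataset)
zfs set primarycache=metadata tank/projects
```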

Please correct me if I’m wrong. I’ve been using TrueNAS for almost 10 years now, but only in a private environment; this is the first time I’m fiddling with metadata vdevs and these kinds of performance levels.

Metadata is primarily stored on disk.
After it is first accessed by a read, it usually gets copied into the ARC. If you restart the system, your ARC is empty again; only after you access the metadata again will it go back into the ARC.
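
If you want to watch this yourself, you can check how much of the ARC is currently holding metadata (on SCALE/Linux). Right after a reboot these numbers will be near zero and grow as directories get traversed:

```
# Raw ARC counters, including the metadata portion of the cache
grep -i meta /proc/spl/kstat/zfs/arcstats

# Human-readable summary tool shipped with OpenZFS
arc_summary | grep -i meta
```
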
A special metadata vdev only makes sense if you need to randomly access metadata across your entire dataset while not having enough RAM to keep that metadata in ARC,
AND you also have a device faster than your pool.
You seem to have an SSD pool already so I think there would be no real benefit.

E.g. I use a metadata vdev to augment my HDD pool, so I can open and search folders with many files in them faster. I can also offload small files to the metadata vdev, which makes the pool faster, since the HDDs only need to deal with bigger files and operate closer to their ideal streaming workload. But an SSD pool has no need for any of this.
Also keep in mind that if your metadata vdev fails, your entire pool is lost. So better avoid it if you don’t need it.
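
For reference, the relevant bits of my HDD-pool setup look roughly like this (pool, dataset, and device names are placeholders, and the small-block cutoff is just the value I happen to use):

```
# Add a mirrored special (metadata) vdev to an existing HDD pool
# (keep it redundant: if this vdev is lost, the whole pool is lost)
zpool add tank special mirror nvme0n1 nvme1n1

# Send blocks up to 32K to the special vdev, so the HDDs only handle larger, streaming-friendly I/O
zfs set special_small_blocks=32K tank/media
```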


Is this going to be Core or SCALE?

As for favoring “metadata” over “data” in the ARC, see this:

There’s a new adjustable parameter that you can dial up or down to your liking, if you’re not satisfied with the default metadata priority.
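
If I remember right, this is the zfs_arc_meta_balance module parameter from OpenZFS 2.2+ (higher values favor metadata over data). A quick sketch on SCALE, with the value as an example only:

```
# Current balance between metadata and data in the ARC (default is 500, IIRC)
cat /sys/module/zfs/parameters/zfs_arc_meta_balance

# Favor metadata more aggressively (example value; make it persistent via the UI or an init script)
echo 4000 > /sys/module/zfs/parameters/zfs_arc_meta_balance
```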
