L2ARC tuning guide and common misconceptions

This is the very subject of much of the discussion above. The point being that more RAM is always good, and L2ARC overhead eats into what would otherwise be available for ARC, BUT RAM pressure tends to be overblown, and the cost-benefit trade-off (the cost being RAM pressure) likely looks quite different from the conventional guidelines on this forum - which, by the way, tend to be even more conservative than iX’s own, already conservative advice.

The major selling point is that while SSD is (much) slower than RAM, you get huge capacity leverage instead - 90ish bytes in RAM becomes a whole block of data on SSD, which can still be orders of magnitude quicker than spinning HDDs.
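To put a rough number on that leverage (assuming the ~90-byte per-record figure above and a 128 KiB average block size, both purely illustrative):

    # 1 TB of L2ARC filled with 128 KiB blocks:
    #   1e12 / 131072 bytes   ≈ 7.6 million cached blocks
    #   7.6e6 * 90 bytes      ≈ 0.7 GB of ARC spent on L2ARC headers

Smaller recordsizes (or lots of metadata) push the header overhead up proportionally, but it remains a small fraction of the capacity gained.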

L2ARC may not help in your use case if your working set fits in ARC, and/or what falls outside doesn’t create performance problems. It is also possible that it could help - for example, when traversing directories with many small files, as the L2ARC can ensure that all your metadata is on SSD even if it isn’t in RAM. A cheap & small SSD, secondarycache=metadata, minimal memory overhead (likely on the order of tens of megabytes). Kind of like an sVDEV light (read-only).
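For anyone who wants to try that metadata-only flavour, a minimal sketch (tank, tank/mydata and /dev/sdX are placeholders for your pool, dataset and spare SSD):

    # Attach the small SSD as a cache (L2ARC) device
    zpool add tank cache /dev/sdX

    # Only let metadata from this dataset into the L2ARC
    zfs set secondarycache=metadata tank/mydata

    # Check the setting
    zfs get secondarycache tank/mydata

Being a per-dataset property, secondarycache can stay at the default of all on datasets whose data blocks you do want cached.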

Not a path I would go down. :wink: But if you’ve got a USB SSD lying around, or can source one cheaply, plus have another USB port, you could try it that way. If the L2ARC breaks, it won’t break your pool.

Unfortunately I have neither SATA ports nor USB ports available (unless I use a USB hub, which I am loath to do for attaching SSDs because it will likely be even less reliable than a normal USB port).

1 Like

PCIe slots? Recycled enterprise HBAs go cheap on eBay with plenty of ports. But yeah, otherwise sounds like you’ve hit your limits.

No - as per my sig, it’s a Terramaster NAS appliance rather than a general-purpose motherboard, so no slots. There is an internal USB port that has proven to be unreliable.

Until simultaneous reads and writes occur and the L2ARC successfully intercepts reads, thereby freeing the spinners to better handle incoming writes.

I say similar about SLOG devices, which are neither a read nor a write cache but can assist with the pool’s overall read and write performance by intercepting sync writes that interrupt normal pool operation.

The rabid LOG and CACHE device haters seem to forget this aspect.

There are no “rabid LOG and CACHE device haters” - those who do not recommend SLOG or L2ARC for most circumstances do so because they believe that it is a waste of money for those circumstances. However there definitely ARE circumstances where they are the appropriate fix for a specific performance issue.

As far as I can tell, to be useful all the special vDevs need to be faster than the data vDevs; otherwise there is no point (except spreading I/O across more devices, at the expense of increasing the number of devices that can break).

SLOG is useful when there are must-have synchronous writes going to a pool that is slower than the SLOG device. If synchronous writes are not necessary, then SLOG is not the solution - instead make them asynchronous.
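To make the alternative concrete - a minimal sketch, with tank/vmstore and the device path as placeholders, and with the usual caveat that sync=disabled trades the last few seconds of writes on power loss for speed:

    # See how a dataset currently handles sync requests (standard, always, disabled)
    zfs get sync tank/vmstore

    # If the application genuinely doesn't need sync semantics:
    zfs set sync=disabled tank/vmstore

    # If it does, that is exactly the case where a fast SLOG earns its keep:
    zpool add tank log /dev/nvme0n1p1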

L2ARC is certainly useful when the stats demonstrate that it is beneficial - the difficulty appears to be having some rules-of-thumb that can predict this rather than having to measure it.
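On the measuring side, this is quick to watch live with the tools that ship with OpenZFS (10-second sampling interval shown; field names are arcstat’s own):

    # ARC and L2ARC hit rates plus L2ARC logical/allocated size, every 10 s
    arcstat -f time,read,hit%,l2hit%,l2size,l2asize 10

    # One-shot summary, including L2ARC header memory usage
    arc_summary | grep -A 20 -i l2arc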

Special Allocation vDevs for Metadata - there is a bit more evidence that these can be more generally helpful than the other solutions. On my next NAS build I will almost certainly buy suitable hardware to do these.

4 Likes

There are no “rabid LOG and CACHE device haters”

We should agree to disagree on this. Apparently LOG and CACHE devices can sneak inside late at night and kick your dog. I have no other explanation for the amount of vitriol flung at these things for the past 10+ years here, on Reddit, L1Techs, and anywhere else the topics arise.

Personally I chalk it up to “skill issue” and the occasional poorly fit workload.

2 Likes

The sVDEV resource page has been updated with the new L2ARC info. Thank you for educating me re: L2ARC updates.

Apologies for the delay. There is significant unhappiness at work due to Project 2025 that is currently occupying most of my time.

1 Like

Small update on my 16GB RAM + 690GB L2ARC experiment:

  • I’d wanted to move ~300k files from an iSCSI volume to SMB shares to create a bucket of metadata. But these apps make heavy use of symlinks + directory junctions so they’re not going to work from a network drive. So this is really just a block data experiment thus far.
  • With l2arc_mfuonly=2 the L2ARC was populating too slowly. My l2hit% kept climbing as the drive picked up MFU blocks, so it was working, and I reckon it’d eventually have been just fine. But I’m impatient and I want strings of 100 showing up in the l2hit% column of arcstat when I access the same group of files in the same afternoon. So at roughly 150GB l2size I reset l2arc_mfuonly back to the default, and the data is really packing in there now. It took a week to hit 150GB but only another day to double it.
  • I had to re-purpose one of the drives so my L2ARC is now 600GB, not 690GB. Long story…
  • l2asize is 557GB. When it reaches the full 600GB I plan to watch gstat along with arc_summary to see if the data churns constantly or if the writes settle down.
  • An idea for “janky data tiering” crossed my mind: with an empty L2ARC, set a very high feed rate plus l2arc_headroom=0, then access your favorite, important, latency-sensitive data. When the L2ARC is full, switch l2arc_mfuonly from the default 0 to 2 to more-or-less “park” the L2ARC’s existing data. Enjoy those sweet, sweet L2ARC hits. I’m gonna try this (see the sketch after this list)… Should hit 600GB in the next 2-3 days.
  • My ARC hit rate is north of 98% every time I check. So much for “L2ARC metadata just eats all your ARC unless you have eleventy TB of RAM.”
  • My OpenZFS host is virtualized so I can assign any amount of RAM up to 120ish GB – depends on how many VMs I need running. I’d typically run this ZFS host with 96GB RAM.
  • Heretical confession: I like the current setup so much that I plan to continue with 16GB indefinitely. The RAM is more useful for other VMs than for ARC caching.
  • Heretical statement: L2ARC can cover for a lack of RAM. I have a little script here that offlines the device for A/B testing. When it’s off my workflow becomes a righteous pain in the neck b/c my ARC is just too small.
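Since the tunable names alone don’t show how the “janky tiering” idea would actually be poked in, here’s a rough sketch of the sequence on SCALE/Linux. The module parameter paths are stock OpenZFS; the feed-rate numbers and the device name sdX are placeholders, and the offline/online pair at the end is essentially what my A/B script boils down to:

    # Phase 1: fill the empty L2ARC aggressively
    echo 0          > /sys/module/zfs/parameters/l2arc_headroom     # 0 = scan the whole ARC, not just the eviction tail
    echo 1073741824 > /sys/module/zfs/parameters/l2arc_write_max    # ~1 GiB per feed interval (illustrative)
    echo 1073741824 > /sys/module/zfs/parameters/l2arc_write_boost  # extra write headroom while the device warms up
    # ...now go read the favorite, latency-sensitive data...

    # Phase 2: once l2size is where you want it, "park" the contents
    echo 2 > /sys/module/zfs/parameters/l2arc_mfuonly

    # A/B testing: take the cache device out of play and back in
    zpool offline tank sdX    # pool keeps running; reads fall back to the main vdevs
    zpool online  tank sdX    # back in service (persistent L2ARC should pick its contents back up)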

OK, that wasn’t a small update. Sorry not sorry.

1 Like

Very interesting and informative. But it also illustrates the importance of the use case and knowledge of the admin re: how to implement / tune the L2ARC.

The resource guide here should be populated with a step-by-step approach to help folk with less experience better explore / quantify the benefits of L2ARC and the tuning thereof for their use case. TrueNAS / OpenZFS can be notoriously finicky and the more we can collectively do to help folk understand L2ARC, the better.

So now we know that L2ARC pointers use far less RAM than they used to - we even have a calculator for it. Moreover, with the L2ARC being persistent by default in SCALE, the user no longer has to drop into tunables to set it that way.

For my particular biggest use case (directory traversals by rsync backups), setting the L2ARC to metadata only was very beneficial. But every use case is different.

In my current setup, I

  1. substituted a sVDEV for the L2ARC
  2. changed the recordsizes of my datasets to better reflect the data being stored there (1M for images, 128K or less for apps / DBs - see the sketch after this list),
  3. consolidated seldom-used datasets with lots of small files into archives, and
  4. rebalanced the pool to ensure small files were being stored in the sVDEV.
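For anyone wanting to copy steps 2 and 4, a minimal sketch - dataset names and the 64K cut-off are placeholders, and the cut-off should stay below the dataset’s recordsize or every block of that dataset will land on the sVDEV:

    # Step 2: match recordsize to the data (only affects newly written blocks)
    zfs set recordsize=1M   tank/images
    zfs set recordsize=128K tank/apps
    zfs set recordsize=16K  tank/db

    # Step 4: route blocks at or below the cut-off to the special vdev
    zfs set special_small_blocks=64K tank/apps

“Rebalancing” then means rewriting the existing data (e.g. copying it within the pool) so it picks up the new recordsize and small-block placement.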

All this more than doubled the performance of the NAS for read/write. It also crushed the amount of space that the metadata needs. With less metadata to juggle, I’ll wager that more of it can stay in the main ARC before the system has to refer to the sVDEV to find it.

To me, the beauty of L2ARC is that one can explore the benefits of adding an SSD to boost certain types of performance with little to no risk of pool damage. The beauty of sVDEV (with all the caveats that should be considered!) is that it additionally boosts small-file performance, is always “hot”, and allows datasets to be tuned for maximum speed inside a larger pool without setting up a separate “speedy” SSD pool for such tasks (i.e. VMs, DBs, and the like).

All that said, the use case is what needs to drive these decisions. A lot of good can come from tuning TrueNAS / OpenZFS to take advantage of your hardware.

1 Like

Agreed. That is partly what I had in mind for this Resource.

So, anyone with starting ideas?

I’d say archival data generally won’t benefit much from an L2ARC device. However, backup data would benefit from a metadata-only L2ARC. So perhaps we need to suggest workloads where an L2ARC would help.

Based on recent discussions, the problem is that we are unlikely to get a reasonable consensus on any of the issues. That means a step-by-step guide to successfully using an L2ARC is back on the user, via trial and error. Prove me wrong, please.

1 Like

There is nothing wrong with a generalized description of what an L2ARC can be. Lots of sub-options to discuss and explain succinctly.

Followed by a bunch of common use case suggestions and results that can serve as a jumping-off point for new admins.

A later section can delve into detailed arcstat, etc. tutorials to help folk quantify the benefits of their L2ARC implementation or diagnose issues.

1 Like

I can easily imagine scenarios where a sizable L2ARC paired with limited RAM wouldn’t help much, or wouldn’t be called for in the first place. But given that it can be turned on/off at will, added to/removed from the host at will, won’t eat data if it breaks, can be tuned to not burn out a consumer SSD, and doesn’t need to be super-fast (just faster than the “main” disks), there’s little reason not to at least take it for a spin if you think a persistent read cache might help your use case.
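For reference, the add/remove cycle really is that low-ceremony (pool and device names are placeholders; use the device name exactly as zpool status shows it when removing):

    # Attach an SSD as L2ARC to an existing pool
    zpool add tank cache /dev/disk/by-id/ata-SOME_SSD

    # Take it back out later - no resilver, no risk to the pool's data
    zpool remove tank ata-SOME_SSD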

Mine is a VMware virtual disk (vmdk), on a VMFS datastore shared with a few other VMs, which resides on a first-generation QLC drive. I’ve piled on all the “do nots” for this one.

1 Like

As a hard data point, and a note about use case being important: on our archival server we have 8TB of L2ARC, which has been in for about a year now and has used…

Data Units Written: 141 TB

Which is the equivalent of about 0.09x DWPD and the tunables are:

    l2arc_write_boost                                     1234217728
    l2arc_write_max                                        634217728

I wouldn’t be surprised if the numbers were similar on our hot working server if we used ZFS, since even on the “hot” data our workload would mostly fit within 8TB for day-to-day access and then fall off to infrequent access, as older project files tend not to be edited even though they’re in active projects. If you’re on v25 of something, v1 probably will never be read from again.

I think the largest danger of accelerated wear on a drive would be the perfect storm of a small ARC that is constantly overfilling, combined with a small L2ARC which can’t catch much of that overflow and is constantly evicting data.
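If you want to keep an eye on exactly that, the counters behind the 141 TB figure above are readable on any NVMe drive (device path is a placeholder):

    # NVMe wear counters, including Data Units Written and Percentage Used
    smartctl -a /dev/nvme0 | grep -Ei "data units written|percentage used"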

A programmatic preheat tool would be a nice ZFS feature, though - something to designate datasets that should just fill into the L2ARC without being asked.
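Until something like that exists, a crude stand-in is simply to read the data you care about while the feed thread is running - with the caveat (my assumption about the defaults) that prefetched/streaming reads are skipped unless l2arc_noprefetch is turned off first. The dataset path is a placeholder:

    # Temporarily let prefetched (sequential) reads become L2ARC-eligible
    echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch

    # Walk the dataset so its blocks pass through ARC and get fed to L2ARC
    find /mnt/tank/projects -type f -print0 | xargs -0 cat > /dev/null

    # Restore the default afterwards
    echo 1 > /sys/module/zfs/parameters/l2arc_noprefetch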

2 Likes

There are 2 tunables that can be used to assist with this:

  • Persistent L2ARC, which survives a reboot intact. (But it does take a bit of time to re-populate the memory pointers after a reboot.) This is disabled by default.
  • ZFS Dataset attribute “secondarycache” which can be “none”, “metadata” or the default of “all”.

Media files read once in a while probably should be on a Dataset with “secondarycache=none”. While some others might benefit from “secondarycache=metadata” only.

1 Like

I think it’s enabled by default now.

IIRC, L2ARC is persistent by default in SCALE but in CORE you have to add a tunable to make it persistent.

So it depends on the platform you’re using.
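For completeness, the knob in question should be the OpenZFS rebuild setting - the names below are my assumption of the right mapping (sysctl-style tunable on CORE/FreeBSD, module parameter on SCALE/Linux):

    # CORE (FreeBSD): add as a sysctl tunable
    vfs.zfs.l2arc.rebuild_enabled=1

    # SCALE (Linux): same parameter via the module interface
    cat /sys/module/zfs/parameters/l2arc_rebuild_enabled    # 1 = persistent L2ARC enabled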

1 Like