ARC: a possible solution to an unusual problem

So this started out as a request for help, but as I was writing the post an idea hit me, so I decided to scrap the help request and test some things instead. I will break this down into a few parts: the issue I was encountering, what I found online, how I modified it to work, secondary changes I made along the way, preliminary results, and a request for feedback on this setup.

**The issue I was encountering**
I have been struggling to get ARC to really work for me. After a bit of digging, the most likely culprit was the daily backups being stored on the pool. These are about 1TB each and have verification steps that hit the data multiple times. I can't say exactly how much of this was in L1 and L2, but as detailed in the later steps it was a significant amount. This in turn was causing data I cared about to be cold more often than not, while the backup data could be glacial for all I care.

**What I found online**
As I was writing up my help post, I started thinking about the question in a different way and found these commands: `zfs set secondarycache=none pool/dataset/folder` and `zfs set primarycache=none pool/dataset/folder`.

**How I modified this to work**
It turns out the Google summary was wrong. As best I can tell, this command only works at the pool/dataset level and not down to the folder level. This made me rethink how all of this needed to work. After a bit of back and forth, I built a new dataset.
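For anyone wanting to replicate this, the dataset-level version looks roughly like the following. `tank/backups` is a placeholder name, not the poster's actual layout; note that `primarycache`/`secondarycache` accept `all`, `metadata`, or `none`, and the setting only affects blocks read after the change.

```shell
# Placeholder names -- substitute your own pool/dataset.
# Create a dedicated dataset for the backups:
zfs create tank/backups

# Keep this dataset's blocks out of L2ARC (they can still enter L1 ARC):
zfs set secondarycache=none tank/backups

# Or keep file data out of ARC entirely; metadata-only is often a
# safer middle ground than primarycache=none:
zfs set primarycache=metadata tank/backups

# Verify the settings took effect:
zfs get primarycache,secondarycache tank/backups
```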

**Secondary changes I made as part of the change**
As part of building a new dataset, I took the opportunity to change the compression configuration from the default to a much more aggressive option. My thinking here was that this data is not time sensitive and tends to be very large, so potentially getting some space savings without impacting my important data seemed like a good idea.
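The post doesn't name the exact setting, but on a recent SCALE release "more aggressive" would typically mean moving from the default to a higher `zstd` level, for example:

```shell
# Placeholder dataset name. zstd-9 trades CPU time for better ratios;
# levels go up to zstd-19 but give diminishing returns on most data.
zfs set compression=zstd-9 tank/backups

# Compression only applies to newly written blocks; check the
# achieved ratio after data lands:
zfs get compression,compressratio tank/backups
```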

**Preliminary results**
The first thing I noticed was that, as I started moving my backup files to the new dataset, L1 ARC usage would drop in big corresponding blocks. One folder alone freed up 10GB of L1 ARC. The second and more interesting behavior has been in L2. I have always felt L2ARC was not working quite right, but I also know L2ARC is a complex topic. That said, the L2ARC that normally stayed around 800GB of usage spiked up to 2TB in about 24 hours while keeping a 35-40% hit rate. I have zero idea why this occurred and will be keeping an eye on it to see how things continue.
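For anyone wanting to watch the same numbers, the stock OpenZFS tools available from a TrueNAS SCALE shell report both tiers:

```shell
# One-shot summary of ARC and L2ARC size, hit rates, and tunables:
arc_summary

# Live per-second ARC statistics (run arcstat -v to list the
# available columns, including the l2* fields for L2ARC):
arcstat 1
```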

**Request for feedback on this setup**
This was a bit of a wild hair, as I did not really find anyone else with the issue I had, but I would love to hear your ideas and thoughts on it.

**System specs**
CPU: 2x AMD EPYC 7642
RAM: 512GB
HDD: 16 x 16TB in 4 x RAIDZ1 vdevs
SSD: 4 x 1TB striped L2ARC


Welcome to TrueNAS forums!

Interesting use case and solution.

Before I read through your complete post, I assumed you were like many newcomers to ZFS and did not understand ARC / L2ARC. But you did your research, and quite well. Even to the point of making me re-think my own L1 & L2 ARC usage.

You might think about using your L2ARC for metadata on the backup datasets. This one item alone might speed up backups a percentage point or two.
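Concretely, that suggestion would look something like the following (dataset name is a placeholder):

```shell
# Cache only this dataset's metadata (not file data) in L2ARC,
# so directory traversals during backups stay fast without the
# backup payload evicting more useful blocks:
zfs set secondarycache=metadata tank/backups
```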

With 512GB of RAM and dual EPYCs, you have a beefier system than many small office / home users.

It is helpful to specify the version of SCALE you are using, as there have been changes to ZFS & Linux over the releases that impact ARC / L2ARC directly.

Further, more changes to L2ARC are coming, probably by the end of the year. I don't have the link or reference; I just remember reading about it. Something to do with L2ARC evictions and reducing evictions of useful data or metadata.


Makes me wonder if an sVDEV would help here - for the small files and metadata, while reserving the L2ARC solely for frequently-accessed file storage (i.e. NO metadata). I found almost no benefit from L2ARC in my system under such circumstances, but my pool has very little read/write I/O, i.e. it's mostly WORM.

The issue with L2ARC is that a read first has to register a MISS for the file / metadata to even potentially be flagged for inclusion in L2ARC - subject to write limitations, etc. So unlike an sVDEV, it can take multiple attempted hits before something is written into L2ARC, and hence it's not as beneficial for constantly changing data as an sVDEV would be.

For my use case (mostly WORM, not a lot of changes), a persistent, metadata-only L2ARC was quite performant compared to an sVDEV for rsync. For small files there was no comparison, however; ditto metadata writes / pool structure traversals and so on.
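On Linux/SCALE the metadata-only policy and persistence are controlled separately; a sketch, with a placeholder dataset name (L2ARC persistence defaults to on in OpenZFS 2.0 and later):

```shell
# Restrict this dataset's L2ARC use to metadata only:
zfs set secondarycache=metadata tank/data

# Confirm L2ARC persistence across reboots (1 = rebuild on pool import):
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled
```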

Thanks for the reply. I was worried my post would just add to the noise around ARC, as it is a frequent area of confusion. I appreciate the validation that I was not completely off base.
I love building and working on computer hardware so I have a few crazy servers in my home network.
I had it in my notes for the post and missed it: I am running version 25.04.1.
I will keep an eye out for data on those new changes.

Thanks for the information. I am going to let this change cook for a week or so and see what the performance is like before testing other tweaks.