So this started out as a request for help, but as I was writing the post an idea hit me, so I decided to scrap the help request and test some things. I will try to break this down into a few parts: the issue I was encountering, what I found online, how I modified this to work, secondary changes I made as part of the change, preliminary results, and a request for feedback on this setup.
The issue I was encountering
I have been struggling to get ARC to really work for me. After a bit of digging, the most likely culprit was the daily backups getting stored onto the pool. These are about 1TB each and have verification steps that hit the data multiple times. I can't say for sure exactly how much of this was in L1 and L2, but as detailed in the later sections it was a significant amount. This in turn was causing data I cared about to be cold more often than not, while the backup data could be glacial for all I care.
What I found online
As I was writing up my help post I started thinking about the question in a different way and found these two commands: zfs set secondarycache=none pool/dataset/folder and zfs set primarycache=none pool/dataset/folder
How I modified this to work
It turns out the Google summary was wrong. As best I can tell, primarycache and secondarycache are dataset properties, so these commands only work at the pool/dataset level and not down to the folder level. This made me rethink how all this needed to work. After a bit of back and forth I built a new dataset for the backups.
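
A minimal sketch of the dataset-level version, for anyone who wants to try the same thing. The dataset name pool/backups and the helper name make_backup_dataset are placeholders for illustration, not the actual layout used here, and the =none values are simply the ones from the commands above.

```shell
# Sketch only: make_backup_dataset is a hypothetical helper, not part of ZFS.
make_backup_dataset() {
    ds="$1"
    # Cache properties are set per dataset, so the backups get their own.
    zfs create "$ds" || return 1
    # Accepted values for both properties are all, metadata, and none.
    zfs set primarycache=none "$ds" || return 1
    zfs set secondarycache=none "$ds" || return 1
    # Show what is actually in effect on the new dataset.
    zfs get primarycache,secondarycache "$ds"
}

# Usage (as root on the ZFS host):
#   make_backup_dataset pool/backups
```

Keep in mind that each dataset is its own filesystem, so moving existing folders into the new dataset is a full copy and delete rather than a quick rename.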
Secondary changes I made as part of the change
As part of building a new dataset I took the opportunity to change the compression configuration from the default to a much more aggressive option. My idea here is that this data is not time sensitive and tends to be very large, so potentially getting some space savings without impacting my important data seemed like a good idea.
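
As a rough illustration of that change: the post does not name the exact compression setting, so the zstd-9 default below is purely an assumed example (zstd levels need OpenZFS 2.0 or newer), and set_backup_compression and pool/backups are hypothetical names.

```shell
# Sketch only: set_backup_compression is a hypothetical helper.
set_backup_compression() {
    ds="$1"
    # Assumed example level; pass a second argument to use something else.
    algo="${2:-zstd-9}"
    zfs set compression="$algo" "$ds" || return 1
    # compressratio reports the savings achieved on data already written.
    zfs get compression,compressratio "$ds"
}

# Usage (as root on the ZFS host):
#   set_backup_compression pool/backups
#   set_backup_compression pool/backups gzip-9
```

The new setting only applies to blocks written after the change, so data already sitting in the dataset keeps whatever compression it was written with.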
Preliminary results
The first thing I noticed was that as I started moving my backup files to the new dataset, L1 ARC would drop in big corresponding blocks. One folder alone freed up 10GB from L1 ARC. The second and more interesting behavior has been in the L2. I have always felt L2 was not working quite right, but I also know L2 is a complex topic. That said, the L2 that normally stayed around 800GB of usage spiked up to 2TB in about 24 hours while keeping a 35-40% hit rate. I have zero idea why this occurred and will be keeping an eye on it to see how things continue.
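
For anyone wanting to track the same number, here is a small sketch of how an L2 hit rate can be computed by hand. It assumes OpenZFS on Linux, where the counters live in /proc/spl/kstat/zfs/arcstats as "name type value" lines; the post does not say which OS or tool produced the figures above, and l2_hit_rate is a hypothetical helper name.

```shell
# Sketch only: prints the L2ARC hit rate as a percentage.
# Reads the Linux arcstats kstat by default, or a file given as $1.
l2_hit_rate() {
    awk '
        $1 == "l2_hits"   { hits = $3 }
        $1 == "l2_misses" { misses = $3 }
        END {
            total = hits + misses
            if (total > 0) printf "%.1f\n", 100 * hits / total
            else print "n/a"
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# Usage:
#   l2_hit_rate
```

These counters are cumulative since the ZFS module was loaded, so to see what happened over a specific 24 hour window you would need to sample them twice and work from the difference.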
Request for feedback on this setup
This was a bit of a wild hair, as I did not really find anyone with the issue I had, but I would love to hear your ideas and thoughts on this.
System specs
CPU: 2 x AMD 7642
RAM: 512GB
HDD: 16 x 16TB in 4 x RAIDZ1
SSD: 4 x 1TB striped L2ARC