Cleaning up sent snapshots automatically without sender delete permissions

Problem/Justification
Improve the safety of TrueNAS backups made via ZFS replication against accidental malware infections. Cryptolockers are really nasty programs, and the groups using them do their best to remove any backups you may have.

Separate user accounts and environments (deliberately NOT integrated with AD, for example) are a first step toward walling off the backup from deletion.

Impact
It’s an optional addition that won’t affect anyone who doesn’t use it.

User Story
I want to control exactly which permissions my ZFS sender and receiver have, keeping them to the absolute minimum needed to do their jobs.

This is already possible with Linux accounts. On the receiving side, I create a custom user that the sender uses to SSH into the receiver, and I set the ZFS permissions for that user to a restrictive subset, so it can write new snapshots but cannot delete existing ones.
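A minimal sketch of that kind of delegation using `zfs allow`. The helper name, user name and dataset are hypothetical, and the permission set is an assumption: `receive` requires `create` and `mount` as well, while `destroy` is deliberately left out. If the sender uses `zfs receive -F`, extra rights (rollback/destroy) would be needed, which defeats the purpose; a replication setup that sends properties may also need more.

```shell
# Hypothetical helper: grant a receive-only user just enough delegated ZFS
# permissions to accept replicated snapshots, but NOT "destroy".
# Assumption: the sender does not use "zfs receive -F", which would also
# need rollback/destroy rights.
grant_receive_only() {
  local user=$1      # e.g. the dedicated SSH user the sender logs in as
  local dataset=$2   # e.g. the target dataset on the receiving pool
  zfs allow -u "$user" create,mount,receive "$dataset"
}

# Example (run as root on the receiver, then inspect with "zfs allow tank/backups"):
#   grant_receive_only backupuser tank/backups
```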

This creates a problem: you now have to manage the backups on the receiving end manually, or it will inevitably fill up, especially if IT sizes the receiving NAS as small as they can get away with. TrueNAS already has a solution for that: automatically deleting old snapshots. This works wonderfully on the sender, but not on the receiver, because the sender is the one that deletes the replicated snapshots, which is a no-go here (permission denied).

I’d like to be able to configure the sender not to delete them and have the receiver do it instead. For now I do it manually, though I could also write a script:

zfs list -H -t snapshot -o name -S creation | grep "^tank/zroot/$dataset/.*@auto" | tail -n +$((numToKeep + 1))  #  | xargs -n 1 zfs destroy -vr

Note: the script above keeps a fixed number of snapshots (remove the # to switch from testing to ‘actually delete’ mode). That’s okay for the sender, but for the receiver it still leaves room for abuse: if the sender is compromised, an attacker could ‘flood’ the receiver with however many snapshots it takes to push the oldest remaining backup to a date after the infection. When the ‘real thing’ is added to the UI, I’d like it to use each snapshot’s creation time instead: delete everything older than X days.
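A minimal sketch of age-based selection, assuming the snapshot list comes from `zfs list -Hp -t snapshot -o name,creation` (`-H` drops the header and uses tabs, `-p` prints `creation` as Unix epoch seconds). The function name `select_expired` is hypothetical. Because it compares timestamps rather than counting snapshots, adding more new snapshots does not move any older snapshot past the cutoff.

```shell
# Hypothetical helper: pick snapshots older than a maximum age.
# Input on stdin: "name<TAB>creation" lines, creation in Unix epoch seconds,
# i.e. the output of: zfs list -Hp -t snapshot -o name,creation
# Arguments: $1 = maximum age in days,
#            $2 = current time in epoch seconds (optional; defaults to now).
select_expired() {
  local max_age_days=$1
  local now=${2:-$(date +%s)}
  local cutoff=$(( now - max_age_days * 86400 ))
  # Print the name of every snapshot created strictly before the cutoff.
  awk -F '\t' -v cutoff="$cutoff" '$2 < cutoff { print $1 }'
}

# Example on the receiver (dry run; append "| xargs -n 1 zfs destroy -v"
# to actually delete):
#   zfs list -Hp -t snapshot -o name,creation -r "tank/zroot/$dataset" \
#     | grep '@auto' | select_expired 90
```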

I also suspect it has an O(N^2) problem if used literally; the middleware probably has a database that could bring this down to O(N log N), which would help if you have thousands of snapshots.
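Where the script’s time actually goes hasn’t been measured here. Assuming the overhead of one `zfs destroy` call per snapshot is part of the cost, a possible mitigation is batching: `zfs destroy` accepts a comma-separated list of snapshots of the same dataset (`dataset@snap1,snap2`). The sketch below groups a list of expired snapshot names that way; the function name `group_by_dataset` is hypothetical.

```shell
# Hypothetical helper: collapse "dataset@snapshot" names (one per line) into
# one "dataset@snap1,snap2,..." argument per dataset, so a single
# "zfs destroy" call can remove all expired snapshots of that dataset.
# Datasets are printed in the order they first appear in the input.
group_by_dataset() {
  awk -F '@' '
    {
      if ($1 in snaps) {
        snaps[$1] = snaps[$1] "," $2   # append to an already-seen dataset
      } else {
        order[++n] = $1                # remember first-seen order
        snaps[$1] = $2
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] "@" snaps[order[i]] }
  '
}

# Example (dry run; append "| xargs -n 1 zfs destroy -v" to delete):
#   <list of expired snapshot names> | group_by_dataset
```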

So I’d like the ‘real thing’ in middlewared to do this:

  • Add a new type of snapshot task “clean old snapshots”
  • Setting: ‘maximum age’ (default 90 days?)
  • Setting: Run time / interval (default 12:00AM daily?)
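Until such a task exists, the two settings above map onto a cron schedule plus an age argument; TrueNAS already has a Cron Jobs screen where a schedule like this can be entered. A minimal sketch, assuming a hypothetical script `/root/prune-old-snapshots.sh` on the receiver that takes the maximum age in days:

```text
# min  hour  day-of-month  month  day-of-week  command
  0    0     *             *      *            /root/prune-old-snapshots.sh 90
```

This runs daily at 12:00 AM with a maximum age of 90 days, matching the proposed defaults.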