I’ve been offered a small amount of rack space in a colo for free (call it a workplace benefit) and am planning to move my offsite backup there. This is giving me the chance to rethink my backup strategy.
I was previously rsyncing different datasets bi-weekly/monthly, but I’m considering switching to encrypted ZFS replication. The host will be on its own isolated VLAN with a WireGuard/OpenVPN tunnel back to my house for remote access/replication.
I understand replication tasks (zfs send/receive) are block level and tied to snapshots, but there are two things I’m not too sure about:
If a dataset takes daily snapshots with a max age of one month and is set to replicate every Saturday, does that replicate all snapshots or just the latest?
In the same example, if the last replication happened (let’s say 1-Jan) and it was unable to replicate for 6 weeks (14-Feb), how will it work out what blocks need to be copied if there’s no snapshot tying it to the past (the old ones having expired after 4 weeks)?
Sorry, I might not have been clear. I understood it would only be the delta, but is that from the latest point in time (including the snapshots in between), or just the delta from the last replicated snapshot to the current one?
Example: snapshot a is the latest ‘replicated’ one, b/c/d/e are in the middle, and f is the latest snapshot ready to be replicated. Would b through f be replicated individually, or combined into one snapshot on the backup side?
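For reference, plain `zfs send` exposes both behaviours, so which one a replication task gives you depends on which flag it uses under the hood. A sketch with made-up pool/host names:

```sh
# -i sends ONLY the delta between a and f; snapshots b/c/d/e are
# skipped and will not exist on the destination:
zfs send -i tank/data@a tank/data@f | ssh backup zfs receive backuppool/data

# -I sends the delta INCLUDING every intermediate snapshot, so b, c,
# d, e and f all appear individually on the destination:
zfs send -I tank/data@a tank/data@f | ssh backup zfs receive backuppool/data
```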
Fantastic! Any more details on how the actual re-sync process works?
But if no common snapshot is found, it would have to be a replication from scratch, i.e. wipe the destination dataset and replicate everything anew. You want to have some snapshots with a long retention policy to prevent that.
If you allow the source to control snapshot deletion on the destination, it should only delete after sending the latest snapshots…
And if you set retain snapshots until replicated…
So basically it should just work, because the base snapshots on the destination won’t get deleted while the destination is down, until it comes back and all the backlog snapshots are transmitted.
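In other words, once the destination comes back, the whole backlog can travel in a single incremental stream from the last common snapshot. A sketch using the dates from the example above (snapshot names are hypothetical):

```sh
# Resume after 6 weeks of downtime: one incremental stream from the
# last common snapshot (1-Jan) up to the newest one (14-Feb),
# carrying all intermediate snapshots along the way:
zfs send -I tank/data@auto-2024-01-01 tank/data@auto-2024-02-14 \
    | ssh backup zfs receive -F backuppool/data
```

This only works while `tank/data@auto-2024-01-01` still exists on both sides, which is exactly why the snapshots pending replication must survive the outage.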
After running a full initial replication, I let two hours lapse and tried again. As suggested, it does indeed fail, as the source snapshots are gone and there is no common link.
So I retested with “Save Pending Snapshots” enabled and repeated the test. Replication completed successfully, and a short time later the snapshots on the source had been removed.
Interestingly enough (more out of curiosity), I randomly nuked a few snapshots on the destination, and it appears to take the latest common link and base its delta off that.
I would be interested if anyone has specifics on how the “Save Pending Snapshots” function actually works (I assume it sets some flag to stop the automated cleaning routine from removing them).
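That “latest common link” can be found by hand, which is roughly what I assume the replication logic does. A sketch (dataset names made up):

```sh
# List snapshot names on each side, oldest to newest:
zfs list -H -t snapshot -o name -s creation -r tank/data \
    | awk -F@ '{print $2}' > /tmp/src_snaps.txt
ssh backup zfs list -H -t snapshot -o name -s creation -r backuppool/data \
    | awk -F@ '{print $2}' > /tmp/dst_snaps.txt

# The newest name present in BOTH lists is the incremental base:
grep -Fx -f /tmp/dst_snaps.txt /tmp/src_snaps.txt | tail -n 1
```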
As far as I understand, “Save Pending Snapshots” really sets the “Hold” flag on a snapshot, preventing it (and, via the task, any more recent snapshots) from being destroyed until the hold has been removed with a “Release” command.
Also, I believe a snapshot can carry multiple holds (each under its own tag), which effectively act as a reference counter; each “Release” drops one of them. Deletion will only occur once the snapshot has expired and no holds remain.
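The hold mechanism can be poked at directly from the CLI, if anyone wants to verify this themselves (snapshot name is hypothetical; the tag “keep” is arbitrary):

```sh
# Place a hold on a snapshot under the tag "keep":
zfs hold keep tank/data@auto-2024-02-14

# Destroying it now fails with "dataset is busy":
zfs destroy tank/data@auto-2024-02-14

# Inspect the active holds (tag + timestamp per hold):
zfs holds tank/data@auto-2024-02-14

# Release the hold; once no tags remain, destroy succeeds again:
zfs release keep tank/data@auto-2024-02-14
```

Note that each hold needs a distinct tag name, so the “counter” is really the number of outstanding tags rather than a bare integer.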