Pull Replication Issues and Server Recovery

I’d seen a number of postings in various places recommending pull replications instead of push replications so that a compromise of the primary server would not let the attacker destroy the backups.

My backups were a bit shaky anyway (poorly understood, overly complicated, insufficiently tested), so I decided to rebuild my whole process using new snapshots and pull replications. It’s working, but it’s left me with some questions.

  1. My snapshots are generally hourly with three-day lifetimes. I think that means that if there were three days where the backup server was unable to pull replications, the primary and backup servers would no longer have any snapshots in common, and the replication would fail when the issue was corrected. The only way to recover would be to rerun the replication with “Replication from scratch” enabled, which would result in sending everything back over the wire. However, with push replications, enabling “Save pending snapshots” would avoid that outcome. Is that correct?
  2. I have two pools and a single snapshot/pull replication for each of them. I’m assuming that if I use the “restore” option on the pull replication, it will push the latest version of the pool back to the primary. Is that correct? Am I also correct in thinking that, rather than using the “restore” option, I could restore a single dataset or branch of the tree with a one-off push replication back to the primary server if needed?
  3. Since the snapshots are of the entire pool, they contain multiple .ix-virt datasets, which account for 5221 snapshots and easily push the total number of snapshots into the “excessive snapshots” warning zone. Do I need those datasets, or will they be recreated when I reload the configuration file on a replacement primary server?
  4. The local backup server’s role in my backup process is to enable a complete local restore after a catastrophic hardware failure of the primary server. For a lesser issue, snapshots on the primary could be used either to restore individual pools or to create clones for restoring individual datasets.
    What process would I follow to do a complete server restore from the backup server? Is it as simple as 1) build a new server, 2) recreate the pools with the same names, 3) add an ssh keypair for the new server on the backup server, 4) push the pools back, 5) reload the old config file, and 6) reboot? Does that get me all the data and all the apps back, or are there some “not quite that simple” things that I’m missing?
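For question 2, here’s roughly how I’d expect the one-off push back to the primary to look at the command line (a sketch only, with made-up pool/dataset names — `backup/tank/mydata` on the backup box, `tank/mydata` on the primary — and the GUI replication tasks would be the supported way to do this):

```shell
# Run on the backup server. List snapshots oldest-to-newest so we can
# eyeball whether the two sides still share an incremental base:
zfs list -H -t snapshot -o name -s creation backup/tank/mydata

# Grab the newest snapshot of the single dataset we want to push back:
SNAP=$(zfs list -H -t snapshot -o name -s creation backup/tank/mydata | tail -1)

# Full one-off send of that dataset to the primary. -w sends raw (keeps
# encrypted datasets encrypted); -F on receive rolls back the target:
zfs send -w "$SNAP" | ssh root@primary zfs receive -F tank/mydata

# If both sides still share a snapshot (say @common), an incremental
# send avoids re-shipping everything over the wire:
# zfs send -w -i backup/tank/mydata@common "$SNAP" | ssh root@primary zfs receive tank/mydata
```

The incremental form at the end is also the crux of question 1: once no `@common` exists on both sides, only the full send (or “Replication from scratch”) is left.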

I’m not opposed to adding more complexity to my process. I want it to be as complicated as necessary but not more complicated than that. BTW, there is another part of the backup process for off-site backups.

Any comments and insights would be greatly appreciated.

Bill

PS. I ran into some anomalies with passphrase-encrypted datasets under encrypted roots. I just couldn’t get a replication to create the passphrase-encrypted dataset on the backup server under an encrypted root, and ended up having to move the dataset under an unencrypted root. I have one such dataset in my NVMe pool and one in my HDD pool, and it was only an issue with the one in the HDD pool, where replication kept insisting that I was trying to create an unencrypted dataset under an encrypted root. Clearly encryption is another area where my tactical knowledge is deficient. Any speculation as to the issue?
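In case it helps anyone diagnose the PS, this is what I’d run on both boxes to compare how the two pools differ (again a sketch with placeholder pool names; the properties themselves are standard ZFS):

```shell
# Show, for every dataset in the pool, whether it is encrypted, which
# dataset is its encryption root, and how its key is supplied
# (passphrase vs. hex/raw key). Differences between the source dataset
# and the target parent are the usual cause of "unencrypted dataset
# under an encrypted root" style refusals:
zfs get -r -o name,property,value encryption,encryptionroot,keyformat,keystatus tank
```

Comparing that output for the NVMe pool (which worked) against the HDD pool (which didn’t) would at least show whether the two source datasets really have identical encryption properties.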