Replication or Rsync? Rollback causes replication to break on backup system

Ok… so I’ve been sitting here for 4 days waiting for my 28TB of data to be re-replicated to my backup system, just because I rolled back a snap to try to troubleshoot a Plex problem I had. Stupid me did not realize my nightly replication task would now fail unless I checked “replicate from scratch”. I realize this is how it works, but it’s still unacceptable; re-transferring 28TB of data is not trivial. My question is: would Rsync be a better option in this scenario? I would be able to roll back snaps on the primary and not have to transfer 28TB all over again. Am I missing something here…

It also may be nice to add “Caution: this will break any replication on remote systems” to the rollback dialog.

WARNING: Rolling the dataset back destroys data on the dataset and can destroy additional snapshots that are related to the dataset. This can result in permanent data loss! Do not roll back until all desired data and snapshots are backed up.

You rolled back the primary dataset? Was the rollback so far into the past that its latest snapshot had already been pruned on the backup dataset?


What did you need to troubleshoot on the primary that couldn’t be done without a rollback?

Two options.

  1. You can “clone” from a snapshot into a new temporary dataset that you can point Plex to.
  2. You can retrieve individual files and folders from a snapshot without the need to rollback anything.

These are alternatives that don’t require rolling back.
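As a rough sketch of both alternatives (the snapshot name and pool paths below are placeholders, not the poster’s actual names):

```shell
# Option 1: clone a snapshot into a temporary dataset instead of rolling back.
# "auto-2024-01-01" and "tank/ix-apps-test" are placeholder names.
zfs clone tank/ix-applications@auto-2024-01-01 tank/ix-apps-test

# Point Plex at the clone; when finished, destroy it without touching
# the original dataset or its snapshots:
zfs destroy tank/ix-apps-test

# Option 2: copy individual files straight out of the hidden snapshot
# directory, no rollback required (paths are placeholders):
cp -a /mnt/tank/ix-applications/.zfs/snapshot/auto-2024-01-01/somefile ./restored-somefile
```

Either way the live dataset and its later snapshots stay intact, so replication is unaffected.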

There are also ZFS bookmarks, which can sometimes get you out of these situations, but they’re not exposed in the GUI, unfortunately.
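For reference, a bookmark records the point-in-time reference needed for an incremental send without keeping the snapshot’s data around. A minimal command-line sketch, with placeholder dataset and snapshot names:

```shell
# Create a bookmark from a snapshot before that snapshot gets pruned:
zfs bookmark tank/data@auto-2024-01-01 tank/data#auto-2024-01-01

# A bookmark can still act as the incremental source for a send,
# even after the snapshot itself has been destroyed:
zfs send -i tank/data#auto-2024-01-01 tank/data@auto-2024-01-08 | \
    ssh backup-host zfs receive backup/data
```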


What if you could roll back your backup system to one of the common snapshots that may still exist on your primary? Have you tried that yet?

If you roll back to a snapshot, you permanently lose all snapshots that came after it. There’s no “undo” button.

The only exception to this is if you had created a pool checkpoint before doing the rollback.
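A pool checkpoint works at the whole-pool level, so it can undo even a destructive rollback. A hedged sketch, assuming a pool named “tank”:

```shell
# Take a pool-wide checkpoint before a risky rollback:
zpool checkpoint tank

# If the rollback goes wrong, rewind the entire pool to the checkpoint
# (this requires exporting and re-importing the pool):
zpool export tank
zpool import --rewind-to-checkpoint tank

# Otherwise, discard the checkpoint once you're satisfied:
zpool checkpoint -d tank
```

Note that while a checkpoint exists, certain pool operations (like device removal) are blocked, so it’s meant as a short-lived safety net, not a standing backup.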

Yes… rolled back the primary by about 5 days… Tried to reinstall Plex, but problems with some clients on the same network persisted. Could not troubleshoot it, so I decided to roll back the ix-applications dataset.

Didn’t think about rolling back the replicated data. Never wanted to mess with the replicated system. I think that would break the replication task as well.

Technically it’s already broken, right? You’d need to restart the 28TB sync from scratch anyway. So if you can figure out the latest common snapshot that exists on both sides, rolling back the backup dataset to that should ideally let you continue the replication.
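The idea above could be sketched like this from the command line (dataset and snapshot names are placeholders):

```shell
# List snapshots on both sides, oldest first, to find the latest one they share:
zfs list -t snapshot -o name -s creation tank/data
zfs list -t snapshot -o name -s creation backup/data

# Roll the *backup* dataset back to that common snapshot; -r destroys
# any snapshots on the backup side that are newer than it:
zfs rollback -r backup/data@common-snap
```

After that, incremental replication from the primary should have a valid base again.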

Your “ix-applications” dataset is 28 TB? :flushed: How?

TrueNAS’s zettarepl would have detected that. It parses through the snapshot names and tries to automatically find a match.[1]

If the backup’s snapshots are being pruned and it no longer holds any snapshot from before the rollback point (5 days ago), then there’s nothing that can be done.


  1. Unless you know for sure that a matching common snapshot still exists on both sides? If the GUI isn’t working, you can try a one-time replication from the command line to bring them both to the source’s latest snapshot. ↩︎
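That one-time command-line replication could look roughly like this (all dataset, snapshot, and host names are placeholders):

```shell
# Send everything from the common snapshot up to the source's latest
# snapshot in one incremental stream; -F on the receiving side forces
# the backup dataset to roll back to the common snapshot first:
zfs send -I tank/data@common-snap tank/data@latest-snap | \
    ssh backup-host zfs receive -F backup/data
```

Once both sides share the latest snapshot again, the regular scheduled task should be able to pick up from there.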


I understand that all snaps are lost back to the point of the rollback. Unfortunately that didn’t solve the problem. I moved the Plex server to my Mac, as it will be easier to troubleshoot and has better transcoding abilities than my HPe Microserver 10+. The Mac is on 24/7 anyway.

Still wondering how your “ix-applications” dataset is 28 TB.

No… very small indeed. Rolling back that snap (which is part of the main “tank” replication task) blew up the entire 28TB task.

Are you sure?

How does rolling back only the “ix-applications” dataset affect your entire pool and a larger 28TB dataset?

What did you roll back exactly?

I can’t take the chance of an incomplete or damaged replicated system, even if I knew exactly how to do that (and I don’t). At this point the safest thing (for me) to do is start from scratch.


Found the tank/ix-applications snap and rolled it back.

How does that break everything else not related to “ix-applications”?

Did you create a recursive replication task that starts at the pool’s root dataset?

Yes, exactly… Rolling back a few GB contained in tank/ix-applications destroyed my ability to run the “tank” replication task.

Then there’s a chance you can replicate “ix-applications” back to the primary pool from the backup pool, which could get you back to a state of having common snapshots.

I’m not sure how this might affect your currently running applications, since the ix-applications is likely to be “busy”, as apps and files are actively using it.

Another option is to treat “ix-applications” as a separate replication task, so that it’s not involved with your storage datasets. This would be much simpler to deal with in the future.

That is a good idea, but alas, that ship has sailed. I am 2 days into this new replication job and have created, in addition to the root “tank” task, a replication task for every dataset. This should make it easier to restore individual datasets with the restore button included in the replication dialog. I was thinking of switching to Rsync to bypass problems in the future, but I’m still sticking with replication. I will, however, use Rsync on another server I have sitting around to back up the more important files on the main server.