Replication task issue

Hello!
A couple of months ago I finally decided to back up my main TrueNAS machine to a remote machine (also running TrueNAS SCALE). To do so, I use a replication task that sends the most recent snapshots of my three pools (pool0, pool1, pool2) to the dataset backuptank/backup/ on the remote machine.
Specifically, I take daily snapshots of the main pools, keep them for two weeks, and replicate them daily (both tasks run at midnight).
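To make that schedule concrete, here is a minimal Python sketch of which snapshot names should exist after a given midnight run, and which pair a daily incremental push would use. It assumes the naming schema that appears in the log (`auto-%Y-%m-%d_%H-%M`) and models "keep for two weeks" as "newer than 14 days before now"; that cutoff rule is an assumption, not TrueNAS's actual retention code.

```python
from datetime import datetime, timedelta

# Naming schema as it appears in the replication log.
NAMING_SCHEMA = "auto-%Y-%m-%d_%H-%M"
# Assumption: "keep for two weeks" modeled as "strictly newer than now - 14 days".
RETENTION = timedelta(days=14)


def midnight(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def expected_snapshots(now: datetime) -> list[str]:
    """Names of the daily midnight snapshots that should still exist at `now`."""
    cutoff = now - RETENTION
    names = []
    day = midnight(now)
    while day > cutoff:
        names.append(day.strftime(NAMING_SCHEMA))
        day -= timedelta(days=1)
    # Zero-padded fields make the names sort chronologically (oldest first).
    return sorted(names)


def daily_send(now: datetime) -> tuple[str, str]:
    """(incremental_base, snapshot) pair a daily incremental push would use."""
    today = midnight(now)
    yesterday = today - timedelta(days=1)
    return yesterday.strftime(NAMING_SCHEMA), today.strftime(NAMING_SCHEMA)


if __name__ == "__main__":
    now = datetime(2024, 12, 16, 0, 0)
    snaps = expected_snapshots(now)
    print(len(snaps), snaps[0], snaps[-1])
    print(daily_send(now))
```

For the 2024-12-16 run this gives 14 names, from auto-2024-12-03_00-00 to auto-2024-12-16_00-00, and the pair (auto-2024-12-15_00-00, auto-2024-12-16_00-00), which is the same incremental_base/snapshot pair reported in the log.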

Well, it has been a pain in the neck.

Configuring the replication looks straightforward, and everything seems to work… until it doesn’t.
Every now and then, the replication fails with a message like this:

[EFAULT] cannot receive incremental stream: most recent snapshot of backuptank/backup/pool0 does not match incremental source.

The full log message is the following:

Error: [2024/12/16 00:00:05] INFO     [Thread-267] [zettarepl.paramiko.replication_task__task_18] Connected (version 2.0, client OpenSSH_9.2p1)
[2024/12/16 00:00:05] INFO     [Thread-267] [zettarepl.paramiko.replication_task__task_18] Authentication (publickey) successful!
[2024/12/16 00:01:36] INFO     [replication_task__task_18] [zettarepl.retention.calculate] Not destroying 'auto-2024-12-15_00-00' as it is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M'
[2024/12/16 00:01:36] INFO     [replication_task__task_18] [zettarepl.retention.calculate] Not destroying 'auto-2024-12-15_00-00' as it is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M'
[2024/12/16 00:01:36] INFO     [replication_task__task_18] [zettarepl.retention.calculate] Not destroying 'auto-2024-12-15_00-00' as it is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M'
[2024/12/16 00:01:37] INFO     [replication_task__task_18] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2024/12/16 00:01:37] INFO     [replication_task__task_18] [zettarepl.replication.run] For replication task 'task_18': doing push from 'pool0' to 'backuptank/backup/pool0' of snapshot='auto-2024-12-16_00-00' incremental_base='auto-2024-12-15_00-00' include_intermediate=False receive_resume_token=None encryption=False
[2024/12/16 00:01:38] ERROR    [replication_task__task_18] [zettarepl.replication.run] For task 'task_18' unhandled replication error ExecException(1, 'cannot receive incremental stream: most recent snapshot of backuptank/backup/pool0 does not\nmatch incremental source\n')
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zettarepl/replication/run.py", line 181, in run_replication_tasks
... 16 more lines ...
    raise self.process_exception
  File "/usr/lib/python3/dist-packages/zettarepl/replication/process_runner.py", line 37, in _wait_process
    self.replication_process.wait()
  File "/usr/lib/python3/dist-packages/zettarepl/transport/ssh.py", line 167, in wait
    stdout = self.async_exec.wait()
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/zettarepl/transport/async_exec_tee.py", line 104, in wait
    raise ExecException(exit_event.returncode, self.output)
zettarepl.transport.interface.ExecException: cannot receive incremental stream: most recent snapshot of backuptank/backup/pool0 does not
match incremental source
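As far as I understand ZFS, an incremental stream is matched to the destination by snapshot GUID, not by name: without `zfs receive -F`, the destination's most recent snapshot must be the stream's incremental base, otherwise the receive fails with "most recent snapshot of … does not match incremental source". The Python sketch below is a simplified model of that check, not the actual ZFS code; the `Snapshot` type and the GUID values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str   # short name, e.g. "auto-2024-12-15_00-00"
    guid: int   # the ZFS `guid` property of the snapshot


def incremental_receive_ok(dest: list[Snapshot], base_guid: int) -> bool:
    """Simplified model of the receive-side check (no -F): the destination's
    newest snapshot must be the incremental base, compared by GUID only."""
    if not dest:
        return False  # nothing to apply an incremental stream on top of
    newest = dest[-1]  # `dest` is assumed to be ordered oldest -> newest
    return newest.guid == base_guid


if __name__ == "__main__":
    # Hypothetical GUIDs, for illustration only.
    dest = [
        Snapshot("auto-2024-12-14_00-00", 111),
        Snapshot("auto-2024-12-15_00-00", 222),
    ]
    # Same name as the incremental base in the log and same GUID: accepted.
    print(incremental_receive_ok(dest, base_guid=222))
    # A different snapshot that happens to share the name: rejected.
    print(incremental_receive_ok(dest, base_guid=999))
```

The point of the model is that two snapshots with identical names on both machines can still fail this check if their GUIDs differ.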

A few notes:

  • the snapshot lists on both machines coincide (except for the most recent one, obviously), but the USED and REFER columns differ;
  • if I delete the last two or three snapshots (recursively), I can manually restart the replication and it works for a few days;
  • I set readonly=on recursively on backuptank/backup;
  • as far as I know, the datasets are not mounted on the backup machine (they aren’t mounted right now, and I assume they aren’t during replication either);
  • as far as I know, the datasets on the backup machine are not in use by apps or anything else (I don’t have apps, sharing services are disabled).
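Matching names in the snapshot lists do not by themselves show that both sides hold the same snapshots. One way to check is to list each snapshot's `guid` property on both machines, for example with `zfs list -H -p -t snapshot -o name,guid -d 1 <dataset>` (the exact command is my assumption; any tab-separated name/GUID listing works), and compare them. A minimal Python sketch of the comparison, with made-up sample data:

```python
def parse_zfs_list(output: str) -> dict[str, int]:
    """Map short snapshot name (the part after '@') -> GUID.

    Expects tab-separated `name<TAB>guid` lines, as produced by
    `zfs list -H -o name,guid ...` (assumed input format).
    """
    snaps: dict[str, int] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        full_name, guid = line.split("\t")
        snaps[full_name.split("@", 1)[1]] = int(guid)
    return snaps


def compare(source_output: str, dest_output: str) -> dict[str, list[str]]:
    """Compare two listings by short name and by GUID."""
    src = parse_zfs_list(source_output)
    dst = parse_zfs_list(dest_output)
    common = sorted(src.keys() & dst.keys())
    return {
        "only_on_source": sorted(src.keys() - dst.keys()),
        "only_on_destination": sorted(dst.keys() - src.keys()),
        # Same name on both sides but a different GUID: not the same snapshot.
        "guid_mismatch": [n for n in common if src[n] != dst[n]],
    }


if __name__ == "__main__":
    # Made-up listings in the `name<TAB>guid` format, for illustration only.
    source = "pool0@auto-2024-12-14_00-00\t111\npool0@auto-2024-12-15_00-00\t222\n"
    dest = (
        "backuptank/backup/pool0@auto-2024-12-14_00-00\t111\n"
        "backuptank/backup/pool0@auto-2024-12-15_00-00\t333\n"
    )
    print(compare(source, dest))
```

Comparing by the part after `@` lets the source dataset (pool0) and the destination dataset (backuptank/backup/pool0) be compared directly even though their full names differ.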

Does anyone have any advice or hints on how to solve this issue, or at least on how to figure out where the problem is?
I really don’t know what is causing this, and it’s unnerving.

Thanks a lot! 🙂

I don’t have a lot of experience with replication, but my guess is that there is a naming-schema mismatch between a snapshot created by the replication and the snapshots created on a schedule.

Mmm, I am not sure about that…
If I compare the output of zfs list -t snapshot pool0 on both machines, I find the same names.
Also, when configuring the replication task from the GUI, you can select the periodic snapshot tasks directly, which is what I did. I don’t think there is room for mistakes (?)

However, I don’t understand why the log says that 'auto-2024-12-15_00-00' is the only snapshot left for naming schema 'auto-%Y-%m-%d_%H-%M', since I can see snapshots going back two weeks.
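TrueNAS naming schemas are strftime-style patterns, so one part of that message can be checked independently: whether each listed snapshot name actually parses under 'auto-%Y-%m-%d_%H-%M'. The Python sketch below covers only that name-matching step; it is not zettarepl's retention logic, which may select snapshots for other reasons. The helper names and the non-matching example name are hypothetical.

```python
from datetime import datetime

NAMING_SCHEMA = "auto-%Y-%m-%d_%H-%M"  # schema quoted in the log


def matches_schema(snapshot_name: str, schema: str = NAMING_SCHEMA) -> bool:
    """True if the short snapshot name parses under the strftime-style schema."""
    try:
        datetime.strptime(snapshot_name, schema)
    except ValueError:
        return False
    return True


def group_by_schema(
    names: list[str], schema: str = NAMING_SCHEMA
) -> tuple[list[str], list[str]]:
    """Split names into (matching the schema, not matching it)."""
    matching = [n for n in names if matches_schema(n, schema)]
    other = [n for n in names if not matches_schema(n, schema)]
    return matching, other


if __name__ == "__main__":
    # The first two follow the schema; the third is a hypothetical other name.
    names = ["auto-2024-12-14_00-00", "auto-2024-12-15_00-00", "manual-2024-12-15"]
    print(group_by_schema(names))
```

If every name from the zfs list output lands in the matching group, the "only snapshot left" message is probably not caused by a schema mismatch in the names themselves.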