Hi,
I’m running a TrueNAS 25.04.2.5 primary server and a backup server. Data is replicated daily to the backup server. We had a network outage that led to the failure of one of the replication tasks, which serves a particular dataset. Since then, that replication task has stopped working.
I’ve tried deleting the receive_resume_token from the backup server, then deleting the snapshot on the backup server that was being replicated during the network outage, and finally deleting the complete dataset from the backup server and starting the replication task from scratch. No luck, unfortunately. On each attempt, part of the 55 TB of data gets transferred before the replication task eventually fails.
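For reference, those three cleanup attempts correspond roughly to the ZFS commands below. This is a sketch rather than a transcript of what was actually run: the dataset name is taken from the logs further down, the snapshot name `auto-example` is a hypothetical placeholder, and the resume token may sit on a child dataset rather than on the parent. The script only builds and prints the argument lists; it does not execute anything.

```python
# Sketch only: builds the argument lists, it does not execute anything.
BACKUP_DATASET = "StoragePool/ImmutableStorage"  # dataset name taken from the logs
SNAPSHOT = "auto-example"  # hypothetical placeholder, not the real snapshot name


def cleanup_commands(dataset: str, snapshot: str) -> list[list[str]]:
    """Return the zfs commands for the three cleanup attempts, in order."""
    return [
        # 1. Show, then discard, the saved partial-receive state (resume token).
        ["zfs", "get", "-H", "-o", "value", "receive_resume_token", dataset],
        ["zfs", "receive", "-A", dataset],
        # 2. Destroy the snapshot that was in flight during the outage.
        ["zfs", "destroy", f"{dataset}@{snapshot}"],
        # 3. Destroy the whole dataset, including children and snapshots.
        ["zfs", "destroy", "-r", dataset],
    ]


if __name__ == "__main__":
    for argv in cleanup_commands(BACKUP_DATASET, SNAPSHOT):
        print(" ".join(argv))
```

All of these would be run on the backup server, and the last two are destructive, so the dataset and snapshot names need to be double-checked first.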
I suspected data corruption on the backup, but whatever corruption occurred should have been “fixed” by deleting the dataset entirely. Scrub tasks on the primary don’t report any errors, and a network outage shouldn’t cause data corruption on the primary in the first place.
Below are some of the logs from the replication failure after deleting the full dataset on the backup.
[2025/11/21 16:15:23] DEBUG [replication_task__task_7] [zettarepl.transport.local] [shell:1] [async_exec:133548] Running ['zfs', 'get', '-H', '-p', '-t', 'filesystem,volume', 'type', 'StoragePool/ImmutableStorage']
[2025/11/21 16:15:23] DEBUG [replication_task__task_7] [zettarepl.transport.local] [shell:1] [async_exec:133548] Success: 'StoragePool/ImmutableStorage\ttype\tfilesystem\t-\n'
[2025/11/21 16:15:23] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] Connecting...
[2025/11/21 16:15:24] ERROR [retention] [zettarepl.replication.task.snapshot_owner] Failed to list snapshots with <Shell(<SSH Transport(root@172.10.70.30)>)>: DatasetDoesNotExistException(1, "cannot open 'StoragePool/ImmutableStorage': dataset does not exist\n"). Assuming remote has no snapshots
[2025/11/21 16:15:24] DEBUG [Thread-7982] [zettarepl.paramiko.replication_task__task_7] [chan 0] EOF received (0)
[2025/11/21 16:15:24] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133549] Waiting for exit status
[2025/11/21 16:15:24] DEBUG [Thread-7982] [zettarepl.paramiko.replication_task__task_7] [chan 0] EOF sent (0)
[2025/11/21 16:15:24] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133549] Error 1: "cannot open 'StoragePool/ImmutableStorage': dataset does not exist\n"
[2025/11/21 16:15:24] DEBUG [replication_task__task_7] [zettarepl.transport.local] [shell:1] [async_exec:133558] Running ['zfs', 'list', '-t', 'filesystem,volume', '-H', '-o', 'name', '-s', 'name', '-r', 'StoragePool/ImmutableStorage']
[2025/11/21 16:15:25] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133562] Error 1: "cannot open 'StoragePool/ImmutableStorage': dataset does not exist\n"
[2025/11/21 16:15:25] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133564] Error 1: "cannot open 'StoragePool/ImmutableStorage/aws': dataset does not exist\n"
[2025/11/21 18:36:08] WARNING [replication_task__task_7.process] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [replication_process:task_7] Listen side has not terminated within 5 seconds after connect side error
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.process] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133863] Stopping
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.process] [zettarepl.paramiko.replication_task__task_7] [chan 116] EOF sent (116)
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.process] [zettarepl.transport.local] [shell:1] [async_exec:133864] Stopping
[2025/11/21 18:36:08] ERROR [replication_task__task_7] [zettarepl.replication.run] For task 'task_7' unhandled replication error SshNetcatExecException(ExecException(1, "cannot send 'StoragePool/ImmutableStorage/aws': I/O error\n"), None) @cee:{"TNLOG": {"exception": "Traceback (most recent call last):\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 181, in run_replication_tasks\n retry_contains_partially_complete_state(\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/partially_complete_state.py\", line 16, in retry_contains_partially_complete_state\n return func()\n ^^^^^^\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 182, in <lambda>\n lambda: run_replication_task_part(replication_task, source_dataset, src_context, dst_context,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 278, in run_replication_task_part\n run_replication_steps(step_templates, observer)\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 672, in run_replication_steps\n replicate_snapshots(step_template, incremental_base, snapshots, include_intermediate, encryption, observer)\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 713, in replicate_snapshots\n run_replication_step(step, observer)\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/run.py\", line 793, in run_replication_step\n ReplicationProcessRunner(process, monitor).run()\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/process_runner.py\", line 33, in run\n raise self.process_exception\n File \"/usr/lib/python3/dist-packages/zettarepl/replication/process_runner.py\", line 37, in _wait_process\n self.replication_process.wait()\n File \"/usr/lib/python3/dist-packages/zettarepl/transport/ssh_netcat.py\", line 210, in wait\n raise SshNetcatExecException(connect_exec_error, 
self.listen_exec_error) from None\nzettarepl.transport.ssh_netcat.SshNetcatExecException: Passive side: cannot send 'StoragePool/ImmutableStorage/aws': I/O error", "type": "PYTHON_EXCEPTION", "time": "2025-11-21 18:36:08.030180"}}
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.monitor] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133863] Stopping
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.monitor] [zettarepl.transport.local] [shell:1] [async_exec:133864] Stopping
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.async_exec_tee.wait] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133863] Error -1: None
[2025/11/21 18:36:08] INFO [replication_task__task_7.close_sftp] [zettarepl.paramiko.replication_task__task_7.sftp] [chan 5] sftp session closed.
[2025/11/21 18:36:08] DEBUG [replication_task__task_7.listen_exec.wait] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] [async_exec:133862] Error -1: ''
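Since most of the log is DEBUG noise, a small filter can help pick out the WARNING and ERROR entries and their timestamps. The sketch below assumes every entry starts with the `[YYYY/MM/DD HH:MM:SS] LEVEL` prefix seen above; the function name and the two sample lines (copied from the log) are only for illustration.

```python
import re
from datetime import datetime

# Matches the "[YYYY/MM/DD HH:MM:SS] LEVEL ..." prefix seen in the log above.
LINE_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (\w+) (.*)$")


def important_entries(lines, levels=("WARNING", "ERROR")):
    """Yield (timestamp, level, message) for entries at the given levels."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # lines without the prefix (e.g. wrapped tracebacks) are skipped
        stamp, level, message = match.groups()
        if level in levels:
            yield datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"), level, message


# Two lines copied from the log above, used as sample input.
sample = [
    "[2025/11/21 16:15:23] DEBUG [replication_task__task_7] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [shell:2668] Connecting...",
    "[2025/11/21 18:36:08] WARNING [replication_task__task_7.process] [zettarepl.transport.base_ssh] [ssh:root@172.10.70.30] [replication_process:task_7] Listen side has not terminated within 5 seconds after connect side error",
]

if __name__ == "__main__":
    for when, level, message in important_entries(sample):
        print(when.isoformat(sep=" "), level, message)
```

To run it against a saved copy of the full log, pass an open file object instead of `sample`.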
I’ve considered intermittent network outages as a possible cause of the failures, but I don’t see any outages on the network monitoring dashboards. Besides, the other replication tasks have been working as expected, so it must be something else.
Please help with this, and let me know if you need further information.
Thanks,


