Replication Task and SSH Stuck

Hello,

I would like to understand something about Replication Task.

I have two TrueNAS SCALE servers with this configuration:

TrueNAS1 - Version 24.04.2.5 – OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024)
TrueNAS2 - Version 24.10.0.2 – OpenSSL 3.0.13 30 Jan 2024 (Library: OpenSSL 3.0.13 30 Jan 2024)

Intel(R) Xeon(R) Silver 4309Y CPU @ 2.80GHz
128 GB RAM
1 ZPOOL - 1 x RAIDZ1 | 4 wide | 10.91 TiB

I’m trying to set up a Replication Task between the two.

First,
TrueNAS 1 to TrueNAS 2 → I can see the dataset, but the replication task is stuck with the following error: Connection closed by 10.2.1.101 port 22 Broken pipe.

When I try to open an SSH connection, I get the following message:
debug1: SSH2_MSG_KEX_ECDH_REPLY received.
If I add this to the Auxiliary Parameters: KexAlgorithms=curve25519-sha256
then SSH works, but I can’t see the dataset.
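
For anyone who wants to reproduce the comparison, here is a minimal sketch of the two test commands, with and without the forced key exchange algorithm. The helper name build_ssh_probe and the user name replication are placeholders for the illustration; the host is the one from the error above, and the -o options are standard OpenSSH client options.

```python
import shlex


def build_ssh_probe(host, user, kex=None, port=22):
    """Return an argv list for a verbose, non-interactive SSH test connection.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so the probe only succeeds when key authentication works.
    """
    argv = [
        "ssh", "-v", "-p", str(port),
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=10",
    ]
    if kex:
        # Restrict the key exchange to a single algorithm (the workaround above).
        argv += ["-o", f"KexAlgorithms={kex}"]
    # "true" is a harmless remote command: the session exits right after login.
    argv += [f"{user}@{host}", "true"]
    return argv


# Placeholder user name; replace with the real replication account.
print(shlex.join(build_ssh_probe("10.2.1.101", "replication")))
print(shlex.join(build_ssh_probe("10.2.1.101", "replication", kex="curve25519-sha256")))
```

Running both printed commands from a shell on each server and comparing the -v output shows at which step the default negotiation differs from the forced one.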

TrueNAS 2 to TrueNAS 1 → I can’t see the dataset, and for the SSH connection I have the same problem and the same workaround.

I have created a user in the admin group with the same SSH key.

Do you think the version difference between the two is the problem?

My two TrueNAS servers are on the same network.

Thanks in advance for your reply.

Regards,

Hi and welcome to the forums.

Don’t ask me why, but I’ve never been able to get the replication wizard to work since the TN CORE days. The good news, however, is that I can get it to work the manual way.

This can get confusing when talking about PUSH and PULL, so I’m going to assume you want PUSH. If you don’t, then essentially do the opposite of what I say.

  1. Create a dataset on the secondary system called something like ‘Home’. Just a plain old dataset: leave the defaults, nothing fancy.
  2. Create a dedicated replication user on the secondary system; I’m calling mine bob. Bob won’t need a password; in fact, disable his password. Point Bob’s home directory at the dataset you created above and tick ‘Create Home Directory’. Give Bob a shell (zsh will do fine) and check ‘Allow all sudo commands with no password’.
  3. Now hop onto the primary system, go to ‘Credentials’, ‘Backup’ and create an ‘SSH Keypair’, then copy the public key into Bob’s account on the secondary system.
  4. Back on the primary, in the same location, create an entry under ‘SSH Connections’ but change from ‘semi-automatic’ (LOL) to manual. Fill in the blanks and make sure you set the username to bob (or whatever you called it). Click discover remote host key and you’re done.
  5. Now go and configure your replication task using the bob connection. At some point you will most likely be prompted to ‘Use Sudo For ZFS Commands’; click it and you should be good to go.

PS: You will need the SSH Service running on the backup system but I’d suggest you turn off ‘Allow Password Authentication’.

Hi,

Thanks for your reply,

I expressed myself badly :wink:

Actually, I’m not trying to use the replication wizard, but rather manual replication.

Actions I took before creating the post:

– I created a new dataset on my secondary system.
– I created a new user “replication” with the same name, UID and GID as on my primary server (no password, and allow all sudo commands), but its home directory is not in my new dataset, so I will follow your advice on that :wink: .
– I created a manual connection :wink: (SSH keys are already present for my second user); Discover Remote Host Key is OK.

– I created a replication task with “Use Sudo For ZFS Commands”, and I see all my datasets on my secondary server.

– The SSH service is running without the option “Allow Password Authentication”.

Do you use Weak Ciphers?

– The task is created correctly, but I now have a new message with permission denied.
Before, I had a connection closed.

When I click on my new dataset on the secondary server, I get the following error message:

CallError

[EFAULT] Failed retreiving USER quotas for tank1/RESTAURATION

Maybe that is progress, because it is no longer the same message as the connection closed one.

I will try to understand why I get a permission denied!

Regards.

This normally happens DURING replication. Once replication is complete this error should disappear.

Alright,

in the meantime, I deleted the task that was in error and then recreated it.

I’m testing with an “empty” dataset that has just 4 files.

When I check my logs, I see this:

[2025/06/18 10:21:51] INFO     [replication_task__task_19] [zettarepl.replication.run] After recoverable error sleeping for 4 seconds
[2025/06/18 10:21:55] INFO     [replication_task__task_19] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2025/06/18 10:21:55] INFO     [replication_task__task_19] [zettarepl.replication.run] For replication task 'task_19': doing push from 'tank1/TEST-RESTAURATION' to 'tank1/REPLICATION' of snapshot='TEST-auto-2025-06-18_10-09' incremental_base=None include_intermediate=False receive_resume_token=None encryption=False
[2025/06/18 10:23:55] WARNING  [replication_task__task_19] [zettarepl.replication.run] For task 'task_19' at attempt 4 recoverable replication error RecoverableReplicationError('Connection closed by 10.1.2.3  port 22\nBroken pipe.')
[2025/06/18 10:23:55] INFO     [replication_task__task_19] [zettarepl.replication.run] After recoverable error sleeping for 8 seconds
[2025/06/18 10:24:04] INFO     [replication_task__task_19] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2025/06/18 10:24:04] INFO     [replication_task__task_19] [zettarepl.replication.run] For replication task 'task_19': doing push from 'tank1/TEST-RESTAURATION' to 'tank1/REPLICATION' of snapshot='TEST-auto-2025-06-18_10-09' incremental_base=None include_intermediate=False receive_resume_token=None encryption=False
[2025/06/18 10:26:04] WARNING  [replication_task__task_19] [zettarepl.replication.run] For task 'task_19' at attempt 5 recoverable replication error RecoverableReplicationError('Connection closed by 10.1.2.3 port 22\nBroken pipe.')
[2025/06/18 10:26:04] ERROR    [replication_task__task_19] [zettarepl.replication.run] Failed replication task 'task_19' after 5 retries
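
When the log gets long, it can help to filter it down to the warnings and errors. Below is a minimal sketch that does this for lines shaped like the ones above; the regular expression is my own guess at the layout based on this excerpt only, not an official zettarepl format, and the function name summarize is made up for the example.

```python
import re

# Assumed layout, inferred from the excerpt above:
# [timestamp] LEVEL [task] [logger] message
LINE_RE = re.compile(
    r"^\[?(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<task>[^\]]+)\]\s+\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$"
)
ATTEMPT_RE = re.compile(r"at attempt (\d+)")


def summarize(log_text):
    """Return (problems, last_attempt) for WARNING and ERROR lines."""
    problems = []
    last_attempt = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match["level"] not in ("WARNING", "ERROR"):
            continue
        problems.append((match["ts"], match["level"], match["msg"]))
        attempt = ATTEMPT_RE.search(match["msg"])
        if attempt:
            last_attempt = int(attempt.group(1))
    return problems, last_attempt


# Three lines copied from the log above (raw string keeps the literal \n).
SAMPLE = r"""
[2025/06/18 10:24:04] INFO     [replication_task__task_19] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2025/06/18 10:26:04] WARNING  [replication_task__task_19] [zettarepl.replication.run] For task 'task_19' at attempt 5 recoverable replication error RecoverableReplicationError('Connection closed by 10.1.2.3 port 22\nBroken pipe.')
[2025/06/18 10:26:04] ERROR    [replication_task__task_19] [zettarepl.replication.run] Failed replication task 'task_19' after 5 retries
"""

problems, last_attempt = summarize(SAMPLE)
for ts, level, msg in problems:
    print(ts, level, msg[:70])
print("last attempt seen:", last_attempt)
```

In this excerpt every retry fails about two minutes after the push starts and always with the same “Connection closed … Broken pipe” message, which is easier to see once the INFO lines are filtered out.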

It looks like your snapshots are out of sync and you have no incremental base to continue. In replication select ‘Replication from scratch’ and that should sort it.

Replication from scratch is already enabled ^^

I think I will delete my snapshots and tasks and recreate them.

I’d delete the dataset on the receive side and start again.

I suppose / I hope :smiley: I’m close to success.

The first time, I selected “Include Dataset Properties”, with Properties Override and Properties Exclude left empty.

Error 2: "missing argument for 'x' option\nusage:\n\treceive [-vMnsFhu] ...

So I deselected “Include Dataset Properties”,

but I still see Error 2: "missing argument for 'x'"

[2025/06/18 15:14:03] DEBUG    [replication_task__task_22] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:558] Waiting for exit status
[2025/06/18 15:14:03] DEBUG    [replication_task__task_22] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:558] Error 2: "missing argument for 'x' option\nusage:\n\treceive [-vMnsFhu] [-o <property>=<value>] ... [-x <property>] ...\n\t    <filesystem|volume|snapshot>\n\treceive [-vMnsFhu] [-o <property>=<value>] ... [-x <property>] ... \n\t    [-d | -e] <filesystem>\n\treceive -A <filesystem|volume>\n\nFor the property list, run: zfs set|get\n\nFor the delegated permission list, run: zfs allow|unallow\n\nFor further help on a command or topic, run: zfs help [<topic>]\n"
[2025/06/18 15:14:03] DEBUG    [replication_task__task_22] [zettarepl.paramiko.replication_task__task_22] [chan 5] EOF sent (5)
[2025/06/18 15:14:03] DEBUG    [replication_task__task_22] [zettarepl.transport.local] [shell:1] [async_exec:560] Running Pipe((['sh', '-c', '(zfs send -V -L -c tank1/test-replication@test-replication-2025-06-18_14-27 & PID=$!; echo "zettarepl: zfs send PID is $PID" 1>&2; wait $PID)'], ['ssh', '-i', '/tmp/tmp18hfc6u4', '-o', 'UserKnownHostsFile=/tmp/tmp1rrxl7ps', '-o', 'StrictHostKeyChecking=yes', '-o', 'BatchMode=yes', '-o', 'ConnectTimeout=10', '-p22', 'root@10.2.1.101', "sh -c 'PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin sudo zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint tank1/test-replication'"]))
[2025/06/18 15:14:13] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:561] Running ['ps', '-o', 'command', '-p', '3712091']
[2025/06/18 15:14:13] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:561] Success: 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:13] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.progress_report_mixin] Unable to find ZFS send progress in 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:23] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:562] Running ['ps', '-o', 'command', '-p', '3712091']
[2025/06/18 15:14:23] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:562] Success: 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:23] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.progress_report_mixin] Unable to find ZFS send progress in 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.local] [shell:1] [async_exec:563] Running ['zfs', 'get', '-H', '-p', '-t', 'filesystem,volume', 'used', 'tank1/test-replication']
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.local] [shell:1] [async_exec:563] Success: 'tank1/test-replication\tused\t214272\t-\n'
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:564] Running ['zfs', 'get', '-H', '-p', '-t', 'filesystem,volume', 'used', 'tank1/test-replication'] with sudo=False
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.paramiko.replication_task__task_22] [chan 6] Max packet in: 32768 bytes
[2025/06/18 15:14:33] DEBUG    [Thread-49] [zettarepl.paramiko.replication_task__task_22] [chan 6] Max packet out: 32768 bytes
[2025/06/18 15:14:33] DEBUG    [Thread-49] [zettarepl.paramiko.replication_task__task_22] Secsh channel 6 opened.
[2025/06/18 15:14:33] DEBUG    [Thread-49] [zettarepl.paramiko.replication_task__task_22] [chan 6] Sesch channel 6 request ok
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:564] Reading stdout
[2025/06/18 15:14:33] DEBUG    [Thread-49] [zettarepl.paramiko.replication_task__task_22] [chan 6] EOF received (6)
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:564] Waiting for exit status
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.transport.base_ssh] [ssh:root@10.2.1.101] [shell:17] [async_exec:564] Success: 'tank1/test-replication\tused\t178560\t-\n'
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.dataset_size_observer] [zettarepl.paramiko.replication_task__task_22] [chan 6] EOF sent (6)
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:565] Running ['ps', '-o', 'command', '-p', '3712091']
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:565] Success: 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:33] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.progress_report_mixin] Unable to find ZFS send progress in 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:43] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:566] Running ['ps', '-o', 'command', '-p', '3712091']
[2025/06/18 15:14:43] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.local] [shell:1] [async_exec:566] Success: 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
[2025/06/18 15:14:43] DEBUG    [replication_task__task_22.progress_observer] [zettarepl.transport.progress_report_mixin] Unable to find ZFS send progress in 'COMMAND\nzfs: sending tank1/test-replication@test-replication-2025-06-18_14-27 (100%: 57.\n'
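
The excerpt does not show the command that produced Error 2 (it only shows the exit status for async_exec:558), so the failing zfs receive invocation still has to be found further up in the log. Once it is found, a quick check like the sketch below can tell whether one of its -x options lost its property name. This is only a heuristic written for this thread, not how zfs parses its arguments: it flags a -x that is the last token or is followed by an empty string or by something that looks like another option. The function name dangling_x_options is made up.

```python
import shlex


def dangling_x_options(command):
    """Return the token indexes of '-x' options with no usable property name."""
    tokens = shlex.split(command)
    bad = []
    for i, tok in enumerate(tokens):
        if tok != "-x":
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Heuristic: nothing after -x, an empty argument, or another option.
        if nxt is None or nxt == "" or nxt.startswith("-"):
            bad.append(i)
    return bad


# The receive command from the 15:14:03 "Running Pipe" line above.
logged = ("sudo zfs recv -s -F -x sharenfs -x sharesmb -x mountpoint "
          "tank1/test-replication")
print(dangling_x_options(logged))

# A deliberately broken, hypothetical command for comparison.
print(dangling_x_options("sudo zfs recv -s -F -x"))
```

The command visible in the excerpt passes this check, which suggests the error comes from a different invocation than the one logged at 15:14:03.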

Do you have any ideas?
I’m so sorry, I’m not good at this :confused:

Personally, I’d do a clean start from a replication point of view. Remove any replication users you created and any datasets for home folders, any SSH key pairs and SSH connections, and also any sent datasets and snapshots. It shouldn’t be this hard, and I can’t tell what the issue is from here.

Then start from scratch using the above as a guide. Feel free to share screenshots if you are unsure of anything, as that’s probably easier.