Replication using SSH+Netcat fails due to helper script always passing dualstack_ipv6=true to python when using ipv4 socket

Hi all,

I am having issues running SSH+Netcat replications from one TrueNAS SCALE machine (ElectricEel-24.10.2.1) to another over a site-to-site WireGuard VPN. The VPN is functioning, and I can perform plain SSH replications without issue (but they are slower).

I am not an IT professional, just a hobbyist, but I did some troubleshooting with my friend ChatGPT and it seems the cause may be that the zettarepl helper script passes “dualstack_ipv6=True” to Python's socket.create_server() even though it is opening an IPv4 socket, which makes the call fail and report an error.

I can reproduce the error on the passive system with this command from the shell:

python3 /tmp/zettarepl--transport--ssh_netcat_helper.py--f999cc87214dd28c4d49258b7f62b967 --listen 0 receive RaptorBackup/Photos

Which returns:

  File "/tmp/zettarepl--transport--ssh_netcat_helper.py--f999cc87214dd28c4d49258b7f62b967", line 61, in <module>
    s = socket.create_server((args.listen, port), family=address_family(args.listen), dualstack_ipv6=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/socket.py", line 901, in create_server
    raise ValueError("dualstack_ipv6 requires AF_INET6 family")
ValueError: dualstack_ipv6 requires AF_INET6 family
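
For anyone who wants to check this without the helper script, here is a minimal sketch that makes the same kind of call directly. It assumes only the standard library; the function name try_listen and the 127.0.0.1 address are mine, not from zettarepl.

```python
import socket


def try_listen(host, family):
    """Call create_server the way the traceback shows, with dualstack_ipv6=True."""
    try:
        s = socket.create_server((host, 0), family=family, dualstack_ipv6=True)
    except ValueError as exc:
        # Python rejects dualstack_ipv6=True unless the family is AF_INET6
        # (and the platform supports dual-stack sockets).
        return str(exc)
    s.close()
    return "ok"


# An IPv4 family combined with dualstack_ipv6=True is rejected before any bind happens.
result = try_listen("127.0.0.1", socket.AF_INET)
print(result)
```

On a system like the one in the traceback above I would expect this to print the same ValueError message, but I have only reasoned about it from the traceback and the socket module, so treat it as a sanity check rather than proof.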

Please refer to the temporary file created on replication at /tmp/zettarepl--transport--ssh_netcat_helper.py--* where there is a line which reads s = socket.create_server((args.listen, port), family=address_family(args.listen), dualstack_ipv6=True).

If I copy that file, change dualstack_ipv6=True to dualstack_ipv6=False, and then run the copy with: admin@truenas[~]$ python3 /tmp/fixed_helper.py --listen 0 receive RaptorBackup/Photos it succeeds in opening a port.
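
Rather than hard-coding False, a gentler change might be to enable dual-stack only when the chosen family is actually AF_INET6. The following is a rough sketch of that idea, not the real zettarepl code: guess_family is a hypothetical stand-in for the helper's address_family() (whose body is not shown in this thread), and open_listener is a name I made up for illustration.

```python
import ipaddress
import socket


def guess_family(host):
    """Hypothetical stand-in for the helper's address_family(); an assumption."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address (e.g. "0" or a hostname): assume IPv4 here.
        return socket.AF_INET
    return socket.AF_INET6 if ip.version == 6 else socket.AF_INET


def open_listener(host, port=0):
    """Open a listening socket, asking for dual-stack only on IPv6 sockets."""
    family = guess_family(host)
    # dualstack_ipv6=True is only valid for AF_INET6 on platforms that support it.
    dualstack = family == socket.AF_INET6 and socket.has_dualstack_ipv6()
    return socket.create_server((host, port), family=family, dualstack_ipv6=dualstack)


if __name__ == "__main__":
    s = open_listener("127.0.0.1")
    print("listening on", s.getsockname())
    s.close()
```

With this shape an IPv4 listen address never gets dualstack_ipv6=True, while an IPv6 address still gets it where the platform allows. Whether that matches what the zettarepl authors intended is a guess on my part.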

This seems like a significant bug and I’m surprised I haven’t found any reference to it searching the web/forums. Anyone have input or can suggest a workaround?

TLDR: When using SSH+Netcat transport in replication tasks on TrueNAS SCALE with IPv6 enabled at the kernel level (but blocked at the network layer), the replication helper script intermittently crashes due to an invalid use of dualstack_ipv6=True on an AF_INET (IPv4) socket. The replication helper script always passes dualstack_ipv6=True to socket.create_server(...) regardless of the socket family. This raises an exception when the socket family is AF_INET.

Thanks

I am also having the same problem and was able to replicate your tests. This is the first time I am setting up replication, let alone SSH+NETCAT.

If we refer to the Python documentation ( Google search: Python docs sockets low-level networking ), it notes that dualstack_ipv6 should be used in conjunction with no listener address specified, which makes sense since dual stack basically means you want to listen on multiple addresses.

Therefore the ‘listen’ argument needs to be optional when this mode is selected. Right now it’s mandatory, and you must supply either a valid v4 or v6 address.

The GitHub repo for this replication code is here ( search for: repo:truenas/zettarepl )

More specifically in regards to the helper script, ( /truenas/zettarepl/blob/master/zettarepl/transport/ssh_netcat_helper.py )

It seems this code was added in mid-2024. It might not have been fully tested, since I don’t see any integration tests that specifically test SSH+NETCAT.

I would imagine most people are just using plain SSH for the data transfer, or maybe an older version of TrueNAS SCALE (if they are even using replication).

Have you raised a ticket yet?

I don’t have any issues using NETCAT. However, if I set up replication over SSH and later switch it to NETCAT, it throws errors. The fix is simply to delete the replication task and recreate it, selecting NETCAT from the start.

Having recently tried to cut my offsite backups over to SSH+Netcat, I’ve discovered I’m seeing this issue as well. I’ll open a ticket for this, unless there’s one already, as it’d be quite handy to have in place.


When you say NETCAT, do you mean SSH + NC? I think I’m managing to miss a plain netcat option. This is running over a site-to-site VPN link, so I’m more than happy to throw caution to the wind & bin SSH to speed things up a tiny bit.

Thanks in advance!

Yes

Brilliant, thanks for confirming (and unrelatedly, nice username btw, that brings back memories). I’d best go take a closer look at the truncated error message the UI returns. Cheers for your help!


Just throwing it out there that this also totally fixed my problem. I’m trying to send via an IPv4 Tailscale address and the helper script just dies unless I replace dualstack_ipv6=True with dualstack_ipv6=False. Did anyone ever file a ticket about this?

As a side note, with an extremely high latency connection (Japan → Germany), I was seeing about 60 Mbps over SSH and now get approximately 900 Mbps over SSH+NC. It makes a huge difference!

Japan to Germany with netcat? Here’s me hoping that’s over a VPN. :slight_smile:

Yes, it’s through a tailscale tunnel. Shipping my data halfway around the world deserves a little protection!