Replication using SSH+Netcat fails because the helper script always passes dualstack_ipv6=True to socket.create_server(), even for an IPv4 socket

Hi all,

I am having issues running SSH+Netcat replications from one TrueNAS SCALE machine (ElectricEel-24.10.2.1) to another over a site-to-site WireGuard VPN. The VPN is working, and I can run plain SSH replications without issue (though they are slower).

I am not an IT professional, just a hobbyist, but I did some troubleshooting with my friend ChatGPT. The cause seems to be that zettarepl's helper script calls Python's socket.create_server() with dualstack_ipv6=True even when it is creating an IPv4 (AF_INET) socket, which makes Python raise an error and the replication fail.

I can reproduce the error on the passive system with this command from the shell:

python3 /tmp/zettarepl--transport--ssh_netcat_helper.py--f999cc87214dd28c4d49258b7f62b967 --listen 0 receive RaptorBackup/Photos

Which returns:

  File "/tmp/zettarepl--transport--ssh_netcat_helper.py--f999cc87214dd28c4d49258b7f62b967", line 61, in <module>
    s = socket.create_server((args.listen, port), family=address_family(args.listen), dualstack_ipv6=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/socket.py", line 901, in create_server
    raise ValueError("dualstack_ipv6 requires AF_INET6 family")
ValueError: dualstack_ipv6 requires AF_INET6 family

Please refer to the temporary file the replication creates at /tmp/zettarepl--transport--ssh_netcat_helper.py--*, which contains this line:

s = socket.create_server((args.listen, port), family=address_family(args.listen), dualstack_ipv6=True)

Whenever address_family(args.listen) returns AF_INET, Python rejects the dualstack_ipv6=True argument with the ValueError above.
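The failure can be reproduced outside zettarepl with a few lines of Python. This is a minimal sketch, not the helper itself: it calls socket.create_server() with an IPv4 family and dualstack_ipv6=True, the same combination the helper ends up using. The address 127.0.0.1 and port 0 are arbitrary choices for illustration. Note that create_server() checks platform support first, so on a host where Python cannot create dual-stack sockets at all, the message is "dualstack_ipv6 not supported on this platform" instead.

```python
import socket


def reproduce_dualstack_error() -> str:
    """Mimic the helper's create_server() call with an IPv4 (AF_INET) family.

    Returns the ValueError message raised by the standard library, or
    "no error" if the call unexpectedly succeeds.
    """
    try:
        # 127.0.0.1 and port 0 (any free port) are illustrative choices;
        # the important part is family=AF_INET combined with dualstack_ipv6=True.
        server = socket.create_server(
            ("127.0.0.1", 0), family=socket.AF_INET, dualstack_ipv6=True
        )
    except ValueError as exc:
        return str(exc)
    server.close()
    return "no error"


if __name__ == "__main__":
    print(reproduce_dualstack_error())
```

On a machine like the one above, with IPv6 enabled in the kernel, this should print the same "dualstack_ipv6 requires AF_INET6 family" message as the traceback.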

If I copy that file, change that argument to dualstack_ipv6=False, and run the copy with:

admin@truenas[~]$ python3 /tmp/fixed_helper.py --listen 0 receive RaptorBackup/Photos

it successfully opens a port.
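Hard-coding dualstack_ipv6=False works as a local workaround, but it would also switch dual-stack off for IPv6 listeners. A more targeted fix is to request dual-stack only when the socket really is IPv6 and Python reports platform support. The sketch below assumes the listen address is a literal IP address; create_listener is a hypothetical name, and it derives the family with the ipaddress module rather than the helper's own address_family() function, whose code isn't shown here.

```python
import ipaddress
import socket


def create_listener(host: str, port: int) -> socket.socket:
    """Hypothetical replacement for the helper's create_server() call.

    Assumes `host` is a literal IPv4 or IPv6 address such as "0.0.0.0",
    "127.0.0.1" or "::"; hostnames and shorthand forms are not handled.
    """
    if ipaddress.ip_address(host).version == 6:
        family = socket.AF_INET6
    else:
        family = socket.AF_INET
    # Only ask for dual-stack on an IPv6 socket, and only where the platform
    # supports it; an IPv4 socket never gets dualstack_ipv6=True.
    dualstack = family == socket.AF_INET6 and socket.has_dualstack_ipv6()
    return socket.create_server((host, port), family=family, dualstack_ipv6=dualstack)


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port, as a stand-in for the helper's port.
    with create_listener("127.0.0.1", 0) as server:
        print(server.family, server.getsockname())
```

With "::" as the address on a dual-stack host, the same function returns an IPv6 socket that also accepts IPv4 connections, which is presumably what the dualstack_ipv6=True in the helper was meant to achieve.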

This seems like a significant bug, and I’m surprised I haven’t found any reference to it searching the web and forums. Does anyone have input or a workaround to suggest?

TL;DR: When using the SSH+Netcat transport in replication tasks on TrueNAS SCALE with IPv6 enabled at the kernel level (but blocked at the network layer), the replication helper script intermittently crashes because it passes dualstack_ipv6=True to socket.create_server(...) regardless of the socket family. Python raises a ValueError whenever that family is AF_INET (IPv4).

Thanks

I am having the same problem and was able to reproduce your results. This is the first time I am setting up replication, let alone SSH+NETCAT.

If we refer to the Python documentation for socket.create_server() (the socket module page, "Low-level networking interface"), its dualstack_ipv6 example uses an empty listen address ("") together with family=AF_INET6. That makes sense, since dual stack basically means you want one socket that accepts both IPv4 and IPv6 connections on all addresses.

Therefore the ‘listen’ argument needs to be optional when this mode is used; right now it’s mandatory, and you must supply either a valid IPv4 or IPv6 address.
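Making the listen address optional could look something like this. It is a sketch under assumptions, not zettarepl code: --listen defaulting to None and open_listener() are both hypothetical. When no address is given it follows the pattern from the Python socket documentation (bind to "" with AF_INET6 and dualstack_ipv6=True where socket.has_dualstack_ipv6() is true, otherwise plain IPv4); when an address is given it never requests dual-stack. Guessing the family from the presence of a colon is a deliberate simplification.

```python
import argparse
import socket
from typing import Optional


def open_listener(listen: Optional[str], port: int) -> socket.socket:
    """Hypothetical listener setup where the listen address is optional."""
    if listen is None:
        # No address: listen on all interfaces. Use one dual-stack IPv6
        # socket where supported, otherwise fall back to IPv4 only.
        if socket.has_dualstack_ipv6():
            return socket.create_server(
                ("", port), family=socket.AF_INET6, dualstack_ipv6=True
            )
        return socket.create_server(("", port), family=socket.AF_INET)
    # Explicit address: pick the family from it (simplified check: IPv6
    # literals contain ":") and leave dual-stack off.
    family = socket.AF_INET6 if ":" in listen else socket.AF_INET
    return socket.create_server((listen, port), family=family)


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Optional: omitting --listen means "all interfaces".
    parser.add_argument("--listen", default=None)
    return parser.parse_args(argv)
```

For example, open_listener(parse_args([]).listen, 0) would bind all interfaces on a free port, while parse_args(["--listen", "127.0.0.1"]) keeps today's explicit-address behavior without touching dualstack_ipv6.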

The replication code lives in the truenas/zettarepl repository on GitHub (search for: repo:truenas/zettarepl), and the helper script specifically is at /truenas/zettarepl/blob/master/zettarepl/transport/ssh_netcat_helper.py.

It seems this code was added in mid-2024. It might not have been fully tested, since I don’t see any integration tests that specifically test SSH+NETCAT.

I would imagine most people just use plain SSH for the data transfer, or are perhaps on an older version of TrueNAS SCALE (if they use replication at all).

Have you raised a ticket yet?