SCALE: Replication task setup with default certificates not working

Following up on this here…

My main server is on the current Dragonfish stable, while the cold backup is still on Cobia 23.10.1.

Trying to use an existing pull replication (backup pulling from main) or to set up a new one fails because of the self-signed certificate.

Both systems were migrated from CORE to Bluefin a while ago and upgraded onward from there. So they started with root, but later had the admin user added, per the documentation hub. At some point, replications suddenly started failing, and I had to roll back the backup server to 23.10.1 so I could keep the main server on newer releases while retaining at least push replication from main to backup.

However, I’d like to have replication working both ways without keeping both systems on 23.10.1 or earlier… Is there an option to make the system accept a self-signed certificate? It used to just do it…

Thanks!

Error details:

CallError
[EFAULT] Unable to connect to remote system: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:992)
 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/keychain.py", line 602, in remote_ssh_semiautomatic_setup
    client = Client(os.path.join(re.sub("^http", "ws", data["url"]), "websocket"))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/client/client.py", line 289, in __init__
    self._ws.connect()
  File "/usr/lib/python3/dist-packages/middlewared/client/client.py", line 72, in connect
    self.socket = connect(self.url, sockopt, proxy_info(), None)[0]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/websocket/_http.py", line 136, in connect
    sock = _ssl_socket(sock, options.sslopt, hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/websocket/_http.py", line 271, in _ssl_socket
    sock = _wrap_sni_socket(sock, sslopt, hostname, check_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/websocket/_http.py", line 247, in _wrap_sni_socket
    return context.wrap_socket(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 1075, in _create
    self.do_handshake()
  File "/usr/lib/python3.11/ssl.py", line 1346, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:992)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 201, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1342, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 177, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 44, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/keychain_/ssh_connections.py", line 97, in setup_ssh_connection
    resp = await self.middleware.call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1399, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1353, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1251, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 181, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 50, in nf
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/keychain.py", line 604, in remote_ssh_semiautomatic_setup
    raise CallError(f"Unable to connect to remote system: {e}")
middlewared.service_exception.CallError: [EFAULT] Unable to connect to remote system: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:992)
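
For reference, the check that fails here is ordinary Python TLS verification: the middleware's websocket client connects with a default (verifying) SSL context, which rejects self-signed certificates. A minimal illustration in stock Python of the difference between that default context and one that would accept a self-signed cert (TrueNAS does not appear to expose the latter as an option here; variable names are mine):

```python
import ssl

# Default client-side context -- what the failing handshake uses:
# hostname checking on, certificate verification required.
strict = ssl.create_default_context()
# strict.check_hostname is True, strict.verify_mode is ssl.CERT_REQUIRED

# What "accept a self-signed certificate" would amount to:
# hostname checking must be disabled first, then verification dropped.
lax = ssl.create_default_context()
lax.check_hostname = False
lax.verify_mode = ssl.CERT_NONE
```

A context like `lax` would complete the handshake against a self-signed certificate, at the cost of all TLS authenticity guarantees.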
 

I can’t imagine nobody uses replication between TrueNAS SCALE systems with default certificates.

It seems, based on the older thread in ye olde forums, that something in that regard changed a while ago (starting somewhere in Cobia).

However, this doesn’t seem to be reflected in the documentation (at least not how to work around it), and there wasn’t any “migration” of existing keypairs etc. when upgrading; it just stopped working.

Hence, my question is: how do I get it working again?

I’d like to have both systems, main and backup, on a current version, and also use pull replication again, with backup pulling from main, so there’s no link from main towards backup.

I’m still interested in how to get replication working again, with default certificates.

It “just worked” for years, but changes made in Cobia now prevent it from working. How can it be re-enabled?

Or should I open an issue report for this?

How did you do your scale upgrade?

From an ISO? In that case you lost the /root directory, which had authorized_keys in it.

Do you have the SSH key pairs under Backup Credentials?

If so, copy the public key for the connection into the user’s authorized keys field on the remote server, under their credentials.
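
For anyone doing that copy step by hand instead of through the UI, a rough sketch of the equivalent shell steps (paths are illustrative; `remote_home` stands in for the replication user’s home directory on the remote server):

```shell
# Generate a fresh key pair for the connection (or reuse the one under
# Credentials > Backup Credentials > SSH Keypairs).
ssh-keygen -t ed25519 -N "" -f replication_key

# The UI's "authorized keys" field ends up in the replication user's
# ~/.ssh/authorized_keys on the remote server; appending by hand:
install -d -m 700 remote_home/.ssh
cat replication_key.pub >> remote_home/.ssh/authorized_keys
chmod 600 remote_home/.ssh/authorized_keys
```

Note that sshd is picky about permissions: the .ssh directory and authorized_keys file must not be group- or world-writable, or the key is silently ignored.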

When I did my SCALE upgrades, I did them in place and still had the authorized_keys.

But in-place upgrades are no longer recommended.

Hey, thanks for taking the time :slight_smile:

From CORE to SCALE? That was initially in place. However, the upgrade from Bluefin to Cobia failed, so I reinstalled from the ISO and imported the config backup. After that, everything was done via in-place upgrades.

The keypairs have been in place since FreeNAS, I think. Everything worked fine until after 23.10.0.1. Now my backup server remains on that version, because it is the last one that accepts the connection with default certificates (which leaves push from the main server working). The error message above is from creating a completely new keypair as per the documentation. Even when manually moving the keys over, the connection is still refused because of the self-signed certificate.

I take it that this is still assumed to work, in the grand scheme of things? Then I might have to bite the bullet, reinstall both systems from scratch, and redo their entire configuration to get rid of the last CORE remnants that might sleep within those old config files…

Connect via http, not https, when auto-setting up the SSH pair.

Depends on your network security.
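
For context on why http behaves differently here: the traceback above shows keychain.py rewriting the URL scheme with `re.sub("^http", "ws", ...)`, so an http URL becomes a plain ws:// websocket (no TLS handshake, no certificate check at all), while https becomes wss:// and runs into the verification failure. A small sketch of that rewrite (the helper name is mine; the real code joins the path with os.path.join):

```python
import re

def websocket_url(url: str) -> str:
    # "http://host"  -> "ws://host/websocket"   (no TLS at all)
    # "https://host" -> "wss://host/websocket"  (TLS, cert is verified)
    return re.sub("^http", "ws", url) + "/websocket"
```

So “use http” sidesteps the certificate check entirely rather than relaxing it, which is why it only makes sense on a trusted network.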

I wouldn’t expect the network setup to impact this, as both servers are in the same subnet on the same switch, so in that regard communication between the two should be smooth sailing.

Using http instead of https also doesn’t work. Therefore, I’ll reinstall both servers with minimal settings (just configure network interfaces and login, no pool import, nothing else) to have a clean slate, and then try to set up replication again from scratch. This should rule out anything in the config.

I’ll be back :wink:


So, I just reinstalled both systems from scratch, adjusted the network settings via IPMI, and tried to create the replication task. Both systems are otherwise on default settings.

Using https to set up the SSH connection fails with the same error as before: self-signed certificate. That seems like an issue.

However, http worked insofar as the connection is now established and the key is written to the replication user on the other system. But setting up the task doesn’t allow access to the other system’s datasets, and the resulting task fails with an authentication error…

Something isn’t quite right…




FAILED
[EFAULT] Authentication failed.
 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 469, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 511, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/replication.py", line 484, in run
    await self.middleware.call("zettarepl.run_replication_task", id_, really_run, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1428, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zettarepl.py", line 401, in run_replication_task
    self._run_replication_task_job(f"task_{id_}", job)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zettarepl.py", line 461, in _run_replication_task_job
    raise CallError(make_sentence(message.error))
middlewared.service_exception.CallError: [EFAULT] Authentication failed.
 

Is there any update to this?

More info in the older thread here - cobia-cobia replication w/ default certs - Unable to connect to remote system: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed | TrueNAS Community

I don’t like having to run older versions, but when something so integral to the entire platform is broken, you really don’t have much choice.