Site-to-site snapshot replication is slow

Hello everyone,

I am completely new to the TrueNAS world.

We have replaced our old QNAP at work with two identical TrueNAS systems on Super Micro hardware.

Dragonfish-24.04.1.1
AMD EPYC 7313P 16-Core Processor
128GB ECC RAM
Kioxia SAS NVMe and so on

We use the TrueNAS at the server location as a Veeam repository and would like to copy the snapshots to the second system, which is located in an external data center.

A Fortigate is used on our side and an OPNsense on the remote side. The two sites are connected via an IPsec tunnel, with a symmetric 1 Gbit/s line on each side.

We can now push the data from NAS A to NAS B, but we are pretty much limited to 50 Mbit/s. I’ve been experimenting with the SSH and SSH + netcat transports for days now, but can’t get above 50 Mbit/s. I have checked the firewalls umpteen times; everything that should go through goes through, and no traffic shapers are active.

The iperf test also tops out at around 50 Mbit/s from A to B; from B to A, however, it reaches a good 960 Mbit/s.
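Roughly, this is the kind of test we ran (just a sketch; the address is a placeholder and iperf3 needs to be available on both systems):

```sh
# On NAS B (receiver): start an iperf3 server
iperf3 -s

# On NAS A (sender): a single TCP stream towards NAS B (placeholder address)
iperf3 -c nas-b.example.com -t 30

# Same client, reverse direction (-R makes the server send to the client)
iperf3 -c nas-b.example.com -t 30 -R
```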

I am at my wit’s end. Perhaps one of you has a tip on how I can debug the problem.

This is your issue: the iperf result already shows that the bottleneck is in the network path, not in TrueNAS.

You need to fix this first.

I’m sorry, I’m not even sure where to start.

Seems to be an ISP bandwidth limitation on the upload side.
Are you on cable?
Try running a Speedtest:

Thank you for your answers.

The problem is definitely somewhere along the path and is not a general one. Our site also has an enterprise-grade line, and all other services run at full speed. If I let the two old QNAPs replicate via rsync, they also use the full bandwidth. Only the two TrueNAS systems don’t seem to get along with each other.

I will test this thoroughly again tomorrow and call in another service provider to check where in the network the bottleneck is.

Yeah, the fact that your iperf3 showed 50 Mbit/s from A to B says there is an issue on that path. It could be an upstream router or any number of things. Until the iperf3 result is fixed, replication will not beat it.

I wish I only had 50 up. :grinning:

Hello everyone,

First of all, thank you for all the answers.

We spent 8 hours (!) yesterday with an external network technician testing everything up and down. The upshot is that TrueNAS apparently only ever opens a single TCP stream, just like our iperf tests. If we add the -P parameter to iperf and define, e.g., 50 parallel streams, I get over 900 Mbit/s on the WAN route.
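For reference, the parallel-stream run that reached line rate was along these lines (again only a sketch, the address is a placeholder):

```sh
# 50 parallel TCP streams from NAS A to NAS B across the IPsec route
iperf3 -c nas-b.example.com -t 30 -P 50
```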

Our QNAPs used to do an ordinary rsync and also utilised almost the entire bandwidth, so I don’t currently understand how to achieve the same with the two TrueNAS systems.

What is the RTT between the end points?

But don’t forget that A to B without parallel streams was 50 Mbit/s while B to A was 960 Mbit/s according to the original post, so there is still a difference and some sort of issue. I agree with checking the round-trip time; I suspect that will better define the problem. I don’t think you can make ZFS replication use multiple streams. That being said, once the initial send completes, I suspect replication will still outperform rsync, because incremental snapshot sends only transfer the blocks that changed.
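For context, a ZFS replication is essentially one pipe end to end, something like the following (only a sketch; pool, dataset and snapshot names are made up):

```sh
# Incremental snapshot send from A, received on B over a single SSH connection,
# i.e. one TCP stream for the whole transfer (all names are placeholders)
zfs send -i tank/veeam@auto-2024-06-01 tank/veeam@auto-2024-06-02 \
  | ssh replication@nas-b.example.com zfs recv -F tank/veeam
```

One SSH connection means one TCP stream, so the transfer is bound by whatever a single stream can do on that path.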

Another thing I would wonder about is the performance of the receiving machine, i.e., whether it is running as fast as it can or whether there is a limit on the receiving ZFS side.

If it turns out the round-trip time is quite large compared to the other direction, there are some potential gains from adjusting the TCP window scaling and size (RFC 1323).

With the RTT we can work out the BDP (bandwidth-delay product).

The max TCP window size needs to be configured to be about 2x the BDP.

50 parallel streams split the bandwidth 50 ways, which also reduces each stream’s BDP by 50x, so the aggregate can possibly reach maximum bandwidth even with a modest per-stream window.

TCP can’t scale the window past the maximum window size, and you need a very large window to deal with long fat pipes…
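To put rough numbers on it (a sketch only; this assumes a 1 Gbit/s line, an 8 ms RTT as an example, and that the usual Linux TCP sysctls are reachable on SCALE; the values are purely illustrative):

```sh
# Bandwidth-delay product for 1 Gbit/s at 8 ms RTT:
# 1,000,000,000 bit/s * 0.008 s = 8,000,000 bits ≈ 1 MB
echo "1000000000 * 0.008 / 8" | bc    # ≈ 1000000 bytes

# Allow a TCP window of about 2x the BDP (≈ 2 MB) on both machines
sysctl -w net.core.rmem_max=2097152
sysctl -w net.core.wmem_max=2097152
sysctl -w net.ipv4.tcp_rmem="4096 131072 2097152"
sysctl -w net.ipv4.tcp_wmem="4096 131072 2097152"
sysctl net.ipv4.tcp_window_scaling    # should report 1, i.e. RFC 1323 window scaling is on
```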

So… what’s the RTT :wink:

Thanks again for the answers,

the RTT over the IPsec VPN from source to destination is <8 ms.
From destination to source it is also <8 ms.

Another thing I would wonder about is the performance of the receiving machine, i.e., whether it is running as fast as it can or whether there is a limit on the receiving ZFS side.

I doubt that, as both systems are exactly the same.

As I work with @Hunduster, I just wanted to add some information.

We spent another 2 hours with external network technicians and observed the following:
iperf with only one TCP stream gets us about 200-400 Mbit/s upload to and from both sides. Adding TCP streams increases the bandwidth to 980 Mbit/s with 6 streams.

With one stream it is still about 4-6 times as fast as the ZFS replication, whether via the IPsec tunnel or via DNAT.

Is anyone aware of a way to control the maximum bandwidth used during replication in TrueNAS? Maybe via the CLI or something like that.