Replication Failed - Unable to connect to port 22

Hi,

I’ve got a problem with the replication from my main TrueNAS to my backup TrueNAS.
This worked fine before I upgraded to TrueNAS SCALE Dragonfish, and small delta replications still work fine on Dragonfish.
But now I tried to replicate a dataset from scratch.

(screenshot of the replication error attached)

I get this message after about 3 hours of replication, and the target (backup) TrueNAS is not responding anymore (not even via the web interface).
It transferred a couple of TB up until then but didn’t finish.
After a hard reboot of the target server, the system works fine again.
But it crashes again during every replication attempt.

I already installed the target TrueNAS fresh and restored the config, but that didn’t help.

Can somebody help me in this matter?

Thanks a lot!

BR
Julian

Just as a side note:
I’ve reduced the throughput of the replication to around 256 MB/s (around 2 Gbit/s). This apparently helps, as the replication has now been running for 24 hours straight without crashing the target server.
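Just to illustrate what I mean by capping the throughput: a manual, rate-limited zfs send would look roughly like the sketch below (pool, dataset, snapshot and host names are placeholders; this is not how the built-in replication task is configured, which has its own limit setting in the UI):

```
# Rate-limit a manual replication stream to ~256 MB/s with pv
# "tank/data@snap1", "backup-host" and "backuppool/data" are placeholders
zfs send -v tank/data@snap1 | pv -L 256m | ssh backup-host "zfs recv -F backuppool/data"
```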

Could this be caused by some hardware issue? Memory, for example, which cannot handle the throughput?

Thanks again for any pointers 🙂

What are the hardware specs of the source and target?

Hi,

A. Source:

  • 8-core / 16-thread Intel Xeon E-2288G
  • 128 GB ECC RAM
    (4x Kingston Server Premier - DDR4 - 32 GB - DIMM 288-pin - 2666 MHz / PC4-21300 - unbuffered)
  • ASUS P11C-X mainboard
  • ASUS 10 GbE Ethernet card
  • Lenovo DCG ThinkSystem 430-16i SAS/SATA HBA
  • Dragonfish-24.04.0

  Pool 1:
  • 6 wide × 12 TB (WD120EDAZ)
  • RAIDZ2

  Pool 2 (the one to be backed up):
  • 5 wide × 16 TB (ST16000NM001G)
  • RAIDZ2
  • Used space: 27.44 TiB

B. Target:

  • 4-core / 4-thread Intel Xeon E3-1220 v5
  • 32 GB ECC RAM (2x Kingston 16 GB PC4-2133P ECC // KVR21E15D8/16)
  • ASUS P10S-V/4L mainboard
  • ASUS 10 GbE Ethernet card
  • LSI Logic controller card LSI00301 SAS 9207-8i
  • Dragonfish-24.04.0

Pool:

  • 8 wide × 12 TB (WDC_WD120EFAX)
  • RAIDZ1

Btw: the replication crashed again, but it replicated way more data than before.
Before, it broke after 2 to 5 TiB; now it reached 15 TiB.

Thanks and BR
Julian

Have you disabled swap and lru_gen on both sides?

Do you know what chipset the ASUS 10 GbE card is using?

I don’t think you described the source and destination pools.

Hi,

Swap is enabled on both sides.
lru_gen is enabled on both sides as well.
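For reference, the current state can be checked from a shell like this (standard Linux interfaces, nothing TrueNAS-specific):

```
swapon --show                       # lists active swap devices; no output means swap is off
cat /sys/kernel/mm/lru_gen/enabled  # a non-zero value (e.g. 0x0007) means lru_gen/MGLRU is enabled
```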

The card seems to be based on the Aquantia AQtion AQC107.
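In case it helps, the chipset can be double-checked from the shell (the interface name below is just an example):

```
lspci -nn | grep -i ethernet   # PCI vendor/device of the NIC, e.g. Aquantia AQC107
ethtool -i enp3s0              # driver in use for a given interface ("enp3s0" is a placeholder)
```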

Sorry - of course:
Source:

  Pool 1:
  • 6 wide × 12 TB (WD120EDAZ)
  • RAIDZ2

  Pool 2 (the one to be backed up):
  • 5 wide × 16 TB (ST16000NM001G)
  • RAIDZ2

Target:
Pool:

  • 8 wide × 12 TB (WDC_WD120EFAX)
  • RAIDZ1

No errors in SMART scans.
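A quick way to check is something like this (the device name is just an example):

```
smartctl -a /dev/sda | grep -iE 'result|reallocated|pending|uncorrect'   # health verdict and the usual problem counters
```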

Can you look at the logs on the destination server, particularly /var/log/messages, just before it crashes? I suspect the clues we need will be there.
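For example, something along these lines on the destination, either while the replication runs or right after the reboot (just a sketch; what to look for depends on what actually shows up):

```
tail -f /var/log/messages                                  # watch live during the replication
grep -iE 'out of memory|oom|hung task' /var/log/messages   # after a reboot, look for memory-pressure traces
```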


Then I suggest disabling lru_gen at a minimum on both sides if you are running Dragonfish.

The issue is most likely that lru_gen is conflicting with the ARC during the replication, causing a chronic swap situation, which makes the system unresponsive.
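A minimal way to do that from a shell (standard Linux knobs, nothing TrueNAS-specific; this only lasts until reboot unless it is added as a post-init script):

```
echo n > /sys/kernel/mm/lru_gen/enabled   # turn off lru_gen / MGLRU
swapoff -a                                # disable all active swap devices
```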

Hi,

I’ve set both swap and lru_gen to disabled and started the replication again.
In case it crashes once more, I’ll have a look at the /var/log/messages log and see what’s mentioned in there.

Thanks a lot so far!


Hi,

So, as mentioned, I tried the replication again with both settings disabled.
That seems to have solved it - the replication went through without any interruption.

May I ask - as I couldn’t find any useful info in my own search - what does disabling “lru_gen” actually do? Do I need to change it back in the next release or so?

Thanks again for your helpful input!!

BR
Julian


It’s supposed to be a clever caching mechanism (the kernel’s multi-generational LRU). It assumes all free memory is for its own use, which conflicts with the ZFS ARC.

It was enabled by default in kernel 6.6 by upstream. Hence the issue.
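If you want to watch it happen, ARC size versus free memory and swap usage can be checked during a replication, for example:

```
arc_summary | head -n 40   # current ARC size and target
free -h                    # free memory and swap usage at the same moment
```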

It will be disabled by default in Dragonfish 24.04.1, so there should be no need to disable swap or lru_gen when the next update hits.

Perfect - thanks a lot for the explanation!
