I’ve upgraded to Dragonfish where the lru_gen setting is 0 by default.
So this solution might not apply in this case.
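For anyone who wants to confirm what the setting actually is on their box, here is a minimal sketch. It assumes the kernel exposes the multi-gen LRU switch at `/sys/kernel/mm/lru_gen/enabled` and prints it as a hex bitmask; the helper name `read_lru_gen` is made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the multi-gen LRU switch (kernels built with MGLRU).
LRU_GEN_PATH = Path("/sys/kernel/mm/lru_gen/enabled")


def read_lru_gen(path: Path = LRU_GEN_PATH) -> Optional[int]:
    """Return the lru_gen bitmask as an int, or None if the file is missing."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    # Parse as hex: handles values like "0x0007" as well as a plain "0".
    return int(text, 16)


if __name__ == "__main__":
    value = read_lru_gen()
    if value is None:
        print("lru_gen is not exposed by this kernel")
    else:
        state = "enabled" if value else "disabled"
        print(f"lru_gen = {value:#06x} ({state})")
```

A result of 0 would match the Dragonfish default mentioned above.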
I’ve got the same behaviour. I needed to reinitialize the replication, i.e. run a full backup/replication to the backup NAS.
After a while, with 2 - 4 TB already transferred, the connection breaks and the target (backup) TrueNAS instance is no longer reachable (e.g. via the Web GUI).
After a reboot it works fine again.
This happens every time I try to do a full backup. Over the past months the delta runs worked fine - most likely because much less data had to be transferred.
I’ve got an ASUS XG-C100C network adapter in both servers, which has proven reliable over time.
I even upgraded the backup TrueNAS to Electric Eel Beta 1 to see if it makes any difference, but the behaviour is the same.
Do you have any idea what could be the reason for those crashes and how I can fix this issue?
The data to be transferred is about 65 TB (3 datasets on the source → 1 on the target system).
The memory is not under pressure - it’s mostly free on the target system.
Highest used memory is around 4 GB, with around 3 GB cached, out of 32 GB total.
Actually - no - the very small (500 GB) dataset works fine, of course, due to the lower amount of data. The other two do not. It doesn’t crash at a specific time, after a specific amount of data, or on a specific dataset. It feels kind of random.
For testing purposes, I switched to the onboard 1GbE adapter. It has replicated smoothly for the past couple of days.
Still, I don’t think the network adapter itself is the problem, but I’m not sure either.
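As a rough sanity check on how long a full run takes over each link, here is a back-of-the-envelope sketch. It assumes decimal terabytes and ideal line rate. The `efficiency` parameter is a hypothetical knob for protocol and disk overhead, not a measured value.

```python
def transfer_days(data_tb: float, link_gbit: float, efficiency: float = 1.0) -> float:
    """Days needed to move data_tb terabytes (10^12 bytes) over a link_gbit Gbit/s link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbit * 1e9 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # 65 TB is the total mentioned in this thread; speeds are nominal line rates.
    for name, speed in (("1GbE", 1.0), ("10GbE", 10.0)):
        print(f"{name}: {transfer_days(65, speed):.1f} days at line rate")
```

At line rate that works out to roughly 6 days for the full 65 TB over 1GbE and well under a day over 10GbE, so a 1GbE test needs about a week before it has moved the same amount of data.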
It does sound like a NIC issue. Maybe a driver or maybe it is too hot? You should rule out the cooling as that should be simple. Can you roll back to the working version to verify it still works? That would verify or eliminate the NIC or driver.
I think it’s related to the NIC’s temperature. I saw that the thermal pad on top of the chipset wasn’t seated properly.
After reworking it a bit, it seems to work properly now!
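If anyone wants to keep an eye on temperatures during a long replication, below is a minimal sketch that reads the standard Linux hwmon sysfs files, which report values in millidegrees Celsius. Whether a given NIC shows up there depends on its driver and kernel version, so treat that part as an assumption. The function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path

# Standard Linux hwmon sysfs root; each hwmonN directory is one sensor chip.
HWMON_ROOT = Path("/sys/class/hwmon")


def read_hwmon_temps(root: Path = HWMON_ROOT) -> dict[str, list[float]]:
    """Map each hwmon device name to its temperature readings in degrees Celsius."""
    temps: dict[str, list[float]] = {}
    for dev in sorted(root.glob("hwmon*")):
        name_file = dev / "name"
        name = name_file.read_text().strip() if name_file.exists() else dev.name
        readings = []
        for sensor in sorted(dev.glob("temp*_input")):
            try:
                # tempN_input holds millidegrees Celsius as a plain integer.
                readings.append(int(sensor.read_text().strip()) / 1000.0)
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read right now
        if readings:
            temps.setdefault(name, []).extend(readings)
    return temps


if __name__ == "__main__":
    for name, values in read_hwmon_temps().items():
        print(name, ", ".join(f"{v:.1f} °C" for v in values))
```

Running this periodically while the transfer is active would show whether any sensor climbs before the connection drops.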