Network Connection Drops Briefly When Cloud Sync Tasks Run

Hi everyone,

I’m experiencing an issue with TrueNAS SCALE (version Dragonfish-24.04.2.5). I have 9 cloud sync tasks configured, 6 of which run hourly, at the top of the hour (12 AM, 1 AM, and so on).

The problem occurs a few minutes into each hour, shortly after the cloud sync tasks start running. The network connection briefly drops, causing all my virtual machines to shut down, and I have to restart them manually every time. All k3s applications also restart because of the connection interruption.

I tested this by disabling all cloud sync tasks, and the issue did not occur. This leads me to believe the cloud sync tasks are the root cause. All tasks use Yandex via WebDAV and are not particularly complex.

I’ve checked the system and kernel logs, but I couldn’t find anything that explains why this is happening.
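For reference, these are roughly the commands I used to search the logs; the timestamps mark one incident window and, like the service names, are only examples to adjust to your own system:

# Kernel ring buffer: look for NIC link up/down or carrier changes
dmesg -T | grep -iE 'link (is )?(up|down)|carrier'

# Kernel journal around one incident window (timestamps are examples)
journalctl -k --since "2025-01-22 18:55" --until "2025-01-22 19:10"

# Middleware log for the same window, in case the drop is triggered from userspace
journalctl -u middlewared --since "2025-01-22 18:55" --until "2025-01-22 19:10"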

Here are my questions:

  1. Is this behavior normal? If anyone else uses hourly cloud sync tasks, could you test and let me know if you experience similar behavior?
  2. If not, what could be causing it?
  3. How can I analyze and troubleshoot the root cause? (See the monitoring sketch after this list for what I have in mind.)
  4. What steps can I take to resolve this issue?
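
Regarding question 3, here is a minimal watchdog sketch I could leave running across a few of the hourly windows to catch the drop as it happens. The gateway address and log paths are placeholders, not my actual setup:

# Shell 1: timestamp every failed ping to the gateway (address is a placeholder)
while true; do
  if ! ping -c 1 -W 1 192.168.1.1 > /dev/null 2>&1; then
    echo "$(date '+%F %T') ping to gateway failed" >> /tmp/net-drops.log
  fi
  sleep 1
done

# Shell 2: record interface state changes with timestamps
ip -timestamp monitor link >> /tmp/link-events.log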

If needed, I’m happy to provide log files or generate a debug report to help with the analysis.

Any help or suggestions would be greatly appreciated!

Thanks in advance!

Here is an excerpt showing how such a cloud sync job is configured:

sudo midclt call cloudsync.query | jq

"direction": "PUSH",
"transfer_mode": "SYNC",
"encryption": false,
"filename_encryption": false,
"encryption_password": "",
"encryption_salt": "",
"create_empty_src_dirs": false,
"follow_symlinks": false,
"credentials": {
  "id": 5,
  "name": "Yandex Disk \"usenex\"",
  "provider": "YANDEX",
  "attributes": {
    "client_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "client_secret": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "token": "{\"access_token\":\"xxxxxxxxxxxxx\",\"token_type\":\"OAuth\",\"refresh_token\":\"1:xxxxxxxxxxxxx\",\"expiry\":\"2026-01-15T15:48:58.191874Z\"}"
  }
},
"schedule": {
  "minute": "0",
  "hour": "*",
  "dom": "*",
  "month": "*",
  "dow": "*"
},
"locked": false
},
{
"id": 7,
"description": "PUSH SYNC \\\\TRUENAS\\Usenet\\YouTube \"xxxxxxxx\"",
"path": "/mnt/HDD_RAID5-DataStorage/Usenet/YouTube",
"attributes": {
  "folder": "/Usenet/YouTube",
  "fast_list": false
},
"pre_script": "",
"post_script": "",
"snapshot": false,
"bwlimit": [],
"include": [],
"exclude": [],
"transfers": 16,
"args": "",
"enabled": false,
"job": {
  "id": null,
  "method": "cloudsync.sync",
  "arguments": [
    7
  ],
  "transient": false,
  "description": null,
  "abortable": true,
  "logs_path": null,
  "logs_excerpt": "<6>INFO : \nTransferred: \t 0 B / 0 B, -, 0 B/s, ETA -\nElapsed time: 1.5s\n\n<6>INFO : There was nothing to transfer\n",
  "progress": {
    "percent": 100,
    "description": "checks: 719 / 719",
    "extra": null
  },
  "result": null,
  "error": null,
  "exception": null,
  "exc_info": null,
  "state": "SUCCESS",
  "time_started": {
    "$date": 1737568801000
  },
  "time_finished": {
    "$date": 1737568806000
  },
  "credentials": {
    "type": "UNIX_SOCKET",
    "data": {
      "username": "root"
    }
  }
},

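One thing the excerpt makes obvious is that all of the hourly tasks fire at minute 0, each with up to 16 parallel transfers and no bandwidth limit. As a diagnostic step (not a fix), I’m considering staggering the minutes and lowering the concurrency. Here is a rough sketch with midclt, assuming cloudsync.update accepts the same schedule and transfers fields that cloudsync.query returns; the same change can also be made per task in the UI:

# Example for task id 7: run at minute 10 instead of 0 and use 4 transfers.
# Sketch only; field names are taken from the cloudsync.query output above.
sudo midclt call cloudsync.update 7 '{
  "schedule": {"minute": "10", "hour": "*", "dom": "*", "month": "*", "dow": "*"},
  "transfers": 4
}'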
I still haven’t been able to solve the problem. Does anyone have any tips or suggestions to solve it?

Why not upgrade SCALE?

EDIT: I forget in which version of SCALE k3s was removed.

I’ve since tried that as well, but it didn’t make any difference, so clearly the issue isn’t with Kubernetes. Do you have any other suggestions or ideas for solving the problem? At this point, it seems to me like it might be a bug.

What is the hardware?

With more information, someone else might be able to spot a potential culprit or mixture of factors.

I’m using a UGREEN 8-Bay NAS with:

  • Intel Core i5 (12th Gen, 10 cores)
  • 64 GB DDR5 RAM (upgraded)
  • OS: TrueNAS SCALE version Dragonfish-24.04.2.5

Everything else is working flawlessly. I just checked the log entries again, and they show no errors.

What network card?

Do you notice the CPU usage spiking or running elevated before the slowdowns?

Network Card Details:

  • Model: Aquantia AQC107 (Device ID: 04c0, Revision: 03)
  • Driver in Use: atlantic
  • Kernel Modules: atlantic
  • Supported Speeds: Up to 10 Gbit/s
  • PCI Addresses:
    • 58:00.0 (first network interface)
    • 59:00.0 (second network interface)

I haven’t noticed any significant CPU spikes or elevated usage.
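
To rule the Aquantia NIC itself in or out, my plan is to compare the driver statistics across one of the hourly windows and watch for atlantic driver messages. The interface name below is a placeholder for whichever of the two ports is actually in use:

# Snapshot NIC counters before and after a top-of-the-hour window
ethtool -S enp88s0 > /tmp/nic-stats-before.txt
# ... wait for the sync window to pass ...
ethtool -S enp88s0 > /tmp/nic-stats-after.txt
diff /tmp/nic-stats-before.txt /tmp/nic-stats-after.txt

# Any messages from the atlantic driver since boot
journalctl -k | grep -i atlantic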

Let me know if you need additional details or logs!

Unfortunately, I’m still stuck with this issue. Nothing I’ve tried so far has worked, and I still can’t reproduce the problem on demand or identify any pattern for when it happens. I’m really running out of ideas here.

Is there a way to provide logs or a stack trace directly to the developers, or would someone from the team be willing to take a look remotely? I can make remote access available if that would help.

Thanks in advance for any advice or support!