Network dying

Every few days my TrueNAS box’s network just dies for a few minutes, and all my apps and the web UI become inaccessible. Ex: [network graph]

The last time this happened, I was accessing a folder over SMB and it disconnected partway through.

I am not sure whether this is software or hardware related. I swapped out my switch and the behavior remains. Other devices do not have this issue.

What is the best way to diagnose this? I am using my motherboard’s 2.5Gb port.
This is new behavior. I’m not sure if it’s related to Dragonfish, but I didn’t have this a few months ago.

Thanks for your help

Your hardware and software specs are missing. Can you list them?
2.5GbE is prone to errors. That might be the cause.
Try running it in 1 GbE mode.
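
One way to test 1 GbE mode from the TrueNAS shell is to restrict the link with `ethtool`. The sketch below only builds and prints the command unless `--apply` is passed. The interface name `enp3s0` is a placeholder, not something taken from this thread, and a setting applied this way will likely not survive a reboot, so treat it as a temporary experiment.

```python
import shlex
import subprocess
import sys


def build_ethtool_cmd(iface, speed_mbps=1000):
    """Build an ethtool command that restricts the link to one speed at full duplex."""
    return [
        "ethtool", "-s", iface,
        "speed", str(speed_mbps),
        "duplex", "full",
        "autoneg", "on",
    ]


if __name__ == "__main__":
    # "enp3s0" is a placeholder; check `ip link` for the real interface name.
    cmd = build_ethtool_cmd("enp3s0")
    if "--apply" in sys.argv:
        # Changing link settings needs root and briefly drops the link.
        subprocess.run(cmd, check=True)
    else:
        # Dry run: only show what would be executed.
        print(shlex.join(cmd))
```

If the dropouts stop while the port is held at 1 GbE, that points toward the 2.5GbE path (NIC or driver) rather than the rest of the system.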

Motherboard: ASRock B760M Pro RS/D4 (Dragon LAN which was acquired by Intel I believe).
CPU: 13th Gen Intel(R) Core™ i5-13500
Memory: 126 GiB DDR4 non-ECC
OS Version: TrueNAS-SCALE-24.04.1.1

Switch: Mikrotik CRS310-8G+2S+

I don’t want to use 1Gbps as I saturate that link all the time. I need some headroom for local transfers on top of my 1Gbps fiber connection.

Can confirm that I didn’t have issues previously with 1Gbps

The other option is to go with a PCIe 10Gbps card. My switch has 10Gbps SFP+ support, but it’s a bit overkill.

That would be a very good option.

There’s no such thing, sky’s the limit :slight_smile:
I’m using 2x10Gbit links with SMB multichannel to get an effective 20Gbit/s pipe. It’s not enough, and it won’t be enough until I can quantum tunnel the packets from one host to another.

10GbE cards like the X540 (RJ45) or X520 (SFP+) are very cheap to find second hand nowadays, and unless you get a knockoff they are very resilient.


Hey @cmplieger

Looks like your onboard 2.5G is a Realtek-based card. I’m also not seeing your graph going above the 1G line so it’s possible that the bottleneck is elsewhere here.

Can you describe your pool layout, disks, and crucially whether or not you’re using deduplication?

Hey @HoneyBadger, it’s not going above 1Gbps because that is the limit of my fibre connection :slight_smile:
I was not performing any other transfers at this time.

Disks are as follows:

Pool: apps
2x disks in a mirror, NVMe SSD

Pool: temp downloads
1x SATA SSD

Pool: data
2x RAIDZ1 vdevs
VDEV 1: 5x 18TB SATA HDD
VDEV 2: 5x 20TB SATA HDD

I am only using data vdevs, so I have 0 dedup vdevs.

@essinghigh @dan I bought an X520 and an SFP+ cable and am going to try it out. I have no other devices on the network able to do above 1Gbps for now, so it’s total overkill, but if reliability is fixed I will be happy.

As my EE friend likes to say, it’s not overkill, it’s just the right amount of kill.


Ah, so the graph showed a WAN download over the fibre, then?

Understood - but are you using the deduplication feature at all?

@HoneyBadger no, none of my datasets use ZFS dedup.

The graph does indeed show a WAN download.
I suspect it’s a driver thing, but I’m not sure how to diagnose it or where to see logs for that.
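
For anyone wanting to check the logs, a rough starting point is to scan the kernel log for link flaps and transmit timeouts around the time of an outage. The sketch below assumes `journalctl` is available in the TrueNAS SCALE shell and that the driver logs the usual Linux messages ("Link is Down", "Link is Up", "NETDEV WATCHDOG ... transmit queue N timed out"). Those patterns are an assumption about what this particular NIC driver prints, not something confirmed in this thread.

```python
import re
import subprocess

# Assumed patterns: common Linux kernel messages for link changes and
# stalled transmit queues. Adjust them to whatever the driver really logs.
LINK_PATTERN = re.compile(
    r"Link is (Down|Up)|NETDEV WATCHDOG|transmit queue \d+ timed out",
    re.IGNORECASE,
)


def find_link_events(lines):
    """Return the log lines that look like link flaps or transmit timeouts."""
    return [line.rstrip("\n") for line in lines if LINK_PATTERN.search(line)]


def read_kernel_log(since="7 days ago"):
    """Read kernel messages via journalctl; return [] if it is not installed."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "--no-pager", "--since", since],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print every suspicious line so timestamps can be matched to outages.
    for event in find_link_events(read_kernel_log()):
        print(event)
```

If "Link is Down" entries or watchdog timeouts line up with the moments the web UI dropped, that supports the NIC or driver theory. An empty result would suggest looking elsewhere. Note that `journalctl -k` covers the current boot, so a reboot after an outage hides the earlier messages.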

After a week running on 10Gbps, I can confirm I get no more network issues. Thanks for all your suggestions.

I got an X520 plus a random Amazon SFP+ copper cable, and it works great out of the box.
