TrueNAS SCALE falling over after large transfers (with Intel NIC)

I am having an issue where TrueNAS SCALE will fall over after a large NFS or SMB transfer from another NAS on the same network. I am using an Intel 82576 v2 NIC via PCIe x1 (the onboard Realtek is disabled). The same thing was happening with the built-in Realtek NIC, so I bought an Intel card per the recommendations of most of the TrueNAS community.

I started a ~320GB transfer last night before going to sleep, woke up and couldn’t log in to TrueNAS. I checked the network client devices panel on my Ubiquiti UDMP and the SCALE machine is nowhere to be seen. I’m not sure if I am doing something wrong or if my install is borked in some way.

System Specs

  • TrueNAS SCALE 24.04.2.2
  • Intel i7-8700
  • ASRock H370M-HDV
  • 32GB DDR4 3200
  • Nvidia Quadro P2000
  • Intel 82576 v2 Dual NIC PCIe x1
  • 3x 2TB in RAIDZ1

The transfer was coming from a Synology DS220+ with 2x 16TB. I also have a single-port Intel 82573 v2 card that I can swap in.

Hi, it's hard to believe the NIC is the cause (it's one of the recommended Intel ones).
Maybe it's just overheating?
Did you try completely disabling the integrated one in the BIOS, and checking whether anything can be adjusted for the installed one?
Did you find any relevant logs or messages?

Thank you for the reply, I’ll try digging through the BIOS and doing as you suggested. I pulled down a debug log package and am sifting through it, but I’m relatively new to TrueNAS and not sure exactly what to look for.

Overheating is something I hadn’t considered. The NIC does have a heat sink on it, and I have good airflow. Maybe I should repaste the heatsink.

It also looks like an update to 24.04.2.3 happened overnight, but I only recall downloading the update, not applying it. Looking at the transferred files, the NIC went down with ~0.27GB left in a 320.5GB transfer (maybe you are right and it did overheat).

If it's happening on both the Realtek NIC and the Intel NIC, I would start by replacing the Ethernet cable and looking at the switch / other end of the network.

Thanks, I’ll try switching cables as well. I’m currently using new Ubiquiti-branded cables, and the TrueNAS system is plugged directly into the Dream Machine Pro. I can try using another port on the console.

This at least gives me a better idea of when it went down:
[Image: Screenshot 2024-10-12 at 10.21.12 AM]

Updating this: I implemented the suggestions in this thread, but it looks like SMB might be the cause of the issue. With large transfers it seems to occasionally crash and take the system with it, but I was able to do a 1.47TB NFS transfer without a similar issue.

Did you find anything interesting with more /var/log/messages from the CLI?
Maybe opening a ticket and attaching the debug would be the better choice here.
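
For anyone unsure what to look for in a log that size, a small filter can narrow it down. The following is only a sketch in Python: the pattern list is an assumption (a few strings the Linux kernel and the igb driver are known to print), not an official TrueNAS checklist, and the names find_suspect_lines and scan_file are made up for this example.

```python
import re

# Assumed starting patterns, not an exhaustive or official list.
SUSPECT_PATTERNS = [
    r"Hardware Error",     # machine check exceptions (mce)
    r"Out of memory",      # kernel OOM killer
    r"Call Trace",         # kernel oops / warning backtraces
    r"NIC Link is Down",   # link drops reported by Intel drivers such as igb
    r"kernel panic",
]


def find_suspect_lines(lines, patterns=SUSPECT_PATTERNS):
    """Return (line_number, text) for every line matching any pattern."""
    combined = re.compile("|".join(patterns), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if combined.search(line)
    ]


def scan_file(path="/var/log/messages"):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_file() on the box, or scan_file("path/to/extracted/messages") against a log pulled out of the debug package, returns the matching lines with their line numbers so the timestamps just before a hang are easy to find.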

I think I will do that, thank you for the suggestions and assistance. I am glad that it's not overheating though, I did not want to de-rack the hardware and redo the heatsink on the NIC.

Reviving this again because TrueNAS is still doing this after about 2 days of uptime. Really not sure what's going on here; I woke up and had to power cycle my TrueNAS machine to access it. Gonna have to pull logs for real this time and dig.

I don’t see a boot device in this list. What are you booting from?

A Samsung 870 EVO 500GB SSD. To add, the machine was mostly idle overnight, with only small Time Machine backups every hour.

I started to experience some other issues including random restarts and short uptimes.

Mystery solved, folks. We can pack it in. A failing CPU was the cause, confirmed by a failed burn-in test nowhere near its thermal limits. I swapped the i7-8700 for an i5-9400 I had lying around and it's been smooth sailing.

Glad you solved it. Honestly, from the symptoms you described, I would have suspected anything except a failing CPU.
If you have another motherboard lying around like the other CPU, run another test on the i7… I have seen CPUs fail hard on certain hardware combinations but work rock solid on others (just to be sure you don't throw away a usable CPU).

Just whack it on eBay as mint condition like others do… JJ

Love waking up and seeing this in my logs, followed by a reboot

2024-11-24T19:55:34-05:00 truenas kernel - - - mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 3: be00000000800400

2024-11-24T19:55:34-05:00 truenas kernel - - - mce: [Hardware Error]: TSC 0 ADDR 7fb987e46522 MISC 7fb987e46522

2024-11-24T19:55:34-05:00 truenas kernel - - - mce: [Hardware Error]: PROCESSOR 0:906ed TIME 1732496114 SOCKET 0 APIC 8 microcode fc
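
The 16-digit hex value after "Bank 3:" is the bank's status register, and its top bits can be read without extra tools. Below is a rough Python sketch that decodes only the architectural flag bits, using the bit positions from the Intel SDM machine-check chapter. The helper name decode_mce_line is made up for this example, and it does not try to interpret the model-specific part of the code.

```python
import re

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM, machine-check chapter).
FLAG_BITS = {
    "VAL": 63,    # register holds valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # the MISC register holds extra information
    "ADDRV": 58,  # the ADDR register holds an address
    "PCC": 57,    # processor context may be corrupted
}

MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]{16})"
)


def decode_mce_line(line):
    """Decode the first mce line of a report; return None if it does not match."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    status = int(match.group(3), 16)
    return {
        "cpu": int(match.group(1)),
        "bank": int(match.group(2)),
        "flags": {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()},
        "mca_code": status & 0xFFFF,                      # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }
```

For the first line above this reports UC and PCC set with OVER clear, i.e. an uncorrected error with possibly corrupted processor context, which fits the reboot that followed. The MCA code comes out as 0x0400, which as far as I can tell the SDM's simple error code table lists as an internal timer error. That alone does not say which component is at fault.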

You really should run a full memtest ASAP.
And I hope you haven't thrown out your old CPU 🙃

I ran one not long ago and it fully passed. Time for another, and maybe reseating my sticks and CPU. My luck with this machine is very poor lol 🫠🫠🫠

"DDR4 3200"

Can you check whether they are running with some kind of OC or overvolt?
Try fixing them at 2666, or better 2400 MHz. If I remember correctly, Intel 8th/9th gen tops out at 2666, right?

I did set them in the BIOS to 2666, but I can drop it to 2400, check the voltage, and see if that helps. AFAIK I do not have any OC’ing going on with the CPU or RAM (no XMP).
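
One way to confirm from the running system what the BIOS actually applied is to compare the rated and configured speed that dmidecode reports for each DIMM. This is only a sketch: it assumes the text comes from dmidecode -t memory (which needs root), it assumes the field names "Speed" and "Configured Memory Speed" (older dmidecode versions print "Configured Clock Speed"), and memory_speeds is a made-up helper name.

```python
import re

# Matches the rated and configured speed fields; units are MT/s or MHz
# depending on the dmidecode version.
SPEED_RE = re.compile(
    r"^\s*(Speed|Configured (?:Memory|Clock) Speed):\s*(\d+)\s*(?:MT/s|MHz)",
    re.MULTILINE,
)


def memory_speeds(dmidecode_text):
    """Return a (rated, configured) tuple for every populated DIMM slot."""
    results = []
    # Each DIMM slot is reported under its own "Memory Device" heading.
    for block in dmidecode_text.split("Memory Device")[1:]:
        found = {key: int(value) for key, value in SPEED_RE.findall(block)}
        rated = found.get("Speed")
        configured = found.get(
            "Configured Memory Speed", found.get("Configured Clock Speed")
        )
        if rated is not None:  # empty slots report "Unknown" and are skipped
            results.append((rated, configured))
    return results
```

If a stick rated for 3200 shows a configured speed above what was set in the BIOS, the "auto" settings are doing something unexpected. If it shows 2666 or 2400 as set, the speed at least is what was intended.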

Yeah, I don’t want to assume that’s the problem, it's more a case of “don’t underestimate anything”. In my limited experience, DRAM left on auto in the BIOS sometimes does fancy stuff, but if you already set it to the max supported frequency, you should be fine.
Along the same lines, when you start memtest you can check the effective latencies, in case they are lower than expected.
If memtest doesn't fail but you continue to have these problems, you really should start considering that the mainboard is the problem (having another one to test would be great for you, without spending money on test components).