I am having an issue where TrueNAS SCALE falls over after a large NFS or SMB transfer from another NAS on the same network. I am using an Intel 82576 v2 NIC in a PCIe x1 slot (the onboard Realtek is disabled). The same thing was happening with the built-in Realtek NIC, so I bought an Intel card per the recommendations of most of the TrueNAS community.
I started a ~320GB transfer last night before going to sleep, woke up, and couldn’t log in to TrueNAS. I checked the network client devices panel on my Ubiquiti UDMP and the SCALE machine was nowhere to be seen. I’m not sure if I am doing something wrong or if my install is borked in some way.
System Specs
TrueNAS SCALE 24.04.2.2
Intel i7-8700
ASRock H370M-HDV
32GB DDR4 3200
Nvidia Quadro P2000
Intel 82576 v2 Dual NIC PCIe x1
3x 2TB in RAIDZ1
The transfer was coming from a Synology DS220+ with 2x 16TB. I also have a single-port Intel 82573 v2 card that I can swap in.
Hi, it’s hard to believe the NIC is the cause (it’s a recommended Intel one, after all).
Maybe it’s just overheating?
Did you try fully disabling the integrated NIC in the BIOS, and checking whether anything can be adjusted for the installed one?
Did you find any logs or error messages?
Thank you for the reply, I’ll try digging through the BIOS and doing as you suggested. I pulled down a debug log package and am sifting through it, but I’m relatively new to TrueNAS and not sure exactly what to look for.
Overheating is something I hadn’t considered. The NIC does have a heatsink on it, and I have good airflow. Maybe I should repaste the heatsink.
It also looks like an update to 24.04.2.3 happened overnight, but I only recall downloading the update, not applying it. Looking at the transferred files, the NIC went down with ~0.27GB left in a 320.5GB transfer (maybe you are right and it did overheat).
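For anyone else unsure what to look for in a debug bundle, here is a minimal Python sketch that scans a plain-text kernel or system log for message fragments that often show up around hardware or network trouble on Linux. The pattern list is an assumption chosen for illustration (it is not exhaustive, and a match is a hint rather than a diagnosis), and scan_lines / scan_file are hypothetical helper names, not part of any TrueNAS tooling.

```python
import re
from collections import Counter

# Message fragments that commonly appear in Linux kernel logs around hardware
# or network trouble. The list is an illustrative assumption, not exhaustive.
PATTERNS = {
    "machine_check": re.compile(r"mce:|Machine Check|Hardware Error", re.I),
    "link_down": re.compile(r"Link is Down|NETDEV WATCHDOG", re.I),
    "oom": re.compile(r"Out of memory|oom-killer", re.I),
    # Case-sensitive on purpose so that "debug:" does not match "BUG:".
    "panic": re.compile(r"Kernel panic|\bBUG:|\bOops\b"),
    "thermal": re.compile(r"temperature above threshold|clock throttled", re.I),
}

def scan_lines(lines):
    """Return (counts per category, list of (line_no, category, text))."""
    counts = Counter()
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((line_no, name, line.rstrip()))
    return counts, hits

def scan_file(path):
    """Scan a text log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as fh:
        return scan_lines(fh)
```

Running scan_file on each text log extracted from the debug package and looking at the last few hits before the machine dropped off the network is usually a reasonable starting point.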
If it’s happening on both the Realtek NIC and the Intel NIC, I would start by replacing the Ethernet cable and looking at the switch / other end of the network.
Thanks, I’ll try switching cables as well. I’m currently using new Ubiquiti-branded cables, and the TrueNAS system is plugged directly into the Dream Machine Pro. I can try using another port on the console.
This at least gives me a better idea of when it went down
Updating this: I implemented the suggestions in this thread, but it looks like SMB might be the cause of the issue. With large transfers it seems to occasionally crash and take the system with it, but I was able to do a 1.47TB NFS transfer without a similar issue.
I think I will do that, thank you for the suggestions and assistance. I am glad that it’s not overheating though, I did not want to de-rack the hardware and redo the heatsink on the NIC.
Reviving this again because TrueNAS is still doing this after about 2 days of uptime. Really not sure what’s going on here. I woke up and had to power cycle my TrueNAS machine to access it. Gonna have to pull logs for real this time and dig.
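When digging through logs after a hard hang like this, one useful first step is to find where the log goes silent, since the last lines before the longest gap are the ones written just before the machine locked up. Below is a minimal sketch, assuming the log has been exported with ISO-style timestamps at the start of each line (for example journalctl's short-iso output format). The function names and the expected timestamp layout are assumptions for illustration.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each line, e.g.
# 2024-09-01T03:12:45+0000 (hypothetical example value).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def parse_ts(line):
    """Parse the leading timestamp of a log line, or return None."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, TS_FORMAT)
    except ValueError:
        return None

def largest_gap(lines):
    """Return (gap_seconds, last_before, first_after) for the longest silence."""
    best = (0.0, None, None)
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip boot markers, continuation lines, etc.
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > best[0]:
                best = (gap, prev, ts)
        prev = ts
    return best
```

The timestamp returned as last_before marks roughly when the system stopped logging, so the lines just before it are the ones worth reading first.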
I started to experience some other issues including random restarts and short uptimes.
Mystery solved, folks. We can pack it in. A failing CPU was the cause, confirmed by a failed burn-in test that ran nowhere near its thermal limits. I swapped the i7-8700 for an i5-9400 I had lying around and it’s been smooth sailing.
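To illustrate the idea behind a burn-in test (this is not the tool used above, and it is far weaker than a dedicated stress tester), here is a small Python sketch: it runs the same deterministic computation many times across several worker processes and checks that every run produces an identical result. On healthy hardware all results match, so any mismatch suggests the CPU or RAM is producing wrong answers under load. The workload, the parameter values, and the function names are all assumptions for illustration.

```python
import hashlib
from multiprocessing import Pool

def workload(seed, rounds=20000):
    """Deterministic hash chain: the same seed must always give the same digest."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()

def _run(args):
    # Module-level wrapper so the job can be sent to worker processes.
    return workload(*args)

def consistency_check(workers=4, repeats=8, rounds=20000, seed=1234):
    """Run identical jobs in parallel and return the set of distinct results."""
    jobs = [(seed, rounds)] * (workers * repeats)
    with Pool(workers) as pool:
        results = pool.map(_run, jobs)
    return set(results)

if __name__ == "__main__":
    distinct = consistency_check()
    if len(distinct) == 1:
        print("OK: all runs agreed")
    else:
        print(f"MISMATCH: {len(distinct)} distinct results")
```

A pass here proves very little, since the load is light and short. A mismatch, on the other hand, would be a strong reason to test the CPU and RAM properly.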
Glad you solved it. Honestly, from the symptoms you described, I would have suspected anything except a failing CPU.
If you have another motherboard lying around like the other CPU, run another test on the i7… I have seen CPUs fail hard in certain hardware combinations but work rock solid in others (just to be sure you don’t throw away a usable CPU).
Can you check whether they are running with some kind of OC or overvolt?
Try fixing them at 2666, or better, 2400MHz. If I remember correctly, Intel 8th/9th gen runs at a maximum of 2666, right?
I did set them in the BIOS to 2666, but I can drop it to 2400, check the voltage, and see if that helps. AFAIK I do not have any OC’ing going on with the CPU or RAM (no XMP).
Yeah, I don’t want to assume that’s the problem, it’s more a case of “don’t underestimate anything”. In my limited experience, DRAM left on auto in the BIOS sometimes does fancy stuff, but if you already set it to the max supported frequency, you should be fine.
Along the same lines, when you boot into memtest you can check the effective latencies, in case they are lower than expected.
If memtest doesn’t fail but you continue to have those problems, you should really start to think that the mainboard is the problem (having another one to test with would be great for you, so you don’t spend money on test components).