I have an issue where my system will go offline after 2-7 days requiring a hard reboot.
After it happened a couple of times, I moved the machine somewhere I could monitor it with a display and keyboard. When it happened again, I saw that the display still showed the standard terminal input, but the system was not responsive to keyboard input. The web interface was down, and there were no lights on the Ethernet port.
I ran memtest86 for a full 24 hours, through 7 1/2 passes. No errors were found.
I looked through the BIOS for any power saving settings or anything else. Nothing stood out as a possible cause.
TrueNAS did not report any issues in its logs.
The system is currently used as a Plex server. It has a static IP set on the machine and a reservation made in the router. Other devices using the same technique of static IP + reservation have been running for months without issues, so it is likely not a network configuration issue. I’m looking to see if anyone can pinpoint anything obvious before I go looking for new RAM, a new NIC, or an entire motherboard.
How much RAM do you have installed? It looks like a mix of two types of RAM, two 4GB and two 2GB modules, which would equal 12GB of RAM. That should be enough RAM if all you are doing is running a basic Plex server. Of course, mixing RAM modules can lead to timing issues. I’m not saying that is the case, but it is a possibility.
Your testing should be the following:
MemTest86+ and you might want to run it for a while longer, maybe 2 to 3 days.
CPU Stress Test (like Prime95) for at least 4 hours.
If all of this testing passes, turn off Plex and let the system run for maybe 14 days. That is double your longest time to failure of 7 days. If the system continues to work, you have a place to start looking: when Plex is running.
Other than the mixing of the RAM, I don’t see any hardware listed that I could point a finger at.
I remember reading a long thread troubleshooting a similar issue where the culprit was Plex performing its tasks, but I can’t find it again at the moment. I will edit this post if I find it, but for the moment I really would start as Joe suggests: let the NAS run as minimally as possible and see if the hangs stop.
Mmm, mine is behaving the same, and the only thing I have done recently is update from EE to FT. Also, I’m not a Plex user, though I do run 2 VMs and one has Emby on it.
Thanks for the initial suggestions. I’ve read the linked posts with similar issues and, as such, I’ve stopped the Plex app from running and rebooted the system. I have another machine running a script that will ping the server every 5 minutes so I can easily see if and when it goes down.
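For anyone wanting to do the same, a ping check like the one described could be sketched roughly as below. This is not the actual script from this post, just a minimal sketch: the `ping` flags assume a Linux machine, and the address `192.168.1.50` and the log file name are made-up placeholders.

```python
import subprocess
import time
from datetime import datetime
from typing import Optional


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ping gets a reply (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def describe_transition(
    was_up: Optional[bool], is_up: bool, when: datetime, host: str
) -> Optional[str]:
    """Return a log line only when the state changes (or on the first check)."""
    if was_up == is_up:
        return None
    state = "UP" if is_up else "DOWN"
    return f"{when.isoformat(timespec='seconds')} {host} is {state}"


def monitor(host: str, interval_s: int = 300, log_path: str = "nas_uptime.log") -> None:
    """Ping forever, appending a line to log_path each time the host flips state."""
    was_up = None
    while True:
        is_up = ping_once(host)
        line = describe_transition(was_up, is_up, datetime.now(), host)
        if line:
            with open(log_path, "a") as log:
                log.write(line + "\n")
        was_up = is_up
        time.sleep(interval_s)
```

Calling `monitor("192.168.1.50")` on the second machine would check every 5 minutes. Logging only the state changes keeps the file short, so the time of each lockup is easy to spot.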
The server now has no applications running, only the basic scrub and SMART tests periodically. If nothing definitive comes up, I’ll try more RAM testing and CPU stress testing. Otherwise I’ll update in a couple of weeks, or sooner if it crashes anyway.
It is a mix of RAM to form 12GB. It’s also 1.5V RAM, but the motherboard documentation specs 1.35V. Theoretically that should make it unstable, but if it’s incompatible then it’s supposed to not even boot, or to have crazy errors. If it won’t run as a basic system with no services, I may just buy a 4x4GB 1.35V set and try that. It just feels dumb to spend $30 on DDR3 in 2025 if I can avoid it.
This is one of those times where you need to decide if you want to stick with TrueNAS and build a solid running NAS, or piece together a NAS like you have, and many people have. Maybe you cannot afford to buy new or fairly new parts, maybe you are just playing around with it for now. A lot of people can relate.
You can save $30 by just running MemTest86+ for a few weeks. If it passes, you saved a little money. If it fails, new RAM doesn’t mean it will pass. The memory controller could be failing, or the CPU, or a bad solder joint. You name it, it may impact the testing. The power supply as well.
Also, even if you do get new RAM, I would still run 14 days of testing. But if you didn’t want to do that, at least 5 full passes, and cross your fingers it just works.
I wish I had a fast and easy answer, but stability issues are tough to diagnose at times when they are intermittent.
As for the RAM you are using… If your motherboard cannot provide the 1.5V, then it would be best practice to replace the RAM. But I’d test it first.
Then test the crap out of the CPU as well. It interfaces to the RAM so an issue with the CPU could cause problems as well.
I’m quite sure you are misreading the documentation or don’t remember it well. Careful: the 1.35V sticks are DDR3L, and they are more or less mandatory only for Intel 6th and 7th gen. Previous generations work reliably with 1.5V. Older XMP sticks (or just very old DDR3) can run at 1.65V, and this is the upper voltage limit you don’t want to exceed, to avoid damaging the memory controller.
Also, as far as I remember, first-gen Intel supports a maximum of only 16GB of RAM, in case you are thinking about filling the mainboard with 8GB sticks (btw, on AliExpress that RAM is cheap, ~5€ each). Having 2 single-sided 8GB sticks can be more reliable than 4 small double-sided sticks (but I don’t think that is your problem, just a good-to-know if you decide to upgrade).
I would try to install a discrete NIC.
It happened to me multiple times that the NIC on the MoBo first just got junky, and in the end completely died.
Other than that, I would repaste the CPU (that is an OLD CPU) and maybe the Chipsets too, (anything that has a heatsink on it).
And finally, it is possible that this system’s time has simply come.
I have an old, Intel Atom D525 motherboard, that starts up fine, runs fine, but just does similar things after some hours.
Actually, today you can even dumpster dive a much more capable and still reliable system.
To update this forum: I tried shutting down Plex on the server. That did not resolve the issue. I ran prime95 for 11 hours and memtest86 for 48 hours. There were no hardware errors detected.
I bought a new NIC, but before I could properly test it, TrueNAS had a weird issue where it would only recognize 4GB of RAM regardless of how I installed the sticks. I figured that must be a RAM issue despite the memtest results. I bought a kit of Timetec 4x4 DDR3L and it still only recognized 4GB. BIOS, Ubuntu, and memtest86 all recognized the 16GB properly. I reinstalled TrueNAS and it fixed that issue. But now I have the new RAM anyway.
I’m doing a memtest to qualify the new RAM, and then I’ll fire up the server as normal again to see if the issue still happens. Once I rule out the new RAM, I’ll start using the new NIC, and hopefully soon I’ll know what caused the issue. I’m still doing one fix at a time to make sure I know what caused the issue. I think it’s the NIC, but I want to be certain. I’ll update again when I have more answers.
Try reseating your CPU.
It is possible that some pins in the socket don’t have a good enough connection.
(And, if you have some, give the bottom side of the CPU a thorough wipe with isopropyl alcohol, just to be sure it is totally clean.)
I’ve got to be honest, this is the first time I’ve ever heard of this kind of problem being related to TrueNAS software. That is such an obscure problem to have. This will stick with me for a while.
Sorry you have some extra RAM now, hopefully you can put it to good use instead of collecting dust. And good on you for running MemTest86+ on the new RAM.
Yeah, I searched the forums multiple times while trying to solve it. I could not find anyone else with this issue. Some people get confused about why their 16GB looks like 15.6GB, but nobody had 16GB look like 3.4GB. In the console it only reported 4GB installed. Reseating, cleaning with alcohol, and blowing out the slots with compressed air did not help. Only reinstalling TrueNAS fixed that one.
Alright, so with the new RAM and new NIC, the issue still persists. The system locks up after a few hours to a few days and stops communicating (at one point it only ran for about 2 hours). Here are the updated RAM and NIC specs for those who care:
RAM: 4x4GB of Timetec 75TT16NUL2R8-4G
External NIC: StarTech ST1000SPEXI with Intel I210 NIC
@Gyula_Masa Your suggestion is up next. I’ve removed the CPU and cleaned all the pads with rubbing alcohol and a Q-tip. I blew out the socket with compressed air and inspected the pins and pads. All looked well. It’s all back together and running with fresh Arctic MX-6 thermal paste. The CPU is idling along at 40 degrees C.
If it doesn’t work, I’ll replace the thermal paste on the chipset next.
If none of this works, I’ll start looking for a new motherboard.
Have you ever run this system with an older TN version? Does the system lock up with another OS?
Have you tried whether the lockup still occurs in a clean boot condition (no config upload, no pools/disks connected)?
Also, have you tried a clean install on another disk? (Even a USB stick, just to test.)
I’m honestly out of ideas; the situation is pretty strange. The RAM issue you encountered is also pretty unusual.
Well, that limitation comes from the difference between 32-bit and 64-bit addressing of the RAM.
(3.4GB of RAM is what 4GB of physical RAM shows up as in the “computer world”.)
But that was solved way back in XP times.
And I am 100% sure that TrueNAS has NO 32-bit version available.
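To put rough numbers on that, here is a small arithmetic sketch. The 0.6GB reserved for devices is an assumption picked to match the 3.4GB figure reported in this thread. The real amount varies by board, and this is only a simplified model of how the limit arises, not a diagnosis of what TrueNAS actually did.

```python
GIB = 2**30  # bytes in one GiB


def addressable_gib(address_bits: int) -> float:
    """Size of the address space reachable with the given address width, in GiB."""
    return 2**address_bits / GIB


def visible_gib(installed_gib: float, address_bits: int, reserved_gib: float) -> float:
    """Simplified model: RAM visible after the address cap and a device-reserved hole."""
    return min(installed_gib, addressable_gib(address_bits) - reserved_gib)


# 32-bit addressing tops out at 4 GiB no matter how much RAM is installed.
print(addressable_gib(32))
# With an assumed 0.6 GiB reserved for devices, 16 GiB installed shows as about 3.4 GiB.
print(round(visible_gib(16, 32, 0.6), 1))
```

With 64-bit addressing the cap is so far above 16GB that it stops mattering in this model, which is why the symptom looks so out of place on a 64-bit-only OS.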
First, I recommend the same as oxyde did!
Save your current config to a place, where you can access it, when your system is offline.
Take a new SSD or even USB drive with at least 16 GB capacity and install TN on this machine.
Import your configuration file.
Another possibility is to search for the “Allow over 4GB decoding” or similar, depending on the BIOS, and enable it. (I see little to no chance, that this will solve the problem, but you can always try)
Check, if the problem persists.
If it does, you should first try resetting the CMOS (remove the battery and short the jumper for 20 sec/overnight) on your MoBo, and if the problem is still there, update your motherboard BIOS.
(A corrupted BIOS can also cause such weird behavior.)
If you have access to another machine, you can try booting your current system on that one and check whether the problem is present there or not.
I just recently had a similarly weird PC…
It is a Core i7 2600K system that sat in my basement for like 4 years unused.
It showed the same kind of weird behavior when I wanted to recommission it.
It took me like 2 weeks of fiddling with it, and in the end I got angry and bought a full MoBo + Intel Xeon + 16GB RAM kit from AliExpress for 37 EUR.
Later, I MIGHT investigate the old system further.
But, unfortunately, computer HW doesn’t live forever…
In this thread you had two problems, the original system instability, and the RAM issue. Is the RAM capacity issue resolved? Are you seeing 16GB (ish) in TrueNAS after it bootstraps? No more 4GB? I just want to get the facts correct.
I suggest running a SMART Long/Extended test on your boot drive. Make sure nothing funny is going on there, and do the same with the NVMe drive. Post the output of smartctl -a /dev/sd? if you need any assistance decoding the data.
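If reading the raw smartctl text is a pain, a rough sketch for pulling out a few values is below. It assumes smartmontools 7.0 or newer (for the `-j` JSON output) and a SATA drive. NVMe drives report a different structure, so the attribute table would come back empty for those. The list of watched attributes is just an illustrative pick, not an official checklist.

```python
import json
import subprocess

# Attributes often checked first on SATA drives; this selection is an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def read_report(device: str) -> dict:
    """Run `smartctl -a -j <device>` and parse the JSON it prints."""
    result = subprocess.run(
        ["smartctl", "-a", "-j", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def summarize(report: dict) -> dict:
    """Pick out the overall health flag and the raw values of the watched attributes."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    raw = {row["name"]: row["raw"]["value"] for row in table if row["name"] in WATCHED}
    return {"passed": passed, "raw_values": raw}
```

Something like `summarize(read_report("/dev/sda"))` (run as root, with the device name adjusted) gives a short dictionary that is easier to paste into a reply than the full dump. The full `smartctl -a` output is still the better thing to post if you want help.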
Run a SCRUB on the boot drive. Normally this is done every 7 days, but you can’t be too sure.
Does the system lock up with all apps disabled? This may mean you run the system for several days, possibly a week, if it seemed to die only when apps were active.
I highly doubt thermal paste will fix this problem, but that is just where my head is right now. Some chipsets get hot! very hot. Add more cooling if you feel it is a heat related problem.
And make sure the case is all closed up, not open, otherwise airflow will not be as expected.
I know you have a lot of advice. Pick the ones you feel are appropriate and if those do not find the issue, move on to the next one. But if something changes its behaviour or you notice it is hanging on a specific thing, take note of it.
With the CPU reseating, the system ran for about 4.5 days before it locked up again. The longest yet, honestly.
The RAM issue is not an issue anymore. I resolved that on my own by reinstalling TrueNAS. RAM works fine since I did that a couple weeks ago.
I’ve now replaced the thermal pad on the chipset with fresh thermal paste, and the system is up and running again.
If this fails, I will remove all of the drives and run Ubuntu on a completely different drive for a couple of weeks (I have an HDD with Ubuntu loaded) to isolate whether it’s a hard drive/TrueNAS issue.
If that fails, I will reset the BIOS and get a new SATA cable.
I have no other systems capable of running this hardware setup. I’ve also never run a different or older version of TrueNAS. I pieced everything together fresh a few months ago, and I started noticing issues after I had everything set up and first allowed it to run continuously.
I did try wiggling the SATA cables while the system was running to see if old cables could be causing an error. It did not cause an error.
I have run tests with and without the side panel on the case. It does not seem to make a difference.
The system logs showed nothing. The system seems to freeze before any logs can be made.
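Since the freeze seems to beat the logs, one way to at least narrow down when it happens is a heartbeat file written on the server itself. The following is only a sketch of that idea, not something that has been tried in this thread. The file path and the 60-second interval are placeholders, and it assumes you have somewhere to run a small Python script on the box.

```python
import os
import time
from datetime import datetime
from typing import Optional


def write_heartbeat(path: str, when: datetime) -> str:
    """Append a timestamp and force it to disk so it survives a hard lockup."""
    line = when.isoformat(timespec="seconds") + "\n"
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the drive now
    return line


def last_heartbeat(path: str) -> Optional[str]:
    """Return the final timestamp in the file, or None if the file is empty."""
    with open(path) as f:
        lines = f.read().splitlines()
    return lines[-1] if lines else None


def run(path: str = "heartbeat.log", interval_s: int = 60) -> None:
    """Write a heartbeat forever; the last line brackets the time of the freeze."""
    while True:
        write_heartbeat(path, datetime.now())
        time.sleep(interval_s)
```

After a hard reboot, `last_heartbeat("heartbeat.log")` shows the last minute the system was still alive, which can then be lined up against scrub or SMART test schedules.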