Intermittent unscheduled system reboots

This has been happening for a long time. I ran memtest for a few days with no issues and replaced the power supply, but I’m out of ideas. It doesn’t happen on any regular schedule, and I have no idea what to check or which logs to look at to troubleshoot. The system is currently on TrueNAS-13.0-U6.7, but it happened with older versions as well. Anyone have any ideas?

TrueNAS @ truenas.local

New alerts:

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Tue Apr 22 12:26:09 2025.

Current alerts:

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Thu Apr 3 12:54:05 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Fri Apr 4 13:26:44 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Sat Apr 5 08:34:43 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Thu Apr 10 06:32:09 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Sun Apr 13 04:32:20 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Mon Apr 14 19:55:42 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Tue Apr 15 00:45:25 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Sun Apr 20 04:07:24 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Sun Apr 20 23:33:37 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Mon Apr 21 21:32:22 2025.

  • truenas.local had an unscheduled system reboot. The operating system
    successfully came back online at Tue Apr 22 12:26:09 2025.
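
One way to check whether the reboots follow a pattern (for example, a scheduled task or a periodic UPS self-test) is to compute the gaps between the alert timestamps and count them by hour of day. The following is only a sketch: the timestamps are copied from the alert above, the variable and function names are made up for illustration, and the date parsing assumes an English locale.

```python
from collections import Counter
from datetime import datetime

# Timestamps copied from the alert e-mail above (time the OS came back online).
ALERT_TIMES = [
    "Thu Apr 3 12:54:05 2025",
    "Fri Apr 4 13:26:44 2025",
    "Sat Apr 5 08:34:43 2025",
    "Thu Apr 10 06:32:09 2025",
    "Sun Apr 13 04:32:20 2025",
    "Mon Apr 14 19:55:42 2025",
    "Tue Apr 15 00:45:25 2025",
    "Sun Apr 20 04:07:24 2025",
    "Sun Apr 20 23:33:37 2025",
    "Mon Apr 21 21:32:22 2025",
    "Tue Apr 22 12:26:09 2025",
]


def parse_alert_time(text):
    """Parse the date format used in the alert (assumes an English locale)."""
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


boots = [parse_alert_time(t) for t in ALERT_TIMES]
# Time elapsed between each reboot and the one before it.
gaps = [later - earlier for earlier, later in zip(boots, boots[1:])]
# How many reboots fell into each hour of the day.
hour_counts = Counter(b.hour for b in boots)

if __name__ == "__main__":
    for boot, gap in zip(boots[1:], gaps):
        hours = gap.total_seconds() / 3600
        print(f"{boot:%a %b %d %H:%M}  {hours:6.1f} h since previous reboot")
    print("shortest gap:", min(gaps), "| longest gap:", max(gaps))
    print("reboots per hour of day:", sorted(hour_counts.items()))
```

If the gaps or hours cluster tightly, a scheduled job or self-test becomes more likely; if they are spread out, as they appear to be here, that points more toward flaky hardware.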

Could it be unstable power delivery?
Do you have a UPS, is it working properly?

Your BIOS likely has a setting that determines what the board should do after a sudden power loss. A common setting is to power the system back on when that happens.

I suggest you temporarily change that to keep the system powered off and see if it stops “rebooting” and instead just stays off. At least that would show you if it’s power related or not.

Good tip! I’ll give that a try!

Ah nuts, that wasn’t it. It was already set to Power Off on AC interruption. I’ll keep a monitor hooked up to see if I can catch any messages in the console when it happens.

Your problem could be caused by several things; unfortunately, they are all hardware, not software. I’m not saying it couldn’t be software, but if that hasn’t changed in a while, I would not suspect it.

You are looking for something that causes the computer to reboot. Here is a list of possible items:

  1. Power supply
  2. Motherboard
  3. Any add-on card
  4. Bad Reset switch (if your computer has one of those)
  5. And of course you have already checked for constant power.

What can you do?

  1. CPU Stress Test for 24 hours. Ensure the test runtime matches (24 hours running the test and the test reports it has been running for 24 hours) to rule out a reboot and automatic restart of the test. This would only be possible if you have the computer configured to restart the test automatically.
  2. If step 1 passes, Memtest86+ for at least 5 FULL passes. This could take 10 hours or 2 days depending on your system.
  3. If your system does reboot during the tests above, disconnect all drives, all add-on cards, run the system with as few items as possible and rerun the failing test. You are trying to isolate the problem to a specific item.
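
The runtime check in step 1 can also be done by comparing the time the test was started with the system's boot time: if the machine booted after the test began, it rebooted mid-test. Below is a minimal sketch, assuming a FreeBSD-based system (TrueNAS 13.0 is) where `sysctl kern.boottime` prints a `sec = <epoch>` field. The function names are hypothetical, and the output format should be confirmed on the actual machine.

```python
import re
import subprocess
from datetime import datetime, timezone


def parse_boottime(sysctl_output):
    """Extract the boot time from `sysctl kern.boottime` output.

    Assumes the output contains a field like "sec = 1745000000".
    """
    match = re.search(r"\bsec\s*=\s*(\d+)", sysctl_output)
    if match is None:
        raise ValueError("no 'sec = <epoch>' field found")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def rebooted_during_test(test_start, boot_time):
    """Return True if the machine booted after the test was started."""
    return boot_time > test_start


def current_boot_time():
    """Ask the running system for its boot time (FreeBSD-style sysctl)."""
    result = subprocess.run(
        ["sysctl", "kern.boottime"], capture_output=True, text=True, check=True
    )
    return parse_boottime(result.stdout)
```

Note the start time when the stress test is launched, then call `rebooted_during_test(start, current_boot_time())` once the 24 hours are up.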

Good luck, these kinds of problems can be a real pain to troubleshoot.

I just replaced the power supply so that seems unlikely. Motherboard would be a PITA to replace but I will keep it in mind. The only add-on card is the video card. I’ll try disconnecting the reset switch since that would cost me $0 to test. I replaced the main SSD last summer so that is pretty new. The console currently is listing a bunch of plugin_dispatch_values: Low water mark reached. Dropping 100% of metrics. and I don’t know if that’s related or not.

I haven’t tried a CPU stress test yet. Is there one that you’d recommend? I had only run memtest for a single pass previously, so I’ll try that one again for multiple passes.

Thanks, I appreciate the suggestions!

Is the system on a UPS?

Yes, it’s plugged into a CyberPower 1350VA UPS.

There are two new errors on the screen from around 1am:

aggregation plugin: Unable to read the current rate of "truenas.local/cpu-1/cpu-system".
utils_vl_lookup: The user object callback failed with status 2.

There have been reports of that UPS model randomly turning off for no reason, dropping power during a self-test, or reporting the battery as good when it cannot actually support a load during a self-test. So it might be the UPS running a self-test that momentarily interrupts the power, or responding incorrectly to a power disturbance. You can check by removing the UPS from the circuit and seeing whether the random reboots continue. You can test the UPS battery by unplugging the UPS from the wall while it is under a normal load: it will either support the load or it won’t.

Update: I disconnected the reset switch, ran memtest for over 24 hours, it passed the tests 14 times with no errors. I ran the different CPU tests on the Ultimate Boot CD and they all ran fine with no issues or reboots. TrueNAS still had an unexpected reboot so the next thing I’m going to check is the video card, then I’ll try it plugged in directly and not through the UPS. After that the only thing left is the motherboard and hard drives. I’m really hoping it isn’t the motherboard since replacing that would probably be a pain.

Are there no other logs that would get written to for troubleshooting purposes?
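
Since the alert only records when the system came back online, one low-tech way to narrow down when it actually went down is a heartbeat file: a small script appends a timestamp every few seconds and forces it to disk, so the last line written before a reboot marks the approximate time of failure. The following is only a sketch. The function names and the log path are hypothetical, and it assumes Python 3 is available and that the file lives on storage that survives a reboot.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, now=None):
    """Append one timestamp line and force it out to disk."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(now.isoformat(timespec="seconds") + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a sudden reset


def last_heartbeat(path):
    """Return the most recent recorded timestamp, or None if there is none."""
    with open(path, encoding="utf-8") as fh:
        lines = [line.strip() for line in fh if line.strip()]
    return datetime.fromisoformat(lines[-1]) if lines else None


def run(path, interval_seconds=10):
    """Write a heartbeat forever; stop it with Ctrl-C or by killing the process."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

After the next unexpected reboot, `last_heartbeat(path)` gives a time to line up against the console messages and the UPS event log.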

Thanks again for your help, folks! I have ruled out the video card and UPS power outlet. There are no other add-ons left so that just leaves me with the motherboard. :weary:

Can I take a backup of my current setup and restore it onto new hardware? Or do I have to replace the motherboard with the same one and make sure that I plug everything back into the same ports, etc.? Like, I’d love to take this opportunity to get a newer motherboard/CPU/RAM, but if I have to set up everything from scratch it’s not really worth it.

You can just move all the disks to a new machine and just boot it. Does not have to be the same at all. It will “just work”. I’ve moved my TrueNAS installation across at least four servers already :slight_smile:

Edit. Btw consider not using consumer grade anything, especially motherboards and/or processor and ram. You can get old enterprise servers at a fraction of the cost, away from the left ridge of the bathtub curve, that will work forever.


Just to expand on the buy-a-used-server idea and why it’s good.

Businesses often lease hardware. It’s cheaper short term, and they get on a conveyor belt where the Dell truck swings by every three years and swaps out their hardware, including the OS, so they keep up with Microsoft. Usually this hardware isn’t physically in the business but at a co-location facility or datacenter with tightly controlled air conditioning, with the AC blowing upwards from a floor tile under the racks. Cold and dry, with the most conditioned, clean, persistent power money can buy. It’s very intentionally the best conditions for computer hardware, regardless of how hard the machines are working.

This adds up to off-lease servers that have been pampered their entire lives, coming up for sale each refresh cycle. Some companies just auction their stuff off (by the pallet) or hire a third party to do it.

By the time it gets to you, it has been documented, sorted, probably function tested, and obviously has survived x amount of time, so aside from the risk of being DOA, there is a lot of life left in the box. There are lots of cottage industries dedicated to collecting and reselling this stuff. You end up with a pretty safe purchase for pennies on the dollar, built to a specification for reliability that consumer goods simply don’t need.

Bottom line, vendors/resellers may vary, but in general it’s a smart way to get some awesome hardware dirt cheap. I just haven’t been convinced to buy used storage (cost isn’t low enough, and I barely trust a lot of it in new condition), but others are more adventurous and have had great luck with entire (populated) storage arrays. I don’t know much about left ridges of bathtub curves, but I know some enterprise hardware. :slight_smile:


To follow up on this excellent explanation: the bathtub curve (see the Wikipedia article) is what I was referring to, precisely as you described.

I’ve been buying used storage for years now. Companies like gohardware offer a 5-year, no-questions-asked warranty, and even if they did not, it would not matter: the cost is low enough to self-insure, at about $10-$12/TB. While my sample size is small, I have had noticeably fewer failures, for what I think is the same reason, avoiding early failures: these used disks have either survived, or have failed and been repaired/re-certified and tested.

That’s the point: the industry realized long ago that a redundant cluster of flaky devices ends up being more reliable and cheaper than any single device can be. Disks are expected to fail, and their failure is harmless, just barely inconvenient. With storage, only cost matters. Of course, we are talking about used enterprise disks that have been operating in ideal conditions and were built to much higher reliability standards than consumer stuff, which is built to a price, often with parts that failed to pass the stricter reliability requirements and tolerances (called “binning”).
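
The redundancy argument can be put in rough numbers. This is only a back-of-the-envelope sketch: the 5% per-period failure probability is an assumed figure for illustration, not a measured one, and the calculation treats failures as independent while ignoring rebuild windows and correlated failures (same batch, same power event).

```python
def all_copies_fail(p_single, copies):
    """Probability that every copy fails in the same period.

    Assumes failures are independent, which real disks only approximate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_single ** copies


if __name__ == "__main__":
    p = 0.05  # assumed per-disk failure probability for the period
    for n in (1, 2, 3):
        print(f"{n} cop{'y' if n == 1 else 'ies'}: {all_copies_fail(p, n):.6f}")
```

Under those assumptions, mirroring two flaky disks takes the chance of losing the data from 1 in 20 to 1 in 400, which is why a cheap disk with a higher failure rate can still be the better buy.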


And you’re the guy I was talking about that has no trouble with it. :slight_smile:


I used to find some good ebay deals here https://www.labgopher.com/ but it looks broken now.

Has anyone ordered from https://www.theserverstore.com/ or https://newserverlife.com/ before? They seem to have similar stuff.


I bought a lot of gear from https://unixsurplus.com; they have a warehouse near where I live, so I don’t have to pay for shipping. They are also very flexible: they can assemble a server for your target price and needs. Most of my drives are from goharddrive and serverpartdeals.

I’m not very familiar with those you linked to, but thank you for new leads :slight_smile:
