TrueNAS Scale 24.10.2.2 rebooting every few minutes

Hi,

I’ve been tied up with home repair projects and am not sure how long this has been going on, but I noticed today that my machine is turning off and rebooting every 30 minutes or so.

I’d appreciate any advice on what to look at first to diagnose what’s going on and troubleshoot it.

Start with checking logs.

If you can determine whether the system is shutting down cleanly, that helps. If it appears the system is rebooting suddenly, run a few tests to validate that your hardware is stable: Memtest86+ and a Prime95-type CPU stress test.

Since it appears you can see when the system is rebooting, examine that graph report to see when the reboots started, then think back to whether any physical or software changes occurred around that time. Maybe you moved it, installed some software, or reconfigured the system.

By any chance, as a side effect of your home repair work, has your NAS been put on a circuit with any large power-draw appliances and without a sufficient UPS? I know the “every 30 minutes or so” sounds suspicious, but the first thing that came to mind was a freezer/fridge where the compressor cuts in roughly every half hour.
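
One rough way to test the “something periodic on the circuit” theory is to check how regular the gaps between boots really are. A minimal sketch, assuming you have collected the boot start times yourself (for example from the first-entry column of `journalctl --list-boots`). The function names, the sample timestamps, and the 10% threshold are all made up for illustration:

```python
from datetime import datetime
from statistics import mean, pstdev

def reboot_gaps(boot_times):
    """Return the gaps in minutes between consecutive boot timestamps."""
    ordered = sorted(boot_times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]

def looks_periodic(boot_times, tolerance=0.10):
    """True if the gaps vary by less than `tolerance` (relative std dev).

    The 10% default is an arbitrary illustrative threshold, not a standard.
    """
    gaps = reboot_gaps(boot_times)
    if len(gaps) < 2:
        return False  # not enough data to say anything
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg < tolerance

# Hypothetical example data: boots roughly every 30 minutes.
boots = [datetime(2025, 1, 1, 10, 0), datetime(2025, 1, 1, 10, 30),
         datetime(2025, 1, 1, 11, 1), datetime(2025, 1, 1, 11, 30)]
print(reboot_gaps(boots))
print(looks_periodic(boots))
```

Very regular gaps would hint at something cyclic (an appliance, or a timer of some kind); irregular gaps don’t rule out power problems, they just make the compressor theory less likely.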

I would look at the server’s power supply, especially if the temperature of the room where the server sits has increased. At work, lots of robots on the production floor would start rebooting periodically at the beginning of every summer if a power supply was bad.

Also attach a monitor and place a smartphone or a camera in front of that to record a video …

Shut down the NAS and disconnect all disks related to your TrueNAS system.

Run memtest86 if you haven’t done so already. If that passes, continue with OCCT (see below).

Attach an SSD you can spare and install Windows 10/11 on it (just for testing purposes…). → Update: OCCT has been available on Linux since April 2025 :muscle:
Install OCCT free edition.
Start a CPU stress test on all cores.
If there are no errors, start a CPU stress test using core cycling. (This lets the CPU boost to its maximum frequency on the core under test, which may trigger a hardware error.)

OCCT has helped me diagnose weird issues in the past. For example, my system was randomly slowing down and stuttering. After a short test, OCCT showed that my CPU was throwing WHEA hardware errors… That wasn’t fun :slight_smile:

Additionally, if you have a voltmeter, you can measure the 12V and 5V rails of your PSU while running the stress tests, and then again with all disks connected.
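
For reference when reading the meter: the ATX spec allows ±5% on the +12V and +5V rails. A small sketch of that check follows. The function names and sample readings are hypothetical, and a proprietary OEM PSU may not follow ATX exactly:

```python
# Nominal voltages for the two rails mentioned above; the ATX spec
# allows +/-5% on the positive rails.
NOMINAL_RAILS = {"12V": 12.0, "5V": 5.0}
TOLERANCE = 0.05

def rail_in_spec(rail, measured_volts):
    """Return True if a measured voltage is within +/-5% of nominal."""
    nominal = NOMINAL_RAILS[rail]
    # Small epsilon so values exactly on the limit are not rejected
    # because of floating-point rounding.
    return abs(measured_volts - nominal) <= nominal * TOLERANCE + 1e-9

def check_readings(readings):
    """Map each rail name to True/False for a dict of measured voltages."""
    return {rail: rail_in_spec(rail, volts) for rail, volts in readings.items()}

# Hypothetical readings taken under load.
print(check_readings({"12V": 11.2, "5V": 5.1}))
```

Keep in mind that a handheld meter shows a slow average, so a rail can read fine and still dip briefly under load.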

Thank you all for the advice!

Zooming out to a week shows that this rebooting loop only began about a day ago. There wasn’t any change to the system that I can think of, and it’s been on this same circuit + UPS for 6 months now. The big gap is when I noticed the loop and turned it off overnight so I could sleep.

Looking at journalctl -b -1, nothing jumps out at me. There are plenty of errors, but the log right before the cutoff at the bottom looks normal, apart from the absence of any sign of a safe/regular shutdown.

journalctl.txt (310.4 KB)
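
A quick way to double-check the tail of a saved dump like this is to scan the last lines for shutdown messages. This is a rough sketch only: the marker strings are guesses at typical systemd shutdown messages and may differ between versions, and the function names and sample log lines are made up for illustration.

```python
from pathlib import Path

# Assumed marker strings (lowercase); adjust them to whatever a
# known-clean shutdown looks like in your own journal.
SHUTDOWN_MARKERS = ("journal stopped", "shutting down", "reached target shutdown",
                    "reached target reboot", "reached target power-off")

def ended_cleanly(log_lines, tail=50):
    """True if any of the last `tail` lines contains a shutdown marker."""
    recent = [line.lower() for line in log_lines[-tail:]]
    return any(marker in line for line in recent for marker in SHUTDOWN_MARKERS)

def check_file(path, tail=50):
    """Read a saved journal dump and report whether it ended cleanly."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return ended_cleanly(lines, tail)

# Hypothetical log tails for illustration.
clean = ["systemd[1]: Stopped target Basic System.", "systemd[1]: Shutting down.",
         "systemd-journald[312]: Journal stopped"]
abrupt = ["kernel: usb 1-1: new device", "CRON[991]: session opened for user root"]
print(ended_cleanly(clean), ended_cleanly(abrupt))
```

If the previous boot’s log just stops mid-activity with none of these markers, that is consistent with a sudden power loss or hard reset rather than an OS-initiated reboot.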

I’m going to boot the system into memtest and run that overnight.

While I have not used it, OCCT is available in Linux and does not need to be installed. The site states to just make the file executable. I would just grab a Live Linux bootable ISO, disconnect all drives, boot into Linux, copy the file into /tmp, make it executable and run it. I can’t say this will work as I am not near a server this week. The wife said I could not take it on vacation with us. Drats.

Anyway, my point is that I don’t see the need to install Windoze, since a Linux version exists. Or use the Ultimate Boot CD (UBCD), which has both memtest and CPU stress tests on it. I love that CD (now on a flash drive, of course).

Disconnect the UPS temporarily to see if anything changes.

Cool, Linux support in OCCT is new (since April 2025).

So in the light of this, no reason to struggle with Windows. :sunglasses:

Don’t forget the practical things. When was the last time the computer TrueNAS is installed on was maintained? Is the cooling clogged with dust? Are the fans clogged or no longer spinning? Has the temperature increased in the space where the TrueNAS server sits? Start with the basics before going straight to the operating system…
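
If you can get to a shell (a live distro works too), the Linux kernel exposes temperatures under /sys/class/thermal, in millidegrees Celsius. A small sketch that reads them follows. Which zones show up depends on the hardware and drivers, and the function name is made up:

```python
from pathlib import Path

def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"zone_name:type": degrees_C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing or unreadable; skip it
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps

if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} °C")
```

Running it a few times while the box is under load gives a rough idea of whether temperatures are climbing before a reboot.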

Again, thanks to everyone for the input.

I moved the system to another area of the house to be near a monitor. Different circuit and UPS. Loaded up Memtest and the temperature immediately went pretty high:

So I took things apart and discovered what I hoped was the culprit of the boot loops:

With the whole machine cleaned out of dust, Memtest ran 20 degrees cooler and finished:

I tried running the normal TrueNAS OS again just to see, and after the first reboot I started on the next phase of testing with OCCT. I booted to a live distro, and while I was looking up how to temporarily install OCCT, I heard the system reboot…

So memtest ran overnight with no problem, but once booted into a real OS, the machine crashes. Weird. I skipped ahead to the advice about using a camera and captured the live OS reboot. No text or other indication appeared at all:

Some kind of Dell HW Watchdog being overly aggressive?
It doesn’t look like it comes with iDRAC or similar BMC features, but maybe there’s something related to a watchdog in the BIOS that you can tinker with?

OK, I perused the BIOS settings, reset them to defaults and reconfigured. I tried running the system in NUMA and then SMP since Memtest86 shows it uses SMP… Then I tried removing the network cable since the memory test was done without that plugged in. No dice. System consistently clunks off and reboots after about five minutes every time.

I’m at a loss here.

Sorry for possibly beating a dead horse, but can you specifically look for a “Watch Dog Timer” (or similar) in the Maintenance or System Configuration section of the BIOS, if one exists?

Not at all, I appreciate the advice!

I’ve looked through the menus, no WDT options :expressionless:

I took all of the heatsinks off and put new thermal paste down, just in case it helped. Then I ran the built-in diagnostics, which took several hours and found no issues.

This, plus the clonk when it reboots and the lack of diagnostics from any software crash, points to a hardware issue (which would also tally with you not having changed anything). Because it clonks, I’m gravitating toward power issues rather than anything else. Have you swapped out the PSU for a different one? And is it on a UPS?

Seconded! Prove out that PSU if you can. If not, IMO, it’d be the first thing I’d replace. I agree this is absolutely hardware, and PSUs can fail in weird ways like yours might be doing.
