TrueNAS-13.0-U6.2 - server reboots every 65 min

This is my first post - please be gentle: :grinning:

My current system:
Hardware: Dell PowerEdge T20 - E3-1225 v3 3.2GHz; 12GB ECC RAM
Software: TrueNAS-13.0-U6.2

Issue: My TrueNAS server shows an unscheduled reboot every 65 min.

[Screenshot: 2024-09-03_TrueNAS_Alerts]

The reboots are 65 min +/- 2 sec or so.
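One way to sanity-check the "65 min +/- 2 sec" pattern is to compute the gaps between consecutive boot times. Below is a minimal Python sketch of that arithmetic. The timestamps in it are made up for illustration and are not copied from my alert log, and the function names are just my own.

```python
from datetime import datetime, timedelta


def reboot_intervals(timestamps):
    """Return the gaps between consecutive boot times, oldest first."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def is_periodic(intervals, expected, tolerance):
    """True if there is at least one gap and every gap is within tolerance of expected."""
    return bool(intervals) and all(abs(gap - expected) <= tolerance for gap in intervals)


# Hypothetical boot times, invented for this example only.
boots = [
    "2024-09-03 08:00:00",
    "2024-09-03 09:05:01",
    "2024-09-03 10:09:59",
    "2024-09-03 11:15:00",
]

gaps = reboot_intervals(boots)
print([str(gap) for gap in gaps])
print(is_periodic(gaps, timedelta(minutes=65), timedelta(seconds=2)))
```

A gap that regular would point toward something timer-driven rather than load or heat, which would usually drift more.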

I first noticed the issue mid last week. At the time, my server was running version 13.0-U6.1. I tried installing the update thinking it might fix the problem, but no change.

I approached it as though it were a hardware problem. I have swapped out the power supply (no change). I have also booted into the BIOS and run the diagnostics test, which did not find any issues.

Just for some background, the original install for the server was FreeNAS version 9.10 back in 2017. I have migrated to version 11, then 12, and now 13. The system has been pretty rock-solid until this recent issue.

I only have 2 services enabled (SSH and SMB). I have 1 plugin installed (Nextcloud). I have tried stopping the Nextcloud jail, but no change in behavior.

I grabbed a screenshot of the local display on one of the reboots - not sure if this info may be helpful.

I'm not sure where to look in terms of logs, etc. that might help reveal what is going on.

Any assistance is greatly appreciated.

Are there any watchdog settings in the BIOS/BMC?

None that I have found.

Definitely nothing that I have changed.

Update:

Hardware changes:

  • removed heat sink, cleaned, new thermal paste
  • replaced CR2032 coin-cell battery
  • replaced power supply
  • replaced RAM (2x 8GB ECC)

Software changes:

  • updated to TrueNAS-13.0-U6.2 (from 13.0-U6.1)
  • disabled 2 cron jobs (for reading temps) - no current tasks running
  • stopped Nextcloud plugin (no other plugins installed)

Still experiencing the same issue - reboot every 65 min.

Any additional thoughts appreciated.

It seems to be panicking in fstatat.
https://man.freebsd.org/cgi/man.cgi?query=fstatat&apropos=0&sektion=2&manpath=FreeBSD+13.0-RELEASE&arch=default&format=html
Are there any errors on the disk?

I don't see any disk errors.

I'm only looking in the GUI.

Is there somewhere else to check or something else that I should look at?
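Outside the GUI, one thing that could be tried is searching saved log text for kernel panic markers. Here is a rough Python sketch of that idea. The marker list is an assumption based on strings FreeBSD commonly prints during a panic, it is not exhaustive, and the sample lines are hypothetical rather than taken from this server. The `/var/log/messages` path in the usage note is also an assumption; a panic may never reach that file if the kernel goes down before the log is written.

```python
import re

# Assumed markers: strings FreeBSD commonly prints around a kernel panic.
PANIC_PATTERN = re.compile(r"panic:|Fatal trap|KDB: stack backtrace", re.IGNORECASE)


def find_panic_lines(lines):
    """Return the lines that contain one of the assumed panic markers."""
    return [line.rstrip("\n") for line in lines if PANIC_PATTERN.search(line)]


def scan_file(path):
    """Scan a text log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return find_panic_lines(handle)


# Hypothetical log lines, invented for this example only.
sample = [
    "Sep  3 10:00:01 truenas example[1234]: hypothetical normal log line\n",
    "panic: hypothetical panic message\n",
    "KDB: stack backtrace:\n",
]

hits = find_panic_lines(sample)
for hit in hits:
    print(hit)
```

On the server itself this could be pointed at a real file, for example `scan_file("/var/log/messages")`, run from an SSH session.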

I'm curious why (almost) exactly 65 min every time.

Seems like a clue, but I can't seem to figure it out.

New update to share:

I have 1 ZFS pool - 4x 5TB drives

I unplugged all 4 of the 5TB drives and rebooted.

The pool is obviously offline. The server has been stable, though, and hasn't rebooted.

Any ideas on how to narrow down which drive(s) may be causing the issue, or whether there is some other issue related to the pool?

I can see that there is a ZFS update available. I have thought about trying to upgrade the pool, but don't have much confidence that it will actually solve the problem.

I decided to try unplugging 1 of the 5TB drives at a time and booting the system.

I went through all 4 drives one at a time.

In each case, the system booted and showed the pool as degraded, but still working (all samba shares accessible, etc.)

In each case, the system still rebooted after 65 min.

So - still the same issue, and not able to narrow the problem down to a single drive.

I tried running a manual scrub, but the scrub can't complete in 65 min. I tried pausing the scrub before the system rebooted, but it does not save the scrub progress. It starts a new scrub after every reboot.

If any set of four drives is fine but the whole five fails, it could be an issue with the power supply, though it's hard to conceive why the PSU would wait exactly 65 minutes to show its discontent.

I suspected the PSU as well.

I have replaced the PSU, as it was not terribly expensive. I still have the same issue.

(I have also replaced the coin-cell battery, the RAM, and cleaned the heat-sink and re-installed with new thermal paste.)

3 days ago, my system just mysteriously stopped rebooting.

Uptime was 3 days and some change.

This afternoon, it started rebooting again. Back to every 65 min for the last few hours.

@NickF1227 , @ericloewe can you help here?

Wild guess here. Do you have the HDD set to spin down after 1 hour? Could there be an issue when this happens or when they try to rewake?

Thanks for the assistance.

Here is a look at the config for my drives:

All of the drives are configured the same - always on, and adv. power management disabled.

Given that you have eliminated the PSU and the HDDs from the equation, is the only thing left the CPU / motherboard? I presume you have been watching the CPU / motherboard temps and they are fine? Was the system cooler over the three days that it did operate w/o rebooting?

To me, this looks like a "thermal" issue. At the same time, I see nothing wrong with trying a 13.3 upgrade to see what happens. Your pool data should be safe as long as you have a backup of the config file.

Hi @Constantin - thanks for the reply.

I have completed enough testing that I'm fairly positive it is a problem specific to the 4x 5TB pool and/or one of the drives in the pool.

I have done both of the following to help confirm:

  • I have disconnected the 4x 5TB drives in the pool and rebooted the server. When I do this, the pool is obviously offline, but the server is then stable and does not reboot.
  • I have booted the server to a 'live' Linux distro on a USB drive. Again, the server is stable and does not reboot.

CPU temps all look good (avg temps 26C, hottest core 29C). No difference noted during the 3 days of uptime.

If you send me a PM with a debug file I'd be happy to take a look. Without it I'm only going to be guessing at this point.


Hi @NickF1227 - I appreciate the offer to assist!

I would be glad to grab/share debug file(s) info. I'm not sure which files I need to provide. If you can tell me what files/info are valuable, I will capture and share.

Thanks - Brian

Go to System Settings → Advanced and then click Save Debug on the top right-hand side.

Well…that was easy. (I had looked around in the UI, but just hadn't found it. :grinning:)

It doesn't look like I can send PMs (I'm new to the forum). Try this link.
[Redacted - Mod]