TrueNAS Scale keeps rebooting

Hi,

I am experiencing periodic resets in TrueNAS Scale ElectricEel-24.10.2.

I am a first-time user of TrueNAS. I decided to repurpose an old computer as a NAS and to run some apps such as Home Assistant. Nothing critical.

I installed TrueNAS Scale on my computer, created a pool with one hard drive, and tested access to files from my network. I added Home Assistant and tested adding a few smart switches I have. So far so good.

I left the server running all night, and around 2am it shut down for about 20 minutes and restarted on its own. After that it ran for maybe 15 hours without any issues. Then I started experiencing periodic shutdowns: the server stays on for some minutes and then turns off. Sometimes it restarts on its own after a few minutes; sometimes it stays off for longer before restarting.

Moreover, when I try to shut it down, most of the time it restarts on its own.

I tried changing the low-power behavior in the BIOS: I changed the Power Supply Idle Control setting from Auto to Low Current and then to Typical. In all cases the system did the same: it turned on and then off on its own.

I checked the CPU temperature logs: no core seems to exceed 60 °C, and the peaks were very brief and repetitive. There did not seem to be a trend of increasing temperature.

I pulled a log from the messages report. The excerpt shows the log between two shutdowns and corresponds to the following sequence:

  1. Manual power-on (by connecting the power cable to the wall outlet) (20:48:25)
  2. Let it run until it shut down on its own (20:54:20)
  3. Let it run for 2 minutes and manually commanded a shutdown (20:58:08)
    (in this case, it did not turn back on by itself. After I manually turned it on again at 21:12, it ran OK for about 1h and then proceeded to reboot itself a few times.)

(cannot attach it as I am a new user, sorry) Below are the messages from when it was about to shut down:

[code]

Feb 18 20:54:20 truenas syslog-ng[2533]: syslog-ng shutting down; version='3.38.1'
Feb 18 20:55:53 truenas syslog-ng[2545]: syslog-ng starting up; version='3.38.1'
Feb 18 20:54:20 truenas kernel: br-3f9b022fb1dd: port 1(vethbb932ce) entered disabled state
Feb 18 20:54:20 truenas kernel: veth167b1f5: renamed from eth0
Feb 18 20:54:21 truenas kernel: br-3f9b022fb1dd: port 1(vethbb932ce) entered disabled state
Feb 18 20:54:21 truenas kernel: vethbb932ce (unregistering): left allmulticast mode
Feb 18 20:54:21 truenas kernel: vethbb932ce (unregistering): left promiscuous mode
Feb 18 20:54:21 truenas kernel: br-3f9b022fb1dd: port 1(vethbb932ce) entered disabled state
Feb 18 20:54:30 truenas kernel: vethabe449f: renamed from eth0
Feb 18 20:54:30 truenas kernel: br-3f9b022fb1dd: port 2(vetha32da9c) entered disabled state
Feb 18 20:54:30 truenas kernel: br-3f9b022fb1dd: port 2(vetha32da9c) entered disabled state
Feb 18 20:54:30 truenas kernel: vetha32da9c (unregistering): left allmulticast mode
Feb 18 20:54:30 truenas kernel: vetha32da9c (unregistering): left promiscuous mode
Feb 18 20:54:30 truenas kernel: br-3f9b022fb1dd: port 2(vetha32da9c) entered disabled state
Feb 18 20:54:31 truenas systemd-journald[683]: Received SIGTERM from PID 1 (systemd-shutdow).
Feb 18 20:55:23 truenas kernel: Linux version 6.6.44-production+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Tue Jan 28 03:14:06 UTC 2025
Feb 18 20:55:23 truenas kernel: Command line: BOOT_IMAGE=/ROOT/24.10.2@/boot/vmlinuz-6.6.44-production+truenas root=ZFS=boot-pool/ROOT/24.10.2 ro libata.allow_tpm=1 amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=on zfsforce=1 nvme_core.multipath=N

[/code]
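
For anyone reading a log like this for the first time, a rough way to check whether a restart was preceded by an orderly shutdown is to look for the shutdown markers (`syslog-ng shutting down`, `systemd-shutdow`) before each `kernel: Linux version` boot banner. Below is a minimal Python sketch of that idea, not a definitive tool. The function names are made up for illustration, the year is an assumption (syslog timestamps omit it, so 2025 is used), and the sample lines are shortened from the excerpt above.

```python
from datetime import datetime

# Sample lines shortened from the excerpt above.
SAMPLE = """\
Feb 18 20:54:20 truenas syslog-ng[2533]: syslog-ng shutting down; version='3.38.1'
Feb 18 20:54:31 truenas systemd-journald[683]: Received SIGTERM from PID 1 (systemd-shutdow).
Feb 18 20:55:23 truenas kernel: Linux version 6.6.44-production+truenas
"""

SHUTDOWN_MARKERS = ("syslog-ng shutting down", "systemd-shutdow")
BOOT_MARKER = "kernel: Linux version"


def parse_ts(line, year=2025):
    """Parse the 15-character syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_restarts(lines, year=2025):
    """Return one (last_shutdown_marker_or_None, boot_time) pair per boot banner."""
    restarts = []
    last_marker = None
    for line in lines:
        if not line.strip():
            continue
        if any(marker in line for marker in SHUTDOWN_MARKERS):
            last_marker = parse_ts(line, year)
        elif BOOT_MARKER in line:
            restarts.append((last_marker, parse_ts(line, year)))
            last_marker = None  # the next boot needs its own shutdown marker
    return restarts


for marker, boot in find_restarts(SAMPLE.splitlines()):
    if marker is None:
        print(f"{boot}: boot with no shutdown marker before it")
    else:
        gap = (boot - marker).total_seconds()
        print(f"{boot}: boot {gap:.0f} s after the last shutdown marker")
```

For the excerpt above this reports a boot 52 seconds after the last shutdown marker. A boot banner with no marker before it would mean the logging stopped without an orderly shutdown, which would point more toward a power loss or hard reset, though that alone does not prove anything.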

I am new to Linux and I don’t know how to read the logs properly, hopefully someone can help. This project is partially a way to learn more about systems other than Windows.

Hardware specs:
CPU: AMD Ryzen 5 3600X
Motherboard: GIGABYTE X570 I AORUS PRO WIFI
RAM: G.SKILL Trident Z RGB (For AMD) Series 16GB(2x8GB) DDR4 3600MHz
Boot Drive: INTEL 660P M.2 512GB
Graphics Card: MSI Radeon RX 5700 XT EVOKE
HDD: Seagate FireCuda 2TB Solid State Hybrid Drive Performance SSHD 2.5 Inch SATA 6GB/s
PSU: IN WIN A1 RGB Mini-ITX Black Tower with Wireless Charging and 600W 80Plus PSU ( IW-A1-BLA-P) [The PSU comes with the case]

As a reference, I was using this same PC, no hardware changes, for work and gaming using Windows 11 until a few weeks ago with no problem. I had to change a few settings in the BIOS (related to secure boot and virtualization) to be able to install TrueNAS.

I’m not an expert, but to me the log provided just confirms the shutdown and restart without giving the reason.
The first thing I would try, since you are using a gaming mainboard, is disabling every CPU power control.
Also, check in monitoring/netdata whether the crash coincides with a certain operation; that may point to the problem.
If nothing stands out, you need to start from the beginning and rule out hardware problems… Memtest first.

Thanks for the reply!

I also thought that the log was just confirming the shut down, thanks for the confirmation! Do you know of other logs that I can pull that would tell me the cause of the shutdown?

I tried disabling all CPU power controls. It keeps doing the same thing.

I also ran memtest86+ and the memory passed. Do you have a suggestion on how to keep testing? (Keep in mind I have been a Windows user all my life, so I am trying to figure out how to do this.)
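
One place to look is the systemd journal of the boot before the current one. Here is a minimal Python sketch, assuming `journalctl` is available and the journal is kept across reboots (whether TrueNAS Scale does that by default is not verified here). The function names are made up for illustration; the `journalctl` options used (`-b`, `-p`, `-n`, `--no-pager`) are standard ones.

```python
import shutil
import subprocess


def journal_cmd(boot_offset=-1, priority="warning", lines=100):
    """Build a journalctl command for one boot (-1 is the boot before the current one)."""
    return [
        "journalctl", "--no-pager",
        "-b", str(boot_offset),   # which boot to read
        "-p", priority,           # this priority and more severe
        "-n", str(lines),         # only the last N matching lines
    ]


def previous_boot_warnings(lines=100):
    """Return the last warnings of the previous boot, or None if they cannot be read."""
    if shutil.which("journalctl") is None:
        return None
    result = subprocess.run(
        journal_cmd(lines=lines), capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    output = previous_boot_warnings()
    print(output if output else "No journal for the previous boot could be read.")
```

If the last lines of the previous boot end abruptly with no shutdown messages, the OS probably did not initiate the restart itself. Reading system messages may require running this as root.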

Hey, I’m currently running into the same issue, though in my case it seemed to be because of some reported NVMe errors. I replaced the NVMe drive and still got a freezing system. No display output, and I haven’t found anything in the logs so far; still searching. As long as the system is idle, the freeze doesn’t happen. I have only observed it under load, after around an hour or two.

In your place, I would check the container logs to see whether some app triggers the reboot. Otherwise, you can take the long way:

  • let the RAM run slower
  • run TrueNAS with a minimal setup: remove everything except the boot disk and disable everything else
  • add one thing at a time and let it run
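
To compare how long each configuration survives while adding one thing at a time, a small heartbeat file can help: the last timestamp written before a reboot shows roughly when the system went down. This is a minimal Python sketch with made-up function names, an illustration rather than anything TrueNAS provides.

```python
import os
import time
from datetime import datetime
from pathlib import Path


def write_heartbeat(path, now=None):
    """Append one timestamp and ask the OS to flush it to disk right away."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())


def last_heartbeat(path):
    """Return the most recent timestamp in the file, or None if there is none."""
    p = Path(path)
    if not p.exists():
        return None
    lines = p.read_text().split()
    return datetime.fromisoformat(lines[-1]) if lines else None


def run_forever(path, interval_s=10):
    """Write a heartbeat every interval_s seconds until the machine goes down."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Calling `run_forever("/mnt/tank/heartbeat.log")` (the path is hypothetical) during each test run, and comparing `last_heartbeat(...)` with the next boot time afterwards, gives a rough uptime per configuration.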

By the way, consider selling this configuration and buying second-hand server-grade parts to get a reliable NAS.

OK, sorry for the delay in the response. I was trying a few of the suggestions.

  • I tried lowering the speed of the RAM and had the same issue
  • I tried resetting all BIOS settings to default and restoring them one by one to the expected values. That did not fix it.

However, while making these changes I noticed that the reboots were happening even in the BIOS screen and, in some cases, even before reaching it. This made me suspect that it may be the hardware itself. So I decided to do what you suggested.

I took the computer apart and re-connected it outside of the case. Had the same issue (but now I have it outside).

I tried:

  • without the additional SATA drive (only kept the NVMe boot drive)
  • with one RAM stick at a time
  • without the graphics card

In all cases I get the same behavior: a few minutes of working and then a reboot, sometimes even a reboot in the BIOS screen.

To the best of my knowledge, I am down to two components:

  1. The microprocessor being faulty
  2. The PSU being faulty. I wondered whether the PSU is being operated at too light a load (53W out of 650W), but I suspect this is not the case: the PSU was built into the case and is an 80+ Gold power supply, so 53W should be enough for it to regulate.
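
As a quick check of the load figure: the 53W draw comes from the post, while the rated wattage is listed as 600W in the hardware specs and 650W here, so the sketch below computes both. The function name is made up for illustration.

```python
def load_percent(draw_w, rated_w):
    """Return the power draw as a percentage of the PSU's rated output."""
    if rated_w <= 0:
        raise ValueError("rated wattage must be positive")
    return 100.0 * draw_w / rated_w


# 53 W is the measured draw from the post; 600 W and 650 W are the two
# ratings mentioned in the thread.
for rated in (600, 650):
    print(f"{rated} W PSU at 53 W: {load_percent(53, rated):.1f}% load")
```

Either way, the system is drawing under 10% of the PSU's rated output.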

My next step will be to test the PSU, but I need to get another one to test with, and I don’t have one.

What really bothers me is that the computer was working fine before changing the OS from Windows to TrueNAS. I can recall some occasions where the computer may have rebooted on its own during the night, but it was very uncommon. I thought it was some planned update (but I was not monitoring for it, so I don’t know).

In my experience, I would suggest focusing more on the PSU and motherboard than on the PSU and CPU.
The PSU is the easiest part to test… Just grab another one from another PC (borrowed for free from a friend or a relative :smile: ). If you have a multimeter you can also take some readings (but do it only if you know how).
Motherboards are really tricky to test, and it is probably hard to get hold of another one with similar specs to test the CPU… If the reboots weren’t so frequent I would have suggested flashing a newer BIOS (if available), but if the process fails and you have no programmer and a soldered chip, you can throw the mainboard away…
To me you have covered most of the tests; beyond that there are few options left.

Just make sure to use the cables that come with the PSU; don’t mix them unless they happen to be from the exact same brand. Some brands even have different pinouts between their models, so always double-check.

Yep, I remember an old thread where folks fried all their disks by using the wrong cable.
Always swap the entire PSU, not only the cables.

I solved my problem by removing one of the two RAM sticks, after running a memtest that detected an error.

Hi!

Just an update: it was the power supply :man_facepalming:
I got a different power supply and now it is working without interruption.

I have been testing it for a week now.

Thanks for the help! Although TrueNAS was not the source of the problem, hopefully this thread will help some other poor soul track down a hardware issue!
