Random reboots after upgrade to 24.10

Ork_77 · November 7, 2024, 1:18pm

Hi everyone. Yesterday I upgraded from the latest version of TrueNAS SCALE Dragonfish to 24.10.
The transition went well, apart from a couple of applications which for unspecified reasons did not migrate. Being a novice in Linux (but with 40 years of computer experience) I simply reinstalled them. Unfortunately, however, the real problem emerged shortly thereafter. In the last 12 hours the system has spontaneously rebooted 5 times, sometimes every few minutes, sometimes after hours. Sometimes just using Plex, sometimes with the system doing absolutely nothing.
In the past six months of continuous power on I had experienced I believe two restarts. The system is not mission critical but such a frequency of reboots is unacceptable. Given the emergence of the problem with the transition to 24.10, I think it is reasonable to exclude hardware problems.

Do you have any ideas on what I could check to understand what’s going on (remember I’m NOT an expert in linux)?

Is there a possibility to go back to the previous version and in this case what would happen to all my applications?

The system is a Ryzen 1700X, Asus X370 Prime motherboard, GeForce (old, not supported by drivers), 16GB of RAM, no ECC (checked and works well).

Thanks to anyone who knows and can help me!

Juppers · November 9, 2024, 3:36am

I just migrated from core to scale today and am also seeing random reboots. I have looked through several logs and I don’t see anything. It just reboots without warning or any logical reason. I’ve seen another thread with the same issue today as well, so we aren’t alone.

Juppers · November 9, 2024, 6:54am

Looks like I’m making at least some sense of it: I did find errors in syslog that correspond to the crashes. Can you check if you are seeing something similar?

syslog:Nov 8 16:42:05 Onyx kernel: perf: interrupt took too long (2588 > 2500), lowering kernel.perf_event_max_sample_rate to 77250
syslog:Nov 8 17:19:45 Onyx kernel: perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
syslog:Nov 8 19:37:20 Onyx kernel: perf: interrupt took too long (2799 > 2500), lowering kernel.perf_event_max_sample_rate to 71250
syslog:Nov 8 21:31:49 Onyx kernel: perf: interrupt took too long (2667 > 2500), lowering kernel.perf_event_max_sample_rate to 75000
syslog:Nov 8 21:31:49 Onyx kernel: perf: interrupt took too long (2667 > 2500), lowering kernel.perf_event_max_sample_rate to 75000
syslog:Nov 8 21:51:06 Onyx kernel: perf: interrupt took too long (2574 > 2500), lowering kernel.perf_event_max_sample_rate to 77500
syslog:Nov 9 01:08:12 Onyx kernel: perf: interrupt took too long (2766 > 2500), lowering kernel.perf_event_max_sample_rate to 72250
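
A rough way to check that correspondence, as a sketch: every boot starts with a kernel "Linux version" banner, so listing those lines gives approximate boot times to compare against the perf warnings above. The function name list_boot_banners and the default path /var/log/kern.log are assumptions for illustration; the log name and exact line format may differ on your system.

```shell
# Sketch: print the timestamp of each kernel boot banner found in a log file.
# Assumes syslog-style lines whose first three fields are month, day and time.
list_boot_banners() {
    log_file="${1:-/var/log/kern.log}"   # assumed default; pass another file if needed
    # The pattern also tolerates an optional "[    0.000000]" stamp before the banner.
    grep -h 'kernel: .*Linux version' "$log_file" | awk '{ print $1, $2, $3 }'
}

# Example call (hypothetical path): list_boot_banners /var/log/kern.log
```

If a perf warning shows up shortly before each banner, the two are probably related; if the warnings are days apart from the reboots, they likely are not.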

Ork_77 · November 9, 2024, 8:43am

Hi, I would like to do that and I already spent hours but for a complete linux noob it is not easy.
I guess the logs are in /var/log/? But which one should I check? kern.log? If that is the one, I see nothing after the reboot (I mean… a bunch of stuff during the reboot and then nothing until the next one).

LarsR · November 9, 2024, 9:06am

For first-gen Ryzen there were some BIOS settings that had to be disabled.
For older BIOS versions those settings were: ErP Ready, AMD Cool'n'Quiet and Global C-state Control. On newer BIOS versions there is an option called Power Supply Idle Control, which had to be changed from low current to typical current.

Ork_77 · November 9, 2024, 9:21am

Yes, I’m aware of this. My platform worked for six months with everything enabled (It was my old rig for many years and when I used it as truenas server I didn’t change a thing). Now I tried to change these settings following an old post/solution:

Precision Boost Overdrive, (can’t find it)
Core Performance Boost (disabled)
Global C-State Control (disabled)
PSS Support, Can’t find it
D.O.C.P. lowered my mem frequency from 3200 to 2400 with awful timings…

Waiting to see if something is going to change.

Juppers · November 9, 2024, 1:22pm

Try this command, it will search for that phrase in every log file in /var/log.

grep "kernel: perf: interrupt took too long" /var/log/*
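
If the output gets cluttered, a variant like the one below skips subdirectories and prints a per-file match count instead of every matching line. This is only a sketch: the function name count_perf_warnings is made up for illustration, and the -d skip option assumes GNU grep.

```shell
# Sketch: count "interrupt took too long" warnings per log file.
# Compressed (.gz) rotations are not searched by this version.
count_perf_warnings() {
    log_dir="${1:-/var/log}"
    # -d skip: ignore directories; -H: always print the file name; -c: count matching lines.
    # The second grep hides files with zero matches.
    grep -d skip -H -c 'perf: interrupt took too long' "$log_dir"/* | grep -v ':0$'
}

# Example call: count_perf_warnings /var/log
```

Each output line has the form file:count, so a file with many hits stands out right away.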

Ork_77 · November 9, 2024, 2:08pm

yes… I have the error:

/var/log/kern.log:Nov 6 00:14:18 truenas kernel: perf: interrupt took too long (2625 > 2500), lowering kernel.perf_event_max_sample_rate to 76000
/var/log/kern.log.1:Nov 1 13:52:50 truenas kernel: perf: interrupt took too long (7503 > 7010), lowering kernel.perf_event_max_sample_rate to 26500
grep: /var/log/libvirt: Is a directory
/var/log/messages:Nov 6 00:14:18 truenas kernel: perf: interrupt took too long (2625 > 2500), lowering kernel.perf_event_max_sample_rate to 76000
/var/log/messages.1:Nov 1 13:52:50 truenas kernel: perf: interrupt took too long (7503 > 7010), lowering kernel.perf_event_max_sample_rate to 26500
/var/log/messages.1:Nov 1 13:52:50 truenas kernel: perf: interrupt took too long (7503 > 7010), lowering kernel.perf_event_max_sample_rate to 26500

But from what I understand, there is no direct correspondence with the crashes, which are more frequent and all started a couple of days ago…

neofusion · November 9, 2024, 2:26pm

Yes, this is the most likely cause, in my mind.

There have been numerous posts by people since 24.10.X running older Ryzens specifically complaining about “random reboots when idle”, a key symptom of the older Ryzen power issue at idle.

Update the BIOS and set Power Supply Idle Control to Typical.

I wouldn’t change the other settings. The above should resolve the crashes.
Changing PBO, CPB and so on is likely a red herring.

Juppers · November 9, 2024, 4:55pm

I don’t have a Ryzen CPU. I’m using an AMD FX-6200.
I also saw the same in 24.04 while I was step upgrading from 13.3.
Core had been rock solid for about 5 years on this hardware, aside from a failed power supply last year. It was replaced with a Corsair 750W gold.

neofusion · November 9, 2024, 5:02pm

Then you should probably make your own thread and post full system details.

Juppers · November 9, 2024, 5:06pm

Will do. Was hoping to find some commonalities to help narrow down the issue.

Ork_77 · November 18, 2024, 8:51am

I decided to try this solution and wait a few days to see how it went. Well, not only have I had no more random reboots, but it seems that the errors on all the disks in my pool have ALSO been resolved (checksum errors, sometimes on the order of 3/4 per disk, which I had been unable to explain).
I was also able to restore normal RAM performance. However, I left the C-states disabled: I didn’t notice any change in temperature, so, since everything is fine now, I’ll leave things as they are.
Thank you all!
