TrueNAS SCALE crashes between 3 and 4 a.m.

Hello / Good evening,
I’m having a problem at the moment: my server crashes every night around 3:30 a.m. for some unknown reason. The server becomes inaccessible over SMB, the Web UI and even SSH, and I have to restart it manually. It didn’t do this when I was on the previous version.
I’m currently on the latest version of SCALE, Electric Eel,
and I tried disabling my Dockge package to see if one of my applications was causing the problem, but apparently that is not it.

Here’s my configuration, in case it helps:
i5 9600K
Z390 Aorus
64 GB memory
1x 250 GB OS
1x 500 GB Apps
3x 8 TB HDD RAIDZ1
2x 2 TB mirror
650 W PSU

The crashes have been going on for several weeks, but two days ago they stopped when I changed the NTP servers… I thought I was done, but it came back today at 4 a.m. :cry:

Despite the title of the thread, the problem seems similar to what the OP there is facing (plus similar hardware).
Maybe start by debugging the things that have been suggested there.


I had already checked that thread before creating mine and, unfortunately, I don’t have the same problem, not even reboots. I’ve tried setting the memory to 2400 MHz instead of 2133.

Are there any nightly tasks/jobs that might trigger this?

Have you tried multiple NTP servers? If not, which one are you using, and is it reliable?

Also, have you tried connecting a monitor during the hang (to see whether the console displays something), or did you catch any errors in the logs?

No, no task is scheduled, which makes me wonder what it could be.

Yes, I have a monitor plugged in, and all it shows is a frozen screen displaying my machine’s IP address with the 9 choices for configuring the server directly.

For the NTP servers, I use these, since I’m in France.

I’m unable to share images, so here are the different NTP servers:

Address            Burst  IBurst  Prefer  Min Poll  Max Poll
0.fr.pool.ntp.org  No     Yes     No      6         10
1.fr.pool.ntp.org  No     Yes     No      6         10
2.fr.pool.ntp.org  No     Yes     No      6         10
3.fr.pool.ntp.org  No     Yes     No      6         10

Perhaps you could be more specific about when it started crashing relative to the software updates and NTP changes.

It sounds like it was stable for a few weeks… but now it’s crashing each night?

It’s not a problem we are seeing… so it’s likely to be hardware related. However, having a 24-hour cycle would be unusual… I’d also check the system clock…
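
The system clock check can be sketched in a few lines of Python. This is only an illustration, assuming a Linux host that exposes the hardware (BIOS) clock at `/sys/class/rtc/rtc0/since_epoch`; the function names and the 5-second tolerance are made up for the example. An offset of a whole number of hours can mean the BIOS clock holds local time while the OS expects UTC.

```python
import time
from pathlib import Path

# Assumption: Linux sysfs path of the first hardware clock (RTC).
RTC_PATH = Path("/sys/class/rtc/rtc0/since_epoch")


def clock_offset_seconds(rtc_epoch: float, system_epoch: float) -> float:
    """Return how many seconds the hardware clock is ahead of the system clock."""
    return rtc_epoch - system_epoch


def describe_offset(offset: float, tolerance: float = 5.0) -> str:
    """Turn an offset in seconds into a short human-readable verdict."""
    if abs(offset) <= tolerance:
        return "clocks agree"
    return f"RTC differs from system time by {offset / 3600:+.2f} h"


if __name__ == "__main__":
    if RTC_PATH.exists():
        rtc = int(RTC_PATH.read_text().strip())
        print(describe_offset(clock_offset_seconds(rtc, time.time())))
    else:
        print(f"{RTC_PATH} not found; cannot read the hardware clock here")
```

Running it once shortly after boot and once a day later would show whether the two clocks drift apart again.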

Yes, this problem appeared when we switched to the latest version. A friend and I looked into where it was coming from, and it disappeared for a few days just before this weekend, as said above, when we changed the NTP servers.
For the record, last night I corrected the time in my BIOS, which was 2 hours off, and this morning the server hadn’t crashed or frozen.
I’m going to leave it as it is to confirm whether it’s a question of NTP or system time.


Well dear friends, I’m back to announce that the server has crashed again…I’m getting fed up, I don’t want to have to fiddle around every 2 months, I keep testing where the problem could be coming from but nothing helps…

Just a thought

Maybe it’s an external electrical issue (a smart device rebooting at 3:30 a.m. and cutting off the power?)

Unfortunately, I don’t have anything like that at home.

Using the monitoring or Netdata, can you see if there is something abnormal before a crash?
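
If the built-in graphs lose their last minutes when the machine hangs, one alternative is a tiny logger that forces every sample to disk. The following is a minimal sketch, not a TrueNAS feature: it assumes a Linux host with `/proc/meminfo`, and the function names, the sampled fields and the 60-second interval are all made up for the illustration.

```python
import os
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def format_sample(timestamp: float, load1: float, meminfo: dict) -> str:
    """Build one log line with local time, 1-minute load and available memory."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    available = meminfo.get("MemAvailable", -1)  # -1 if the field is missing
    return f"{stamp} load1={load1:.2f} mem_available_kb={available}"


def log_forever(path: str, interval: float = 60.0) -> None:
    """Append one sample per interval, syncing to disk so it survives a hard hang."""
    with open(path, "a") as log:
        while True:
            with open("/proc/meminfo") as f:
                mem = parse_meminfo(f.read())
            log.write(format_sample(time.time(), os.getloadavg()[0], mem) + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling `log_forever()` with a file on one of the data pools (the path is up to you) leaves a trail whose last line shows the load and free memory shortly before the freeze.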

Hi,
I can report that I have had a similar issue since upgrading to Electric Eel. No problems at all for more than a year, and now two random crashes: one in January and one in February. There is nothing I can see in the logs and nothing on a monitor, and only a cold restart gets it going again. Yes, it is on a UPS, and no other machines on the same UPS have this problem. I’ve replaced an older NVMe drive and that didn’t help. Next I will replace the RAM, unless anyone else has a suggestion?

Have you run a memory test? Test the RAM thoroughly.
I would try that before replacing RAM.

Have you submitted a Bug Report directly after the problem occurred (when you noticed it)? Make sure you include the log data, which is an option when filing the bug report.

This certainly does sound like an NTP issue, but could also be some software function that is run at a specific time of day. The logs hopefully would identify the exact time of day and possibly the culprit.
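
To pin down that exact time of day, one option is to look for the point where the log simply stops. The sketch below is an illustration under assumptions: each log line is expected to start with a timestamp like `2025-02-10T03:30:10+0100` (the style printed by `journalctl -o short-iso`), and the function names and the 10-minute threshold are made up for the example.

```python
from datetime import datetime, timedelta


def parse_line(line: str):
    """Return the leading timestamp of a log line, or None if there is none."""
    fields = line.split(maxsplit=1)
    if not fields:
        return None
    try:
        return datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        return None  # e.g. "-- Boot ... --" separator lines


def find_gaps(timestamps, max_gap=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where logging paused longer than max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def crash_candidates(lines, max_gap=timedelta(minutes=10)):
    """Parse log lines and list the silent periods that may correspond to a hang."""
    stamps = [t for t in map(parse_line, lines) if t is not None]
    return find_gaps(stamps, max_gap)
```

The first element of each returned pair is the last message written before the silence, so the log lines just before it are the ones worth reading. A quiet but healthy system can also produce gaps, so treat the result as a hint rather than proof.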

In the meantime, as others have said, run a RAM test and a CPU test, and run them for as long as you can. I would test the RAM for at least 5 complete passes and the CPU for at least 5 hours, longer if your system is normally very active. While I don’t think these will root out the problem, they should give you a sense that the computer (for what you can test) is pretty solid.

One piece of added advice… When you upgrade to a newer version of TrueNAS, do not upgrade the ZFS feature set. Ignore it. This allows you to roll back to the previous version if you discover a problem, such as what you currently have.

Just to be clear, you have a keyboard connected and the keys were also not functional? I do not want to assume anything, it causes problems.

Did you check the BIOS time again? Make sure it is correct and not 2 hours off again.

Yes, I can confirm that the keyboard works and that the time differences have been resolved. A few days ago I removed two RAM sticks, so I’m at 32 GB… and it’s not crashing at the moment. Maybe my processor doesn’t like having both DIMM pairs installed (4 sticks).

I’ve just discovered that my server crashes when I add movies/series to my Plex (it did it to me just now).