Came home from vacation - TrueNAS UI unavailable - system panic

Came home from vacation and wanted to double-check the health of my system, but the web UI is unavailable. I can ping the system on the network but cannot SSH in (it asks for the password, then just hangs).

No idea what went wrong so I plugged a monitor in.
This is what I can see

If this is not enough to determine the issue, is there a place I can gather more stats?
Seems like this could potentially break my RAID if the system suddenly panics or needs to be physically shut down.

It looks like the system may have run out of RAM. Can you provide hardware details and what version of TrueNAS this is?

The next step, once you’ve rebooted the system, would be the output of less /var/log/messages, which will be very long. Feel free to DM me the output or a debug file.
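Since /var/log/messages will be long, a small filter can narrow it down before paging through it with less. This is only a minimal sketch: it assumes the log is plain text and that the kernel used wording such as "Out of memory" or "invoked oom-killer". The keyword list and the names find_suspect_lines and scan_file are made up for illustration and are not part of TrueNAS, and the exact message wording varies between kernel versions.

```python
import re

# Illustrative keyword list (an assumption): phrases Linux kernels commonly
# log around memory exhaustion and panics. Wording varies between versions.
SUSPECT = re.compile(
    r"out of memory|oom-killer|kernel panic|call trace",
    re.IGNORECASE,
)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching SUSPECT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path="/var/log/messages"):
    """Scan a log file; the default path is the one named in this thread."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_file() as root after the reboot and printing the pairs it returns gives line numbers to jump to in less. An empty result only means none of these keywords appeared, not that the log is clean.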

Thank you @NickF1227 for the reply.

The system was just set up for testing. No traffic or any applications/shares/etc. It was just sitting there for a burn-in. Then I went on vacation. So it wouldn’t have run out of memory from usage. Maybe a memory leak?

Version is Dragonfish-24.04.0

Hardware
OS Version:TrueNAS-SCALE-24.04.0

Mainboard:B450M Steel Legend
Model:AMD Ryzen 7 3700X 8-Core Processor
Memory:31 GiB

Shared the Crash Logs here - https://file.io/Ocza0oUikboF

Shared here because I couldn’t DM

Thank you for your response

24.04.0 had issues with aggressive swap usage, which could lead to systems freezing. Because of this, swap was disabled in .1 and .2. Try a restart, update the system to .2, then let it run for a while and see if the problem recurs.

Thanks LarsR. Do you think that was my issue? As I mentioned, I had a very basic setup with no actual usage on the system, and it was just sitting there for 60 days when this happened.

Does the log file indicate this type of issue? Why would it be using swap if there was 0 space used and 0 users accessing it?

Can’t say for sure, but I was going off of Nick’s comment about running out of memory. Given the posts I remember about the .0 version, with aggressive swap slowing systems down and even some reports of systems crashing, there should be no harm in upgrading to .2.

In either case, you should update from 24.04.0 to 24.04.2, considering the known issues with the former.
Heavy swapping on .0 was leading to a massive amount of iowait, which would cause system lockups similar to what you are seeing.
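If you want to watch for this yourself once the box is back up, swap use and iowait can both be read from /proc. Below is a rough sketch, assuming a standard Linux /proc/meminfo and /proc/stat layout. The function names are made up for illustration, and it only shows the current state, so it cannot tell you what happened before the reboot.

```python
import time


def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from the text of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Fields are user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already counted inside user/nice, so stop at steal.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def sample(interval=5.0):
    """Read the live system: returns (swap used in KiB, iowait percent)."""
    with open("/proc/stat") as handle:
        before = cpu_times(handle.read())
    time.sleep(interval)
    with open("/proc/stat") as handle:
        after = cpu_times(handle.read())
    with open("/proc/meminfo") as handle:
        swap = swap_used_kib(handle.read())
    return swap, iowait_percent(before, after)
```

On .1 or .2, where swap is disabled, the swap figure should simply stay at 0. Any threshold for "too much" iowait is a judgment call and is not defined here.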

I’ll give the upgrade a shot. Would like to confirm that my system symptoms look like the swap issue.

If I have the chance (test build) should I just re-install rather than upgrade?

Re-installing won’t be necessary, since the .1 and .2 updates were only minor bug fixes to address the swap issue and no major features were added or removed.

I’m on pass 4 of memtest86. It’s been running for 24 hours without a single error.

Is there any way to look through the logs, or are we betting on a 24.04.0 bug so I shouldn’t investigate my case any further?

Pass 6 and no errors. Should I chalk it up to a software issue? Is there no way to look in the debug file and confirm that’s what the issue is? I’d hate to build this box and start putting my data on it, only to have it lead to issues.

I’ve let the memtest run for 91 hours and no errors found.

It would seem the issue is not with memory. Can anyone see anything in the debug? https://file.io/VWc0YD46bIga

It looks like on Aug 20th there was a hard lockup on CPU8, and it remained like this until the reboot on Sep 2nd.

Spurious LAPIC timer interrupt on cpu 8 - CPU8 received an interrupt from an unknown source? I’m not 100% certain on this.
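To check whether it was always the same core, the kernel lines can be tallied per CPU. A rough sketch follows, assuming the messages use wording like "hard LOCKUP on cpu N", "soft lockup - CPU#N" or "Spurious LAPIC timer interrupt on cpu N". The regex and the name count_cpu_events are illustrative, and the exact wording can differ between kernel versions.

```python
import re
from collections import Counter

# Assumed message shapes; adjust the pattern if your kernel words them differently.
EVENT = re.compile(
    r"(hard LOCKUP|soft lockup|Spurious LAPIC timer interrupt).*?cpu\s*#?(\d+)",
    re.IGNORECASE,
)


def count_cpu_events(lines):
    """Count matching kernel events, keyed by (event kind, cpu number)."""
    counts = Counter()
    for line in lines:
        match = EVENT.search(line)
        if match:
            kind = match.group(1).lower()
            cpu = int(match.group(2))
            counts[(kind, cpu)] += 1
    return counts
```

Feeding it the lines of the kernel log from the debug file gives a count per event type and CPU. If everything lands on one core, that is a data point, but by itself it does not separate a software bug from a hardware fault.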

Considering you were on .0, I would not worry about the hardware side for the moment. Update to the latest version and see if it crops up again.

Will go for a fresh re-install now that I have another new drive to add to the system.

Really hoping it doesn’t crash again, as I’ll be putting data on it.