TrueNAS SCALE crashing every few days

Hello everyone, I’m new to the forum and experiencing some problems with my server. My system crashes roughly every 4 days and can only be recovered with a hard restart; when I look at the system, it greets me with the attached image. After the hard restart there are no error messages from the system. I’ve read the old forum site and have tried the following: turning off all the power-saving settings in the BIOS and disabling memory XMP. Neither has solved the issue. I’ve also disabled S.M.A.R.T. tests, as I thought I read somewhere that they can cause crashes, but I’m still waiting to see what this does. I would love some guidance from the forum team. Thanks.

  • System Version: TrueNAS-Scale-22.12.3.2
  • AMD Ryzen 5 5600
  • 32GB (2x16GB) Kingston Fury DDR4-3200
  • MSI B550 Gaming Gen 3 Mobo
  • GT 710
  • 2 x 2TB WD Red (Mirror)
  • 240GB WD Green SATA SSD (Boot)
  • PNY 500GB M.2 (Cache)
  • eBay 2.5GbE network card (wgetech W2511-SR)
  • Seasonic X series 650W

I am not really a Linux expert (not at all), but what this crash log says suggests to me that this is a hardware issue with your memory. I would therefore suggest the following:

  1. Remove and reseat each memory stick just in case it is a poor connection. See if this stops the crashes. If not, then …

  2. Run a BIOS or bootable memory test. On my system, I installed a UEFI memory test so I can run it without difficulty if I ever need it. If this points to a memory issue, then …

  3. Remove one stick of memory and see if it helps. If not, put this stick back and remove a different one, until you have tried removing each stick in turn.
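
To make step 3 concrete, here is a rough Python sketch of the one-stick-at-a-time rotation and how to read the results. The slot labels (`DIMM_A2`, `DIMM_B2`) and the function names are made up for illustration; check your motherboard manual for the real slot names.

```python
from typing import Dict, List


def isolation_plan(sticks: List[str]) -> List[Dict[str, List[str]]]:
    """One trial per stick: pull that stick and run on the remaining ones."""
    return [
        {"removed": [s], "installed": [t for t in sticks if t != s]}
        for s in sticks
    ]


def suspects(crashed_without: Dict[str, bool]) -> List[str]:
    """Return the sticks whose removal stopped the crashes.

    crashed_without[stick] is True if the system still crashed
    while that stick was removed.
    """
    return [stick for stick, crashed in crashed_without.items() if not crashed]


if __name__ == "__main__":
    sticks = ["DIMM_A2", "DIMM_B2"]  # hypothetical slot labels
    for trial in isolation_plan(sticks):
        print(trial)
```

If the crashes stop with one particular stick removed, that stick (or its slot) is the likely culprit. If they continue no matter which stick is out, the problem is probably somewhere other than a single stick. With crashes days apart, each trial needs to run for longer than the usual gap between crashes before you can draw a conclusion.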

Agreed, memory is where I’d start based on those error messages. To rule out a bug specific to that kernel version, you could also update to 23.10 (or 24.04-RC.1).
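
If you want something in text form rather than a photo of the screen, a rough Python sketch like the one below can scan the previous boot's kernel log for hardware-related messages. It assumes `journalctl` is available and that the journal is kept across reboots; neither is guaranteed, and after a hard reset the last messages may never have reached the disk. The pattern list is illustrative only, not an exhaustive set of kernel error strings.

```python
import re
import subprocess
from typing import Iterable, List

# Illustrative substrings that often appear in Linux kernel messages about
# hardware faults. This is an assumption, not a complete or official list.
PATTERNS = re.compile(
    r"machine check|mce:|hardware error|edac|general protection fault|bad page",
    re.IGNORECASE,
)


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that match one of the patterns above."""
    return [ln.rstrip("\n") for ln in lines if PATTERNS.search(ln)]


def previous_boot_kernel_log() -> List[str]:
    """Kernel messages from the previous boot, or [] if they cannot be read."""
    try:
        out = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # journalctl missing, no earlier boot recorded, or no permission
        return []
    return out.stdout.splitlines()


if __name__ == "__main__":
    for line in suspicious_lines(previous_boot_kernel_log()):
        print(line)
```

An empty result does not clear the memory; it may only mean nothing was written before the crash, so the memory test is still worth running.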

I would totally run memtest before anything else. iirc we had a CORE user with a faulty network card that caused a similar situation, but I’m going by memory here (pun intended).

Could I run a memory test through TrueNAS, or only somewhere in the BIOS? I’ve never run a memory test before, so I wouldn’t know where to find it.

Typically this is something you’d want to perform outside of a normal operating system.

PassMark maintains a fairly good bootable USB image that does the job, but there are some open-source variants as well. Once you boot into it, it’s fairly straightforward: you just start the test and check back 24 hours later. It’ll tell you if there are errors.

  • MemTest86 (PassMark)
  • Memtest86+ (the open-source memory testing tool)

From there, if there are errors, you might disable AMP/XMP in your BIOS, run at stock JEDEC speeds (BIOS defaults), and try again.

You really want to use something that boots into its own specialized environment, not your regular operating system. I used MemTestPro to test my system way back when I was getting weird errors. It creates a bootable USB stick that you can then set your BIOS to boot from instead of the usual drive(s). The test can take a long time, depending on how many passes you want the system to torture your memory with. All in all, I found it a good investment.

I always used to use MemTest86+

As previously stated, I found a version of one of the memtest packages that runs in UEFI.

Cleaned and reseated the memory and ran a memory test; it came back all good! It must have been a connection issue caused by the dust build-up around the memory stick nearest to the CPU cooler. Thanks, Protopia & everyone!
