Hello everyone, I’m new to the forum and experiencing some problems with my server. My system crashes roughly every 4 days and can only be recovered with a hard restart. When I look at the system, it greets me with the attached image. After the hard restart there are no error messages from the system. I’ve read the old forum site and have tried the following: turning off all the power-saving settings in the BIOS and disabling memory XMP. Neither has solved the issue. I’ve also disabled S.M.A.R.T. tests, as I thought I read somewhere that they can cause crashes, but I’m waiting to see what effect this has. I would love some guidance from the forum team. Thanks.
I am not really a Linux expert (not at all), but just looking at what this crash log says suggests to me that this is a hardware issue with your memory. I would therefore suggest the following:
Remove and reseat each memory stick just in case it is a poor connection. See if this stops the crashes. If not, then …
Run a BIOS or bootable memory test. On my system, I installed a UEFI memory test so I can run it without difficulty if I ever need it. If this points to a memory issue, then …
Remove one stick of memory and see if it helps. If not, put that stick back and remove a different one, repeating until you have tried removing each stick in turn.
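Alongside the hardware steps above, it may be worth checking any kernel log you can save from before a crash for hints of a hardware fault. The following is only a minimal sketch: it assumes a Linux-based system, and the pattern list and the function name `find_hardware_hints` are my own illustrative choices, not an official or exhaustive list.

```python
import re

# Assumed patterns: phrases Linux kernels commonly print for hardware or
# memory faults. This list is illustrative, not exhaustive.
PATTERNS = [r"machine check", r"\bmce\b", r"\bEDAC\b", r"hardware error", r"bad page"]
SUSPECT = re.compile("|".join(PATTERNS), re.IGNORECASE)


def find_hardware_hints(log_lines):
    """Return the log lines that match any of the assumed fault patterns."""
    return [line.rstrip("\n") for line in log_lines if SUSPECT.search(line)]


if __name__ == "__main__":
    # Hypothetical sample lines, for demonstration only (not from the poster's system).
    sample = [
        "[  12.3] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5",
        "[  13.0] usb 1-1: new high-speed USB device",
        "[  14.2] EDAC MC0: 1 CE memory read error",
    ]
    for hit in find_hardware_hints(sample):
        print(hit)
```

An empty result does not prove the memory is healthy, so a bootable memory test is still the more reliable check.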
Agreed, memory is where I’d start based on those error messages. To rule out a kernel bug specific to that kernel version, you could also update to 23.10 (or 24.04-RC.1).
I would totally run memtest before anything else. IIRC we had a CORE user with a faulty network card that caused a similar situation, but I’m going by memory here (pun intended).
Typically a memory test is something you’d want to perform outside of a normal operating system.
Passmark maintains a fairly good bootable USB image that does the job, but there are some open-source variants as well. Once you boot into it, it’s fairly straightforward: you just start the test and check back 24 hours later. It’ll tell you if there are errors.
You really want to use something that boots into a specialized environment, not FreeBSD. I used MemTestPro to test my system way back when I was getting weird errors. It creates a bootable USB stick that you can then set your BIOS to boot from instead of the usual drive(s). The test can take a long time, depending on how many passes you want the system to torture your memory with. All in all, I found it a good investment.
I cleaned and reseated the memory and ran a memory test, and it came back all good! It must have been a connection issue caused by the dust build-up around the memory stick nearest to the CPU cooler. Thanks Protopia & everyone!