I’m in the early days of transitioning from QNAP to TrueNAS. I’ve built a machine and installed TrueNAS, but I am having problems with the system rebooting all the time. Reading here suggests that reboots are usually hardware related, with RAM the most likely culprit.
I researched what memtest is and set up a bootable USB. The system crashed within minutes of running memtest, and testing reported multiple failures. I was able to get a replacement DDR5 4800 SODIMM (16GB) from the supplier. I installed that a few hours ago and have been running memtest over it. The replacement RAM module is also coming up with multiple errors. It’s currently sitting at Pass 5 and reporting 43 errors.
It seems really unlucky to have 2 separate RAM modules turn out faulty. Same store, so I guess they could be from the same ‘batch’, but honestly it seems unlikely to get 2 faulty modules in a row.
I was reading that poorly seated RAM can cause errors, so I will shut down shortly and reinstall the module, but I’m pretty sure it’s properly inserted.
When I’m running memtest, it’s just the motherboard, RAM, USB drive and keyboard in play. Is it possible that a faulty motherboard or CPU will present as a RAM failure?
Would really value the thoughts of folks more experienced than I on where to go next. Much appreciated.
Can you please take a photograph of the installed motherboard, maybe a few? I want to examine which connectors are mated. The main one I am focused on is the 4 pin CPU 12VDC connector. Is it connected?
These are the things I would recommend:
When running Memtest, write down the memory address which fails. Do this for EVERY run of Memtest. Why you ask? It could isolate a motherboard issue.
In the BIOS, reset to factory.
Rerun Memtest. The goal here is to get at least four complete Memtest passes.
If it fails, in the BIOS, if you can, select a slower speed for the RAM. We call this Underclocking. DO NOT manually manipulate voltage levels unless you are willing to take the risk of causing damage. Run Memtest again.
If it fails yet again, do you have another power supply you can temporarily use? Try that and run the test again.
If it fails again, are you noticing any similarities in the failing memory addresses? They do not have to be the exact same number, but look for addresses in the same general area. Either way, post them all and we can take a look. It could be the motherboard.
If all of these fail, odds are the motherboard is faulty. But you need to try all the other items first if you can.
If underclocking the RAM works, is the RAM on the QVL? If it is, contact the motherboard manufacturer about it or treat the motherboard as failed. If the RAM is not on the QVL then it could be incompatible RAM or a crappy motherboard.
If you have to tweak something to make the system work, then you really need to run the Prime95 CPU stress test as well. The CPU installed is not a big heat generator; however, given the way the CPU is attached to the board, even small temperature changes can cause poorly made joints to break. If you do not trust the stability, replace it.
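To make the address comparison above a little easier, here is a rough Python sketch that buckets failing addresses into 16 MiB regions and counts the hits in each. The addresses shown are made up for illustration (they are not from this thread), and the function name and region size are just my own choices. Swap in whatever Memtest reports on each run.

```python
from collections import Counter

REGION_BITS = 24  # 2**24 bytes = 16 MiB per region (arbitrary choice)


def group_by_region(addresses, region_bits=REGION_BITS):
    """Count how many failing addresses fall into each memory region."""
    return Counter(addr >> region_bits for addr in addresses)


# Hypothetical failing addresses copied down from two Memtest runs.
run_1 = [0x1A3F4C010, 0x1A3F4C810, 0x2B0012340]
run_2 = [0x1A3F4D020, 0x0C4500000]

counts = group_by_region(run_1 + run_2)

# Regions with the most failures are listed first.
for region, hits in counts.most_common():
    start = region << REGION_BITS
    print(f"region starting at {start:#x}: {hits} failure(s)")
```

If most of the failures pile up in one or two regions across runs, that is the kind of similarity worth posting about. If they are scattered everywhere, post them anyway.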
My last comment which you may not like… I have purchased from AliExpress in the past and often the stuff they sell is poorly built. You might be better off purchasing an ASUS N100 motherboard for example. A true name brand.
Best of luck to you and I really hope you get it working.
Thanks for your detailed reply. When I plugged all this together and it didn’t work it was a bit disheartening. I appreciate all the assistance offered. Helps enormously to feel less isolated.
I’ve attached some images of the build. I hope they show what you were hoping to see. The cables supplied with the power supply had two 4 pin CPU connectors, they look identical, so I plugged one at random and left the other hanging.
When you say “is the RAM on the QVL”, what does QVL stand for?
No problems with the observation about variable quality from Aliexpress. You definitely need to pick through the offerings. I use a Topton soft router for my OPNsense gateway and it has been no fuss, it just runs. That gave me some confidence in purchasing from the same supplier again. My local PC parts store, where everything else was purchased, doesn’t have any ITX boards with 6 SATA ports. The price point on Aliexpress is attractive but you win some, you lose some…
Qualified Vendors List. It is the listing of RAM that was actually tested on the motherboard by the developer and guaranteed to work. If your RAM is on the list then you can rule out the RAM, since this is your second stick with the same problem.
Try those troubleshooting steps out. Maybe it will be the power supply. But AliExpress is pretty good about returns. If you get to that point, pull the trigger and contact them. You can ask for a replacement or a refund, that is up to you. And yes, the price point is nice, until you have problems and it turns out to be the less expensive knock-off copy cat item. China does a lot of that unfortunately. I was reading in a company security email today that Samsung had a few people sell China something like 4.3 billion dollars worth of Samsung proprietary information so China could flood the market with knockoff products. And trust me, they are substandard.
Good luck and please post if you need anything else, and if you figure out what is going on.
Given this motherboard doesn’t have any documentation besides what’s printed on the PCB, and even that’s a bit patchy, I’m thinking there is no QVL.
Taking your recommendation, I wound the max speed for the RAM back from Auto to 4600. Who knew you could do that! Memtest ran for 17hrs, completing 19 passes with 0 errors. I’m feeling much more optimistic. Worst case, I can leave the RAM throttled and the system is likely to be stable.
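For a sense of how little is given up by running at 4600 instead of 4800, here is a back-of-the-envelope calculation in Python. It assumes a single 64-bit memory channel (my assumption about this board, not something confirmed in the thread) and only covers theoretical peak bandwidth, not real-world throughput.

```python
BUS_WIDTH_BYTES = 8  # assumption: one 64-bit memory channel


def peak_gb_per_s(mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mega_transfers_per_s * 1_000_000 * BUS_WIDTH_BYTES / 1e9


stock = peak_gb_per_s(4800)
throttled = peak_gb_per_s(4600)
loss_percent = (stock - throttled) / stock * 100

print(f"4800 MT/s: {stock:.1f} GB/s peak")
print(f"4600 MT/s: {throttled:.1f} GB/s peak")
print(f"Reduction: {loss_percent:.1f}%")
```

That works out to a drop of roughly 4% in peak bandwidth, which a mostly idle NAS is unlikely to notice.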
The current RAM is Crucial, which I think of as a credible brand. I’m going to buy another stick of exactly the same spec RAM but a different brand. If that throws errors at 4800, but is also stable at 4600, that to me implies the motherboard can’t keep up at 4800. I think that would give me a strong case to return the motherboard.
Always worth thinking about overheating, but in this case I don’t think it’s likely. When running TrueNAS, this system is mostly idle and the temps are really low - around 40C. Even when running Memtest which maxes out the CPU (I believe) over the 17 hour run time, the maximum temp was 87C which is below the max temp for the CPU by a decent margin.
I’m not sure about the temp of the RAM itself, but at present it’s running without the case so there’s heaps of airflow.
Glad the underclocking is working. DDR5 is fast memory and honestly is not needed in an N-100 board, but they built it that way. I doubt there is a VRM issue; this seems to be a frequency/noise issue. Likely just poor manufacturing and not enough attention to detail in the high-frequency design. This might have been originally a DDR4 design and when a faster CPU was installed, well, they cut corners. I’m not saying that is what happened, and hopefully that isn’t the issue, however underclocking the RAM can work just fine for you.
The only issue you may run into, should you just leave the RAM underclocked, is that if you have to reset the BIOS for whatever reason, you will need to manually change the RAM speed again. Put a paper note or label inside the case reminding you to slow the RAM down.
With ALL that said, check to see if there is a BIOS update. If there is, that may fix the issue.
Whatever you end up doing, run MemTest86 like you have been doing AND run a CPU stress test. While you may think they are independent tests, they share common parts. I’d run the CPU stress test for an hour (or 5), I can’t imagine that CPU getting very hot but you are looking to also saturate the motherboard area with heat. Any poor solder joints will hopefully show themselves.
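For anyone wondering what a stress test that also checks for errors actually does, here is a minimal Python sketch of the idea: keep every core busy with a calculation whose answer is known in advance, and count any run that produces a different answer. This is only an illustration. It is far lighter than Prime95 and will not heat the board the same way, so it is not a substitute for it. The function names, the SHA-256 workload, and the round count are all my own choices.

```python
import hashlib
import multiprocessing as mp
import time


def reference_digest(seed: int, rounds: int) -> str:
    """Hash a seed repeatedly; the result is fully deterministic."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def burn(seed: int, seconds: float, rounds: int = 20000):
    """Repeat the same calculation until time runs out.

    Returns (loops completed, loops whose result did not match).
    On healthy hardware the mismatch count should stay at zero.
    """
    expected = reference_digest(seed, rounds)
    deadline = time.monotonic() + seconds
    loops = 0
    mismatches = 0
    while time.monotonic() < deadline:
        if reference_digest(seed, rounds) != expected:
            mismatches += 1
        loops += 1
    return loops, mismatches


def run_all_cores(seconds: float) -> int:
    """Run burn() on every core at once and return the total mismatches."""
    cores = mp.cpu_count()
    jobs = [(seed, seconds) for seed in range(cores)]
    with mp.Pool(cores) as pool:
        results = pool.starmap(burn, jobs)
    return sum(bad for _, bad in results)


# Example use (call it under an `if __name__ == "__main__":` guard):
#     print("mismatches:", run_all_cores(3600))
```

Any non-zero mismatch count means the machine computed the same thing twice and got two different answers, which is exactly the kind of instability the real tools are looking for.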
Some people here working in the corporate world will run these tests for weeks, up to a month each if it is mission critical.
Can’t wait to see what your next posting is. Hopefully there is a BIOS update that fixes it.
Can’t find a manual for that board. Does anything in the BIOS setup with regard to memory settings mention “XMP?” Also, are any of the memory voltages (VDDP, VDDQ, et al) exposed and adjustable within said BIOS?
No, there’s nothing I could see about XMP. Similarly, nothing I could see about VDDP, VDDQ or changing voltage values. But, TBH, they could be there but navigating the BIOS menus is a bit impenetrable. The setting for max memory frequency is about 2 levels down under a menu item for system agent(?!?!).
I will have to give this stress-ng test a look. Wish it came prebuilt so a person could just copy it over and run it, or maybe I missed that.
My personal preference for a CPU stress test is something like Prime95 (mprime). UBCD (The Ultimate Boot CD) is an easy one to obtain. It contains both RAM and CPU stress tests, and it is easy to write the ISO to a bootable USB flash drive.
To get it, first of all, be careful what you click on. I dislike these advertising sites but nothing is free I guess.
You can use something like Rufus or balenaEtcher to put the ISO on a USB flash drive. Then you need to boot from it, of course.
I tried to look for BIOS files for the N18 model you have but I found nothing at all. The Topton website was useless to me. Maybe you would have better luck. Also, looks like that motherboard is no longer in stock at AliExpress. You might be stuck with the one you have now.