ECC memory vs Non-ECC memory - Poll!

What does a car accident have to do with ECC?

Hmm, in addition to DDR5's on-die error correction, DDR5 has:

An additional feature of the DDR5 SDRAM ECC is the error check and scrub (ECS) function.
…
The DDR5 SDRAM can run the ECS in automatic mode, where the DRAM schedules and performs the ECS commands as needed to complete a full scrub of the data bits in the array within the recommended 24-hour period.

It is not clear whether the ECS function uses "on-die" ECC or requires full ECC, nor whether it is limited to registered ECC memory or also works with unbuffered ECC memory.
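To get a feel for what the 24-hour target implies, here is a back-of-the-envelope Python sketch. The die geometry (65,536 rows with 128 codewords per row) is a made-up illustrative figure, not from any datasheet, and the sketch assumes one ECS operation checks one codeword. The real command spacing is governed by the JEDEC DDR5 spec and the device itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # recommended window for one full scrub of the array


def ecs_interval_seconds(codewords: int, window_s: float = SECONDS_PER_DAY) -> float:
    """Average spacing between ECS operations so every codeword is visited once per window.

    Assumes (for illustration only) that one ECS operation checks one codeword.
    """
    if codewords <= 0:
        raise ValueError("codewords must be positive")
    return window_s / codewords


# Hypothetical die geometry: illustrative numbers only, not from a datasheet.
ROWS = 65_536
CODEWORDS_PER_ROW = 128

if __name__ == "__main__":
    interval = ecs_interval_seconds(ROWS * CODEWORDS_PER_ROW)
    print(f"~{interval * 1000:.1f} ms between ECS operations")
```

With those made-up numbers, a full scrub works out to one codeword roughly every 10 ms, spread evenly across the day.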


DDR5 also has post-package repair functions:

Post-package repair (PPR) is broken into two separate repair features, hPPR (hard) and sPPR (soft), which may be better described as permanent repair (hPPR) and temporary repair (sPPR). hPPR persists across power cycles and sPPR does not.

This is the ability to "spare" out a RAM row as desired. Apparently DDR4 had this feature too, though the document I was reading was less clear about DDR4.
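The hard/soft distinction is easiest to see as a lookup table. Below is a toy Python model; the class and method names are hypothetical, and real PPR is a JEDEC-defined command sequence executed inside the DRAM with a small, device-specific number of spare rows, not something software manages like this.

```python
class RowRepairModel:
    """Hypothetical toy model of DDR post-package repair (PPR).

    A repair steers accesses for a faulty row to a spare row. Hard repairs (hPPR)
    survive power cycles; soft repairs (sPPR) are forgotten at power-off.
    """

    def __init__(self, spare_rows):
        self.free_spares = list(spare_rows)
        self.hard = {}  # faulty row -> spare row, persists across power cycles
        self.soft = {}  # faulty row -> spare row, cleared by power_cycle()

    def repair(self, row, permanent):
        if not self.free_spares:
            raise RuntimeError("no spare rows left in this model")
        spare = self.free_spares.pop(0)
        table = self.hard if permanent else self.soft
        table[row] = spare
        return spare

    def resolve(self, row):
        """Return the physical row an access to `row` would land on."""
        if row in self.soft:  # in this model a soft repair shadows a hard one
            return self.soft[row]
        return self.hard.get(row, row)

    def power_cycle(self):
        # Soft repairs are volatile: drop them and hand their spares back.
        self.free_spares.extend(self.soft.values())
        self.soft.clear()
```

For example, after `repair(5, permanent=True)`, `repair(7, permanent=False)` and `power_cycle()`, row 5 still resolves to its spare while row 7 is back on the original (faulty) row.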

Seemingly useless until disaster strikes, and then it's too late to put it on.
This could be a cartoon by Edith Pritchett.


I was going to go in a slightly different direction: ECC is like an airbag in that it usually costs extra, but it can be a very useful feature when it's needed.

Similarly, an ECC subsystem is also like an airbag in that it will likely only ever fire in anger once. After that, the admin will pull the DIMM from the computer and replace it with a new DIMM/ECC subsystem every time ECC reports a problem.

But computer systems can continue to run after an ECC DIMM error has been detected. For all I know, memory-subsystem programmers may even fence off the faulty memory and stop the CPU from using those blocks. Many cars won't start once an airbag has been triggered, though that may also have to do with the fuel pump inertia switch being tripped, etc.

Survivorship bias is a way to show that all knowledge gained from personal experience is a matter of luck, so it does not apply to any other situation or person. So running my TrueNAS system for almost 3 years on non-ECC RAM without any data loss or any other issues is a one-off, but the personal experience of others running TrueNAS systems on ECC RAM without any issues is conclusive and proves that ECC is a requirement. Whichever belief system you choose to follow, may the hardware odds always be in your favor.

You seem to have missed my post that both DDR4 & DDR5 have spare DRAM rows and can "repair" the DIMM. Not sure how it works. Nor how I could have missed such a feature. But, hey, I've been living under a rock for years… (well, it feels like it).


I did miss that, sorry. That's a neat feature. When the QVL-listed, max-spec memory I had bought used on eBay for the c2750d4i started to go kaput, I just replaced it. Stock memory continues to work, thank you, IXsystems!!!

A bit OT, but I figure the ECC experts would be watching this thread…

Do the two scripts by Mastakilla still work in the current TrueNAS SCALE (EE)?

I've installed them on my box, since the ASRock Rack X470D4U doesn't report ECC errors to the BMC, so you don't actually know if / when a memory-related error has occurred.

The scripts seem to work (as in I can run them without errors), but until I get a memory error in the logs, I guess I'll just have to wait and see. I did have to change 'date -v' to 'date --date' in the first script, since '-v' is no longer a valid parameter (it's a FreeBSD date option, and SCALE uses GNU date).
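I can't say what the two scripts do internally, but on SCALE (which is Linux-based) a quick independent cross-check is the kernel's EDAC sysfs interface: when an EDAC driver is loaded for the memory controller, each controller shows up as /sys/devices/system/edac/mc/mcN with ce_count (corrected) and ue_count (uncorrected) files. Below is a minimal Python sketch, assuming such a driver is present for your board. If it finds nothing, EDAC isn't reporting for that controller, which is worth knowing before you wait for an error to appear in the logs.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location for memory controllers.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} error counts from EDAC sysfs."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        if not (mc / "ce_count").is_file():
            continue  # skip anything that isn't a memory-controller directory
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (no EDAC driver loaded?)")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Running it periodically (e.g. from a cron job) and alerting when either count increases would give a rough BMC-independent error notification.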

If you are serious about using TrueNAS, remind yourself that you are building a server. That's why I use ECC memory… and RAIDZ2… and backups on both physical hard drives and off-site cloud storage.

I blame the "retail" stuff for not embracing ECC more. It's really hard to find hardware that is ECC-compatible (AMD supports it unofficially with certain CPUs + mobos). You almost always have to go for enterprise-level HW for homelabbing. Bad guy Intel intentionally disables ECC because they want you to buy their Xeons. So here we are, trying to have nice things without paying a high price.


For me personally, my servers would always be enterprise stuff (which by default has ECC) simply because the cost in time, headache and mental distress isn't worth it.

I had a server that used all gamer gear for about 2-3 years, and every once in a while it would randomly lock up for no reason; only a hard reset would get it back to a working state. This went on for a couple of years and gave me a lot of headaches, because whenever it happened I would spend an hour or two chasing a wild goose before figuring out that I needed to hard reboot the server (it still semi-worked, but some things either didn't work or were slow as a snail). Compounding the issue, a hard reboot is sometimes not an option when I'm remote (no physical access to the server).

I decided I'd had enough, bit the bullet, bought all enterprise stuff, and I've been running for over a decade with rock-solid stability, even when I don't reboot for a year (I was lazy with updates for a year). Add IPMI into the mix and I wondered why I didn't do it from day 1. The extra cost was well worth it for the convenience of IPMI, the peace of mind, and the stability. Also, oodles of RAM are cheap with ECC gear thanks to the abundant availability of R/LR-DIMM modules, and you can get much higher capacity on a single stick too.


For a home user, IPMI is largely a nice-to-have as long as you have a physical console and monitor to interact with. I leave my IPMI port disconnected from the network, since there is now no need to use it, ever.

The last time I interacted with the IPMI I was still fiddling with the fan settings instead of relying on the fan script. The latter does a better job and it also significantly reduces fan speed / noise / power most of the time.

I cannot get over the fact that you need to install and run a Java application from a client PC in order to use the server's IPMI.

Please correct me if Iā€™m wrong.

Are there any IPMI implementations that can be accessed via a web page without requiring Java?

For Supermicro, X10 or better introduces the HTML5 option. Similarly, IIRC, HP Gen8, Dell 12th Gen, and Cisco M4 all introduce it.

For other vendors I don't know of an easy way to tell, but most, if not all, modern boards from first-gen Xeon Scalable onward, or any AMD EPYC board, shouldn't be locked to Java.


The Sun SPARC servers we use at work don't seem to need Java for the GUI. They all also include a command-line interface via SSH that covers tons of features, like console, remote power and such.

I don't recall the IBM AIX HMCs requiring any special Java for GUI access. Console is via SSH to the VIOs (physical POWER servers configured by the HMC).

IBM's DataPower appliances (based on x64 hardware or virtualized) don't seem to need Java for the GUIs.

The Cisco UCS GUIs don't seem to need Java.

Recent VMware GUIs don't seem to need Java.

IBM's BigFix console used to use Java, though I don't know its current status. It's been at least 3 years since I've used the BigFix console. (For those who don't know, BigFix is an automation application that can push patches, packages, and other changes to client servers… it's been around for at least 10 years.)


I do find the prior need for Java quite annoying, especially when you are forced to update your Java for security reasons and then find out later that this broke your IPMI / BMC access to older servers.

Java = Write once, then update every time the base language is updated :frowning:


This was true back with the Supermicro X9 series. Starting with the X10 series, it uses HTML5. I think you may also be able to do a firmware upgrade on some X9 boards to get the HTML5 IPMI, but I may be wrong on that.

Also, is it just me, or do none of the links in @NickF1227's signature work?

Xeon D-1500 boards from ASRock Rack and Gigabyte still use Java. Later boards, basically Xeon Scalable and newer, use HTML5.
IPMI on Gigabyte MC12-LE0 and MJ11-EC1 boards is arguably a much better experience than Supermicro X11 IPMI.

X10 from Supermicro, and the equivalent of X11 from other manufacturers.
If you're still on X9 hardware maybe it's time to get out of the cave and get something more modern. :wink:


Really? That's bizarre, given that…

Note that plain Java, for the Supermicro desktop tool whose name eludes me at the moment, is substantially less irritating than JavaWebStart, so thatā€™s worth giving a try.

Note regarding the note: JavaWebStart, even with the recentish fork (I think that's what it is?) OpenWebStart or whatever it's called, just breaks all the time for no apparent reason. I hate it with a passion.

Not even for the iKVM functionality? I guess that's the sort of internal feud I'd expect from Sun.

Dell Gen 12 needs a very late iDRAC firmware, from around the time they re-merged the Gen 12 and Gen 13 branches. So maybe 2015-2016ish or so? But I can confirm that it works about as well as Gen 13.

Good point.

I would never bother with iKVM on SPARC servers because it is trivial to SSH in and use the serial console. (The Sun SPARC servers I have managed never used GUIs on the console, like X-Windows login…)

Now the Sun x64 servers are another story. The one I owned (bought with an employee discount even though I was a contractor and told them so…) did have a screwy IPMI. I don't remember the details because I got rid of it more than 10 years ago (it was getting long in the tooth…). And the ones I managed at work did have issues with consoles because, well, x64, BIOS and such.

Pretty similar for Fujitsu iRMC. Nice web UI, but no VGA console without an extra paid license, so I do not know whether that is Java or not. But serial console over IP via SSH or telnet is included in the base package, perfect for me. Serial console FTW!
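The same serial-console approach works on many IPMI BMCs without any Java, via IPMI Serial-over-LAN (SOL) and ipmitool. Below is a small Python wrapper, assuming ipmitool is installed, SOL is enabled on the BMC, and the host's BIOS/OS redirects its console to the serial port; the BMC host name and user are placeholders. The -E flag makes ipmitool read the password from the IPMI_PASSWORD environment variable, so it doesn't show up in the process list.

```python
import os
import subprocess


def sol_command(bmc_host: str, user: str) -> list[str]:
    """Build the ipmitool invocation for an IPMI v2.0 Serial-over-LAN session."""
    # -I lanplus: IPMI v2.0 RMCP+ interface, which SOL requires
    # -E: read the password from the IPMI_PASSWORD environment variable
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", "sol", "activate"]


def open_sol_console(bmc_host: str, user: str, password: str) -> int:
    """Attach the current terminal to the host's serial console; type ~. to quit."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    return subprocess.run(sol_command(bmc_host, user), env=env).returncode


if __name__ == "__main__":
    # Placeholder BMC address and user; replace with your own.
    print(" ".join(sol_command("bmc.example.lan", "ADMIN")))
```

Keeping the command builder separate from the subprocess call makes it easy to print or log the exact invocation before opening an interactive session.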
