ECC problems: dmidecode -t memory shows Error Correction Type: None

My Hardware is:

OS: ElectricEel-24.10.0.2 (Truenas Scale)
CPU: Intel Xeon E5 2678
Board: X10SRH-CLN4F
With ECC RAM

Checking dmesg shows this:

[   15.728478] EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
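
For anyone wanting to check for the same message, here is a minimal sketch that filters kernel log text for it. The function name `edac_ecc_status` is made up for illustration. It only reads text on stdin, and the absence of the message does not by itself prove that ECC is active.

```shell
# Hypothetical helper: read kernel log text on stdin and report whether
# the EDAC driver logged that ECC is disabled.
edac_ecc_status() {
  if grep -qi 'EDAC.*ECC is disabled'; then
    echo "EDAC reports ECC disabled"
  else
    echo "no 'ECC is disabled' message from EDAC"
  fi
}

# Typical use on a live system (may need root):
#   dmesg | edac_ecc_status
```

On the system above, this would be expected to print the "ECC disabled" line, since the log contains that message.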

Running dmidecode -t memory | grep -i "error correction" shows:

        Error Correction Type: None
        Error Correction Type: None

Running dmidecode -t memory | grep -i "total width" shows 72 bits, which means the RAM is definitely ECC.
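
The width comparison can be done per DIMM rather than by eye. This is a rough sketch, with a made-up function name `dimm_widths`, that reads dmidecode output on stdin and prints the total and data width for each populated slot. It assumes the usual field order inside a Memory Device record (widths before Locator). Keep in mind that these values are only what the firmware reports in the SMBIOS table, so they are a hint rather than proof.

```shell
# Hypothetical helper: read `dmidecode -t memory` output on stdin and print,
# for each populated DIMM, its locator plus total and data width in bits.
dimm_widths() {
  awk -F': *' '
    /^[ \t]*Total Width:/ { total = $2 + 0 }   # "72 bits" -> 72, "Unknown" -> 0
    /^[ \t]*Data Width:/  { data  = $2 + 0 }
    /^[ \t]*Locator:/     {
      # Assumes Locator follows both width fields within a record.
      if (total > 0)
        printf "%s total=%d data=%d extra=%d\n", $2, total, data, total - data
      total = 0; data = 0
    }
  '
}

# Typical use on a live system (needs root):
#   sudo dmidecode -t memory | dimm_widths
```

An ECC DIMM normally reports 8 extra bits (72 total, 64 data); a non-ECC DIMM normally reports 64 for both.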

I went into the Supermicro BIOS and checked a setting called “Patrol Scrub”, which I understand to be the ECC enable/disable option. It is enabled.

I had run a full memtest earlier, and it did show ECC in the RAM info.

Everything points to ECC memory being present, but the OS is not detecting it.

Any suggestions?

Should I consider re-installing the OS?

There is also an ‘Enable ECC’ option in the Supermicro BIOS, with the choices Auto, Disable, and Enabled.

What is this currently set to, and which specific DIMMs are you using?
What does the memory section of IPMI report?

@Micromecca thanks for your help
I went through the manual, but the BIOS has no option that says “ECC”.
The manual says there is one, but the BIOS does not show it.

The DIMMs are Samsung DDR4.
Is it possible that the supplier messed up and sent non-ECC RAM,
despite TrueNAS reporting 72 bits?

I’ll boot into memtest and check it again. Maybe I am confusing the earlier memtest result with one from another system.

dmidecode shows this:

Array Handle: 0x0032
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMMD1
        Bank Locator: P0_Node0_Channel3_Dimm0
        Type: DDR4
        Type Detail: Synchronous
        Speed: 3200 MT/s
        Manufacturer: Samsung
        Serial Number: XXXXXXXX (redacted)
        Asset Tag: DIMMD1_AssetTag (date:23/13)
        Part Number:
        Rank: 2
        Configured Memory Speed: 1866 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown

Take a look in the IPMI and see what’s reported in there. This should at least indicate whether the board is detecting the ECC functions correctly.

System > Component Info > Memory

Which BIOS version is currently present?

AFAIK, this should mean that the RAM is ECC but is operating in non-ECC mode… Considering your motherboard (non-ECC modules aren't even mentioned for it), this seems really odd to me.
In your place I would check whether a BIOS update is available and try it (be aware of the risks).

Must be some weird settings hidden in some obscure menu.

On my desktop (which is a Zen 4, however), the ECC option is hidden in the advanced and rather obscure custom BIOS settings (AMD CBS). It is set to “Auto” by default, which for god knows what reason means disabled. If it’s not specifically enabled, I get your situation.

Probably something similar.

The IPMI gives no details whatsoever.

The BIOS version is 3.4 and firmware is 03.93

Both are the latest versions according to the Supermicro website.

This could explain the discrepancy between the manual and the BIOS menu.
But not your situation!

Does the output of sudo dmidecode -t bios show anything relevant?
Also sudo dmidecode -t 17 should give more info about RAM model
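
To pull just the model details out of that output, something like the sketch below might help. The function name `dimm_models` is made up, and it assumes the usual field order in a Memory Device record (Locator, then Manufacturer, then Part Number). A part number can then be looked up on the manufacturer's site to see whether the module is ECC; a blank part number, as some boards report, is flagged explicitly.

```shell
# Hypothetical helper: read `dmidecode -t 17` output on stdin and print one
# line per Memory Device with locator, manufacturer and part number.
dimm_models() {
  awk -F': *' '
    /^[ \t]*Locator:/      { loc = $2 }
    /^[ \t]*Manufacturer:/ { mfr = $2 }
    /^[ \t]*Part Number:/  {
      part = $2
      sub(/[ \t]+$/, "", part)            # trim trailing padding
      if (part == "") part = "(blank)"    # firmware gave no part number
      printf "%s | %s | %s\n", loc, mfr, part
    }
  '
}

# Typical use on a live system (needs root):
#   sudo dmidecode -t 17 | dimm_models
```

If the part number comes back blank, the label on the module itself is the more reliable source.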

After going through every possible software config, we realized that the vendor messed up (and so did I). They sent non-ECC RAM.

I got it sorted out: all 4 modules were replaced with ECC RAM. Now dmidecode, the TrueNAS dashboard, everything shows ECC.

My sincere apologies :crying_cat_face:
