ECC Check: active or not?

Hi all, I’m a bit confused about whether or not my system is running “in ECC mode”.
HW Recap:

This is the output of dmidecode -t memory

dmidecode -t memory
# dmidecode 3.5
Scanning /dev/mem for entry point.
SMBIOS 3.0.0 present.

Handle 0x0014, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Single-bit ECC
        Maximum Capacity: 64 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x0015, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0014
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: Unknown
        Set: None
        Locator: DIMM CHA3
        Bank Locator: BANK 0
        Type: Unknown
        Type Detail: None

Handle 0x0016, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0014
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHA1
        Bank Locator: BANK 1
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 2400 MT/s
        Manufacturer: Samsung
        Serial Number: 39BDA523
        Asset Tag: 1825
        Part Number: M391A2K43BB1-CRC
        Rank: 2
        Configured Memory Speed: 2400 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V

Handle 0x0017, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0014
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: Unknown
        Set: None
        Locator: DIMM CHB4
        Bank Locator: BANK 2
        Type: Unknown
        Type Detail: None

Handle 0x0018, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0014
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16 GB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHB2
        Bank Locator: BANK 3
        Type: DDR4
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 2400 MT/s
        Manufacturer: Samsung
        Serial Number: 39BDA9C5
        Asset Tag: 1825
        Part Number: M391A2K43BB1-CRC
        Rank: 2
        Configured Memory Speed: 2400 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V

Searching for info on the old forum, I found an old script that I (and ChatGPT :stuck_out_tongue_winking_eye:) updated to work in Python 3:

ecc_check.py
#!/usr/bin/env python3

import mmap
import struct
import os

# Physical base address and length of the region to map from /dev/mem
MEM_START = 0xFED10000
FILESIZE = 32*1024

# Script must be run as root
if os.geteuid() != 0:
    print("You must be root or root-like to run this script.")
else:
    try:
        with open("/dev/mem", "r+b") as f:
            mem = mmap.mmap(f.fileno(), FILESIZE, prot=mmap.PROT_READ, offset=MEM_START)

            print("5004-5007h:", end=' ')
            for i in range(0x5004, 0x5004 + 4):
                print(f"{struct.unpack('B', mem[i:i+1])[0]:x}", end=' ')

            print("\n5008-500Bh:", end=' ')
            for i in range(0x5008, 0x5008 + 4):
                print(f"{struct.unpack('B', mem[i:i+1])[0]:x}", end=' ')

            print("")
            mem.close()
    except Exception as e:
        print(f"An error occurred: {e}")

but the output is not what I expected (it shows 0 instead of 3):

5004-5007h: 11 31 0 0
5008-500Bh: 11 31 0 0
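
To make the "0 instead of 3" concrete, here is a small sketch of how the four printed bytes can be read as a single register. It assumes (this is an assumption, not something the script itself states) that each line is one little-endian 32-bit value and that the field the old script cares about is bits 25:24, i.e. the low two bits of the last byte printed. `ecc_field` is a name made up for this example.

```python
def ecc_field(raw: bytes) -> int:
    """Return bits 25:24 of a little-endian 32-bit value (assumed field position)."""
    value = int.from_bytes(raw, "little")
    return (value >> 24) & 0x3


# Bytes as printed by the script, in address order (values are hex).
observed = bytes([0x11, 0x31, 0x00, 0x00])
print(ecc_field(observed))  # 0

# Hypothetical value with the last byte set to 3, i.e. what was expected.
expected = bytes([0x11, 0x31, 0x00, 0x03])
print(ecc_field(expected))  # 3
```

If the register layout differs on a given CPU generation, the field position above is simply wrong for that CPU and a 0 says nothing about ECC.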

TrueNAS itself shows this in the dashboard:

31.7GiB total available (ECC)

What’s happening?

Looks like ECC. You may try MemTest86, and possibly buy the Pro version to have a go at ECC error injection if that helps you feel more confident.

Do note that not all CPUs, nor all Motherboards, support such a feature.

I think I have found the solution:
Going into detail about what @etorix said: in my case, in addition to Error Correction Type: Single-bit ECC, the fact that Total Width and Data Width are 72 bits and 64 bits respectively indicates the additional 8 bits of ECC.
And although the script I found is still “working”, it does not seem to be compatible with CPUs after the Ivy Bridge generation (Intel withdrew ECC compatibility from some of its CPUs, creating a stir at the time, and those scripts helped people to understand that).
It seems that PassMark MemTest86 should show whether ECC is enabled or not (in the free edition too), and I probably missed that part when I ran it some weeks ago :sweat_smile: so at the next reboot I will definitely check again for double proof.
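
For anyone who wants to automate the Total Width / Data Width comparison, here is a minimal sketch. It only parses text in the format shown in the dmidecode output above; `ecc_width_report` and `parse_bits` are names made up for this example, and a wider Total Width only tells you the module carries the extra bits, not that the memory controller is actually using them.

```python
import re


def parse_bits(value):
    """Turn '72 bits' into 72; return None for 'Unknown' or anything else."""
    match = re.fullmatch(r"(\d+) bits", value)
    return int(match.group(1)) if match else None


def ecc_width_report(dmidecode_text):
    """Map each populated DIMM locator to True if Total Width > Data Width."""
    report = {}
    # dmidecode separates records with a blank line.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        total = parse_bits(fields.get("Total Width", ""))
        data = parse_bits(fields.get("Data Width", ""))
        if total is None or data is None:
            continue  # empty slot or width not reported
        report[fields.get("Locator", "?")] = total > data
    return report


sample = """\
Handle 0x0015, DMI type 17, 40 bytes
Memory Device
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Locator: DIMM CHA3

Handle 0x0016, DMI type 17, 40 bytes
Memory Device
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16 GB
        Locator: DIMM CHA1
"""
print(ecc_width_report(sample))  # {'DIMM CHA1': True}
```

On a live system the text could come from `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout`, run as root.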