Checksum errors, pool state: unhealthy, errors: none

PaulDaisy · July 30, 2025, 3:24pm

My storage status in the GUI says that the pool is not healthy and that some data loss has occurred. But the system is not saying which files need to be restored:

truenas_admin@truenas[~]$ sudo zpool status -v

*** other pools with no errors ***

pool: hdd-pool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: Message ID: ZFS-8000-8A — OpenZFS documentation
scan: scrub repaired 0B in 02:33:09 with 4 errors on Thu Jul 17 11:01:32 2025
config:

    NAME                                      STATE     READ WRITE CKSUM
    hdd-pool                                  ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        1320e715-0888-42eb-b329-6a2766e2683c  ONLINE       0     0     6
        15e39781-a4bf-4a4e-a2e7-f49fa9705f0e  ONLINE       0     0     6
        e718a819-3d7c-4453-a5ba-fb601b64628a  ONLINE       0     0     6

errors: Permanent errors have been detected in the following files:

pool: ssd-pool
state: ONLINE
config: ***

How do I identify and repair the issue?

This issue surfaced while I was replicating the pool to an archive pool. The replication task reported that a month-old snapshot was corrupted, so I deleted that snapshot; newer snapshots exist and are supposedly fine. The replication task has resumed and is running now.

Thanks!

PK1048 · July 30, 2025, 3:47pm

I would run a scrub and see if the CKSUM error counts change. If they don’t, then run another scrub. If the CKSUM error counts still do not change then the pool is OK and the error was transient. Still, it would be good to identify the cause, but that may not be possible.
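To make the before/after comparison less error-prone, here is a rough sketch of a helper that pulls the CKSUM column out of `zpool status` output. The function name `cksum_counts` and the `before.txt`/`after.txt` file names are made up for illustration, and `hdd-pool` is just the pool name from the first post. It assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM table, and the `-w` (wait) flag on `zpool scrub` assumes OpenZFS 2.0 or newer.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <CKSUM count>" for each row of the
# config table in `zpool status` output read from stdin.
cksum_counts() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $NF == "CKSUM" { in_table = 1; next }
        # A blank line marks the end of the device table.
        in_table && NF == 0            { in_table = 0 }
        # Device rows: column 1 is the name, column 5 is CKSUM.
        in_table && NF >= 5            { print $1, $5 }
    '
}

# Usage sketch (not run here; needs root and a real pool):
#   sudo zpool status hdd-pool | cksum_counts > before.txt
#   sudo zpool scrub -w hdd-pool    # -w waits until the scrub finishes
#   sudo zpool status hdd-pool | cksum_counts > after.txt
#   diff before.txt after.txt && echo "CKSUM counts unchanged"
```

If `diff` prints nothing, the counts did not move during that scrub; repeat for the second scrub.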

Johnny_Fartpants · July 30, 2025, 3:57pm

Can you provide some information on the hardware you are using?

PaulDaisy · July 30, 2025, 10:17pm

I will run scrubs once the replication task completes. It appears to be under way now.

The hardware is:

Ryzen 5 5600G, 32 GB of non-ECC RAM, ASRock B550M ITX/ac motherboard, 3x 8 TB new IronWolf HDDs (manufactured 5/2025) in RAIDZ1, a non-redundant NVMe system drive, and 1x 500 GB SSD apps drive (also non-redundant). All of these are on the motherboard SATA and NVMe ports.

I just added a PCIe LSI HBA with 2x 8 TB archive Seagate drives, new old stock from ServerExhange; their SMART data shows only the 4 power cycles I caused. This mirror pool was added specifically to back up the 3x 8 TB pool, which holds only 4.2 TB of data.

I know, non-ECC RAM is suspect. That is the next project.