Hundreds of thousands of checksum errors on one Vdev during scrub

Hello, I am fairly new to TrueNAS, so apologies for any obvious oversights I might make. I just rebuilt my server after upgrading my motherboard and CPU to an ASUS X-99 with a 5930K because of PCIe bottlenecks (I had two x8 HBAs plugged into x1 slots). After the upgrade, each HBA is plugged into an x16 slot. I have one pool with three RAIDZ1 vdevs, each with four drives. I started a scrub after importing the pool to make sure all my data was okay, but as soon as it began, it started throwing thousands of checksum errors on all of the drives in one of the vdevs. The number of errors is exactly the same across all of those drives, and the errors always occur on the third vdev. Running ‘zpool status -v’ shows permanent errors in only 6 files. Latest output posted below.

I have tried to isolate the problem by making sure each vdev is connected to the HBA using a single 4-way SAS-to-SATA splitter, replacing the cable for the faulty vdev, swapping the cable between vdevs, swapping the ports on the HBAs, and reseating everything, but I still get the same behavior on the same four drives. All drives show as good in CrystalDiskInfo. This leads me to believe it is not an HBA, drive, or cable issue.

I also do not think it is a power supply issue, as I have upgraded to a 1000W PSU (I had suspected that my previous 600W PSU might be a problem with 12 drives). All of the drives are plugged into a single cable that goes back to the PSU using splitters. However, I’ve switched the order of the splitters along the PSU cable, and the errors still appear on the same vdev, regardless of the cable configuration I use.

My last possible culprit is my RAM. I was going to have 32 GB, but those sticks ended up not being compatible with the new board/CPU, so I am currently running a single 8 GB stick, as that’s all I had in hand that would boot the system.

My question is: given all my troubleshooting, is it likely that this is an insufficient RAM issue? And would increasing it to 32 or even 16 GB resolve it?

Latest output of ‘zpool status Vault -v’:

  pool: Vault
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub canceled on Fri Jan 9 08:36:09 2026
config:

    NAME                                      STATE     READ WRITE CKSUM
    Vault                                     ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        09afa170-10a7-4937-a115-751b10581ca7  ONLINE       0     0     0
        3733cdeb-bb96-4855-99e1-73b9c332a578  ONLINE       0     0     0
        87d1a365-9d40-4642-bea8-5507a2dbc495  ONLINE       0     0     0
        671eea0e-75b3-471a-b3d1-c21cf0a43b47  ONLINE       0     0     0
      raidz1-1                                ONLINE       0     0     0
        5390448b-ce52-4c06-8dd3-9755eebec82a  ONLINE       0     0     0
        256e691c-1073-4425-884e-1b58fc8fd466  ONLINE       0     0     0
        3533d64b-d114-404a-9a5e-d376ee09dd4b  ONLINE       0     0     0
        ea5cd5ec-d9e8-4c1e-b31d-925d355ed4c3  ONLINE       0     0     0
      raidz1-2                                ONLINE       0     0     0
        312c2a36-881f-4070-95a5-3939de160972  ONLINE       0     0  625K
        0efdb3af-90cb-4005-8f20-11f088cc6bc4  ONLINE       0     0  625K
        48f1c5aa-bc1f-41b2-a9c4-87ec2bfff401  ONLINE       0     0  625K
        cfe2990f-69f3-46ca-8b27-dea41a9a6bc0  ONLINE       0     0  625K

errors: Permanent errors have been detected in the following files:

    /mnt/Vault/Share/Santiago_Share/Media_Rips/Star Wars- The Rise of Skywalker Bonus Disc/Star Wars- The Rise of Skywalker Bonus Disc_t00.mkv
    /mnt/Vault/Share/Santiago_Share/Media_Rips/Star Wars - The Force Awakens - BLU-RAY/The Force Awakens - BLU-RAY_t02.mkv
    /mnt/Vault/Share/Santiago_Share/Media_Rips/PIRATES1/title_t00.mkv
    /mnt/Vault/Share/Santiago_Share/Media_Rips/PIRATES3/title_t00.mkv
    /mnt/Vault/Share/Santiago_Share/Media_Rips/Star Wars- The Rise of Skywalker Bonus Disc/Star Wars- The Rise of Skywalker Bonus Disc_t01.mkv
    /mnt/Vault/Share/Santiago_Share/Media_Rips/Star Wars - The Force Awakens Bonus Disc - BLU-RAY/The Force Awakens Bonus Disc - BLU-RAY_t00.mkv

Current Hardware Configuration:
CPU: i7-5930K
Motherboard: ASUS X-99 Pro / USB 3.1
RAM: 8 GB SkHynix 2666 MHz
Vault Pool: 2x LSI HBA 9207-8i (one with 8x 2 TB HDDs, one with 4x 2 TB HDDs; I know this setup may not be optimal but it is what I have lol)

Did you swap the problem drives / VDEV between the HBAs? Do you have excellent cooling and plenty of air flow over the HBAs?

Yes, I tried keeping all the drives on one SAS connection and moving it to every possible connector on the HBAs.
To be honest, probably not enough. I’ve got a couple of 40 mm fans strapped to the HBAs, but they are quite thick, so they are probably getting choked. I have already ordered replacements to get better airflow over the heat sinks.
I just find it odd that no matter what cable config I try, it is the same vdev that gives me issues, even though both HBAs are essentially the same.

I doubt it’s RAM related. Those kinds of errors would be sprinkled throughout your pool, not just one VDEV getting hammered.
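
One way to check that "one VDEV only" observation against the posted output is to group the CKSUM column by vdev. What follows is a minimal Python sketch, not a ZFS or TrueNAS tool: the function names are made up for illustration, the sample text is the config section from the original post, and it assumes vdev rows are named like raidz1-0 or mirror-0 (other layouts are not handled).

```python
import re

# Config section copied from the zpool status output in the original post.
STATUS = """\
    NAME                                      STATE     READ WRITE CKSUM
    Vault                                     ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        09afa170-10a7-4937-a115-751b10581ca7  ONLINE       0     0     0
        3733cdeb-bb96-4855-99e1-73b9c332a578  ONLINE       0     0     0
        87d1a365-9d40-4642-bea8-5507a2dbc495  ONLINE       0     0     0
        671eea0e-75b3-471a-b3d1-c21cf0a43b47  ONLINE       0     0     0
      raidz1-1                                ONLINE       0     0     0
        5390448b-ce52-4c06-8dd3-9755eebec82a  ONLINE       0     0     0
        256e691c-1073-4425-884e-1b58fc8fd466  ONLINE       0     0     0
        3533d64b-d114-404a-9a5e-d376ee09dd4b  ONLINE       0     0     0
        ea5cd5ec-d9e8-4c1e-b31d-925d355ed4c3  ONLINE       0     0     0
      raidz1-2                                ONLINE       0     0     0
        312c2a36-881f-4070-95a5-3939de160972  ONLINE       0     0  625K
        0efdb3af-90cb-4005-8f20-11f088cc6bc4  ONLINE       0     0  625K
        48f1c5aa-bc1f-41b2-a9c4-87ec2bfff401  ONLINE       0     0  625K
        cfe2990f-69f3-46ca-8b27-dea41a9a6bc0  ONLINE       0     0  625K
"""

# Assumption of this sketch: top-level vdev rows look like "raidz1-0" or "mirror-0".
VDEV_ROW = re.compile(r"(raidz\d?|mirror)-\d+$")


def cksum_by_vdev(status_text, pool):
    """Map each vdev name to the raw CKSUM strings of its member disks."""
    result = {}
    current = None
    for line in status_text.splitlines():
        fields = line.split()
        # Skip anything that is not a 5-column device row, plus the header and pool rows.
        if len(fields) != 5 or fields[0] in ("NAME", pool):
            continue
        name, cksum = fields[0], fields[4]
        if VDEV_ROW.match(name):
            current = name
            result[current] = []
        elif current is not None:
            # Counters are kept as strings ("625K") so no unit scaling is guessed.
            result[current].append(cksum)
    return result


def affected_vdevs(counts):
    """Return the vdevs where at least one disk has a nonzero CKSUM counter."""
    return [vdev for vdev, values in counts.items() if any(v != "0" for v in values)]


counts = cksum_by_vdev(STATUS, "Vault")
bad = affected_vdevs(counts)
print(bad)                              # ['raidz1-2']
print(len(set(counts["raidz1-2"])))     # 1, i.e. every disk reports the same count
```

Errors confined to one vdev with an identical counter on every member point at something those four disks share rather than at any single disk.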

To me, this looks like an HBA problem under strain. Better cooling may help, but you may also want to consider getting a new, modern 16-port HBA to reduce the cooling needs, e.g. a 9400-16i or the like. It sucks because those cards were $50 on eBay just a few months ago and now they’re $100.

Also be sure that both HBAs are flashed to IT mode and are running the latest firmware.

I have a 2116 built into my motherboard and that thing is hairy to update. As others have noted, if the HBA came branded from Supermicro, you have to get the updater utility from Supermicro, not LSI/Broadcom/whoever it is this week that used to physically make your HBAs.

Also, I doubt it’s PSU related. It’s unlikely that your system is running at more than 200W unless you have a monster CPU / GPU in there doing transcoding or whatever. My 8-HDD, 6-SSD system is running at barely over 100W but is also 99% idle.

Power cables and SATA cables are worth investigating. See if anything melted, discolored, or just doesn’t look right. I’ve had SATA cables go bad on me, after working fine for years, leading to billions of checksum errors. Deoxit is an oil I like to apply pretty regularly now to connections that have been balky.
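
Besides a visual inspection, SMART attribute 199 (UDMA_CRC_Error_Count) on SATA drives is a commonly used hint of link-level trouble such as a bad cable. Below is a minimal Python sketch for pulling that value out of the attribute table printed by `smartctl -A /dev/sdX`. It assumes the usual ten-column ATA attribute layout, the function name is hypothetical, and the sample row is invented for illustration; it is not from the drives in this thread.

```python
# Invented sample in the usual smartctl -A table layout (values are NOT real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7300
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def udma_crc_count(smart_attr_text):
    """Return the raw value of SMART attribute 199, or None if the row is absent."""
    for line in smart_attr_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; RAW_VALUE is the tenth.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


print(udma_crc_count(SAMPLE))
```

A raw value that keeps climbing between checks is the thing to watch for; a nonzero value that never changes may just be left over from an old cable.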
