Increasing Checksum Error Count on Disks + Pool Data Errors

While it could very well be a bad wall receptacle or loose wiring, there is another possibility that, as best I can tell, has not been raised.

The disks are connected to an HBA card, and that card may be overheating under heavy use. When not properly cooled, the card may work fine most of the time; then, partway through an intensive operation such as a scrub or resilver, it overheats and starts causing hardware errors.

Most HBA cards require far more airflow across the heatsink than one would think. Even some server chassis have trouble cooling them, depending on layout and airflow patterns.

At least it is something to take a look at once the power input issue is solved.

The replacement arrived and was installed last night. I successfully scrubbed, cleaned up any errors that came back, and scrubbed again to find that it looks clean!

terrehbyte@truenas:~$ sudo zpool status -v Main
  pool: Main
 state: ONLINE
  scan: scrub repaired 0B in 04:11:49 with 0 errors on Fri Sep 26 09:06:18 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        Main                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            925e9ea6-5c01-467f-9609-faa9b1647fba  ONLINE       0     0     0
            643cad25-e3c2-4b98-9093-9ba24adec52a  ONLINE       0     0     0
            c3c6de1a-5af0-4df1-ae09-b0f8148f5d87  ONLINE       0     0     0

errors: No known data errors

I’ll keep checking (and let the scheduled scrubs continue to do their work), but I’m hoping I can call it good for a while. For the moment I’m chalking it up to a potentially failing power supply.
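For anyone who wants to automate the "keep checking" part, here is a minimal sketch in Python that scans `zpool status` output for non-zero READ/WRITE/CKSUM counters. The function names (`find_device_errors`, `pool_is_clean`, `read_status`) are made up for illustration, and the parsing assumes the column layout shown in the output above; it is not an official tool and may need adjusting for other layouts.

```python
import re
import subprocess

# Matches device rows such as:
#   raidz1-0    ONLINE   0   0   0
# Counters are kept as strings because large counts may be abbreviated.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d\S*)\s+(\d\S*)\s+(\d\S*)")


def find_device_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with a non-zero counter."""
    problems = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, blank line, or non-device text
        name, _state, read, write, cksum = match.groups()
        counts = (read, write, cksum)
        if any(count != "0" for count in counts):
            problems[name] = counts
    return problems


def pool_is_clean(status_text):
    """True when no device row has errors and ZFS reports no data errors."""
    no_data_errors = "errors: No known data errors" in status_text
    return no_data_errors and not find_device_errors(status_text)


def read_status(pool):
    """Run `zpool status -v <pool>` (assumes sufficient privileges)."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like `pool_is_clean(read_status("Main"))` could then be run from a scheduled job after each scrub, with an alert when it returns False. On a system where `zpool` needs `sudo`, as in the output above, the command list would have to be adjusted accordingly.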

The HBA card is new to this setup, so I’ll see if I can give it more cooling to avoid any heat-related issues. I’ll also make a note to check on what the electrical situation looks like, per everyone’s notes above.

Put a small fan blowing directly on the heatsink for cooling. These cards need lots of forced-air cooling across the heatsink. They are designed for servers with lots of fans screaming at high RPM, forcing large volumes of air through the chassis, and even then placement within the chassis may call for added direct cooling.
