Scrub task stuck

The server ran into errors again while the SMART long tests were still at 10%. I was also (re)starting some apps at the time, so I can’t say for sure what caused it this time, but I doubt it was the apps.

zpool status
root@truenas[/mnt/kea/home/jonas]# zpool status -vL kea
  pool: kea
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: resilvered 505M in 00:00:25 with 0 errors on Mon Apr 22 22:33:57 2024
config:

        NAME        STATE     READ WRITE CKSUM
        kea         ONLINE       0     0     0
          raidz2-0  ONLINE      21    16     0
            sdk2    ONLINE       3    22     0
            sdl2    ONLINE       3    21     0
            sdi2    ONLINE       3    18     0
            sdj2    ONLINE       3    18     0
            sda2    ONLINE       0     0     0

However, sda is again the only drive without errors. The long test on sda also kept running, while the tests on the other drives stopped.
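To make this comparison easier after each incident, here is a rough sketch of a filter that prints only the rows of the zpool status table with a non-zero READ, WRITE or CKSUM counter. The function name zpool_error_rows is made up for illustration, and it assumes the table layout shown above (a NAME/STATE header row, with the table ending at the next blank line).

```shell
# Hypothetical helper: reads `zpool status` text on stdin and prints every
# table row whose READ, WRITE or CKSUM counter is not "0".
zpool_error_rows() {
  awk '
    # The header row marks the start of the device table.
    $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
    # A blank line marks the end of the device table.
    in_table && NF == 0 { in_table = 0 }
    # Counters are compared as strings so values such as 1.2K also count.
    in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
      printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
    }
  '
}

# Usage on the live pool (pool name taken from the output above):
# zpool status -vL kea | zpool_error_rows
```

Run against the output above, this should list raidz2-0 and the four partitions with errors, and leave out kea and sda2.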

This made me think that it might be a power issue after all, with sda on its own 12V rail while the other drives all share one. That turned out to be the case. I’ve now spread the HDDs across the rails as 2+2+1 (with the SSDs on the remaining fourth rail).

I then started another scrub task and within 15 minutes or so hit the same issue again: the device with UUID 39d9d498-4434-4de2-8561-fb77b95bf4f0 was supposedly removed and reappeared on its own seconds later. This also cancelled the scrub task, so the one from last night didn’t actually complete either.

This is the same drive that reports a G-Sense_Error_Rate of 1, so maybe it really does have an issue? A SMART short test on it completes fine; I have now started another long test on just this drive.
I could also try another cable to the HBA, but my trust in this drive is fading.

What I don’t understand is why I always get errors on (the same) four drives. Shouldn’t it be just this single one if it were the sole culprit?


Independent of that, is my understanding correct that this is how I can update my HBA’s firmware? Do I need the BIOS part as well?

wget https://www.supermicro.com/wdl/driver/SAS/Broadcom/3008/Firmware/3008_FW_PH16.00.14.00.rar
7z t 3008_FW_PH16.00.14.00.rar
7z x 3008_FW_PH16.00.14.00.rar
cd IT/UEFI
sas3flash -l 2024-04-23_firmware_update.txt -o -f SAS9300_8i_IT.bin