Degraded pool on two brand new disks

Hi

I have a pool of two 4TB Seagate Ironwolf hard drives (same batch). The pool is set up as a mirror across the two drives, and both are showing a status of ‘degraded’. I’ve run S.M.A.R.T. tests, which reported no errors.

The only difference between these two drives and the others in my NAS is I’m using a PCIe SATA card with them. Not sure if that’s relevant or a contributing factor, but I thought I’d mention it.

I’ve tried scrubbing the pool twice, as I could see others have had success doing this, but the number of errors increased after each scrub.

Do I have two bad disks here, or is there something else I could try before replicating the pool and sending the disks back? Any help or suggestions would be much appreciated.

Thanks

What’s the output of the following?

sudo zpool status -v

Hey there. Thanks for the reply. Here’s the output for Storage3, my problem pool:

  pool: Storage3
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:49:39 with 83 errors on Thu Oct 24 01:32:19 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage3                                        DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/65440e70-6873-11ef-89db-74563cfa323c  DEGRADED     0     0  179K  too many errors
            gptid/654eb3c7-6873-11ef-89db-74563cfa323c  DEGRADED     0     0  179K  too many errors

errors: Permanent errors have been detected in the following files:

        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        Storage3/UserData@auto-2024-10-23_22-06:/*redacted*
        [ loads more of these... ]
        /mnt/Storage3/UserData/*redacted*
        /mnt/Storage3/UserData/*redacted*
        /mnt/Storage3/UserData/*redacted*
        /mnt/Storage3/UserData/*redacted*
        /mnt/Storage3/UserData/*redacted*
        /mnt/Storage3/UserData/*redacted*
        [ loads more of these... ]

The same number of checksum errors on both disks points to something the two disks share, such as an overheating controller or a cabling issue, rather than to two failing disks.

In your case I suspect the culprit is your SATA controller.
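
To illustrate that reasoning, here is a minimal Python sketch that reads the config table from `zpool status -v` and checks whether every leaf device reports the same CKSUM value. The function names are hypothetical, and treating the most deeply indented rows as the leaf devices is a simplifying assumption that holds for a simple mirror like this one but not for every pool layout. The counters are compared as printed (for example `179K`), so rounded values can hide small differences; if your version of `zpool status` supports `-p`, that should give exact numbers.

```python
import re

# Config table trimmed from the `zpool status -v` output posted above.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        Storage3                                        DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/65440e70-6873-11ef-89db-74563cfa323c  DEGRADED     0     0  179K  too many errors
            gptid/654eb3c7-6873-11ef-89db-74563cfa323c  DEGRADED     0     0  179K  too many errors
"""

# indent, device name, state, then READ / WRITE / CKSUM as printed
ROW = re.compile(r"^(\s*)(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def leaf_cksums(config_text):
    """Return {device: CKSUM string} for the most deeply indented rows.

    Assumption: the deepest indentation level holds the leaf devices.
    """
    rows = []
    for line in config_text.splitlines():
        m = ROW.match(line)
        if m and m.group(2) != "NAME":  # skip the header row
            rows.append((len(m.group(1)), m.group(2), m.group(6)))
    if not rows:
        return {}
    deepest = max(indent for indent, _, _ in rows)
    return {name: cksum for indent, name, cksum in rows if indent == deepest}


def same_cksum_on_all(config_text):
    """True when two or more leaf devices show an identical CKSUM value."""
    counts = leaf_cksums(config_text)
    return len(counts) > 1 and len(set(counts.values())) == 1


if __name__ == "__main__":
    for device, cksum in leaf_cksums(SAMPLE).items():
        print(device, cksum)
    if same_cksum_on_all(SAMPLE):
        print("Identical CKSUM on every disk: suspect controller or cabling.")
```

A matching count is only a hint, not proof. It just makes two independently failing disks the less likely explanation.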

Thanks for getting back, much appreciated. I’ve put an order in for a new SATA controller, and I’ll switch out the cables too and let you know how I get on.

Thanks again

Get an LSI SAS HBA flashed to IT mode instead, e.g. a 9300-8i.

Thanks. I’ve ordered a SAS2008 card; it seems there are quite a lot of documented steps for flashing it to IT mode. All being well, everything should be here tomorrow.

Turns out flashing it to IT mode wasn’t as straightforward as I thought it was going to be. Long story short, it’s done, working a treat. Thanks again @Farout for your suggestion and help.

I’ve documented my steps here: Tutorial: Flash LSI 9211-8i with IT Firmware for TrueNAS. Hopefully it helps someone else in the future.

Well done.
You don’t even need to install a shell on the USB thumb drive, as your motherboard should provide one. You can boot to the UEFI Shell from the BIOS menu and list your devices with map. Then try the file systems one by one (fs0: followed by dir to list files; repeat with fs1:, etc.) until you find your thumb drive and its sas2flash or sas3flash utility.