Pool degraded, ghost of old disk follows me

Hi all,
I had the following problem:
I have a configuration with 6 disks (RAIDZ2, 4+2), and the other day disk2 and disk3 failed in quick succession.
I replaced disk2 and resilvered; everything was OK.
I replaced disk3 and resilvered; everything was OK.
Then I started a scrub, and at around 30% I got the degraded-pool warning… but the serial number reported is that of the old disk2, in the position of the new disk3…
If I reboot, the pool is OK (not degraded).
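
For context, each replacement was equivalent to roughly this CLI sequence (a sketch; the gptid and da names are illustrative, not my actual ones):

zpool status -v NAS32TB                      # note the gptid of the disk being replaced
zpool replace NAS32TB gptid/<old-disk> da3   # resilver onto the new drive
zpool status NAS32TB                         # wait until the resilver completes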

I also ran an extended SMART test on both replacement disks, and they came back fine…
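
(The long test was something like this; the da numbers are illustrative:)

smartctl -t long /dev/da3    # start the extended self-test (takes hours on an 8TB drive)
smartctl -a /dev/da3         # check the self-test log and attributes once it finishes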

What can I check?

Thank you

First, the output of zpool status -v, formatted as code, please.
Then more details about the hardware (especially the controllers) and your TrueNAS version.


root@truenas[~]# zpool status -v
  pool: NAS32TB
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Fri Aug 16 16:58:27 2024
        14.2T scanned at 956M/s, 12.5T issued at 844M/s, 28.4T total
        20.8M repaired, 44.09% done, 05:28:26 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS32TB                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/2056e36d-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/2060cb7e-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/9f1b3bdd-58e0-11ef-b131-0cc47ad9cf64  ONLINE       0     0     0
            gptid/52f7af41-5944-11ef-b131-0cc47ad9cf64  FAULTED    727     0     0  too many errors
            gptid/20398d95-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/208ab6cd-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:29 with 0 errors on Tue Aug 13 03:45:29 2024
config:

Motherboard: Supermicro X10SL7-F
CPU: i3-4170
RAM: 16GB ECC

The hard disks were 6x WD80EFZZ; the two new ones are Seagate Exos.


Good hardware (provided that the onboard SAS controller is flashed to the appropriate latest firmware), sane pool layout (I was a bit worried by the “4+2”).

Not sure where the “disk2” and “disk3” monikers come from. The order in which drives are listed, as well as drive numbers (da#/ada#) or letters (sdX), may change across reboots.

If the drive with GPTID 52f7af41-5944-11ef-b131-0cc47ad9cf64 is new (burnt in?) and passes long SMART tests, you may want to check or replace the data cable. But the base hypothesis is that this drive is failing (this can happen even with new drives) and needs to be replaced.
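
To be sure which physical drive that gptid is, you can map it to a device node and read the serial off it, e.g. on CORE/FreeBSD (the da number is illustrative):

glabel status | grep 52f7af41            # shows which daX partition carries this gptid
smartctl -i /dev/da3 | grep -i serial    # read the serial number from that device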

This is going to sound condescending, and I don’t mean it to: did you physically validate the serial numbers on the disks to make sure you didn’t put the old one back in by mistake?


Also, that would explain the twin failures in succession.

I’ve personally removed the wrong disk because of the “was blah blah” message referring to the “wrong” disk (it’s historical and doesn’t necessarily represent the current state).

Thus it is necessary to verify the serials when pulling the disk.
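
A quick sketch for dumping the device-to-serial mapping before pulling anything (FreeBSD-style device names assumed; adjust the glob for your system):

for d in /dev/da[0-9]; do
  printf '%s: ' "$d"
  smartctl -i "$d" | grep -i 'serial number'
done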


Impossible. I labeled the outside of each disk with its serial number.
I also replaced the WDs with Seagates, and they look different anyway.

OK, this explains why I’m shown the serial number of the old disk… is there a way to update the serial numbers?

Update: now disk2 also shows up as degraded again… wtf?
Now I’ve tried replacing disk3 and changing the SATA cable and the port on the mobo…
Could it be a power supply problem?
Both drives are on the same power cable.

Yes. Having ruled out the obvious, it’s time to consider the (not so) implausible.
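
One way to test that hypothesis (illustrative commands for CORE/FreeBSD; a sketch, not a recipe): look for bus resets or timeouts clustered around the fault times. Errors that follow a shared power cable rather than a single drive point at the PSU or a splitter.

zpool events -v | less                   # timestamps of the ZFS error events
dmesg | grep -iE 'cam|reset|timeout'     # CAM-layer resets hint at cabling/power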