Hi all,
I had the following problem:
I have a pool with 6 disks (4+2), and the other day disk2 and disk3 failed in quick succession.
I replaced disk2 and resilvered; everything was OK.
I replaced disk3 and resilvered; everything was OK.
I started a scrub and at around 30% I got the degraded pool warning… but the serial number reported is that of the old disk2, in the position of the new disk3…
If I reboot, the pool is OK (not degraded).
I also ran a long SMART test on both replaced disks and they are OK…
root@truenas[~]# zpool status -v
  pool: NAS32TB
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Fri Aug 16 16:58:27 2024
        14.2T scanned at 956M/s, 12.5T issued at 844M/s, 28.4T total
        20.8M repaired, 44.09% done, 05:28:26 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS32TB                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/2056e36d-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/2060cb7e-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/9f1b3bdd-58e0-11ef-b131-0cc47ad9cf64  ONLINE       0     0     0
            gptid/52f7af41-5944-11ef-b131-0cc47ad9cf64  FAULTED    727     0     0  too many errors
            gptid/20398d95-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0
            gptid/208ab6cd-5af1-11ed-ac4c-0cc47ad9cf64  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:29 with 0 errors on Tue Aug 13 03:45:29 2024
config:
Good hardware (provided that the onboard SAS controller is flashed to the appropriate latest firmware), sane pool layout (I was a bit worried by the “4+2”).
Not sure where the “disk2” and “disk3” monikers come from. The order in which drives are listed, as well as device numbers (da#/ada#) or letters (sdX), may change across reboots.
If the new (burnt in?) drive with GPTID 52f7af41-5944-11ef-b131-0cc47ad9cf64 passes long SMART tests, you may want to check or replace the data cable. But the base hypothesis is that this drive is failing (this can happen even with new drives) and needs to be replaced.
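If you want to script the check rather than eyeball the output, here is a minimal Python sketch that pulls the non-ONLINE drives out of `zpool status` text. It assumes the leaf devices are listed with a `gptid/` prefix, as in this pool; the function name `unhealthy_devices` is just an illustrative choice, and the sample text is copied from the output posted above.

```python
import re

# A device row in `zpool status` is: NAME STATE READ WRITE CKSUM [optional note].
# Counters are kept as strings because large values may be abbreviated (e.g. 1.2K).
DEVICE_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)\s*(?P<note>.*)$"
)


def unhealthy_devices(status_text):
    """Return leaf devices (assumed to start with 'gptid/') that are not ONLINE."""
    found = []
    for line in status_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m and m["name"].startswith("gptid/") and m["state"] != "ONLINE":
            found.append({
                "name": m["name"],
                "state": m["state"],
                "read": m["read"],
                "write": m["write"],
                "cksum": m["cksum"],
                "note": m["note"].strip(),
            })
    return found


# Rows taken from the zpool status output in this thread.
SAMPLE = """\
raidz2-0 DEGRADED 0 0 0
gptid/2056e36d-5af1-11ed-ac4c-0cc47ad9cf64 ONLINE 0 0 0
gptid/9f1b3bdd-58e0-11ef-b131-0cc47ad9cf64 ONLINE 0 0 0
gptid/52f7af41-5944-11ef-b131-0cc47ad9cf64 FAULTED 727 0 0 too many errors
gptid/20398d95-5af1-11ed-ac4c-0cc47ad9cf64 ONLINE 0 0 0
"""

faulted = unhealthy_devices(SAMPLE)
for dev in faulted:
    print(dev["name"], dev["state"], "read errors:", dev["read"], dev["note"])
```

On a live system you would feed it the real output, for example via `subprocess.run(["zpool", "status", "-v"], capture_output=True, text=True).stdout`.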
This is going to sound condescending & I don’t mean it to; did you physically validate the serial numbers on the disks to make sure you didn’t put the old one back in by mistake?
Also that would explain the twin failures in succession.
I’ve personally removed the wrong disk because of the “was blah blah” message referring to the “wrong” disk (it’s historical and doesn’t necessarily represent the current state).
Thus it is necessary to verify the serials when pulling the disk.
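A rough sketch of how that serial check could be scripted in Python, assuming smartmontools is installed and that `smartctl -i` prints a `Serial Number:` line for the drive (it does for typical SATA disks). The helper names and the sample text are hypothetical, for illustration only.

```python
import re
import subprocess

# Matches the "Serial Number:" line in `smartctl -i` output, ignoring case.
SERIAL_LINE = re.compile(
    r"^Serial Number:[ \t]*(\S.*?)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def parse_serial(smartctl_info):
    """Extract the serial number from `smartctl -i` text, or None if absent."""
    m = SERIAL_LINE.search(smartctl_info)
    return m.group(1) if m else None


def read_serial(device):
    """Run `smartctl -i <device>` (needs root) and return the reported serial."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True
    )
    return parse_serial(result.stdout)


def matches_label(smartctl_info, label_serial):
    """True if the serial reported by the drive equals the one on its label."""
    return parse_serial(smartctl_info) == label_serial.strip()


# Made-up sample text, not output from a real drive.
sample = "Device Model:     EXAMPLE MODEL\nSerial Number:    EXAMPLE123\n"
print(parse_serial(sample))
```

In practice you would call something like `read_serial("/dev/ada3")` (device path is only an example) and compare the result with the sticker on the disk before pulling it.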
Impossible. I externally labeled each disk with its serial number.
I also replaced the WDs with Seagates. They look different.
Ok this answers why I am given the serial number of the old disk… is there a way to update the serial numbers?
Update: now disk2 is also showing as degraded again… wtf?
Now I’ve tried replacing disk3 and changing the SATA cable and the port on the motherboard…
Could it be a power supply problem?
Both drives are on the same cable.