Yes, but I mean: why did TN use the two spares, one in mirror-1 (with one drive failed) and the other in the healthy mirror-2 (no drive failures), instead of using one in mirror-0 to keep that vdev online?
It looks like drives failed first in mirror-1 and mirror-2.
Do you know the timing of the failures?
I managed to recover the pool, without using the snapshots, and these are the steps I went through.
Starting conditions:
mirror-0 = I/O suspended and offline, 1 drive failed, 1 sane
mirror-1 = 1 drive failed, 1 sane, spare-1 kicked in
mirror-2 = 2 sane drives, 1 taken offline and spare-2 kicked in
Resilvering was in progress with an expected time to completion of 1+ month
1st priority: get mirror-0 accessible to replace the drive
From the shell I ran zpool clear, which made the sane mirror-0 disk accessible again so it could be put back online;
With the vdev accessible I offlined the faulty mirror-0 drive (Offline > Replace), swapped it with another one, and the resilver started.
The resilver of mirror-0 took 45 minutes.
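For reference, a rough sketch of the shell equivalents of these GUI steps (the pool name SATA-pool is taken from the status output later in the thread; the disk names in angle brackets are placeholders, not the actual GUIDs):

# Clear the suspended I/O state so the vdev becomes accessible again
sudo zpool clear SATA-pool
# Offline the faulty disk, then replace it; the resilver starts automatically
sudo zpool offline SATA-pool <old-disk>
sudo zpool replace SATA-pool <old-disk> <new-disk>
# Watch the resilver progress
sudo zpool status -v SATA-pool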
2nd priority: replace mirror-1 faulty drive
The vdev was accessible (it never went offline), so I offlined the faulty mirror-1 drive (Offline > Replace), swapped it with another one, and the resilver started.
The resilver of mirror-1 took 45 minutes.
At the end TN returned spare-1 to its spare role without manual intervention.
On mirror-2 there were no faulty drives and no changes were made.
At the end TN returned spare-2 to its spare role without manual intervention.
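If a spare had not gone back to AVAIL by itself after the resilver, it could have been released manually with zpool detach (the spare name below is a placeholder for the GUID shown in zpool status):

# Return an activated hot spare to the AVAIL state once the resilver has completed
sudo zpool detach SATA-pool <spare-disk>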
Currently the pool is online, with no data loss and no errors reported.
Probably better that I start looking at other vendors… WD? Any suggestions?
Your choice is between Seagate, Toshiba and WD, full stop. There's no reason to exclude one because its SMART reports are less readable; the critical parameters require no decoding.
That was over 12 days ago. I can't say the drive is good based on old (yes, even 12-day-old) data.
Why? I have yet to see any proof that the drives have actually failed. Data corruption, yes, but actual failure, nope.
On the drives you plan to replace, you should run a SMART Long test on each. See if they pass or fail. If you really wanted to, run badblocks on each drive (after you have replaced them of course) to determine if they are going into the trash or are now ready to be your cold spares.
Of course, if you feel like replacing the drives regardless, that is fine as well; it is just that many people here can't afford to buy new drives when they are not actually bad.
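A minimal sketch of those checks, assuming Linux device names like /dev/sdX for the pulled drives (badblocks in write mode wipes the disk, so only run it on drives already out of the pool):

# Start a SMART extended (long) self-test
sudo smartctl -t long /dev/sdX
# When the test has finished, review the result and the error counters
sudo smartctl -a /dev/sdX
# Destructive write-and-verify pass to decide between trash and cold spare
sudo badblocks -wsv /dev/sdX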
That is EXCELLENT news!!! Well done.
Thanks, I hope I will be as lucky the next time it happens…
admin@TNscale20bay[~]$ sudo zpool status SATA-pool
  pool: SATA-pool
 state: ONLINE
  scan: resilvered 330G in 00:39:12 with 0 errors on Sun Dec 1 15:26:47 2024
config:

    NAME                                          STATE     READ WRITE CKSUM
    SATA-pool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        c8f6b750-b6e4-49cd-bd6a-900abf16f428      ONLINE       0     0     0
        fb142f4c-b40a-487c-9952-bffc8830cebe      ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        1e25c7b8-0c84-42a9-83f3-a31699907349      ONLINE       0     0     0
        53ceb564-4db0-47b2-92e6-a2b9725ff3a6      ONLINE       0     0     0
      mirror-2                                    ONLINE       0     0     0
        04281099-6984-4e3a-8a38-a61f2ff04bb3      ONLINE       0     0     0
        58d0ad6e-3b1d-4bc9-81a9-6c3f9239b10b      ONLINE       0     0     0
    spares
      09bfcea9-1165-4b6a-8b04-ef0b5607e38d        AVAIL
      25e2daa8-f839-4cb6-ad95-bf9bed968eaf        AVAIL

errors: No known data errors
admin@TNscale20bay[~]$
This time I've been lucky!
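As a final sanity check after all that resilvering, a full scrub re-reads and verifies every block in the pool, e.g. (pool name SATA-pool as above):

# Kick off a scrub, then monitor it for any new errors
sudo zpool scrub SATA-pool
sudo zpool status -v SATA-pool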