2 out of 4 disks have SMART failures

ChrisChros · March 12, 2025, 3:54pm

Hello everyone,

Over the winter I upgraded the hardware of my NAS. I chose four recertified Seagate Exos X16 hard drives in a single RAIDZ2 vdev.
Two weeks ago the first HDD showed some SMART errors, so I decided to send the drive back to the seller. Today the second HDD is also showing SMART failures, so I disconnected that drive as well. So 2 out of 4 disks have problems.
Now my redundancy is gone and I would like to “stop” my dataset to secure my files. Is there any special procedure to follow?

Thanks for any advice,
Regards Chris
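
For readers following along, the arithmetic behind "my redundancy is gone" can be sketched as below. It relies only on the fact that a RAIDZ2 vdev carries two disks' worth of parity; the function name is made up for illustration and is not part of any ZFS tool.

```python
def remaining_tolerance(parity: int, missing: int) -> int:
    """Return how many more disk losses a RAID-Z vdev can absorb.

    parity  -- parity level of the vdev (2 for RAIDZ2)
    missing -- disks currently failed or removed
    A result of 0 means the next loss takes the vdev offline.
    """
    return parity - missing


# Situation in this thread: RAIDZ2 (parity 2) with 2 of 4 disks pulled.
print(remaining_tolerance(parity=2, missing=2))  # 0
```

With one faulty disk left in place, the same calculation would give 1, which is why leaving a partially failing disk connected comes up later in the thread.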

Jorsher · March 12, 2025, 4:16pm

Which vendor?

I’m using tens of recertified drives. Some have had a single-digit number of bad sectors; I pull and replace those, then run a full test on them to reallocate the bad sectors. Some have gone back into production and shown no additional errors after months. This is for data I can replace if lost, so it's not something I’m recommending.

Just export the pool? That will disconnect it from further use. You can re-import later.
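
A minimal sketch of what exporting and later re-importing looks like on the command line, assuming a pool named `tank` (a placeholder, not the poster's actual pool name). The helper only builds the two `zpool` commands as strings so they can be reviewed before anything is run. If the NAS software manages pools through its own UI, the export/disconnect function there is probably the safer route.

```python
import shlex


def export_import_commands(pool: str) -> list[str]:
    """Build the commands to take a pool offline and bring it back later.

    Nothing is executed here; the strings are meant to be read and
    then run by hand as root once the replacement disk has arrived.
    """
    quoted = shlex.quote(pool)  # guard against odd characters in the name
    return [
        f"zpool export {quoted}",  # cleanly detach the pool from the system
        f"zpool import {quoted}",  # re-attach it later
    ]


# "tank" is a hypothetical pool name used for illustration only.
for cmd in export_import_commands("tank"):
    print(cmd)
```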

ChrisChros · March 12, 2025, 5:04pm

The drives are from a German vendor, Mindfactory. I am already in contact with them. My first drive has been at their shop for 1.5 weeks, but I have heard nothing so far.

Thanks for the suggestion to disconnect the dataset.

Jorsher · March 12, 2025, 5:09pm

See if you’re affected by this:

It sounds like yours weren’t sold as new, but…

Were your drives ‘manufacturer recertified’, or does the vendor ‘recertify’ them? It’s all luck, but 2 out of 4 is a very poor ratio compared to the luck I’ve had with manufacturer recertified drives.

ChrisChros · March 12, 2025, 5:20pm

Thanks for the information.

I don’t know whether the vendor or the OEM recertified them. In their shop they are listed as "Seagate Factory Recertified", so I assume they were recertified by the OEM.

I have already compared the SMART and FARM values, and they look okay to me. The operating hours of the heads were also unremarkable and match the operating hours of the drive.

My last 5x 2TB WD Red drives were also recertified drives and have survived 5 years without any failure.

Jorsher · March 12, 2025, 5:24pm

I’d assume the same from ‘factory recertified.’ Maybe just bad luck, then.

How many errors did you get? I don’t recommend this for important data, but if I get <10 and the numbers don’t increase, I just keep using them.

If you have another pool or drive you can copy the data to, maybe make a copy then disconnect the pool until you can restore parity.

ChrisChros · March 12, 2025, 5:29pm

The first drive had 278 currently unreadable (pending) sectors, and the second one now has 178.
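
To put those counts against the rule of thumb mentioned earlier in the thread (fewer than 10 errors, and the number not increasing), here is a small sketch. The function name and the idea of comparing against a previous reading are illustrative assumptions; the limit of 10 is one poster's personal threshold for replaceable data, not a vendor specification.

```python
def keep_using(pending_now: int, pending_before: int, limit: int = 10) -> bool:
    """Apply the thread's rule of thumb to a pending-sector count.

    A drive passes only if the count is below `limit` AND has not
    grown since the previous reading.
    """
    return pending_now < limit and pending_now <= pending_before


# Counts reported in this thread; the "before" values are assumed equal,
# since the thread gives only one reading per drive.
print(keep_using(278, 278))  # False
print(keep_using(178, 178))  # False
```

Both drives fail the check on the count alone, regardless of whether the numbers are still climbing.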

Jorsher · March 12, 2025, 5:31pm

Ouch. Yeah I wouldn’t trust those for continued use.

ChrisChros · March 12, 2025, 5:35pm

I have a second NVMe dataset, but its capacity is not big enough. So I have now disconnected the remaining disks and will wait until the first drive is delivered so that I can start a resilver task.

georgelza · March 12, 2025, 7:35pm

Not sure how you got to that.

I bought 5 IronWolf drives originally; 1 failed outright and was refunded…
We are now back to 5. Of those 5, the one I bought from a different vendor than the original purchase is still good, and its SMART and FARM numbers match up. The other 4, from the original purchase, are all failing or have failed.

After the initial 5, I bought 2 replacement IronWolfs (1 as per above) and 1 Exos.

I haven't even added the Exos to a disk pool, as I can see the discrepancy in the FARM log.

All but that 1 have issues…

Chris, I suggest you run smartctl -l farm /dev/sdX on the various drives and check whether the power-on hours and power cycle counts make sense and line up.

G
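
A rough sketch of the "line up" check, assuming the power-on hours have already been read by hand from the regular SMART attributes and from the FARM log. The function name, the 24-hour tolerance, and the sample numbers are all made up for illustration; nothing here parses real smartctl output.

```python
def hours_line_up(smart_hours: int, farm_hours: int, tolerance: int = 24) -> bool:
    """Return True if SMART and FARM power-on hours roughly agree.

    A large gap can indicate that the SMART counters were reset while
    the FARM log still records the drive's real usage.
    """
    return abs(smart_hours - farm_hours) <= tolerance


# Hypothetical readings, not taken from any drive in this thread.
print(hours_line_up(1200, 1210))   # True
print(hours_line_up(1200, 26000))  # False
```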

Jorsher · March 12, 2025, 7:42pm

Not sure how I got to what?

Stux · March 12, 2025, 8:39pm

Reconnect the disk.

A faulty disk that has some faulty sectors is better than no disk.

Then get the disks replaced.

I personally obtain a replacement disk before sending an in-use disk back for RMA… (this may involve purchasing a “spare” disk).

ChrisChros · March 12, 2025, 9:07pm

I have already checked this information and it looks valid to me.

ChrisChros · March 12, 2025, 9:10pm

Too late, the second disk is already on its way to the vendor. The last 2 disks are now disconnected and will stay down until I have a new disk.

Jorsher · March 12, 2025, 9:13pm

It’s too late now, but I wouldn’t recommend doing that. As you know, if one more drive fails, you will lose everything. An ‘erroring’ drive may still last long enough to rebuild.

Anyway, hope it all works out.

Protopia · March 12, 2025, 9:50pm

Yes, there is a defined procedure for this - it is called “Shutdown & power-off”.