Need help with reading SMART report

Good day, guys!

I really need help deciding whether I should replace the disk.

The SMART report for the disk can be found here: Pastebin.com (smartctl -x /dev/sdg)

For some time the disk failed the extended offline self-test (three times) at 90 percent completion. After ordering a new disk, I decided to try a few things to “revive” it:

  • took the disk offline from the pool,
  • formatted it (long zeroed format),
  • put the disk back into the pool.

Additionally, I swapped the SATA cable between the motherboard and the disk.

After that, two extended tests and one short test finished with no errors.

So the question is: should I pay for and replace the disk, or cancel the order for the new disk?

Thank you in advance!

If you are running RAIDZ2 then I would continue using the old disk and put the new one in your cupboard just in case…

I agree. The disk itself looks good - zero reallocated sectors - and it looks fine now.

That said, I would personally cancel the order - because putting a new disk in a cupboard means that you are running down the warranty before it has even been used.

@Protopia what could be the reason for such a situation?

My only thoughts are a bad power supply or the Orico M.2 to 6-port SATA HBA.

Where should I start looking in order to find “the answer”?

Have you got a heatsink on the controller chip?

A few issues are discussed in this thread, there’s some good info, pity it devolved and the thread got locked.

I can’t see why a bad SATA cable would cause a problem at a specific LBA, so I guess the “long zeroed format” (??? I have no idea what this is and how you do it - and would be interested to know for the future) probably fixed it.

I would just keep an eye on your regular long test results in case it starts to fail again. (Implement the @joeschmuck Multi-Report script so that you get notified by email whenever there is a problem like that.)

Wiping the disk with the “fill with zeroes” option on the Disks page will do that.

It can “fix” pending sectors… but I find the fix is normally temporary, once defects begin to grow they seem to continue… generally.

No, simple chip with no heatsink.

Seeing this, I’m getting back to the idea of swapping the disk. But the question still stands: why did it happen, and why did a software solution resolve a hardware issue (if the disk is really almost broken)?

You generally do that using the dd command. But I agree with @Stux that while you can force a sector to be remapped as bad, making it look like the problem is fixed, the original issue is generally caused by the platter surface flaking off. Once it starts, I’ve never seen it stop. I have set my own personal pain level at 9 sectors; once I hit 10, I am looking to replace the hard drive (note that the Multi-Report critical sector value is 9 and is user changeable). Some people don’t want a single sector error, but that may be a bit too strict. All drives are apt to have some errors; they are just mapped out of sight at the factory. They do a pretty good job of it too. All you will see are any new ones that pop up.
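To make the “9 is OK, 10 means replace” rule concrete, here is a rough Python sketch. It is not how Multi-Report does it. It assumes the count that matters is the largest raw value among SMART attributes 5, 197 and 198 in `smartctl -A` output, and the sample text and function names are made up for illustration.

```python
# Attribute IDs assumed (for this sketch only) to count as "sector errors":
# 5 = Reallocated_Sector_Ct, 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable
SECTOR_ATTRS = {5, 197, 198}

# Illustrative `smartctl -A` style table, not taken from the drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       8
"""


def sector_error_count(smartctl_a_output: str) -> int:
    """Return the largest raw value among the watched sector attributes."""
    worst = 0
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            if int(fields[0]) in SECTOR_ATTRS and fields[9].isdigit():
                worst = max(worst, int(fields[9]))
    return worst


def should_replace(count: int, critical: int = 9) -> bool:
    """True once the count goes past the critical value (9 by default)."""
    return count > critical


if __name__ == "__main__":
    count = sector_error_count(SAMPLE)
    print(count, should_replace(count))
```

The `critical` parameter mirrors the user-changeable value mentioned above; set it to 0 if you don’t want to tolerate a single sector error.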

@ghostklart please do not use pastebin and link to it. Some folks here are hesitant to click on links. You can post the same data directly on this forum, and it will still be available years from now, whereas pastebin links have a limited life, as I have seen when links fail to work.

As for your drive: You of course know that you have a single LBA, 7700271576, that has failed 3 times. There are likely more sector errors beyond that point. If it is under warranty, do the RMA before those three failures roll off the log.
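If you want to spot a repeating LBA like that without reading the log by eye, something along these lines might help. It is only a sketch: it assumes the usual ATA layout of `smartctl -l selftest` output, where each entry starts with `#` and the last column is the LBA of the first error (or `-`). The sample log is illustrative; only the LBA and the 10% remaining come from this thread, and the hour values are invented.

```python
from collections import Counter

# Illustrative self-test log; the LifeTime(hours) values are made up.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     20150         -
# 2  Short offline       Completed without error       00%     20130         -
# 3  Extended offline    Completed: read failure       10%     20100         7700271576
# 4  Extended offline    Completed: read failure       10%     19930         7700271576
# 5  Extended offline    Completed: read failure       10%     19760         7700271576
"""


def failing_lbas(selftest_log: str) -> Counter:
    """Count how many logged self-tests stopped at each LBA."""
    counts = Counter()
    for line in selftest_log.splitlines():
        if not line.startswith("#"):
            continue  # skip the header and any other non-entry lines
        last = line.split()[-1]
        if last.isdigit():  # "-" means the test reported no failing LBA
            counts[int(last)] += 1
    return counts


if __name__ == "__main__":
    for lba, hits in failing_lbas(SAMPLE_LOG).most_common():
        print(f"LBA {lba} failed {hits} time(s)")
```

Keep in mind that a self-test stops at the first bad LBA it finds, so a single repeating LBA says nothing about what lies beyond it.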

Replace that with a proper SAS HBA.

So new disk then?

I will swap it for a PCIe x1 ASMedia adapter.

It probably needs a heatsink. You can order a small one (or 16) off eBay for a few dollars and just “glue” it with a bit of thermal heatsink compound (not actually gluing, but will be sufficient)

Symptoms of overheating include random errors and early failure.

Worth checking the specs and determining which controller is used. ASMedia?

And make sure there’s no port multiplier.

Yes, replace the drive if it is under warranty. If it is not under warranty then keep an eye on it. If you start having sector errors then it is time to spend the money for another drive.

So, as a final result:

I swapped the Orico ASMedia ASM1166 HBA for a PCIe x1 HBA (with the same chipset).

The drive gave another SMART long test error, so I swapped it for a new one.

So far so good.

Thank you everyone for the help.
