SMART Test Passed with errors. Time to swap out disk?

Please see the image below. I have read contradictory information on whether these SMART errors are anything to worry about. I do have a spare disk or two, but I would appreciate some professional insight. Do 9 raw read errors and 1 multi-zone error in the Multi_Report summary mean I need to change the disk?

Thanks for your insights

IMO, no, it doesn’t constitute a requirement to replace the disk. However, the SMART test failure does.

If the disk were passing the SMART self-tests, IMO, single digits in other error counters are a reason to keep an eye on the disk, but don’t require replacing it ASAP. If they go past single digits, I’m probably replacing it. But if it’s failing the SMART self-tests–which your disk is–it’s time to replace it.
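That rule of thumb can be written down as a small decision function. This is only a sketch of the heuristic in the post above, not anything TrueNAS or Multi_Report implements; the names `Advice` and `replacement_advice` are hypothetical, and the counters are assumed to be plain raw counts (Seagate drives report raw values that are not simple counts, so they don't fit this rule).

```python
from enum import Enum


class Advice(Enum):
    OK = "nothing to do"
    WATCH = "keep an eye on it"
    PROBABLY_REPLACE = "probably replace"
    REPLACE = "replace now"


def replacement_advice(last_selftest_passed: bool, error_counts: dict) -> Advice:
    """Rule of thumb from this thread: a failed self-test outranks everything else."""
    if not last_selftest_passed:
        return Advice.REPLACE
    # Largest raw error count among the counters being watched (0 if none given).
    worst = max(error_counts.values(), default=0)
    if worst == 0:
        return Advice.OK
    if worst <= 9:  # single digits: monitor, no rush
        return Advice.WATCH
    return Advice.PROBABLY_REPLACE  # past single digits


# The situation in this thread: small counters, but the last self-test failed.
print(replacement_advice(False, {"Raw_Read_Error_Rate": 9, "Multi_Zone_Error_Rate": 1}))
```

The ordering is the point: the self-test check runs first, so small counters can never talk you out of replacing a drive that failed its test.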


Thanks for your response, Dan. I'm still a bit confused. As far as I can see in the image above, it actually passed? It just shows the 9 raw read errors and 1 multi-zone error.

Well, you haven’t included the column headings, so it’s hard to know what I’m looking at–but “completed–read failure” (the one with the red background) is a failed SMART self-test.

My bad. Here is the table with the headings intact. Is your opinion still the same, Dan? Thanks, mate.

Yes–it failed its last SMART self-test. Time to replace.


Thank you mate

@joeschmuck what is SMART Status? And why is it green if the last test failed?

Yes, this was my confusion. I am in the process of resilvering onto the replacement disk, but academically it would be nice to know for future reference.

It is green because the SMART Status being reported by the drive is "Passed". This is basically a power-on self-test: the absolute minimum needed to pass, which includes a minor read operation but nothing fancy. That is why I have color-coded some things: it drags your eye right to anything in RED.
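To make the distinction concrete, here is a sketch that treats the two as separate fields: it reads the overall-health line that `smartctl -H` prints for ATA drives and colours it independently of the last self-test status. The function names and colour rules are hypothetical and are not Multi_Report's actual code.

```python
from typing import Optional

HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def overall_health(smartctl_output: str) -> Optional[str]:
    """Return the verdict after the health prefix (e.g. "PASSED"), or None if absent."""
    for line in smartctl_output.splitlines():
        line = line.strip()
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def summary_colours(health: Optional[str], last_selftest_status: str) -> dict:
    """Colour each field on its own, so a green health flag cannot hide a red test."""
    return {
        "smart_status": "green" if health == "PASSED" else "red",
        "last_selftest": (
            "green" if last_selftest_status.startswith("Completed without error") else "red"
        ),
    }


sample = "=== START OF READ SMART DATA SECTION ===\n" + HEALTH_PREFIX + " PASSED\n"
print(summary_colours(overall_health(sample), "Completed: read failure"))
```

With this drive's data, the health field comes out green and the self-test field red, which is exactly the combination that caused the confusion.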

@Okedokey Scroll down in the report, find drive "sde", and look under "Most recent Short and Extended Tests"; you will likely find more information about the drive failure there. Any time the drive cannot complete the extended or short test, it is a failure and you should be looking for another drive. That drive does have a few hours on it.


Thanks for the feedback. I'd be interested to hear your opinions on the additional data here, noting that I have only had the TrueNAS setup for a few weeks:

But you've had this drive powered on for about 4 years and 8 months, and it's showing its age. Time to replace it ahead of a full failure.
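The drive's age comes from its Power_On_Hours raw value (41076 in this report). A quick sketch of the conversion, assuming an average 365.25-day year; the function name `power_on_age` is hypothetical:

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, leap years included
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def power_on_age(hours: int) -> tuple:
    """Convert a Power_On_Hours raw value into (whole years, whole months, decimal years)."""
    years = hours / HOURS_PER_YEAR
    whole_years = int(years)
    leftover_hours = hours - whole_years * HOURS_PER_YEAR
    whole_months = int(leftover_hours / HOURS_PER_MONTH)
    return whole_years, whole_months, round(years, 2)


print(power_on_age(41076))  # (4, 8, 4.69)
```

So the drive has roughly 4 years and 8 months of power-on time, or about 4.7 years.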

I already have. I'm just asking for a review of the data, since it says it passed and there are nuances in the information that I'm not very familiar with.

Ignore that statement; you have to look at:

  1. The long test: if it fails, you change the drive;
  2. The data: if important parameters like errors or pending sectors start to accumulate, you change the drive.
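Both rules can be sketched as a check over two snapshots of the attribute table taken some time apart, for example from periodic `smartctl -A` runs. The snapshots here are hypothetical dicts mapping attribute ID to raw value. The watched set (reallocated, pending, and offline-uncorrectable sectors) is an assumed reading of "errors or pending sectors", not a list given in the thread.

```python
# Attributes whose growth usually signals a deteriorating surface (assumed selection).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def accumulating(before: dict, after: dict) -> list:
    """List watched attributes whose raw value grew between two snapshots (ID -> raw)."""
    grown = []
    for attr_id, name in WATCHED.items():
        old, new = before.get(attr_id, 0), after.get(attr_id, 0)
        if new > old:
            grown.append(f"{name}: {old} -> {new}")
    return grown


def should_change_drive(long_test_passed: bool, before: dict, after: dict) -> bool:
    """Rule 1: a failed long test. Rule 2: watched counters accumulating."""
    return (not long_test_passed) or bool(accumulating(before, after))


week1 = {5: 0, 197: 0, 198: 0}
week2 = {5: 0, 197: 2, 198: 0}
print(accumulating(week1, week2))  # ['Current_Pending_Sector: 0 -> 2']
print(should_change_drive(True, week1, week2))  # True
```

Comparing snapshots rather than looking at a single value reflects the "starts to accumulate" wording: a stable non-zero count is less worrying than one that keeps climbing.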

Personally, I would run another long test: if it fails again, the drive learns to fly; if it completes without errors, you can continue using it while keeping in mind that it's on its last legs. Prepare accordingly.

That's the thing. It says it passed in both of the tables I provided above. I have already changed the disk, as I've mentioned a couple of times, but out of all the information TrueNAS provides, I find the SMART testing the most opaque when it comes to making decisions.

As I wrote, ignore the line at the top that says it passed and look at the long test at the bottom.
Then look at the data.

Any read or write "failure" in a SMART long test (not to mention a short test) means that the drive goes back for RMA if it is under warranty, and into the recycle bin if not.
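That policy can be sketched as a mapping from the self-test status text to an action. The status wording assumed here follows what smartctl prints for ATA drives ("Completed without error", "Completed: read failure", "Aborted by host", and so on). The function names are hypothetical. Treating an aborted or interrupted test as "rerun" rather than pass or fail is a choice made for this sketch.

```python
def classify_selftest(status: str) -> str:
    """Map a self-test status string to 'passed', 'failed' or 'inconclusive'."""
    s = status.strip().lower()
    if s.startswith("completed without error"):
        return "passed"
    if "failure" in s or "fatal" in s:
        return "failed"  # e.g. "Completed: read failure"
    return "inconclusive"  # e.g. aborted, interrupted, still in progress


def disposition(status: str, under_warranty: bool) -> str:
    """Any failure sends the drive to RMA or the bin; otherwise keep it or re-test."""
    verdict = classify_selftest(status)
    if verdict == "failed":
        return "RMA" if under_warranty else "recycle"
    if verdict == "inconclusive":
        return "rerun the test"
    return "keep"


print(disposition("Completed: read failure", under_warranty=False))  # recycle
```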


Read failure at 10%

The drive is done. Replace it, then zero it (wipe it with zeros).

If you can get someone else to cover the replacement cost, more power to you.

Otherwise the drive goes in the bin


Thanks mate. Yep it was replaced this morning. Just wanted to understand the report. Thanks again.

Here is a full diagnosis so you can understand the important non-zero values, but as everyone has told you, replace the drive.

SMART overall-health = PASSED: as I explained above, do not assume this value means anything other than that the drive electronics are working. That is the safe way to think of it.

  1. 1 Raw_Read_Error_Rate = 9: non-Seagate drives should remain at zero, while Seagate drives will show some crazy number. 9 means the drive failed to access the intended data location nine times. This value can go up or down because it is an evaluation of errors over a period of time.
  2. 9 Power_On_Hours = 41076, meaning the drive has had power applied for about 4.69 years.
  3. 200 Multi_Zone_Error_Rate = 1: this is not always a significant factor, and this value can also go up or down because it is an evaluation of errors over a period of time.
  4. #1 Extended offline Completed: read failure 10% 41070 5634345992 means that your drive could not complete the self-test because it failed to read a portion of the drive platter(s). 10% of the test was still remaining, the failure occurred at hour 41070, and that long number is the LBA (sector) that could not be read.
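Item 4 is a single row of the self-test log. Here is a sketch that splits such a row into its columns. It assumes smartctl's usual layout (Num, Test_Description, Status, Remaining, LifeTime(hours), LBA_of_first_error) with at least two spaces between the description and status columns. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumes smartctl's column layout, with at least two spaces between the
# Test_Description and Status columns.
ENTRY_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)


@dataclass
class SelfTestEntry:
    num: int
    test: str
    status: str
    remaining_pct: int  # share of the test that had NOT run when it stopped
    lifetime_hours: int  # Power_On_Hours at the time of the test
    first_error_lba: Optional[int]  # None when smartctl prints "-"

    @property
    def completed_pct(self) -> int:
        return 100 - self.remaining_pct


def parse_selftest_line(line: str) -> SelfTestEntry:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised self-test log line: {line!r}")
    lba = m.group("lba")
    return SelfTestEntry(
        num=int(m.group("num")),
        test=m.group("test").strip(),
        status=m.group("status").strip(),
        remaining_pct=int(m.group("remaining")),
        lifetime_hours=int(m.group("hours")),
        first_error_lba=None if lba == "-" else int(lba),
    )


entry = parse_selftest_line(
    "# 1  Extended offline    Completed: read failure       10%     41070         5634345992"
)
print(entry.status, entry.completed_pct, entry.first_error_lba)
# Completed: read failure 90 5634345992
```

Reading it this way, "10%" means the test got through about 90% of the surface before hitting the unreadable LBA.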

An Extended self-test reads every sector on the drive, whereas a Short test reads an inner track, a middle track, and an outer track, and typically lasts about 2 minutes no matter whose drive it is. This could differ depending on the drive make/model, but it gives you an idea of what is happening.
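If you want to start these tests and read the results from a shell instead of a report, here is a sketch that wraps the relevant smartctl calls. The option letters are standard smartctl options. The wrapper names and the `/dev/sde` device path are illustrative assumptions. `-c` prints the drive's own recommended polling time for each test type, which is where the make/model differences show up.

```python
import subprocess

# smartctl options for the operations discussed in this thread.
ACTIONS = {
    "short": ["-t", "short"],  # start a Short self-test
    "long": ["-t", "long"],  # start an Extended self-test
    "log": ["-l", "selftest"],  # print the self-test log
    "health": ["-H"],  # print the overall-health verdict
    "capabilities": ["-c"],  # includes recommended polling times per test
}


def smartctl_cmd(device: str, action: str) -> list:
    """Build the argument list for one smartctl call, e.g. on /dev/sde."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *ACTIONS[action], device]


def run_smartctl(device: str, action: str) -> str:
    """Run smartctl (needs root and smartmontools) and return its text output."""
    # smartctl's exit status is a bit mask that is non-zero when the drive has
    # logged errors, so don't treat a non-zero exit as a failed invocation.
    result = subprocess.run(
        smartctl_cmd(device, action), capture_output=True, text=True, check=False
    )
    return result.stdout


print(smartctl_cmd("/dev/sde", "long"))
```

Starting a test with `-t` returns immediately while the drive runs the test in the background. Check the `-l selftest` output once the polling time has passed.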
