Please see the image below. I have read contradictory information on whether these SMART errors are anything to worry about. I do have a spare disk or two, but I would appreciate some professional insight. Do 9 raw read errors and 1 multi-zone error on the Multi_Report summary constitute a requirement to change the disk?
IMO, no, it doesn’t constitute a requirement to replace the disk. However, the SMART test failure does.
If the disk were passing the SMART self-tests, IMO, single digits in other error counters are a reason to keep an eye on the disk, but don’t require replacing it ASAP. If they go past single digits, I’m probably replacing it. But if it’s failing the SMART self-tests–which your disk is–it’s time to replace it.
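A minimal sketch of that rule of thumb in Python. The function name `triage` and the threshold of 10 are illustrative assumptions taken from the "single digits" wording above, not anything TrueNAS or Multi_Report implements, and it does not apply to Seagate raw read values, which are normally very large.

```python
def triage(self_test_failed: bool, error_counters: dict[str, int]) -> str:
    """Return 'replace', 'watch', or 'ok' using the rule of thumb above.

    Hypothetical helper: a failed self-test always means 'replace';
    otherwise the largest raw error counter decides.
    """
    if self_test_failed:
        return "replace"
    worst = max(error_counters.values(), default=0)
    if worst >= 10:   # past single digits (assumed cut-off)
        return "replace"
    if worst > 0:     # single digits: keep an eye on it
        return "watch"
    return "ok"


counters = {"Raw_Read_Error_Rate": 9, "Multi_Zone_Error_Rate": 1}
print(triage(True, counters))   # replace
print(triage(False, counters))  # watch
```

With the same counters, the only thing that changes the verdict here is the self-test result.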
Thanks for your response, Dan. Still a bit confused. It did actually pass as far as I can see in the image above? It just comes up with the 9 raw read and 1 multi-zone errors.
Well, you haven’t included the column headings, so it’s hard to know what I’m looking at–but “completed–read failure” (the one with the red background) is a failed SMART self-test.
Yes, this was my confusion. I am in the process of resilvering the replacement disk but academically it would be nice to know for future considerations.
It is green because the SMART Status being reported by the drive is “Passed”. This is basically a Power On Self-test, the absolute minimum to pass, and it includes a minor read operation but nothing fancy. That is why I have some things color coded: it drags your eye right to a value when it is RED.
@Okedokey Scroll down in the report, look for drive “sde” and look for “Most recent Short and Extended Tests” and you likely will have more information about the drive failure. Any time the drive cannot complete the extended or short test, it is a failure and you should be looking for another drive. That drive does have a few hours on it.
I already have. I’m just requesting a review of the data as it says it has passed and there are nuances to the information contained that I am not overly aware of.
The data: if important parameters like errors or pending sectors start to accumulate, you change the drive.
Personally I would run another long test: if it fails again the drive learns to fly; if it completes without errors you can continue using it while keeping in mind that it’s on its last legs. Prepare accordingly.
That’s the thing. It says it passed in both the tables above that I have provided. I have changed the disk already as mentioned a couple of times, but out of all the information TrueNAS provides I find the SMART testing the most opaque in terms of making decisions.
Any read or write “failure” in SMART long test (not to mention the short test) means that the drive goes on RMA if under warranty, and to the recycle bin if not.
Here is a full diagnosis so you can understand any important non-zero values, but as everyone has told you, replace the drive.
SMART overall-health = PASSED. As I explained above, do not assume this value means anything other than that the drive electronics are working; that is the safe way to think of it.
1 Raw_Read_Error_Rate = 9. Non-Seagate drives should remain at zero, while Seagate drives will appear to show some crazy number. 9 means the drive did not access the intended data location nine times. This value can go up or down as it is an evaluation of errors over a period of time.
9 Power_On_Hours = 41076, meaning the drive has had power applied for about 4.69 years' worth of time.
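The conversion is simple arithmetic; a quick sketch in Python, assuming a 365-day year (the constant name is illustrative), which comes out at just under 4.7 years:

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours; leap days ignored (assumption)

power_on_hours = 41076  # raw value of SMART attribute 9 from the report
years = power_on_hours / HOURS_PER_YEAR
print(f"{years:.2f} years")  # 4.69 years
```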
200 Multi_Zone_Error_Rate = 1. This is not always a significant factor, and this value can go up or down as it is an evaluation of errors over a period of time.
#1 Extended offline Completed: read failure 10% 41070 5634345992 means that your drive could not complete the self-test due to a failure to read a portion of the drive platter(s). There was 10% of the test remaining, it occurred at hour 41070, and that long number is the LBA (sector) that failed to be read.
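If it helps to see the fields pulled apart, here is a rough Python sketch that splits a self-test log line like the one quoted above. The regular expression and the `parse_self_test` helper are my own assumptions based on that one line, not an official smartctl parser, and the "failure" check is only a heuristic on the status text.

```python
import re

# The log line as quoted in the post above.
LINE = "#1 Extended offline Completed: read failure 10% 41070 5634345992"

# Assumed layout: number, test type, status, remaining %, lifetime hours, LBA.
PATTERN = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def parse_self_test(line):
    """Return the fields of one self-test log line, or None if it does not match."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    return {
        "test": m["test"],
        "status": m["status"],
        "remaining_pct": int(m["remaining"]),
        "lifetime_hours": int(m["hours"]),
        # A passing test shows "-" instead of an LBA.
        "first_error_lba": None if m["lba"] == "-" else int(m["lba"]),
        # Heuristic: any status mentioning "failure" counts as a failed test.
        "failed": "failure" in m["status"].lower(),
    }


result = parse_self_test(LINE)
print(result["status"], result["failed"])
print(result["remaining_pct"], result["lifetime_hours"], result["first_error_lba"])
```

For this line the status is "Completed: read failure", so `failed` is true regardless of the green "Passed" overall-health field.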
An Extended self-test reads all the drive sectors, whereas a Short test reads an inner track, a middle track, and an outer track, and it typically lasts almost 2 minutes no matter whose drive it is. This could be different based on the drive make/model, but it gives you an idea of what is happening.