SMART worst value dropped from 253 to 200 after burn in, while raw value remains at 0

I have just resilvered a WD Red EFRX 4 TB HDD. Is it normal for the WORST values of Raw_Read_Error_Rate, Seek_Error_Rate, UDMA_CRC_Error_Count and Multi_Zone_Error_Rate to drop from 253 when the drive is brand new to 200 after burn-in with badblocks and long SMART tests, with no errors recorded? RAW_VALUE remains at 0. How should I understand this drop?

When new after short and conveyance SMART tests:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       2
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   117   112   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

After long SMART, Badblocks, another long SMART and resilvering:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       99
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       6
194 Temperature_Celsius     0x0022   111   110   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

You are looking at the manufacturer's “normalized” values. These can help during troubleshooting, but in most cases they are not what matters.

Error rates change as the drive gets used; they can go up and come back down. You can Google what these values mean, but it is something like 1 error per 100,000,000 operations. That is just a rough example, as each manufacturer does what they want.

Pay attention to the RAW values for a Hard Drive.
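
To make "watch the RAW values" concrete, here is a minimal Python sketch that reads a pasted smartctl attribute table and lists the attributes whose raw counter is above zero. The watch list of IDs and the function names are my own assumptions for illustration, not a vendor rule, and the sample rows are copied from the older disk's table later in this thread.

```python
import re

# Hypothetical watch list: attribute IDs whose raw value is usually a plain
# event count on hard drives. Chosen for illustration, not a vendor rule.
WATCHED_IDS = {5, 196, 197, 198, 199, 200}

# One row of the attribute table: ID, name, flag, VALUE, WORST, THRESH,
# type, updated, when_failed, then the leading integer of RAW_VALUE.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return one dict per attribute row found in the pasted table."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            rows.append({
                "id": int(attr_id), "name": name, "value": int(value),
                "worst": int(worst), "thresh": int(thresh), "raw": int(raw),
            })
    return rows


def nonzero_raw(rows, watched=WATCHED_IDS):
    """Names of watched attributes whose raw counter is above zero."""
    return [r["name"] for r in rows if r["id"] in watched and r["raw"] > 0]


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    print(nonzero_raw(parse_attributes(SAMPLE)))
```

Only the leading integer of RAW_VALUE is read, so attributes that pack several numbers into the raw field would need vendor-specific handling that this sketch does not attempt.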

Looking closer at the values for another disk, whose replacement is currently burning in, I do not understand why the read failures in the failed Extended tests do not make Raw_Read_Error_Rate go up. Or maybe they are what triggered the 1 in Multi_Zone_Error_Rate?

And just as a point of curiosity: what might have happened at LBA 1881998288? Weak magnetism? Scratches from write head?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   183   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   172   154   021    Pre-fail  Always       -       6358
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       115
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   021   021   000    Old_age   Always       -       58298
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       115
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       217
194 Temperature_Celsius     0x0022   112   101   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     58193         -
# 2  Extended offline    Completed: read failure       10%     58089         1881998288
# 3  Short offline       Completed without error       00%     57451         -
# 4  Short offline       Completed without error       00%     57283         -
# 5  Extended offline    Completed: read failure       10%     57257         1881998288
# 6  Extended offline    Completed: read failure       10%     56765         1881998288
# 7  Short offline       Completed without error       00%     56373         -
# 8  Short offline       Completed without error       00%     56157         -
# 9  Extended offline    Completed without error       00%     56073         -
#10  Short offline       Completed without error       00%     55989         -
#11  Short offline       Completed without error       00%     55822         -
#12  Extended offline    Completed without error       00%     55737         -
#13  Short offline       Completed without error       00%     55654         -
#14  Short offline       Completed without error       00%     55413         -
#15  Extended offline    Completed without error       00%     55330         -
#16  Short offline       Completed without error       00%     55246         -
#17  Short offline       Completed without error       00%     55078         -
#18  Extended offline    Completed without error       00%     54993         -
#19  Short offline       Completed without error       00%     54912         -
#20  Short offline       Completed without error       00%     54696         -
#21  Extended offline    Completed without error       00%     54612         -
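
The self-test log above can be summarized with a short Python sketch that counts how often each LBA is reported as the first error and converts an LBA to a byte offset. The 512-byte logical sector size is an assumption (the drive's actual value is not shown in this output), and the function names are hypothetical.

```python
import re
from collections import Counter

# One log entry: "# N", test description, status, remaining %, lifetime
# hours, and either an LBA or "-". Columns are separated by 2+ spaces.
ENTRY = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test entry found in the pasted log."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            num, desc, status, remaining, hours, lba = m.groups()
            entries.append({
                "num": int(num), "test": desc, "status": status,
                "remaining_pct": int(remaining), "hours": int(hours),
                "lba": None if lba == "-" else int(lba),
            })
    return entries


def failing_lbas(entries):
    """Count how many tests stopped at each first-error LBA."""
    return Counter(e["lba"] for e in entries if e["lba"] is not None)


def byte_offset(lba, sector_size=512):
    """Byte offset of an LBA; sector_size=512 is an assumption."""
    return lba * sector_size


SAMPLE = """\
# 1  Short offline       Completed without error       00%     58193         -
# 2  Extended offline    Completed: read failure       10%     58089         1881998288
# 5  Extended offline    Completed: read failure       10%     57257         1881998288
"""

if __name__ == "__main__":
    for lba, count in failing_lbas(parse_selftest_log(SAMPLE)).items():
        print(lba, count, byte_offset(lba))
```

In the full log, all three failed Extended tests stop at the same LBA, which is consistent with one persistent bad spot rather than scattered errors.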

Either way, looks like the drive is failing. It would be a warranty claim but I am guessing that doesn’t apply anymore due to the age of the drive.

My comments back, based off of my experience and the little I know…

The Raw Read Error Rate and the other error rates likely reflect errors on actual data requests from the host. The Extended self-test is a completely internal function, which is why I suspect that value has not gone up.

With that said, from Seagate:

10^10 to 10^12. The counts are cleared when the Number Of Bits Transferred To the Host = 10^12.

Do a Google search for “Normal SATA SMART Attribute Behavior”.
You will find a white paper from Seagate that describes this stuff. It helps you diagnose problems and tell which data is and isn’t really important.

Keep in mind that this is how Seagate does it; Western Digital and others may do it differently. I know some things are different.

What I have learned over the decades is this: focus on the critical parameters and understand what they mean to the end user, not to the engineer (which I am, but not for hard drive reliability). And don’t “assume” anything; it will bite you in the ass one day.