HGST HC520 Failed SMART usage Attribute: 45, what is it?

Good morning.

Today I woke up to the following emails from my NAS:

Device: /dev/sdb [SAT], FAILED SMART self-check. BACK UP DATA NOW!.

and

Device: /dev/sdb [SAT], Failed SMART usage Attribute: 45 Unknown_Attribute..

However, about an hour after it happened I got to work, ran smartctl -a on the disk remotely, and got this:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HUH721212ALE601
Serial Number:    {REDACTED}
LU WWN Device Id: 5 000cca {REDACTED}
Firmware Version: LEGL0002
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5625
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Nov 25 08:37:42 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (   87) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   197   197   024    Pre-fail  Always       -       352 (Average 315)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       113
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       15
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32812
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
 45 Unknown_Attribute       0x0023   050   050   001    Pre-fail  Always       -       64441221375
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       647
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       647
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Min/Max 21/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       3292001523016
242 Total_LBAs_Read         0x0012   100   100   000    Old_age   Always       -       4423018296855

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     29987         -
# 2  Vendor (0x70)       Completed without error       00%     29926         -
# 3  Vendor (0x71)       Completed without error       00%     29926         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

OK, 22 is the helium level… but what is 45? And why is it not failing anymore?
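For reference, as far as I understand smartmontools, an attribute counts as failing when its normalized VALUE is at or below THRESH (FAILING_NOW), or as having failed earlier when WORST is at or below THRESH (In_the_past). Below is a minimal sketch of that rule applied to two rows copied from the output above. The regex, the `classify` helper and the "THRESH of 0 never fails" guard are my own assumptions for illustration, not smartmontools code.

```python
import re

# Two rows copied verbatim from the smartctl -a output above.
ROWS = """\
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
 45 Unknown_Attribute       0x0023   050   050   001    Pre-fail  Always       -       64441221375
"""

# Column layout of the legacy "smartctl -a" attribute table.
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)

def classify(line):
    """Return (id, name, value, worst, thresh, status) or None for non-attribute lines."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    value, worst, thresh = int(m["value"]), int(m["worst"]), int(m["thresh"])
    if thresh == 0:
        status = "ok"             # assumption: a zero threshold never trips
    elif value <= thresh:
        status = "FAILING_NOW"    # current normalized value at or below threshold
    elif worst <= thresh:
        status = "In_the_past"    # worst recorded value at or below threshold
    else:
        status = "ok"
    return int(m["id"]), m["name"], value, worst, thresh, status

results = [r for r in map(classify, ROWS.splitlines()) if r]
for r in results:
    print(r)
```

By this rule the saved table shows nothing wrong with 45 right now: VALUE 050 is far above THRESH 001, and WORST is also 050 with WHEN_FAILED at "-", so the table itself does not record a crossing.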

Should I replace it? The pool is an 8-wide RAIDZ2 and never became degraded. The attribute seems to have gone back to normal levels on its own.

All disks are used enterprise disks, HC520.

Other disks show this:

 45 Unknown_Attribute       0x0023   100   100   001    Pre-fail  Always       -       1095233372415

050 vs 100 :thinking:
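The two raw values look arbitrary in decimal but are easier to compare in hex. Here is a small sketch that splits each 48-bit raw value into three 16-bit words. What those words mean is vendor-specific and unknown to me; the split itself is only an assumption about the layout, and the labels are mine.

```python
def raw_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high word first."""
    return [(raw >> shift) & 0xFFFF for shift in (32, 16, 0)]

flagged = 64441221375      # attribute 45 on /dev/sdb (VALUE 050)
healthy = 1095233372415    # attribute 45 on another disk in the pool (VALUE 100)

for label, raw in (("flagged", flagged), ("healthy", healthy)):
    print(f"{label}: 0x{raw:012x} -> {raw_words(raw)}")

# flagged: 0x000f00ff00ff -> [15, 255, 255]
# healthy: 0x00ff00ff00ff -> [255, 255, 255]
```

So the only difference is the high word, 15 on this disk versus 255 on the others. smartctl should be able to print the attribute the same way with something like -v 45,raw16, if I read the man page correctly.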

Short SMART tests are of limited use. Run a long test
sudo smartctl -t long /dev/sdb
and then check
sudo smartctl -x /dev/sdb

To find out what this attribute means, contact HGST technical support. In any case, any error should be grounds for an RMA.

Here it goes

wailord% sudo smartctl -x /dev/sdb                                                                                                                                                                                                   
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)                                                                                                                                                 
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org                                                                                                                                                          
                                                                                                                                                                                                                                     
=== START OF INFORMATION SECTION ===                                                                                                                                                                                                 
Device Model:     HUH721212ALE601                                                                                                                                                                                                    
Serial Number:    {REDACTED}                                                                                                                                                                                                           
LU WWN Device Id: 5 000cca {REDACTED}                                                                                                                                                                                                 
Firmware Version: LEGL0002                                                                                                                                                                                                           
User Capacity:    12,000,138,625,024 bytes [12.0 TB]                                                                                                                                                                                 
Sector Sizes:     512 bytes logical, 4096 bytes physical                                                                                                                                                                             
Rotation Rate:    7200 rpm                                                                                                                                                                                                           
Form Factor:      3.5 inches                                                                                                                                                                                                         
Device is:        Not in smartctl database 7.3/5625                                                                                                                                                                                  
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4                                                                                                                                                                              
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)                                                                                                                                                                             
Local Time is:    Mon Nov 25 09:41:53 2024 CET                                                                                                                                                                                       
SMART support is: Available - device has SMART capability.                                                                                                                                                                           
SMART support is: Enabled                                                                                                                                                                                                            
AAM feature is:   Unavailable                                                                                                                                                                                                        
APM feature is:   Disabled                                                                                                                                                                                                           
Rd look-ahead is: Enabled                                                                                                                                                                                                            
Write cache is:   Enabled                                                                                                                                                                                                            
DSN feature is:   Unavailable                                                                                                                                                                                                        
ATA Security is:  Disabled, NOT FROZEN [SEC1], Master PW ID: 0xfffd                                                                                                                                                                  
Wt Cache Reorder: Enabled                                                                                                                                                                                                            
                                                                                                                                                                                                                                     
=== START OF READ SMART DATA SECTION ===                                                                                                                                                                                             
SMART overall-health self-assessment test result: PASSED                                                                                                                                                                             
                                                                                                                                                                                                                                     
General SMART Values:                                                                                                                                                                                                                
Offline data collection status:  (0x84) Offline data collection activity                                                                                                                                                             
                                        was suspended by an interrupting command from host.                                                                                                                                          
                                        Auto Offline Data Collection: Enabled.                                                                                                                                                       
Self-test execution status:      ( 249) Self-test routine in progress...                                                                                                                                                             
                                        90% of test remaining.                                                                                                                                                                       
Total time to complete Offline                                                                                                                                                                                                       
data collection:                (   87) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.                                                                                                                                                             
                                        Auto Offline data collection on/off support.                                                                                                                                                 
                                        Suspend Offline collection upon new                                                                                                                                                          
                                        command.                                                                                                                                                                                     
                                        Offline surface scan supported.                                                                                                                                                              
                                        Self-test supported.                                                                                                                                                                         
                                        No Conveyance Self-test supported.                                                                                                                                                           
                                        Selective Self-test supported.                                                                                                                                                               
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   132   132   054    -    96
  3 Spin_Up_Time            POS---   197   197   024    -    352 (Average 315)
  4 Start_Stop_Count        -O--C-   100   100   000    -    113
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   140   140   020    -    15
  9 Power_On_Hours          -O--C-   096   096   000    -    32813
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    43
 22 Helium_Level            PO---K   100   100   025    -    100
 45 Unknown_Attribute       PO---K   050   050   001    -    64441221375
192 Power-Off_Retract_Count -O--CK   100   100   000    -    647
193 Load_Cycle_Count        -O--C-   100   100   000    -    647
194 Temperature_Celsius     -O----   139   139   000    -    43 (Min/Max 21/60)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
231 Temperature_Celsius     -O--CK   100   100   000    -    0
241 Total_LBAs_Written      -O--C-   100   100   000    -    3292001882408
242 Total_LBAs_Read         -O--C-   100   100   000    -    4424118828991
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   5501  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80       GPL     R/W    688  Host vendor specific log
0x81-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xb2       GPL     VS     688  Device vendor specific log 
0xc8       GPL     VS      12  Device vendor specific log 
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     29987         -
# 2  Vendor (0x70)       Completed without error       00%     29926         -
# 3  Vendor (0x71)       Completed without error       00%     29926         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        DST executing in background (3)
Current Temperature:                    43 Celsius
Power Cycle Min/Max Temperature:     21/45 Celsius
Lifetime    Min/Max Temperature:     21/60 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (115)

Index    Estimated Time   Temperature Celsius
 116    2024-11-25 07:34    43  ************************
 ...    ..( 35 skipped).    ..  ************************
  24    2024-11-25 08:10    43  ************************
  25    2024-11-25 08:11    44  ************************* 
  26    2024-11-25 08:12    43  ************************
  27    2024-11-25 08:13    44  ************************* 
  28    2024-11-25 08:14    43  ************************
  29    2024-11-25 08:15    43  ************************
  30    2024-11-25 08:16    44  ************************* 
  31    2024-11-25 08:17    43  ************************
  32    2024-11-25 08:18    43  ************************
  33    2024-11-25 08:19    43  ************************
  34    2024-11-25 08:20    44  ************************* 
 ...    ..(  2 skipped).    ..  ************************* 
  37    2024-11-25 08:23    44  ************************* 
  38    2024-11-25 08:24    43  ************************
  39    2024-11-25 08:25    44  ************************* 
 ...    ..( 20 skipped).    ..  ************************* 
  60    2024-11-25 08:46    44  ************************* 
  61    2024-11-25 08:47    43  ************************
 ...    ..( 53 skipped).    ..  ************************
 115    2024-11-25 09:41    43  ************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled


Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              43  ---  Lifetime Power-On Resets
0x01  0x010  4           32813  ---  Power-on Hours
0x01  0x018  6   3292001882408  ---  Logical Sectors Written
0x01  0x020  6      5740448208  ---  Number of Write Commands
0x01  0x028  6   4424118828991  ---  Logical Sectors Read 
0x01  0x030  6      6665212411  ---  Number of Read Commands
0x01  0x038  6    118129181600  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           32766  ---  Spindle Motor Power-on Hours
0x03  0x010  4           32766  ---  Head Flying Hours
0x03  0x018  4             647  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4           26381  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              43  ---  Current Temperature
0x05  0x010  1              43  N--  Average Short Term Temperature
0x05  0x018  1              40  N--  Average Long Term Temperature
0x05  0x020  1              60  ---  Highest Temperature
0x05  0x028  1              21  ---  Lowest Temperature
0x05  0x030  1              58  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              48  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             154  ---  Number of Hardware Resets
0x06  0x010  4              81  ---  Number of ASR Events 
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            4  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

I see nothing conclusive here. Maybe something will show up after the long SMART test?

The long test still shows 80% remaining, and I launched it around the time of my last post :upside_down_face: I guess we’ll have to wait.
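While waiting, the progress can be pulled out of the smartctl text instead of being read by eye. A rough sketch that parses the "Self-test execution status" lines as they appear in the output above; the `self_test_remaining` helper and the sample strings are just for illustration.

```python
import re

# Status lines copied from the smartctl -x output above (test running).
RUNNING = """\
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
"""

# Status lines copied from the earlier smartctl -a output (no test running).
IDLE = """\
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
"""

def self_test_remaining(text):
    """Return the percent remaining of a running self-test, or None if none is running."""
    m = re.search(
        r"Self-test routine in progress\.\.\.\s*(\d+)% of test remaining", text
    )
    return int(m.group(1)) if m else None

print(self_test_remaining(RUNNING))  # 90
print(self_test_remaining(IDLE))     # None
```

The same function could be fed the output of sudo smartctl -a /dev/sdb from a cron job or a loop, so the check only needs attention once it returns None and the self-test log has a new entry.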