Clarification on: Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors

First, I’ll be preventively replacing the drive; it’s failing the smartctl long test at the same LBA each time (as an aside, is there no way to continue the test past that point, or is that just pointless?). The setup is 3 vdevs, each a 2-wide mirror. I’m not hugely concerned for a home NAS, but it would be a PITA to start from scratch. I also have a couple of new drives coming in, which I’ll make sure to burn in properly before adding them to the system/pool.

So, the question: is this error raised each time the drive encounters that sector, or just periodically when it’s detected during a smartctl query, or is this truly “new” (I dismissed the first alerts)? SMART isn’t indicating an increase in the values for #5 or #197/#198.
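
One way to check whether those counters actually move between alerts is to save the attribute table from two runs and compare the raw values. Below is a minimal Python sketch of that idea, assuming the table layout shown further down in this post; the helper names (`raw_values`, `changed`) and the "earlier" snapshot are made up for illustration and are not real data from this drive.

```python
import re

# Attribute IDs the thread is watching: reallocated, pending, offline-uncorrectable.
WATCHED_IDS = (5, 197, 198)

# Matches rows of the attribute table as pasted in this thread:
# ID, name, flags, VALUE, WORST, THRESH, FAIL, then the leading raw number.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def raw_values(smart_text, ids=WATCHED_IDS):
    """Return {attribute_id: raw_value} for the watched IDs found in the text."""
    found = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in ids:
            found[int(match.group(1))] = int(match.group(8))
    return found


def changed(before, after):
    """Return {id: (old, new)} for watched attributes whose raw value differs."""
    return {
        attr: (before.get(attr), after[attr])
        for attr in after
        if before.get(attr) != after[attr]
    }


# Rows copied from the report in this post.
current = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    1
198 Offline_Uncorrectable   ----C-   100   100   000    -    1
"""

# Hypothetical earlier snapshot, invented only to exercise the comparison.
earlier = current.replace("    -    1", "    -    0")

print(changed(raw_values(earlier), raw_values(current)))
```

If the comparison comes back empty between two alerts, the alert is most likely repeating the same pending sector rather than reporting a new one, though that is an inference from the counters and not something the script can prove.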

FYI, there are no ZFS errors (a scrub is incoming once the long tests across all my drives are done).

Thanks in advance. It’s been a while since I ran an enthusiast/home NAS with ZFS, so I’m just getting back into it.

In case anyone is curious (EXOS Drives, CMR):

Device Model:     ST20000NM004E-3HR103
Serial Number:    ZX209FXA
LU WWN Device Id: 5 000c50 0e82f4c66
Firmware Version: SN01
User Capacity:    20,000,588,955,648 bytes [20.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5625
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec  9 10:18:22 2024 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Write SCT (Get) Feature Control Command failed: scsi error aborted command
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1676) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   080   064   044    -    111201752
  3 Spin_Up_Time            PO----   092   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    133
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   080   061   045    -    101292906
  9 Power_On_Hours          -O--CK   096   096   000    -    4025
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    7
 18 Unknown_Attribute       PO-R--   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   073   049   000    -    27 (Min/Max 23/32)
192 Power-Off_Retract_Count -O--CK   100   100   000    -    3
193 Load_Cycle_Count        -O--CK   096   096   000    -    8055
194 Temperature_Celsius     -O---K   027   051   000    -    27 (0 20 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    1
198 Offline_Uncorrectable   ----C-   100   100   000    -    1
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
200 Multi_Zone_Error_Rate   PO---K   100   100   001    -    0
240 Head_Flying_Hours       ------   100   100   000    -    1241 (184 169 0)
241 Total_LBAs_Written      ------   100   253   000    -    35474834624
242 Total_LBAs_Read         ------   100   253   000    -    114784199762
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W      8  Device Statistics Notification
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    768  Current Device Internal Status Data log
0x2f       GPL     R/O      1  Set Sector Configuration
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS     160  Device vendor specific log
0xa2       GPL     VS   16320  Device vendor specific log
0xa4       GPL,SL  VS     160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xad       GPL     VS      16  Device vendor specific log
0xb1       GPL,SL  VS     160  Device vendor specific log
0xb4       GPL,SL  VS      16  Device vendor specific log
0xb6       GPL     VS    1920  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc1       GPL,SL  VS       8  Device vendor specific log
0xc3       GPL,SL  VS      24  Device vendor specific log
0xc6       GPL     VS    5184  Device vendor specific log
0xc7       GPL,SL  VS       8  Device vendor specific log
0xc9       GPL,SL  VS       8  Device vendor specific log
0xca       GPL,SL  VS      16  Device vendor specific log
0xcd       GPL,SL  VS       1  Device vendor specific log
0xce       GPL     VS       1  Device vendor specific log
0xcf       GPL     VS     512  Device vendor specific log
0xd1       GPL     VS     656  Device vendor specific log
0xd2       GPL     VS   10256  Device vendor specific log
0xd4       GPL     VS    2048  Device vendor specific log
0xda       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4016         -
# 2  Extended offline    Completed: read failure       90%      4006         5946051152
# 3  Short offline       Completed without error       00%      4003         -
# 4  Extended offline    Completed: read failure       90%      4002         5946051152
# 5  Short offline       Completed without error       00%      3999         -
# 6  Short offline       Completed without error       00%      3992         -
# 7  Short offline       Completed without error       00%      3967         -
# 8  Extended offline    Completed without error       00%      3945         -
# 9  Short offline       Completed without error       00%         0         -

The drive has been running for less than half a year.

It is parking its arm an average of twice per hour.

It fails the long self-test with a read error at a specific LBA.

It has pending / uncorrectable sectors.

This sounds like infant mortality. I wouldn’t trust it.
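
For anyone who wants to check those numbers, here is a rough Python sketch of the arithmetic, using the raw values of attributes 9 and 193 from the report above. The `per_hour` helper is just an illustrative name.

```python
# Raw values copied from the SMART attribute table in the original post.
POWER_ON_HOURS = 4025      # attribute 9, Power_On_Hours
LOAD_CYCLE_COUNT = 8055    # attribute 193, Load_Cycle_Count


def per_hour(count, hours):
    """Average events per hour; raises ValueError on a non-positive hour count."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


power_on_days = POWER_ON_HOURS / 24
cycles_per_power_on_hour = per_hour(LOAD_CYCLE_COUNT, POWER_ON_HOURS)

print(f"powered on for about {power_on_days:.0f} days")
print(f"{cycles_per_power_on_hour:.2f} load cycles per power-on hour")
```

That works out to roughly 168 days of power-on time and about 2.0 load cycles per power-on hour, averaged over the drive's whole life, so it says nothing about when the cycles actually happened.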


Since this is a Seagate Exos, I bet it is using “energy efficient” EPC defaults, which is why it keeps parking the drive head, even though TrueNAS defaults to disabling all “APM” features. (These Seagate drives ignore APM settings, since they use EPC instead.) You can use the Linux version of SeaChest to permanently disable the EPC settings for each Exos drive you purchase.

You’ll hear people say that such enterprise drives are rated for a trillion-billion load cycles, but I just don’t trust the manufacturers. I’d rather my HDDs constantly spin and never park their heads, unless I power off the system.


Ah, yeah, that’s the Load_Cycle_Count, I’m assuming? Thanks for pointing it out. This drive has only been running in this system for a little over a week (and yes, I did run a long test).

It’s been running for months off my Mac, lol. I was trying to avoid getting back into building out a system; I ran OpenSolaris/ZFS and eventually FreeNAS at home but gave all that up over a decade ago. This is part of the reason I figured I needed to stop using that janky setup and build out this system. I checked my other drives: Load_Cycle_Count is low (8-10) for the drives that are “new” to this system, while it is high for the ones that were on the Mac (the partner drive is close). Sigh. Oh well, I’m definitely going to pull the Mac drives; it will just take some time as new drives come in, get burned in, and are resilvered in as replacements.

Your drive failed at 4002 hours because of what you described: the long test failure. Replacing it is not preventive but a reactive, required repair. The drive will not get better.

If the drive is under warranty, RMA the drive.

What happened is that the drive tried to read that one sector but couldn’t, and it is highly likely there are many more sectors it cannot read beyond that area due to a physical defect. I believe you are seeing two things: 1) the test failure, and 2) the drive attempting to write data to a failing sector (IDs 197/198). The second does not need to be the sector identified by the test failure; it can be anywhere else on the drive.
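
To make the test failure a bit more concrete, here is a small Python sketch that pulls the failing entries out of the self-test log posted above and converts the reported LBA into a byte offset. It assumes the LBA is counted in 512-byte logical sectors, as the report's "Sector Sizes" line suggests; the regex and the `failed_tests` helper are illustrative, not part of smartmontools.

```python
import re

LOGICAL_SECTOR_BYTES = 512      # "Sector Sizes: 512 bytes logical"
PHYSICAL_SECTOR_BYTES = 4096    # "... 4096 bytes physical"
CAPACITY_BYTES = 20_000_588_955_648  # "User Capacity" from the report

# Rows copied from the extended self-test log in the original post.
SELF_TEST_LOG = """\
# 1  Short offline       Completed without error       00%      4016         -
# 2  Extended offline    Completed: read failure       90%      4006         5946051152
# 3  Short offline       Completed without error       00%      4003         -
# 4  Extended offline    Completed: read failure       90%      4002         5946051152
"""

# Entry number, free-text description/status, remaining %, lifetime hours, LBA or "-".
ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def failed_tests(log_text):
    """Return (lifetime_hours, lba) for each row that reports a first-error LBA."""
    failures = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match and match.group(5) != "-":
            failures.append((int(match.group(4)), int(match.group(5))))
    return failures


failures = failed_tests(SELF_TEST_LOG)
distinct_lbas = {lba for _, lba in failures}

for lba in sorted(distinct_lbas):
    byte_offset = lba * LOGICAL_SECTOR_BYTES
    physical_sector = byte_offset // PHYSICAL_SECTOR_BYTES
    fraction = byte_offset / CAPACITY_BYTES
    print(f"LBA {lba}: byte offset {byte_offset}, "
          f"physical sector {physical_sector}, {fraction:.1%} into the drive")
```

Both failed long tests (at 4002 and 4006 hours) report the same first-error LBA, so the log only ever shows where the test stopped. It cannot tell you whether sectors past that point are readable.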

Hope that helps some.


For sure, the timing just sucks (I guess there’s never a great time). My confidence in the one hot spare isn’t great, but it at least passed another long test, and I’m not going to get the replacement drives tested since I’m traveling/remote, so here we go. Hopefully no more failures.

I guess I’ll skip the scrub and start the replacement.

(FYI, these are all “reman’d” drives, hence the confidence comment.)