Current_Pending_Sectors and Multi_Zone_Error_Rate after S.M.A.R.T test

If I’m dealing with pending sectors on a TrueNAS SCALE disk, are these possible steps to address my issue?

  1. Offline the Disk in the GUI: Go to the Storage Menu > Disks, select the problematic disk, and choose to offline it.

  2. Enable Raw Write Mode: Execute the following command in the terminal: sysctl kern.geom.debugflags=0x10. This enables raw write mode for the disk.

  3. Wipe the Disk: Use the dd command to wipe the disk. For example: dd if=/dev/sdc of=/dev/sda bs=512

  4. Disable Raw Write Mode: Run this command: sysctl kern.geom.debugflags=0x00

  5. Online the Disk in the GUI: Go back to the Storage Menu > Disks and bring the disk online.

  6. Run a Scrub: Initiate a scrub from the GUI.

Or should I follow this procedure instead:

  1. Offline the Disk:

    • First, identify the disk you want to take offline. You can do this using the zpool status command or the TrueNAS web interface.
    • Use the following command to offline the disk (replace pool_name and disk_name with your actual pool and disk names):
      sudo zpool offline pool_name disk_name
      
    • This command will take the specified disk offline.
  2. Wipe the Disk:

    • After taking the disk offline, you can wipe it using the wipefs command. Be cautious, as this will erase all data on the disk:
      sudo wipefs -a /dev/disk_name
      
    • Replace disk_name with the actual name of the disk.
  3. Bring the Disk Back Online:

    • To bring the disk back online, use the following command:
      sudo zpool online pool_name disk_name
      
    • The disk will be added back to the pool, and the resilvering process will begin.

The smartctl output for the disk is the following:

admin@truenas[~]$ sudo smartctl --all /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.29-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00DC0B0
Serial Number:    WD-WCC1T1020321
LU WWN Device Id: 5 0014ee 20889a4ba
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun  4 15:43:59 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 243) Self-test routine in progress...
                                        30% of test remaining.
Total time to complete Offline 
data collection:                (39360) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 395) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70b5) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       538
  3 Spin_Up_Time            0x0027   186   173   021    Pre-fail  Always       -       5666
  4 Start_Stop_Count        0x0032   085   085   000    Old_age   Always       -       15646
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23357
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2518
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       92
193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       115432
194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       266
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       56

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     23294         1565528952
# 2  Extended offline    Completed without error       00%     23245         -
# 3  Short offline       Completed without error       00%     23175         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Or can I ignore the errors reported for attribute IDs 197 and 200?
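
For anyone who wants to check these attributes from a script rather than by eye, here is a minimal Python sketch that pulls the raw values out of the attribute table printed above. The set of watched IDs (5, 197, 198, 200) and the function names are illustrative choices, not something smartctl or TrueNAS defines, and the regular expression assumes the column layout shown in this post.

```python
import re

# Attribute IDs to watch. This selection is an assumption made for the example.
WATCHED_IDS = {5, 197, 198, 200}

# One row of the attribute table as printed by `smartctl --all` above:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_raw_values(smartctl_text):
    """Return {attribute_id: (name, raw_value)} for every table row found."""
    rows = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            rows[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return rows

def nonzero_watched(rows):
    """Keep only the watched attributes whose raw value is not zero."""
    return {i: rows[i] for i in sorted(WATCHED_IDS) if i in rows and rows[i][1] != 0}

# Four rows copied from the output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       266
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0008   200   195   000    Old_age   Offline      -       56
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in nonzero_watched(parse_raw_values(SAMPLE)).items():
        print(attr_id, name, raw)
```

On this disk the sketch would flag attributes 5, 197 and 200, since all three have non-zero raw values.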

I would have expected that in SCALE you could do the whole procedure in the GUI?
Offline; Wipe; Replace and allow Resilver…

Maybe this can be done, but I have read that with minor errors such as Current Pending Sectors, the sectors can often be recovered with the dd command. I have already taken a backup of my RAID1 pool and will try this procedure that I found: fixing-freenas-error-currently-unreadable-pending-sectors.

Offlining a drive in a raidz1 means no more redundancy, which is a dangerous situation.
It is better to replace the drive first; then you can do what you want with the old drive.

Writing to the drive will not “repair” it; it only forces the drive to reallocate the pending sectors. Failed sectors are grounds for an RMA.
Multi_Zone_Error_Rate could also point to a cabling or controller issue. You should check the other drives and set up regular SMART tests.
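
To make the "writing forces reallocation" point concrete: this drive reports 512-byte logical and 4096-byte physical sectors, so a rewrite would normally need to cover the whole physical sector containing a bad LBA. Here is a minimal Python sketch of that arithmetic, using the LBA from the failed short test in the log above. The function name is made up for illustration, and the code only computes numbers; it does not touch any disk.

```python
LOGICAL = 512    # bytes per logical sector, from "Sector Sizes" in the smartctl output
PHYSICAL = 4096  # bytes per physical sector, from the same line

def physical_block(lba, logical=LOGICAL, physical=PHYSICAL):
    """Map a logical LBA to its physical sector.

    Returns (physical block index, first LBA of that block, LBAs per block).
    """
    per_block = physical // logical
    index = lba // per_block
    return index, index * per_block, per_block

if __name__ == "__main__":
    # LBA_of_first_error reported by the failed short self-test in this thread.
    index, first_lba, per_block = physical_block(1565528952)
    print("physical block:", index)
    print("covers LBAs:", first_lba, "to", first_lba + per_block - 1)
    print("byte offset:", index * PHYSICAL)
```

With 266 pending sectors and only the first failing LBA known, this covers one block out of many, which is part of why replacing the drive is the more practical route.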

@etorix is correct: you’re fooling yourself if you simply move pending sectors to reallocated sectors; it still means the drive is going bad or has a defect. I never accept a new drive with pending sectors. Offlining will mean no parity checks. All the steps you are thinking about should be avoided. Definitely do not simply ignore the errors either. The drive should be RMA’d if it is under warranty. You need a new drive and need to replace the old one, ideally in the UI.

@Jan_Tiedemann
What you are asking, if I am reading this post correctly, is how to recover full operation of a hard drive that has quite a few errors, including the really big one: it will not pass a self-test. As the others have said, replace the drive. If it is under warranty, you may be able to get an advance RMA: the replacement drive ships to you, you swap it in, and you use the same box to return the failed drive.
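
The failed self-test is visible in the "SMART Self-test log" section of the output posted above. As a rough sketch, the following Python picks the failed entries out of that log. It assumes the column layout shown in this thread and treats any status beginning with "Completed:" (for example "Completed: read failure") as a failure; both the function name and that rule are assumptions made for the example.

```python
import re

# One entry of the self-test log: "# N  description  status  NN%  hours  LBA-or-dash".
# Columns are assumed to be separated by two or more spaces, as in the post above.
LOG_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)

def failed_self_tests(log_text):
    """Return (number, description, status, lifetime_hours, first_error_lba) per failure."""
    failures = []
    for line in log_text.splitlines():
        match = LOG_ROW.match(line)
        if not match:
            continue
        num, desc, status, _remaining, hours, lba = match.groups()
        # Assumption: failed tests are reported as "Completed: <kind> failure".
        if status.startswith("Completed:"):
            failures.append(
                (int(num), desc, status, int(hours), int(lba) if lba.isdigit() else None)
            )
    return failures

# The three log entries copied from this thread.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     23294         1565528952
# 2  Extended offline    Completed without error       00%     23245         -
# 3  Short offline       Completed without error       00%     23175         -
"""

if __name__ == "__main__":
    for entry in failed_self_tests(SAMPLE_LOG):
        print(entry)
```

For this drive it reports one failure: the short test at 23294 power-on hours, roughly 50 hours after the last clean extended test.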

You have been given good, sound advice if you want to retain your data. If you still want to try forcing the drive to remap the sectors, you have 266 of them and likely a lot more, and the time you spend forcing this could run to weeks. Look for my hard drive troubleshooting guide in the TrueNAS forum (you will have to google it). There is a section on how to do what you are asking.

This reads more like the platter surface is flaking off; it happens to us all.

Back up your data! You have no idea how easy it is to write over the wrong drive.

2.66 years powered on. If under warranty you should get it replaced.
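
That figure comes from the Power_On_Hours raw value (23357) in the SMART output. A quick Python sketch of the conversion, assuming 365-day years, follows. Note that this is spin time, not calendar age, and warranties are normally counted from the purchase or manufacture date, so the drive may well be older than this number suggests.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap years

def power_on_years(power_on_hours):
    """Convert a Power_On_Hours raw value into years of continuous operation."""
    return power_on_hours / HOURS_PER_YEAR

if __name__ == "__main__":
    # Raw value of attribute 9 from the smartctl output in this thread.
    print(f"{power_on_years(23357):.3f} years powered on")
```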

I’d buy a NAS drive and replace it now, while awaiting the RMA…
