How do I run a scrub, and how do I view "ID5"?

I have "unreadable (pending) sectors", and the suggestion seems to be to “run a scrub”, then run an extended SMART test, and monitor the “ID5” value. Great, but I don’t know how to do any of those things.

I see where I can set up a scrub that runs periodically, and I see where the SMART data is, but I don’t see how to just start a scrub, nor do I see “ID5” in the SMART data that I currently have.

The scrub runs on Sunday. Should I just hang tight and wait to see what that scrub does?

What is the output of zpool status?

You should immediately run a SMART long test on the drive, then post the SMART data. Assuming you have enough parity on the pool (i.e., RAIDZ2), you can wait until Sunday for the scheduled scrub.

If you don’t, I would not run a scrub until you restore parity.

truenas% zpool status
  pool: Main
 state: ONLINE
  scan: scrub repaired 0B in 3 days 07:39:39 with 0 errors on Wed Jul 24 07:39:42 2024
config:

	NAME                                            STATE     READ WRITE CKSUM
	Main                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/3565c104-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
	    gptid/3582f5df-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
	    gptid/35c2d11f-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
	    gptid/35ddaa8f-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
	    gptid/35a0c1bc-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:12 with 0 errors on Thu Aug 15 03:45:12 2024
config:

	NAME        STATE     READ WRITE CKSUM
	boot-pool   ONLINE       0     0     0
	  nvd0p2    ONLINE       0     0     0

errors: No known data errors

I started a SMART long test and it says it will be done 2024-08-20 09:44:56

You can totally wait for the scheduled scrub.

You can manually bring up that data by using smartctl -a /dev/sdX
Replace X with whatever letter matches the drive you want to examine.

That will query the drive and produce a report that includes a table akin to the one below:

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       3893
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       18446
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
 23 Helium_Condition_Lower  0x0023   100   100   075    Pre-fail  Always       -       0
 24 Helium_Condition_Upper  0x0023   100   100   075    Pre-fail  Always       -       0
 27 MAMR_Health_Monitor     0x0023   100   100   030    Pre-fail  Always       -       527199
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       58
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       34 (Min/Max 18/48)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       170000387
222 Loaded_Hours            0x0032   054   054   000    Old_age   Always       -       18446
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       691
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       161903853171
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       369727046955

Note the leftmost column header, “ID#”. ID5 is the row whose ID# is 5 in that table: Reallocated_Sector_Ct, the reallocated sector count.
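
If you only want the rows that matter here rather than the whole report, a small filter can pull them out. This is a minimal sketch, not anything TrueNAS ships: the function name smart_watch is made up, and including IDs 197 and 198 alongside ID 5 is my assumption, based on the alert being about pending sectors. It reads the attribute table on stdin and prints ID#, name and RAW_VALUE.

```shell
# Hypothetical helper: print ID#, ATTRIBUTE_NAME and RAW_VALUE for
# attributes 5, 197 and 198. Reads `smartctl -A` output on stdin.
# The NF >= 10 check skips short rows from other tables whose first
# field also happens to be 5.
smart_watch() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $1, $2, $10 }'
}

# Typical use (needs root; replace sdX with your own device):
#   smartctl -A /dev/sdX | smart_watch
```

A raw value of 0 on all three rows is what you hope to see.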

My disk seems to be called /dev/ada3 instead of /dev/sdX? If that is true, then ID5 is:

5 Reallocated_Sector_Ct 0x0033 100 100 001 Pre-fail Always - 0

I would paste the whole output but I can’t get it to paste right.

That’s good. If the data is recent enough, you can continue using the drive, but keep it under monitoring (weekly long tests).
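
For reference, both checks mentioned in this thread can also be started from a shell. A minimal sketch, assuming the pool name Main and the disk /dev/ada3 from the output posted here; the helper names run, start_long_test and start_scrub are made up for illustration. It only prints the commands unless DRY_RUN=0 is set, so nothing starts by accident.

```shell
# Prints the commands by default; set DRY_RUN=0 to really run them (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = whole-disk device, e.g. /dev/ada3
start_long_test() {
    run smartctl -t long "$1"   # extended (long) self-test
}

# $1 = pool name, e.g. Main
start_scrub() {
    run zpool scrub "$1"        # starts a scrub now instead of on schedule
}

# Example:
#   start_long_test /dev/ada3
#   start_scrub Main
```

Afterwards, zpool status Main shows scrub progress on its "scan:" line, and smartctl -l selftest /dev/ada3 lists the self-test results.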

The “3” refers to partition 3 of the disk “ada” but smartctl works on the whole disk either way, so /dev/ada and /dev/ada3 would result in the same output. Nothing to be concerned about.

Edit: See @dan’s post below for a correction to my statement above.

It would be interesting to see the whole output.
Put it inside the following tags:
[code]

[/code]
And it should format correctly.

I have this:

root@truenas[~]# ls -dl /dev/ada*
crw-r-----  1 root  operator  0x89 May  8 20:32 /dev/ada0
crw-r-----  1 root  operator  0x90 May  8 20:32 /dev/ada0p1
crw-r-----  1 root  operator  0x92 May  8 20:32 /dev/ada0p2
crw-r-----  1 root  operator  0xab May  8 20:32 /dev/ada1
crw-r-----  1 root  operator  0xb3 May  8 20:32 /dev/ada1p1
crw-r-----  1 root  operator  0xb5 May  8 20:32 /dev/ada1p2
crw-r-----  1 root  operator  0xad May  8 20:32 /dev/ada2
crw-r-----  1 root  operator  0xb7 May  8 20:32 /dev/ada2p1
crw-r-----  1 root  operator  0xb9 May  8 20:32 /dev/ada2p2
crw-r-----  1 root  operator  0xbb May  8 20:32 /dev/ada3
crw-r-----  1 root  operator  0xc7 May  8 20:32 /dev/ada3p1
crw-r-----  1 root  operator  0xc9 May  8 20:32 /dev/ada3p2
crw-r-----  1 root  operator  0xbd May  8 20:32 /dev/ada4
crw-r-----  1 root  operator  0xcb May  8 20:32 /dev/ada4p1
crw-r-----  1 root  operator  0xcd May  8 20:32 /dev/ada4p2

No, it doesn’t; it refers to the disk ada3. BSD isn’t Linux. A partition of a BSD disk would be denoted with the letter p, as in ada3p2.
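
To make the naming concrete, here is a small sketch that maps a partition name back to its whole-disk name. The helper name whole_disk is made up for illustration, and it only handles the "p" plus digits suffix seen in the listing above, not other partitioning schemes.

```shell
# Hypothetical helper: strip a trailing "p<digits>" partition suffix
# from a FreeBSD device name. Names without that suffix pass through.
whole_disk() {
    printf '%s\n' "$1" | sed -E 's/p[0-9]+$//'
}

# Examples:
#   whole_disk /dev/ada3p2   -> /dev/ada3
#   whole_disk /dev/ada3     -> /dev/ada3
```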

Thanks for the correction.

I thought you meant you were trying to post the full output of the smartctl command.
At least, that’s the one I meant when I said it would be interesting to see.

I was but thought it would no longer be interesting. But here it is:

root@truenas[~]# smartctl -a /dev/ada3
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD140EFGX-68B0GN0
Serial Number:    9LHP2SBG
LU WWN Device Id: 5 000cca 28fd7ada2
Firmware Version: 85.00A85
User Capacity:    14,000,519,643,136 bytes [14.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Aug 19 13:36:27 2024 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		(  101) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1459) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   001    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   134   134   054    Old_age   Offline      -       104
  3 Spin_Up_Time            0x0007   083   083   001    Pre-fail  Always       -       349 (Average 328)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   100   100   001    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   001    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       20568
 10 Spin_Retry_Count        0x0012   100   100   001    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       722
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       722
194 Temperature_Celsius     0x0002   031   031   000    Old_age   Always       -       45 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20563         -
# 2  Extended offline    Completed without error       00%        32         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I included the prompt and the command so you can see exactly what I ran.

Replace it: 1 Offline Uncorrectable, 8 Pending Sectors, and a read failure during the SMART test.

Agreed. The two SMART attributes you cite wouldn’t necessarily put me there (though they’d concern me), but the failed SMART test definitely would.
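
Since the failed self-test is the deciding factor, it can be worth checking the self-test log on its own. A rough sketch, with a made-up function name: it reads the self-test log on stdin and prints entries whose status contains the word "failure", as in the log above. Other non-success statuses (an aborted test, for example) are not covered.

```shell
# Hypothetical helper: print self-test log entries whose status mentions
# "failure". Exit status is 0 if at least one such entry was found,
# 1 otherwise.
selftest_failures() {
    awk '/^# *[0-9]+ / && /failure/ { print; found = 1 } END { exit !found }'
}

# Typical use (needs root):
#   smartctl -l selftest /dev/ada3 | selftest_failures && echo "failed test on record"
```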

Thank you.
As the others have already mentioned, the SMART report shows a failed test.
Since the only other extended test on record ran at 32 power-on hours, it’s difficult to say how long the drive has had an issue.

If the drive is under warranty, the failed test makes it a no-brainer to ask for an RMA.
If it’s not, then you need to decide how you value your data.

Ok. It is a WD 14TB Red Plus SATA III 3.5" Internal NAS HDD purchased 2/26/2022, but B&H says that it is no longer available.

Is the WD 14TB Red Pro 7200 rpm SATA III 3.5" Internal NAS HDD a close enough match? (It changed from Plus to Pro, but I don’t know what that means.)

Really, any 14 TB drive is likely to be fine. But you should be able to RMA this one to WD; I recall they have a 3-yr warranty.
