pedz
August 19, 2024, 1:30pm
1
I have "unreadable (pending) sectors" and the suggestion seems to be to “run a scrub”, then run an extended SMART test, and monitor the “ID5” value. Great, but I don’t know how to do any of those things.
I see where I can set up a scrub that runs periodically and I see where the SMART data is but I don’t see how to just start a scrub nor do I see “ID5” in the SMART data that I have currently.
The scrub runs on Sunday. Should I just hang tight and wait to see what that scrub does?
Davvo
August 19, 2024, 1:43pm
2
What is the output of zpool status?
You should immediately run a SMART long test on the drive, then post the SMART data; assuming you have enough parity on the pool (i.e., RAIDZ2), you can wait until Sunday for the scheduled scrub.
If you don’t, I would not run a scrub until you restore parity.
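For anyone working from a shell rather than the web UI, here is a minimal sketch of the commands behind those steps. The helper names and the placeholder pool and device are made up for illustration; the underlying commands are `zpool scrub`, `smartctl -t long`, and `smartctl -a`. By default the sketch only prints what it would run.

```python
import shlex
import subprocess


def scrub_command(pool: str) -> list[str]:
    """Command that asks ZFS to start a scrub of `pool` right away."""
    return ["zpool", "scrub", pool]


def long_test_command(device: str) -> list[str]:
    """Command that starts an extended (long) SMART self-test on `device`."""
    return ["smartctl", "-t", "long", device]


def report_command(device: str) -> list[str]:
    """Command that prints the full SMART report, attribute table included."""
    return ["smartctl", "-a", device]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(shlex.join(cmd))
    if dry_run:
        return ""
    # smartctl uses its exit status as a bitmask of findings, so a non-zero
    # status is not treated as a hard error here.
    result = subprocess.run(cmd, check=False, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical names: substitute your own pool and disk.
    run(long_test_command("/dev/ada0"))
    run(report_command("/dev/ada0"))
    run(scrub_command("tank"))
```

The long test runs inside the drive in the background, so the report command can be repeated later to see the result in the self-test log.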
pedz
August 19, 2024, 2:24pm
3
truenas% zpool status
  pool: Main
 state: ONLINE
  scan: scrub repaired 0B in 3 days 07:39:39 with 0 errors on Wed Jul 24 07:39:42 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        Main                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/3565c104-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
            gptid/3582f5df-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
            gptid/35c2d11f-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
            gptid/35ddaa8f-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0
            gptid/35a0c1bc-9997-11ec-94a1-3cecef623c38  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:12 with 0 errors on Thu Aug 15 03:45:12 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: No known data errors
pedz
August 19, 2024, 2:28pm
4
I started a SMART long test and it says it will be done 2024-08-20 09:44:56
1 Like
Davvo
August 19, 2024, 2:32pm
5
You can totally wait for the scheduled scrub.
You can manually bring up that data by using smartctl -a /dev/sdX
Replace X with whatever letter matches the drive you want to examine.
That will query the drive and produce a report that includes a table akin to the one below:
SMART overall-health self-assessment test result: PASSED
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       3893
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       18446
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
 23 Helium_Condition_Lower  0x0023   100   100   075    Pre-fail  Always       -       0
 24 Helium_Condition_Upper  0x0023   100   100   075    Pre-fail  Always       -       0
 27 MAMR_Health_Monitor     0x0023   100   100   030    Pre-fail  Always       -       527199
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       58
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       34 (Min/Max 18/48)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       170000387
222 Loaded_Hours            0x0032   054   054   000    Old_age   Always       -       18446
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       691
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       161903853171
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       369727046955
Note the leftmost column header, “ID#”. ID5 corresponds to the row whose ID# is 5 in that table, Reallocated_Sector_Ct (the reallocated sector count).
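If you would rather not read the table by eye, a rough sketch like the following pulls out the rows of interest from saved `smartctl -a` output. It assumes the ten-column attribute layout shown in this thread; the function names are made up, and watching IDs 197 and 198 alongside ID 5 is my own addition, since they are the other sector-related counters.

```python
WATCHED_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable


def parse_attributes(report: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns, start with a numeric ID,
        # and carry a hex FLAG in the third column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):
            continue
        raw = fields[9]
        if raw.isdigit():
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def sector_warnings(report: str) -> dict[str, int]:
    """Return the watched attributes whose raw value is non-zero."""
    return {
        name: raw
        for attr_id, (name, raw) in parse_attributes(report).items()
        if attr_id in WATCHED_IDS and raw > 0
    }


if __name__ == "__main__":
    # Two rows copied from the example table in this post.
    sample = (
        "5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0\n"
        "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0\n"
    )
    print(sector_warnings(sample))  # {} because both raw values are zero
```

An empty result only means those counters are zero; it says nothing about the self-test log, which is a separate section of the report.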
1 Like
pedz
August 19, 2024, 2:45pm
8
My disk seems to be called /dev/ada3 instead of /dev/sdX? If that is true, then ID5 is:
5 Reallocated_Sector_Ct 0x0033 100 100 001 Pre-fail Always - 0
I would paste the whole output but I can’t get it to paste right.
Davvo
August 19, 2024, 2:47pm
9
That’s good. If the data is recent enough, you can continue using the drive, but keep it monitored (weekly long tests).
1 Like
The “3” refers to partition 3 of the disk “ada” but smartctl works on the whole disk either way, so /dev/ada and /dev/ada3 would result in the same output. Nothing to be concerned about.
Edit: See @dan’s post below for a correction to my statement above.
It would be interesting to see the whole output.
Put it inside the following tags:
[code]
[/code]
And it should format correctly.
1 Like
pedz
August 19, 2024, 3:34pm
12
I have this:
root@truenas[~]# ls -dl /dev/ada*
crw-r----- 1 root operator 0x89 May 8 20:32 /dev/ada0
crw-r----- 1 root operator 0x90 May 8 20:32 /dev/ada0p1
crw-r----- 1 root operator 0x92 May 8 20:32 /dev/ada0p2
crw-r----- 1 root operator 0xab May 8 20:32 /dev/ada1
crw-r----- 1 root operator 0xb3 May 8 20:32 /dev/ada1p1
crw-r----- 1 root operator 0xb5 May 8 20:32 /dev/ada1p2
crw-r----- 1 root operator 0xad May 8 20:32 /dev/ada2
crw-r----- 1 root operator 0xb7 May 8 20:32 /dev/ada2p1
crw-r----- 1 root operator 0xb9 May 8 20:32 /dev/ada2p2
crw-r----- 1 root operator 0xbb May 8 20:32 /dev/ada3
crw-r----- 1 root operator 0xc7 May 8 20:32 /dev/ada3p1
crw-r----- 1 root operator 0xc9 May 8 20:32 /dev/ada3p2
crw-r----- 1 root operator 0xbd May 8 20:32 /dev/ada4
crw-r----- 1 root operator 0xcb May 8 20:32 /dev/ada4p1
crw-r----- 1 root operator 0xcd May 8 20:32 /dev/ada4p2
dan
August 19, 2024, 4:14pm
13
No, it doesn’t; it refers to the disk ada3. BSD isn’t Linux. A partition of a BSD disk would be denoted with the letter p, as in ada3p2.
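To make the naming rule concrete, here is a small sketch that splits a FreeBSD device node name into its parts. It assumes GPT-style names only (driver letters, a unit number, and an optional `p<N>` partition suffix); other partitioning schemes use different suffixes and are not handled. The function name is made up for illustration.

```python
import re

# Assumption: GPT-style FreeBSD names such as ada3, ada3p2 or nvd0p2.
DEVICE_RE = re.compile(
    r"^(?:/dev/)?(?P<driver>[a-z]+)(?P<unit>\d+)(?:p(?P<part>\d+))?$"
)


def parse_device(name: str) -> dict:
    """Split a device node name into driver, unit number and partition."""
    m = DEVICE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised device name: {name!r}")
    part = m.group("part")
    return {
        "driver": m.group("driver"),
        "unit": int(m.group("unit")),
        # None means the name refers to the whole disk.
        "partition": int(part) if part is not None else None,
    }


if __name__ == "__main__":
    print(parse_device("/dev/ada3"))
    # {'driver': 'ada', 'unit': 3, 'partition': None}
    print(parse_device("/dev/ada3p2"))
    # {'driver': 'ada', 'unit': 3, 'partition': 2}
```

So the trailing digit in ada3 is the unit number of the disk, and a partition only appears once there is a `p` suffix.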
3 Likes
Thanks for the correction.
I thought you meant you were trying to post the full output of the smartctl command.
At least, that’s the one I meant when I said it would be interesting to see.
pedz
August 19, 2024, 6:38pm
16
I was but thought it would no longer be interesting. But here it is:
root@truenas[~]# smartctl -a /dev/ada3
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD140EFGX-68B0GN0
Serial Number: 9LHP2SBG
LU WWN Device Id: 5 000cca 28fd7ada2
Firmware Version: 85.00A85
User Capacity: 14,000,519,643,136 bytes [14.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Aug 19 13:36:27 2024 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 101) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1459) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   001    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   134   134   054    Old_age   Offline      -       104
  3 Spin_Up_Time            0x0007   083   083   001    Pre-fail  Always       -       349 (Average 328)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   100   100   001    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   001    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       20568
 10 Spin_Retry_Count        0x0012   100   100   001    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       722
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       722
194 Temperature_Celsius     0x0002   031   031   000    Old_age   Always       -       45 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x000a   100   100   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20563         -
# 2  Extended offline    Completed without error       00%        32         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I included the prompt and the command so you can see exactly what I’m running.
Davvo
August 19, 2024, 7:46pm
17
Replace it: 1 Offline Uncorrectable, 8 Pending Sectors, and a read failure during the SMART test.
2 Likes
dan
August 19, 2024, 8:25pm
18
Davvo:
Replace it
Agreed. The two SMART attributes you cite wouldn’t necessarily put me there (though they’d concern me), but the failed SMART test definitely would.
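Since the self-test log is what settles it for me, here is a rough sketch of picking failed tests out of saved `smartctl -a` output. It assumes log rows begin with `#` and end with the Remaining, LifeTime(hours), and LBA_of_first_error columns, as in the output posted above; the function name is made up.

```python
def failed_self_tests(report: str) -> list[dict]:
    """Return the self-test log entries whose status mentions a failure."""
    failures = []
    for line in report.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        # e.g. "# 1 Extended offline Completed: read failure 90% 20563 -"
        tokens = line.lstrip("#").split()
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue
        # The last three tokens are Remaining, LifeTime(hours), LBA.
        remaining, hours, lba = tokens[-3:]
        if not hours.isdigit():
            continue
        # Everything in between is the test description plus its status.
        text = " ".join(tokens[1:-3])
        if "failure" in text.lower():
            failures.append(
                {"num": int(tokens[0]), "text": text,
                 "hours": int(hours), "lba": lba}
            )
    return failures


if __name__ == "__main__":
    log = (
        "Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error\n"
        "# 1 Extended offline Completed: read failure 90% 20563 -\n"
        "# 2 Extended offline Completed without error 00% 32 -\n"
    )
    for entry in failed_self_tests(log):
        print(entry["num"], entry["hours"], entry["text"])
    # 1 20563 Extended offline Completed: read failure
```

Any entry returned here is, in my view, reason enough to plan a replacement regardless of what the attribute counters say.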
1 Like
Thank you.
As the others have already mentioned, the SMART report shows a failed test.
Since the only other extended test on record was run at 32 power-on hours, it’s difficult to say how long the drive has had an issue.
If the drive is under warranty, the failed test makes it a no-brainer to ask for an RMA.
If it’s not, then you need to decide how you value your data.
pedz
August 19, 2024, 9:17pm
20
Ok. It is a WD 14TB Red Plus SATA III 3.5" Internal NAS HDD purchased 2/26/2022, but B&H says that it is no longer available.
Is WD 14TB Red Pro 7200 rpm SATA III 3.5" Internal NAS HDD a close enough match? (it changed from Plus to Pro but I don’t know what that means).
dan
August 19, 2024, 9:18pm
21
Really, any 14 TB drive is likely to be fine. But you should be able to RMA this one to WD; I recall they have a 3-yr warranty.
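As a quick sanity check on the dates, here is a small sketch of the arithmetic. It assumes the 3-year term I recalled above and that coverage is counted from the purchase date given in this thread; the manufacturer may count from a different date, so treat the result as an estimate and confirm it with WD.

```python
from datetime import date


def warranty_end(purchased: date, years: int) -> date:
    """Same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return purchased.replace(year=purchased.year + years)
    except ValueError:
        # Purchased on 29 February and the target year is not a leap year.
        return purchased.replace(year=purchased.year + years, day=28)


purchased = date(2022, 2, 26)    # purchase date quoted in the thread
end = warranty_end(purchased, 3)  # assumed 3-year term
print(end.isoformat())            # 2025-02-26
print(date(2024, 8, 19) <= end)   # True: still covered on the day of this thread
```

On those assumptions there are roughly six months of coverage left, so an RMA is worth requesting before buying a replacement.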
2 Likes