Unhealthy ZFS pool: 1 error

dirtydevver · November 18, 2024, 9:26am

So I've just noticed an Unhealthy status: one of the drives in my ZFS pool has 1 error on it, but it doesn't show me any details.

I've run all the SMART tests, all of which passed, and I've done a scrub.

Is there anything further I can do to determine what the error is and whether it's a problem that needs to be looked at immediately? Or is there a way I can clear the error and just monitor things closely in the short to medium term?

Protopia · November 18, 2024, 10:14am

We can help with this.

Go to System Settings / Shell and run sudo zpool status -v and copy and paste the results here (putting them in between two lines containing only ```).

dirtydevver · November 18, 2024, 10:45am

pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:10 with 0 errors on Fri Nov 15 03:45:11 2024
config:
    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdc3    ONLINE       0     0     0
        sda3    ONLINE       0     0     0
errors: No known data errors

pool: data1
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 05:07:15 with 0 errors on Sun Nov 17 14:21:27 2024
config:
    NAME                                      STATE     READ WRITE CKSUM
    data1                                     ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        b623ae24-a1fb-460f-9138-bae2dc44cd8b  ONLINE       1     0     0
        5975cdb8-1bcf-4fbd-8ffa-996d5e9f31b9  ONLINE       0     0     0
        be83df2f-fa2e-45de-bddc-0b2591ecc925  ONLINE       0     0     0
        e02853d5-6d5b-4884-a8f8-10e90b36f728  ONLINE       0     0     0
        4d1c1c68-2116-4d10-82c2-8073a948f8fc  ONLINE       0     0     0
        3a552bcc-dfa9-45d0-92c3-3696c494c0fe  ONLINE       0     0     0
errors: No known data errors
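To pick out the affected device without eyeballing the table, the `config:` section can be scanned for rows whose READ, WRITE or CKSUM counters are nonzero. This is a minimal sketch with hypothetical helper names (`read_zpool_status`, `devices_with_errors`), not part of TrueNAS or OpenZFS. It assumes plain integer counters; `zpool` abbreviates large counts (for example `1.2K`), which this sketch does not handle.

```python
import re
import subprocess

# One config row: indented NAME and STATE, then READ/WRITE/CKSUM as integers.
# Header rows ("NAME STATE READ ...") do not match because the counters are words.
CONFIG_ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def read_zpool_status():
    """Capture `zpool status -v` output (the thread runs it with sudo)."""
    result = subprocess.run(["zpool", "status", "-v"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every config row with a nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        match = CONFIG_ROW.match(line)
        if not match:
            continue
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            flagged.append((name, *counts))
    return flagged


if __name__ == "__main__":
    sample = """\
    NAME                                      STATE     READ WRITE CKSUM
    data1                                     ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        b623ae24-a1fb-460f-9138-bae2dc44cd8b  ONLINE       1     0     0
        5975cdb8-1bcf-4fbd-8ffa-996d5e9f31b9  ONLINE       0     0     0
"""
    print(devices_with_errors(sample))
    # [('b623ae24-a1fb-460f-9138-bae2dc44cd8b', 1, 0, 0)]
```

Against the `data1` output above, only the partition with READ = 1 is reported. The pool and `raidz1-0` rows are skipped because their own counters are all zero.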

Protopia · November 18, 2024, 11:05am

Also smartctl -x /dev/sde please.

I think you can do a sudo zpool clear data1 with reasonable safety.

etorix · November 18, 2024, 11:50am

Unless drives have been reshuffled by a reboot since the first post, this should be
smartctl -x /dev/sdg
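TrueNAS lists pool members by partition UUID, so mapping `b623ae24-a1fb-460f-9138-bae2dc44cd8b` to a `/dev/sdX` name means following the udev symlink under `/dev/disk/by-partuuid`. A minimal sketch, assuming the standard udev layout on TrueNAS SCALE and `sdX`-style kernel names (NVMe names such as `nvme0n1p1` would need a different rule); `device_for_partuuid` and `whole_disk` are hypothetical helpers. Since kernel names can change across reboots, as etorix points out, it is worth resolving the name again right before running `smartctl`.

```python
import os
import re


def whole_disk(partition_name):
    """Strip the partition number from an sdX-style name: 'sdg1' -> 'sdg'."""
    match = re.fullmatch(r"(sd[a-z]+)\d+", partition_name)
    # Names that are not sdX partitions are returned unchanged.
    return match.group(1) if match else partition_name


def device_for_partuuid(partuuid, by_partuuid_dir="/dev/disk/by-partuuid"):
    """Follow the udev symlink for a partition UUID to its whole-disk name."""
    link = os.path.join(by_partuuid_dir, partuuid)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no by-partuuid link for {partuuid}")
    partition = os.path.basename(os.path.realpath(link))  # e.g. 'sdg1'
    return whole_disk(partition)
```

On the system in this thread, `device_for_partuuid("b623ae24-a1fb-460f-9138-bae2dc44cd8b")` should point at the same disk etorix identified, provided the drive letters have not been reshuffled since.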

dirtydevver · November 18, 2024, 12:43pm

Wow, thanks, this is interesting info. Pasted below:

truenas_admin@truenas[~]$ sudo smartctl -x /dev/sdg
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD60EFRX-68MYMN1
Serial Number: WD-WX11D55PX170
LU WWN Device Id: 5 0014ee 26199e5b4
Firmware Version: 82.00A82
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Nov 18 12:41:00 2024 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x04) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 2984) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 684) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   211   198   021    -    8450
  4 Start_Stop_Count        -O--CK   099   099   000    -    1362
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   001   001   000    -    73317
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    133
192 Power-Off_Retract_Count -O--CK   200   200   000    -    67
193 Load_Cycle_Count        -O--CK   196   196   000    -    12970
194 Temperature_Celsius     -O---K   124   103   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 40 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 15
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 15 [14] occurred at disk power-on lifetime: 7741 hours (322 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 cc 24 f2 08 40 00 Error: UNC at LBA = 0xcc24f208 = 3424973320

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 38 00 00 00 00 cc 24 f2 08 40 00 5d+08:26:00.000 READ FPDMA QUEUED
60 00 30 00 00 00 00 cd 75 10 40 40 00 5d+08:25:59.992 READ FPDMA QUEUED
60 00 38 00 00 00 00 cd 75 10 08 40 00 5d+08:25:59.989 READ FPDMA QUEUED
60 00 38 00 00 00 00 cd 75 0f d0 40 00 5d+08:25:59.985 READ FPDMA QUEUED
60 00 30 00 00 00 00 cd 75 0f a0 40 00 5d+08:25:59.981 READ FPDMA QUEUED

Error 14 [13] occurred at disk power-on lifetime: 44541 hours (1855 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 04 00 00 00 bd a5 00 00 e0 00 Error: UNC 1024 sectors at LBA = 0xbda50000 = 3181707264

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 04 00 00 00 bd a5 00 00 e0 08 1d+04:11:00.642 READ DMA EXT
25 00 00 04 00 00 00 bd a4 fc 00 e0 08 1d+04:11:00.635 READ DMA EXT
25 00 00 04 00 00 00 bd a4 f8 00 e0 08 1d+04:11:00.632 READ DMA EXT
25 00 00 04 00 00 00 bd a4 f4 00 e0 08 1d+04:11:00.629 READ DMA EXT
25 00 00 04 00 00 00 bd a4 f0 00 e0 08 1d+04:11:00.626 READ DMA EXT

Error 13 [12] occurred at disk power-on lifetime: 44473 hours (1853 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 2b 52 85 40 e0 00 Error: UNC 8 sectors at LBA = 0x2b528540 = 726828352

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 2b 52 85 40 e0 08 5d+07:18:40.945 READ DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 5d+07:18:40.883 FLUSH CACHE EXT
35 00 00 00 01 00 02 b9 91 84 18 e0 08 5d+07:18:40.883 WRITE DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 5d+07:18:40.883 FLUSH CACHE EXT
c8 00 00 00 08 00 00 08 50 8d 88 e8 08 5d+07:18:40.768 READ DMA

Error 12 [11] occurred at disk power-on lifetime: 44051 hours (1835 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 03 b8 00 00 28 b3 8f 88 e0 00 Error: UNC 952 sectors at LBA = 0x28b38f88 = 682856328

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 03 b8 00 00 28 b3 8f 88 e0 08 27d+23:24:45.490 READ DMA EXT
25 00 00 01 20 00 00 28 b3 6e 68 e0 08 27d+23:24:45.347 READ DMA EXT
25 00 00 02 00 00 00 28 b3 6c 68 e0 08 27d+23:24:45.325 READ DMA EXT
25 00 00 00 80 00 00 28 b3 6b e8 e0 08 27d+23:24:45.325 READ DMA EXT
25 00 00 00 20 00 00 28 b3 6b c8 e0 08 27d+23:24:45.306 READ DMA EXT

Error 11 [10] occurred at disk power-on lifetime: 32733 hours (1363 days + 21 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 7e 0b 10 18 e0 00 Error: UNC 8 sectors at LBA = 0x7e0b1018 = 2114654232

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 7e 0b 10 18 e0 08 2d+09:07:55.136 READ DMA EXT
25 00 00 00 70 00 00 75 cb 08 98 e0 08 2d+09:07:55.091 READ DMA EXT
25 00 00 00 08 00 00 75 cb 08 90 e0 08 2d+09:07:55.091 READ DMA EXT
25 00 00 00 08 00 00 75 cb 08 88 e0 08 2d+09:07:55.086 READ DMA EXT
25 00 00 00 60 00 00 88 55 b4 20 e0 08 2d+09:07:55.055 READ DMA EXT

Error 10 [9] occurred at disk power-on lifetime: 31899 hours (1329 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 7a f5 b3 88 e0 00 Error: UNC 8 sectors at LBA = 0x7af5b388 = 2062922632

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 7a f5 b3 88 e0 08 9d+16:38:48.674 READ DMA EXT
25 00 00 00 40 00 00 62 ad c6 b8 e0 08 9d+16:38:48.658 READ DMA EXT
25 00 00 00 08 00 00 77 a0 64 10 e0 08 9d+16:38:48.658 READ DMA EXT
25 00 00 00 08 00 00 77 a0 64 18 e0 08 9d+16:38:48.658 READ DMA EXT
25 00 00 00 08 00 00 77 a0 64 08 e0 08 9d+16:38:48.646 READ DMA EXT

Error 9 [8] occurred at disk power-on lifetime: 30904 hours (1287 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 6d d5 b3 88 e0 00 Error: UNC 8 sectors at LBA = 0x6dd5b388 = 1842721672

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 00 08 00 00 6d d5 b3 88 e0 08 2d+08:08:25.445 READ DMA EXT
25 00 00 00 80 00 01 18 eb 14 88 e0 08 2d+08:08:25.382 READ DMA EXT
25 00 00 00 10 00 01 18 eb 13 f8 e0 08 2d+08:08:25.201 READ DMA EXT
25 00 00 00 10 00 01 18 eb 13 e8 e0 08 2d+08:08:25.201 READ DMA EXT
25 00 00 00 10 00 01 18 eb 13 d8 e0 08 2d+08:08:25.200 READ DMA EXT

Error 8 [7] occurred at disk power-on lifetime: 30843 hours (1285 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 08 00 02 b9 91 83 f8 e0 00 Error: IDNF 8 sectors at LBA = 0x2b99183f8 = 11703256056

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 08 00 02 b9 91 83 f8 e0 08 1d+23:04:57.374 WRITE DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 1d+23:04:57.329 FLUSH CACHE EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 1d+23:04:55.194 FLUSH CACHE EXT
35 00 00 00 01 00 02 b9 91 84 18 e0 08 1d+23:04:55.194 WRITE DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 1d+23:04:55.193 FLUSH CACHE EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 7773 -
# 2 Conveyance offline Completed without error 00% 7744 -
# 3 Short offline Completed without error 00% 7744 -
# 4 Short offline Completed without error 00% 7516 -
# 5 Short offline Completed without error 00% 7402 -
# 6 Short offline Completed without error 00% 7234 -
# 7 Short offline Completed without error 00% 7066 -
# 8 Short offline Completed without error 00% 6898 -
# 9 Short offline Completed without error 00% 6731 -
#10 Short offline Completed without error 00% 6563 -
#11 Short offline Completed without error 00% 6395 -
#12 Short offline Completed without error 00% 6227 -
#13 Short offline Completed without error 00% 6059 -
#14 Short offline Completed without error 00% 5891 -
#15 Short offline Completed without error 00% 5724 -
#16 Short offline Completed without error 00% 5557 -
#17 Short offline Completed without error 00% 5389 -
#18 Short offline Completed without error 00% 5221 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 28 Celsius
Power Cycle Min/Max Temperature: 23/30 Celsius
Lifetime Min/Max Temperature: 2/49 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (467)

Index Estimated Time Temperature Celsius
468 2024-11-18 04:44 28 *********
… …( 71 skipped). … *********
62 2024-11-18 05:56 28 *********
63 2024-11-18 05:57 27 ********
… …(364 skipped). … ********
428 2024-11-18 12:02 27 ********
429 2024-11-18 12:03 28 *********
… …( 37 skipped). … *********
467 2024-11-18 12:41 28 *********

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 9 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 605596 Vendor specific
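The error log above uses two clocks: lifetime hours in each `Error N occurred at disk power-on lifetime` line, and `Powered_Up_Time`, which counts from the last power-on. The legend's "wraps after 49.710 days" is consistent with a 32-bit millisecond counter, since 2^32 ms is about 49.71 days. A minimal sketch with hypothetical helpers (`powered_up_ms`, `wrap_days`) that converts a stamp such as `5d+08:26:00.000` to milliseconds and checks that figure; it assumes the `DDd+hh:mm:SS.sss` layout the legend describes and treats the day prefix as optional.

```python
import re

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms

# DDd+hh:mm:SS.sss as described in the smartctl legend; the day part is optional here.
STAMP = re.compile(r"(?:(\d+)d\+)?(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def powered_up_ms(stamp):
    """Convert a Powered_Up_Time stamp to milliseconds since power-on."""
    match = STAMP.fullmatch(stamp.strip())
    if not match:
        raise ValueError(f"unrecognised stamp: {stamp!r}")
    days, hours, minutes, seconds, millis = match.groups()
    total_seconds = ((int(days or 0) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def wrap_days(bits=32):
    """Days until an unsigned millisecond counter of the given width rolls over."""
    return 2 ** bits / MS_PER_DAY


print(powered_up_ms("5d+08:26:00.000"))  # 462360000
print(f"{wrap_days():.3f}")              # 49.710
```

Because `Powered_Up_Time` restarts at every power-on, it cannot date an error on its own; the lifetime-hours field is the one to compare with `Power_On_Hours`.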

etorix · November 18, 2024, 1:05pm

EFRX = WD Red Plus (CMR)
1,362 cycles for 8,450 hours of operation… except there are errors reported at 44,473 and 44,541 hours. Is this a refurbished drive whose counter has been reset, or a very old drive whose counter has wrapped past the maximal value?

bacon · November 18, 2024, 1:10pm

73317 hours of operation (~8.3 years). 8450 is the spin up time (in milliseconds).
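The hour figures etorix asks about can be reconciled if the lifetime stamps in the SMART error and self-test logs are 16-bit values that roll over at 65,536 hours, while attribute 9 (`Power_On_Hours`) keeps counting. That reading fits the numbers: 7,741 + 65,536 = 73,277, just below the current 73,317, and the newest self-test stamp, 7,773, maps to 73,309. A minimal sketch, assuming that 16-bit width, with a hypothetical `unwrap_lifetime` helper that picks the most recent full hour count consistent with a logged stamp:

```python
WRAP_HOURS = 2 ** 16  # 65,536: assumed width of the logs' lifetime stamp


def unwrap_lifetime(logged_hours, power_on_hours):
    """Latest full lifetime hour, not after power_on_hours, that matches a 16-bit stamp."""
    if not 0 <= logged_hours < WRAP_HOURS:
        raise ValueError("logged stamp must fit in 16 bits")
    base = power_on_hours - power_on_hours % WRAP_HOURS
    candidate = base + logged_hours
    if candidate > power_on_hours:
        candidate -= WRAP_HOURS  # the stamp belongs to the previous cycle
    if candidate < 0:
        raise ValueError("stamp is later than the current Power_On_Hours")
    return candidate


POH = 73317  # attribute 9 raw value for sdg
for label, stamp in [("Error 15", 7741), ("Error 14", 44541), ("Self-test #1", 7773)]:
    full = unwrap_lifetime(stamp, POH)
    print(f"{label}: {full} h, about {POH - full} power-on hours before the smartctl run")
```

On this reading, Error 15 is only about 40 power-on hours old, while Errors 8 to 14 date from roughly 30,800 to 44,500 hours, which points to a counter rollover rather than a refurbished drive. The 16-bit width is an assumption that matches these numbers.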

prez02 · November 18, 2024, 1:16pm

ID #9
That is pretty old for a drive, I guess…

dirtydevver · November 18, 2024, 1:31pm

Originally I had 8 drives in my QNAP, and replaced 3 of them with the newer "WDC_WD60EFZX" variant.

When I had further failures, I replaced those 2 with 2x IronWolf ST12000VN0008 drives, as they were virtually the same price for double the capacity.

I built this new NAS box around a week ago and started transferring the data over. I separated the drives in the case because I suspected the older drives would likely fail with time, and I'd know where to look. Sadly, I wasn't expecting problems so soon. Is this "HDD Standby: Always On" setting normal?

[Screenshot 2024-11-18 132716]

As I already have 2x 12 TB IronWolf drives, I ordered another one last night to plan for the future: put these drives into a new RAIDZ1 pool and try to migrate the data over to it. Had I not already had 2 of these drives, which haven't really been used much, I wouldn't have ordered the third. For the longer term, I'm wondering whether NAS-specific drives are still a good idea with TrueNAS. I used to run a QNAP, but now that I've moved to a custom-built TrueNAS system, is going the enterprise drive route rather than the NAS drive route a big issue or not?

dirtydevver · November 18, 2024, 1:37pm

It wouldn't surprise me if they were 8-10 years old now. Is the error being reported that the drive took longer than normal to power up from a sleep/standby state?
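The error-log entries appear to be command failures rather than spin-up events: each records the command that failed, and in the sdg log those are reads (`READ FPDMA QUEUED`, `READ DMA EXT`), plus one write in Error 8. The error register says why: 0x40 is UNC (uncorrectable data) and 0x10 is IDNF (ID not found). The sketch below decodes one register row; the bit names follow the ATA error register, `decode_register_row` is a hypothetical helper, and it accepts either smartctl's `--` or the en dash the forum substituted for it.

```python
# Named ATA error-register bits (others are ignored in this sketch).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
SEPARATORS = {"--", "\u2013"}  # smartctl prints "--"; the forum showed an en dash


def decode_register_row(row):
    """Decode 'ER -- ST COUNT LBA_48 LH LM LL DV DC' from an extended error-log entry."""
    tokens = [t for t in row.split() if t not in SEPARATORS]
    if len(tokens) < 12:
        raise ValueError("expected 12 register bytes")
    values = [int(t, 16) for t in tokens[:12]]
    er, st = values[0], values[1]
    count = (values[2] << 8) | values[3]
    lba = 0
    for byte in values[4:10]:  # LBA_48 (3 bytes) then LH, LM, LL, high to low
        lba = (lba << 8) | byte
    return {
        "flags": [name for bit, name in ERROR_BITS.items() if er & bit],
        "status_err": bool(st & 0x01),  # ERR bit of the status register
        "count": count,
        "lba": lba,
    }


row = "40 -- 51 04 00 00 00 bd a5 00 00 e0 00 Error: UNC 1024 sectors at LBA = 0xbda50000"
print(decode_register_row(row))
# {'flags': ['UNC'], 'status_err': True, 'count': 1024, 'lba': 3181707264}
```

The decoded LBA matches the value smartctl prints after `Error:`, which is a quick way to confirm the byte order.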

dirtydevver · November 18, 2024, 1:44pm

Out of curiosity, I went through all the drives in the pool and found that sdh also has some errors in its SMART log, but no alert has been raised. Any ideas why this hasn't been flagged yet? Is it below some kind of internal failure threshold, or has it simply not happened since the migration to the TrueNAS box?

I'm now wondering: if 2 of these drives are showing signs of errors with these commands, how safe is the data right now? If the original sdg drive failed and the pool tried to rebuild, is that when the other one would likely fail, and I'd lose all my data?
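One way to read the sdh log below is to check whether its errors are scattered or keep returning to the same area. A minimal sketch, using the LBAs copied from that log and a hypothetical 1,000,000-sector gap to separate clusters (an arbitrary choice for illustration); `cluster_lbas` is a hypothetical helper. With that gap, the UNC read errors all fall within about 55 MB of each other, roughly 32.8 GB into the disk, and only the oldest entry, an IDNF write error, sits elsewhere. This shows where the logged errors are, not how risky a rebuild would be.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl for this drive

# LBAs from the sdh "Error N ... at LBA = ..." lines (Errors 8 down to 1).
SDH_ERROR_LBAS = [
    64170392, 64170392, 64062944, 64062936,
    64062936, 64113216, 64113216, 11703256088,
]


def cluster_lbas(lbas, max_gap):
    """Group unique LBAs into runs whose neighbours are at most max_gap sectors apart."""
    clusters = []
    for lba in sorted(set(lbas)):
        if clusters and lba - clusters[-1][-1] <= max_gap:
            clusters[-1].append(lba)
        else:
            clusters.append([lba])
    return clusters


for group in cluster_lbas(SDH_ERROR_LBAS, max_gap=1_000_000):  # gap is illustrative
    span_bytes = (group[-1] - group[0]) * SECTOR_BYTES
    offset_gb = group[0] * SECTOR_BYTES / 1e9
    print(f"{len(group)} LBA(s) from {group[0]} to {group[-1]}: "
          f"span {span_bytes:,} bytes, starting {offset_gb:.1f} GB into the disk")
```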

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD60EFRX-68L0BN1
Serial Number: WD-WX41D758NS6V
LU WWN Device Id: 5 0014ee 262662ecd
Firmware Version: 82.00A82
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Nov 18 13:38:52 2024 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 4604) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 700) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   208   196   021    -    8566
  4 Start_Stop_Count        -O--CK   099   099   000    -    1354
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
  9 Power_On_Hours          -O--CK   001   001   000    -    72371
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    128
192 Power-Off_Retract_Count -O--CK   200   200   000    -    61
193 Load_Cycle_Count        -O--CK   197   197   000    -    11647
194 Temperature_Celsius     -O---K   122   103   000    -    30
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 54 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 8
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8 [7] occurred at disk power-on lifetime: 3779 hours (157 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 03 d3 29 98 e3 00 Error: UNC 8 sectors at LBA = 0x03d32998 = 64170392

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 08 00 00 03 d3 29 98 e3 08 39d+16:12:09.550 READ DMA
25 00 00 00 18 00 01 ae 28 15 38 e0 08 39d+16:12:09.528 READ DMA EXT
25 00 00 04 00 00 00 03 d4 73 90 e0 08 39d+16:12:09.526 READ DMA EXT
25 00 00 03 f8 00 00 03 d4 77 90 e0 08 39d+16:12:09.514 READ DMA EXT
25 00 00 04 00 00 00 03 d3 83 90 e0 08 39d+16:12:09.318 READ DMA EXT

Error 7 [6] occurred at disk power-on lifetime: 3779 hours (157 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 04 00 00 00 03 d3 29 98 e0 00 Error: UNC 1024 sectors at LBA = 0x03d32998 = 64170392

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 04 00 00 00 03 cf b7 38 e0 08 39d+16:12:03.299 READ DMA EXT
25 00 00 04 00 00 00 03 cf b3 38 e0 08 39d+16:12:03.293 READ DMA EXT
25 00 00 04 00 00 00 03 cf af 38 e0 08 39d+16:12:03.288 READ DMA EXT
25 00 00 04 00 00 00 03 cf ab 38 e0 08 39d+16:12:03.283 READ DMA EXT
25 00 00 04 00 00 00 03 cf a7 38 e0 08 39d+16:12:03.277 READ DMA EXT

Error 6 [5] occurred at disk power-on lifetime: 63574 hours (2648 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 03 d1 85 e0 e3 00 Error: UNC 8 sectors at LBA = 0x03d185e0 = 64062944

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 08 00 00 03 d1 85 e0 e3 08 20d+15:19:13.898 READ DMA
ea 00 00 00 00 00 00 00 00 00 00 e0 08 20d+15:19:13.898 FLUSH CACHE EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 20d+15:19:13.879 FLUSH CACHE EXT
ca 00 00 00 01 00 00 00 10 2d a0 e0 08 20d+15:19:13.878 WRITE DMA
ea 00 00 00 00 00 00 00 00 00 00 e0 08 20d+15:19:13.878 FLUSH CACHE EXT

Error 5 [4] occurred at disk power-on lifetime: 63574 hours (2648 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 03 d1 85 d8 e3 00 Error: UNC 8 sectors at LBA = 0x03d185d8 = 64062936

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 08 00 00 03 d1 85 d8 e3 08 20d+15:19:09.660 READ DMA
35 00 00 00 10 00 02 4c ce 89 38 e0 08 20d+15:19:09.660 WRITE DMA EXT
35 00 00 00 08 00 02 4c ce 7e d8 e0 08 20d+15:19:09.660 WRITE DMA EXT
35 00 00 00 08 00 02 4c ce 7e 98 e0 08 20d+15:19:09.660 WRITE DMA EXT
35 00 00 00 08 00 02 4c ce 7e 48 e0 08 20d+15:19:09.660 WRITE DMA EXT

Error 4 [3] occurred at disk power-on lifetime: 63574 hours (2648 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 04 00 00 00 03 d1 85 d8 e0 00 Error: UNC 1024 sectors at LBA = 0x03d185d8 = 64062936

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 04 00 00 00 03 d0 41 08 e0 08 20d+15:19:04.381 READ DMA EXT
25 00 00 04 00 00 00 03 d0 3d 08 e0 08 20d+15:19:04.378 READ DMA EXT
25 00 00 04 00 00 00 03 d0 39 08 e0 08 20d+15:19:04.376 READ DMA EXT
c8 00 00 00 10 00 00 03 ce f0 f8 e3 08 20d+15:19:04.376 READ DMA
25 00 00 04 00 00 01 f3 3b f3 88 e0 08 20d+15:19:04.336 READ DMA EXT

Error 3 [2] occurred at disk power-on lifetime: 57792 hours (2408 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 08 00 00 03 d2 4a 40 e3 00 Error: UNC 8 sectors at LBA = 0x03d24a40 = 64113216

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 08 00 00 03 d2 4a 40 e3 08 48d+23:59:35.147 READ DMA
25 00 00 04 00 00 00 03 d3 9b 88 e0 08 48d+23:59:35.015 READ DMA EXT
25 00 00 04 00 00 00 03 d3 97 88 e0 08 48d+23:59:35.014 READ DMA EXT
25 00 00 04 00 00 00 03 d3 93 88 e0 08 48d+23:59:35.013 READ DMA EXT
25 00 00 04 00 00 00 03 d3 8f 88 e0 08 48d+23:59:35.011 READ DMA EXT

Error 2 [1] occurred at disk power-on lifetime: 57792 hours (2408 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 04 00 00 00 03 d2 4a 40 e0 00 Error: UNC 1024 sectors at LBA = 0x03d24a40 = 64113216

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
25 00 00 04 00 00 00 03 c0 9b 88 e0 08 48d+23:59:25.889 READ DMA EXT
25 00 00 04 00 00 00 03 c0 97 88 e0 08 48d+23:59:25.888 READ DMA EXT
25 00 00 04 00 00 00 03 c0 93 88 e0 08 48d+23:59:25.886 READ DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 48d+23:59:25.855 FLUSH CACHE EXT
35 00 00 00 01 00 02 b9 91 84 18 e0 08 48d+23:59:25.855 WRITE DMA EXT

Error 1 [0] occurred at disk power-on lifetime: 52306 hours (2179 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 01 00 02 b9 91 84 18 e0 00 Error: IDNF 1 sectors at LBA = 0x2b9918418 = 11703256088

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 01 00 02 b9 91 84 18 e0 08 2d+14:24:32.428 WRITE DMA EXT
ea 00 00 00 00 00 00 00 00 00 00 e0 08 2d+14:24:30.506 FLUSH CACHE EXT
ec 00 00 00 00 00 00 00 00 00 00 00 08 2d+14:24:29.907 IDENTIFY DEVICE
35 00 00 01 00 00 02 b9 9f b9 30 e0 08 2d+14:24:20.206 WRITE DMA EXT
35 00 00 01 00 00 02 b9 9f b8 30 e0 08 2d+14:24:20.204 WRITE DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6570 -
# 2 Short offline Completed without error 00% 6464 -
# 3 Short offline Completed without error 00% 6296 -
# 4 Short offline Completed without error 00% 6128 -
# 5 Short offline Completed without error 00% 5960 -
# 6 Short offline Completed without error 00% 5793 -
# 7 Short offline Completed without error 00% 5625 -
# 8 Short offline Completed without error 00% 5457 -
# 9 Short offline Completed without error 00% 5289 -
#10 Short offline Completed without error 00% 5121 -
#11 Short offline Completed without error 00% 4953 -
#12 Short offline Completed without error 00% 4787 -
#13 Short offline Completed without error 00% 4620 -
#14 Short offline Completed without error 00% 4452 -
#15 Short offline Completed without error 00% 4284 -
#16 Short offline Completed without error 00% 4116 -
#17 Short offline Completed without error 00% 3948 -
#18 Extended offline Completed without error 00% 3864 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 30 Celsius
Power Cycle Min/Max Temperature: 26/33 Celsius
Lifetime Min/Max Temperature: 10/49 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (180)

Index Estimated Time Temperature Celsius
181 2024-11-18 05:41 30 ***********
... ..(476 skipped). .. ***********
180 2024-11-18 13:38 30 ***********

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 7 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 13 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 609061 Vendor specific

bacon · November 18, 2024, 2:47pm

There isn’t anything in those smartctl outputs that concerns me. It’s pretty typical for a drive of that age.

Note that your TrueNAS system periodically checks the S.M.A.R.T. status of all drives (controlled by the “Enable S.M.A.R.T.” option in Disks > Edit; it’s enabled by default). You’d get an alert if something looked bad.

That said, there is no reliable way to determine when a drive is going to fail (please tell me if there is one). Make sure you have backups.
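For anyone who wants to read the smartctl error-log entries posted above by hand, here is a minimal Python sketch. It assumes the 13-column register line smartctl prints (ER, an unused placeholder column, ST, two COUNT bytes, the three high LBA bytes, then LH LM LL DV DC) and decodes only a subset of the ATA Error-register bits (UNC = uncorrectable data, IDNF = sector ID not found, plus ABRT and ICRC). The function names are hypothetical and not part of smartctl.

```python
# Hypothetical helpers (not part of smartctl) for reading the
# "After command completion occurred, registers were:" lines.

# Subset of ATA Error-register bits; enough for the entries in this log.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_registers(line: str) -> dict:
    """Decode 'ER -- ST COUNT LBA_48 LH LM LL DV DC' (first 13 tokens).

    The second token is an unused placeholder and is ignored, so the
    line works whether it shows '--' or was mangled into a dash.
    Anything after the 13 register fields (e.g. 'Error: UNC ...') is skipped.
    """
    tokens = line.split()[:13]
    if len(tokens) != 13:
        raise ValueError(f"expected 13 register fields, got {len(tokens)}")
    er = int(tokens[0], 16)
    return {
        "flags": [name for bit, name in ERROR_BITS.items() if er & bit],
        "status": int(tokens[2], 16),
        "count": int(tokens[3] + tokens[4], 16),  # 16-bit sector count
        "lba": int("".join(tokens[5:11]), 16),  # LBA bits 47..24, then LH LM LL
    }


def hours_to_days(hours: int) -> tuple[int, int]:
    """Split power-on lifetime hours into (days, leftover hours)."""
    return divmod(hours, 24)


if __name__ == "__main__":
    for regs in (
        "40 -- 51 00 08 00 00 03 d2 4a 40 e3 00",  # Error 3
        "40 -- 51 04 00 00 00 03 d2 4a 40 e0 00",  # Error 2
        "10 -- 51 00 01 00 02 b9 91 84 18 e0 00",  # Error 1
    ):
        d = decode_registers(regs)
        print(d["flags"], d["count"], "sectors at LBA", d["lba"])
    print(hours_to_days(57792), hours_to_days(52306))
```

Fed the three register lines above, it reproduces smartctl's own labels (UNC, UNC, IDNF), sector counts (8, 1024, 1) and LBAs, which makes it easy to see that errors 2 and 3 both hit LBA 64113216; `hours_to_days` gives the same (2408, 0) and (2179, 10) splits smartctl printed.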

Protopia · November 18, 2024, 3:13pm

Oops, yes, a typo on my part.

Protopia · November 18, 2024, 3:19pm

The one thing I did notice is that there haven’t been any short or long SMART tests for literally 7.5 years!!!

You should run short SMART tests on each drive weekly, and long SMART tests monthly, especially at this age.

I also recommend that you implement @joeschmuck’s Multi-Report script so you get an email weekly confirming good drive health or an email on the first day that a drive starts having even the most minor of problems.
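The weekly-short / monthly-long schedule can be checked against a drive's self-test log. The sketch below is a hypothetical Python helper, not part of TrueNAS or Multi-Report: it parses rows in the smartctl format shown earlier in the thread (with or without the leading `#`), takes the newest test of each type, and subtracts modulo 65536 because the LifeTime(hours) field is 16 bits wide. The caller has to supply the drive's current power-on hours (for example SMART attribute 9, Power_On_Hours), and the 168-hour and 720-hour limits are assumptions standing in for "weekly" and "monthly".

```python
import re

# One self-test log row, e.g. "# 1 Short offline Completed without error 00% 6570 -".
# The leading "#" is optional because forum rendering sometimes strips it.
ROW = re.compile(
    r"^#?\s*(\d+)\s+(Short|Extended|Conveyance|Selective) offline\s+"
    r"(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)
WRAP = 1 << 16  # LifeTime(hours) is stored in 16 bits


def last_test_ages(log_text: str, power_on_hours: int) -> dict[str, int]:
    """Power-on hours elapsed since the newest test of each type.

    smartctl lists the newest entry first, so the first row seen for a
    type wins. The modulo undoes the 16-bit wrap, assuming each test is
    less than 65536 hours old.
    """
    ages: dict[str, int] = {}
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group(2) not in ages:
            ages[m.group(2)] = (power_on_hours - int(m.group(5))) % WRAP
    return ages


def overdue(ages: dict[str, int], short_max: int = 168, long_max: int = 720) -> list[str]:
    """Test types that have never run or are older than the chosen limit."""
    late = []
    for kind, limit in (("Short", short_max), ("Extended", long_max)):
        if ages.get(kind, limit + 1) > limit:
            late.append(kind)
    return late


if __name__ == "__main__":
    log = """# 1 Short offline Completed without error 00% 6570 -
# 2 Short offline Completed without error 00% 6464 -
#18 Extended offline Completed without error 00% 3864 -"""
    ages = last_test_ages(log, power_on_hours=72200)  # hypothetical current value
    print(ages, overdue(ages))
```

With a hypothetical current reading of 72200 power-on hours and this drive's log (newest short test at 6570, only extended test at 3864), the newest short test works out to 94 hours old and the extended test to 2800 hours old, so only the extended test would be flagged as overdue.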

dirtydevver · November 18, 2024, 3:43pm

Thanks for the help, I’ll do a search now for his email script. I’ve already got the checks in place; not sure why it hasn’t shown a record of the previous checks from when the drives were in the QNAP box.

dirtydevver · November 18, 2024, 3:47pm

So basically, clear the error with the command posted earlier and just keep an eye on things?

My plan is to set up a second raidz1 using some 12 TB drives and leave the less important, archive-type data on the original raidz1.

If I get a drive failure, I’ll move the data off and re-create the pool with one less drive, as I don’t want to buy 6 TB drives anymore and the remaining ones probably still have some life in them for the medium term.

bacon · November 18, 2024, 4:00pm

The LifeTime(hours) counter in the self-test log is only 16 bits wide, so it wraps back to 0 after 65535 hours (2**16 = 65536 values). For example, the last short test of WD-WX41D758NS6V was probably 11 days ago.
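The wrap-around can be undone with modular arithmetic, provided you know the drive's current power-on hours (SMART attribute 9, which is not shown in this excerpt). A minimal Python sketch with hypothetical function names: it picks the most recent hour at or below the current reading whose low 16 bits match the logged value, which is only correct if the test is less than 65536 hours (about 7.5 years) old.

```python
WRAP = 1 << 16  # 65536: the self-test LifeTime(hours) field is 16 bits


def unwrap_lifetime(logged: int, power_on_hours: int) -> int:
    """Full power-on hour at which a logged self-test ran.

    Returns the largest value <= power_on_hours that is congruent to
    `logged` modulo 65536, i.e. assumes the test is < 65536 hours old.
    """
    if not 0 <= logged < WRAP:
        raise ValueError("logged LifeTime must fit in 16 bits")
    full = power_on_hours - (power_on_hours - logged) % WRAP
    if full < 0:
        raise ValueError("logged value is newer than the current reading")
    return full


def days_ago(logged: int, power_on_hours: int) -> float:
    """Age of the test in days of power-on time (calendar days if always on)."""
    return (power_on_hours - unwrap_lifetime(logged, power_on_hours)) / 24


if __name__ == "__main__":
    # Hypothetical current reading; the real Power_On_Hours value is not shown here.
    now = 72370
    print(unwrap_lifetime(6570, now), days_ago(6570, now))
```

With that hypothetical reading of 72370 hours, the short test logged at 6570 actually ran at hour 72106 (6570 + 65536), i.e. 264 hours or 11 days earlier; the real answer depends on the drive's actual Power_On_Hours.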

etorix · November 18, 2024, 4:43pm

That’s what I would suggest. There’s nothing obviously of concern right now, but run regular long tests (weekly to monthly), not just short tests, keep an eye on things… and have backups.

Of course, moving to a raidz2 pool with larger drives cannot hurt.
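Keeping an eye on things can be partly scripted. The sketch below is a hypothetical Python helper (not a TrueNAS or OpenZFS API) that scans text captured from `sudo zpool status -v` and lists every pool, vdev, or device row whose READ, WRITE, or CKSUM counter is not zero. After a `zpool clear` it should return an empty list until a new error is counted. Counters are compared as strings on the assumption that zpool status may abbreviate large counts (for example `1.2K`).

```python
import re

# A pool/vdev/device row in the "config:" section, e.g.
# "b623ae24-...  ONLINE  1  0  0". Header and "state:" lines do not match.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def devices_with_errors(status_text: str) -> list[tuple[str, str, str, str]]:
    """(name, read, write, cksum) for every row with a nonzero counter."""
    hits = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            name, _state, read, write, cksum = m.groups()
            # Kept as strings: large counts may be abbreviated, e.g. "1.2K".
            if any(c != "0" for c in (read, write, cksum)):
                hits.append((name, read, write, cksum))
    return hits


if __name__ == "__main__":
    sample = """config:
    NAME                                      STATE     READ WRITE CKSUM
    data1                                     ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        b623ae24-a1fb-460f-9138-bae2dc44cd8b  ONLINE       1     0     0
errors: No known data errors"""
    print(devices_with_errors(sample))
```

On the data1 output from this thread it would report only the b623ae24-… device with its single READ error.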

