It gives me this:
A 700W power supply should be plenty, correct? I wouldn’t think the size would be an issue now unless the power supply itself is going bad…
Thanks!
Hmm. Swear that’s worked before.
Elevate to a root shell with sudo su
and then try:
for i in /sys/class/scsi_host/host*/link_power_management_policy; do echo max_performance >$i; done
Unless it’s a really low-quality 700W unit that can’t actually deliver, 3 drives shouldn’t be anywhere near enough to strain that power supply. What the script above does is set the policy on the SATA link itself so that it never goes into a sleep/idle state.
This time it didn’t return anything. I ran it twice to make sure I did it right the first time…
Thank you!
That should mean it took - you can check with
cat /sys/class/scsi_host/host*/link_power_management_policy
and see if it comes back with max_performance for all of them.
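Rather than reading the cat output by eye, a small function can flag only the hosts that are not at max_performance. This is a sketch, not a TrueNAS tool: the function name check_lpm is hypothetical, and the base directory is a parameter (defaulting to /sys/class/scsi_host) only so it can be tried against a scratch directory. Note that a value written to sysfs does not survive a reboot, so if the loop above turns out to help it has to be re-run at each boot, for example from a post-init script.

```shell
# Hypothetical helper: print every SCSI/SATA host whose link power management
# policy is not max_performance. Returns 0 when all hosts are at max_performance.
# The base directory defaults to /sys/class/scsi_host; it is a parameter only
# so the function can be exercised against a test directory.
check_lpm() {
    local base="${1:-/sys/class/scsi_host}"
    local status=0 f policy host
    for f in "$base"/host*/link_power_management_policy; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        policy=$(cat "$f")
        host=$(basename "$(dirname "$f")")      # e.g. host0
        if [ "$policy" != "max_performance" ]; then
            echo "$host: $policy"
            status=1
        fi
    done
    return "$status"
}
```

Usage, as root: `check_lpm && echo "all hosts at max_performance"`.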
OK, I figured out what was wrong: I missed the “/” at the very beginning of the path… Now I get:
Exactly what you expected…
If this was the issue, should the DEGRADED state clear by itself?
Thank you!
It’ll stop recurring. Do a zpool clear
and then a scrub, and see if it recurs. Some of the issues I saw that were related to this also involved older AMD-based builds, so I’m wondering whether the SATA controllers in those systems were exposing an oddity.
The default power-management setting follows the BIOS/system defaults, and if those are a bit off, that might be causing the failed writes.
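The clear-then-scrub sequence suggested above can be sketched as one shell function. The function name clear_and_scrub is hypothetical, and the default pool name LilWizz is an assumption taken from the /mnt/LilWizz prompt that appears later in this thread; substitute your own pool.

```shell
# Hypothetical helper: reset the pool's error counters, start a scrub,
# then show status (scrub progress plus any files with permanent errors).
clear_and_scrub() {
    local pool="${1:-LilWizz}"      # assumed pool name; pass your own
    zpool clear "$pool" || return 1 # reset READ/WRITE/CKSUM counters
    zpool scrub "$pool" || return 1 # start a full read-and-verify pass
    zpool status -v "$pool"
}
```

The scrub runs in the background, so re-running `zpool status -v LilWizz` later shows whether the CKSUM column climbs again.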
Well, I have done a number of things to see what would happen.
I ran another complete surface scan on the disks with the WD software (it took about 8 hours), which turned up zero errors.
I loaded the drives (my data pool drives and my boot pool drive) into another computer I have that is much newer and built for much heavier lifting. It did seem to have fewer errors, but they did not completely go away.
I have abandoned the backplane altogether because it did seem to cause more errors.
I have all drives wired directly to the motherboard (back in the original computer).
I’m running my second scrub in 2 days: I ran one on the newer computer and am now running another on the original computer.
I noticed that several files were listed as possibly damaged during the first scrub. They were all in the backup of one of my laptops, so I tried to delete the complete backup. I was able to delete most of it, but for some reason it would not let me delete the main directory and a couple of subdirectories. It kept telling me I did not have permission when I tried to delete them from my MacBook, and when I tried to ssh in and delete them that way it kept telling me “invalid exchange”. I have no idea what that means and couldn’t find anything from searching… Anyway, I am not seeing any files listed as damaged so far during this latest scrub.
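On Linux, OpenZFS reports an unrecoverable checksum error to programs as errno EBADE, whose standard message is “Invalid exchange”, so that error from rm likely points at checksum damage in those directories. `zpool status -v` lists the files with permanent errors; the sketch below prints just that list. The function name permanent_error_files is hypothetical, and LilWizz is an assumed default pool name taken from later in this thread.

```shell
# Hypothetical helper: print only the paths that `zpool status -v` lists
# after "Permanent errors have been detected in the following files:".
permanent_error_files() {
    zpool status -v "${1:-LilWizz}" |
        awk '/Permanent errors have been detected/ { grab = 1; next }
             grab && NF { sub(/^[ \t]+/, ""); print }'   # skip blank lines, strip indent
}
```

Entries shown as `dataset:<0x…>` instead of a path usually refer to objects that no longer have a name, such as already-deleted files.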
I am not getting any of the “Degraded” or “Faulted” messages on the drives, but I am still getting checksum errors. Since the last boot (about 5-1/2 hours ago), I am up to 324 checksum errors on 2 of the 3 disks and 20 on the 3rd.
I also have 2 more 2TB WD Black drives that I tried to set up as a mirror to create a second pool, but the setup failed?!
Do you think that upgrading my pool to the latest ZFS version might help? I have not researched what the latest update was for, so I don’t know.
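Before upgrading, it can help to see what an upgrade would actually change. `zpool upgrade` with no pool argument only lists pools that do not have every supported feature enabled; upgrading a pool enables those feature flags, is one-way (older ZFS versions may no longer import it), and is not a repair step for data that already fails its checksums. The sketch below lists the disabled feature flags; the function name disabled_features and the default pool name LilWizz are assumptions.

```shell
# Hypothetical helper: list feature flags that are still disabled on a pool,
# i.e. what `zpool upgrade <pool>` would enable. Read-only.
disabled_features() {
    zpool get -H -o property,value all "${1:-LilWizz}" |
        awk '$1 ~ /^feature@/ && $2 == "disabled" { print $1 }'
}
```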
Thanks for all the help!!!
@HoneyBadger So, now the pool and a disk show degraded again. UGH…
I got this message. I don’t know if it helps at all. I’m not sure what it’s telling me.
TrueNAS @ truenas
New alerts:
Current alerts:
Thanks!
Well, I don’t think it’s necessarily a hardware issue. I mean, it could still be, but I switched to my other computer and it is still getting checksum errors, and lots of them…
I saw in the TrueNAS documentation that the checksum setting on each dataset should be set to “SHA512”. Do you think that is correct? I went ahead and changed all of mine, but it seems to have made things worse instead of better…
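Two things are worth keeping in mind about the checksum property: it only applies to blocks written after it is changed (existing data keeps the checksum it was written with), and a stronger algorithm detects corruption rather than preventing it, so switching from the default (on, which uses fletcher4) to sha512 would not be expected to lower the error count. The sketch below lists datasets where checksum has been set locally rather than inherited or left at the default; the function name and the LilWizz default are assumptions.

```shell
# Hypothetical helper: show datasets whose checksum property is set locally.
# `zfs get -H` prints tab-separated fields: name, value, source.
local_checksum_overrides() {
    zfs get -r -H -o name,value,source checksum "${1:-LilWizz}" |
        awk -F '\t' '$3 == "local" { print $1 "\t" $2 }'
}
```

If you decide to undo the change, `zfs inherit -r checksum <dataset>` returns a dataset and its children to the inherited value.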
This system is:
AMD Ryzen 7 5800X 8-Core Processor
Corsair Vengeance Pro 16GB x 4 (64 GB)
ROG STRIX X570-E Gaming Motherboard
Thanks…
Though you’ve done extensive testing and have validated using the manufacturer software, I’m curious about the output of ‘smartctl -a /dev/sd#’ (replace # with the relevant drive letter)… Maybe your drives are actually failing?
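To compare several drives side by side, a filter can pull a handful of attributes out of `smartctl -a` output. This is a sketch: the function name smart_summary is hypothetical, and the attribute list is just the ones discussed in this thread, not a diagnostic rule.

```shell
# Hypothetical filter: read `smartctl -a` output on stdin and print
# NAME=RAW for a few attributes, plus the ATA error count if one is logged.
smart_summary() {
    awk '$2 ~ /^(Raw_Read_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count|Multi_Zone_Error_Rate|Power_On_Hours)$/ { print $2 "=" $NF }
         /^ATA Error Count:/ { print "ATA_Error_Count=" $4 }'
}
```

Usage, as root (the device names are the ones used later in this thread and may differ on your system): `for d in /dev/sdc /dev/sdd /dev/sde; do echo "== $d"; smartctl -a "$d" | smart_summary; done`.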
Yes, maybe, and I will try that. The curious thing is… they have exactly the same number of errors. All 3 are exactly the same. Just seems a little weird to me. But who knows.
Thanks for your advice!
Here is Disk “1”…
root@truenas[/mnt/LilWizz]# smartctl -a /dev/sdc
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E4EE670X
LU WWN Device Id: 5 0014ee 20b484633
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 11 23:44:56 2025 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (50760) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 508) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 195 178 021 Pre-fail Always - 7241
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 150
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 18887
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 149
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 107
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 484
194 Temperature_Celsius 0x0022 120 104 000 Old_age Always - 32
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
SMART Error Log Version: 1
ATA Error Count: 12373 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 12373 occurred at disk power-on lifetime: 18761 hours (781 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:31:21.317 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.316 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:31:21.316 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:31:21.315 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.315 IDENTIFY DEVICE
Error 12372 occurred at disk power-on lifetime: 18761 hours (781 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:31:21.316 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:31:21.315 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.315 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:31:21.261 SET FEATURES [Enable SATA feature]
Error 12371 occurred at disk power-on lifetime: 18761 hours (781 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:31:21.315 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.315 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:31:21.261 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.260 IDENTIFY DEVICE
Error 12370 occurred at disk power-on lifetime: 18761 hours (781 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 02 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 10 02 00 00 00 a0 08 00:31:21.261 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.260 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:31:21.260 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:31:21.259 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.259 IDENTIFY DEVICE
Error 12369 occurred at disk power-on lifetime: 18761 hours (781 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:31:21.260 SET FEATURES [Set transfer mode]
ef 10 02 00 00 00 a0 08 00:31:21.259 SET FEATURES [Enable SATA feature]
ec 00 00 00 00 00 a0 08 00:31:21.259 IDENTIFY DEVICE
ef 10 02 00 00 00 a0 08 00:31:21.210 SET FEATURES [Enable SATA feature]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 10% 18874 -
# 2 Short offline Interrupted (host reset) 10% 18761 -
# 3 Short offline Completed without error 00% 18699 -
# 4 Extended offline Completed without error 00% 18695 -
# 5 Short offline Completed without error 00% 18633 -
# 6 Short offline Completed without error 00% 18490 -
# 7 Short offline Completed without error 00% 18322 -
# 8 Short offline Completed without error 00% 18155 -
# 9 Short offline Completed without error 00% 17987 -
#10 Short offline Completed without error 00% 17819 -
#11 Extended offline Completed without error 00% 17257 -
#12 Extended offline Completed without error 00% 15797 -
#13 Extended offline Completed without error 00% 14382 -
#14 Extended offline Completed without error 00% 12929 -
#15 Extended offline Completed without error 00% 11501 -
#16 Extended offline Completed without error 00% 10072 -
#17 Extended offline Completed without error 00% 8648 -
#18 Extended offline Completed without error 00% 7186 -
#19 Extended offline Completed without error 00% 5748 -
#20 Extended offline Completed without error 00% 4285 -
#21 Extended offline Completed without error 00% 2822 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The above only provides legacy SMART information - try 'smartctl -x' for more
Here is Disk “2”…
root@truenas[/mnt/LilWizz]# smartctl -a /dev/sdd
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4EF16HEXF
LU WWN Device Id: 5 0014ee 2b544f4f1
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 11 23:49:22 2025 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (52320) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 523) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 101
3 Spin_Up_Time 0x0027 193 175 021 Pre-fail Always - 7350
4 Start_Stop_Count 0x0032 001 001 000 Old_age Always - 100650
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 77038
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2847
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 420
193 Load_Cycle_Count 0x0032 167 167 000 Old_age Always - 101276
194 Temperature_Celsius 0x0022 117 092 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
SMART Error Log Version: 1
ATA Error Count: 2008 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2008 occurred at disk power-on lifetime: 26512 hours (1104 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:03:36.037 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 00:03:36.036 IDENTIFY DEVICE
c8 00 60 20 00 00 e0 08 00:03:35.998 READ DMA
ec 00 00 00 00 00 a0 08 00:03:35.988 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:03:35.988 SET FEATURES [Set transfer mode]
Error 2007 occurred at disk power-on lifetime: 26512 hours (1104 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 60 20 00 00 e0 Device Fault; Error: ABRT 96 sectors at LBA = 0x00000020 = 32
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 60 20 00 00 e0 08 00:03:35.998 READ DMA
ec 00 00 00 00 00 a0 08 00:03:35.988 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:03:35.988 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 00:03:35.988 IDENTIFY DEVICE
c8 00 60 20 00 00 e0 08 00:03:35.949 READ DMA
Error 2006 occurred at disk power-on lifetime: 26512 hours (1104 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:03:35.988 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 00:03:35.988 IDENTIFY DEVICE
c8 00 60 20 00 00 e0 08 00:03:35.949 READ DMA
ec 00 00 00 00 00 a0 08 00:03:35.940 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:03:35.939 SET FEATURES [Set transfer mode]
Error 2005 occurred at disk power-on lifetime: 26512 hours (1104 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 60 20 00 00 e0 Device Fault; Error: ABRT 96 sectors at LBA = 0x00000020 = 32
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 60 20 00 00 e0 08 00:03:35.949 READ DMA
ec 00 00 00 00 00 a0 08 00:03:35.940 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:03:35.939 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 00:03:35.939 IDENTIFY DEVICE
c8 00 60 20 00 00 e0 08 00:03:35.901 READ DMA
Error 2004 occurred at disk power-on lifetime: 26512 hours (1104 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 00:03:35.939 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 00:03:35.939 IDENTIFY DEVICE
c8 00 60 20 00 00 e0 08 00:03:35.901 READ DMA
ec 00 00 00 00 00 a0 08 00:03:35.891 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:03:35.891 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 10% 11489 -
# 2 Short offline Completed without error 00% 11376 -
# 3 Short offline Completed without error 00% 11313 -
# 4 Extended offline Aborted by host 90% 11313 -
# 5 Extended offline Completed without error 00% 11310 -
# 6 Short offline Completed without error 00% 11248 -
# 7 Short offline Completed without error 00% 11242 -
# 8 Conveyance offline Completed without error 00% 11218 -
# 9 Short offline Completed without error 00% 11208 -
#10 Extended offline Completed without error 00% 11183 -
#11 Short offline Completed without error 00% 11171 -
#12 Short offline Completed without error 00% 11104 -
#13 Short offline Completed without error 00% 10937 -
#14 Short offline Completed without error 00% 10770 -
#15 Short offline Completed without error 00% 10602 -
#16 Short offline Completed without error 00% 10434 -
#17 Extended offline Completed without error 00% 9872 -
#18 Extended offline Completed without error 00% 9604 -
#19 Extended offline Completed without error 00% 9382 -
#20 Extended offline Completed without error 00% 7931 -
#21 Extended offline Completed without error 00% 6502 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The above only provides legacy SMART information - try 'smartctl -x' for more
Here is Disk “3”…
root@truenas[/mnt/LilWizz]# smartctl -a /dev/sde
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E7KPT054
LU WWN Device Id: 5 0014ee 2b5f38577
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 11 23:50:17 2025 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51540) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 515) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 195 178 021 Pre-fail Always - 7233
4 Start_Stop_Count 0x0032 092 092 000 Old_age Always - 8385
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 004 004 000 Old_age Always - 70667
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 947
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 739
193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 8729
194 Temperature_Celsius 0x0022 119 099 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 10% 5117 -
# 2 Short offline Completed without error 00% 5004 -
# 3 Short offline Completed without error 00% 4942 -
# 4 Extended offline Completed without error 00% 4938 -
# 5 Short offline Completed without error 00% 4876 -
# 6 Short offline Completed without error 00% 4733 -
# 7 Short offline Completed without error 00% 4565 -
# 8 Short offline Completed without error 00% 4397 -
# 9 Short offline Completed without error 00% 4230 -
#10 Short offline Completed without error 00% 4062 -
#11 Extended offline Completed without error 00% 3500 -
#12 Extended offline Completed without error 00% 2040 -
#13 Extended offline Completed without error 00% 625 -
#14 Extended offline Completed without error 00% 64708 -
#15 Extended offline Completed without error 00% 63279 -
#16 Extended offline Completed without error 00% 61851 -
#17 Extended offline Completed without error 00% 60427 -
#18 Extended offline Completed without error 00% 58965 -
#19 Extended offline Completed without error 00% 57527 -
#20 Extended offline Completed without error 00% 56065 -
#21 Extended offline Completed without error 00% 54601 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The above only provides legacy SMART information - try 'smartctl -x' for more
I’d actually argue that drives 1 and 2 have (or had) a mechanical fault, given their non-zero Multi_Zone_Error_Rate values. I don’t know if these drives are still under warranty, but given that output I’d try to RMA them.
In my experience, WD is pretty reasonable and has RMA’d drives with similar outputs for me, even though they ‘passed’.
Disk 2 is the worst of the bunch, with its raw read error rate creeping up; it is already halfway to hitting the failure threshold.
Disk 3 looks fine imo.
…uhh, I’d recommend starting to back up your data.
Another thing to note: I see disk 2 is ~70k hours old, but the last test that ran successfully was at 18k hours of life. Did something go wrong with your scheduled tests? I’m not sure whether I’d recommend running full SMART long tests on all of them for more recent results, or focusing on backups ASAP.
I have opened a ticket with WD. If nothing else, I want their opinion on what is going on with these disks.
I have monthly Long smart tests scheduled and weekly short tests. I also ran long tests on all 3 of these disks manually when I started having these issues, so I don’t know why it would show the last long test was sooooo long ago… Doesn’t make sense!
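The LifeTime(hours) column in the legacy self-test log that `smartctl -a` prints is a 16-bit field, so it wraps back to 0 after 65,535 hours. Disk 3’s own log shows the wrap: its entries go from 64708 straight to 625. Taking each drive’s current Power_On_Hours modulo 65,536 gives the value a test run now would be logged with, and that lines up with the newest entries (disk 2: 77038 gives 11502 against a latest entry of 11489; disk 3: 70667 gives 5131 against 5117; disk 1 at 18887 has not wrapped), so the scheduled tests appear to be running after all. By the same reasoning, disk 2’s logged ATA errors at 26,512 hours must be roughly 50,000 power-on hours old, since adding another 65,536 would put them beyond its current 77,038. The function name below is hypothetical.

```shell
# Hypothetical helper: convert a drive's current Power_On_Hours into the
# 16-bit value the SMART self-test log would record for a test run now.
selftest_log_hours() {
    echo $(( $1 % 65536 ))
}
```

Usage: `selftest_log_hours 77038` prints 11502.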
Thanks for your input! I appreciate it.
Well, WD said they believe there isn’t anything wrong with the drives. Their comment was: “If you ran our software and it passed, the drives are good.”
They suggested that I reformat the drives and start over (which is where I was going next).