I recently bought three Toshiba N300 12TB hard drives from two different stores. Within a couple of days, S.M.A.R.T. raised alerts about Seek_Error_Rate on all three disks, although the alerts did not start at the same time for each disk.
I took out two of the disks and have returned one of them so far. I left the third disk in, and its seek error rate now looks better. However, every S.M.A.R.T. test I run on that disk from TrueNAS fails. I may be misreading the UI, but it looks as if the test cannot complete: the “S.M.A.R.T. Test Results” table in the web GUI shows Status: FAILED and Remaining: 0.9 for both the short and the extended test. Meanwhile, smartctl reports SMART overall-health self-assessment test result: PASSED, so is the drive even considered faulty anymore? Could that be a problem if I try to return the two remaining drives? Since it is a rate, I guess it can move both up and down over time, but it still feels odd that a failure would “heal”.
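The Remaining: 0.9 value in the GUI may simply mirror the self-test execution status byte, which smartctl prints as a decimal number in parentheses (73 in the output below). Under the usual ATA convention, the upper four bits of that byte are a result code and the lower four bits are the untested fraction in tenths. The following is a minimal sketch of that decoding, not anything TrueNAS or smartctl ships; the function name `decode_self_test_status` and the paraphrased descriptions are assumptions made for illustration.

```python
from typing import Tuple

# Result codes stored in the upper nibble of the ATA self-test
# execution status byte. Descriptions are paraphrased, not quoted.
STATUS_TEXT = {
    0x0: "completed without error (or no test has been run)",
    0x1: "aborted by the host",
    0x2: "interrupted by the host with a hard or soft reset",
    0x3: "fatal or unknown error; the test could not complete",
    0x4: "completed with a failed test element; element unknown",
    0x5: "completed with a failed electrical element",
    0x6: "completed with a failed servo/seek element",
    0x7: "completed with a failed read element",
    0x8: "completed with a failed element; handling damage suspected",
    0xF: "self-test in progress",
}


def decode_self_test_status(status_byte: int) -> Tuple[int, int, str]:
    """Hypothetical helper: split the status byte into its two nibbles."""
    code = (status_byte >> 4) & 0xF                # upper nibble: result code
    remaining_percent = (status_byte & 0xF) * 10   # lower nibble: tenths left
    return code, remaining_percent, STATUS_TEXT.get(code, "reserved")


# 73 decimal is 0x49: result code 4 with 90% of the test remaining.
code, remaining, text = decode_self_test_status(73)
print(f"code={code} remaining={remaining}% -> {text}")
```

If this reading is right, the test is not hanging at 90%. It stops after the first 10% and records a failure without naming the element that failed, which matches the Completed: unknown failure 90% rows in the self-test log.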
Here is the output of sudo smartctl -x /dev/sdb:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.74-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba N300/MN NAS HDD
Device Model: TOSHIBA HDWG21C
Serial Number: [redacted]
LU WWN Device Id: [redacted]
Firmware Version: 0601
User Capacity: 12,000,138,625,024 bytes [12.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: [redacted]
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x86) Offline data collection activity
was aborted by the device with a fatal error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 73) The previous self-test completed having
a test element that failed and the test
element that failed is not known.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1182) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate PO-R-- 100 100 050 - 0
2 Throughput_Performance P-S--- 100 100 050 - 0
3 Spin_Up_Time POS--K 100 100 001 - 6959
4 Start_Stop_Count -O--CK 100 100 000 - 9
5 Reallocated_Sector_Ct PO--CK 100 100 050 - 0
7 Seek_Error_Rate PO-R-- 083 001 050 Past 0
8 Seek_Time_Performance P-S--- 100 100 050 - 0
9 Power_On_Hours -O--CK 094 094 000 - 2704
10 Spin_Retry_Count PO--CK 100 100 030 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 9
23 Helium_Condition_Lower PO---K 100 100 075 - 0
24 Helium_Condition_Upper PO---K 100 100 075 - 0
191 G-Sense_Error_Rate -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 8
193 Load_Cycle_Count -O--CK 100 100 000 - 12
194 Temperature_Celsius -O---K 100 100 000 - 48 (Min/Max 19/54)
196 Reallocated_Event_Count -O--CK 100 100 000 - 0
197 Current_Pending_Sector -O--CK 100 100 000 - 0
198 Offline_Uncorrectable ----CK 100 100 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
220 Disk_Shift -O---- 100 100 000 - 34996226
222 Loaded_Hours -O--CK 094 094 000 - 2701
223 Load_Retry_Count -O--CK 100 100 000 - 0
224 Load_Friction -O---K 100 100 000 - 0
226 Load-in_Time -OS--K 100 100 000 - 587
240 Head_Flying_Hours P----- 100 100 001 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 51 Comprehensive SMART error log
0x03 GPL R/O 5 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x08 GPL R/O 2 Power Conditions log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 513 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x24 GPL R/O 53248 Current Device Internal Status Data log
0x25 GPL R/O 53248 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: unknown failure 90% 2703 0
# 2 Short offline Completed: unknown failure 90% 2703 0
# 3 Extended offline Completed: unknown failure 90% 2543 0
# 4 Extended offline Completed: unknown failure 90% 2423 0
# 5 Short offline Completed: unknown failure 90% 2423 0
# 6 Short offline Completed: unknown failure 90% 2293 0
# 7 Short offline Completed: unknown failure 90% 2293 0
# 8 Extended offline Completed: unknown failure 90% 1123 0
# 9 Short offline Completed: unknown failure 90% 1122 0
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 1 (0x0001)
Device State: Active (0)
Current Temperature: 48 Celsius
Power Cycle Min/Max Temperature: 43/53 Celsius
Lifetime Min/Max Temperature: 19/54 Celsius
Specified Max Operating Temperature: 55 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 5/55 Celsius
Min/Max Temperature Limit: -40/70 Celsius
Temperature History Size (Index): 478 (177)
Index Estimated Time Temperature Celsius
178 2024-04-28 06:05 47 ****************************
179 2024-04-28 06:06 47 ****************************
180 2024-04-28 06:07 47 ****************************
[redacted a bunch of lines]
170 2024-04-28 13:55 47 ****************************
171 2024-04-28 13:56 48 *****************************
... ..( 5 skipped). .. *****************************
177 2024-04-28 14:02 48 *****************************
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 3) ==
0x01 0x008 4 9 --- Lifetime Power-On Resets
0x01 0x010 4 2704 --- Power-on Hours
0x01 0x018 6 7744578063 --- Logical Sectors Written
0x01 0x020 6 152227751 --- Number of Write Commands
0x01 0x028 6 50225158 --- Logical Sectors Read
0x01 0x030 6 176631 --- Number of Read Commands
0x01 0x038 6 9734400000 --- Date and Time TimeStamp
0x02 ===== = = === == Free-Fall Statistics (rev 1) ==
0x02 0x010 4 0 --- Overlimit Shock Events
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 317 --- Spindle Motor Power-on Hours
0x03 0x010 4 315 --- Head Flying Hours
0x03 0x018 4 12 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 2 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 8 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 48 --- Current Temperature
0x05 0x010 1 47 N-- Average Short Term Temperature
0x05 0x018 1 49 N-- Average Long Term Temperature
0x05 0x020 1 54 --- Highest Temperature
0x05 0x028 1 19 --- Lowest Temperature
0x05 0x030 1 53 N-- Highest Average Short Term Temperature
0x05 0x038 1 33 N-- Lowest Average Short Term Temperature
0x05 0x040 1 51 N-- Highest Average Long Term Temperature
0x05 0x048 1 49 N-- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 55 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 5 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 42 --- Number of Hardware Resets
0x06 0x010 4 16 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c)
No Defects Logged
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 4 0 Command failed due to ICRC error
0x0002 4 0 R_ERR response for data FIS
0x0003 4 0 R_ERR response for device-to-host data FIS
0x0004 4 0 R_ERR response for host-to-device data FIS
0x0005 4 0 R_ERR response for non-data FIS
0x0006 4 0 R_ERR response for device-to-host non-data FIS
0x0007 4 0 R_ERR response for host-to-device non-data FIS
0x0008 4 0 Device-to-host non-data FIS retries
0x0009 4 1 Transition from drive PhyRdy to drive PhyNRdy
0x000a 4 1 Device-to-host register FISes sent due to a COMRESET
0x000b 4 0 CRC errors within host-to-device FIS
0x000d 4 0 Non-CRC errors within host-to-device FIS
0x000f 4 0 R_ERR response for host-to-device data FIS, CRC
0x0010 4 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 4 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 4 0 R_ERR response for host-to-device non-data FIS, non-CRC
I run TrueNAS Scale in a container in Proxmox. The three disks are (or were) connected to a controller card that is passed through to the TrueNAS container, a setup I based on some threads in this excellent forum.
Should I be worried about the Seek Error Rate and return the drives, or can I keep using the ones I haven’t returned yet? Do you think the failing S.M.A.R.T. test runs are related, or is something else going on? I don’t know much about drives and S.M.A.R.T. at this level, so I would very much appreciate your input and ideas!
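On the question of a failure that “heals”: the FAIL column in the attribute table appears to come from comparing the normalized VALUE and WORST columns against THRESH, not from the raw value. The sketch below reproduces that comparison for rows copied from the table above. It is an approximation of how smartctl seems to behave, not its actual code, and the helper names `parse_attribute_line` and `classify` are made up for this example.

```python
def parse_attribute_line(line):
    """Split one row of the brief attribute table into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": fields[2],
        "value": int(fields[3]),   # current normalized value
        "worst": int(fields[4]),   # lowest normalized value ever recorded
        "thresh": int(fields[5]),  # vendor failure threshold
        "fail": fields[6],         # what smartctl printed
        "raw": " ".join(fields[7:]),
    }


def classify(value, worst, thresh):
    """Assumed rule: a threshold of 0 never trips; otherwise compare to it."""
    if thresh == 0:
        return "-"
    if value <= thresh:
        return "NOW"    # attribute is at or below the threshold right now
    if worst <= thresh:
        return "Past"   # it dipped to the threshold earlier, then recovered
    return "-"


rows = [
    "1 Raw_Read_Error_Rate PO-R-- 100 100 050 - 0",
    "7 Seek_Error_Rate PO-R-- 083 001 050 Past 0",
]
for row in rows:
    attr = parse_attribute_line(row)
    state = classify(attr["value"], attr["worst"], attr["thresh"])
    print(attr["name"], state)
```

Read this way, Seek_Error_Rate currently sits at 83, above the threshold of 50, but its worst recorded value of 1 shows that it was below the threshold at some point. That would explain why the overall assessment says PASSED while the attribute is still flagged as Past.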