Thank you @joeschmuck for responding.
I rebooted the NAS again. The SSD is back up and the pool was resilvered, but I now have 3 checksum errors.
I don’t have any. Too bad, as it could have been an easy fix.
TrueNAS keeps shuffling stuff around. No idea yet how to stop that from happening.
# smartctl -x /dev/sdh
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.32-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Crucial/Micron Client SSDs
Device Model: CT120BX500SSD1
Serial Number: 1919E180FFCC
LU WWN Device Id: 0 000000 000000000
Firmware Version: M6CR013
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Jul 26 13:08:38 2024 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 100 050 - 0
5 Reallocate_NAND_Blk_Cnt -O--CK 100 100 010 - 0
9 Power_On_Hours -O--CK 100 100 050 - 43329
12 Power_Cycle_Count -O--CK 100 100 050 - 44
171 Program_Fail_Count -O--CK 100 100 050 - 0
172 Erase_Fail_Count -O--CK 100 100 050 - 0
173 Ave_Block-Erase_Count -O--CK 100 100 050 - 35
174 Unexpect_Power_Loss_Ct -O--CK 100 100 050 - 25
180 Unused_Reserve_NAND_Blk -O--CK 100 100 050 - 100
183 SATA_Interfac_Downshift -O--CK 100 100 050 - 0
184 Error_Correction_Count -O--CK 100 100 050 - 0
187 Reported_Uncorrect -O--CK 100 100 050 - 0
194 Temperature_Celsius -O---K 059 031 050 Past 41 (Min/Max 29/69)
196 Reallocated_Event_Count -O--CK 100 100 050 - 0
197 Current_Pending_ECC_Cnt -O--CK 100 100 050 - 0
198 Offline_Uncorrectable ----CK 100 100 050 - 0
199 UDMA_CRC_Error_Count -O--CK 100 100 050 - 2
202 Percent_Lifetime_Remain ----CK 098 098 001 - 98
206 Write_Error_Rate -OSR-K 100 100 050 - 0
210 Success_RAIN_Recov_Cnt -O--CK 100 100 050 - 0
246 Total_LBAs_Written -O--CK 100 100 050 - 1807575503
247 Host_Program_Page_Count -O--CK 100 100 050 - 56486734
248 FTL_Program_Page_Count -O--CK 100 100 050 - 97475992
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
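To make the table above easier to read at a glance, here is a minimal Python sketch that parses a few of the rows (copied verbatim from the output) and picks out the ones worth a second look. It rests on two assumptions: that "Past" in the FAIL column means the WORST normalized value dropped to or below THRESH at some point, which is my reading of how smartctl reports it, and that attribute 246 counts 512-byte logical sectors, which matches the sector size and the "Logical Sectors Written" figure reported elsewhere in this output. The names `ROWS`, `parse_row`, and `flagged` are just for illustration.

```python
# Rows copied from the attribute table above (whitespace-separated columns).
ROWS = """\
194 Temperature_Celsius -O---K 059 031 050 Past 41 (Min/Max 29/69)
199 UDMA_CRC_Error_Count -O--CK 100 100 050 - 2
202 Percent_Lifetime_Remain ----CK 098 098 001 - 98
246 Total_LBAs_Written -O--CK 100 100 050 - 1807575503
"""


def parse_row(line):
    """Split one attribute row; the raw value may contain spaces, so it comes last."""
    parts = line.split(None, 7)
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "flags": parts[2],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "fail": parts[6],
        "raw": parts[7],
    }


attrs = {a["id"]: a for a in map(parse_row, ROWS.splitlines())}

# Anything other than "-" in the FAIL column has crossed its threshold now or in the past.
flagged = [a["name"] for a in attrs.values() if a["fail"] != "-"]

# Assumption: attribute 246 counts 512-byte logical sectors.
SECTOR_BYTES = 512
bytes_written = int(attrs[246]["raw"]) * SECTOR_BYTES

print("crossed threshold:", flagged)
print("UDMA CRC errors:", attrs[199]["raw"])
print("host writes: %.1f GB" % (bytes_written / 1e9))
```

The only flagged row is the temperature one. If the normalized value is 100 minus the temperature, which fits the numbers here (100 - 41 = 59, 100 - 69 = 31), then "Past" just reflects the 69 °C maximum recorded at some point rather than a current problem. The non-zero CRC count is the kind of thing that usually points at the cable or port rather than the flash, but that is a guess from the counter alone.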
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 1 Comprehensive SMART error log
0x03 GPL R/O 1 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x24 GPL R/O 88 Current Device Internal Status Data log
0x25 GPL R/O 32 Saved Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 2
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2 [1] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
04 -- 51 00 00 00 00 00 00 00 00 40 00 Error: ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 08 00 38 00 00 09 00 10 30 00 00 00:00:00.000 WRITE FPDMA QUEUED
61 00 08 00 40 00 00 0b 00 10 30 00 00 00:00:00.000 WRITE FPDMA QUEUED
61 00 08 00 48 00 00 47 00 f9 30 00 00 00:00:00.000 WRITE FPDMA QUEUED
61 00 08 00 b8 00 00 3c 00 00 a8 00 00 00:00:00.000 WRITE FPDMA QUEUED
61 00 08 00 b8 00 00 3c 00 00 a8 00 00 00:00:00.000 WRITE FPDMA QUEUED
Error 1 [0] log entry is empty
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 43329 -
# 2 Short offline Interrupted (host reset) 90% 43329 -
# 3 Short offline Aborted by host 00% 43329 -
# 4 Short offline Completed without error 00% 43329 -
# 5 Short offline Completed without error 00% 43319 -
# 6 Short offline Completed without error 00% 43295 -
# 7 Short offline Completed without error 00% 43271 -
# 8 Short offline Completed without error 00% 43248 -
# 9 Short offline Completed without error 00% 43224 -
#10 Short offline Completed without error 00% 43200 -
#11 Extended offline Completed without error 00% 43176 -
#12 Short offline Completed without error 00% 43152 -
#13 Short offline Completed without error 00% 43128 -
#14 Short offline Completed without error 00% 43104 -
#15 Short offline Completed without error 00% 43080 -
#16 Short offline Completed without error 00% 43056 -
#17 Short offline Completed without error 00% 43032 -
#18 Extended offline Completed without error 00% 43009 -
#19 Short offline Completed without error 00% 42985 -
Selective Self-tests/Logging not supported
SCT Commands not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 44 --- Lifetime Power-On Resets
0x01 0x010 4 43329 --- Power-on Hours
0x01 0x018 6 1807575503 --- Logical Sectors Written
0x01 0x020 6 42220435 --- Number of Write Commands
0x01 0x028 6 1563041600 --- Logical Sectors Read
0x01 0x030 6 51282609 --- Number of Read Commands
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 2 --- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 4 2 Command failed due to ICRC error
0x0002 4 1 R_ERR response for data FIS
0x0005 4 0 R_ERR response for non-data FIS
0x000a 4 3 Device-to-host register FISes sent due to a COMRESET
I ran the test 3 times. The 1st was aborted by mistake. The 2nd was aborted by the host, and I found this in dmesg:
[ 531.650234] ata6.00: exception Emask 0x0 SAct 0x4080 SErr 0x0 action 0x6 frozen
[ 531.651254] ata6.00: failed command: WRITE FPDMA QUEUED
[ 531.652284] ata6.00: cmd 61/50:38:40:b2:d0/00:00:08:00:00/40 tag 7 ncq dma 40960 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 531.654346] ata6.00: status: { DRDY }
[ 531.655331] ata6.00: failed command: WRITE FPDMA QUEUED
[ 531.656376] ata6.00: cmd 61/88:70:18:66:70/00:00:08:00:00/40 tag 14 ncq dma 69632 out
res 40/00:01:04:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 531.658559] ata6.00: status: { DRDY }
[ 531.659629] ata6: hard resetting link
[ 531.974015] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 531.994890] ata6.00: configured for UDMA/133
[ 531.995192] ata6: EH complete
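For anyone who wants to decode the `cmd` lines above, here is a rough Python sketch. It assumes the usual libata layout `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and that for NCQ commands the sector count sits in the feature bytes while the tag sits in the upper bits of `nsect`. I have not checked this against the kernel source for this exact version, but the decoded values agree with the `tag` and `dma` numbers the kernel printed on the same lines, so the layout looks right. `decode_ncq` and `tags_from_sact` are made-up helper names.

```python
import re

# The two "cmd" lines from the dmesg excerpt above.
CMD_LINES = [
    "ata6.00: cmd 61/50:38:40:b2:d0/00:00:08:00:00/40 tag 7 ncq dma 40960 out",
    "ata6.00: cmd 61/88:70:18:66:70/00:00:08:00:00/40 tag 14 ncq dma 69632 out",
]

HEX5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TASKFILE = re.compile(r"cmd ([0-9a-f]{2})/" + HEX5 + "/" + HEX5 + r"/([0-9a-f]{2})")


def decode_ncq(line, sector_bytes=512):
    """Decode an NCQ taskfile as printed by libata (assumed layout, see lead-in)."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    hfeat, _hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in m.group(3).split(":"))
    sectors = (hfeat << 8) | feat  # NCQ: sector count lives in the feature bytes
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": int(m.group(1), 16),  # 0x61 = WRITE FPDMA QUEUED
        "tag": nsect >> 3,               # NCQ: tag is in bits 7:3 of nsect
        "sectors": sectors,
        "bytes": sectors * sector_bytes,
        "lba": lba,
    }


def tags_from_sact(sact):
    """List the NCQ tags whose bits are set in the SAct bitmap."""
    return [bit for bit in range(32) if sact & (1 << bit)]


for line in CMD_LINES:
    d = decode_ncq(line)
    print("tag %d: %d sectors (%d bytes) at LBA 0x%08x" % (d["tag"], d["sectors"], d["bytes"], d["lba"]))

print("outstanding tags from SAct 0x4080:", tags_from_sact(0x4080))
```

Both writes decode to the tags listed in `SAct 0x4080` (bits 7 and 14), which means these were the only two commands in flight when the port froze. Both timed out rather than returning a media error.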