Is my SSD dead? The number of I/O errors exceeded acceptable levels

MSameer · July 26, 2024, 8:11am

I am a bit confused here:

June 9th I received this alert:

 The number of I/O errors associated with a ZFS device exceeded
 acceptable levels. ZFS has marked the device as faulted.
 impact: Fault tolerance of the pool may be compromised.
 eid: 2566
 class: statechange
 state: FAULTED
 host: corellia
 time: 2024-06-10 00:06:51+0300
 vpath: /dev/sdi3
 vguid: 0xA3827F82C74B2AA2
 pool: boot-pool (0x88B45C7FF4BBA0E9)

The SSD seems to be completely unavailable. smartctl cannot open the device.

I pulled it out of the NAS and connected it to my laptop via an external SATA-to-USB enclosure.
I ran SMART and badblocks tests (I know badblocks does not make sense for an SSD) and everything was fine.
I moved it back to the NAS and all went fine until today, July 26th, when I received this error:

 The number of I/O errors associated with a ZFS device exceeded
 acceptable levels. ZFS has marked the device as faulted.
 impact: Fault tolerance of the pool may be compromised.
 eid: 3371
 class: statechange
 state: FAULTED
 host: corellia
 time: 2024-07-26 10:23:46+0300
 vpath: /dev/sdj3
 vguid: 0xCF08906F09FA2103
 pool: boot-pool (0x88B45C7FF4BBA0E9)

This is a Crucial SSD used for my mirrored boot pool.

 Model Family: Crucial/Micron Client SSDs
 Device Model: CT120BX500SSD1

That SSD often reports high temperatures, but from what I have read that could be a firmware issue.

So, is it broken? Should I just throw it away and shop for something else, or could a bug somewhere be causing this corruption? I am leaning towards throwing it away and not buying Crucial SSDs again.

This happened during the update from 24.04.1.1 to 24.04.2 (bad timing but I have a mirrored boot pool).

joeschmuck · July 26, 2024, 9:28am

You need to do a few things…

Run smartctl -t short /dev/sdX, where “X” is the drive letter. In the above listing you have sdi and sdj?
Wait 3 minutes, then post the output of smartctl -x /dev/sdX. We will see what the data shows.

Yes, I have personally experienced a Crucial SSD failure due to firmware but that was a very long time ago. You can check the firmware versions but hold off until we see that smart data.

The issue does not initially sound like the drive is dying. High drive temps could be the problem if those are valid.

If you have a lot of UDMA_CRC_ERRORS then odds are it is a SATA cable issue.
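
For anyone who wants to check that counter without reading the whole dump by eye, here is a minimal Python sketch. It assumes the attribute table layout that smartctl -x prints later in this thread (ID, name, flags, value, worst, thresh, fail, raw); the sample rows are copied from that output, and the function names are made up for illustration.

```python
import re

# Sample rows in the layout printed by `smartctl -x` (values copied from this thread).
SAMPLE = """\
  5 Reallocate_NAND_Blk_Cnt -O--CK   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   050    -    0
194 Temperature_Celsius     -O---K   059   031   050    Past 41 (Min/Max 29/69)
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    2
"""

# Columns: ID, name, flags, value, worst, thresh, fail, raw (raw may contain spaces).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value_string} for every table row found."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = match.group(8).strip()
    return attrs


def crc_error_count(attrs):
    """Raw UDMA CRC error count as an int, or None if the attribute is absent."""
    raw = attrs.get("UDMA_CRC_Error_Count")
    return int(raw.split()[0]) if raw else None


attrs = parse_attributes(SAMPLE)
print("UDMA CRC errors:", crc_error_count(attrs))
```

What counts as "a lot" is a judgment call; the more useful signal is usually whether the number keeps climbing between two runs.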

MSameer · July 26, 2024, 10:12am

Thank you @joeschmuck for responding.

I rebooted the NAS again. The SSD is back up and the pool was resilvered, but I have 3 checksum errors.

I don’t have many UDMA CRC errors (the count is only 2). Too bad, as a bad cable would have been an easy fix.

TrueNAS keeps shuffling the device names around (sdi, then sdj, now sdh). No idea yet how to stop that from happening.

# smartctl -x  /dev/sdh 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.32-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT120BX500SSD1
Serial Number:    1919E180FFCC
LU WWN Device Id: 0 000000 000000000
Firmware Version: M6CR013
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jul 26 13:08:38 2024 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocate_NAND_Blk_Cnt -O--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    43329
 12 Power_Cycle_Count       -O--CK   100   100   050    -    44
171 Program_Fail_Count      -O--CK   100   100   050    -    0
172 Erase_Fail_Count        -O--CK   100   100   050    -    0
173 Ave_Block-Erase_Count   -O--CK   100   100   050    -    35
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   050    -    25
180 Unused_Reserve_NAND_Blk -O--CK   100   100   050    -    100
183 SATA_Interfac_Downshift -O--CK   100   100   050    -    0
184 Error_Correction_Count  -O--CK   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   050    -    0
194 Temperature_Celsius     -O---K   059   031   050    Past 41 (Min/Max 29/69)
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_ECC_Cnt -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   ----CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    2
202 Percent_Lifetime_Remain ----CK   098   098   001    -    98
206 Write_Error_Rate        -OSR-K   100   100   050    -    0
210 Success_RAIN_Recov_Cnt  -O--CK   100   100   050    -    0
246 Total_LBAs_Written      -O--CK   100   100   050    -    1807575503
247 Host_Program_Page_Count -O--CK   100   100   050    -    56486734
248 FTL_Program_Page_Count  -O--CK   100   100   050    -    97475992
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O     88  Current Device Internal Status Data log
0x25       GPL     R/O     32  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 2
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 [1] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 40 00  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 38 00 00 09 00 10 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 40 00 00 0b 00 10 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 48 00 00 47 00 f9 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 b8 00 00 3c 00 00 a8 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 b8 00 00 3c 00 00 a8 00 00     00:00:00.000  WRITE FPDMA QUEUED

Error 1 [0] log entry is empty
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     43329         -
# 2  Short offline       Interrupted (host reset)      90%     43329         -
# 3  Short offline       Aborted by host               00%     43329         -
# 4  Short offline       Completed without error       00%     43329         -
# 5  Short offline       Completed without error       00%     43319         -
# 6  Short offline       Completed without error       00%     43295         -
# 7  Short offline       Completed without error       00%     43271         -
# 8  Short offline       Completed without error       00%     43248         -
# 9  Short offline       Completed without error       00%     43224         -
#10  Short offline       Completed without error       00%     43200         -
#11  Extended offline    Completed without error       00%     43176         -
#12  Short offline       Completed without error       00%     43152         -
#13  Short offline       Completed without error       00%     43128         -
#14  Short offline       Completed without error       00%     43104         -
#15  Short offline       Completed without error       00%     43080         -
#16  Short offline       Completed without error       00%     43056         -
#17  Short offline       Completed without error       00%     43032         -
#18  Extended offline    Completed without error       00%     43009         -
#19  Short offline       Completed without error       00%     42985         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              44  ---  Lifetime Power-On Resets
0x01  0x010  4           43329  ---  Power-on Hours
0x01  0x018  6      1807575503  ---  Logical Sectors Written
0x01  0x020  6        42220435  ---  Number of Write Commands
0x01  0x028  6      1563041600  ---  Logical Sectors Read
0x01  0x030  6        51282609  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            2  Command failed due to ICRC error
0x0002  4            1  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            3  Device-to-host register FISes sent due to a COMRESET

I ran the test 3 times. The 1st was aborted by mistake. The 2nd was interrupted by a host reset, and I found this in dmesg:

[  531.650234] ata6.00: exception Emask 0x0 SAct 0x4080 SErr 0x0 action 0x6 frozen
[  531.651254] ata6.00: failed command: WRITE FPDMA QUEUED
[  531.652284] ata6.00: cmd 61/50:38:40:b2:d0/00:00:08:00:00/40 tag 7 ncq dma 40960 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  531.654346] ata6.00: status: { DRDY }
[  531.655331] ata6.00: failed command: WRITE FPDMA QUEUED
[  531.656376] ata6.00: cmd 61/88:70:18:66:70/00:00:08:00:00/40 tag 14 ncq dma 69632 out
                        res 40/00:01:04:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  531.658559] ata6.00: status: { DRDY }
[  531.659629] ata6: hard resetting link
[  531.974015] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  531.994890] ata6.00: configured for UDMA/133
[  531.995192] ata6: EH complete
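
The cmd fields in those dmesg lines can be decoded to see what the kernel was writing when the timeout hit. The Python sketch below assumes the field order Linux libata prints (command/feature:nsect:lbal:lbam:lbah/the same five "hob" high-order bytes/device) and the NCQ convention that the sector count sits in the feature register and the tag in bits 7:3 of the count register. That is my reading of the format, not something stated in this thread, so treat it as an assumption. The function name is hypothetical.

```python
import re

# One "cmd" line copied from the dmesg excerpt in this post.
LINE = "ata6.00: cmd 61/50:38:40:b2:d0/00:00:08:00:00/40 tag 7 ncq dma 40960 out"

# Assumed layout: cmd CMD/FEAT:NSECT:LBAL:LBAM:LBAH/hob bytes in the same order/DEV tag N
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/([0-9a-f:]{14})/([0-9a-f:]{14})/([0-9a-f]{2}) tag (\d+)"
)


def decode_ncq_cmd(line, sector_bytes=512):
    """Decode opcode, tag, transfer size and starting LBA from a libata cmd line."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no libata cmd field found")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in match.group(2).split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in match.group(3).split(":")
    )
    sectors = (hob_feature << 8) | feature  # NCQ: count lives in the feature register
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "opcode": int(match.group(1), 16),
        "tag": int(match.group(5)),
        "ncq_tag": nsect >> 3,  # NCQ: tag lives in bits 7:3 of the count register
        "sectors": sectors,
        "bytes": sectors * sector_bytes,
        "lba": lba,
    }


decoded = decode_ncq_cmd(LINE)
print(decoded["bytes"], hex(decoded["lba"]))
```

The decoded byte count can be compared with the "dma 40960" figure on the same line, and the decoded tag with "tag 7", as a sanity check on the assumed layout.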

brandon_evans · July 26, 2024, 5:21pm

The SMART data shows the drive is 4 years old, with a Reallocated_Event_Count of 0 and 98% lifetime remaining.
Total_LBAs_Written = 1807575503; if I’m not mistaken, that’s in gigabytes. Was this drive used for something other than running TrueNAS? The read and write counts are very high. I would go looking for a new drive.
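
As a rough cross-check of the age estimate, Power_On_Hours (43329 in the dump above) can be converted to years. This is only a sketch: it measures powered-on time, so the drive's calendar age can only be equal or greater, and the 365.25-day year is my own choice of constant.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, an assumption for this estimate


def power_on_years(hours):
    """Convert a Power_On_Hours raw value to years of powered-on time."""
    return hours / HOURS_PER_YEAR


years = power_on_years(43329)  # raw value of attribute 9 in the dump above
print(f"{years:.2f} years powered on")
```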

MSameer · July 26, 2024, 5:56pm

Thanks @brandon_evans for responding.

I think it could be more than 4 years old. I have been using it since before FreeNAS 11.
I never used it for anything other than booting FreeNAS, then TrueNAS.

Total_LBAs_Written seems to be in some other unit.
I did:

zpool detach boot-pool sdh3
smartctl -a  /dev/sdh | grep LBA
246 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       1811087199
dd if=/dev/zero of=/dev/sdh bs=1M count=1024
smartctl -a  /dev/sdh | grep LBA
246 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       1813184351

Writing 1 GiB increased attribute 246 (Total_LBAs_Written) by 2097152.

So the total should be 1813184351 / 2097152 ≈ 864.6 GiB.

But if you still think that’s too much then I can hunt for a new drive.

EDIT:
I am sure it’s 2097152 per GB because:

resilvered 7.81G in 00:02:01 with 0 errors on Fri Jul 26 20:59:49 2024
smartctl -a  /dev/sdh | grep LBA
246 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       1829860042

(1829860042-1813184351)/2097152 = ~7.9

EDIT2:
1 LBA unit for BX500 is 512 bytes
512 * 2097152 = 1GiB
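
The same arithmetic as a small Python sketch, using only the counter readings quoted above. The variable and function names are made up for illustration, and it assumes dd really wrote 1024 blocks of 1 MiB to the drive and nothing else wrote in between.

```python
GIB = 2**30

# Total_LBAs_Written readings quoted in this post.
before_dd = 1811087199
after_dd = 1813184351
after_resilver = 1829860042

# dd wrote 1024 blocks of 1 MiB, i.e. exactly 1 GiB.
written_bytes = 1024 * 2**20
units_per_gib = after_dd - before_dd
bytes_per_unit = written_bytes // units_per_gib


def lbas_to_gib(lbas, unit_bytes=bytes_per_unit):
    """Convert a count of LBA units to GiB."""
    return lbas * unit_bytes / GIB


print("units per GiB:", units_per_gib)
print("bytes per unit:", bytes_per_unit)
print(f"lifetime writes: {lbas_to_gib(after_dd):.1f} GiB")
print(f"resilver writes: {lbas_to_gib(after_resilver - after_dd):.2f} GiB")
```

The resilver figure comes out slightly above the 7.81G that zpool reported, which is plausible since the counter also picks up any other writes in that window.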

joeschmuck · July 26, 2024, 6:19pm

You upgraded from 24.04.1.1, so make 24.04.1.1 the active boot environment. Reboot and see if the problem persists. This can tell you if the hardware is good or not.

Assuming the problem returns…
How is the SSD connected to your machine? If using the HBA, try the onboard SATA port.

If you are already using an onboard SATA port, my suggestion is to make a backup of your TrueNAS configuration file, put it in a safe place. Then wipe both boot-pool drives and reinstall TrueNAS 24.04.2 from the ISO image.

Those are my recommended best steps based on the information provided.

brandon_evans · July 27, 2024, 5:12am

Total_LBAs_Written is in 512-byte LBAs, of course. Oh, how I wish drive manufacturers would stick with a standard. In that case, that’s not a lot of data written.

MSameer · October 23, 2024, 8:20pm

So the drive barfed again and I ended up replacing it with an S3520. I figured it should be more reliable.

RIP. It’s served me well for 5+ years. Still at 94% health though

HoneyBadger · October 23, 2024, 8:51pm

The Intel DC series drives are very reliable, and the S3520 is after the shift to 3D NAND, so I imagine you’ll be quite pleased with the years of boring service you’ll get from it.

For those following along, the Crucial BX series has an aggressive wear-leveling feature in its firmware that also causes it to throw false positives for the “Current Pending Sectors” SMART value whenever it does a wear-leveling pass. The combination of these two makes it an overall sub-par candidate for use in your TrueNAS system, even as a boot device.

If you’ve still got the BX500 drive @MSameer I’d be interested in a full smartctl -x dump of it, if you’re willing to share either publicly or via DM.

MSameer · October 24, 2024, 7:40am

Thank you for the reassurance. I just love boring hardware which you never hear from.

The only downside is that TN does not come with the Intel SSD tools, so I cannot “shrink” the usable space, but I don’t think that matters much for a boot drive.

Here goes. (It’s connected via an ASM1051E SATA-to-USB bridge. I can connect it directly if you wish.)

vader:/tmp# smartctl -x /dev/sda 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.6-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT120BX500SSD1
Serial Number:    1919E180FFCC
LU WWN Device Id: 0 000000 000000000
Firmware Version: M6CR013
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5610
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Oct 24 10:30:28 2024 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocate_NAND_Blk_Cnt -O--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    45463
 12 Power_Cycle_Count       -O--CK   100   100   050    -    58
171 Program_Fail_Count      -O--CK   100   100   050    -    0
172 Erase_Fail_Count        -O--CK   100   100   050    -    0
173 Ave_Block-Erase_Count   -O--CK   100   100   050    -    62
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   050    -    31
180 Unused_Reserve_NAND_Blk -O--CK   100   100   050    -    100
183 SATA_Interfac_Downshift -O--CK   100   100   050    -    0
184 Error_Correction_Count  -O--CK   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   050    -    0
194 Temperature_Celsius     -O---K   068   031   050    Past 32 (Min/Max 29/69)
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_ECC_Cnt -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   ----CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    2
202 Percent_Lifetime_Remain ----CK   096   096   001    -    96
206 Write_Error_Rate        -OSR-K   100   100   050    -    0
210 Success_RAIN_Recov_Cnt  -O--CK   100   100   050    -    0
246 Total_LBAs_Written      -O--CK   100   100   050    -    2612922986
247 Host_Program_Page_Count -O--CK   100   100   050    -    81653843
248 FTL_Program_Page_Count  -O--CK   100   100   050    -    246685256
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O     88  Current Device Internal Status Data log
0x25       GPL     R/O     32  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 2
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 [1] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 00 00 00 00 00 00 00 00 40 00  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 38 00 00 09 00 10 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 40 00 00 0b 00 10 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 48 00 00 47 00 f9 30 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 b8 00 00 3c 00 00 a8 00 00     00:00:00.000  WRITE FPDMA QUEUED
  61 00 08 00 b8 00 00 3c 00 00 a8 00 00     00:00:00.000  WRITE FPDMA QUEUED

Error 1 [0] log entry is empty
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     45441         -
# 2  Short offline       Completed without error       00%     45417         -
# 3  Short offline       Completed without error       00%     45393         -
# 4  Short offline       Completed without error       00%     45369         -
# 5  Extended offline    Completed without error       00%     45346         -
# 6  Short offline       Completed without error       00%     45322         -
# 7  Short offline       Completed without error       00%     45298         -
# 8  Short offline       Completed without error       00%     45274         -
# 9  Short offline       Completed without error       00%     45250         -
#10  Short offline       Completed without error       00%     45226         -
#11  Short offline       Completed without error       00%     45202         -
#12  Extended offline    Completed without error       00%     45178         -
#13  Short offline       Completed without error       00%     45154         -
#14  Short offline       Completed without error       00%     45130         -
#15  Short offline       Completed without error       00%     45107         -
#16  Short offline       Completed without error       00%     45083         -
#17  Short offline       Completed without error       00%     45059         -
#18  Short offline       Completed without error       00%     45035         -
#19  Extended offline    Completed without error       00%     45011         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              58  ---  Lifetime Power-On Resets
0x01  0x010  4           45463  ---  Power-on Hours
0x01  0x018  6      2612922986  ---  Logical Sectors Written
0x01  0x020  6        72819815  ---  Number of Write Commands
0x01  0x028  6      1651517831  ---  Logical Sectors Read
0x01  0x030  6        52176492  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               4  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            1  Device-to-host register FISes sent due to a COMRESET

Nothing really strikes my attention there but I am not an expert.

HoneyBadger · October 24, 2024, 1:41pm

We actually do have provisioning available through the disk_resize command (e.g. disk_resize sdX 16G), but it would be rather difficult (impossible) to do on a live disk and have the data remain intact. You’d have to do this from a different boot device and then reinstall to the S3520.

The Host_Program_Page_Count and FTL_Program_Page_Count values are the part I’m interested in. The flash translation layer says it’s done ~3x the number of page programming operations as were sent to the disk, probably from the SLC caching behavior, but it’s not logged anything much more intense than that for the wear leveling. Total writes look like they’re in the neighborhood of 1.3 TB, so it’s a pretty lightly used device overall.
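
Both figures can be reproduced from the October dump with a short Python sketch. Treating FTL_Program_Page_Count divided by Host_Program_Page_Count as a rough write-amplification ratio is an assumption based on the attribute names, and the 512-byte LBA unit is the one worked out earlier in the thread.

```python
# Raw values from the October 24 smartctl -x dump (attributes 246, 247, 248).
total_lbas_written = 2612922986
host_program_pages = 81653843
ftl_program_pages = 246685256

SECTOR_BYTES = 512  # LBA unit established earlier in the thread

# Rough write amplification: pages the FTL programmed per page the host asked for.
amplification = ftl_program_pages / host_program_pages

written_bytes = total_lbas_written * SECTOR_BYTES
written_tb = written_bytes / 10**12   # decimal terabytes
written_tib = written_bytes / 2**40   # binary tebibytes

print(f"FTL/host page ratio: {amplification:.2f}")
print(f"host writes: {written_tb:.2f} TB ({written_tib:.2f} TiB)")
```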

MSameer · October 24, 2024, 4:33pm

Brilliant! Thank you.
I am glad I still mirror my boot drive. I can detach the SSD, resize and reattach.

The SSD has been used exclusively for FreeNAS and TrueNAS boot.


Judging by the total LBAs written, it’s about 1,246 GiB (2612922986 / 2097152), which is roughly 1.22 TiB or 1.34 TB.

It’s served me well for around 5 years.

Do you need any info from the SSD before it gets added to the e-waste pile?

HoneyBadger · October 24, 2024, 6:54pm

Nope, but it’s still likely a fine candidate for a non-ZFS boot device (albeit with the caveats around write amplification).

MSameer · October 25, 2024, 7:06am

It is too small for anything other than booting TN.

It is also too unreliable to use in non-mirror pools. I hate to get rid of it but I have no real use for it. Maybe I can donate it for free.
