Disk seems to have detached from pool spontaneously... what to do?

Yes, it’s a single-drive pool. Used for a VM and some jails.

Here’s the SMART info:

$ smartctl -a /dev/ada1
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT2000MX500SSD1
Serial Number:    2422E8B54A8A
LU WWN Device Id: 5 00a075 1e8b54a8a
Firmware Version: M3CR046
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct  4 04:07:44 2024 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (    0) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: (   2) minutes.
Extended self-test routine
recommended polling time: (  30) minutes.
Conveyance self-test routine
recommended polling time: (   2) minutes.
SCT capabilities:       (0x0031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       874
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   100   100   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       1
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       139
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   073   068   000    Old_age   Always       -       27 (Min/Max 0/32)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       458627170
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       9205430
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       4421333

SMART Error Log Version: 1
Invalid Error Log index = 0x10 (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       870         -
# 2  Short offline       Completed without error       00%       869         -
# 3  Short offline       Completed without error       00%       846         -
# 4  Short offline       Completed without error       00%       822         -
# 5  Short offline       Completed without error       00%       798         -
# 6  Extended offline    Completed without error       00%       774         -
# 7  Short offline       Completed without error       00%       750         -
# 8  Short offline       Completed without error       00%       726         -
# 9  Short offline       Completed without error       00%       702         -
#10  Short offline       Completed without error       00%       678         -
#11  Short offline       Completed without error       00%       653         -
#12  Short offline       Completed without error       00%       629         -
#13  Extended offline    Completed without error       00%       605         -
#14  Short offline       Completed without error       00%       581         -
#15  Short offline       Completed without error       00%       557         -
#16  Short offline       Completed without error       00%       533         -
#17  Short offline       Completed without error       00%       508         -
#18  Short offline       Completed without error       00%       484         -
#19  Short offline       Completed without error       00%       460         -
#20  Extended offline    Completed without error       00%       436         -
#21  Short offline       Completed without error       00%       412         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I’m running the long SMART test right now.

The story here is that this pool was single-disk for a long time, and, realizing the error of my ways, I bought another 2TB SSD to make it into a mirror. But it turned out that the new SSD was like 400 MB smaller than the old one, and so TrueNAS wouldn’t let me use it to make a mirror. So I repeated the process with a different 2TB SSD and… same story!
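For anyone who wants to check sizes before buying yet another disk, here is a minimal sketch of the comparison. It pulls the byte count out of the `User Capacity` line that `smartctl -a` prints (as in the output above) and checks that the candidate is at least as large as the existing disk. The function names `parse_user_capacity` and `can_mirror` are made up for illustration, and the candidate size is a hypothetical value roughly 400 MB smaller, not a measurement from either of the returned SSDs. It also compares raw capacities only; TrueNAS partitions the disk first, so the real check is presumably against partition sizes.

```python
import re


def parse_user_capacity(smartctl_output: str) -> int:
    """Return the byte count from the 'User Capacity:' line of `smartctl -a` output."""
    match = re.search(r"^User Capacity:\s+([\d,]+) bytes", smartctl_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'User Capacity' line found")
    # smartctl prints thousands separators, e.g. 2,000,398,934,016
    return int(match.group(1).replace(",", ""))


def can_mirror(existing_bytes: int, candidate_bytes: int) -> bool:
    """Assumption: a new mirror member must be at least as large as the existing one."""
    return candidate_bytes >= existing_bytes


# Capacity line copied from the smartctl output above, used here as the existing disk.
existing = parse_user_capacity("User Capacity:    2,000,398,934,016 bytes [2.00 TB]")

# Hypothetical candidate that is about 400 MB smaller (illustrative value only).
candidate = existing - 400 * 1000**2

print(can_mirror(existing, candidate))  # prints False
```

Running `smartctl -a` against each disk and feeding both outputs through `parse_user_capacity` would give the two numbers to compare; even a small shortfall is enough for the check to fail.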

ada3 is the third SSD I’ve bought in an attempt to turn this pool into a mirrored one. It’s actually big enough to do this, but TrueNAS failed twice when I tried to add the disk to the pool. The first time the error was [EFAULT] Failed to wipe disk ada1: [Errno 1] Operation not permitted: '/dev/ada3', and the second time it was [EFAULT] Unable to GPT format the disk "ada3": gpart: geom 'ada1': File exists.

That debacle is documented here in case you’re interested. I still haven’t been able to solve it, and have just left the disk installed in the meantime. (If you do read it, ada3 appears as ada1 there; the disk was on a different SATA port at the time.)
