Bringing back a SUSPENDED pool that doesn't show up with `zpool status`

I lost my main 6-disk RAIDZ2 pool last night after two devices were removed from the pool in succession.
It went from DEGRADED to SUSPENDED within 5 minutes, as far as the alert emails can tell.

The last email I got said the pool state is SUSPENDED and that the following disks are not healthy (one is FAULTED, one is REMOVED).

Of course all my data is on this pool, and it seems TN is also storing Reporting data on it.
I couldn’t access Disks reporting and in the end had to hard reboot the server because of an unresponsive VM.

I suspect the 6 brand new WD Gold disks were running too hot (the last check a few days ago showed one around 55C).
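One way to check the overheating suspicion is to compare each drive's current and lifetime-maximum temperature with the maximum the drive itself recommends. This is a minimal sketch, assuming the input is `smartctl -x` output from a drive that reports the SCT status lines `Current Temperature:`, `Lifetime Min/Max Temperature:` and `Min/Max recommended Temperature:`. The helper name `temp_summary` is made up for illustration.

```shell
#!/bin/sh
# temp_summary: hypothetical helper, not part of smartmontools.
# Reads `smartctl -x` output on stdin and prints a one-line temperature summary.
temp_summary() {
    awk '
        # "Current Temperature:   47 Celsius" -> third field is the value
        /^Current Temperature:/              { cur = $3 }
        # "Lifetime  Min/Max Temperature:  15/55 Celsius" -> keep the max
        /^Lifetime +Min\/Max Temperature:/   { split($4, mm, "/"); max = mm[2] }
        # "Min/Max recommended Temperature:  5/60 Celsius" -> keep the max
        /^Min\/Max recommended Temperature:/ { split($4, rr, "/"); lim = rr[2] }
        END { printf "current=%s lifetime_max=%s recommended_max=%s\n", cur, max, lim }
    '
}

# Example (not run here):
#   smartctl -x /dev/sde | temp_summary
```

A lifetime maximum close to the recommended maximum would support the cooling theory. A wide margin would point elsewhere (cabling, power, controller).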

Anyway, now after the reboot the pool info shows in the interface, but the disks are listed among the available disks for new pools.

CLI zpool status doesn’t show the pool in question.

Short SMART tests on all 6 disks were successful.
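To double-check the self-test history on each disk rather than only the latest result, the self-test log in `smartctl -x` output can be scanned for any row that is not "Completed without error". This is a rough sketch. `selftest_not_clean` is a hypothetical helper name, and it assumes the row layout smartctl prints for ATA drives (`# 1  Short offline  Completed without error ...`).

```shell
#!/bin/sh
# selftest_not_clean: hypothetical helper.
# Reads `smartctl -x` (or `smartctl -l selftest`) output on stdin and prints
# how many self-test log rows have a status other than "Completed without error".
# Aborted or interrupted tests are counted too, so a non-zero result means
# "look at the log", not necessarily "the disk failed a test".
selftest_not_clean() {
    awk '
        /^# *[0-9]+ +(Short|Extended|Conveyance|Selective) / &&
        !/Completed without error/ { n++ }
        END { print n + 0 }
    '
}

# Example (not run here):
#   smartctl -x /dev/sda | selftest_not_clean
```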

I’m not sure what to do next to rebuild the pool.

First, try to import the pool, from GUI or with zpool import
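For the CLI route: `zpool import` with no pool name only scans the disks and lists pools that could be imported. It does not import anything. A small sketch for checking whether a given pool shows up in that listing follows. `pool_listed` is a hypothetical helper name, and it assumes the usual `pool: NAME` line in the listing.

```shell
#!/bin/sh
# pool_listed: hypothetical helper.
# Reads the output of a bare `zpool import` on stdin and succeeds (exit 0)
# if a line of the form "pool: <name>" matches the requested pool name.
pool_listed() {
    awk -v p="$1" '
        $1 == "pool:" && $2 == p { found = 1 }
        END { exit !found }
    '
}

# Example (not run here):
#   zpool import | pool_listed subramanya && echo "pool is visible to ZFS"
```

If the pool is listed, the labels on the disks are still readable, which is a separate question from whether the import itself will succeed.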

Ohh, that doesn’t look good.

got a cold sweat just reading sentence … '^^

But at least zpool import shows my pool and says it can be imported

cannot import ‘mypool’: I/O Error
Destroy and re-create the pool from backup source.

It does look bad bad bad.
I can’t possibly have a backup of my main pool.

First: Keep calm, and signal the ever-helpful Batman @HoneyBadger

Second: A full description of your system is in order, detailed hardware and OS version.

Then we’ll see about solving this “I/O Error” (any more details from the error message?) and attempting force import.
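For reference, these are the kinds of import attempts usually considered, ordered from least to most invasive. This is only a sketch: `import_plan` is a hypothetical helper that prints the commands instead of running them, and whether any of them is appropriate here is for the people helping in this thread to say. The flags themselves (`-o readonly=on`, `-N`, `-f`, `-F`, `-n`) are standard `zpool import` options.

```shell
#!/bin/sh
# import_plan: hypothetical helper. It only PRINTS candidate commands,
# so nothing touches the disks.
import_plan() {
    pool=$1
    # 1. Forced read-only import without mounting datasets (-N):
    #    nothing is written to the pool.
    echo "zpool import -o readonly=on -N -f $pool"
    # 2. Dry run of recovery mode: -n together with -F only reports whether
    #    discarding the last few transactions would make the pool importable.
    echo "zpool import -F -n $pool"
    # 3. Actual recovery import, still read-only. Last resort.
    echo "zpool import -o readonly=on -F -f $pool"
}

# Example:
#   import_plan subramanya
```

Printing rather than executing is deliberate: the point is to agree on a command before anything is run against a pool with no full backup.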
Do you have a backup?

When you provide your system hardware makeup and which version of TrueNAS you are running, also tell us: did you cause the REMOVED state, or did all this happen on its own?

Please post the output of zpool status -v, and the output of smartctl -x /dev/??? for the two drives that you feel are faulted, where ??? is the drive ID (sda/sdb/ada0/ada1 for example). Please use the </> when posting so it is significantly easier for us to read.
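Since the `smartctl -x` output is long, it can help to dump it to one file per drive and paste from there. A minimal sketch, assuming the drives are the usual `/dev/sdX` nodes. `collect_smart` and the `SMARTCTL` override variable are made-up names for illustration.

```shell
#!/bin/sh
# collect_smart: hypothetical helper.
# Usage: collect_smart OUTDIR DEVICE...
# Writes the `smartctl -x` output of each device to OUTDIR/<device>.txt.
# SMARTCTL can be set to another command (e.g. echo) for a dry run.
collect_smart() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # smartctl's exit status is a bitmask and is often non-zero even
        # when the report itself was produced, so it is not checked here.
        "${SMARTCTL:-smartctl}" -x "$dev" > "$outdir/$(basename "$dev").txt"
    done
}

# Example (not run here):
#   collect_smart /tmp/smart /dev/sde /dev/sdf
```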

Whatever you do, wait for someone to tell you what to do. You should be able to copy any important files off the server now. If they are all important, copy the most important files. You don’t want to put a lot of stress on the pool; however, if you have a few TB of data you “must have”, grab it now. You already realize that your pool could fail completely, I can hear it in your voice. You are correct. Let’s minimize the damage if possible. Besides @HoneyBadger we have @Arwen, who is known for being very knowledgeable in this area.

But provide those things that have been asked of you by the team here so the best possible help can be obtained.


This is my main pool in terms of capacity, so I can’t possibly have a full backup of it.

For context, most subsets of the data have a backup somewhere.
Though it’d be quite a bit of work to piece it back together. And that excludes a huge and dear movie collection, too big to have a backup :worried:
My main VM, a Docker host, provides lots of services and relies on that pool for storage. That’s the main thing really. A backup of its data is available, but I’d have to rebuild the host.

I’m using TN Scale 24.10.2
My hardware is made of:

- a boot pool on an SSD (Crucial BX500 120GB)
- a RAIDZ2 6x 12TB WD122KRYZ (WD Gold Enterprise Class SATA HDD)
(and, taken from my old /community signature:)
  - Supermicro A2SDi-H-TF (Atom C3758 8 cores)
  - 64GB ECC RAM
  - SilverStone D380 chassis (not well ventilated)

zpool status -v doesn’t know about my pool

zpool status -v subramanya
cannot open 'subramanya': no such pool 

zfs list doesn’t list subramanya

When you provide your system hardware makeup and which version of TrueNAS you are running, did you cause the REMOVED or was all this done on it’s own?

The removal happened on its own. It’s the first email alert, and it says:

The following devices are not healthy:
- Disk  WDC_WDXXXX
  B002R13D is REMOVED

Whatever you do, wait for someone to tell you what to do. You should be able to copy any important files off the server now. If they are all important, copy the most important files. You don’t want to put a lot of stress on the pool however if you have a few TB of data you “must have”, grab it now.

I don’t have access to anything unfortunately!

Then we’ll see about solving this “I/O Error” (any more details from the error message?)

Nope, just exactly this:

# zpool import subramanya
cannot import 'subramanya': I/O error
            Destroy and re-create the pool from
            a backup source.

smartctl -x /dev/sda
...
smartctl -x /dev/sdf

All of them passed a short test that was run after the pool got suspended.

I’m gonna post a dedicated smartctl -x result in subsequent post as it’s very lengthy.

I should have seen that of course, duh. This is why I’m not the expert.

Yes, lengthy is good. More specific data is good.

This is /dev/sde, the one that got REMOVED on its own, numbered B002R13D

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD122KRYZ-01CDAB0
Serial Number:    B002R13D
LU WWN Device Id: 5 0014ee 26bdd981b
Firmware Version: 01.01H01
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5706
ATA Version is:   ACS-4 published, ANSI INCITS 529-2018
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun  4 15:08:39 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (24944) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 901) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  2 Throughput_Performance  --S--K   100   100   000    -    0
  3 Spin_Up_Time            POS--K   253   162   021    -    6875
  4 Start_Stop_Count        -O--CK   100   100   000    -    42
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  8 Seek_Time_Performance   --S--K   100   100   000    -    0
  9 Power_On_Hours          -O--CK   096   096   000    -    3450
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    42
192 Power-Off_Retract_Count -O--CK   200   200   000    -    37
193 Load_Cycle_Count        -O--CK   200   200   000    -    4
194 Temperature_Celsius     -O---K   105   097   000    -    47
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   100   000    -    0
241 Total_LBAs_Written      -O--CK   200   200   000    -    110894121272
242 Total_LBAs_Read         -O--CK   200   200   000    -    160129189293
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W    256  Device Statistics Notification
0x0c       GPL     R/O   2048  Pending Defects log
0x0f       GPL     R/O      2  Sense Data for Successful NCQ Cmds log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x24       GPL     R/O    322  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x53       GPL     R/O      1  Sense Data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa1  GPL,SL  VS      16  Device vendor specific log
0xa3-0xa5  GPL,SL  VS      16  Device vendor specific log
0xa7       GPL,SL  VS      16  Device vendor specific log
0xa8-0xb1  GPL,SL  VS       1  Device vendor specific log
0xb2       GPL     VS   65535  Device vendor specific log
0xb3-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb9           SL  VS       1  Device vendor specific log
0xba       GPL,SL  VS      84  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xd2       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3444         -
# 2  Extended offline    Completed without error       00%      3395         -
# 3  Short offline       Completed without error       00%      3220         -
# 4  Short offline       Completed without error       00%      3052         -
# 5  Short offline       Completed without error       00%      2884         -
# 6  Short offline       Completed without error       00%      2716         -
# 7  Extended offline    Completed without error       00%      2650         -
# 8  Short offline       Completed without error       00%      2548         -
# 9  Short offline       Completed without error       00%      2380         -
#10  Short offline       Completed without error       00%      2212         -
#11  Short offline       Completed without error       00%      2044         -
#12  Extended offline    Completed without error       00%      1930         -
#13  Short offline       Completed without error       00%      1876         -
#14  Short offline       Completed without error       00%      1709         -
#15  Short offline       Completed without error       00%      1541         -
#16  Short offline       Completed without error       00%      1373         -
#17  Short offline       Completed without error       00%      1208         -
#18  Extended offline    Completed without error       00%      1196         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    47 Celsius
Power Cycle Min/Max Temperature:     47/48 Celsius
Lifetime    Min/Max Temperature:     15/55 Celsius
Under/Over Temperature Limit Count:   0/0
Minimum supported ERC Time Limit:    65 (6.5 seconds)
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      5/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (445)

Index    Estimated Time   Temperature Celsius
 446    2025-06-04 07:11    47  ****************************
 ...    ..(148 skipped).    ..  ****************************
 117    2025-06-04 09:40    47  ****************************
 118    2025-06-04 09:41    48  *****************************
 ...    ..( 40 skipped).    ..  *****************************
 159    2025-06-04 10:22    48  *****************************
 160    2025-06-04 10:23     ?  -
 161    2025-06-04 10:24    48  *****************************
 ...    ..(126 skipped).    ..  *****************************
 288    2025-06-04 12:31    48  *****************************
 289    2025-06-04 12:32    47  ****************************
 ...    ..(155 skipped).    ..  ****************************
 445    2025-06-04 15:08    47  ****************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4              42  -D-  Lifetime Power-On Resets
0x01  0x010  4            3450  -D-  Power-on Hours
0x01  0x018  6    110894121272  -D-  Logical Sectors Written
0x01  0x020  6       501998921  -D-  Number of Write Commands
0x01  0x028  6    160129189293  -D-  Logical Sectors Read
0x01  0x030  6       704894669  -D-  Number of Read Commands
0x01  0x038  6      3830065408  -D-  Date and Time TimeStamp
0x02  =====  =               =  ===  == Free-Fall Statistics (rev 1) ==
0x02  0x010  4               0  -D-  Overlimit Shock Events
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            3433  -D-  Spindle Motor Power-on Hours
0x03  0x010  4            3431  -D-  Head Flying Hours
0x03  0x018  4              42  -D-  Head Load Events
0x03  0x020  4               0  -D-  Number of Reallocated Logical Sectors
0x03  0x028  4              32  -D-  Read Recovery Attempts
0x03  0x030  4               0  -D-  Number of Mechanical Start Failures
0x03  0x038  4               0  -D-  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4              37  -D-  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  -D-  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  -D-  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              47  ---  Current Temperature
0x05  0x010  1              47  -D-  Average Short Term Temperature
0x05  0x018  1              46  -D-  Average Long Term Temperature
0x05  0x020  1              55  -D-  Highest Temperature
0x05  0x028  1              23  -D-  Lowest Temperature
0x05  0x030  1              52  -D-  Highest Average Short Term Temperature
0x05  0x038  1              35  -D-  Lowest Average Short Term Temperature
0x05  0x040  1              46  -D-  Highest Average Long Term Temperature
0x05  0x048  1              38  -D-  Lowest Average Long Term Temperature
0x05  0x050  4               0  -D-  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  -D-  Time in Under-Temperature
0x05  0x068  1               5  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              68  -D-  Number of Hardware Resets
0x06  0x010  4              15  -D-  Number of ASR Events
0x06  0x018  4               0  -D-  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
0xff  0x008  7               0  -D-  Vendor Specific
0xff  0x010  7               0  -D-  Vendor Specific
0xff  0x018  7               0  -D-  Vendor Specific
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged
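The SCT temperature history in the dump above contains one row whose temperature is `?` instead of a number. Such rows usually mean the drive has no valid sample for that minute, which commonly happens around a power cycle or reset, so their timestamps can be compared with the alert emails. Bear in mind that smartctl labels the column "Estimated Time". A small sketch for pulling those rows out follows. `temp_gaps` is a hypothetical helper name, and it assumes the row layout shown above (index, date, time, temperature).

```shell
#!/bin/sh
# temp_gaps: hypothetical helper.
# Reads `smartctl -x` output on stdin and prints the estimated date and time
# of every SCT temperature history row whose temperature field is "?".
temp_gaps() {
    awk '$1 ~ /^[0-9]+$/ && $4 == "?" { print $2, $3 }'
}

# Example (not run here):
#   smartctl -x /dev/sde | temp_gaps
```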

This is /dev/sdf, the second disk that had an issue and was FAULTED, numbered B002SNAD

root@freenas[~]# smartctl -x /dev/sdf
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD122KRYZ-01CDAB0
Serial Number:    B002SNAD
LU WWN Device Id: 5 0014ee 2c133ac8a
Firmware Version: 01.01H01
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5706
ATA Version is:   ACS-4 published, ANSI INCITS 529-2018
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun  4 15:11:40 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (25724) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 909) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   100   051    -    0
  2 Throughput_Performance  --S--K   100   100   000    -    0
  3 Spin_Up_Time            POS--K   253   148   021    -    8308
  4 Start_Stop_Count        -O--CK   100   100   000    -    22
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  8 Seek_Time_Performance   --S--K   100   100   000    -    0
  9 Power_On_Hours          -O--CK   096   096   000    -    3136
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    22
192 Power-Off_Retract_Count -O--CK   200   200   000    -    17
193 Load_Cycle_Count        -O--CK   200   200   000    -    4
194 Temperature_Celsius     -O---K   108   098   000    -    44
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   100   000    -    0
241 Total_LBAs_Written      -O--CK   200   200   000    -    133504419264
242 Total_LBAs_Read         -O--CK   200   200   000    -    159730486329
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W    256  Device Statistics Notification
0x0c       GPL     R/O   2048  Pending Defects log
0x0f       GPL     R/O      2  Sense Data for Successful NCQ Cmds log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x24       GPL     R/O    322  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x53       GPL     R/O      1  Sense Data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa1  GPL,SL  VS      16  Device vendor specific log
0xa3-0xa5  GPL,SL  VS      16  Device vendor specific log
0xa7       GPL,SL  VS      16  Device vendor specific log
0xa8-0xb1  GPL,SL  VS       1  Device vendor specific log
0xb2       GPL     VS   65535  Device vendor specific log
0xb3-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb9           SL  VS       1  Device vendor specific log
0xba       GPL,SL  VS      84  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xd2       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      3080         -
# 2  Short offline       Completed without error       00%      2905         -
# 3  Short offline       Completed without error       00%      2737         -
# 4  Short offline       Completed without error       00%      2569         -
# 5  Short offline       Completed without error       00%      2401         -
# 6  Extended offline    Completed without error       00%      2335         -
# 7  Short offline       Completed without error       00%      2233         -
# 8  Short offline       Completed without error       00%      2065         -
# 9  Short offline       Completed without error       00%      1897         -
#10  Short offline       Completed without error       00%      1729         -
#11  Extended offline    Completed without error       00%      1615         -
#12  Short offline       Completed without error       00%      1561         -
#13  Short offline       Completed without error       00%      1394         -
#14  Short offline       Completed without error       00%      1226         -
#15  Short offline       Completed without error       00%      1058         -
#16  Short offline       Completed without error       00%       893         -
#17  Extended offline    Completed without error       00%       881         -
#18  Short offline       Completed without error       00%       725         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    44 Celsius
Power Cycle Min/Max Temperature:     44/45 Celsius
Lifetime    Min/Max Temperature:     14/54 Celsius
Under/Over Temperature Limit Count:   0/0
Minimum supported ERC Time Limit:    65 (6.5 seconds)
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      5/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (220)

Index    Estimated Time   Temperature Celsius
 221    2025-06-04 07:14    44  *************************
 ...    ..( 98 skipped).    ..  *************************
 320    2025-06-04 08:53    44  *************************
 321    2025-06-04 08:54    45  **************************
 ...    ..( 37 skipped).    ..  **************************
 359    2025-06-04 09:32    45  **************************
 360    2025-06-04 09:33     ?  -
 361    2025-06-04 09:34    45  **************************
 362    2025-06-04 09:35    44  *************************
 363    2025-06-04 09:36    45  **************************
 ...    ..( 84 skipped).    ..  **************************
 448    2025-06-04 11:01    45  **************************
 449    2025-06-04 11:02    44  *************************
 ...    ..(248 skipped).    ..  *************************
 220    2025-06-04 15:11    44  *************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4              22  -D-  Lifetime Power-On Resets
0x01  0x010  4            3136  -D-  Power-on Hours
0x01  0x018  6    133504419264  -D-  Logical Sectors Written
0x01  0x020  6       697405087  -D-  Number of Write Commands
0x01  0x028  6    159730486329  -D-  Logical Sectors Read
0x01  0x030  6       738664978  -D-  Number of Read Commands
0x01  0x038  6      2699665408  -D-  Date and Time TimeStamp
0x02  =====  =               =  ===  == Free-Fall Statistics (rev 1) ==
0x02  0x010  4               0  -D-  Overlimit Shock Events
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            3120  -D-  Spindle Motor Power-on Hours
0x03  0x010  4            3119  -D-  Head Flying Hours
0x03  0x018  4              22  -D-  Head Load Events
0x03  0x020  4               0  -D-  Number of Reallocated Logical Sectors
0x03  0x028  4              48  -D-  Read Recovery Attempts
0x03  0x030  4               0  -D-  Number of Mechanical Start Failures
0x03  0x038  4               0  -D-  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4              17  -D-  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  -D-  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  -D-  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              44  ---  Current Temperature
0x05  0x010  1              44  -D-  Average Short Term Temperature
0x05  0x018  1              42  -D-  Average Long Term Temperature
0x05  0x020  1              54  -D-  Highest Temperature
0x05  0x028  1              22  -D-  Lowest Temperature
0x05  0x030  1              53  -D-  Highest Average Short Term Temperature
0x05  0x038  1              36  -D-  Lowest Average Short Term Temperature
0x05  0x040  1              42  -D-  Highest Average Long Term Temperature
0x05  0x048  1              39  -D-  Lowest Average Long Term Temperature
0x05  0x050  4               0  -D-  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  -D-  Time in Under-Temperature
0x05  0x068  1               5  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              48  -D-  Number of Hardware Resets
0x06  0x010  4              14  -D-  Number of ASR Events
0x06  0x018  4               0  -D-  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
0xff  0x008  7               0  -D-  Vendor Specific
0xff  0x010  7               0  -D-  Vendor Specific
0xff  0x018  7               0  -D-  Vendor Specific
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        26300  Vendor specific

Good hardware, except the case—now we know why these drives are too hot. And you run regular SMART tests. :+1:
That said, the drives are too warm for comfort, but still officially within operating temperatures.
B002R13D looks fine, so maybe there was a reboot and drives were reshuffled. Let’s run through all of them.
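
To check the "within operating temperatures" claim on each drive without reading the whole dump, the temperature rows of the Device Statistics log can be pulled out with a small filter. This is a minimal sketch, assuming the `smartctl -x` column layout shown in this thread (page, offset, size, value, flags, description); `temp_summary` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# Summarize the page 0x05 (Temperature Statistics) rows of
# "Device Statistics (GP Log 0x04)" from `smartctl -x` output on stdin.
# Assumes the value is the 4th whitespace-separated column, as in the
# dumps posted in this thread.
temp_summary() {
    awk '
        $1 == "0x05" && /Current Temperature/                     { cur  = $4 }
        $1 == "0x05" && /Highest Temperature/                     { high = $4 }
        $1 == "0x05" && /Time in Over-Temperature/                { over = $4 }
        $1 == "0x05" && /Specified Maximum Operating Temperature/ { max  = $4 }
        END {
            printf "current=%s highest=%s max=%s over_temp=%s\n", cur, high, max, over
            # Flag drives whose recorded peak went past the specified maximum.
            if (high + 0 > max + 0)
                print "WARNING: highest temperature exceeded the specified maximum"
        }
    '
}

# Usage (as root): smartctl -x /dev/sda | temp_summary
```

A nonzero `over_temp` or a warning line means the drive has, at some point, been outside its specified range even if the current reading looks acceptable.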

I don’t think my previous WD30EFRX Red drives reported such detailed temperature info in smartctl.

Looking at the detailed report now, it no longer seems so obvious that temperature was the original issue.

That said, the drives are too warm for comfort, but still officially within operating temperatures.

Just as you say @etorix

Good hardware, except the case

I’ve got a Dell R730XD waiting for the instance to be transferred to it, with more HDD slots and better cooling, but I still haven’t made the transition. Power consumption is going to be the drawback; the Supermicro A2SDi-H-TF is so efficient in that respect.

Now that you mention it, there was a reboot yesterday. An attempt at migrating to 25.10, reverted when I realized VM migration wasn’t supported.
That being said, the revert went well and everything was fine for the rest of the day and the evening.

Posting the other drives’ SMART output.

There is no 25.10. Do you mean 24.10 or 25.04?

I hope you mean 25.04 and not 25.10; 25.10 is nowhere near ready for general use.

Have you turned the system off and let everything cool down?

Heat can affect more than the HDDs: the SoC, CPU, and other motherboard chips may also be affected by excessive heat. Something caused that drive to lose connectivity, at least for a time.

Looks fine to me.

I suppose you mean “25.04”.
You might be another case of the mysterious “ZFS label eating bug at update time”.
I suggest that you file a bug report, which will upload technical information to iX, post the JIRA ticket here… and pray that @HoneyBadger will have a look at it and find a lead.

/dev/sda

root@freenas[~]# smartctl -x /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD122KRYZ-01CDAB0
Serial Number:    B002R24D
LU WWN Device Id: 5 0014ee 2c1339354
Firmware Version: 01.01H01
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5706
ATA Version is:   ACS-4 published, ANSI INCITS 529-2018
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun  4 15:31:52 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (25724) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 909) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  2 Throughput_Performance  --S--K   100   100   000    -    0
  3 Spin_Up_Time            POS--K   253   164   021    -    5108
  4 Start_Stop_Count        -O--CK   100   100   000    -    25
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  8 Seek_Time_Performance   --S--K   100   100   000    -    0
  9 Power_On_Hours          -O--CK   096   096   000    -    3058
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    25
192 Power-Off_Retract_Count -O--CK   200   200   000    -    20
193 Load_Cycle_Count        -O--CK   200   200   000    -    4
194 Temperature_Celsius     -O---K   099   091   000    -    53
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   100   000    -    0
241 Total_LBAs_Written      -O--CK   200   200   000    -    110054283864
242 Total_LBAs_Read         -O--CK   200   200   000    -    122091966964
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W    256  Device Statistics Notification
0x0c       GPL     R/O   2048  Pending Defects log
0x0f       GPL     R/O      2  Sense Data for Successful NCQ Cmds log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x24       GPL     R/O    322  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x53       GPL     R/O      1  Sense Data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa1  GPL,SL  VS      16  Device vendor specific log
0xa3-0xa5  GPL,SL  VS      16  Device vendor specific log
0xa7       GPL,SL  VS      16  Device vendor specific log
0xa8-0xb1  GPL,SL  VS       1  Device vendor specific log
0xb2       GPL     VS   65535  Device vendor specific log
0xb3-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb9           SL  VS       1  Device vendor specific log
0xba       GPL,SL  VS      84  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xd2       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3051         -
# 2  Extended offline    Completed without error       00%      3002         -
# 3  Short offline       Completed without error       00%      2827         -
# 4  Short offline       Completed without error       00%      2659         -
# 5  Short offline       Completed without error       00%      2491         -
# 6  Short offline       Completed without error       00%      2323         -
# 7  Extended offline    Completed without error       00%      2257         -
# 8  Short offline       Completed without error       00%      2155         -
# 9  Short offline       Completed without error       00%      1987         -
#10  Short offline       Completed without error       00%      1819         -
#11  Short offline       Completed without error       00%      1651         -
#12  Extended offline    Completed without error       00%      1537         -
#13  Short offline       Completed without error       00%      1483         -
#14  Short offline       Completed without error       00%      1316         -
#15  Short offline       Completed without error       00%      1148         -
#16  Short offline       Completed without error       00%       980         -
#17  Short offline       Completed without error       00%       815         -
#18  Extended offline    Completed without error       00%       803         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    53 Celsius
Power Cycle Min/Max Temperature:     53/54 Celsius
Lifetime    Min/Max Temperature:     15/61 Celsius
Under/Over Temperature Limit Count:   0/0
Minimum supported ERC Time Limit:    65 (6.5 seconds)
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      5/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (331)

Index    Estimated Time   Temperature Celsius
 332    2025-06-04 07:34    53  **********************************
 ...    ..(116 skipped).    ..  **********************************
 449    2025-06-04 09:31    53  **********************************
 450    2025-06-04 09:32     ?  -
 451    2025-06-04 09:33    53  **********************************
 ...    ..( 12 skipped).    ..  **********************************
 464    2025-06-04 09:46    53  **********************************
 465    2025-06-04 09:47    54  ***********************************
 ...    ..(121 skipped).    ..  ***********************************
 109    2025-06-04 11:49    54  ***********************************
 110    2025-06-04 11:50    53  **********************************
 ...    ..(220 skipped).    ..  **********************************
 331    2025-06-04 15:31    53  **********************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4              25  -D-  Lifetime Power-On Resets
0x01  0x010  4            3058  -D-  Power-on Hours
0x01  0x018  6    110054283864  -D-  Logical Sectors Written
0x01  0x020  6       640104341  -D-  Number of Write Commands
0x01  0x028  6    122091966964  -D-  Logical Sectors Read
0x01  0x030  6       581394089  -D-  Number of Read Commands
0x01  0x038  6      2418865408  -D-  Date and Time TimeStamp
0x02  =====  =               =  ===  == Free-Fall Statistics (rev 1) ==
0x02  0x010  4               0  -D-  Overlimit Shock Events
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            3042  -D-  Spindle Motor Power-on Hours
0x03  0x010  4            3041  -D-  Head Flying Hours
0x03  0x018  4              25  -D-  Head Load Events
0x03  0x020  4               0  -D-  Number of Reallocated Logical Sectors
0x03  0x028  4              64  -D-  Read Recovery Attempts
0x03  0x030  4               0  -D-  Number of Mechanical Start Failures
0x03  0x038  4               0  -D-  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4              20  -D-  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  -D-  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  -D-  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              53  ---  Current Temperature
0x05  0x010  1              53  -D-  Average Short Term Temperature
0x05  0x018  1              52  -D-  Average Long Term Temperature
0x05  0x020  1              61  -D-  Highest Temperature
0x05  0x028  1              22  -D-  Lowest Temperature
0x05  0x030  1              58  -D-  Highest Average Short Term Temperature
0x05  0x038  1              34  -D-  Lowest Average Short Term Temperature
0x05  0x040  1              52  -D-  Highest Average Long Term Temperature
0x05  0x048  1              44  -D-  Lowest Average Long Term Temperature
0x05  0x050  4             330  -D-  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  -D-  Time in Under-Temperature
0x05  0x068  1               5  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              48  -D-  Number of Hardware Resets
0x06  0x010  4              13  -D-  Number of ASR Events
0x06  0x018  4               0  -D-  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
0xff  0x008  7               0  -D-  Vendor Specific
0xff  0x010  7               0  -D-  Vendor Specific
0xff  0x018  7               0  -D-  Vendor Specific
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        27513  Vendor specific

Once things cool down you can use /sbin/zdb -l /dev/sd<n> from the shell (System → Shell in the GUI) to examine the ZFS disk labels. Remember to look at partitions as well as the entire disk. Current TrueNAS uses partition 1 for ZFS on data drives (boot drives are different). So you would run /sbin/zdb -l /dev/sda1 to look for ZFS labels on drive sda partition 1.

ls -l /dev/sd* will get you a list of disk devices.
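
To compare the labels across all six disks at a glance, the `zdb -l` output can be reduced to one line per partition. This is a minimal sketch, assuming the usual OpenZFS label dump layout (`name:`, `txg:` and a trailing `labels = …` line) and assuming unreadable labels are reported as "failed to unpack label N"; that message text may differ between versions. `label_summary` is a hypothetical helper name, not part of ZFS.

```shell
#!/bin/sh
# Reduce `zdb -l` output (read on stdin) to a one-line summary:
# pool name, txg, which label copies were printed, and how many
# labels zdb reported as unreadable.
label_summary() {
    awk '
        $1 == "name:" && name == "" { name = $2 }   # first name: is the pool name
        $1 == "txg:"  && txg  == "" { txg  = $2 }
        $1 == "labels" && $2 == "=" {
            labels = ""
            for (i = 3; i <= NF; i++) labels = labels (i > 3 ? " " : "") $i
        }
        /failed to unpack label/ { failed++ }       # assumed message text
        END {
            printf "name=%s txg=%s labels=[%s] failed=%d\n", name, txg, labels, failed
        }
    '
}

# Usage (as root), one line per first partition:
#   for part in /dev/sd?1; do
#       printf '%s: ' "$part"
#       /sbin/zdb -l "$part" 2>&1 | label_summary
#   done
```

Disks that belong to the same pool should report the same pool name; a disk with a noticeably older `txg` or with failed labels is the one to look at first.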

Sorry, yeah I meant 25.04

I did not.

Something caused that drive to lose connectivity, at least for a time.

The physical machine hasn’t been touched (that I know of), it was running in the basement.
Can you share how you identify that? Do you mean the B002SNAD drive?

Once things cool down you can use /sbin/zdb -l /dev/sd<n> from the shell (System → Shell in the GUI) to examine the ZFS disk labels. Remember to look at partitions as well as the entire disk. Current TrueNAS uses partition 1 for ZFS on data drives (boot drives are different). So you would run /sbin/zdb -l /dev/sda1 to look for ZFS labels on drive sda partition 1.

What do you mean? I can do that right now.
Do you think it’s useful to post it here?

You might be another case of the mysterious “ZFS label eating bug at update time”.
I suggest that you file a bug report,

Ok, do you suggest I do that now with the info I have?

Yes. Since that drive was marked REMOVED, something must have caused it to stop communicating. You mentioned heat, as the case is not well ventilated. I pointed out that heat can affect more than just the drives. Turn it off and let it cool down for at least an hour. Then turn it back on and see if you still have the same issues.

I suggested looking at the ZFS labels because if the labels cannot be read (or are corrupt), you cannot import the zpool. But the first thing is to remove one of the possible problems: the heat.
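
One place to look for signs of a dropped link is the "SATA Phy Event Counters" section of `smartctl -x`. The sketch below prints only the counters with a nonzero value, assuming the ID / Size / Value / Description layout shown in the dumps in this thread; `phy_nonzero` is a hypothetical helper name, and skipping ID 0x8000 (listed as vendor specific) is a choice made here, not a smartmontools rule.

```shell
#!/bin/sh
# Print nonzero SATA Phy Event Counters from `smartctl -x` output on stdin.
# Output format: <id> <value> <description>
phy_nonzero() {
    awk '
        /^SATA Phy Event Counters/ { in_phy = 1; next }   # section start
        in_phy && NF == 0          { in_phy = 0 }         # blank line ends it
        in_phy && $1 ~ /^0x[0-9a-f]+$/ && $1 != "0x8000" && $3 + 0 > 0 {
            id = $1; val = $3
            $1 = $2 = $3 = ""          # drop the first three columns
            sub(/^ +/, "")             # trim the gap they leave behind
            printf "%s %s %s\n", id, val, $0
        }
    '
}

# Usage (as root): smartctl -x /dev/sda | phy_nonzero
```

In the two dumps above this would list only the PhyRdy to PhyNRdy transitions (2) and the COMRESET count (3), with all CRC and R_ERR counters at zero. Small values like these can also come from ordinary reboots, so they are a hint to compare across drives rather than proof of a fault.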

All right, thanks, I will turn it off first.

zdb -l /dev/sde1
zdb -l /dev/sdf1

both return information on LABEL 0