Disks still tagged with old pool after "Full with zeros" wipe

Hi everyone,

I’m currently facing an annoying problem:
After replacing a single SSD in a pool of 10 SSDs (2 RAIDZ1 vdevs), I started getting massive checksum errors on all of the SSDs in the vdev where the SSD was swapped (let’s call it VDEV 2). I think an intermittent power connection was the root cause, but I cannot be sure.
Anyway, I somehow managed to back up all of the data before all of the disks in that vdev were marked as unmounted. I tried to mount them again, which didn’t work, so I tried rebuilding the whole pool and got the same error message:

"middlewared.service_exception.CallError: [EFAULT] Could not create a partition of 2046260412416 bytes on disk sdd because the disk is too small. If you are replacing a disk in a pool, please ensure that the new disk is not smaller than the disk being replaced.

Could not create partition 1 from 4196353 to 4000798720
Error encountered; not saving changes."
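For reference, the numbers in this error can be checked with a little arithmetic. A minimal sketch, assuming 512-byte logical sectors (what smartctl reports for these PNY drives) and the whole-disk size of 2,048,408,248,320 bytes that lsblk reports for them; the start and end sectors are copied from the message, and the helper names are illustrative:

```python
# Check whether the partition requested in the error message fits on the disk.
# Assumptions: 512-byte logical sectors (smartctl: "512 bytes logical/physical")
# and the whole-disk size lsblk -b reports for the 2 TB PNY SSDs.
SECTOR_BYTES = 512
DISK_BYTES = 2_048_408_248_320            # whole-disk size of the 2 TB PNY SSDs
REQUESTED_BYTES = 2_046_260_412_416       # partition size from the error message
START_SECTOR = 4_196_353                  # "from 4196353 ..."
END_SECTOR_IN_MESSAGE = 4_000_798_720     # "... to 4000798720"


def last_sector(disk_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Index of the last addressable sector (sectors are numbered from 0)."""
    return disk_bytes // sector_bytes - 1


def partition_end(start: int, size_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Inclusive end sector of a partition of size_bytes beginning at start."""
    return start + size_bytes // sector_bytes - 1


if __name__ == "__main__":
    disk_last = last_sector(DISK_BYTES)
    end = partition_end(START_SECTOR, REQUESTED_BYTES)
    print(f"last sector on disk  : {disk_last}")
    print(f"requested end sector : {end} (message: {END_SECTOR_IN_MESSAGE})")
    # A positive overshoot means the partition runs past the end of the disk,
    # even before reserving room for the backup GPT at the very end.
    print(f"overshoot            : {end - disk_last} sectors")
```

On these assumptions the computed end sector reproduces the 4000798720 in the message and overshoots the disk's last sector (4000797359) by 1361 sectors, which is consistent with the "disk is too small" wording.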

In the end I also tried updating to the latest Fangtooth (25.04) release, because the CPU, memory and interface widgets on the dashboard were also broken, but nothing changed.

I created a new pool with the 5 SSDs from VDEV 1, which was fine, and wiped the other 5. That’s when I noticed that 3 of those 5 SSDs were still tagged as being part of the old pool. Those 3 SSDs also report a failed SMART status, but I cannot find any unsuccessful SMART test when viewing the results in the GUI.

My question: can I “fix” those 3 SSDs, or are they only good for the trash?

My setup:

Proxmox running a TrueNAS VM with full PCI passthrough of an LSI 9300 HBA for the pool disks (13 in total).

I appreciate the frustrations you are facing, but trying everything you can think of to wipe and create a pool is probably not the best approach.

Something went wrong with the original pool, and before we create a new pool we need to know what caused it and fix the root issue.

Firstly, a few pointers for the next time this happens:

  1. You were absolutely right to save a copy of your data whilst you could. That is always the highest priority action.

  2. Ask for help before you try any recovery actions - it would have been useful to have obtained diagnostic information about the old pool to aid with identifying the root cause.

  3. Taking random actions in the hope that it will fix a problem (like upgrading to a new version of o/s when there is zero indication that this will help) isn’t a good idea - you really want to keep changes to a minimum to enable diagnostic information to be collected and to avoid changes from causing any confusion about the root cause.

  4. Wiping an SSD by writing zeros is a VERY VERY VERY bad idea - because all SSDs have a limited write endurance (quoted in the specs as TBW, Terabytes Written) and writing zeros (rather than doing a full disk TRIM - which has the same logical effect) substantially eats into that amount.
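To put numbers on point 4, here is a rough sketch that expresses full-disk overwrites as a share of an SSD's rated endurance. The TBW figure is a hypothetical input that has to come from the drive's datasheet, and treating it as decimal terabytes is an assumption about how vendors usually quote it:

```python
# Estimate how much of an SSD's rated write endurance full-disk overwrites use.
# HYPOTHETICAL input: tbw_rating_tb must come from the drive's datasheet;
# no PNY specification figures are used here.


def overwrite_endurance_fraction(capacity_bytes: int, tbw_rating_tb: float, passes: int = 1) -> float:
    """Fraction of the rated TBW consumed by `passes` full-capacity writes.

    Assumes the TBW rating is given in decimal terabytes (10**12 bytes).
    """
    if tbw_rating_tb <= 0:
        raise ValueError("TBW rating must be positive")
    if passes < 0:
        raise ValueError("passes cannot be negative")
    written_tb = capacity_bytes * passes / 1e12
    return written_tb / tbw_rating_tb
```

Plug in the drive's real capacity (for example from `lsblk -b`) and its datasheet TBW; every extra wipe pass adds one full capacity's worth of host writes.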

Obviously, your original pool is gone (along with some of the error stats that would help diagnose the root cause), so we cannot attempt to recover it. The objective now is to make sure that the remaining disks are sound and to help you create a new pool from them.

Now, fixing these kinds of problems generally requires using the CLI shell to run commands, and we really need to get some detailed diagnostics. Can you please run the following commands and post the output from each command here in a separate </> box:

  • lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
  • /sbin/zpool status -vLtsc lsblk,serial,smartx,smart
  • sudo zpool import
  • lspci
  • sudo sas2flash -list
  • sudo sas3flash -list
  • sudo storcli show all
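If it helps, the commands above can be run in one go with a small Python sketch that saves each output to a numbered text file for pasting. It assumes Python 3 in the TrueNAS shell and a user allowed to use sudo; the file naming and the 120 s timeout are arbitrary choices, and the `zpool status -c` line is left out so it can be run by hand:

```python
# Run the diagnostic commands and save each output (stdout + stderr) to its own
# numbered text file, ready to paste into separate </> boxes.
# Assumptions: run from the TrueNAS shell by a user with sudo rights;
# the file layout and timeout are arbitrary choices.
import subprocess
from pathlib import Path

COMMANDS = [
    ["lsblk", "-bo", "NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME"],
    ["sudo", "zpool", "import"],
    ["lspci"],
    ["sudo", "sas2flash", "-list"],
    ["sudo", "sas3flash", "-list"],
    ["sudo", "storcli", "show", "all"],
]


def run_and_save(index: int, cmd: list[str], outdir: Path, timeout: int = 120) -> Path:
    """Run one command, write its combined output to outdir, return the file path."""
    outdir.mkdir(parents=True, exist_ok=True)
    tool = Path(next((part for part in cmd if part != "sudo"), "cmd")).name
    path = outdir / f"{index:02d}_{tool}.txt"
    header = "$ " + " ".join(cmd) + "\n"
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        body = result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing tool or a hang is recorded in the file instead of aborting.
        body = f"failed to run: {exc}\n"
    path.write_text(header + body)
    return path


def collect_all(outdir: Path) -> list[Path]:
    """Run every command in COMMANDS in order and return the files written."""
    return [run_and_save(i, cmd, outdir) for i, cmd in enumerate(COMMANDS, start=1)]
```

Calling `collect_all(Path("diagnostics"))` writes one file per command; commands that are missing or time out are noted in their file rather than stopping the run.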

Also, for each of the pool SSDs identified in the lsblk, please run the following command substituting the device name and posting the output in a </> box:

  • sudo smartctl -x /dev/sdX
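Besides the full `-x` output, the overall-health verdict alone can be pulled from smartctl's JSON mode (`-j`, available in smartctl 7.x) for a quick per-disk summary. A sketch under that assumption; the function names are illustrative and the device list should come from the lsblk output:

```python
# Summarise smartctl's overall-health verdict per disk via its JSON output.
# Assumes smartctl 7.x (which supports -j) and sudo rights on the TrueNAS host.
import json
import subprocess
from typing import Optional


def smart_passed(smartctl_json: str) -> Optional[bool]:
    """True/False from the "smart_status" object, or None if it is missing."""
    try:
        data = json.loads(smartctl_json)
    except json.JSONDecodeError:
        return None
    status = data.get("smart_status") if isinstance(data, dict) else None
    if not isinstance(status, dict) or "passed" not in status:
        return None
    return bool(status["passed"])


def check_disk(dev: str, timeout: int = 60) -> Optional[bool]:
    """Run `smartctl -j -H` on one device and return its health verdict."""
    # smartctl's exit status is a bit mask that is non-zero for a failing disk,
    # so the return code is deliberately not treated as an error here.
    result = subprocess.run(
        ["sudo", "smartctl", "-j", "-H", dev],
        capture_output=True, text=True, timeout=timeout,
    )
    return smart_passed(result.stdout)
```

For example: `for dev in ["/dev/sdc", "/dev/sdd"]: print(dev, check_disk(dev))`. A `None` means smartctl produced no usable `smart_status`, for instance if it could not open the device.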

I appreciate that this is quite a lot of effort, but I assure you that this is necessary. Thanks.


I fully agree with your points 1 through 4. I got a bit frustrated with the issue ^^.

For your diagnostic steps:

  1. lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
> root@truenas[/home/admin]# lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
> NAME   LABEL         MAJ:MIN TRAN ROTA ZONED VENDOR   MODEL               SERIAL               PARTUUID                               START          SIZE PARTTYPENAME
> sda                    8:0           1 none  QEMU     QEMU HARDDISK       drive-scsi0                                                         34359738368 
> ├─sda1                 8:1           1 none                                                    fe076a1f-e506-409c-a727-d7a3eb98e9ea    4096       1048576 BIOS boot
> ├─sda2 EFI             8:2           1 none                                                    1a000f7a-9a87-4e06-8d7f-b6892f7084bb    6144     536870912 EFI System
> └─sda3 boot-pool       8:3           1 none                                                    04070c22-02a9-45a0-9f4c-3b5f83343222 1054720   33819704832 Solaris /usr & Apple ZFS
> sdb                    8:16  sas     0 none  ATA      Patriot P220 2048GB P220IICB2502282905                                                2048408248320 
> └─sdb1                 8:17          0 none                                                    9e0158f6-c78f-43b8-8a55-b466d70a5d10    2048 2046260412416 Solaris /usr & Apple ZFS
> sdc                    8:32  sas     0 none  ATA      PNY 2TB SATA SSD    PNA3624302834AT04890                                              2048408248320 
> ├─sdc1                 8:33          0 none                                                    968f8c3d-a7a1-47f1-a661-f01f28b23c61    2048    2147484160 Linux swap
> └─sdc2 fast_pool       8:34          0 none                                                    00809d21-eb8f-439b-9730-5d0a204e32d1 4198400 2046258315776 Solaris /usr & Apple ZFS
> sdd                    8:48  sas     0 none  ATA      PNY 2TB SATA SSD    PNA4524103605AT00339                                              2048408248320 
> ├─sdd1                 8:49          0 none                                                    aa2fe2a6-1c1b-45dc-963c-666378fb7134    2048    2147484160 Linux swap
> └─sdd2 fast_pool       8:50          0 none                                                    e3dbb6fb-e49d-44ba-bb01-ae7e0a448b4b 4198400 2046258315776 Solaris /usr & Apple ZFS
> sde                    8:64  sas     0 none  ATA      PNY 2TB SATA SSD    PNA4524103605AT00290                                              2048408248320 
> ├─sde1 truenas:swap4   8:65          0 none                                                    2492f8e0-6635-4805-a73c-93dae94a0040    2048    2147484160 Linux swap
> └─sde2 fast_pool       8:66          0 none                                                    66ab80df-b616-41d9-ab36-d031099701e6 4198400 2046258315776 Solaris /usr & Apple ZFS
> sdf                    8:80  sas     0 none  ATA      PNY 2TB SATA SSD    PNA4524103605AT00523                                              2048408248320 
> └─sdf1                 8:81          0 none                                                    c0181761-c7af-4b35-8a80-943f728c6773    2048 2046260412416 Solaris /usr & Apple ZFS
> sdg                    8:96  sas     0 none  ATA      PNY CS900 2TB SSD   PNY2243221025010001B                                              2000398934016 
> └─sdg1 SSD_1           8:97          0 none                                                    a39efbce-4896-4ef0-85e5-7b6d6439e41d    2048 1998251360256 Solaris /usr & Apple ZFS
> sdh                    8:112 sas     0 none  ATA      PNY CS900 2TB SSD   PNY2243221025010001A                                              2000398934016 
> └─sdh1 SSD_1           8:113         0 none                                                    855364c5-e89b-4831-884f-819ea0f1e5cd    2048 1998251360256 Solaris /usr & Apple ZFS
> sdi                    8:128 sas     0 none  ATA      PNY CS900 2TB SSD   PNY2243221025010001E                                              2000398934016 
> └─sdi1 SSD_1           8:129         0 none                                                    60799bc0-51d9-4bf9-a2e7-f79279d2604e    2048 1998251360256 Solaris /usr & Apple ZFS
> sdj                    8:144 sas     0 none  ATA      PNY CS900 2TB SSD   PNY2243221025010001C                                              2000398934016 
> └─sdj1 SSD_1           8:145         0 none                                                    16aab8b7-8d5d-4099-bfc0-aec2506fff7c    2048 1998251360256 Solaris /usr & Apple ZFS
> sdk                    8:160 sas     0 none  ATA      PNY CS900 2TB SSD   PNY2243221025010001D                                              2000398934016 
> └─sdk1 SSD_1           8:161         0 none                                                    657b832d-8309-486b-b559-ac2c4427b01f    2048 1998251360256 Solaris /usr & Apple ZFS
> sdl                    8:176 sas     1 none  ATA      OOS8000G            00001J3V                                                          8001563222016 
> ├─sdl1 truenas:swap3   8:177         1 none                                                    e38e1048-7d18-4aee-9fdd-6504260c1ed8    2048    2147484160 Linux swap
> └─sdl2 hdd             8:178         1 none                                                    4122567f-6d98-424c-acb5-5745839142b6 4198400 7999413289472 Solaris /usr & Apple ZFS
> sdm                    8:192 sas     1 none  ATA      OOS8000G            00061MZB                                                          8001563222016 
> ├─sdm1 truenas:swap3   8:193         1 none                                                    3c272a1f-c177-4b69-be4b-cedb12103966    2048    2147484160 Linux swap
> └─sdm2 hdd             8:194         1 none                                                    b6d37e63-f813-4fff-a5b1-0b9dd2c42885 4198400 7999413289472 Solaris /usr & Apple ZFS
> sdn                    8:208 sas     1 none  ATA      OOS8000G            0000Z00W                                                          8001563222016 
> ├─sdn1 truenas:swap3   8:209         1 none                                                    bc4bfe16-0b24-4bc4-bebc-0da39422c595    2048    2147484160 Linux swap
> └─sdn2 hdd             8:210         1 none                                                    2a2753d6-9909-4b0a-a510-bb2c66abd71f 4198400 7999413289472 Solaris /usr & Apple ZFS

As you can probably guess, the 3 drives with errors are sdc, sdd and sde.
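The check done by eye here (which disks still carry the old pool label) can also be scripted. A minimal sketch, assuming lsblk's JSON output (`lsblk -J -o NAME,LABEL`), where partitions appear under each disk's `children`; `fast_pool` is the old pool name from the listing and the helper name is illustrative:

```python
# List whole disks that still have a partition (or the disk itself) carrying a
# given filesystem label, e.g. an old ZFS pool name. Input: the JSON printed by
#   lsblk -J -o NAME,LABEL
import json


def disks_with_label(lsblk_json: str, label: str) -> list[str]:
    """Return the names of top-level devices where `label` appears."""
    found = []
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        # lsblk nests partitions under "children"; unlabelled entries are null.
        entries = [disk] + (disk.get("children") or [])
        if any(entry.get("label") == label for entry in entries):
            found.append(disk["name"])
    return found
```

Given the JSON form of the listing above and `"fast_pool"`, it should return sdc, sdd and sde.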

  2. /sbin/zpool status -vLtsc lsblk,serial,smartx,smart
    This one seems to hang, and when I press Ctrl+C to cancel it, it asks for the admin password and I get “zsh: event not found: y”.
    I tried running it as root to bypass the issue, but got “Can’t run -c with root privileges unless ZPOOL_SCRIPTS_AS_ROOT is set.”
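Going by that error message (and the `ZPOOL_SCRIPTS_AS_ROOT` entry in the zpool man page), zpool only runs its `-c` scripts as root when that variable is set in its environment. A sketch, assuming it is started from a root shell, that sets the variable for this single invocation only and adds a timeout in case the command appears to hang again; the function names are illustrative:

```python
# Run the requested zpool status command as root with ZPOOL_SCRIPTS_AS_ROOT set
# only for this one invocation. Assumes the script itself runs as root.
import os
import subprocess
from typing import Mapping, Optional

ZPOOL_STATUS = ["/sbin/zpool", "status", "-vLtsc", "lsblk,serial,smartx,smart"]


def zpool_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with ZPOOL_SCRIPTS_AS_ROOT enabled."""
    env = dict(os.environ if base is None else base)
    env["ZPOOL_SCRIPTS_AS_ROOT"] = "1"
    return env


def run_zpool_status(timeout: int = 300) -> str:
    """Return the combined output; the timeout guards against an apparent hang."""
    result = subprocess.run(
        ZPOOL_STATUS, env=zpool_env(), capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr
```

The equivalent in a root shell is `ZPOOL_SCRIPTS_AS_ROOT=1 /sbin/zpool status -vLtsc lsblk,serial,smartx,smart`.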

  3. lspci

root@truenas[/home/admin]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:10.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
00:11.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
00:12.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

  4. sudo sas2flash -list
root@truenas[/home/admin]# sudo sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18) 
Copyright (c) 2008-2014 LSI Corporation. All rights reserved 

        No LSI SAS adapters found! Limited Command Set Available!
        ERROR: Command Not allowed without an adapter!
        ERROR: Couldn't Create Command -list
        Exiting Program.
  5. sudo sas3flash -list
root@truenas[/home/admin]# sudo sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02) 
Copyright 2008-2017 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:00:10:00
        SAS Address                    : 500062b-2-013e-d440
        NVDATA Version (Default)       : 07.01.00.03
        NVDATA Version (Persistent)    : 07.01.00.03
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 07.00.01.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : 08.15.00.00
        UEFI BSD Version               : 06.00.00.00
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : 03-25600-01B
        Board Tracer Number            : SP53931343

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
  6. sudo storcli show all
root@truenas[/home/admin]# sudo storcli show all
CLI Version = 007.2807.0000.0000 Dec 22, 2023
Operating system = Linux 6.12.15-production+truenas
Status Code = 0
Status = Success
Description = None

Number of Controllers = 0
Host Name = truenas
Operating System  = Linux 6.12.15-production+truenas
StoreLib IT Version = 07.2900.0200.0100
  7. sudo smartctl -x /dev/sdX
    First, the 3 with apparent errors (sdc, sdd and sde):
admin@truenas[~]$ sudo smartctl -x /dev/sdc
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY 2TB SATA SSD
Serial Number:    PNA3624302834AT04890
LU WWN Device Id: 5 f8db4c 362402834
Firmware Version: X0108A0
User Capacity:    2,048,408,248,320 bytes [2.04 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 10:56:36 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x0001) SCT Status supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    4173
 12 Power_Cycle_Count       -O--CK   100   100   050    -    4
160 Unknown_Attribute       -O--CK   100   100   050    -    0
161 Unknown_Attribute       PO--CK   100   100   050    -    100
163 Unknown_Attribute       -O--CK   100   100   050    -    93
164 Unknown_Attribute       -O--CK   100   100   050    -    7279
165 Unknown_Attribute       -O--CK   100   100   050    -    17
166 Unknown_Attribute       -O--CK   100   100   050    -    1
167 Unknown_Attribute       -O--CK   100   100   050    -    4
168 Unknown_Attribute       -O--CK   100   100   050    -    5050
169 Unknown_Attribute       -O--CK   100   100   050    -    100
175 Program_Fail_Count_Chip -O--CK   100   100   050    -    0
176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
177 Wear_Leveling_Count     -O--CK   100   100   050    -    0
178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
192 Power-Off_Retract_Count -O--CK   100   100   050    -    4
194 Temperature_Celsius     -O---K   100   100   050    -    32
195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    0
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_Sector  -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   -O--CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
232 Available_Reservd_Space -O--CK   100   100   050    -    100
241 Total_LBAs_Written      ----CK   100   100   050    -    213861
242 Total_LBAs_Read         ----CK   100   100   050    -    147597
245 Unknown_Attribute       -O--CK   100   100   050    -    67746
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O     88  Current Device Internal Status Data log
0x25       GPL     R/O     32  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4152         -
# 2  Short offline       Completed without error       00%      4128         -
# 3  Short offline       Completed without error       00%      4104         -
# 4  Short offline       Completed without error       00%      4080         -
# 5  Short offline       Completed without error       00%      4056         -
# 6  Short offline       Completed without error       00%      4032         -
# 7  Short offline       Completed without error       00%      4007         -
# 8  Short offline       Completed without error       00%      3983         -
# 9  Short offline       Completed without error       00%      3959         -
#10  Short offline       Completed without error       00%      3935         -
#11  Short offline       Completed without error       00%      3911         -
#12  Short offline       Completed without error       00%      3887         -
#13  Short offline       Completed without error       00%      3863         -
#14  Short offline       Completed without error       00%      3839         -
#15  Short offline       Completed without error       00%      3815         -
#16  Short offline       Completed without error       00%      3791         -
#17  Short offline       Completed without error       00%      3768         -
#18  Short offline       Completed without error       00%      3744         -
#19  Short offline       Completed without error       00%      3720         -

Selective Self-tests/Logging not supported

SCT Status Version:                  3
SCT Version (vendor specific):       0 (0x0000)
Device State:                        Active (0)
Current Temperature:                    32 Celsius
Power Cycle Min/Max Temperature:     32/32 Celsius
Lifetime    Min/Max Temperature:     21/68 Celsius
Specified Max Operating Temperature:   100 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               4  ---  Lifetime Power-On Resets
0x01  0x010  4            4173  ---  Power-on Hours
0x01  0x018  6      1130754864  ---  Logical Sectors Written
0x01  0x020  6       204224541  ---  Number of Write Commands
0x01  0x028  6      1083030075  ---  Logical Sectors Read
0x01  0x030  6       618534309  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            1  Device-to-host register FISes sent due to a COMRESET
> admin@truenas[~]$ sudo smartctl -x /dev/sdd
> smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
> Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Device Model:     PNY 2TB SATA SSD
> Serial Number:    PNA4524103605AT00339
> LU WWN Device Id: 5 f8db4c 452403605
> Firmware Version: W0724A0
> User Capacity:    2,048,408,248,320 bytes [2.04 TB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Form Factor:      2.5 inches
> TRIM Command:     Available
> Device is:        Not in smartctl database 7.3/5528
> ATA Version is:   ACS-2 T13/2015-D revision 3
> SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Thu Jul  3 11:01:10 2025 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> AAM feature is:   Unavailable
> APM level is:     254 (maximum performance)
> Rd look-ahead is: Enabled
> Write cache is:   Enabled
> DSN feature is:   Unavailable
> ATA Security is:  Disabled, NOT FROZEN [SEC1]
> Wt Cache Reorder: Unavailable
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> No failed Attributes found.
> 
> General SMART Values:
> Offline data collection status:  (0x02) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever 
>                                         been run.
> Total time to complete Offline 
> data collection:                (  120) seconds.
> Offline data collection
> capabilities:                    (0x11) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         No Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0002) Does not save SMART data before
>                                         entering power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine 
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  10) minutes.
> SCT capabilities:              (0x0001) SCT Status supported.
> 
> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
>   5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    0
>   9 Power_On_Hours          -O--CK   100   100   050    -    4180
>  12 Power_Cycle_Count       -O--CK   100   100   050    -    4
> 160 Unknown_Attribute       -O--CK   100   100   050    -    0
> 161 Unknown_Attribute       PO--CK   100   100   050    -    100
> 163 Unknown_Attribute       -O--CK   100   100   050    -    21
> 164 Unknown_Attribute       -O--CK   100   100   050    -    4868
> 165 Unknown_Attribute       -O--CK   100   100   050    -    14
> 166 Unknown_Attribute       -O--CK   100   100   050    -    1
> 167 Unknown_Attribute       -O--CK   100   100   050    -    5
> 168 Unknown_Attribute       -O--CK   100   100   050    -    3808
> 169 Unknown_Attribute       -O--CK   100   100   050    -    100
> 175 Program_Fail_Count_Chip -O--CK   100   100   050    -    0
> 176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
> 177 Wear_Leveling_Count     -O--CK   100   100   050    -    0
> 178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    0
> 181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
> 182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
> 192 Power-Off_Retract_Count -O--CK   100   100   050    -    4
> 194 Temperature_Celsius     -O---K   100   100   050    -    32
> 195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    0
> 196 Reallocated_Event_Count -O--CK   100   100   050    -    0
> 197 Current_Pending_Sector  -O--CK   100   100   050    -    0
> 198 Offline_Uncorrectable   -O--CK   100   100   050    -    0
> 199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
> 232 Available_Reservd_Space -O--CK   100   100   050    -    100
> 241 Total_LBAs_Written      ----CK   100   100   050    -    205065
> 242 Total_LBAs_Read         ----CK   100   100   050    -    138633
> 245 Unknown_Attribute       -O--CK   100   100   050    -    72697
>                             ||||||_ K auto-keep
>                             |||||__ C event count
>                             ||||___ R error rate
>                             |||____ S speed/performance
>                             ||_____ O updated online
>                             |______ P prefailure warning
> 
> General Purpose Log Directory Version 1
> SMART           Log Directory Version 1 [multi-sector log support]
> Address    Access  R/W   Size  Description
> 0x00       GPL,SL  R/O      1  Log Directory
> 0x01           SL  R/O      1  Summary SMART error log
> 0x02           SL  R/O      1  Comprehensive SMART error log
> 0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
> 0x04       GPL,SL  R/O      8  Device Statistics log
> 0x06           SL  R/O      1  SMART self-test log
> 0x07       GPL     R/O      1  Extended self-test log
> 0x10       GPL     R/O      1  NCQ Command Error log
> 0x11       GPL     R/O      1  SATA Phy Event Counters log
> 0x24       GPL     R/O     88  Current Device Internal Status Data log
> 0x25       GPL     R/O     32  Saved Device Internal Status Data log
> 0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
> 
> SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
> No Errors Logged
> 
> SMART Extended Self-test Log Version: 1 (1 sectors)
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%      4136         -
> # 2  Short offline       Completed without error       00%      4112         -
> # 3  Short offline       Completed without error       00%      4087         -
> # 4  Short offline       Completed without error       00%      4063         -
> # 5  Short offline       Completed without error       00%      4039         -
> # 6  Short offline       Completed without error       00%      4015         -
> # 7  Short offline       Completed without error       00%      3991         -
> # 8  Short offline       Completed without error       00%      3967         -
> # 9  Short offline       Completed without error       00%      3943         -
> #10  Short offline       Completed without error       00%      3919         -
> #11  Short offline       Completed without error       00%      3895         -
> #12  Short offline       Completed without error       00%      3870         -
> #13  Short offline       Completed without error       00%      3846         -
> #14  Short offline       Completed without error       00%      3822         -
> #15  Short offline       Completed without error       00%      3798         -
> #16  Short offline       Completed without error       00%      3775         -
> #17  Short offline       Completed without error       00%      3750         -
> #18  Short offline       Completed without error       00%      3726         -
> #19  Short offline       Completed without error       00%      3702         -
> 
> Selective Self-tests/Logging not supported
> 
> SCT Status Version:                  3
> SCT Version (vendor specific):       0 (0x0000)
> Device State:                        Active (0)
> Current Temperature:                    32 Celsius
> Power Cycle Min/Max Temperature:     32/32 Celsius
> Lifetime    Min/Max Temperature:     21/73 Celsius
> Specified Max Operating Temperature:   100 Celsius
> Under/Over Temperature Limit Count:   0/0
> 
> SCT Data Table command not supported
> 
> SCT Error Recovery Control command not supported
> 
> Device Statistics (GP Log 0x04)
> Page  Offset Size        Value Flags Description
> 0x01  =====  =               =  ===  == General Statistics (rev 1) ==
> 0x01  0x008  4               4  ---  Lifetime Power-On Resets
> 0x01  0x010  4            4180  ---  Power-on Hours
> 0x01  0x018  6       554289920  ---  Logical Sectors Written
> 0x01  0x020  6       201495289  ---  Number of Write Commands
> 0x01  0x028  6       495556240  ---  Logical Sectors Read
> 0x01  0x030  6       556274762  ---  Number of Read Commands
> 0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
> 0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
>                                 |||_ C monitored condition met
>                                 ||__ D supports DSN
>                                 |___ N normalized value
> 
> Pending Defects log (GP Log 0x0c) not supported
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x0001  4            0  Command failed due to ICRC error
> 0x0002  4            0  R_ERR response for data FIS
> 0x0005  4            0  R_ERR response for non-data FIS
> 0x000a  4            0  Device-to-host register FISes sent due to a COMRESET
admin@truenas[~]$ sudo smartctl -x /dev/sde
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY 2TB SATA SSD
Serial Number:    PNA4524103605AT00290
LU WWN Device Id: 5 f8db4c 452403605
Firmware Version: W0724A0
User Capacity:    2,048,408,248,320 bytes [2.04 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:02:19 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x0001) SCT Status supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    4194
 12 Power_Cycle_Count       -O--CK   100   100   050    -    4
160 Unknown_Attribute       -O--CK   100   100   050    -    0
161 Unknown_Attribute       PO--CK   100   100   050    -    100
163 Unknown_Attribute       -O--CK   100   100   050    -    16
164 Unknown_Attribute       -O--CK   100   100   050    -    4856
165 Unknown_Attribute       -O--CK   100   100   050    -    14
166 Unknown_Attribute       -O--CK   100   100   050    -    1
167 Unknown_Attribute       -O--CK   100   100   050    -    5
168 Unknown_Attribute       -O--CK   100   100   050    -    3808
169 Unknown_Attribute       -O--CK   100   100   050    -    100
175 Program_Fail_Count_Chip -O--CK   100   100   050    -    0
176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
177 Wear_Leveling_Count     -O--CK   100   100   050    -    0
178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
192 Power-Off_Retract_Count -O--CK   100   100   050    -    4
194 Temperature_Celsius     -O---K   100   100   050    -    31
195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    0
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_Sector  -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   -O--CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
232 Available_Reservd_Space -O--CK   100   100   050    -    100
241 Total_LBAs_Written      ----CK   100   100   050    -    178765
242 Total_LBAs_Read         ----CK   100   100   050    -    112371
245 Unknown_Attribute       -O--CK   100   100   050    -    72755
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O     88  Current Device Internal Status Data log
0x25       GPL     R/O     32  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4150         -
# 2  Short offline       Completed without error       00%      4126         -
# 3  Short offline       Completed without error       00%      4101         -
# 4  Short offline       Completed without error       00%      4077         -
# 5  Short offline       Completed without error       00%      4053         -
# 6  Short offline       Completed without error       00%      4029         -
# 7  Short offline       Completed without error       00%      4005         -
# 8  Short offline       Completed without error       00%      3980         -
# 9  Short offline       Completed without error       00%      3956         -
#10  Short offline       Completed without error       00%      3932         -
#11  Short offline       Completed without error       00%      3908         -
#12  Short offline       Completed without error       00%      3884         -
#13  Short offline       Completed without error       00%      3859         -
#14  Short offline       Completed without error       00%      3835         -
#15  Short offline       Completed without error       00%      3811         -
#16  Short offline       Completed without error       00%      3787         -
#17  Short offline       Completed without error       00%      3763         -
#18  Short offline       Completed without error       00%      3739         -
#19  Short offline       Completed without error       00%      3715         -

Selective Self-tests/Logging not supported

SCT Status Version:                  3
SCT Version (vendor specific):       0 (0x0000)
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     31/31 Celsius
Lifetime    Min/Max Temperature:     21/63 Celsius
Specified Max Operating Temperature:   100 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               4  ---  Lifetime Power-On Resets
0x01  0x010  4            4194  ---  Power-on Hours
0x01  0x018  6      3125669620  ---  Logical Sectors Written
0x01  0x020  6       199821420  ---  Number of Write Commands
0x01  0x028  6      3069432834  ---  Logical Sectors Read
0x01  0x030  6       341276084  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            0  Device-to-host register FISes sent due to a COMRESET

Then the other 2 SSDs from that original vdev (sdb and sdf):

admin@truenas[~]$ sudo smartctl -x /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Patriot P220 2048GB
Serial Number:    P220IICB2502282905
LU WWN Device Id: 0 000000 000000000
Firmware Version: HP3618C8
User Capacity:    2,048,408,248,320 bytes [2.04 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:05:44 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Disabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  208) seconds.
Offline data collection
capabilities:                    (0x5d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (  30) minutes.
Extended self-test routine
recommended polling time:        (  60) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    131
 12 Power_Cycle_Count       -O--CK   100   100   050    -    6
160 Unknown_Attribute       -O--CK   100   100   050    -    0
161 Unknown_Attribute       -O--CK   100   100   050    -    10084
163 Unknown_Attribute       -O--CK   100   100   050    -    500
164 Unknown_Attribute       -O--CK   100   100   050    -    0
165 Unknown_Attribute       -O--CK   100   100   050    -    0
166 Unknown_Attribute       -O--CK   100   100   050    -    0
167 Unknown_Attribute       -O--CK   100   100   050    -    0
168 Unknown_Attribute       -O--CK   100   100   050    -    0
169 Unknown_Attribute       -O--CK   100   100   050    -    100
175 Program_Fail_Count_Chip -O--CK   100   100   050    -    16777216
176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
177 Wear_Leveling_Count     -O--CK   100   100   050    -    56029
178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
192 Power-Off_Retract_Count -O--CK   100   100   050    -    6
194 Temperature_Celsius     -O--CK   100   100   050    -    40
195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    0
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_Sector  -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   -O--CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
232 Available_Reservd_Space -O--CK   100   100   050    -    100
241 Total_LBAs_Written      -O--CK   100   100   050    -    24305
242 Total_LBAs_Read         -O--CK   100   100   050    -    40903
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       GPL,SL  R/O      1  Summary SMART error log
0x02       GPL,SL  R/O      1  Comprehensive SMART error log
0x03       GPL,SL  R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06       GPL,SL  R/O      1  SMART self-test log
0x07       GPL,SL  R/O      1  Extended self-test log
0x09       GPL,SL  R/W      1  Selective self-test log
0x10       GPL,SL  R/O      1  NCQ Command Error log
0x11       GPL,SL  R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0       GPL,SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 0 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Offline             Completed without error       00%       120         -
# 2  Offline             Self-test routine in progress 10%       120         -
# 3  Offline             Self-test routine in progress 10%       120         -
# 4  Offline             Self-test routine in progress 10%       120         -
# 5  Offline             Self-test routine in progress 10%       120         -
# 6  Offline             Self-test routine in progress 10%       120         -
# 7  Offline             Self-test routine in progress 10%       120         -
# 8  Offline             Self-test routine in progress 10%       120         -
# 9  Offline             Self-test routine in progress 10%       120         -
#10  Offline             Self-test routine in progress 10%       120         -
#11  Offline             Self-test routine in progress 10%       120         -
#12  Offline             Self-test routine in progress 10%       120         -
#13  Offline             Self-test routine in progress 10%       120         -
#14  Offline             Self-test routine in progress 10%       120         -
#15  Offline             Self-test routine in progress 10%       120         -
#16  Offline             Self-test routine in progress 10%       120         -
#17  Offline             Self-test routine in progress 10%       120         -
#18  Offline             Self-test routine in progress 10%       120         -
#19  Offline             Self-test routine in progress 10%       120         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               6  ---  Lifetime Power-On Resets
0x01  0x010  4             131  ---  Power-on Hours
0x01  0x018  6      1592852480  ---  Logical Sectors Written
0x01  0x020  6        15465563  ---  Number of Write Commands
0x01  0x028  6      2680619008  ---  Logical Sectors Read
0x01  0x030  6        25555697  ---  Number of Read Commands
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0009  2           20  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           17  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
admin@truenas[~]$ sudo smartctl -x /dev/sdf
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY 2TB SATA SSD
Serial Number:    PNA4524103605AT00523
LU WWN Device Id: 5 f8db4c 452403605
Firmware Version: W0724A0
User Capacity:    2,048,408,248,320 bytes [2.04 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:06:45 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x0001) SCT Status supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    0
  9 Power_On_Hours          -O--CK   100   100   050    -    4200
 12 Power_Cycle_Count       -O--CK   100   100   050    -    8
160 Unknown_Attribute       -O--CK   100   100   050    -    0
161 Unknown_Attribute       PO--CK   100   100   050    -    100
163 Unknown_Attribute       -O--CK   100   100   050    -    20
164 Unknown_Attribute       -O--CK   100   100   050    -    4851
165 Unknown_Attribute       -O--CK   100   100   050    -    14
166 Unknown_Attribute       -O--CK   100   100   050    -    1
167 Unknown_Attribute       -O--CK   100   100   050    -    5
168 Unknown_Attribute       -O--CK   100   100   050    -    3808
169 Unknown_Attribute       -O--CK   100   100   050    -    100
175 Program_Fail_Count_Chip -O--CK   100   100   050    -    0
176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
177 Wear_Leveling_Count     -O--CK   100   100   050    -    0
178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
192 Power-Off_Retract_Count -O--CK   100   100   050    -    8
194 Temperature_Celsius     -O---K   100   100   050    -    31
195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    0
196 Reallocated_Event_Count -O--CK   100   100   050    -    0
197 Current_Pending_Sector  -O--CK   100   100   050    -    0
198 Offline_Uncorrectable   -O--CK   100   100   050    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
232 Available_Reservd_Space -O--CK   100   100   050    -    100
241 Total_LBAs_Written      ----CK   100   100   050    -    83424
242 Total_LBAs_Read         ----CK   100   100   050    -    117850
245 Unknown_Attribute       -O--CK   100   100   050    -    72720
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O     88  Current Device Internal Status Data log
0x25       GPL     R/O     32  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4190         -
# 2  Extended offline    Interrupted (host reset)      80%      4180         -
# 3  Extended offline    Completed without error       00%      4167         -
# 4  Short offline       Completed without error       00%      4167         -
# 5  Extended offline    Completed without error       00%      4162         -
# 6  Short offline       Completed without error       00%      4160         -
# 7  Short offline       Completed without error       00%      4144         -
# 8  Short offline       Completed without error       00%      4120         -
# 9  Short offline       Completed without error       00%      4096         -
#10  Short offline       Completed without error       00%      4072         -
#11  Short offline       Completed without error       00%      4048         -
#12  Short offline       Completed without error       00%      4024         -
#13  Short offline       Completed without error       00%      4000         -
#14  Short offline       Completed without error       00%      3976         -
#15  Short offline       Completed without error       00%      3952         -
#16  Short offline       Completed without error       00%      3928         -
#17  Short offline       Completed without error       00%      3904         -
#18  Short offline       Completed without error       00%      3880         -
#19  Short offline       Completed without error       00%      3857         -

Selective Self-tests/Logging not supported

SCT Status Version:                  3
SCT Version (vendor specific):       0 (0x0000)
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     31/31 Celsius
Lifetime    Min/Max Temperature:     21/67 Celsius
Specified Max Operating Temperature:   100 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               8  ---  Lifetime Power-On Resets
0x01  0x010  4            4200  ---  Power-on Hours
0x01  0x018  6      1172321977  ---  Logical Sectors Written
0x01  0x020  6       194595931  ---  Number of Write Commands
0x01  0x028  6      3428467071  ---  Logical Sectors Read
0x01  0x030  6        83649724  ---  Number of Read Commands
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            0  Device-to-host register FISes sent due to a COMRESET

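On the question of "failed" SMART tests that never show up as failures: the self-test log in the report above contains one extended test marked "Interrupted (host reset)", which is not the same as a failed test. A minimal Python sketch that scans a saved `smartctl -x` report and lists every self-test row whose status is anything other than "Completed without error" is below. It assumes the table columns are separated by two or more spaces, as in the output pasted here; the function name is hypothetical.

```python
import re

# Matches one row of the "SMART Extended Self-test Log" table printed by
# `smartctl -x`, e.g.
# "# 2  Extended offline    Interrupted (host reset)      80%      4180         -"
# Assumption: columns are separated by two or more spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def unclean_self_tests(report: str) -> list:
    """Return self-test log rows whose status is not 'Completed without error'."""
    flagged = []
    for line in report.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("status") != "Completed without error":
            flagged.append({
                "num": int(m.group("num")),
                "test": m.group("test"),
                "status": m.group("status"),
                "hours": int(m.group("hours")),
            })
    return flagged


if __name__ == "__main__":
    sample = """\
# 1  Short offline       Completed without error       00%      4190         -
# 2  Extended offline    Interrupted (host reset)      80%      4180         -
# 3  Extended offline    Completed without error       00%      4167         -
"""
    for row in unclean_self_tests(sample):
        print(row)
```

An interrupted row is only a prompt to re-run that test, not proof that the drive is bad.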
And now the other 5 SSDs that “work” (sdh, sdi, sdj, sdk and sdg):

admin@truenas[~]$ sudo smartctl -x /dev/sdh
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 2TB SSD
Serial Number:    PNY2243221025010001A
LU WWN Device Id: 5 f8db4c 22430001a
Firmware Version: CS900702
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:09:02 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    14796
 12 Power_Cycle_Count       -O--C-   100   100   000    -    4038
168 Unknown_Attribute       -O--C-   100   100   000    -    0
170 Unknown_Attribute       PO----   100   100   010    -    398
173 Unknown_Attribute       -O--C-   100   100   000    -    1900591
192 Power-Off_Retract_Count -O--C-   100   100   000    -    106
194 Temperature_Celsius     PO---K   041   017   000    -    59 (Min/Max 12/83)
218 Unknown_Attribute       PO-R--   100   100   050    -    0
231 Unknown_SSD_Attribute   PO--C-   100   100   000    -    99
241 Total_LBAs_Written      -O--C-   100   100   000    -    15349
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x04       GPL,SL  R/O      8  Device Statistics log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4            4038  ---  Lifetime Power-On Resets
0x01  0x010  4           14796  ---  Power-on Hours
0x01  0x018  6     32191165694  ---  Logical Sectors Written
0x01  0x028  6     82424244222  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              75  ---  Current Temperature
0x05  0x020  1             100  ---  Highest Temperature
0x05  0x028  1              29  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               1  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           15  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           16  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
admin@truenas[~]$ sudo smartctl -x /dev/sdi
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 2TB SSD
Serial Number:    PNY2243221025010001E
LU WWN Device Id: 5 f8db4c 22430001e
Firmware Version: CS900702
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:09:56 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    14658
 12 Power_Cycle_Count       -O--C-   100   100   000    -    4038
168 Unknown_Attribute       -O--C-   100   100   000    -    0
170 Unknown_Attribute       PO----   100   100   010    -    492
173 Unknown_Attribute       -O--C-   100   100   000    -    1966130
192 Power-Off_Retract_Count -O--C-   100   100   000    -    106
194 Temperature_Celsius     PO---K   038   017   000    -    62 (Min/Max 12/83)
218 Unknown_Attribute       PO-R--   100   100   050    -    0
231 Unknown_SSD_Attribute   PO--C-   100   100   000    -    98
241 Total_LBAs_Written      -O--C-   100   100   000    -    15148
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x04       GPL,SL  R/O      8  Device Statistics log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4            4038  ---  Lifetime Power-On Resets
0x01  0x010  4           14658  ---  Power-on Hours
0x01  0x018  6     31768223482  ---  Logical Sectors Written
0x01  0x028  6     80596377745  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              78  ---  Current Temperature
0x05  0x020  1             100  ---  Highest Temperature
0x05  0x028  1              29  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           15  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           16  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
admin@truenas[~]$ sudo smartctl -x /dev/sdj
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 2TB SSD
Serial Number:    PNY2243221025010001C
LU WWN Device Id: 5 f8db4c 22430001c
Firmware Version: CS900702
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:10:50 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    14683
 12 Power_Cycle_Count       -O--C-   100   100   000    -    4038
168 Unknown_Attribute       -O--C-   100   100   000    -    0
170 Unknown_Attribute       PO----   100   100   010    -    552
173 Unknown_Attribute       -O--C-   100   100   000    -    1966128
192 Power-Off_Retract_Count -O--C-   100   100   000    -    106
194 Temperature_Celsius     PO---K   049   033   000    -    51 (Min/Max 8/67)
218 Unknown_Attribute       PO-R--   100   100   050    -    0
231 Unknown_SSD_Attribute   PO--C-   100   100   000    -    98
241 Total_LBAs_Written      -O--C-   100   100   000    -    15139
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x04       GPL,SL  R/O      8  Device Statistics log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4            4038  ---  Lifetime Power-On Resets
0x01  0x010  4           14683  ---  Power-on Hours
0x01  0x018  6     31750418291  ---  Logical Sectors Written
0x01  0x028  6     78613267602  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              68  ---  Current Temperature
0x05  0x020  1              84  ---  Highest Temperature
0x05  0x028  1              25  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           20  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           16  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
admin@truenas[~]$ sudo smartctl -x /dev/sdk
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 2TB SSD
Serial Number:    PNY2243221025010001D
LU WWN Device Id: 5 f8db4c 22430001d
Firmware Version: CS900702
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:11:56 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    14782
 12 Power_Cycle_Count       -O--C-   100   100   000    -    4038
168 Unknown_Attribute       -O--C-   100   100   000    -    0
170 Unknown_Attribute       PO----   100   100   010    -    539
173 Unknown_Attribute       -O--C-   100   100   000    -    2097207
192 Power-Off_Retract_Count -O--C-   100   100   000    -    106
194 Temperature_Celsius     PO---K   039   020   000    -    61 (Min/Max 13/80)
218 Unknown_Attribute       PO-R--   100   100   050    -    0
231 Unknown_SSD_Attribute   PO--C-   100   100   000    -    98
241 Total_LBAs_Written      -O--C-   100   100   000    -    15142
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x04       GPL,SL  R/O      8  Device Statistics log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4            4038  ---  Lifetime Power-On Resets
0x01  0x010  4           14782  ---  Power-on Hours
0x01  0x018  6     31756005738  ---  Logical Sectors Written
0x01  0x028  6     78753357473  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              78  ---  Current Temperature
0x05  0x020  1              97  ---  Highest Temperature
0x05  0x028  1              30  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            7  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            8  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC
admin@truenas[~]$ sudo smartctl -x /dev/sdg
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     PNY CS900 2TB SSD
Serial Number:    PNY2243221025010001B
LU WWN Device Id: 5 f8db4c 22430001b
Firmware Version: CS900702
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul  3 11:12:04 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   050    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    14854
 12 Power_Cycle_Count       -O--C-   100   100   000    -    4038
168 Unknown_Attribute       -O--C-   100   100   000    -    0
170 Unknown_Attribute       PO----   100   100   010    -    513
173 Unknown_Attribute       -O--C-   100   100   000    -    1966130
192 Power-Off_Retract_Count -O--C-   100   100   000    -    111
194 Temperature_Celsius     PO---K   051   028   000    -    49 (Min/Max 12/72)
218 Unknown_Attribute       PO-R--   100   100   050    -    0
231 Unknown_SSD_Attribute   PO--C-   100   100   000    -    98
241 Total_LBAs_Written      -O--C-   100   100   000    -    15145
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x04       GPL,SL  R/O      8  Device Statistics log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4            4038  ---  Lifetime Power-On Resets
0x01  0x010  4           14854  ---  Power-on Hours
0x01  0x018  6     31761548330  ---  Logical Sectors Written
0x01  0x028  6     66795671810  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              66  ---  Current Temperature
0x05  0x020  1              89  ---  Highest Temperature
0x05  0x028  1              29  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               2  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           15  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           16  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

zpool status

I prefixed commands that needed root with sudo, and this one was designed not to run as root. But if you are going to run it as root, then you probably need to run:

  • ZPOOL_SCRIPTS_AS_ROOT=1 sudo zpool status -vLtsc lsblk,serial,smartx,smart

sas3flash

Good news - your HBA is in IT mode.

Bad news - latest LSI9300-1i firmware is (according to Broadcom) at least 15.00.01.00 and you are on 07.00.01.00. Before you recreate the pool I would recommend that you update it using sas3flash.

smartctl

Good news - the following drives are fine without any sign of issues that I can spot: /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf

Bad news - the following drives show smart errors:

  • /dev/sdb seems to have had a problem with SMART short tests not completing, though the most recent one seems to have completed fine - it is unclear whether you just started lots of them at around the same time or whether there is a problem. There are also some entries in the Pending Defects Log at the end which are a little worrying. And I didn’t see a Temperature report either.
  • /dev/sdg also has some things listed in the Pending Defects Log.
  • /dev/sdh also has some things listed in the Pending Defects Log.
  • /dev/sdi actually reached the maximum temp of 100C - you literally could have boiled water on this SSD - this is a major cooling issue. And it has Pending Defects Log items too.
  • /dev/sdj has Pending Defects Log items.

Pending Defects are NOT actual defects - and I am really not expert enough to know how much the Pending Defect Log indicates an actual error.

Other things:

  1. sdc, sdd, sde, sdf, sdg, sdh, sdj and sdk have also got rather hot at some point (83C, less than the stated maximum of 100C), but IMO that is too hot and too close to the maximum for comfort. If you don’t know when and why they got so hot, you should look at improving the airflow across them to avoid high temps.

  2. You don’t really need to run SMART short tests every 24 hours (though there is no harm in it either) but you do need to run SMART long tests every so often.
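If you want to check the rest of the drives for the same thing, here is a minimal Python sketch that pulls the temperature rows out of the "Device Statistics" table in `smartctl -x` output (the format pasted above) and flags a drive whose highest recorded temperature is over a limit you choose. The regex is written only against the rows shown in this thread, and `temperature_stats` / `too_hot` are hypothetical helper names; `limit_c` is whatever threshold you are comfortable with, not a manufacturer figure.

```python
import re

# One row of smartctl's "Device Statistics (GP Log 0x04)" table, e.g.
#   0x05  0x020  1              89  ---  Highest Temperature
# Captured groups: (value, description)
ROW = re.compile(r"^0x[0-9a-f]{2}\s+0x[0-9a-f]{3}\s+\d+\s+(\d+)\s+\S+\s+(.+?)\s*$")


def temperature_stats(smartctl_text: str) -> dict[str, int]:
    """Return {description: degrees C} for the temperature rows in the text."""
    temps = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line.strip())
        # Page headers ("0x05  =====  = ...") do not match the regex at all.
        if match and "Temperature" in match.group(2):
            temps[match.group(2)] = int(match.group(1))
    return temps


def too_hot(temps: dict[str, int], limit_c: int) -> bool:
    """True if the drive's recorded highest temperature is above limit_c."""
    return temps.get("Highest Temperature", 0) > limit_c
```

Usage: save each drive's report with something like `smartctl -x /dev/sdc > sdc.txt`, then call `temperature_stats(open("sdc.txt").read())`. Because "Highest Temperature" is a lifetime figure, it tells you the drive got that hot at some point, not when.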

Cooling

  1. Intermittent power issues might cause disks to get checksum errors - but that is more often indicative of a cable, controller, PSU or memory issue than a disk issue - and it could be caused by poor electrical connections, a hardware fault or a temperature-induced fault.

  2. Given the temperatures of the disks, which indicate a cooling issue, I wonder what temps the LSI card got to - these cards need very good airflow in their own right because they run hot, so I wonder if the controller overheating was a root cause.

    IMO you need to improve the air cooling in your case substantially before you try to do anything more with it.

  3. On the assumption that the 2nd most likely cause is a connection issue, once you have improved the cooling, you should IMO also:

    • Remove and reseat your memory
    • Remove and reseat your HBA
    • Remove and reseat all power and SATA/SAS cables to all drives
    • Run a memory test

With the benefit of hindsight, it is quite likely that had you done all of the above, you could have had your original pool working just fine.

Pool creation error

You have two different types of PNY SSD and whilst they are both advertised as 2TB, one type is 2.5% or 48GB smaller than the other.

I have no idea exactly what you were trying to do when you got the error about failing to create a partition, but:

  • You need to clean the drives before you try to create a pool on them; and

  • If you create a vDev with the larger ones and then try to mirror them or expand a RAIDZ vDev with the smaller ones, then it won’t work.

TL;DR

  • SSDs look fine except for the temperatures - you need to look at the cooling of controller and SSDs urgently because SSDs should NOT get hot enough to boil water.

  • The most likely cause is a connection issue on memory, controllers or disks - or the disk temperature issue.

  • Reseat everything and run a memory test

  • Reboot and clean all drives and then create your pool

If you want advice on pool layout, just ask. (It will depend on your use case(s) - or in non-technical jargon it will depend on what exactly you are using your NAS for.)

If you need detailed instructions for any of this because the TrueNAS documentation doesn’t give you enough information, just ask.

Awesome, many thanks for the in-depth response!!!

Concerning the temperature of the SSDs, I only realised yesterday that it was a problem, so I have added a 120mm fan pointed at them for active cooling, and everything seems to be at or around 30°C now. I’ll make a small 3D-printed jig to space them out better soon.

Concerning the HBA, I realised early on that temperature was a problem, and the card already has a fan zip-tied to it for active cooling (as well as new thermal interface material).

Concerning the HBA firmware version… I’ll keep it in mind for the next time I rebuild this server, but right now everything works as I want. So if it ain’t broke, don’t fix it?

Concerning the SSD size: yeah, I didn’t expect that when buying the second set of 5… thanks PNY, I guess.

What should I do? The GUI wipe didn’t work (both fast and full with zeros), and I even plugged them into my Windows PC to wipe them clean with diskpart before running the commands.

My use case is mostly Nextcloud, so accessing files (mostly photos and PDFs), and maybe an iSCSI drive for my rarely played games (if I can figure out the Windows tuning for good latency / transfer speed).

If you have a better idea on how to use the 10 SSDs, I’m open to it; I thought 2 RAIDZ1 vdevs were good enough for home use on a 10 Gbps LAN.
Would you recommend 3 vdevs of 3 drives plus a hot spare? Or 1 vdev of 10 drives in RAIDZ2?

One way to cleanse an SSD is to use the software provided by its manufacturer. Go to their website and look for whatever tool they offer; Samsung’s is Samsung Magician. Most manufacturers have a tool to securely erase their SSDs.

Your system is broken right now without any data on it so if a firmware update causes any problems the impact is low.

But if you don’t upgrade now and the bugs fixed in later firmware end up trashing your pool, you will be mightily annoyed.

Both sets of drives meet the 2TB (2 * 10^12 bytes) definition. But one set has more space than needed to meet this definition and the other doesn’t. (All this means is that the manufacturer put the extra space into over-provisioning, which the drive uses to keep a supply of empty cells.)

As I said, don’t write zeros - delete partitions and do a full disk trim.

But this should have worked - perhaps the issue was about mismatched sizes and NOT about the drives not being clean.

These use cases are literally at opposite ends of the scale - Nextcloud should be on RAIDZ for space efficiency, whereas iSCSI needs to avoid read and write amplification and needs to be on mirrors and also needs synchronous writes (so if your disks had been HDDs rather than SSDs then you would have needed the iSCSI to be either on SSD or have an SSD SLOG).

So you probably need two pools - a RAIDZ1/2 pool (pick your redundancy level to match the essentiality of your data and your attitude to risk) and a mirror pool for your iSCSI data.
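For the layouts asked about above, here is a rough back-of-the-envelope sketch: usable space counted in "drives' worth", plus how many drives each vdev can lose. It deliberately ignores RAIDZ padding, metadata, slop space, and the fact that each vdev is limited by its smallest member (relevant here with the two PNY sizes). The `Layout` class is a hypothetical helper, and the mirror row only illustrates the mirror option mentioned for iSCSI, not a recommendation to put everything on mirrors.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int       # number of top-level vdevs
    width: int       # drives per vdev
    redundancy: int  # drives per vdev that may fail: RAIDZ parity, or width - 1 for a mirror
    spares: int = 0

    def drives(self) -> int:
        return self.vdevs * self.width + self.spares

    def data_drives(self) -> int:
        # Usable space in "drives' worth"; ignores RAIDZ padding, metadata,
        # slop space and mixed drive sizes.
        return self.vdevs * (self.width - self.redundancy)


LAYOUTS = [
    Layout("2 x 5-wide RAIDZ1", vdevs=2, width=5, redundancy=1),
    Layout("1 x 10-wide RAIDZ2", vdevs=1, width=10, redundancy=2),
    Layout("3 x 3-wide RAIDZ1 + spare", vdevs=3, width=3, redundancy=1, spares=1),
    Layout("5 x 2-way mirror", vdevs=5, width=2, redundancy=1),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        # Losing more than `redundancy` drives in any single vdev loses the whole pool.
        print(f"{layout.name:27} drives={layout.drives():2} "
              f"data~{layout.data_drives()} survives {layout.redundancy}/vdev")
```

The arithmetic shows why 1 x RAIDZ2 is often suggested over 2 x RAIDZ1 for 10 drives: both keep about 8 drives' worth of space, but with RAIDZ2 any two drives can fail, whereas with 2 x RAIDZ1 a second failure in the same vdev loses everything.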

Phase 07? I have never seen firmware that old; you should be on P16.00.12.00. This is the most likely cause of CRC issues.
Also make sure the 9300-16i is well powered (it needs an additional connector) and well cooled.


I’ve been using it for more than a year without any issues. I think the issue is more likely thermal or a bad physical power connection. Anyhow, all of my data is still on the disks (mostly on the HDD pool) and I don’t want to risk it unless absolutely necessary.

I tried making a pool with one of the 3 drives with the issue and I still got the same error message:

> Traceback (most recent call last):
>   File "/usr/lib/python3/dist-packages/middlewared/plugins/disk_/format.py", line 39, in format
>     subprocess.run(["sgdisk", "-n", f"1:0:+{int(size / 1024)}k", "-t", "1:BF01", f"/dev/{disk}"],
>   File "/usr/lib/python3.11/subprocess.py", line 571, in run
>     raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command '['sgdisk', '-n', '1:0:+1998301184k', '-t', '1:BF01', '/dev/sdc']' returned non-zero exit status 4.
> 
> During handling of the above exception, another exception occurred:
> 
> Traceback (most recent call last):
>   File "/usr/lib/python3/dist-packages/middlewared/job.py", line 515, in run
>     await self.future
>   File "/usr/lib/python3/dist-packages/middlewared/job.py", line 560, in __run_body
>     rv = await self.method(*args)
>          ^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 287, in nf
>     rv = await func(*args, **kwargs)
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 48, in nf
>     res = await f(*args, **kwargs)
>           ^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 174, in nf
>     return await func(*args, **kwargs)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/pool.py", line 582, in do_create
>     await self.middleware.call('pool.format_disks', job, disks, 0, 30)
>   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1000, in call
>     return await self._call(
>            ^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 715, in _call
>     return await methodobj(*prepared_call.args)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/format_disks.py", line 29, in format_disks
>     await asyncio_map(unlock_and_format_disk, disks.items(), limit=16)
>   File "/usr/lib/python3/dist-packages/middlewared/utils/asyncio_.py", line 19, in asyncio_map
>     return await asyncio.gather(*futures)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/utils/asyncio_.py", line 16, in func
>     return await real_func(arg)
>            ^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/format_disks.py", line 24, in unlock_and_format_disk
>     await self.middleware.call('disk.format', disk, config.get('size'))
>   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1000, in call
>     return await self._call(
>            ^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 726, in _call
>     return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 619, in run_in_executor
>     return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
>     result = self.fn(*self.args, **self.kwargs)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3/dist-packages/middlewared/plugins/disk_/format.py", line 54, in format
>     raise CallError(error)
> middlewared.service_exception.CallError: [EFAULT] Could not create a partition of 2046260412416 bytes on disk sdc because the disk is too small. If you are replacing a disk in a pool, please ensure that the new disk is not smaller than the disk being replaced.
> 
> Could not create partition 1 from 4196353 to 4000798720
> Error encountered; not saving changes.

I tried to clean the drive with “dd if=/dev/zero of=/dev/sdx bs=512 count=1” but it doesn’t seem to do anything. Could it be an error in the drive parameters, where it thinks that it is bigger / smaller than it really is?

I’ve only skimmed the thread, but I saw something about 100C drive temps. That might be a reporting error, because most SSDs will have long since gone into thermal shutdown at that kind of temperature.

That will only write zeroes to the first 512 bytes of the drive (Block Size 512, count 1) which won’t get anywhere near the ZFS labels.

Try sudo zpool labelclear /dev/sdX against the target drive(s).

It didn’t change anything, I still get the same error message

Show me the contents of sudo wipefs -n /dev/sdc2 please - it’s strange that it won’t clear your labels.

Here you go:

admin@truenas[~]$ sudo wipefs -n /dev/sdc2
[sudo] password for admin: 
DEVICE OFFSET        TYPE       UUID                LABEL
sdc2   0x4000        zfs_member 4420361186125816859 fast_pool
sdc2   0x44000       zfs_member 4420361186125816859 fast_pool
sdc2   0x1dc6e784000 zfs_member 4420361186125816859 fast_pool
sdc2   0x1dc6e7c4000 zfs_member 4420361186125816859 fast_pool
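Those four offsets line up with how ZFS lays out its vdev labels, as I understand the on-disk format. There are four 256 KiB labels: two at the start of the partition and two in its last 512 KiB, and the offsets above sit 16 KiB into each one. Here is a sketch of that arithmetic. It assumes sdc2 is 0x1DC6E800000 bytes long, a size inferred from the offsets above rather than read from the drive, and the function names are hypothetical. It also shows why `dd ... bs=512 count=1` on the whole disk can never reach them: half of the labels live at the far end of the partition.

```python
LABEL_SIZE = 256 * 1024     # each ZFS vdev label is 256 KiB
NVLIST_OFFSET = 16 * 1024   # name/value area follows 8 KiB blank + 8 KiB boot header


def zfs_label_offsets(partition_bytes: int) -> list[int]:
    """Start offsets of labels L0..L3 within a partition of the given size."""
    # ZFS only uses a whole number of label-sized units, so round down first.
    size = partition_bytes - partition_bytes % LABEL_SIZE
    return [0, LABEL_SIZE, size - 2 * LABEL_SIZE, size - LABEL_SIZE]


def wipefs_offsets(partition_bytes: int) -> list[int]:
    """Offsets as they appear in the wipefs output: label start + 16 KiB."""
    return [start + NVLIST_OFFSET for start in zfs_label_offsets(partition_bytes)]


if __name__ == "__main__":
    SDC2_BYTES = 0x1DC6E800000  # inferred from the wipefs offsets, not measured
    print([hex(off) for off in wipefs_offsets(SDC2_BYTES)])
```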

Do sudo zpool labelclear /dev/sdc2 against this and then re-run the wipefs query. It should blank it out.

Doesn’t seem to have made any changes:

admin@truenas[~]$ sudo zpool labelclear /dev/sdc2
[sudo] password for admin: 
use '-f' to override the following error:
/dev/sdc2 is a member of potentially active pool "fast_pool"
admin@truenas[~]$ sudo zpool labelclear /dev/sdc2 -f
admin@truenas[~]$ sudo wipefs -n /dev/sdc2       
DEVICE OFFSET        TYPE       UUID                LABEL
sdc2   0x4000        zfs_member 4420361186125816859 fast_pool
sdc2   0x44000       zfs_member 4420361186125816859 fast_pool
sdc2   0x1dc6e784000 zfs_member 4420361186125816859 fast_pool
sdc2   0x1dc6e7c4000 zfs_member 4420361186125816859 fast_pool
admin@truenas[~]$ 

You need to write to both the first 2MB AND the last 2MB of the drive, because a backup GPT table is stored at the end. And you need to remove the ZFS labels from the drive too. Please stop trying commands that you think will work, because you don’t know enough details of what is really needed.
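Here is what "both ends" means, sketched in Python rather than as a one-liner. The sketch zeroes the first and last 2 MiB of the device, which covers the protective MBR and primary GPT at the start and the backup GPT in the last sectors. This is DESTRUCTIVE. It does not touch ZFS labels inside partitions, which still need clearing separately. The function name and the 2 MiB default are illustrative choices, not a TrueNAS tool. Try it on an image file first to see what it does, and only point it at a disk you have triple-checked.

```python
import os

MIB = 1024 * 1024


def _pwrite_all(fd: int, data: bytes, offset: int) -> None:
    """Write all of `data` at `offset`, looping in case of a short write."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        if written == 0:
            raise OSError("pwrite wrote 0 bytes")
        view = view[written:]
        offset += written


def zero_both_ends(path: str, span: int = 2 * MIB) -> tuple[int, int]:
    """DESTRUCTIVE: overwrite the first and last `span` bytes of `path` with zeros.

    `path` may be a whole-disk block device (e.g. /dev/sdX) or a disk-image file.
    Returns (tail_offset, total_size) so the caller can see what was hit.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # also gives the size of a block device
        span = min(span, size)
        zeros = bytes(span)
        _pwrite_all(fd, zeros, 0)             # protective MBR + primary GPT
        tail = size - span
        _pwrite_all(fd, zeros, tail)          # backup GPT lives in the last sectors
        os.fsync(fd)                          # flush to the device before returning
        return tail, size
    finally:
        os.close(fd)
```

Before running it as root on a real disk, confirm the target with `lsblk -o NAME,SERIAL,SIZE`. Note that on these 3 drives it may appear to succeed and still not stick if the drives are silently dropping writes.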

You need to listen to wiser voices about how to get your system working again. Two people have now advised you to upgrade your firmware because old firmware might be both the cause of the current problem and a cause of a future problem, yet you are determined to ignore this advice. If you need to backup the data on your HDD pool then do that first - but upgrade your firmware.


It may be wise to try removing the VM/passthrough from the equation here, even if only temporarily, just to rule out any odd issues with disk control, soft resets, etc.

If it were a hardware / software issue with the HBA, I would assume that it would impact everything downstream and not just 1 specific vdev. That’s why I was postponing it.
Nevertheless, I upgraded the firmware and fixed the boot issue, and nothing changed.

Then please tell me which ones I should use; otherwise I will do my best, but I’m in no way a pro with the Linux CLI.

I poked around a bit and it doesn’t look like it changes anything. In the end it looks like the 3 SSDs are stuck in some kind of read-only state: even after deleting the 2 partitions (tried with diskpart on Windows and with fdisk in TrueNAS), everything looks like it’s working, but if I refresh, the partitions are back.
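One way to test the "read-only state" theory without writing anything is to check whether the GPT header signature (`EFI PART`) is still present at LBA 1 and at the last LBA after a wipe that reported success. Do the check after a full power cycle. Reads straight after a wipe can be served from the kernel's page cache, so a drive that silently drops writes would only show itself once that cache is gone. Here is a read-only sketch, assuming 512-byte logical sectors; the function name is hypothetical.

```python
import os

SIGNATURE = b"EFI PART"   # GPT header signature


def gpt_headers_present(path: str, sector: int = 512) -> tuple[bool, bool]:
    """Report (primary_present, backup_present) for the GPT headers on `path`.

    Read-only. The primary header is at LBA 1, the backup at the last LBA.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        primary = os.pread(fd, len(SIGNATURE), sector)
        backup = os.pread(fd, len(SIGNATURE), size - sector)
        return primary == SIGNATURE, backup == SIGNATURE
    finally:
        os.close(fd)
```

If a wipe reports success but `gpt_headers_present("/dev/sdX")` still returns `(True, True)` after a power cycle, that would support the theory that the drive is discarding writes rather than TrueNAS re-creating the partitions.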

In the end it looks like I have won 3 small paperweights…

Anyway, many thanks for everyone’s help, truly!

I’m not quite ready to declare them dead yet.

This is the point where you should boot up a separate machine off a live CD with no other devices connected, and start probing the drives with hdparm commands like ATA_SECURE_ERASE that target the device level directly and require parameters like --please-destroy-my-data and --yes-i-know-what-i-am-doing.