Cannot import pool (I/O error) after UDMA CRC errors and HBA replacement

I’m a long-time TrueNAS/FreeNAS user, since 2012. I built my current TrueNAS home server in 2020 and rebuilt my original 2012 server as a dedicated local backup. I have enjoyed the journey and the knowledge gained from running my NAS successfully by following the forum and learning from this community over the last 13 years.

Unfortunately, I started getting UDMA CRC errors on my primary RaidZ2 ‘hd01’ pool, which I struggled to troubleshoot. Ultimately I identified a failed HBA and replaced it. I was a bit too late in reading the excellent “Drive Troubleshooting Flowchart” from @joeschmuck, so I’m aware that I should have sought guidance from this forum earlier.

I tried replacing one of the disks reporting the UDMA CRC errors, which was the wrong action to take, and the resilver became stuck. I had to shut down the server, and after the reboot the pool was gone from the GUI.

My primary ‘hd01’ pool would no longer import from the GUI, failing with an I/O error as detailed below.

Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/pool_actions.py", line 231, in import_pool
    zfs.import_pool(found, pool_name, properties, missing_log=missing_log, any_host=any_host)
  File "libzfs.pyx", line 1374, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1402, in libzfs.ZFS.__import_pool
libzfs.ZFSException: cannot import 'hd01' as 'hd01': I/O error

"""

While I have a good data backup, my preference would be to restore the primary ‘hd01’ pool, if that’s possible.

Failing this, I would like to determine whether my set of 10TB WD Red drives is fully reusable (if I’m forced into rebuilding this pool). All my WD Red drives are outside their 3-year warranty, and I would prefer to stagger any purchase of additional drives.

I have run weekly MultiReports with regular short and long SMART tests since the server was built. All SMART tests report as ‘passed’. I received critical MultiReports for the UDMA CRC errors. No new errors have occurred since the HBA was replaced, and all available drives can be seen by the new HBA.

These issues occurred in early May. Due to work priorities I decided to shut down my home server and leave any repair and restore until I could request assistance by posting this message.

I’m hoping that someone here can now guide me on the best way to restore my primary home server.

My hardware is:
Supermicro CSE-829U X10DRU (12 Bay 2U)
1x Xeon E5-2643 v3 @ 3.4GHz
128GB ECC DDR4
LSI SAS9300-8i 12Gbps SAS PCI-E 3.0 HBA (IT mode FW:16.00.10.00)
Pool sd01: 2x960GB SSD in Mirror (Samsung - PM883)
Pool hd01: 6x10TB HDD in RaidZ2 (WD Red - WDC WD100EFAX-68LHPN0)
Pool boot : 2x64GB SATADOM-SL 3IE3 V2 in Mirror (Innodisk - DHSSL-64)

Here is the output of a zpool import command in the shell for the ‘hd01’ pool:

# zpool import

  pool: hd01
    id: 3338235958544899494
 state: ONLINE
status: One or more devices were being resilvered.
action: The pool can be imported using its name or numeric identifier.
config:

	hd01                                        ONLINE
	  raidz2-0                                  ONLINE
	    sdf2                                    ONLINE
	    sdh2                                    ONLINE
	    replacing-2                             ONLINE
	      sdd2                                  ONLINE
	      b381495c-244a-4bba-a33c-a83e2bcc4fba  ONLINE
	    sda2                                    ONLINE
	    sdb2                                    ONLINE
	    sdi2                                    ONLINE

All the drives in the server, identified by partition UUID and drive serial number, are listed below:

# lsblk -o +PARTUUID,NAME,LABEL,SERIAL
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS PARTUUID                             NAME LABEL       SERIAL
sda      8:0    0   9.1T  0 disk                                                  sda              JEJJXMEX
├─sda1   8:1    0     2G  0 part             495687fd-2444-11ee-ad54-002590faf4a4 sda1 emnas:swap1 
└─sda2   8:2    0   9.1T  0 part             496602b9-2444-11ee-ad54-002590faf4a4 sda2 hd01        
sdb      8:16   0   9.1T  0 disk                                                  sdb              2TJZET2D
├─sdb1   8:17   0     2G  0 part             0e4b99e0-2840-11eb-9ccb-002590faf4a4 sdb1 emnas:swap0 
└─sdb2   8:18   0   9.1T  0 part             0e822397-2840-11eb-9ccb-002590faf4a4 sdb2 hd01        
sdc      8:32   0   3.6T  0 disk                                                  sdc              WD-WCC131817132
├─sdc1   8:33   0     2G  0 part             4d8d9ffa-232d-11ed-8552-441ea13edfdc sdc1             
└─sdc2   8:34   0   3.6T  0 part             4da00c6f-232d-11ed-8552-441ea13edfdc sdc2 zd04        
sdd      8:48   0   9.1T  0 disk                                                  sdd              2YK2ZP4D
├─sdd1   8:49   0     2G  0 part             0daa3a2e-2840-11eb-9ccb-002590faf4a4 sdd1 emnas:swap1 
└─sdd2   8:50   0   9.1T  0 part             0ded02af-2840-11eb-9ccb-002590faf4a4 sdd2 hd01        
sde      8:64   0   9.1T  0 disk                                                  sde              JEGSBJDM
└─sde1   8:65   0   9.1T  0 part             b381495c-244a-4bba-a33c-a83e2bcc4fba sde1 hd01        
sdf      8:80   0   9.1T  0 disk                                                  sdf              2YK3H7HD
├─sdf1   8:81   0     2G  0 part             0da13c60-2840-11eb-9ccb-002590faf4a4 sdf1             
└─sdf2   8:82   0   9.1T  0 part             0de232dd-2840-11eb-9ccb-002590faf4a4 sdf2 hd01        
sdg      8:96   0   9.1T  0 disk                                                  sdg              JEGSM56M
├─sdg1   8:97   0     2G  0 part             13f664cd-71b3-11ec-8f6a-002590faf4a4 sdg1             
└─sdg2   8:98   0   9.1T  0 part             141cfead-71b3-11ec-8f6a-002590faf4a4 sdg2 zd01        
sdh      8:112  0   9.1T  0 disk                                                  sdh              JEK9190N
├─sdh1   8:113  0     2G  0 part             0dac76b3-2840-11eb-9ccb-002590faf4a4 sdh1 emnas:swap1 
└─sdh2   8:114  0   9.1T  0 part             0debcf35-2840-11eb-9ccb-002590faf4a4 sdh2 hd01        
sdi      8:128  0   9.1T  0 disk                                                  sdi              2YK3RZLD
├─sdi1   8:129  0     2G  0 part             0e5440b6-2840-11eb-9ccb-002590faf4a4 sdi1             
└─sdi2   8:130  0   9.1T  0 part             0e7cb1d5-2840-11eb-9ccb-002590faf4a4 sdi2 hd01        
sdj      8:144  0 894.3G  0 disk                                                  sdj              S45NNA0N804048
├─sdj1   8:145  0     2G  0 part             71bda442-2840-11eb-9ccb-002590faf4a4 sdj1 emnas:swap0 
└─sdj2   8:146  0 892.3G  0 part             71c2abe9-2840-11eb-9ccb-002590faf4a4 sdj2 sd01        
sdk      8:160  0 894.3G  0 disk                                                  sdk              S45NNA0N804030
├─sdk1   8:161  0     2G  0 part             71b8aa9a-2840-11eb-9ccb-002590faf4a4 sdk1 emnas:swap0 
└─sdk2   8:162  0 892.3G  0 part             71bfab11-2840-11eb-9ccb-002590faf4a4 sdk2 sd01        
sdl      8:176  0  59.6G  0 disk                                                  sdl              BCA11610140231682
├─sdl1   8:177  0     1M  0 part             26b330c0-6d48-4fd9-b0ef-5141d208626e sdl1             
├─sdl2   8:178  0   512M  0 part             e7805f5e-2ee2-48d6-a37d-19e915cf9cc4 sdl2 EFI         
└─sdl3   8:179  0  59.1G  0 part             091de296-1cff-4018-abdf-3633f0c3de79 sdl3 boot-pool   
sdm      8:192  0  59.6G  0 disk                                                  sdm              BCA11610140231275
├─sdm1   8:193  0     1M  0 part             4a8c4733-1081-4a04-8117-57d99ca86883 sdm1             
├─sdm2   8:194  0   512M  0 part             c7f74448-a2a5-4b28-8a43-1f41f15da007 sdm2 EFI         
└─sdm3   8:195  0  59.1G  0 part             db5f6636-9705-408e-8554-fc5ce9dd63fd sdm3 boot-pool   
zd0    230:0    0   100G  0 disk                                                  zd0              
zd16   230:16   0    32G  0 disk                                                  zd16             
zd32   230:32   0    20G  0 disk                                                  zd32             
zd48   230:48   0    20G  0 disk                                                  zd48             
zd64   230:64   0    10G  0 disk                                                  zd64             
zd80   230:80   0    10G  0 disk                                                  zd80             
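
In case it helps anyone cross-reference the two listings above, here is a minimal sketch of how a pool member shown by UUID (such as b381495c-244a-4bba-a33c-a83e2bcc4fba under replacing-2) could be mapped back to its disk and serial number. The function name partuuid_to_disk is my own and purely illustrative; it assumes lsblk is run with -P (key="value" pairs) so that empty columns do not shift the fields.

```shell
# Reads `lsblk -P -o NAME,PKNAME,PARTUUID,SERIAL` output on stdin and prints
# "<partition> <disk> <serial>" for the partition whose PARTUUID matches $1.
# partuuid_to_disk is a hypothetical helper name, not a TrueNAS or ZFS tool.
partuuid_to_disk() {
    awk -v want="$1" '
        # Return the value of key="value" on the current line, or "" if absent.
        function field(key,    m) {
            if (match($0, "(^| )" key "=\"[^\"]*\"")) {
                m = substr($0, RSTART, RLENGTH)
                sub(/^[^=]*="/, "", m)
                sub(/"$/, "", m)
                return m
            }
            return ""
        }
        {
            name = field("NAME")
            serial[name] = field("SERIAL")      # only whole-disk rows carry a serial
            if (field("PARTUUID") == want) {
                part = name
                disk = field("PKNAME")          # parent disk of the matching partition
            }
        }
        END { if (part != "") print part, disk, serial[disk] }
    '
}

# Example usage on the server (not run here):
#   lsblk -P -o NAME,PKNAME,PARTUUID,SERIAL | partuuid_to_disk b381495c-244a-4bba-a33c-a83e2bcc4fba
```

Going by the lsblk listing above, that UUID belongs to sde1 on disk sde (serial JEGSBJDM), which is the replacement disk.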

I will provide a second post detailing the SMART reports from all 7 drives from the ‘hd01’ pool.

Here is the listing of SMART drive data from the drives in the critical ‘hd01’ pool, starting with ‘sda’:

# smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    JEJJXMEX
LU WWN Device Id: 5 000cca 267e3e068
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 21:34:27 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1155) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       438 (Average 440)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       54
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   094   094   000    Old_age   Always       -       42605
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   067   067   000    Old_age   Always       -       39667
193 Load_Cycle_Count        0x0012   067   067   000    Old_age   Always       -       39667
194 Temperature_Celsius     0x0002   004   004   000    Old_age   Always       -       25 (Min/Max 7/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     42604         -
# 2  Short offline       Completed without error       00%     42576         -
# 3  Short offline       Completed without error       00%     42408         -
# 4  Short offline       Completed without error       00%     42240         -
# 5  Short offline       Completed without error       00%     42237         -
# 6  Short offline       Completed without error       00%     42101         -
# 7  Short offline       Completed without error       00%     42080         -
# 8  Extended offline    Completed without error       00%     42018         -
# 9  Short offline       Completed without error       00%     41924         -
#10  Short offline       Completed without error       00%     41755         -
#11  Short offline       Completed without error       00%     41587         -
#12  Short offline       Completed without error       00%     41419         -
#13  Short offline       Completed without error       00%     41251         -
#14  Extended offline    Completed without error       00%     41178         -
#15  Short offline       Completed without error       00%     40916         -
#16  Short offline       Completed without error       00%     40748         -
#17  Short offline       Completed without error       00%     40580         -
#18  Extended offline    Completed without error       00%     40507         -
#19  Short offline       Completed without error       00%     40412         -
#20  Short offline       Completed without error       00%     40245         -
#21  Short offline       Completed without error       00%     40076         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

SMART drive data for ‘sdb’:


# smartctl -a /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    2TJZET2D
LU WWN Device Id: 5 000cca 26aea05db
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 21:37:29 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1123) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   148   148   024    Pre-fail  Always       -       442 (Average 445)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39139
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       77
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   003   003   000    Old_age   Always       -       117467
193 Load_Cycle_Count        0x0012   003   003   000    Old_age   Always       -       117467
194 Temperature_Celsius     0x0002   004   004   000    Old_age   Always       -       25 (Min/Max 7/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39139         -
# 2  Short offline       Completed without error       00%     39110         -
# 3  Short offline       Completed without error       00%     38942         -
# 4  Short offline       Completed without error       00%     38774         -
# 5  Short offline       Completed without error       00%     38771         -
# 6  Short offline       Completed without error       00%     38636         -
# 7  Short offline       Completed without error       00%     38615         -
# 8  Extended offline    Completed without error       00%     38552         -
# 9  Short offline       Completed without error       00%     38459         -
#10  Short offline       Completed without error       00%     38289         -
#11  Short offline       Completed without error       00%     38121         -
#12  Short offline       Completed without error       00%     37953         -
#13  Short offline       Completed without error       00%     37785         -
#14  Extended offline    Completed without error       00%     37712         -
#15  Short offline       Completed without error       00%     37450         -
#16  Short offline       Completed without error       00%     37282         -
#17  Short offline       Completed without error       00%     37114         -
#18  Extended offline    Completed without error       00%     37041         -
#19  Short offline       Completed without error       00%     36946         -
#20  Short offline       Completed without error       00%     36780         -
#21  Short offline       Completed without error       00%     36610         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more


SMART drive data for ‘sdf’:

# smartctl -a /dev/sdf
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    2YK3H7HD
LU WWN Device Id: 5 000cca 273ebdcd7
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 22:29:31 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1152) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   149   149   024    Pre-fail  Always       -       439 (Average 441)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       79
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39167
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   005   005   000    Old_age   Always       -       114617
193 Load_Cycle_Count        0x0012   005   005   000    Old_age   Always       -       114617
194 Temperature_Celsius     0x0002   250   250   000    Old_age   Always       -       26 (Min/Max 8/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       395

SMART Error Log Version: 1
ATA Error Count: 395 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 395 occurred at disk power-on lifetime: 38797 hours (1616 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 78 18 88 00 00 40 00      00:17:08.697  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:17:08.697  READ LOG EXT
  60 f8 20 08 01 00 40 00      00:17:08.697  READ FPDMA QUEUED
  60 30 10 48 00 00 40 00      00:17:08.697  READ FPDMA QUEUED
  60 10 08 28 00 00 40 00      00:17:08.697  READ FPDMA QUEUED

Error 394 occurred at disk power-on lifetime: 38797 hours (1616 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 00 00 40 00      00:16:57.680  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:16:57.680  READ LOG EXT
  60 e0 10 20 00 00 40 00      00:16:57.664  READ FPDMA QUEUED
  60 e0 08 20 00 00 40 00      00:16:57.664  READ FPDMA QUEUED
  ef 10 02 00 00 00 00 00      00:16:57.426  SET FEATURES [Enable SATA feature]

Error 393 occurred at disk power-on lifetime: 38797 hours (1616 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 00 00 40 00      00:16:55.688  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:16:55.688  READ LOG EXT
  60 08 18 00 20 00 40 00      00:16:55.681  READ FPDMA QUEUED
  60 e0 10 20 00 00 40 00      00:16:55.681  READ FPDMA QUEUED
  60 e0 08 20 00 00 40 00      00:16:55.681  READ FPDMA QUEUED

Error 392 occurred at disk power-on lifetime: 38797 hours (1616 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 00 00 40 00      00:16:55.680  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:16:55.680  READ LOG EXT
  60 08 10 00 20 00 40 00      00:16:55.664  READ FPDMA QUEUED
  60 e0 08 20 00 00 40 00      00:16:55.664  READ FPDMA QUEUED
  ef 10 02 00 00 00 00 00      00:16:55.426  SET FEATURES [Enable SATA feature]

Error 391 occurred at disk power-on lifetime: 38797 hours (1616 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 20 00 00 40 00      00:16:17.197  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      00:16:17.197  READ LOG EXT
  60 e0 10 20 00 00 40 00      00:16:17.189  READ FPDMA QUEUED
  60 e0 08 20 00 00 40 00      00:16:17.189  READ FPDMA QUEUED
  ec 00 00 00 00 00 00 00      00:16:17.189  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39165         -
# 2  Short offline       Completed without error       00%     39137         -
# 3  Short offline       Completed without error       00%     38969         -
# 4  Short offline       Completed without error       00%     38817         -
# 5  Short offline       Completed without error       00%     38816         -
# 6  Extended offline    Completed without error       00%     38579         -
# 7  Short offline       Completed without error       00%     38485         -
# 8  Short offline       Completed without error       00%     38316         -
# 9  Short offline       Completed without error       00%     38148         -
#10  Short offline       Completed without error       00%     37980         -
#11  Short offline       Completed without error       00%     37812         -
#12  Extended offline    Completed without error       00%     37739         -
#13  Short offline       Completed without error       00%     37477         -
#14  Short offline       Completed without error       00%     37309         -
#15  Short offline       Completed without error       00%     37141         -
#16  Extended offline    Completed without error       00%     37068         -
#17  Short offline       Completed without error       00%     36973         -
#18  Short offline       Completed without error       00%     36806         -
#19  Short offline       Completed without error       00%     36637         -
#20  Short offline       Completed without error       00%     36469         -
#21  Extended offline    Completed without error       00%     36396         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

SMART drive data for ‘sdh’:

# smartctl -a /dev/sdh
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    JEK9190N
LU WWN Device Id: 5 000cca 267ee62b2
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 22:30:53 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1353) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       110
  3 Spin_Up_Time            0x0007   147   147   024    Pre-fail  Always       -       450 (Average 447)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       79
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39257
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   003   003   000    Old_age   Always       -       116774
193 Load_Cycle_Count        0x0012   003   003   000    Old_age   Always       -       116774
194 Temperature_Celsius     0x0002   240   240   000    Old_age   Always       -       27 (Min/Max 9/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       704

SMART Error Log Version: 1
ATA Error Count: 704 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 704 occurred at disk power-on lifetime: 38585 hours (1607 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 40 00 68 20 83 40 00  33d+05:59:24.955  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  33d+05:59:24.955  READ LOG EXT
  60 c0 18 a8 20 83 40 00  33d+05:59:24.954  READ FPDMA QUEUED
  60 40 18 28 19 83 40 00  33d+05:59:24.948  READ FPDMA QUEUED
  60 40 00 e8 18 83 40 00  33d+05:59:24.948  READ FPDMA QUEUED

Error 703 occurred at disk power-on lifetime: 38585 hours (1607 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e8 00 b0 57 c3 40 00  33d+05:56:08.383  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  33d+05:56:08.383  READ LOG EXT
  60 d0 18 98 62 c3 40 00  33d+05:56:08.379  READ FPDMA QUEUED
  60 b0 10 e8 5d c3 40 00  33d+05:56:08.379  READ FPDMA QUEUED
  60 50 08 98 5c c3 40 00  33d+05:56:08.379  READ FPDMA QUEUED

Error 702 occurred at disk power-on lifetime: 38585 hours (1607 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 c0 00 f0 cf 30 40 00  33d+05:53:48.550  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  33d+05:53:48.550  READ LOG EXT
  60 80 08 b0 d0 30 40 00  33d+05:53:48.550  READ FPDMA QUEUED
  60 40 00 b0 cd 30 40 00  33d+05:53:48.549  READ FPDMA QUEUED
  60 40 08 70 cd 30 40 00  33d+05:53:48.549  READ FPDMA QUEUED

Error 701 occurred at disk power-on lifetime: 38585 hours (1607 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 10 10 8e 9d 40 00  33d+05:53:10.578  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  33d+05:53:10.578  READ LOG EXT
  60 00 08 10 86 9d 40 00  33d+05:53:10.559  READ FPDMA QUEUED
  60 00 00 10 7e 9d 40 00  33d+05:53:10.559  READ FPDMA QUEUED
  60 00 10 10 76 9d 40 00  33d+05:53:10.526  READ FPDMA QUEUED

Error 700 occurred at disk power-on lifetime: 38585 hours (1607 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 c8 36 cd 40 00  33d+05:52:40.997  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  33d+05:52:40.997  READ LOG EXT
  60 00 08 c8 3e cd 40 00  33d+05:52:40.967  READ FPDMA QUEUED
  60 00 10 c8 2e cd 40 00  33d+05:52:40.950  READ FPDMA QUEUED
  60 00 08 c8 26 cd 40 00  33d+05:52:40.941  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39256         -
# 2  Short offline       Completed without error       00%     39227         -
# 3  Short offline       Completed without error       00%     39059         -
# 4  Short offline       Completed without error       00%     38891         -
# 5  Short offline       Completed without error       00%     38888         -
# 6  Short offline       Completed without error       00%     38752         -
# 7  Short offline       Completed without error       00%     38732         -
# 8  Extended offline    Completed without error       00%     38673         -
# 9  Short offline       Completed without error       00%     38575         -
#10  Extended offline    Completed without error       00%     38564         -
#11  Short offline       Completed without error       00%     38406         -
#12  Short offline       Completed without error       00%     38238         -
#13  Short offline       Completed without error       00%     38070         -
#14  Short offline       Completed without error       00%     37902         -
#15  Extended offline    Completed without error       00%     37833         -
#16  Short offline       Completed without error       00%     37567         -
#17  Short offline       Completed without error       00%     37399         -
#18  Short offline       Completed without error       00%     37231         -
#19  Extended offline    Completed without error       00%     37162         -
#20  Short offline       Completed without error       00%     37063         -
#21  Short offline       Completed without error       00%     36896         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
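
Every logged error in these dumps shows the same register pair, `84 41`, which smartctl reports as `ICRC, ABRT`. As a minimal sketch of how that pair breaks down into flag bits, the Python below decodes the first two hex bytes of a register line. The bit tables are deliberately partial and list only the standard ATA error/status bits relevant here; the function names `decode` and `decode_registers` are made up for illustration and are not part of smartmontools.

```python
# Partial bit-name tables for the ATA Error (ER) and Status (ST) registers.
# Only commonly seen bits are named; anything else is shown as "bitN".
ER_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
ST_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}


def decode(value, names):
    """Return the names of the bits set in an 8-bit register value, high bit first."""
    return [names.get(bit, f"bit{bit}") for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_registers(line):
    """Decode the ER and ST columns from a smartctl error-log register line."""
    er, st = (int(tok, 16) for tok in line.split()[:2])
    return decode(er, ER_BITS), decode(st, ST_BITS)


# Register line copied from the error log above.
print(decode_registers("84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0"))
# → (['ICRC', 'ABRT'], ['DRDY', 'ERR'])
```

ICRC is an interface CRC error on the link between drive and controller, which fits the drives' own media attributes (5, 197, 198) all reading zero.
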

SMART drive data for ‘sdi’:

# smartctl -a /dev/sdi
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    2YK3RZLD
LU WWN Device Id: 5 000cca 273ebf9ea
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 22:34:51 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1140) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   145   145   024    Pre-fail  Always       -       449 (Average 456)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       78
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39257
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       78
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   002   002   000    Old_age   Always       -       118383
193 Load_Cycle_Count        0x0012   002   002   000    Old_age   Always       -       118383
194 Temperature_Celsius     0x0002   004   004   000    Old_age   Always       -       25 (Min/Max 8/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       117392

SMART Error Log Version: 1
ATA Error Count: 65535 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 65535 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 18 a0 f7 05 40 00      06:01:07.577  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      06:01:07.577  READ LOG EXT
  60 d8 08 80 07 06 40 00      06:01:07.567  READ FPDMA QUEUED
  60 00 58 80 ff 05 40 00      06:01:07.563  READ FPDMA QUEUED
  60 18 10 88 f6 05 40 00      06:01:07.563  READ FPDMA QUEUED

Error 65534 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 20 00 f0 c7 05 40 00      06:01:07.525  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      06:01:07.525  READ LOG EXT
  60 e8 30 b8 e7 05 40 00      06:01:07.509  READ FPDMA QUEUED
  60 f0 10 c8 df 05 40 00      06:01:07.508  READ FPDMA QUEUED
  60 28 50 a0 de 05 40 00      06:01:07.496  READ FPDMA QUEUED

Error 65533 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 c0 18 58 98 05 40 00      06:01:07.470  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      06:01:07.470  READ LOG EXT
  60 e8 58 e8 cf 05 40 00      06:01:07.460  READ FPDMA QUEUED
  60 30 50 c0 c7 05 40 00      06:01:07.460  READ FPDMA QUEUED
  60 c8 48 f8 bf 05 40 00      06:01:07.460  READ FPDMA QUEUED

Error 65532 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e8 00 e8 cf 05 40 00      06:01:07.444  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      06:01:07.444  READ LOG EXT
  2f 00 01 10 00 00 00 00      06:01:07.436  READ LOG EXT
  2f 00 01 10 00 00 00 00      06:01:07.436  READ LOG EXT
  60 d8 28 10 ce 05 40 00      06:01:07.425  READ FPDMA QUEUED

Error 65531 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 c0 48 58 98 05 40 00      06:01:07.436  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      06:01:07.436  READ LOG EXT
  60 d8 28 10 ce 05 40 00      06:01:07.425  READ FPDMA QUEUED
  60 20 20 f0 c7 05 40 00      06:01:07.425  READ FPDMA QUEUED
  60 30 18 c0 c7 05 40 00      06:01:07.425  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39256         -
# 2  Short offline       Completed without error       00%     39227         -
# 3  Short offline       Completed without error       00%     39059         -
# 4  Short offline       Completed without error       00%     38891         -
# 5  Short offline       Completed without error       00%     38888         -
# 6  Short offline       Completed without error       00%     38752         -
# 7  Short offline       Completed without error       00%     38732         -
# 8  Extended offline    Completed without error       00%     38669         -
# 9  Short offline       Completed without error       00%     38575         -
#10  Short offline       Completed without error       00%     38406         -
#11  Short offline       Completed without error       00%     38238         -
#12  Short offline       Completed without error       00%     38070         -
#13  Short offline       Completed without error       00%     37902         -
#14  Extended offline    Completed without error       00%     37829         -
#15  Short offline       Completed without error       00%     37567         -
#16  Short offline       Completed without error       00%     37399         -
#17  Short offline       Completed without error       00%     37231         -
#18  Extended offline    Completed without error       00%     37158         -
#19  Short offline       Completed without error       00%     37063         -
#20  Short offline       Completed without error       00%     36896         -
#21  Short offline       Completed without error       00%     36727         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
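
To compare the CRC counts across drives without reading each dump by hand, something like the following sketch could be used. It assumes the `smartctl -a` output has been saved as text, and the function name `crc_summary` is hypothetical. It pulls both the raw value of attribute 199 and the "ATA Error Count" line. On 'sdi' the two differ (117392 versus 65535), and 65535 looks like a 16-bit counter at its maximum, so the attribute's raw value is probably the better number to track.

```python
def crc_summary(smartctl_text):
    """Return (udma_crc_raw, ata_error_count) from `smartctl -a` text; None where absent."""
    crc_raw = None
    ata_errors = None
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; RAW_VALUE is the last one for attribute 199.
        if len(fields) >= 10 and fields[0] == "199" and fields[1] == "UDMA_CRC_Error_Count":
            crc_raw = int(fields[9])
        elif line.startswith("ATA Error Count:"):
            ata_errors = int(fields[3])
    return crc_raw, ata_errors


# Two lines copied from the 'sdi' output above.
sample = """\
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       117392
ATA Error Count: 65535 (device log contains only the most recent five errors)
"""
print(crc_summary(sample))
# → (117392, 65535)
```

Running this against a fresh dump after some uptime on the new HBA and checking that the attribute 199 value has not moved would be one way to confirm the errors have stopped.
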

SMART drive data for ‘sdd’:

# smartctl -a /dev/sdd
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    2YK2ZP4D
LU WWN Device Id: 5 000cca 273eba26e
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 22:45:52 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1154) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   145   145   024    Pre-fail  Always       -       452 (Average 453)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       78
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       39167
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       78
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   004   004   000    Old_age   Always       -       115988
193 Load_Cycle_Count        0x0012   004   004   000    Old_age   Always       -       115988
194 Temperature_Celsius     0x0002   004   004   000    Old_age   Always       -       25 (Min/Max 9/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       6014

SMART Error Log Version: 1
ATA Error Count: 6014 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6014 occurred at disk power-on lifetime: 38660 hours (1610 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 a0 fe 3f 40 00      19:26:50.645  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.644  READ LOG EXT
  60 e0 08 a0 fe 3f 40 00      19:26:50.636  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.620  READ LOG EXT
  2f 00 01 10 00 00 00 00      19:26:50.619  READ LOG EXT

Error 6013 occurred at disk power-on lifetime: 38660 hours (1610 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 a0 fe 3f 40 00      19:26:50.619  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.619  READ LOG EXT
  60 e0 08 a0 fe 3f 40 00      19:26:50.616  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.603  READ LOG EXT
  2f 00 01 10 00 00 00 00      19:26:50.603  READ LOG EXT

Error 6012 occurred at disk power-on lifetime: 38660 hours (1610 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 00 a0 fe 3f 40 00      19:26:50.603  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.603  READ LOG EXT
  60 e0 10 a0 fc 3f 40 00      19:26:50.560  READ FPDMA QUEUED
  60 e0 08 a0 fe 3f 40 00      19:26:50.560  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.537  READ LOG EXT

Error 6011 occurred at disk power-on lifetime: 38660 hours (1610 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 10 a0 fc 3f 40 00      19:26:50.537  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.537  READ LOG EXT
  60 e0 08 a0 fe 3f 40 00      19:26:50.388  READ FPDMA QUEUED
  60 e0 00 a0 fe 3f 40 00      19:26:50.388  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.370  READ LOG EXT

Error 6010 occurred at disk power-on lifetime: 38660 hours (1610 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 41 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 e0 18 a0 fc 3f 40 00      19:26:50.370  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00      19:26:50.370  READ LOG EXT
  60 e0 10 a0 fe 3f 40 00      19:26:50.368  READ FPDMA QUEUED
  60 e0 08 a0 fe 3f 40 00      19:26:50.368  READ FPDMA QUEUED
  60 e0 00 20 fe 3f 40 00      19:26:50.368  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     39166         -
# 2  Short offline       Completed without error       00%     39137         -
# 3  Short offline       Completed without error       00%     38969         -
# 4  Short offline       Completed without error       00%     38801         -
# 5  Short offline       Completed without error       00%     38798         -
# 6  Short offline       Completed without error       00%     38663         -
# 7  Extended offline    Completed without error       00%     38580         -
# 8  Short offline       Completed without error       00%     38485         -
# 9  Short offline       Completed without error       00%     38316         -
#10  Short offline       Completed without error       00%     38148         -
#11  Short offline       Completed without error       00%     37980         -
#12  Short offline       Completed without error       00%     37812         -
#13  Extended offline    Completed without error       00%     37739         -
#14  Short offline       Completed without error       00%     37477         -
#15  Short offline       Completed without error       00%     37309         -
#16  Short offline       Completed without error       00%     37141         -
#17  Extended offline    Completed without error       00%     37068         -
#18  Short offline       Completed without error       00%     36973         -
#19  Short offline       Completed without error       00%     36806         -
#20  Short offline       Completed without error       00%     36637         -
#21  Short offline       Completed without error       00%     36469         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

SMART drive data for ‘sde’ (the drive being re-silvered):

# smartctl -a /dev/sde
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red Plus
Device Model:     WDC WD100EFAX-68LHPN0
Serial Number:    JEGSBJDM
LU WWN Device Id: 5 000cca 267ca9fd0
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5770
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul 23 22:49:12 2025 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   93) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1154) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   130   130   054    Old_age   Offline      -       108
  3 Spin_Up_Time            0x0007   216   216   024    Pre-fail  Always       -       172 (Average 436)
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       5086
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   095   095   000    Old_age   Always       -       41395
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       70
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   064   064   000    Old_age   Always       -       44289
193 Load_Cycle_Count        0x0012   064   064   000    Old_age   Always       -       44289
194 Temperature_Celsius     0x0002   250   250   000    Old_age   Always       -       26 (Min/Max 6/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     41393         -
# 2  Short offline       Completed without error       00%     41365         -
# 3  Short offline       Completed without error       00%     41197         -
# 4  Short offline       Completed without error       00%     41029         -
# 5  Short offline       Completed without error       00%     41026         -
# 6  Short offline       Completed without error       00%     40890         -
# 7  Short offline       Completed without error       00%     40869         -
# 8  Extended offline    Completed without error       00%     40807         -
# 9  Short offline       Completed without error       00%     40712         -
#10  Short offline       Completed without error       00%     37399         -
#11  Extended offline    Completed without error       00%     37325         -
#12  Short offline       Completed without error       00%     37230         -
#13  Short offline       Completed without error       00%     37229         -
#14  Short offline       Completed without error       00%     37061         -
#15  Short offline       Completed without error       00%     36893         -
#16  Short offline       Completed without error       00%     36725         -
#17  Extended offline    Completed without error       00%     36652         -
#18  Short offline       Completed without error       00%     36558         -
#19  Short offline       Completed without error       00%     36389         -
#20  Short offline       Completed without error       00%     36221         -
#21  Short offline       Completed without error       00%     36053         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

Oh my,

These 4 disks are all showing ATA errors, some in the hundreds and others in the thousands.

This looks like an easy problem to troubleshoot.

UDMA_CRC_Errors are typically caused by a faulty data cable, backplane, or HBA (including motherboard).
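To compare drives quickly, the raw value of attribute 199 can be pulled out of the smartctl attribute table. This is only a minimal sketch: the function name `crc_count` is made up for illustration, and it assumes the ten-column attribute layout shown in the smartctl output earlier in this thread.

```shell
# Print the raw UDMA_CRC_Error_Count (SMART attribute 199) from
# `smartctl -A` or `smartctl -a` output supplied on stdin.
# Assumes the ten-column table layout: ID, name, flag, value, worst,
# thresh, type, updated, when_failed, raw value.
crc_count() {
    awk '$1 == 199 && $2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Example use (as root), one drive at a time:
#   smartctl -A /dev/sdX | crc_count
```

If the numbers only ever jump after a reboot or after touching cables, that points at the path between drive and HBA rather than at the drive itself.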

I’m not familiar with the server you have (others here likely are), but you have several drives which have no issues at all and a few drives with the UDMA CRC Errors.

Questions:

  1. Did you recently physically do anything to the computer, relocate it for example, or open the case and put your hands in it?
  2. Is there any commonality of hardware between the drives which have the errors that separates them from the drives not having errors? Such as the bad group sharing the same data cable going into a backplane?
  3. Is the HBA being cooled properly?
  4. If you do not see any single item that could possibly be the single point of failure, do you have another HBA you could use?
    The drive Serial Number: 2YK3RZLD definitely has some bad data between it and the HBA.

I highly doubt you suddenly developed multiple drives with the same UDMA CRC Errors being generated.

Good luck

EDIT: I wanted to tell you that I highly doubt the drives are failing. It is something else.

My advice would be to try the following in sequence and stop if a command gives you NO OUTPUT indicating success:

  1. sudo zpool import -R /mnt hd01
  2. sudo zpool import -R /mnt -f hd01
  3. sudo zpool import -R /mnt -F hd01
  4. sudo zpool import -R /mnt -fF hd01

None of these should cause any further corruption or make things worse, nor cause data loss. However, if you want to be cautious and have the patience to wait for feedback, try only the first two and post the results.

100% agree. Usually, it can be as simple as re-seating all the SATA connectors, perhaps after applying a bit of DeOxIt to all the pins. I replace cables if that doesn’t work.

Also: do not discount the possibility of a bad power cable connection. ZFS has helpfully told me to delete my pool and replace from backups when all I had was a loose power connector.

Reallocated sector count is the most important indicator to look at and for this attribute all drives look good.

Thanks @joeschmuck for your queries here. I can respond providing additional information on how I got myself into this situation.

Is there any commonality of hardware between the drives which have the errors that separates them from the drives not having errors? Such as the bad group sharing the same data cable going into a backplane?

I can confirm all the HDD drives share the same common connection and are plugged into the one backplane (BPN-SAS3-826EL1) and this is connected by a single MiniSAS HD to MiniSAS HD cable (CBL-SAST-0531) to the HBA (SAS9300-8i).

I highly doubt you suddenly developed multiple drives with the same UDMA CRC Errors being generated.

Correct, I obtained my first UDMA errors back in January on 2 drives, serial number: 2YK2ZP4D (271 errors) and 2YK3RZLD (94 errors) after a server reboot. The errors remained constant and unchanged (so I discounted them) until 5th May.

There was a power cut in my area and I gracefully shut down the server (which is protected by a UPS). On the restoration of power I rebooted the server and an additional drive, serial number JEK9190N, reported 704 UDMA errors on the MultiReport, so I knew I had a more serious problem to deal with.

Did you recently physically do anything to the computer, relocate it for example, or open the case and put your hands in it?

Yes, after the 5th May incident I tried relocating drives to different drive slots in the backplane and changed the MiniSAS cable. I saw error logs referencing drive serial 2YK2ZP4D and made a bad decision to replace this drive with serial JEGSBJDM, forcing a resilver which stuck fast at 12%, so I had no choice but to manually shut down the server. On reboot, the server threw up further SAS errors in the server log and a cascade of UDMA CRC errors on the indicated hard drives (as listed in my original post), and the hd01 pool was no longer available in the TrueNAS GUI.

For reference, I received the following SAS errors on the old HBA on 31st May.

May 31 01:09:36 emnas.local kernel: sd 0:0:9:0: Power-on or device reset occurred
May 31 01:09:36 emnas.local kernel: sd 0:0:9:0: [sdl] Unaligned partial completion (resid=114656, sector_sz=512)
May 31 01:09:36 emnas.local kernel: sd 0:0:9:0: [sdl] tag#1008 CDB: Read(16) 88 00 00 00 00 00 00 00 00 a0 00 00 00 e0 00 00
May 31 01:09:36 emnas.local kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
May 31 01:09:36 emnas.local kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
May 31 01:09:36 emnas.local kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
May 31 01:09:36 emnas.local kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: attempting task abort!scmd(0x00000000e44a5305), outstanding for 37452 ms & timeout 30000 ms
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: [sdl] tag#1019 CDB: Read(16) 88 00 00 00 00 00 00 40 00 a0 00 00 00 e0 00 00
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: handle(0x0012), sas_address(0x5003048001918901), phy(1)
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: enclosure logical id(0x500304800191893f), slot(1) 
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: enclosure level(0x0000), connector name(     )
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: task abort: SUCCESS scmd(0x00000000e44a5305)
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: attempting task abort!scmd(0x00000000601b61df), outstanding for 37740 ms & timeout 30000 ms
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: [sdl] tag#1018 CDB: Read(16) 88 00 00 00 00 00 00 00 00 a0 00 00 00 e0 00 00
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: handle(0x0012), sas_address(0x5003048001918901), phy(1)
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: enclosure logical id(0x500304800191893f), slot(1) 
May 31 01:10:08 emnas.local kernel: scsi target0:0:9: enclosure level(0x0000), connector name(     )
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: No reference found at driver, assuming scmd(0x00000000601b61df) might have completed
May 31 01:10:08 emnas.local kernel: sd 0:0:9:0: task abort: SUCCESS scmd(0x00000000601b61df)
May 31 01:10:09 emnas.local kernel: sd 0:0:9:0: Power-on or device reset occurred
May 31 01:10:09 emnas.local kernel: sd 0:0:9:0: [sdl] Unaligned partial completion (resid=114013, sector_sz=512)
May 31 01:10:09 emnas.local kernel: sd 0:0:9:0: [sdl] tag#981 CDB: Read(16) 88 00 00 00 00 00 00 40 00 a0 00 00 00 e0 00 00
May 31 01:10:09 emnas.local kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)

The errors above confirmed to me that the HBA was the root cause of the UDMA CRC errors. At this point I shut down the server and ordered a new HBA.
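For anyone wanting to see at a glance which devices a log like the one above implicates, here is a rough sketch that tallies the events. The function name `summarise_sas_errors` is hypothetical, and it assumes the kernel message wording matches the lines pasted above ("attempting task abort" and "mpt3sas_cmN: log_info").

```shell
# Summarise kernel log text on stdin: count "attempting task abort"
# events per SCSI address (e.g. 0:0:9:0) and count mpt3sas log_info lines.
summarise_sas_errors() {
    awk '
        /attempting task abort/ {
            # The SCSI address is the field right after the "sd" token.
            for (i = 1; i < NF; i++) {
                if ($i == "sd") {
                    addr = $(i + 1)
                    sub(/:$/, "", addr)   # drop the trailing colon
                    aborts[addr]++
                    break
                }
            }
        }
        /mpt3sas_cm[0-9]+: log_info/ { info++ }
        END {
            for (a in aborts) printf "%s task_aborts=%d\n", a, aborts[a]
            printf "mpt3sas log_info lines=%d\n", info
        }
    '
}

# Example use (as root):
#   journalctl -k | summarise_sas_errors
```

If the aborts are spread across many SCSI addresses, that fits a shared component (HBA, cable, backplane); if they all land on one address, the drive or its slot deserves a closer look.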

Is the HBA being cooled properly?

I purchased an additional air baffle (part number MCP-310-21905-0B) which I installed in June along with the new HBA (SAS9300-8i) to help focus and direct air flow towards the back of the server where the HBA is installed.

I’m aware the location of the HBA is not ideal in the CSE-829U case as it sits close to the power supply and my server is installed in an attic.

I use stock standard SuperMicro fans (so the server can get noisy). I only have one CPU installed in this dual CPU server. This means I’m currently limited to using this one PCI slot for the HBA. Purchasing a second CPU would open up the possibility of relocating the HBA to an alternative PCI slot. So, yes, this remains an area of weakness and a single point of failure.

I can monitor the ambient room temperatures in the attic through my UPS. Is there any way of monitoring the running temperatures of a SAS9300-8i card?

If you do not see any single item that could possibly be the single point of failure, do you have another HBA you could use?

Yes, I purchased and installed the new HBA into the server in June. This resolved the SAS errors in the server log and I believe it also resolves the UDMA CRC errors.

I’m ready to attempt an import of the hd01 pool and I will be strictly following all guidance received from this forum.

EDIT: I wanted to tell you that I highly doubt the drives are failing. It is something else.

It is reassuring to know that the drives (as they currently stand) look OK, with a question mark on drive serial 2YK3RZLD containing bad data. I suspect drive serial JEGSBJDM is of no benefit to the hd01 pool. Any data written would have been through the old faulty HBA card and this drive replacement failed and still remains in the process of being resilvered.

@ericm It looks like you have this all covered, good deal.

Not that I am aware of.

Have you used Multi-Report? (see my link below if not). It will track those pesky UDMA CRC Errors and let you know if more are generated. When creating the initial configuration file, the last question is if you want it to scan the drives for drive error data such as UDMA CRC Errors. Answer yes. When you run the script it generates a report and you will see two values in the UDMA CRC Errors column for drives with these errors. You will see a “0” and in parentheses the drive reported value. When the “0” value increases, you know you have additional errors. It beats trying to keep track of large numbers.
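The idea of showing new errors against a recorded baseline can be illustrated in a couple of lines. This is not Multi-Report's actual code, just a sketch of the arithmetic described above; `crc_delta` and its two arguments are hypothetical names.

```shell
# Format a CRC count as described above: errors gained since the
# recorded baseline, then the drive-reported lifetime value in parentheses.
# Arguments: <baseline> <current>, both raw attribute-199 values.
crc_delta() {
    baseline="$1"
    current="$2"
    echo "$(( current - baseline )) (${current})"
}

# Example: a drive recorded at 704 that still reports 704 today
#   crc_delta 704 704
```

A leading value that stays at 0 means no new CRC errors since the baseline was taken, whatever the lifetime total says.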

As you probably know, UDMA_CRC_Errors never clear. I have no clue why they are permanent, but they are. I’m sure there is a reason.

@Protopia thank you very much for your advice. I have run the first 2 commands.

Here is the output I received:

# sudo zpool import -R /mnt hd01
cannot import 'hd01': I/O error
	Destroy and re-create the pool from
	a backup source.
# sudo zpool import -R /mnt -f hd01
cannot import 'hd01': I/O error
	Destroy and re-create the pool from
	a backup source.

The output doesn’t look like much to go on!

I prefer to take the cautious approach so will wait for your further advice and instruction.

Have you used Multi-Report?

Oh yes, I use Multi-Report on both of my servers to e-mail out the weekly drive status for the HDDs and SSDs I have running. Very much appreciate your dedication to updating and developing this important reporting tool along with all the supporting documentation. In addition to this, your drive troubleshooting flowcharts are excellent.

Ok - let’s try a trial run which won’t do anything but should tell us whether it will succeed on a later non-trial attempt:

  • sudo zpool import -R /mnt -fFn hd01

This can sometimes take some time as it looks to see if rolling back a few seconds of writes can make the pool consistent again. As I said, this is a trial, and it won’t actually import it or lose a few writes, but it will indicate whether this approach will work if we try it for real.

So let’s see what the output from this says.

Before checking on the command above, do you have excellent cooling across your HBA? LSI / Broadcom docs say about 200 LFM (linear feet per minute) of air flow. You won’t have that stock without a server case or adding fans to consumer cases.

I use a standard server case. I have changed the fan speed from Optimal to HeavyIO speed which can be heard! I’m not sure if that gets the air flow to 200 LFM but it’s the best setting I can go for.

It’s 3am here and about the coolest time. Thanks @SmallBarky for your advice.

OK, great.

After changing the server fan speed (with a view to looking after the HBA), I created a tmux session to run this trial import command in the knowledge that it could take a long time to complete.

Looks like it ran very quickly (within a minute) without any messages being received back!

# sudo zpool import -R /mnt -fFn hd01
# 

I take it that no output means good news and that I could have success running this command outside the trial session?

I will await your further advice before running any other commands.

That is 1000% correct. However, seeing error logs reported from something as low level as the ATA bus is usually an indication that something is not quite right. I have seen SATA drives corrupt data in exactly this way.

Error 65535 occurred at disk power-on lifetime: 38737 hours (1614 days + 1 hours)

This line in particular is an indicator. Some errors are fine, and as long as they are not increasing quickly it’s usually “fine” for a while. But drives that have these latent errors can sometimes start accumulating them quickly.

Personal opinion: when I see that many digits, it’s worth checking into the disks as a next step. This is especially true because the ATA Error count is wildly different between drives, with some at a couple hundred and others in the thousands and tens of thousands.

AI Generated Chart:

Drive  Model Family              Serial Number  ATA Error Count
sdf    Western Digital Red Plus  2YK3H7HD       395
sdh    Western Digital Red Plus  JEK9190N       704
sdi    Western Digital Red Plus  2YK3RZLD       65535
sdd    Western Digital Red Plus  2YK2ZP4D       6014

If they were all about the same error count, I would suspect the HBA/Layer 1/etc, but since they are not I suspect at least one of these drives is unfortunately bad. Since OP already replaced the HBA, that kinda proves it’s probably not the issue, or at least it isn’t any longer.

Also, 65535 is the maximum value of a 16-bit unsigned integer, which is the size of this field, so the counter on that drive has hit its ceiling.
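To make the 16-bit point concrete, here is a small sketch that computes the ceiling and flags any drive whose ATA Error Count sits on it. The function name `ata_error_count` is made up for illustration, and it assumes the "ATA Error Count: N" line format seen in the smartctl output in this thread.

```shell
# Largest value a 16-bit unsigned field can hold.
MAX_U16=$(( (1 << 16) - 1 ))   # 65535

# Read `smartctl -a` output on stdin and print the ATA Error Count,
# appending a note when the counter sits at the 16-bit ceiling.
ata_error_count() {
    awk -v max="$MAX_U16" '
        /^ATA Error Count:/ {
            note = ($4 == max) ? " (at 16-bit maximum)" : ""
            print $4 note
        }
    '
}

# Example use (as root):
#   smartctl -a /dev/sdi | ata_error_count
```

A drive sitting exactly at 65535 may well have logged more errors than that, so its count is better read as "at least 65535".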

I say all this to say, the pool might just import automagically on the next reboot if we power down and kick out sdi, but I want to check something first. We don’t have the luxury of SAS drives here, which typically fail in a more predictable way. SATA drives especially can sometimes fail poorly.

Sidebar: @joeschmuck do you grep this field? The whole tracking over time aspect of this is why I asked. IIRC it’d be hard because there are different permutations of the line.

I don’t think this path is the answer here just yet. Let’s back up a couple of levels quickly and check the disks.

OP, it’s your call. We can dig in deeper and try to figure out what happened or we can try to get you up and running.

Either pull the sdi disk first and try and see if it just works,

Or we can check the disk speeds to confirm sdi is a problem, but maybe more.

There is some risk involved with trying to read data off a suspect disk like this. I want to check to see if the other disks are also okay, and we’re reading from the disks anyway if we’re trying to import the pool.

DANGER: This one-line script needs to be run as root; you are taking responsibility for typing this into your system at your own risk.

It will read the first 10G of each of the disks nondisruptively into RAM. It will print out progress with stats to the screen and to a file. If you can copy the output back to me, that may give us some clues.

for d in /dev/sd? /dev/sd?? /dev/nvme*n1; do [ -b "$d" ] && echo -e "\n=== Reading 10G from $d ===" && dd if="$d" of=/dev/null bs=1M count=10240 status=progress; done 2>&1 | tee /root/read_10g_from_disks.log

Example output

root@m50[~]# for d in /dev/sd? /dev/sd?? /dev/nvme*n1; do [ -b "$d" ] && echo -e "\n=== Reading 10G from $d ===" && dd if="$d" of=/dev/null bs=1M count=10240 status=progress; done 2>&1 | tee /root/read_10g_from_disks.log

=== Reading 10G from /dev/sda ===
10634657792 bytes (11 GB, 9.9 GiB) copied, 21 s, 506 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 21.2289 s, 506 MB/s

=== Reading 10G from /dev/sdb ===
10568597504 bytes (11 GB, 9.8 GiB) copied, 41 s, 258 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 41.6714 s, 258 MB/s

=== Reading 10G from /dev/sdc ===
9901703168 bytes (9.9 GB, 9.2 GiB) copied, 38 s, 260 MB/s

You’d want to run this when the pool is not imported, as in this situation.
Otherwise the first couple will always be faster (ARC), especially on not very busy systems and systems with a lot of RAM.
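Once the log exists, the final throughput per disk can be boiled down to one line each, which makes a slow outlier easy to spot. A rough sketch follows; `summarise_reads` is a hypothetical name, and it assumes the log follows the "=== Reading 10G from ... ===" and "... copied, N s, X MB/s" format shown in the example output above.

```shell
# Reduce the dd log on stdin to one "<disk> <speed>" line per disk,
# using the last "copied" line seen for each disk.
summarise_reads() {
    # status=progress updates are separated by carriage returns in the
    # log file, so turn them into newlines before parsing.
    tr '\r' '\n' | awk '
        /^=== Reading 10G from / { disk = $5; order[++n] = disk }
        / copied, /               { speed[disk] = $(NF - 1) " " $NF }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %s\n", order[i], speed[order[i]]
        }
    '
}

# Example use:
#   summarise_reads < /root/read_10g_from_disks.log
```

A disk with nothing after its name never reported a "copied" line, which is itself worth investigating.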

Yes - this suggests that if you run sudo zpool import -R /mnt -fF hd01 (without the -n) the pool will import with the loss of a few transactions worth of data. BUT despite this indication it might not import.

If you want to do a bit more diagnostics and run sudo zdb -l /dev/sdXn for the ZFS partition on each disk (i.e. /dev/sda1 if that is a ZFS partition), we can see what the last TXG number was and thus get a feel for how much data might be lost.
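To compare TXG numbers across disks without reading every label dump by eye, something like the following might help. It is only a sketch: `label_txg` is a hypothetical name, and it assumes the label dump contains lines of the form "txg: N" (nested "create_txg:" lines are ignored).

```shell
# Read `zdb -l` output on stdin and print the highest "txg:" value found
# across the labels. Lines such as "create_txg:" are not matched.
label_txg() {
    awk '
        $1 == "txg:" { if ($2 + 0 > max) max = $2 + 0 }
        END { if (max != "") print max }
    '
}

# Example use (as root), assuming partition 1 is the ZFS partition:
#   for p in /dev/sd?1; do printf '%s ' "$p"; zdb -l "$p" | label_txg; done
```

Disks whose number lags behind the others missed the most recent writes, and the gap gives a feel for how far back a rewind would have to go.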

If you want to map the drives and partitions in the pool then run:

  • lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME

which will give you the partuuids and hopefully the LABEL will show “hd01”.

The UDMA_CRC_Errors_Count ? Of course. I record 23 columns of data per drive and the default maximum file length is 720 days. It is in the statisticalsmartdata.csv (file name by default). If there is a need to track anything else, please let me know and I could add it.

EDIT: I do not record the individual Error 65535 occurred at messages as this would make the CSV file explode! I would have to record most of smartctl -x /dev/sdz output and for some drives, it can be long. But I can add it, or try to add it, but I need more to go on besides matching “Error 65535”. I am working on the next version of the script but as of now I don’t see it going out the door for a while unless there was a problem with the script on GitHub.

It’s not “Error 65535”; the log line I grabbed was saying the 65535th error occurred when the drive had x number of spindle hours.

A little further up in the SMART output (at the bottom, under UDMA_CRC_Error_Count), if ATA errors exist, you will see a whole new section that does not usually show up in smartctl -a

SMART Error Log Version: 1
ATA Error Count: 6014 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
SMART Error Log Version: 1
ATA Error Count: 65535 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]

SMART Error Log Version: 1
ATA Error Count: 704 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
SMART Error Log Version: 1
ATA Error Count: 395 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]