ENXIO error after random freeze

Hello,

I have a TrueNAS Core NAS with a Supermicro X11-SLM-F motherboard and 6x Seagate Exos X18 18 TB HDDs in a RAIDZ2 pool.

A couple of days ago, my NAS randomly froze. I couldn’t access it via SSH, the web UI, or the IPMI KVM console, so yesterday I shut it down via IPMI power control. After that, it seemed to run fine.

Today I couldn’t access it again, so I had to shut it down once more. After booting, I saw that it was resilvering one of the drives (ada5). I was logged in to the web UI and over SSH, and had a KVM console running. After about 10 minutes, I tried to check the zpool status and the command froze. I tried the UI, but it didn’t load any content. I tried the same command in the KVM console and it froze as well, so I had to power the system down via IPMI one more time.

Now, when I boot the NAS up, I get the following error:

vdev.c:166:vdev_dbgmsg(): raidz-0 vdev (guid 16332391713668644581): zio_vdev_io_assess(zio=0xfffff801ald984dBAx) setting cant_write=TRUE due to write failure with ENXIO

Using ChatGPT, I tried to troubleshoot this and booted into Single User Mode. I could run commands, but the output wasn’t readable: I can’t scroll up and the window is too small, so most of the output disappears. I rebooted once more, but the above-mentioned ENXIO error keeps appearing.
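When the console is too small to scroll back, one option is to send a command’s output to a file and page through it. The helper below is a minimal sketch, assuming a POSIX `sh` and that `less` (or whatever `$PAGER` names) is available; the name `capture` and the log paths are hypothetical. In Single User Mode the root filesystem may still be mounted read-only, so LOGFILE has to point somewhere writable.

```sh
#!/bin/sh
# capture LOGFILE CMD [ARGS...]
# Hypothetical helper: runs CMD, saves its stdout and stderr to LOGFILE,
# then opens LOGFILE in a pager so the whole output can be scrolled.
capture() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
    status=$?
    # Fall back to less when PAGER is unset or empty.
    ${PAGER:-less} "$logfile"
    # Return the exit status of CMD, not of the pager.
    return "$status"
}

# Example usage on the NAS (paths are placeholders):
#   capture /tmp/zpool-import.txt zpool import
#   capture /tmp/zpool-status.txt zpool status -v
```

Saving to a file also means the output can be copied off later instead of being retyped from a screenshot.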

I checked all the cables. I don’t really know what to do now. Do you guys have any ideas?

Here are a few screenshots:

  1. Using “zpool import” and “zpool status” in Single User Mode:

  2. The error message at the end of the boot process:

  3. Finally, a link to a video of the boot process, since I cannot scroll up in the KVM console:

https://www.dropbox.com/scl/fi/51eoilruskbrydg3otvqf/Bootup.avi?rlkey=7e3rho151qqlri3kesc6b98rn&st=015uezuy&dl=0

I managed to enable SSH in Single User Mode. This is the output of zpool status:

root@:~ # zpool status
  pool: Data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun 29 09:30:54 2025
        0B scanned at 0B/s, 0B issued at 0B/s, 46.3T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0     0
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0     0
            ada5p2                                      ONLINE     418     0     6
            ada2p2                                      ONLINE     254     0     0
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE       0     0     0
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  ONLINE       0     0     0

errors: No known data errors

I assume this error appeared because the server froze while resilvering and I had to power it off forcefully. How can I use this information to get the pool working again without making things worse?

SMART Output of ada2
root@:~ # smartctl -a /dev/ada2
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR5089XQ
LU WWN Device Id: 5 000c50 0ed391d7f
Firmware Version: SN02
User Capacity:    18,000,207,937,536 bytes [18.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jul  2 11:06:46 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1578) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   044    Pre-fail  Always       -       2374144
  3 Spin_Up_Time            0x0003   094   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       41
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       385669600
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13330
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       41
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   052   000    Old_age   Always       -       38 (Min/Max 37/38)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       746
194 Temperature_Celsius     0x0022   038   048   000    Old_age   Always       -       38 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13246 (176 165 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       42258101644
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1992636884918

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SMART Output of ada5
root@:~ # smartctl -a /dev/ada5
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    xxx
LU WWN Device Id: 5 000c50 0ece30a22
Firmware Version: SN02
User Capacity:    18,000,207,937,536 bytes [18.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Jul  2 10:57:23 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1512) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       187542825
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       163
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   045    Pre-fail  Always       -       351887469
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13163
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       163
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   046   000    Old_age   Always       -       37 (Min/Max 30/37)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       156
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       712
194 Temperature_Celsius     0x0022   037   054   000    Old_age   Always       -       37 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13078 (177 187 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       41802673948
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       906994377032

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I removed ada5 by unplugging it and managed to boot into TrueNAS properly. It started resilvering and has been stuck at 18.11% for 5 to 8 hours.

zpool status
root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        10.4T scanned at 321M/s, 8.38T issued at 258M/s, 46.3T total
        4.57M resilvered, 18.11% done, 1 days 18:42:33 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED 6.97K    18     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.87K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 2.03K  (resilvering)
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 2.03K  (resilvering)
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE   15.5K    47 1.94K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 7203 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors

Apparently there have been 7203 data errors during the resilver. The count jumped from 10 to 7203 at 18.11%. Here are all the “zpool status” outputs I captured in between:

root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        2.63T scanned at 135G/s, 1.30T issued at 66.7G/s, 46.3T total
        0B resilvered, 2.82% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.19K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 1.25K
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 1.25K
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE       0     0 1.18K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 4 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        5.45T scanned at 27.2G/s, 2.97T issued at 14.8G/s, 46.3T total
        0B resilvered, 6.42% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.63K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 1.70K
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 1.70K
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE       0     0 1.61K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 9 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        5.45T scanned at 14.0G/s, 3.23T issued at 8.30G/s, 46.3T total
        2.91M resilvered, 6.99% done, 01:28:27 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.65K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 1.73K  (resilvering)
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 1.73K  (resilvering)
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE       0     0 1.64K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 10 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        6.32T scanned at 11.2G/s, 3.87T issued at 6.84G/s, 46.3T total
        2.91M resilvered, 8.36% done, 01:45:44 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.67K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 1.76K  (resilvering)
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 1.76K  (resilvering)
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE       0     0 1.67K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 10 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]# zpool status
  pool: Data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  2 13:34:34 2025
        10.4T scanned at 3.62G/s, 8.38T issued at 2.91G/s, 46.3T total
        4.57M resilvered, 18.11% done, 03:41:54 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED 6.97K    18     0
            gptid/f16a1008-9ac4-11ee-91ae-0cc47a406253  ONLINE       0     0 1.87K
            gptid/24bf353a-a425-11ee-8096-0cc47a406253  ONLINE       0     0 2.03K  (resilvering)
            4593813305476830719                         UNAVAIL      0     0     0  was /dev/ada5p2
            ada2p2                                      ONLINE       0     0 2.03K  (resilvering)
            gptid/c46504b2-a376-11ee-93c7-0cc47a406253  ONLINE   14.1K    47 1.94K
            gptid/0b1ff3e2-9ec0-11ee-b90a-0cc47a406253  REMOVED      0     0     0

errors: 7203 data errors, use '-v' for a list

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:01 with 0 errors on Wed Jun 25 03:53:01 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        boot-pool                                     ONLINE       0     0     0
          gptid/a39dbd2e-63d6-11eb-b3a3-0cc47a406253  ONLINE       0     0     0

errors: No known data errors
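Rather than re-running `zpool status` by hand as above, a small loop can append timestamped snapshots to a log file, which makes it easier to see exactly when a counter jumps. This is a sketch: `snapshot_status` is a hypothetical name, the `ZPOOL` variable exists only so the function can be tested without ZFS, and the log path should live on a disk other than the affected pool. If pool I/O is suspended, `zpool status` itself may block, and this loop will block with it.

```sh
#!/bin/sh
# snapshot_status POOL LOGFILE COUNT INTERVAL
# Hypothetical helper: appends COUNT timestamped "zpool status -v POOL"
# snapshots to LOGFILE, waiting INTERVAL seconds between them.
# ZPOOL can override the zpool binary (used only for testing).
snapshot_status() {
    pool=$1
    logfile=$2
    count=$3
    interval=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            echo "=== $(date -u '+%Y-%m-%d %H:%M:%S UTC') ==="
            ${ZPOOL:-zpool} status -v "$pool" 2>&1
        } >> "$logfile"
        i=$((i + 1))
        # Skip the sleep after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: 36 snapshots of pool "Data", 10 minutes apart (path is a placeholder).
#   snapshot_status Data /root/resilver-log.txt 36 600
```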

Running “zpool status -v” gives me this error:

errors: List of errors unavailable: pool I/O is currently suspended

I have a feeling that all the data is lost. Can anybody help?

Edit: I think this is exactly what happened right at the beginning. I’m still SSHed into my NAS, but I cannot open a second SSH connection. The GUI also isn’t loading. I’m leaving everything on for now so I can still access my NAS.

I think you should boot a live Linux distro and run memory tests and SMART tests. You are very close to losing the entire pool, and with the disks being swapped and resilvered like this, I can’t tell which ones are good and which are bad. I think you are hurting your recovery chances by continuing before verifying hardware health.
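With this many status dumps, one way to see at a glance which devices are accumulating errors is to filter a `zpool status` snapshot for rows whose READ, WRITE, or CKSUM column is not zero. A minimal sketch in POSIX `sh` plus `awk`; `list_error_devices` is a hypothetical name, and it assumes the usual layout where the config header reads NAME, STATE, READ, WRITE, CKSUM. It only reads text, so it is safe to run on saved output.

```sh
#!/bin/sh
# list_error_devices: reads "zpool status" text on stdin and prints each
# config row whose READ, WRITE or CKSUM counter is not "0".
# Counters such as "1.87K" count as non-zero because they differ from "0".
list_error_devices() {
    awk '
        # The header row starts a config section.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line ends it.
        /^errors:/ { in_config = 0; next }
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $2, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Example usage:
#   zpool status Data | list_error_devices
```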

Do you have a backup of the data pool, or is this your only copy? Do you have a current backup of your configuration file?

Thanks for your reply!

Is it okay to shut the NAS down while it’s resilvering?

I have backups of the config and the important files.

I will do the memory and SMART tests tomorrow and report back.

I will also swap out all the SATA cables. I have enough unused ones.

I want to make sure the hardware and OS will run without crashing first. If the system stays up through CPU and RAM stress tests, it should be okay to move on to the drives and the health of your pool. If the hardware or OS crashes or is unstable, the next option would be new hardware: load TrueNAS on it and try to bring the pool back to health.

What’s the proper way to handle the stuck resilvering though? Just shutting it down?

Let me ask for an experienced second opinion.
@HoneyBadger what do you think the best course of action is here?
Summary: pool problems after a freeze or crash. The system looks unstable while it’s resilvering. I want to test the hardware first.

I expect the resilver will be fine if we can get your system to shut down gracefully (a normal shutdown). If it has locked up or crashed, I would just move on to a live distro and the hardware tests: CPU, RAM, and SMART long.


I’m afraid the server won’t shut down properly. I sent a power-off command over SSH and got the expected response, but the server isn’t shutting down. I’ll leave for work and look at it tonight.

This is the screen it has been stuck on since I issued the shutdown this morning. I got it from the KVM console, which took a few minutes to load. Shall I just pull the plug?

I think pulling the plug is the right choice. It’s time to move on to CPU, RAM, and SMART long tests using a live USB or CD.

I did just that. I’m starting with the SMART tests. Each one takes about 23 hours, so I have a long way to go. I’m starting with ada5 (which is now sdc in Ubuntu). I’ll report back after each test.

Edit: Would it be fine to run all SMART tests at once? The disks aren’t really doing anything.

Yes.
SMART tests are run by the drives themselves. Running the test on all drives in parallel is absolutely fine.

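Because each drive runs the self-test internally, `smartctl -t long` returns as soon as the test has been started, so kicking off every disk from one loop is enough to test them in parallel. A sketch, assuming the Ubuntu live session names the disks /dev/sda through /dev/sdf (check with `lsblk` first) and that it runs as root; `start_long_tests` is a hypothetical name, and `SMARTCTL` is only an override for testing.

```sh
#!/bin/sh
# start_long_tests DEVICE...
# Hypothetical helper: asks each drive to begin an extended (long) SMART
# self-test. smartctl exits once the test is started; the drive performs
# the test on its own, so all drives end up testing in parallel.
# SMARTCTL can override the smartctl binary (used only for testing).
start_long_tests() {
    for dev in "$@"; do
        echo "Starting long self-test on $dev"
        ${SMARTCTL:-smartctl} -t long "$dev"
    done
}

# Example (device names are placeholders):
#   start_long_tests /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# "smartctl -c /dev/sda" shows the current self-test execution status;
# "smartctl -l selftest /dev/sda" shows the self-test log with results.
```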

Alright: three tests completed without any errors, and three were interrupted with the message “Host reset”. Five of the drives got a new SATA cable; for the remaining one I had to reuse one of the existing cables.

No errors:

WR509FP8
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR509FP8
LU WWN Device Id: 5 000c50 0ed3b50e7
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul  4 16:34:28 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1553) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   044    Pre-fail  Always       -       118994654
  3 Spin_Up_Time            0x0003   094   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       415016972
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13428
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       28
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   052   000    Old_age   Always       -       34 (Min/Max 34/42)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       914
194 Temperature_Celsius     0x0022   034   048   000    Old_age   Always       -       34 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13315 (139 75 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       52277590836
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1892437255932

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     13427         -
# 2  Extended offline    Aborted by host               90%         2         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
WR5089XQ (ada2)
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR5089XQ
LU WWN Device Id: 5 000c50 0ed391d7f
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul  4 16:36:09 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  567) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1578) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   044    Pre-fail  Always       -       106544119
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       395640566
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13383
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       44
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   052   000    Old_age   Always       -       34 (Min/Max 33/42)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       994
194 Temperature_Celsius     0x0022   034   048   000    Old_age   Always       -       34 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13272 (74 129 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       42258692396
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1993717482280

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     13381         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

ubuntu@ubuntu:~$ sudo smartctl -a /dev/sdc
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.0-17-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
WR509760
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR509760
LU WWN Device Id: 5 000c50 0ed3b51f4
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jul  4 16:47:41 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1544) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   078   064   044    Pre-fail  Always       -       60904647
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       16
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       391631627
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13282
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       16
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   053   000    Old_age   Always       -       34 (Min/Max 33/42)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       875
194 Temperature_Celsius     0x0022   034   047   000    Old_age   Always       -       34 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13171 (83 225 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       41829968860
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2943061722698

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     13280         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

Interrupted (Host Reset):

WR506AAA (ada5), new cable
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR506AAA
LU WWN Device Id: 5 000c50 0ece30a22
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul  4 16:37:12 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1512) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       187679547
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       167
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   045    Pre-fail  Always       -       351957015
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13188
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       168
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   046   000    Old_age   Always       -       37 (Min/Max 36/42)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       158
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       772
194 Temperature_Celsius     0x0022   037   054   000    Old_age   Always       -       37 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13079 (8 176 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       41802673948
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       906994513754

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      00%     13164         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
WR5097S0, old cable
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR5097S0
LU WWN Device Id: 5 000c50 0ed3b246b
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul  4 16:37:33 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1523) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   044    Pre-fail  Always       -       126360095
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   045    Pre-fail  Always       -       382002617
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13262
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   048   000    Old_age   Always       -       37 (Min/Max 36/38)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       894
194 Temperature_Celsius     0x0022   037   052   000    Old_age   Always       -       37 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13129 (141 188 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       41823444756
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1128145328197

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      00%     13237         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
WR5088JC, new cable
=== START OF INFORMATION SECTION ===
Device Model:     ST18000NM000J-2TV103
Serial Number:    WR5088JC
LU WWN Device Id: 5 000c50 0ed398f18
Firmware Version: SN02
User Capacity:    18.000.207.937.536 bytes [18,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jul  4 16:37:53 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (  559) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (1560) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   064   044    Pre-fail  Always       -       89997751
  3 Spin_Up_Time            0x0003   093   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   045    Pre-fail  Always       -       466730378
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13537
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       26
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   043   000    Old_age   Always       -       38 (Min/Max 37/39)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1137
194 Temperature_Celsius     0x0022   038   057   000    Old_age   Always       -       38 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13402 (196 34 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       107245523516
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2146654675354

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      00%     13513         -
# 2  Extended offline    Interrupted (host reset)      00%       109         -
# 3  Short offline       Completed without error       00%       106         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
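To read an attribute table like the one above without eyeballing every row, here is a minimal Python sketch that parses the `smartctl -A` table and flags rows worth a second look. The watch-list of IDs (5, 187, 188, 197, 198, 199) is an illustrative assumption, not an official rule. IDs 1 and 7 are deliberately not judged by their raw value: on Seagate drives those raw numbers are commonly described as composite counters, so large values like 89997751 are normal and only the normalized VALUE vs THRESH comparison is meaningful.

```python
import re

# Hypothetical watch-list: attributes whose RAW value is usually expected to stay at 0.
# This is an illustrative assumption, not a vendor-defined threshold set.
WATCH_IDS = {5, 187, 188, 197, 198, 199}

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID  NAME  FLAG  VALUE  WORST  THRESH  TYPE  UPDATED  WHEN_FAILED  RAW_VALUE...
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> parsed fields; lines that are not table rows are ignored."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                # Only the leading integer of RAW_VALUE, e.g. 38 from "38 (Min/Max 37/39)".
                "raw": int(m.group(7)),
            }
    return attrs


def concerns(attrs):
    """Return human-readable warnings; an empty list means no obvious red flags."""
    found = []
    for attr_id, a in sorted(attrs.items()):
        if a["when_failed"] != "-":
            found.append(f"{attr_id} {a['name']}: WHEN_FAILED={a['when_failed']}")
        elif a["thresh"] > 0 and a["value"] <= a["thresh"]:
            found.append(f"{attr_id} {a['name']}: normalized {a['value']} <= threshold {a['thresh']}")
        elif attr_id in WATCH_IDS and a["raw"] > 0:
            found.append(f"{attr_id} {a['name']}: raw value {a['raw']}")
    return found
```

Run against the table above, this returns an empty list: no reallocated, pending, or uncorrectable sectors and no CRC errors, which points away from the disk surface itself. Feed it the text output of `smartctl -A /dev/sdX` for each disk (device names under Ubuntu Live are an assumption).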

I’m gonna try the long SMART tests again to make sure they run. Where do I go from here?

Have you run the CPU and RAM tests or just SMART? Are you running SMART Long on all the drives again with no changes to hardware?

I just checked the cables again, so there have been no changes to the hardware. Since I’m running Ubuntu Live, I didn’t want to reboot to run memtest, because I’d have to set up SSH and everything else again.

Two of the SMART tests failed more or less right away; one is still running (90% to go).
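Whether a long test "failed" or was merely interrupted can be read from the number in parentheses on the `Self-test execution status` line of `smartctl -c`/`-a` output. Per the ATA SMART spec as smartctl reports it, the upper 4 bits are the status code and the lower 4 bits are the remaining percentage in steps of 10 (meaningful while a test is running). A minimal Python decoder, as a sketch:

```python
# Status codes (upper nibble) as smartctl describes them.
STATUS_TEXT = {
    0x0: "completed without error (or no self-test has run)",
    0x1: "aborted by the host",
    0x2: "interrupted by the host with a hard or soft reset",
    0x3: "fatal or unknown error, unable to complete",
    0x4: "completed; an unknown test element failed",
    0x5: "completed; electrical element failed",
    0x6: "completed; servo/seek element failed",
    0x7: "completed; read element failed",
    0x8: "completed; handling damage suspected",
    0xF: "in progress",
}


def decode_self_test_status(byte):
    """Split the status byte into (status code, description)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("status must be a single byte (0-255)")
    status = byte >> 4                   # upper nibble: what happened
    remaining_pct = (byte & 0x0F) * 10   # lower nibble: tens of percent left
    text = STATUS_TEXT.get(status, "reserved or vendor-specific")
    if status == 0xF:
        return status, f"{text}, {remaining_pct}% remaining"
    return status, text
```

For example, the `(  41)` shown earlier is 0x29, status 2: the test was interrupted by a host reset, not failed by the drive. A test that is still running with 90% left would show 0xF9 (249).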

Shall I use stress and s-tui for the CPU stress test?

Edit: I ran s-tui for 45 minutes as a stress test. The max temperature went up to 90°C. When I used s-tui -c, the CPU temperature climbed to 100°C and the motherboard started beeping. I’m not sure why, since the -c option should only save the results to a CSV file.

I’m running the memtest now.


Alright, so something really messed up is going on. The CPU stress test went fine for 45 minutes at a max temperature of 90°C. I guess that’s fine for an old G3220 with the stock fan.

When I changed the command to write the test results (s-tui -c), the temperature rose to 100°C and the motherboard started beeping. After some further investigation, I noticed that the CPU cooler was loose, even though I haven’t touched the NAS.

I now ordered 6 brand new SATA cables and some thermal paste. I will do a proper hardware clean up to make sure everything is hooked up fine. I will then do the CPU, memtest and long SMART tests again.

I’ll leave everything off for now. I might need to upgrade my hardware at some point, I guess. Here is the current setup:

  • Supermicro X10-SLM-F
  • Intel Pentium G3220
  • 32GB of Crucial ECC RAM (4x 8 GB)
  • Enermax Triathlon Eco 350 Watt PSU (ETL350ATW-M)
  • 6x Seagate Exos X18 18 TB
  • Kingston 120 GB SSD SA400S37 (via SATA to USB adapter)

Only the HDDs and the SSD are new; everything else is 10+ years old. The SATA-to-USB adapter has also been running fine for a few years, but I swapped it out just in case, since I have a few spares.

Feel free to let me know your thoughts! I will report back when I have the new cables.


I think you’re on the right track going through this process.


Status update: The CPU fan was completely covered in dust, which explains the high temperatures. I completed a 45-minute CPU stress test in which the temperature didn’t exceed 55°C. I’m running memtest86+ now. The first pass completed without any errors, and I’m leaving it on overnight to run a few more passes. Then I’ll run a long SMART test on all disks with the new SATA cables.
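Before starting the long SMART tests (`smartctl -t long /dev/sdX`), it helps to know how long to leave the machine alone. The capabilities section posted earlier gives the drive’s own estimate: 1560 minutes, i.e. 1560 / 60 = 26 hours per test. The test runs inside each drive’s firmware, so all six disks can be tested concurrently, but any reboot or host reset interrupts it (the `Interrupted (host reset)` entries in the self-test log). A minimal sketch, assuming you have the text output of `smartctl -c` for a disk; the drive’s figure is an estimate and the real duration can differ:

```python
import re
from datetime import datetime, timedelta

# Matches the two-line entry printed by `smartctl -c`, e.g.
#   Extended self-test routine
#   recommended polling time:        (1560) minutes.
POLL = re.compile(
    r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def extended_test_minutes(capabilities_text):
    """Return the drive's own estimate for an extended (long) self-test, in minutes."""
    m = POLL.search(capabilities_text)
    if m is None:
        raise ValueError("no extended self-test polling time in this output")
    return int(m.group(1))


def expected_finish(start, minutes):
    """Earliest time it is worth checking `smartctl -l selftest` for a result."""
    return start + timedelta(minutes=minutes)
```

Started at 20:00, a 1560-minute test would not be worth checking until about 22:00 the next day.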

CPU stress test with s-tui

RAM stress test with memtest86+

Any thoughts so far?

Edit:

RAM overview