New RAIDZ2 array: failures move between disks. Please help

The Problem

I just replaced all of the drives in my TrueNAS server, going from 6x4TB drives to 8x14TB drives in RAIDZ2. The array was created just fine. I started moving my backup data back onto the newly created array and went to bed. I woke up to multiple alerts about the array being degraded. What's more concerning is that two drives are mentioned across the emails, but only one shows as faulted. I know you can get a bad drive now and then, but since I am not a ZFS expert I am looking for a little assistance before I trash the array and test the two drives in question.

From my email alerts, here are some of the errors I am receiving:

  • Device: /dev/sde [SAT], not capable of SMART self-check.
  • Device: /dev/sde [SAT], Read SMART Self-Test Log Failed.
  • Device: /dev/sde [SAT], Read SMART Error Log Failed.
  • Device: /dev/sde [SAT], failed to read SMART Attribute Data.
  • Device: /dev/sdg [SAT], failed to read SMART Attribute Data.
  • Device: /dev/sdg [SAT], Read SMART Self-Test Log Failed.

And then there is:

Pool spinners state is ONLINE: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:

Disk WDC_WUH721414ALE6L4 9MGBVN0T is FAULTED

Disk WDC_WUH721414ALE6L4 9MGBVN0T maps to /dev/sdg

What have I tried

It’s interesting that two drives are called out in the messages above, but only /dev/sdg is marked as faulted. Since most of the errors appear to be SMART-related, I ran smartctl -x against both drives, and to my surprise they came back with data and, from what I can see, everything looks fine. I’ll place the output of those commands below.
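For reference, this is roughly what I ran (the device names are just what these two drives map to on my system, so adjust as needed):

  # dump full SMART attributes, logs, and device statistics for both suspect drives
  sudo smartctl -x /dev/sde
  sudo smartctl -x /dev/sdg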

I decided that maybe it was just a fluke and attempted to clear the error by running zpool clear spinners, then ran a couple of zpool status commands in rapid succession. I noticed that for a moment the array started resilvering, but then the /dev/sde drive temporarily went faulted. I wasn’t taking notes, but I believe /dev/sde then went back to online and the fault moved back to /dev/sdg. I really wish I had taken better notes during this step, and I suppose I could try it again, since nothing currently saved on the array is important at this point.
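The sequence was roughly this (a sketch from memory, not a captured session):

  # clear the error counters and faulted state on the pool
  sudo zpool clear spinners
  # then re-check the pool state a few times in a row to watch the resilver
  sudo zpool status spinners
  sudo zpool status spinners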

I’ve also re-seated the drives and checked all the cable connections, but I am not exactly sure where to go from here.

zpool status output

truenas_admin@nas[~]$ sudo zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:13 with 0 errors on Sat Nov  9 03:45:15 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdi3    ONLINE       0     0     0
            sdj3    ONLINE       0     0     0

errors: No known data errors

  pool: spinners
 state: ONLINE
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 15.4M in 00:00:01 with 0 errors on Thu Nov 14 14:33:17 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        spinners                                  ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            34e72a4d-7818-4b0f-b451-fe08f3a0b889  ONLINE       0     0     0
            45c26a5f-7929-462a-ab29-a5db62aa2ac1  ONLINE       0     0     0
            5974936a-65f8-4c4e-862b-565a05caf1b7  ONLINE       0     0     0
            6c125305-3b0d-4f4c-b9d7-fece449bb321  ONLINE       0     0     0
            ae7d6fbf-d8d7-4cf0-b57c-de6548a21547  FAULTED      7   3266     0  too many errors
            7793656b-2429-4c67-b2a3-6bdca892f809  ONLINE       0     0     0
            b46f5b6f-6f82-415d-a101-f88dddd4109f  ONLINE       0     0     0
            ba79d8ec-8bec-446b-8909-3e9230e75725  ONLINE       0     0     0

sde SMART info

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar DC HC530
Device Model:     WDC  WUH721414ALE6L4
Serial Number:    9LGETRAG
LU WWN Device Id: 5 000cca 28fc6459f
Firmware Version: LDGNW400
User Capacity:    14,000,519,643,136 bytes [14.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Nov 14 15:37:51 2024 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  101) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1457) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   001    -    0
  2 Throughput_Performance  P-S---   100   100   054    -    0
  3 Spin_Up_Time            POS---   096   096   001    -    163
  4 Start_Stop_Count        -O--C-   100   100   000    -    2
  5 Reallocated_Sector_Ct   PO--CK   100   100   001    -    0
  7 Seek_Error_Rate         PO-R--   100   100   001    -    0
  8 Seek_Time_Performance   P-S---   100   100   020    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    16
 10 Spin_Retry_Count        PO--C-   100   100   001    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    2
 22 Helium_Level            PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    3
193 Load_Cycle_Count        -O--C-   100   100   000    -    3
194 Temperature_Celsius     -O----   052   052   000    -    41 (Min/Max 19/44)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   5501  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x2f       GPL     R/O      1  Set Sector Configuration
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    41 Celsius
Power Cycle Min/Max Temperature:     36/41 Celsius
Lifetime    Min/Max Temperature:     19/44 Celsius
Under/Over Temperature Limit Count:   0/0
SMART Status:                        0xc24f (PASSED)
Minimum supported ERC Time Limit:    65 (6.5 seconds)

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (50)

Index    Estimated Time   Temperature Celsius
  51    2024-11-14 13:30    43  ************************
 ...    ..( 57 skipped).    ..  ************************
 109    2024-11-14 14:28    43  ************************
 110    2024-11-14 14:29    36  *****************
 ...    ..(  3 skipped).    ..  *****************
 114    2024-11-14 14:33    36  *****************
 115    2024-11-14 14:34    37  ******************
 ...    ..(  8 skipped).    ..  ******************
 124    2024-11-14 14:43    37  ******************
 125    2024-11-14 14:44    38  *******************
 126    2024-11-14 14:45    37  ******************
 127    2024-11-14 14:46    38  *******************
 ...    ..(  9 skipped).    ..  *******************
   9    2024-11-14 14:56    38  *******************
  10    2024-11-14 14:57    39  ********************
 ...    ..( 13 skipped).    ..  ********************
  24    2024-11-14 15:11    39  ********************
  25    2024-11-14 15:12    40  *********************
  26    2024-11-14 15:13    40  *********************
  27    2024-11-14 15:14    39  ********************
  28    2024-11-14 15:15    40  *********************
 ...    ..( 17 skipped).    ..  *********************
  46    2024-11-14 15:33    40  *********************
  47    2024-11-14 15:34    41  **********************
  48    2024-11-14 15:35    41  **********************
  49    2024-11-14 15:36    41  **********************
  50    2024-11-14 15:37    43  ************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               2  ---  Lifetime Power-On Resets
0x01  0x010  4              16  ---  Power-on Hours
0x01  0x018  6       293133704  ---  Logical Sectors Written
0x01  0x020  6          679535  ---  Number of Write Commands
0x01  0x028  6        77139572  ---  Logical Sectors Read
0x01  0x030  6         1597899  ---  Number of Read Commands
0x01  0x038  6        57665050  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4              15  ---  Spindle Motor Power-on Hours
0x03  0x010  4              15  ---  Head Flying Hours
0x03  0x018  4               3  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               1  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x04  0x018  4               0  ---  Physical Element Status Changed
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              41  ---  Current Temperature
0x05  0x010  1              42  N--  Average Short Term Temperature
0x05  0x018  1               -  N--  Average Long Term Temperature
0x05  0x020  1              44  ---  Highest Temperature
0x05  0x028  1              19  ---  Lowest Temperature
0x05  0x030  1              43  N--  Highest Average Short Term Temperature
0x05  0x038  1              23  N--  Lowest Average Short Term Temperature
0x05  0x040  1               -  N--  Highest Average Long Term Temperature
0x05  0x048  1               -  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4               4  ---  Number of Hardware Resets
0x06  0x010  4               2  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            1  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

sdg SMART info

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar DC HC530
Device Model:     WDC  WUH721414ALE6L4
Serial Number:    9MGBVN0T
LU WWN Device Id: 5 000cca 290c5641b
Firmware Version: LDGNW400
User Capacity:    14,000,519,643,136 bytes [14.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Nov 14 15:39:04 2024 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  101) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1400) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   001    -    0
  2 Throughput_Performance  P-S---   100   100   054    -    0
  3 Spin_Up_Time            POS---   096   096   001    -    159
  4 Start_Stop_Count        -O--C-   100   100   000    -    8
  5 Reallocated_Sector_Ct   PO--CK   100   100   001    -    0
  7 Seek_Error_Rate         PO-R--   100   100   001    -    0
  8 Seek_Time_Performance   P-S---   100   100   020    -    0
  9 Power_On_Hours          -O--C-   100   100   000    -    16
 10 Spin_Retry_Count        PO--C-   100   100   001    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    8
 22 Helium_Level            PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   100   100   000    -    35
193 Load_Cycle_Count        -O--C-   100   100   000    -    35
194 Temperature_Celsius     -O----   042   042   000    -    50 (Min/Max 20/56)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O   5501  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x2f       GPL     R/O      1  Set Sector Configuration
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         9         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    50 Celsius
Power Cycle Min/Max Temperature:     40/50 Celsius
Lifetime    Min/Max Temperature:     20/56 Celsius
Under/Over Temperature Limit Count:   0/0
SMART Status:                        0xc24f (PASSED)
Minimum supported ERC Time Limit:    65 (6.5 seconds)

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (51)

Index    Estimated Time   Temperature Celsius
  52    2024-11-14 13:32    53  **********************************
 ...    ..(  7 skipped).    ..  **********************************
  60    2024-11-14 13:40    53  **********************************
  61    2024-11-14 13:41    52  *********************************
 ...    ..(  9 skipped).    ..  *********************************
  71    2024-11-14 13:51    52  *********************************
  72    2024-11-14 13:52    51  ********************************
 ...    ..(  6 skipped).    ..  ********************************
  79    2024-11-14 13:59    51  ********************************
  80    2024-11-14 14:00    52  *********************************
 ...    ..(  7 skipped).    ..  *********************************
  88    2024-11-14 14:08    52  *********************************
  89    2024-11-14 14:09    53  **********************************
  90    2024-11-14 14:10    52  *********************************
 ...    ..(  5 skipped).    ..  *********************************
  96    2024-11-14 14:16    52  *********************************
  97    2024-11-14 14:17    51  ********************************
 ...    ..( 10 skipped).    ..  ********************************
 108    2024-11-14 14:28    51  ********************************
 109    2024-11-14 14:29    40  *********************
 ...    ..(  2 skipped).    ..  *********************
 112    2024-11-14 14:32    40  *********************
 113    2024-11-14 14:33    41  **********************
 ...    ..(  2 skipped).    ..  **********************
 116    2024-11-14 14:36    41  **********************
 117    2024-11-14 14:37    42  ***********************
 ...    ..(  2 skipped).    ..  ***********************
 120    2024-11-14 14:40    42  ***********************
 121    2024-11-14 14:41    43  ************************
 ...    ..( 13 skipped).    ..  ************************
   7    2024-11-14 14:55    43  ************************
   8    2024-11-14 14:56    44  *************************
 ...    ..(  6 skipped).    ..  *************************
  15    2024-11-14 15:03    44  *************************
  16    2024-11-14 15:04    45  **************************
 ...    ..(  2 skipped).    ..  **************************
  19    2024-11-14 15:07    45  **************************
  20    2024-11-14 15:08    46  ***************************
 ...    ..(  3 skipped).    ..  ***************************
  24    2024-11-14 15:12    46  ***************************
  25    2024-11-14 15:13    47  ****************************
 ...    ..(  4 skipped).    ..  ****************************
  30    2024-11-14 15:18    47  ****************************
  31    2024-11-14 15:19    48  *****************************
 ...    ..(  2 skipped).    ..  *****************************
  34    2024-11-14 15:22    48  *****************************
  35    2024-11-14 15:23    47  ****************************
  36    2024-11-14 15:24    47  ****************************
  37    2024-11-14 15:25    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
  41    2024-11-14 15:29    48  *****************************
  42    2024-11-14 15:30    49  ******************************
 ...    ..(  6 skipped).    ..  ******************************
  49    2024-11-14 15:37    49  ******************************
  50    2024-11-14 15:38    50  *******************************
  51    2024-11-14 15:39    52  *********************************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               8  ---  Lifetime Power-On Resets
0x01  0x010  4              16  ---  Power-on Hours
0x01  0x018  6       283443368  ---  Logical Sectors Written
0x01  0x020  6          470497  ---  Number of Write Commands
0x01  0x028  6          110628  ---  Logical Sectors Read
0x01  0x030  6            1960  ---  Number of Read Commands
0x01  0x038  6        57731900  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4               8  ---  Spindle Motor Power-on Hours
0x03  0x010  4               8  ---  Head Flying Hours
0x03  0x018  4              35  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               1  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x04  0x018  4               0  ---  Physical Element Status Changed
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              50  ---  Current Temperature
0x05  0x010  1              51  N--  Average Short Term Temperature
0x05  0x018  1               -  N--  Average Long Term Temperature
0x05  0x020  1              56  ---  Highest Temperature
0x05  0x028  1              20  ---  Lowest Temperature
0x05  0x030  1              52  N--  Highest Average Short Term Temperature
0x05  0x038  1              23  N--  Lowest Average Short Term Temperature
0x05  0x040  1               -  N--  Highest Average Long Term Temperature
0x05  0x048  1               -  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             150  ---  Number of Hardware Resets
0x06  0x010  4               1  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           10  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           11  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

Recent ZFS Events

truenas_admin@nas[~]$ sudo zpool events -v
TIME                           CLASS
Nov 14 2024 14:31:38.341002753 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "boot-pool"
        pool_guid = 0x2cbeedcd8520118b
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "(none)"
        history_internal_str = "pool version 5000; software version zfs-2.1.99-2769-g3b73b0eb6; uts (none) 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Fri Nov  8 18:37:36 UTC 2024 x86_64"
        history_internal_name = "open"
        history_txg = 0xa323f
        history_time = 0x67365e2a
        time = 0x67365e2a 0x14534a01
        eid = 0x1

Nov 14 2024 14:31:38.357002753 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "boot-pool"
        pool_guid = 0x2cbeedcd8520118b
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e2a 0x15476e01
        eid = 0x2

Nov 14 2024 14:31:38.357002753 sysevent.fs.zfs.pool_import
        version = 0x0
        class = "sysevent.fs.zfs.pool_import"
        pool = "boot-pool"
        pool_guid = 0x2cbeedcd8520118b
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e2a 0x15476e01
        eid = 0x3

Nov 14 2024 14:31:38.357002753 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "boot-pool"
        pool_guid = 0x2cbeedcd8520118b
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "(none)"
        history_internal_str = "pool version 5000; software version zfs-2.1.99-2769-g3b73b0eb6; uts (none) 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Fri Nov  8 18:37:36 UTC 2024 x86_64"
        history_internal_name = "import"
        history_txg = 0xa3241
        history_time = 0x67365e2a
        time = 0x67365e2a 0x15476e01
        eid = 0x4

Nov 14 2024 14:31:38.377002754 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "boot-pool"
        pool_guid = 0x2cbeedcd8520118b
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e2a 0x16789b02
        eid = 0x5

Nov 14 2024 14:32:04.077003110 ereport.fs.zfs.checksum
        class = "ereport.fs.zfs.checksum"
        ena = 0xe68d4933401c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x2
        pool_failmode = "wait"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0xe68cf2005
        vdev_delta_ts = 0x3dfa9
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x0
        zio_flags = 0x100080 [CANFAIL DONT_PROPAGATE]
        zio_stage = 0x400000 [VDEV_IO_DONE]
        zio_pipeline = 0x3e00000 [VDEV_IO_START VDEV_IO_DONE VDEV_IO_ASSESS CHECKSUM_VERIFY DONE]
        zio_delay = 0x0
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0x12b407e000
        zio_size = 0x1000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0x1
        zio_blkid = 0x0
        bad_ranges = 0x0 0x910
        bad_ranges_min_gap = 0x8
        bad_range_sets = 0x0
        bad_range_clears = 0x2112
        time = 0x67365e44 0x496f966
        eid = 0x6

Nov 14 2024 14:32:04.097003110 ereport.fs.zfs.checksum
        class = "ereport.fs.zfs.checksum"
        ena = 0xe68d4933401c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x2
        pool_failmode = "wait"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0xe6a0c22e4
        vdev_delta_ts = 0x34aa4
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x0
        zio_flags = 0x100080 [CANFAIL DONT_PROPAGATE]
        zio_stage = 0x400000 [VDEV_IO_DONE]
        zio_pipeline = 0x3e00000 [VDEV_IO_START VDEV_IO_DONE VDEV_IO_ASSESS CHECKSUM_VERIFY DONE]
        zio_delay = 0x0
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0x12b4082000
        zio_size = 0x1000
        zio_objset = 0x0
        zio_object = 0x0
        zio_level = 0x0
        zio_blkid = 0x2
        bad_ranges = 0x0 0xa88
        bad_ranges_min_gap = 0x8
        bad_range_sets = 0x0
        bad_range_clears = 0x19c6
        time = 0x67365e44 0x5c82666
        eid = 0x7

Nov 14 2024 14:32:05.425003128 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "pool version 5000; software version zfs-2.1.99-2769-g3b73b0eb6; uts nas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Fri Nov  8 18:37:36 UTC 2024 x86_64"
        history_internal_name = "open"
        history_txg = 0x28c0
        history_time = 0x67365e45
        time = 0x67365e45 0x19550878
        eid = 0x8

Nov 14 2024 14:32:05.661003132 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e45 0x27661b7c
        eid = 0x9

Nov 14 2024 14:32:05.661003132 sysevent.fs.zfs.pool_import
        version = 0x0
        class = "sysevent.fs.zfs.pool_import"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e45 0x27661b7c
        eid = 0xa

Nov 14 2024 14:32:05.661003132 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "pool version 5000; software version zfs-2.1.99-2769-g3b73b0eb6; uts nas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Fri Nov  8 18:37:36 UTC 2024 x86_64"
        history_internal_name = "import"
        history_txg = 0x28c2
        history_time = 0x67365e45
        time = 0x67365e45 0x27661b7c
        eid = 0xb

Nov 14 2024 14:32:05.793003134 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e45 0x2f44447e
        eid = 0xc

Nov 14 2024 14:32:06.237003140 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_dsname = "spinners"
        history_internal_str = "aclinherit=0"
        history_internal_name = "set"
        history_dsid = 0x36
        history_txg = 0x28c4
        history_time = 0x67365e46
        time = 0x67365e46 0xe206184
        eid = 0xd

Nov 14 2024 14:32:06.237003140 sysevent.fs.zfs.resilver_start
        version = 0x0
        class = "sysevent.fs.zfs.resilver_start"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        resilver_type = "healing"
        time = 0x67365e46 0xe206184
        eid = 0xe

Nov 14 2024 14:32:06.237003140 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "func=2 mintxg=6819 maxtxg=10431"
        history_internal_name = "scan setup"
        history_txg = 0x28c4
        history_time = 0x67365e46
        time = 0x67365e46 0xe206184
        eid = 0xf

Nov 14 2024 14:32:07.177003153 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "errors=0"
        history_internal_name = "scan done"
        history_txg = 0x28c6
        history_time = 0x67365e47
        time = 0x67365e47 0xa8cda91
        eid = 0x10

Nov 14 2024 14:32:07.177003153 sysevent.fs.zfs.resilver_finish
        version = 0x0
        class = "sysevent.fs.zfs.resilver_finish"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        resilver_type = "healing"
        time = 0x67365e47 0xa8cda91
        eid = 0x11

Nov 14 2024 14:32:51.867718631 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x19896a988203801
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x19896a7ac4
        vdev_delta_ts = 0x1eb4a
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0x1e297
        zio_timestamp = 0x1989688f7a
        zio_delta = 0x1eb4a
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0x42000
        zio_size = 0x2000
        time = 0x67365e73 0x33b855e7
        eid = 0x12

Nov 14 2024 14:32:51.871718582 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x19896cc93603801
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x19896cbaf3
        vdev_delta_ts = 0x241ed
        vdev_read_errors = 0x1
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0x23c6c
        zio_timestamp = 0x19896a7906
        zio_delta = 0x241ed
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0xcbbbfd82000
        zio_size = 0x2000
        time = 0x67365e73 0x33f55eb6
        eid = 0x13

Nov 14 2024 14:32:51.871718582 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x19896e515303801
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x19896e4a87
        vdev_delta_ts = 0x17ef7
        vdev_read_errors = 0x2
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0x17ac8
        zio_timestamp = 0x19896ccb90
        zio_delta = 0x17ef7
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0xcbbbfdc2000
        zio_size = 0x2000
        time = 0x67365e73 0x33f55eb6
        eid = 0x14

Nov 14 2024 14:32:51.871718582 ereport.fs.zfs.probe_failure
        class = "ereport.fs.zfs.probe_failure"
        ena = 0x19896efc5103801
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x19896e4a87
        vdev_delta_ts = 0x17ef7
        vdev_read_errors = 0x3
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x1
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        prev_state = 0x0
        time = 0x67365e73 0x33f55eb6
        eid = 0x15

Nov 14 2024 14:32:53.716756097 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "REMOVED" (0x3)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_laststate = "ONLINE" (0x7)
        time = 0x67365e75 0x2ab8d481
        eid = 0x16

Nov 14 2024 14:32:53.716756097 resource.fs.zfs.removed
        version = 0x0
        class = "resource.fs.zfs.removed"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "REMOVED" (0x3)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        time = 0x67365e75 0x2ab8d481
        eid = 0x17

Nov 14 2024 14:32:53.716756097 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "FAULTED" (0x5)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_laststate = "REMOVED" (0x3)
        time = 0x67365e75 0x2ab8d481
        eid = 0x18

Nov 14 2024 14:32:54.060274085 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e76 0x397b5a5
        eid = 0x19

Nov 14 2024 14:32:54.085436563 ereport.fs.zfs.probe_failure
        class = "ereport.fs.zfs.probe_failure"
        ena = 0x1a05f906a602c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x1a05f88596
        vdev_delta_ts = 0xad8e
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        prev_state = 0x0
        time = 0x67365e76 0x517a893
        eid = 0x1a

Nov 14 2024 14:32:54.278348898 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "REMOVED" (0x3)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_laststate = "FAULTED" (0x5)
        time = 0x67365e76 0x10974462
        eid = 0x1b

Nov 14 2024 14:32:54.278348898 resource.fs.zfs.removed
        version = 0x0
        class = "resource.fs.zfs.removed"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "REMOVED" (0x3)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        time = 0x67365e76 0x10974462
        eid = 0x1c

Nov 14 2024 14:32:54.643204836 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e76 0x265686e4
        eid = 0x1d

Nov 14 2024 14:33:11.082156951 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "ONLINE" (0x7)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_laststate = "REMOVED" (0x3)
        time = 0x67365e87 0x4e59d97
        eid = 0x1e

Nov 14 2024 14:33:11.102156702 sysevent.fs.zfs.vdev_online
        version = 0x0
        class = "sysevent.fs.zfs.vdev_online"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "ONLINE" (0x7)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        time = 0x67365e87 0x616c99e
        eid = 0x1f

Nov 14 2024 14:33:11.710149114 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e87 0x2a5403fa
        eid = 0x20

Nov 14 2024 14:33:11.710149114 sysevent.fs.zfs.config_sync
        version = 0x0
        class = "sysevent.fs.zfs.config_sync"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        time = 0x67365e87 0x2a5403fa
        eid = 0x21

Nov 14 2024 14:33:16.882084572 sysevent.fs.zfs.resilver_start
        version = 0x0
        class = "sysevent.fs.zfs.resilver_start"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        resilver_type = "healing"
        time = 0x67365e8c 0x34938adc
        eid = 0x22

Nov 14 2024 14:33:16.882084572 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "func=2 mintxg=10446 maxtxg=10458"
        history_internal_name = "scan setup"
        history_txg = 0x28de
        history_time = 0x67365e8c
        time = 0x67365e8c 0x34938adc
        eid = 0x23

Nov 14 2024 14:33:17.638075138 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "nas"
        history_internal_str = "errors=0"
        history_internal_name = "scan done"
        history_txg = 0x28e0
        history_time = 0x67365e8d
        time = 0x67365e8d 0x26084102
        eid = 0x24

Nov 14 2024 14:33:17.638075138 sysevent.fs.zfs.resilver_finish
        version = 0x0
        class = "sysevent.fs.zfs.resilver_finish"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        resilver_type = "healing"
        time = 0x67365e8d 0x26084102
        eid = 0x25

Nov 14 2024 14:33:25.537976552 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x21565d0edf02c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x21565cd339
        vdev_delta_ts = 0xef2d2b0
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0xef2c874
        zio_timestamp = 0x21476a0089
        zio_delta = 0xef2d2b0
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0x42000
        zio_size = 0x2000
        time = 0x67365e95 0x2010dee8
        eid = 0x26

Nov 14 2024 14:33:25.553976353 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x21574afbe002c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x21574ad64d
        vdev_delta_ts = 0xfe0340c
        vdev_read_errors = 0x1
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0xfe02fc4
        zio_timestamp = 0x21476aa241
        zio_delta = 0xfe0340c
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0xcbbbfd82000
        zio_size = 0x2000
        time = 0x67365e95 0x21050221
        eid = 0x27

Nov 14 2024 14:33:25.553976353 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x21574bcf1b02c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x21574bc625
        vdev_delta_ts = 0xfe0f48c
        vdev_read_errors = 0x2
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x5
        zio_flags = 0xb00c1 [DONT_AGGREGATE PHYSICAL CANFAIL PROBE TRYHARD DONT_QUEUE]
        zio_stage = 0x2000000 [DONE]
        zio_pipeline = 0x2100000 [READY DONE]
        zio_delay = 0xfe0f203
        zio_timestamp = 0x21476ad199
        zio_delta = 0xfe0f48c
        zio_priority = 0x0 [SYNC_READ]
        zio_offset = 0xcbbbfdc2000
        zio_size = 0x2000
        time = 0x67365e95 0x21050221
        eid = 0x28

Nov 14 2024 14:33:25.553976353 ereport.fs.zfs.probe_failure
        class = "ereport.fs.zfs.probe_failure"
        ena = 0x21574d0a1d02c01
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xa112750aa9ef7a6f
                vdev = 0xbbb37a2b8f7b6b0d
        (end detector)
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "continue"
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x21574bc625
        vdev_delta_ts = 0xfe0f48c
        vdev_read_errors = 0x3
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0xe8c42350361a95d9
        parent_type = "raidz"
        vdev_spare_paths =
        vdev_spare_guids =
        prev_state = 0x0
        time = 0x67365e95 0x21050221
        eid = 0x29

Nov 14 2024 14:33:25.925971711 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "spinners"
        pool_guid = 0xa112750aa9ef7a6f
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0xbbb37a2b8f7b6b0d
        vdev_state = "FAULTED" (0x5)
        vdev_path = "/dev/disk/by-partuuid/ae7d6fbf-d8d7-4cf0-b57c-de6548a21547"
        vdev_laststate = "ONLINE" (0x7)
        time = 0x67365e95 0x373134ff
        eid = 0x2a

Closing remarks / Hardware Specs
I am at a loss as to where to go next. I believe the server hardware itself (minus the drives) is fine, as everything worked with the old 4TB drives installed. I am hoping somebody with a little more ZFS knowledge than me has some pointers before I destroy the array and test the two drives in question outside of the system.

Hardware Specs

ASRock Rack D1541D4U-2T8R Mobo

Intel Xeon D1541 CPU

128GB DDR4 ECC RDIMM

Nvidia Quadro P400

Silverstone Case Storage CS382

EVGA N1 400W PSU

Again any help would be greatly appreciated.

Thanks,
Kirk

Looking at the motherboard in question, it has 6 SATA data ports and 1 SAS port under an LSI3008 for up to x8 connectivity. I’m guessing that you had your 6x4TB drives connected directly to the motherboard via the SATA data ports.

I’m guessing that you’re using the SAS connection to hook up your 8x14TB drives? If so, did you change it from IR to IT mode in the BIOS? Edit: Forum posts for similar boards with the onboard LSI3008 chip imply that it may have to be flashed into IT mode.

I suspect this might be relevant because the errors on the HDD (to my non-professional eye) imply a connection problem.
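
If it helps, one way to sanity-check the controller mode from the shell is something like the below (assuming Broadcom’s sas3flash utility is available on the system, which it often isn’t out of the box):

    # List the SAS3 controllers the utility can see; the firmware/product ID
    # line should indicate IT rather than IR if it has been crossflashed.
    sudo sas3flash -list

    # The kernel log also prints the mpt3sas firmware version at boot:
    sudo dmesg | grep -i mpt3sas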

The board has two MiniSAS HD ports and I have two fanout cables that allow for 8 drives. Two of the six native SATA ports are used for the boot drives and nothing else. The controller is in IT mode; the previous 4TB drives were also connected to the 3008 using the same fanout cables.

The drives are brand new.

The drive may have failed, and may need replacing.

It’s possible for a drive to basically be bad on arrival, and I assume that no burn-in testing was done… since the drives aren’t even that old.

In the meantime, you should run a SMART extended test on all your drives.

Also, since drive IDs can change at reboot, or if drives are swapped or have a power hiccup, the warnings could be referring to the same drive.

And finally, your drive hit 56°C and is currently at 50°C. Many would regard that as too warm, especially for helium drives (which normally run cooler), so you should do something about your drive cooling.
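
For example, something along these lines from the shell would do it (the device names are placeholders, and you would want to skip the boot mirror; match names to serial numbers first so there is no confusion about which physical drive is which):

    # Map kernel device names to models and serials, since /dev/sdX
    # assignments can change between boots:
    lsblk -d -o NAME,MODEL,SERIAL,SIZE

    # Start an extended (long) self-test on each data drive:
    for d in /dev/sd[a-h]; do
        sudo smartctl -t long "$d"
    done

    # Check the results once the tests finish:
    sudo smartctl -l selftest /dev/sdg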

Thanks for looking out, but the drives are new and were installed 24 hours ago, hence no SMART tests yet. I do have monthly shorts and quarterly fulls scheduled. I’ll run a full test on the two drives in question in the meantime. Also, good to know about the drive IDs; when I receive the error notifications I’ve basically been going off the serial of the drive to make sure there is no confusion there.

Yeah, I noticed and was adjusting my post, sorry.

A short test takes 2 minutes. I would personally be doing SMART testing more often.

Perhaps weekly and monthly.

No worries! It’s good advice for anybody else who may find this in the future.

Thanks for the tips.

Damn - I really thought I was onto something, but you’re already 5 steps ahead. Yeah, I guess testing on the two disks in question is what we’re left with.

Haha, well I appreciate the help! I’ve decided to run a SMART long test tonight, and if there’s nothing obviously wrong then I’ll swap my 4TBs back in and run full tests on at least the two drives in question.

Thanks again for the suggestions!

A 400 W PSU with 8 drives? That’s a bit on the short side.

…like getting rid of the Silverstone case. (Says the Silverstone-hater.)

While it may be on the short side, it is adequate for the system. Let's assume a drive uses 30W at max; that's 240W in drives running at full tilt, the CPU is 45W, the Quadro card is 30W, and let's say 20W for the mobo. That's 335W, or 83.75% of the PSU's rating at 100% utilization. My UPS reports that the system pulls around 80-90W during normal operation (this was with the 6x4TB drives). So while I probably should add "get a bigger PSU" to the upgrade list, I believe it is acceptable and within spec. Now if someone were to say "hey, maybe just try a larger PSU and see if the array breaks," sure, I'm open to trying anything within reason.

I am not trying to be rude, as I am genuinely looking for help, and I don’t think a person asking for help should have an attitude about suggestions made in good faith. I really don’t feel the case contributes to my problem at all. I really like the case and it fits the bill for me perfectly, but the drive temps are definitely concerning. I hadn’t really been tracking drive temps historically, so I don’t know what the 6x4TBs looked like temp-wise, but I was running stress tests last night while watching temps and never saw anything over 45°C. All that to say, I like the Silverstone case but I’m not married to it. If you (or anyone) have a case suggestion that supports 8 removable drives, an mATX mobo, and at least half-height PCIe, I’d love to hear it.

For anybody following along at home: last night I started some testing. First I ran a SMART long test on the 9MGBVN0T drive, and while smartctl said it was running (I tracked it for a few hours until it said it was no longer running a test), the results were never logged for the long test, only a previous short test. I’ve never seen that happen before, so I thought it was curious.
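
For reference, this is roughly what I was running against that drive (which mapped to /dev/sdg at the time):

    # Kick off the extended self-test
    sudo smartctl -t long /dev/sdg

    # Poll the drive; "Self-test execution status" shows the remaining percentage
    sudo smartctl -c /dev/sdg | grep -A 2 'Self-test execution status'

    # When it finishes, a new "Extended offline" entry should appear here;
    # in my case only the earlier short test was ever listed:
    sudo smartctl -l selftest /dev/sdg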

Next I destroyed the zpool and started running badblocks on the 9MGBVN0T drive. Currently it’s at 62% complete with 0/0/0 errors, which is also curious to me.
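
For anyone repeating this at home: badblocks in write mode is destructive, so only point it at a drive with nothing on it. Roughly what I am running (the 4K block size is my own choice; it also keeps the block count within badblocks’ limits on a 14TB drive):

    # Destructive four-pass write/read-back test; -w writes patterns,
    # -s shows progress, -v is verbose. This wipes the drive.
    sudo badblocks -b 4096 -wsv /dev/sdg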

While that is running, I created a new temporary RAIDZ1 pool with the remaining 7 drives. I wrote about 10TB of random data to a file, using openssl seeded from /dev/urandom to generate it. After the 10TB file was created I initiated a scrub to add a little more stress, and the array is still in an online and healthy state.
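
In case it’s useful, this is roughly how I generated the data (the pool name and path here are illustrative; the openssl trick just turns /dev/zero into fast, incompressible pseudo-random data, keyed from /dev/urandom):

    # Stream ~10 TiB of pseudo-random data into a file on the temporary pool.
    openssl enc -aes-256-ctr -nosalt \
        -pass pass:"$(head -c 32 /dev/urandom | base64)" < /dev/zero \
        | head -c 10T > /mnt/temp-pool/random.bin

    # Then add a little more stress with a scrub and watch the pool state:
    sudo zpool scrub temp-pool
    sudo zpool status temp-pool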

So one part of me still thinks it’s just the 9MGBVN0T that is the problem, but on the other hand badblocks has yet to complain about the drive. I am still a little perplexed that the SMART long test never got logged in smartctl, though, so I am planning on running a long test on the other drives to see if they show the same behavior or if maybe that is the smoking gun.

Any other suggestions would be appreciated. A part of me wants to just RMA the 9MGBVN0T drive and be done with it, but I worry the vendor might want some sort of proof that it is bad, and so far badblocks hasn’t given me that proof.
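
If I do end up going the RMA route, the plan is to at least save off the evidence I have now, something like:

    # Full SMART report for the suspect drive, plus any kernel I/O errors,
    # saved off the box in case the vendor asks for details:
    sudo smartctl -x /dev/sdg > smart-9MGBVN0T.txt
    sudo dmesg | grep -iE 'sdg|i/o error' > kernel-errors-sdg.txt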

Quick follow-up on this. I have the drive bay fans connected to the motherboard, and when I checked in IPMI it looks like one fan is running at 2700 RPM and the other at 400, so that is certainly an issue. I’ll either need to plug the fans back into the backplane or see if there is a spot in the BIOS to raise the speed of that fan. Still open to suggestions for other cases, though.
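
For the record, this is how I was reading the fan speeds from the shell (assuming ipmitool is present; sensor names vary by BMC):

    # Dump just the fan sensors from the BMC:
    sudo ipmitool sdr type Fan

    # Or the full sensor list, fans and voltages included:
    sudo ipmitool sensor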

The concern is whether the 5V rail can supply enough wattage. Usually PSUs display total wattage based on the 12V rail max (unless they are really bad & need to add up 12V, 5V, and 3.3V to hit the advertised number), but the 5V & 3.3V rails have their own max capacity hidden away somewhere (hopefully on the side of the PSU or the box it came in; otherwise, have fun finding manufacturer specs).

It could be just fine, but without actually checking how much load there is on 5V & what the PSU can actually produce, it’s impossible to say for certain. There are great gaming PSUs that can supply 1000W but are anemic on 3.3V & 5V because they don’t expect high load on those rails (not really a lot of demand for gamers running 12 HDDs). There are 600W PSUs that aren’t great for crazy overclocked systems that want all that sweet & stable 12V juice but, ironically, are fantastic for HDDs due to beefy 5V and 3.3V rails.

It depends on what the max amperage draw per voltage rail is for your HDDs (per the manufacturer) & what the max wattage per rail is for your PSU.

tl;dr: it should be fine, but remember that 12V, 5V, and 3.3V all have different max wattage out per PSU & that HDDs don’t just run on 12V (mostly). 400W advertised on the PSU != 400W of 12V, 400W of 5V, & 400W of 3.3V.

No fans?
No memory?

:wink:

If you truly believe the power supply to be the issue, can you suggest a size appropriate for my system? 600W? Is there a brand / model that TrueNAS’ers like for a build this size?

Per EVGA, your PSU has a combined 110W max on the 5V & 3.3V rails. Per the spec sheet, the max operating wattage of the SATA models of your HDDs is 6 watts; if we go with the 100% worst-case scenario & it is all on 5V, that’ll be a total of 48 watts from the HDDs alone.

That leaves 62 watts for everything else that requires 5V and 3.3V: fans, PCIe, etc.

But that is the worst case, & realistically the 12V rail is taking on more than 0% of the load in this calculation.

In short, your PSU is likely fine.

Edit: Your motherboard has an IPMI that I’m guessing can give you the voltage readings of the individual rails. If you see voltages going too far below their expected values (i.e. the 12V rail reporting 11V or the 3.3V rail showing 2.7V) while your system is at load, then we can blame the PSU. These things would cause instability. Or you’d trip overcurrent protection on the PSU & the system would randomly shut off. Or, if it is a REALLY bad PSU, things start to release the magic smoke & something breaks/burns.
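
Something like this, filtered down to the voltage sensors and run while the system is under load, would show it (exact sensor names depend on the BMC):

    # Refresh the BMC's voltage readings every few seconds under load; a 12V
    # rail sagging toward 11V or a 3.3V rail near 2.7V would point at the PSU.
    sudo watch -n 5 "ipmitool sensor | grep -iE '12v|5v|3\.3v|vcc'"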

Edit #2: there is also something to consider when the system boots, as HDD spin-up may actually want more wattage than regular operational use. Something that just meets your operational usage may still have issues during spin-up cycles.

Edit #3: cleaned up typos & added some extras.

Thanks for the in-depth information, Fleshmauler. The system has certainly evolved over the years, going from 2 to 4 to 6 to 8 drives, and I’ve always gone by rough napkin math and come to the conclusion that it’s probably fine, but not ideal. I am a sysadmin by day and I certainly wouldn’t run any of my production equipment that close to the line, so ultimately a new PSU is on the list of future upgrades even if it’s not contributing to the issue at hand.

Unfortunately, the IPMI in this system is super basic and doesn’t show the load of the individual rails, just the current voltages.

Again, thanks for the deep dive, truly useful information.

Fair enough. My own experience with Silverstone cases has been that they have WAY too many screws, are fiddly to work with, and invariably have some ill-designed details that make them less useful and less pleasant to work with than they should be; if only the designer had bothered to actually build a computer in his own design. And, specifically with respect to NAS cases, the DS-381 is a drive-cooking abomination.
I think the CS-381 uses the same cage as the DS-381. Your CS-382 may have a different cage, which can only be good. But still, 45°C is not great.

Genuine 2U rackmount chassis from Supermicro, Dell, HP, Lenovo, Mitac, etc. should fit the bill, but you won’t like to HEAR them, literally.

In consumer-styled cases, which may allow for consumer-style quiet rather than server-style jet engines, …

  • Fractal Design Node 804 has decent cooling, is not so quiet (open top: HDD noise goes straight out), but does not have removable drives.
  • Nanoxia Deep Silence 8 Pro provides better access to drives (tool-less trays inside the case, but no backplane), though still not removable, with decent cooling (not tested in the most stressful situations) and quietness.

I don’t have a perfect fit for your requirements. The Inter-Tech NAS8 looked interesting (and matches your requirements on paper), but it seems to have quite a few quirks and design flaws.