How to troubleshoot disk faulted in pool

Hello, I’m running TrueNAS SCALE 22.12.2 on a dedicated server: an Intel N5105-based system with 32 GB of RAM, 2x 256 GB SSDs as the boot pool, and 3x 16 TB Seagate Exos X18 drives as my main storage pool in RAIDZ1. It has been running flawlessly for about 18 months. I checked today and discovered that one disk had faulted, and I’m unsure how to proceed with troubleshooting. Just in case, I am currently backing up everything to an external drive. I also ran a scrub, which found zero errors, apparently because the faulted disk is offline. Could someone point me in the right direction to clear the error and get the pool back to 100%? I have a new matching drive on its way up to me, but it will take a week or so to arrive.

Output of “sudo zpool status -v”:

admin@truenas[~]$ sudo zpool status -v
[sudo] password for admin:
pool: File Dump
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0B in 09:59:29 with 0 errors on Tue Feb 18 04:01:35 2025
config:

NAME                                      STATE     READ WRITE CKSUM
File Dump                                 DEGRADED     0     0     0
  raidz1-0                                DEGRADED     0     0     0
    cc8ed6ca-3a43-4228-89b1-20dc761a0235  ONLINE       0     0     0
    a7850746-5736-403e-a29d-778a1db2ebcf  ONLINE       0     0     0
    ef694f04-9d55-49a2-9ad2-f7d4c419f832  FAULTED     63   300     0  too many errors

errors: No known data errors

pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:08 with 0 errors on Tue Feb 18 03:45:09 2025
config:

NAME           STATE     READ WRITE CKSUM
boot-pool      ONLINE       0     0     0
  mirror-0     ONLINE       0     0     0
    nvme0n1p3  ONLINE       0     0     0
    nvme1n1p3  ONLINE       0     0     0

errors: No known data errors
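To pick out the problem device in zpool status output like the above without eyeballing the columns, here is a minimal Python sketch (the name `devices_with_errors` is hypothetical). It scans each pool's `config:` table and returns every vdev row whose READ, WRITE, or CKSUM counter is non-zero. It assumes the plain-text layout shown in this thread; because the pool name "File Dump" contains a space, the row pattern allows spaces in NAME and keys on the upper-case STATE word instead.

```python
import re

# One vdev row from a "config:" table: NAME (may contain spaces, e.g. "File Dump"),
# STATE, READ, WRITE, CKSUM, then an optional note such as "too many errors".
# Counters can be abbreviated (e.g. "1.2K"), so they are kept as text.
ROW = re.compile(
    r"^\s*(?P<name>\S.*?)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>[\d.]+[KMGT]?)\s+(?P<write>[\d.]+[KMGT]?)\s+(?P<cksum>[\d.]+[KMGT]?)"
    r"(?:\s+(?P<note>.*))?$"
)


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for each vdev row with a non-zero counter."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True   # a pool's vdev table starts here
            continue
        if stripped.startswith(("errors:", "pool:")):
            in_config = False  # that pool's table has ended
            continue
        if not in_config:
            continue
        m = ROW.match(line)
        if m is None:          # header row, blank line, etc.
            continue
        counters = (m["read"], m["write"], m["cksum"])
        if any(c != "0" for c in counters):
            flagged.append((m["name"], m["state"], *counters))
    return flagged
```

Fed the first output above, it should flag only the ef694f04 partition (FAULTED, 63 read and 300 write errors); fed the post-reboot output further down, it would flag the same partition for its single checksum error.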

Welcome to the TrueNAS forums!

That’s pretty old; the documentation does not seem to go that far back. But here is a link to the 23.10 disk replacement procedure:

The tricky part will be identifying the failed disk.

You could take the partition UUID shown in your CLI output and match it to a device, then run smartctl on that device to get its serial number.

/dev/disk/by-partuuid has symlinks mapping each partition UUID to its device node; that seems like an easy way to find the device, and from there the serial.
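The directory keyed by GPT partition UUID is /dev/disk/by-partuuid (by-uuid is keyed by filesystem UUID). Below is a minimal Python sketch of the lookup, assuming the usual udev layout in which each entry is a symlink to the partition node. `partuuid_to_disk` is a hypothetical helper, and its suffix stripping only handles sdXN and nvmeXnYpZ style names.

```python
import os
import re


def partuuid_to_disk(partuuid, by_partuuid_dir="/dev/disk/by-partuuid"):
    """Resolve a PARTUUID symlink to (partition_path, parent_disk_path).

    Assumes by_partuuid_dir/<PARTUUID> is a symlink to the partition node
    (typically a relative link such as ../../sdb2).
    """
    link = os.path.join(by_partuuid_dir, partuuid)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no PARTUUID symlink at {link}")
    partition = os.path.realpath(link)
    # Strip the partition suffix: nvme0n1p3 -> nvme0n1, sdb2 -> sdb.
    m = re.match(r"^(.*nvme\d+n\d+)p\d+$", partition) or re.match(r"^(.*?)\d+$", partition)
    disk = m.group(1) if m else partition
    return partition, disk
```

On this system, `partuuid_to_disk("ef694f04-9d55-49a2-9ad2-f7d4c419f832")` would be expected to return ("/dev/sdb2", "/dev/sdb"), matching the lsblk output posted later in the thread; `sudo smartctl -i /dev/sdb` then shows the serial number.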

Of course, for all I know the UI view of the pool shows the serial right there, which would be easier :sweat_smile:

With one disk faulted, your RAIDZ1 pool is running without redundancy, so you definitely don’t want to pull the wrong drive.

Archived docs section

Use this command; it should work fine:
lsblk -o +PARTUUID,NAME,LABEL | grep -E "[a-z0-9]*-" | awk -F" " '{print $7" -> " $8}'

EDIT: Here is a better way, which also gives you the drive serial number. The serial is what you should go by when removing the failed drive.
lsblk -o +PARTUUID,NAME,LABEL,SERIAL
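The same PARTUUID-to-serial mapping can also be built programmatically. This sketch assumes a util-linux lsblk that supports JSON output (-J); partitions are nested under their disk as "children", and the serial is reported on the disk rather than the partition, which is why the code takes it from the parent. Both function names are hypothetical.

```python
import json
import subprocess


def read_lsblk_json():
    """Run lsblk with JSON output (-J) for just the columns needed here."""
    return subprocess.run(
        ["lsblk", "-J", "-o", "NAME,SERIAL,PARTUUID"],
        check=True, capture_output=True, text=True,
    ).stdout


def partuuid_serial_map(lsblk_json):
    """Map each partition's PARTUUID to (parent disk name, parent disk serial)."""
    mapping = {}
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        # Partitions appear as "children" of their disk; the serial lives on the disk.
        for part in disk.get("children") or []:
            if part.get("partuuid"):
                mapping[part["partuuid"]] = (disk["name"], disk.get("serial"))
    return mapping
```

On the NAS, `partuuid_serial_map(read_lsblk_json())` should map ef694f04-9d55-49a2-9ad2-f7d4c419f832 to ("sdb", "ZR60NVXT"), going by the lsblk and smartctl output later in this thread.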


Forgive me, what will that command do?

It shows you the partition UUID and the serial of each drive, so you can map one to the other.

Then, when you reach the “replace physical drive” step in the instructions, you can power down the server, find the drive with that serial, triple-check that it’s really the one that failed, and replace it with the new drive.

After that, power it on again and continue with the drive replacement steps in the docs.

Also, thoughts and prayers for a 3 x 16TB raidz1. Exos are good drives though, fingers crossed you get through the resilver without a second failure.

I tried rebooting the system (it had been up for 6 months or more). It resilvered, with the following result:

admin@truenas[~]$ sudo zpool status -v
pool: File Dump
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: Message ID: ZFS-8000-9P (OpenZFS documentation)
scan: resilvered 6.64G in 00:03:36 with 0 errors on Tue Feb 18 19:00:35 2025
config:

NAME                                      STATE     READ WRITE CKSUM
File Dump                                 ONLINE       0     0     0
  raidz1-0                                ONLINE       0     0     0
    cc8ed6ca-3a43-4228-89b1-20dc761a0235  ONLINE       0     0     0
    a7850746-5736-403e-a29d-778a1db2ebcf  ONLINE       0     0     0
    ef694f04-9d55-49a2-9ad2-f7d4c419f832  ONLINE       0     0     1

I assume the next step is to run another scrub, correct?

Have you run a SMART long test on that drive?

Not yet; it passed a SMART short test a moment ago.

Please run the following commands and post the output for each command here in a separate </> box:

  • lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
  • sudo smartctl -x /dev/sdX where X is the disk letter for the disk with the failing partuuid on it.

This will tell us which disk is which in the pool and what type of disk it is (so we can check whether it is SMR or not) and give us the SMART data for that disk.

Here are the results:

admin@truenas[~]$ lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
NAME MODEL ROTA PTTYPE TYPE    START           SIZE PARTTYPENAME PARTUUID
sda  ST160    1 gpt    disk          16000900661248              
├─sda1
│             1 gpt    part      128     2147418624 Linux swap   04751ea2-b116-41f3-89af-00de2367dc23
└─sda2
              1 gpt    part  4194432 15998753095168 Solaris /usr & Apple ZFS
                                                                 a7850746-5736-403e-a29d-778a1db2ebcf
sdb  ST160    1 gpt    disk          16000900661248              
├─sdb1
│             1 gpt    part      128     2147418624 Linux swap   57936b98-5755-4b72-b49c-759ff757778b
└─sdb2
              1 gpt    part  4194432 15998753095168 Solaris /usr & Apple ZFS
                                                                 ef694f04-9d55-49a2-9ad2-f7d4c419f832
sdc  ST160    1 gpt    disk          16000900661248              
├─sdc1
│             1 gpt    part      128     2147418624 Linux swap   77a86ba3-821a-4266-8bf7-0b7fc9cee928
└─sdc2
              1 gpt    part  4194432 15998753095168 Solaris /usr & Apple ZFS
                                                                 cc8ed6ca-3a43-4228-89b1-20dc761a0235
nvme1n1
     PNY C    0 gpt    disk            250059350016              
├─nvme1n1p1
│             0 gpt    part     4096        1048576 BIOS boot    529a5496-fd10-4a91-93ce-3d64be232846
├─nvme1n1p2
│             0 gpt    part     6144      536870912 EFI System   2adc40dd-fc6e-4321-8565-5cfe89b3e974
├─nvme1n1p3
│             0 gpt    part 34609152   232339447296 Solaris /usr & Apple ZFS
│                                                                2c08764c-9b56-4535-9648-dd5336eb11d3
└─nvme1n1p4
              0 gpt    part  1054720    17179869184 Linux swap   4fcd654b-aad5-4b01-ae73-f547605849d3
nvme0n1
     PNY C    0 gpt    disk            250059350016              
├─nvme0n1p1
│             0 gpt    part     4096        1048576 BIOS boot    65d7ecdf-e708-456c-a2a8-31dfd2b02cae
├─nvme0n1p2
│             0 gpt    part     6144      536870912 EFI System   dae7c0a9-c46c-4410-9e12-01ad3aba4bf4
├─nvme0n1p3
│             0 gpt    part 34609152   232339447296 Solaris /usr & Apple ZFS
│                                                                b33e0983-8eb0-4291-8ec9-c3ca85dcb73f
└─nvme0n1p4
              0 gpt    part  1054720    17179869184 Linux swap   6fccc6a1-d5c2-4e26-8395-d158793e9488
admin@truenas[~]$ 
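Because of -b, the SIZE column above is in bytes. A quick Python conversion shows why the same 16000900661248-byte disk is sold as 16 TB but appears as about 14.55 TiB in tools that use binary units (`describe_size` is a hypothetical helper):

```python
def describe_size(n_bytes):
    """Show one byte count in decimal terabytes (TB) and binary tebibytes (TiB)."""
    tb = n_bytes / 10**12   # decimal unit, as on the drive label
    tib = n_bytes / 2**40   # binary unit, as many tools report it
    return f"{n_bytes} bytes = {tb:.2f} TB = {tib:.2f} TiB"
```

For example, `describe_size(16000900661248)` gives "16000900661248 bytes = 16.00 TB = 14.55 TiB", and the ZFS data partition (15998753095168 bytes) also rounds to 14.55 TiB.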

And here:

admin@truenas[~]$ sudo smartctl -x /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST16000NM000J-2TW103
Serial Number:    ZR60NVXT
LU WWN Device Id: 5 000c50 0e4ba1806
Firmware Version: SN02
User Capacity:    16,000,900,661,248 bytes [16.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 19 07:25:50 2025 AKST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Disabled
ATA Security is:  Disabled, frozen [SEC2]
Write SCT (Get) Feature Control Command failed: Connection timed out
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 248)	Self-test routine in progress...
					80% of test remaining.
Total time to complete Offline 
data collection: 		(  567) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (1347) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x70bd)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   074   064   044    -    23471408
  3 Spin_Up_Time            PO----   091   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    16
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   087   060   045    -    512085938
  9 Power_On_Hours          -O--CK   083   083   000    -    15507
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    16
 18 Unknown_Attribute       PO-R--   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   060   049   000    -    40 (Min/Max 29/44)
192 Power-Off_Retract_Count -O--CK   100   100   000    -    6
193 Load_Cycle_Count        -O--CK   100   100   000    -    617
194 Temperature_Celsius     -O---K   040   051   000    -    40 (0 19 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
200 Multi_Zone_Error_Rate   PO---K   100   100   001    -    0
240 Head_Flying_Hours       ------   100   253   000    -    14401 (16 212 0)
241 Total_LBAs_Written      ------   100   253   000    -    46487242843
242 Total_LBAs_Read         ------   100   253   000    -    24033141338761
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0a       GPL     R/W      8  Device Statistics Notification
0x0c       GPL     R/O   2048  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    768  Current Device Internal Status Data log
0x2f       GPL     R/O      1  Set Sector Configuration
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS     160  Device vendor specific log
0xa2       GPL     VS   16320  Device vendor specific log
0xa4       GPL,SL  VS     160  Device vendor specific log
0xa6       GPL     VS     192  Device vendor specific log
0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xad       GPL     VS      16  Device vendor specific log
0xb1       GPL,SL  VS     160  Device vendor specific log
0xb6       GPL     VS    1920  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc1       GPL,SL  VS       8  Device vendor specific log
0xc3       GPL,SL  VS      24  Device vendor specific log
0xc6       GPL     VS    5184  Device vendor specific log
0xc7       GPL,SL  VS       8  Device vendor specific log
0xc9       GPL,SL  VS       8  Device vendor specific log
0xca       GPL,SL  VS      16  Device vendor specific log
0xcd       GPL,SL  VS       1  Device vendor specific log
0xce       GPL     VS       1  Device vendor specific log
0xcf       GPL     VS     512  Device vendor specific log
0xd1       GPL     VS     656  Device vendor specific log
0xd2       GPL     VS   10256  Device vendor specific log
0xd4       GPL     VS    2048  Device vendor specific log
0xda       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 80%     15507         -
# 2  Short offline       Completed without error       00%     15495         -
# 3  Extended offline    Interrupted (host reset)      90%     15495         -
# 4  Short offline       Completed without error       00%     15494         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    40 Celsius
Power Cycle Min/Max Temperature:     29/44 Celsius
Lifetime    Min/Max Temperature:     19/51 Celsius
Under/Over Temperature Limit Count:   0/11
SMART Status:                        0xc24f (PASSED)
Vendor specific:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         4 minutes
Temperature Logging Interval:        59 minutes
Min/Max recommended Temperature:     10/40 Celsius
Min/Max Temperature Limit:            5/60 Celsius
Temperature History Size (Index):    128 (25)

Index    Estimated Time   Temperature Celsius
  26    2025-02-14 02:07    43  ************************
  27    2025-02-14 03:06    42  ***********************
 ...    ..(  4 skipped).    ..  ***********************
  32    2025-02-14 08:01    42  ***********************
  33    2025-02-14 09:00    43  ************************
 ...    ..(  7 skipped).    ..  ************************
  41    2025-02-14 16:52    43  ************************
  42    2025-02-14 17:51    44  *************************
  43    2025-02-14 18:50    43  ************************
  44    2025-02-14 19:49    44  *************************
  45    2025-02-14 20:48    44  *************************
  46    2025-02-14 21:47    43  ************************
  47    2025-02-14 22:46    43  ************************
  48    2025-02-14 23:45    43  ************************
  49    2025-02-15 00:44    44  *************************
  50    2025-02-15 01:43    43  ************************
 ...    ..(  5 skipped).    ..  ************************
  56    2025-02-15 07:37    43  ************************
  57    2025-02-15 08:36    42  ***********************
  58    2025-02-15 09:35    43  ************************
  59    2025-02-15 10:34    44  *************************
  60    2025-02-15 11:33    43  ************************
 ...    ..(  8 skipped).    ..  ************************
  69    2025-02-15 20:24    43  ************************
  70    2025-02-15 21:23    44  *************************
  71    2025-02-15 22:22    43  ************************
 ...    ..(  6 skipped).    ..  ************************
  78    2025-02-16 05:15    43  ************************
  79    2025-02-16 06:14    42  ***********************
  80    2025-02-16 07:13    42  ***********************
  81    2025-02-16 08:12    42  ***********************
  82    2025-02-16 09:11    43  ************************
 ...    ..(  2 skipped).    ..  ************************
  85    2025-02-16 12:08    43  ************************
  86    2025-02-16 13:07    42  ***********************
  87    2025-02-16 14:06    42  ***********************
  88    2025-02-16 15:05    42  ***********************
  89    2025-02-16 16:04    44  *************************
  90    2025-02-16 17:03    45  **************************
  91    2025-02-16 18:02    44  *************************
  92    2025-02-16 19:01    45  **************************
  93    2025-02-16 20:00    44  *************************
  94    2025-02-16 20:59    44  *************************
  95    2025-02-16 21:58    44  *************************
  96    2025-02-16 22:57    48  *****************************
  97    2025-02-16 23:56    49  ******************************
  98    2025-02-17 00:55    50  *******************************
 ...    ..(  2 skipped).    ..  *******************************
 101    2025-02-17 03:52    50  *******************************
 102    2025-02-17 04:51    49  ******************************
 103    2025-02-17 05:50    45  **************************
 104    2025-02-17 06:49    44  *************************
 105    2025-02-17 07:48    44  *************************
 106    2025-02-17 08:47    43  ************************
 ...    ..( 14 skipped).    ..  ************************
 121    2025-02-17 23:32    43  ************************
 122    2025-02-18 00:31    44  *************************
 123    2025-02-18 01:30    44  *************************
 124    2025-02-18 02:29    43  ************************
 125    2025-02-18 03:28    43  ************************
 126    2025-02-18 04:27    44  *************************
 127    2025-02-18 05:26    44  *************************
   0    2025-02-18 06:25    44  *************************
   1    2025-02-18 07:24    43  ************************
   2    2025-02-18 08:23    43  ************************
   3    2025-02-18 09:22    44  *************************
   4    2025-02-18 10:21    43  ************************
 ...    ..(  3 skipped).    ..  ************************
   8    2025-02-18 14:17    43  ************************
   9    2025-02-18 15:16    44  *************************
  10    2025-02-18 16:15     ?  -
  11    2025-02-18 17:14    40  *********************
  12    2025-02-18 18:13     ?  -
  13    2025-02-18 19:12    29  **********
  14    2025-02-18 20:11    39  ********************
  15    2025-02-18 21:10    43  ************************
  16    2025-02-18 22:09    44  *************************
  17    2025-02-18 23:08    44  *************************
  18    2025-02-19 00:07    43  ************************
 ...    ..(  2 skipped).    ..  ************************
  21    2025-02-19 03:04    43  ************************
  22    2025-02-19 04:03    42  ***********************
  23    2025-02-19 05:02    41  **********************
  24    2025-02-19 06:01    41  **********************
  25    2025-02-19 07:00    41  **********************

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              16  ---  Lifetime Power-On Resets
0x01  0x010  4           15507  ---  Power-on Hours
0x01  0x018  6     46487027075  ---  Logical Sectors Written
0x01  0x020  6       968073821  ---  Number of Write Commands
0x01  0x028  6  24033141122990  ---  Logical Sectors Read
0x01  0x030  6      2163712320  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           14472  ---  Spindle Motor Power-on Hours
0x03  0x010  4           14472  ---  Head Flying Hours
0x03  0x018  4             617  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4               6  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x04  0x018  4               0  -D-  Physical Element Status Changed
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              41  ---  Current Temperature
0x05  0x010  1              42  ---  Average Short Term Temperature
0x05  0x018  1              42  ---  Average Long Term Temperature
0x05  0x020  1              50  ---  Highest Temperature
0x05  0x028  1              26  ---  Lowest Temperature
0x05  0x030  1              45  ---  Highest Average Short Term Temperature
0x05  0x038  1              32  ---  Lowest Average Short Term Temperature
0x05  0x040  1              42  ---  Highest Average Long Term Temperature
0x05  0x048  1              34  ---  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               5  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              53  ---  Number of Hardware Resets
0x06  0x010  4              14  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
0xff  0x008  7               0  ---  Vendor Specific
0xff  0x010  7               0  ---  Vendor Specific
0xff  0x018  7               0  ---  Vendor Specific
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2           18  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

Seagate FARM log (GP Log 0xa6) supported [try: -l farm]

Hrrmmph! Not following this instruction has made it MUCH more difficult to interpret the results.

Drive looks OK except for these:

Except that later it gives the maximum temperature limit as 60C, so I can’t see how a lifetime maximum of 51C could account for 11 over-temperature events above 60C.
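The comparison above can be checked mechanically. This hypothetical sketch pulls the three SCT Status temperature fields out of `smartctl -x` text and flags the case where over-limit events are counted even though the lifetime maximum stayed below the maximum limit. It only reproduces that reasoning; it cannot tell which threshold the firmware actually counts against.

```python
import re

# The SCT Status temperature fields discussed above, as printed by `smartctl -x`.
FIELDS = {
    "lifetime": r"Lifetime\s+Min/Max Temperature:\s+(\d+)/(\d+)",
    "limit": r"Min/Max Temperature Limit:\s+(\d+)/(\d+)",
    "under_over_count": r"Under/Over Temperature Limit Count:\s+(\d+)/(\d+)",
}


def sct_temperature_fields(smartctl_text):
    """Return {field: (first, second)} for each number pair listed in FIELDS."""
    found = {}
    for key, pattern in FIELDS.items():
        m = re.search(pattern, smartctl_text)
        if m is None:
            raise ValueError(f"{key} not found in smartctl output")
        found[key] = (int(m.group(1)), int(m.group(2)))
    return found


def over_count_without_overheat(fields):
    """True if over-limit events are counted although the lifetime max stayed below the max limit."""
    _, lifetime_max = fields["lifetime"]
    _, limit_max = fields["limit"]
    _, over_count = fields["under_over_count"]
    return over_count > 0 and lifetime_max < limit_max
```

With the values posted for sdb (lifetime 19/51, limit 5/60, count 0/11), the check returns True, which is exactly the inconsistency noted above.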

Can you please run sudo smartctl -l farm /dev/sdb so we can compare SMART and FARM stats?
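A minimal Python sketch of such a SMART versus FARM comparison, pairing a few counters that both logs report. The attribute-to-label pairs and all function names are assumptions chosen for illustration, not an official mapping. Note that the two outputs in this thread were captured about 13 power-on hours apart (15507 vs 15520) with a power cycle in between (16 vs 17), so small differences in those counters are expected; large mismatches are what would be worth a closer look.

```python
import re

# SMART attribute ID -> FARM label, for counters both logs report (illustrative pairs).
PAIRS = {
    5: "Number of Reallocated Sectors",
    9: "Power on Hours",
    12: "Power Cycle Count",
    193: "Head Load Events",
}

# Attribute-table row from `smartctl -x`: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
SMART_ROW = re.compile(r"^\s*(\d{1,3})\s+[A-Za-z]\S*\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)")
# Simple "Label: number" line from `smartctl -l farm`.
FARM_ROW = re.compile(r"^\s*([^:]+?):\s+(\d+)\s*$")


def smart_raw_values(text):
    """Map attribute ID -> leading integer of its RAW_VALUE."""
    return {int(m.group(1)): int(m.group(2))
            for m in map(SMART_ROW.match, text.splitlines()) if m}


def farm_values(text):
    """Map FARM label -> integer, keeping the first occurrence of repeated labels."""
    values = {}
    for m in map(FARM_ROW.match, text.splitlines()):
        if m:
            values.setdefault(m.group(1), int(m.group(2)))
    return values


def compare_smart_farm(smart_text, farm_text, pairs=PAIRS):
    """Return {FARM label: (SMART value, FARM value)} for pairs found in both logs."""
    smart, farm = smart_raw_values(smart_text), farm_values(farm_text)
    return {label: (smart[attr], farm[label])
            for attr, label in pairs.items() if attr in smart and label in farm}
```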

P.S. And yes, once the long test finishes, I would run a scrub next.
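To estimate when the long test will finish (and so when the scrub can start), a rough sketch can combine the drive’s own estimate ("Extended self-test routine recommended polling time: 1347 minutes", about 22.5 hours) with the "80% of test remaining" line, assuming progress is linear. Concurrent pool I/O can stretch the real duration. `long_test_eta` is a hypothetical name.

```python
import re


def long_test_eta(smartctl_text):
    """Estimate (hours, minutes) left in a running SMART extended test, or None.

    Uses the drive's "Extended self-test routine recommended polling time"
    and the "NN% of test remaining" line, assuming linear progress.
    """
    total = re.search(
        r"Extended self-test routine\s+recommended polling time:\s+\(\s*(\d+)\)\s+minutes",
        smartctl_text,
    )
    remaining = re.search(r"(\d+)% of test remaining", smartctl_text)
    if total is None or remaining is None:
        return None  # no extended-test estimate, or no test in progress
    minutes_left = int(total.group(1)) * int(remaining.group(1)) / 100
    return divmod(round(minutes_left), 60)
```

For the smartctl -x output posted above (1347 minutes, 80% remaining) this gives (17, 58), i.e. roughly 18 hours left at the time that output was captured.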


Fixed the formatting, driving me crazy too.


I would have done it myself if I had editing rights. :grin:

With the formatting fixed, I have now spotted something else.

I note that SMART tests have only just been run.

You should set up regular short and long SMART tests on all drives, as well as regular scrubs, and install @joeschmuck’s Multi-Report script (yes, the same Joe Schmuck who fixed the formatting) to tell you when things start to go wrong.

Here is the output you requested. Hopefully the formatting works this time (I tried initially, but couldn’t figure out how to get it into the box):

I have now set up regular tests on the drives; the SMART tests would not run on my earlier version of TrueNAS SCALE. I have also upgraded the install to 24.10.2. I had been reluctant to upgrade for fear of borking an otherwise stable, working system.

admin@truenas[~]$ sudo smartctl -l farm /dev/sdb 
[sudo] password for admin: 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Seagate Field Access Reliability Metrics log (FARM) (GP Log 0xa6)
	FARM Log Page 0: Log Header
		FARM Log Version: 4.19
		Pages Supported: 6
		Log Size: 98304
		Page Size: 16384
		Heads Supported: 24
		Number of Copies: 0
		Reason for Frame Capture: 0
	FARM Log Page 1: Drive Information
		Serial Number: ZR60NVXT
		World Wide Name: 0x5000c500e4ba1806
		Device Interface: SATA
		Device Capacity in Sectors: 31251759104
		Physical Sector Size: 4096
		Logical Sector Size: 512
		Device Buffer Size: 268435456
		Number of Heads: 17
		Device Form Factor: 3.5 inches
		Rotation Rate: 7200 rpm
		Firmware Rev: SN02    
		ATA Security State (ID Word 128): 0x01629
		ATA Features Supported (ID Word 78): 0x016cc
		ATA Features Enabled (ID Word 79): 0x0000000000000044
		Power on Hours: 15520
		Spindle Power on Hours: 14485
		Head Flight Hours: 14484
		Head Load Events: 617
		Power Cycle Count: 17
		Hardware Reset Count: 53
		Spin-up Time: 10 ms
		Time to ready of the last power cycle: 20518 ms
		Time drive is held in staggered spin: 21741 ms
		Model Number: ST16000NM000J-2TW103                    
		Drive Recording Type: CMR
		Max Number of Available Sectors for Reassignment: 16384
		Assembly Date (YYWW): 2261
		Depopulation Head Mask: 0
	FARM Log Page 2: Workload Statistics
		Total Number of Read Commands: 2167278580
		Total Number of Write Commands: 968363984
		Total Number of Random Read Commands: 21666326
		Total Number of Random Write Commands: 959333852
		Total Number Of Other Commands: 27789091
		Logical Sectors Written: 46496452219
		Logical Sectors Read: 2873404318438
		Number of dither events during current power cycle: 161
		Number of times dither was held off during random workloads: 37640
		Number of times dither was held off during sequential workloads: 233819
		Number of Read commands from 0-3.125% of LBA space for last 3 SMART Summary Frames: 8450999
		Number of Read commands from 3.125-25% of LBA space for last 3 SMART Summary Frames: 14811083
		Number of Read commands from 25-75% of LBA space for last 3 SMART Summary Frames: 15952368
		Number of Read commands from 75-100% of LBA space for last 3 SMART Summary Frames: 9736244
		Number of Write commands from 0-3.125% of LBA space for last 3 SMART Summary Frames: 388297
		Number of Write commands from 3.125-25% of LBA space for last 3 SMART Summary Frames: 0
		Number of Write commands from 25-75% of LBA space for last 3 SMART Summary Frames: 0
		Number of Write commands from 75-100% of LBA space for last 3 SMART Summary Frames: 19775087
	FARM Log Page 3: Error Statistics
		Unrecoverable Read Errors: 0
		Unrecoverable Write Errors: 0
		Number of Reallocated Sectors: 0
		Number of Read Recovery Attempts: 0
		Number of Mechanical Start Failures: 0
		Number of Reallocated Candidate Sectors: 0
		Number of ASR Events: 14
		Number of Interface CRC Errors: 0
		Spin Retry Count: 0
		Spin Retry Count Normalized: 100
		Spin Retry Count Worst: 100
		Number of IOEDC Errors (Raw): 0
		CTO Count Total: 0
		CTO Count Over 5s: 0
		CTO Count Over 7.5s: 0
		Total Flash LED (Assert) Events: 0
		Index of the last Flash LED: 0
		Flash LED Event 0:
			Event Information: 0x0000000000000000
			Timestamp of Event 0 (hours): 0
			Power Cycle Event 0: 0
		Flash LED Event 1:
			Event Information: 0x0000000000000000
			Timestamp of Event 1 (hours): 0
			Power Cycle Event 1: 0
		Flash LED Event 2:
			Event Information: 0x0000000000000000
			Timestamp of Event 2 (hours): 0
			Power Cycle Event 2: 0
		Flash LED Event 3:
			Event Information: 0x0000000000000000
			Timestamp of Event 3 (hours): 0
			Power Cycle Event 3: 0
		Flash LED Event 4:
			Event Information: 0x0000000000000000
			Timestamp of Event 4 (hours): 0
			Power Cycle Event 4: 0
		Flash LED Event 5:
			Event Information: 0x0000000000000000
			Timestamp of Event 5 (hours): 0
			Power Cycle Event 5: 0
		Flash LED Event 6:
			Event Information: 0x0000000000000000
			Timestamp of Event 6 (hours): 0
			Power Cycle Event 6: 0
		Flash LED Event 7:
			Event Information: 0x0000000000000000
			Timestamp of Event 7 (hours): 0
			Power Cycle Event 7: 0
		Uncorrectable errors: 0
		Cumulative Lifetime Unrecoverable Read errors due to ERC: 0
		Cum Lifetime Unrecoverable by head 0:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 1:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 2:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 3:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 4:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 5:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 6:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 7:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 8:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 9:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 10:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 11:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 12:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 13:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 14:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 15:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
		Cum Lifetime Unrecoverable by head 16:
			Cumulative Lifetime Unrecoverable Read Repeating: 0
			Cumulative Lifetime Unrecoverable Read Unique: 0
	FARM Log Page 4: Environment Statistics
		Current Temperature (Celsius): 42
		Highest Temperature: 50
		Lowest Temperature: 26
		Average Short Term Temperature: 42
		Average Long Term Temperature: 42
		Highest Average Short Term Temperature: 45
		Lowest Average Short Term Temperature: 32
		Highest Average Long Term Temperature: 42
		Lowest Average Long Term Temperature: 34
		Time In Over Temperature (minutes): 0
		Time In Under Temperature (minutes): 0
		Specified Max Operating Temperature: 60
		Specified Min Operating Temperature: 5
		Current Relative Humidity: 0
		Current Motor Power: 3097
		Current 12 volts: 12.189
		Minimum 12 volts: 12.131
		Maximum 12 volts: 12.206
		Current 5 volts: 5.049
		Minimum 5 volts: 5.036
		Maximum 5 volts: 5.080
		12V Power Average: 0.000
		12V Power Minimum: 0.000
		12V Power Maximum: 0.000
		5V Power Average: 0.000
		5V Power Minimum: 0.000
		5V Power Maximum: 0.000
	FARM Log Page 5: Reliability Statistics
		Error Rate (SMART Attribute 1 Raw): 0x000000000c6b9530
		Error Rate (SMART Attribute 1 Normalized): 83
		Error Rate (SMART Attribute 1 Worst): 64
		Seek Error Rate (SMART Attr 7 Raw): 0x000000001eca16d8
		Seek Error Rate (SMART Attr 7 Normalized): 87
		Seek Error Rate (SMART Attr 7 Worst): 60
		High Priority Unload Events: 6
		Helium Pressure Threshold Tripped: 0
		LBAs Corrected By Parity Sector: 0
		DVGA Skip Write Detect by Head 0: 0
		DVGA Skip Write Detect by Head 1: 0
		DVGA Skip Write Detect by Head 2: 0
		DVGA Skip Write Detect by Head 3: 0
		DVGA Skip Write Detect by Head 4: 0
		DVGA Skip Write Detect by Head 5: 0
		DVGA Skip Write Detect by Head 6: 0
		DVGA Skip Write Detect by Head 7: 0
		DVGA Skip Write Detect by Head 8: 0
		DVGA Skip Write Detect by Head 9: 0
		DVGA Skip Write Detect by Head 10: 0
		DVGA Skip Write Detect by Head 11: 0
		DVGA Skip Write Detect by Head 12: 0
		DVGA Skip Write Detect by Head 13: 0
		DVGA Skip Write Detect by Head 14: 0
		DVGA Skip Write Detect by Head 15: 0
		DVGA Skip Write Detect by Head 16: 0
		RVGA Skip Write Detect by Head 0: 0
		RVGA Skip Write Detect by Head 1: 0
		RVGA Skip Write Detect by Head 2: 0
		RVGA Skip Write Detect by Head 3: 0
		RVGA Skip Write Detect by Head 4: 0
		RVGA Skip Write Detect by Head 5: 0
		RVGA Skip Write Detect by Head 6: 0
		RVGA Skip Write Detect by Head 7: 0
		RVGA Skip Write Detect by Head 8: 0
		RVGA Skip Write Detect by Head 9: 0
		RVGA Skip Write Detect by Head 10: 0
		RVGA Skip Write Detect by Head 11: 0
		RVGA Skip Write Detect by Head 12: 0
		RVGA Skip Write Detect by Head 13: 0
		RVGA Skip Write Detect by Head 14: 0
		RVGA Skip Write Detect by Head 15: 0
		RVGA Skip Write Detect by Head 16: 0
		FVGA Skip Write Detect by Head 0: 0
		FVGA Skip Write Detect by Head 1: 0
		FVGA Skip Write Detect by Head 2: 0
		FVGA Skip Write Detect by Head 3: 0
		FVGA Skip Write Detect by Head 4: 0
		FVGA Skip Write Detect by Head 5: 0
		FVGA Skip Write Detect by Head 6: 0
		FVGA Skip Write Detect by Head 7: 0
		FVGA Skip Write Detect by Head 8: 0
		FVGA Skip Write Detect by Head 9: 0
		FVGA Skip Write Detect by Head 10: 0
		FVGA Skip Write Detect by Head 11: 0
		FVGA Skip Write Detect by Head 12: 0
		FVGA Skip Write Detect by Head 13: 0
		FVGA Skip Write Detect by Head 14: 0
		FVGA Skip Write Detect by Head 15: 0
		FVGA Skip Write Detect by Head 16: 0
		Skip Write Detect Threshold Exceeded by Head 0: 0
		Skip Write Detect Threshold Exceeded by Head 1: 0
		Skip Write Detect Threshold Exceeded by Head 2: 0
		Skip Write Detect Threshold Exceeded by Head 3: 0
		Skip Write Detect Threshold Exceeded by Head 4: 0
		Skip Write Detect Threshold Exceeded by Head 5: 0
		Skip Write Detect Threshold Exceeded by Head 6: 0
		Skip Write Detect Threshold Exceeded by Head 7: 0
		Skip Write Detect Threshold Exceeded by Head 8: 0
		Skip Write Detect Threshold Exceeded by Head 9: 0
		Skip Write Detect Threshold Exceeded by Head 10: 0
		Skip Write Detect Threshold Exceeded by Head 11: 0
		Skip Write Detect Threshold Exceeded by Head 12: 0
		Skip Write Detect Threshold Exceeded by Head 13: 0
		Skip Write Detect Threshold Exceeded by Head 14: 0
		Skip Write Detect Threshold Exceeded by Head 15: 0
		Skip Write Detect Threshold Exceeded by Head 16: 0
		Write Power On (hrs) by Head 0: 15591
		Write Power On (hrs) by Head 1: 6915
		Write Power On (hrs) by Head 2: 7430
		Write Power On (hrs) by Head 3: 7000
		Write Power On (hrs) by Head 4: 7691
		Write Power On (hrs) by Head 5: 7064
		Write Power On (hrs) by Head 6: 7586
		Write Power On (hrs) by Head 7: 7239
		Write Power On (hrs) by Head 8: 7772
		Write Power On (hrs) by Head 9: 7591
		Write Power On (hrs) by Head 10: 9166
		Write Power On (hrs) by Head 11: 7081
		Write Power On (hrs) by Head 12: 5224
		Write Power On (hrs) by Head 13: 5228
		Write Power On (hrs) by Head 14: 40890
		Write Power On (hrs) by Head 15: 5047
		Write Power On (hrs) by Head 16: 5192
		MR Head Resistance from Head 0: 0
		MR Head Resistance from Head 1: 0
		MR Head Resistance from Head 2: 0
		MR Head Resistance from Head 3: 0
		MR Head Resistance from Head 4: 0
		MR Head Resistance from Head 5: 0
		MR Head Resistance from Head 6: 0
		MR Head Resistance from Head 7: 0
		MR Head Resistance from Head 8: 0
		MR Head Resistance from Head 9: 0
		MR Head Resistance from Head 10: 0
		MR Head Resistance from Head 11: 0
		MR Head Resistance from Head 12: 0
		MR Head Resistance from Head 13: 0
		MR Head Resistance from Head 14: 0
		MR Head Resistance from Head 15: 0
		MR Head Resistance from Head 16: 0
		Second MR Head Resistance by Head 0: 0
		Second MR Head Resistance by Head 1: 0
		Second MR Head Resistance by Head 2: 0
		Second MR Head Resistance by Head 3: 0
		Second MR Head Resistance by Head 4: 0
		Second MR Head Resistance by Head 5: 0
		Second MR Head Resistance by Head 6: 0
		Second MR Head Resistance by Head 7: 0
		Second MR Head Resistance by Head 8: 0
		Second MR Head Resistance by Head 9: 0
		Second MR Head Resistance by Head 10: 0
		Second MR Head Resistance by Head 11: 0
		Second MR Head Resistance by Head 12: 0
		Second MR Head Resistance by Head 13: 0
		Second MR Head Resistance by Head 14: 0
		Second MR Head Resistance by Head 15: 0
		Second MR Head Resistance by Head 16: 0
		Number of Reallocated Sectors by Head 0: 0
		Number of Reallocated Sectors by Head 1: 0
		Number of Reallocated Sectors by Head 2: 0
		Number of Reallocated Sectors by Head 3: 0
		Number of Reallocated Sectors by Head 4: 0
		Number of Reallocated Sectors by Head 5: 0
		Number of Reallocated Sectors by Head 6: 0
		Number of Reallocated Sectors by Head 7: 0
		Number of Reallocated Sectors by Head 8: 0
		Number of Reallocated Sectors by Head 9: 0
		Number of Reallocated Sectors by Head 10: 0
		Number of Reallocated Sectors by Head 11: 0
		Number of Reallocated Sectors by Head 12: 0
		Number of Reallocated Sectors by Head 13: 0
		Number of Reallocated Sectors by Head 14: 0
		Number of Reallocated Sectors by Head 15: 0
		Number of Reallocated Sectors by Head 16: 0
		Number of Reallocation Candidate Sectors by Head 0: 0
		Number of Reallocation Candidate Sectors by Head 1: 0
		Number of Reallocation Candidate Sectors by Head 2: 0
		Number of Reallocation Candidate Sectors by Head 3: 0
		Number of Reallocation Candidate Sectors by Head 4: 0
		Number of Reallocation Candidate Sectors by Head 5: 0
		Number of Reallocation Candidate Sectors by Head 6: 0
		Number of Reallocation Candidate Sectors by Head 7: 0
		Number of Reallocation Candidate Sectors by Head 8: 0
		Number of Reallocation Candidate Sectors by Head 9: 0
		Number of Reallocation Candidate Sectors by Head 10: 0
		Number of Reallocation Candidate Sectors by Head 11: 0
		Number of Reallocation Candidate Sectors by Head 12: 0
		Number of Reallocation Candidate Sectors by Head 13: 0
		Number of Reallocation Candidate Sectors by Head 14: 0
		Number of Reallocation Candidate Sectors by Head 15: 0
		Number of Reallocation Candidate Sectors by Head 16: 0

I can’t see anything in this drive’s output that would explain the errors.
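With a dump this long, it helps to check that mechanically instead of by eye. The Python below is a minimal sketch that pulls out error-like counters with a non-zero value. It assumes the FARM output has been saved to a text file. The function name `nonzero_error_counters` and the keyword list are my own choices and are not part of smartctl. It skips Normalized/Worst fields, which are health scores rather than counts, and hex-encoded raw values, whose encoding is vendor-specific. An empty result therefore means "nothing obvious", not a clean bill of health.

```python
import re
import sys

# Assumed keywords: counters whose name contains one of these are treated as error-like.
ERROR_KEYWORDS = ("error", "reallocat", "uncorrectable", "unrecoverable", "crc", "fail")

# Matches "Key: value" lines where the value is a single token (e.g. "Spin Retry Count: 0").
LINE_RE = re.compile(r"^\s*(?P<key>[^:]+):\s*(?P<value>\S+)\s*$")


def nonzero_error_counters(farm_text: str) -> dict[str, int]:
    """Return error-like FARM counters whose decimal value is non-zero (hypothetical helper)."""
    hits: dict[str, int] = {}
    for line in farm_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # section headers, "Flash LED Event N:" lines, multi-word values
        key, value = m.group("key").strip(), m.group("value")
        lowered = key.lower()
        # Normalized/Worst fields are health scores (higher is better), not error counts.
        if "normalized" in lowered or "worst" in lowered:
            continue
        if not any(word in lowered for word in ERROR_KEYWORDS):
            continue
        # Skip hex raw values (e.g. "0x000000000c6b9530"); their encoding is vendor-specific.
        if not value.isdigit():
            continue
        if int(value) != 0:
            hits[key] = int(value)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1]: path to a saved FARM dump (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name, count in nonzero_error_counters(fh.read()).items():
            print(f"{name}: {count}")
```

Run it as `python3 farm_check.py farm.txt` (both file names hypothetical). Counters outside the keyword list, such as the ASR Events and High Priority Unload Events above, are not reported and would need a separate look.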

Anyone else got any ideas?