Hello, I'm running TrueNAS SCALE 22.12.2 on a dedicated server: an Intel N5105-based system with 32 GB of RAM, 2x 256 GB SSDs as the boot pool, and 3x 16 TB Seagate Exos X18 drives as my main storage pool in RAIDZ1. It has been running flawlessly for about 18 months. I checked today and discovered that one disk had faulted, and I am unsure how to proceed with troubleshooting. I am backing up everything to an external drive just in case, and I ran a scrub, but it found zero errors as that disk appears to be offline. Could someone point me in the right direction to clear the error and get the pool back to 100%? I have a new matching drive on its way to me now, but that’ll take a week or so.
Output of “sudo zpool status -v”:
admin@truenas[~]$ sudo zpool status -v
[sudo] password for admin:
pool: File Dump
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use ‘zpool clear’ to mark the device
repaired.
scan: scrub repaired 0B in 09:59:29 with 0 errors on Tue Feb 18 04:01:35 2025
config:
NAME STATE READ WRITE CKSUM
File Dump DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
cc8ed6ca-3a43-4228-89b1-20dc761a0235 ONLINE 0 0 0
a7850746-5736-403e-a29d-778a1db2ebcf ONLINE 0 0 0
ef694f04-9d55-49a2-9ad2-f7d4c419f832 FAULTED 63 300 0 too many errors
errors: No known data errors
pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using ‘zpool upgrade’. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:08 with 0 errors on Tue Feb 18 03:45:09 2025
config:
The tricky part will be identifying the failed disk.
You could take the partition UUID shown in your CLI output and match it to a device, then run smartctl on that device to get its serial number.
/dev/disk/by-partuuid has symlinks mapping each partition UUID to its device, which seems like an easy way to find the device, and from there the serial.
Of course, for all I know the UI view of the pool shows the serial right there, which would be easier.
You are running without redundancy, so you definitely don’t want to pull the wrong drive.
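For what it's worth, the symlink lookup described above could be sketched roughly like this. The function names and the `BY_PARTUUID_DIR` override are made up for illustration (the override only exists so the function can be tried without real disks), and the suffix stripping assumes the usual `sdX2` / `nvme0n1p3` naming.

```shell
#!/bin/sh
# Resolve a partition UUID (as printed by `zpool status`) to its partition
# device node. BY_PARTUUID_DIR is a hypothetical override for trying this
# out without real disks; it is not a TrueNAS setting.
partuuid_to_partition() {
    dir="${BY_PARTUUID_DIR:-/dev/disk/by-partuuid}"
    link="$dir/$1"
    if [ ! -e "$link" ]; then
        echo "no partition with PARTUUID $1 under $dir" >&2
        return 1
    fi
    readlink -f "$link"
}

# Strip the partition suffix to get the whole-disk node:
# /dev/sdb2 -> /dev/sdb, /dev/nvme0n1p3 -> /dev/nvme0n1
partition_to_disk() {
    case "$1" in
        */nvme*|*/mmcblk*) echo "$1" | sed 's/p[0-9][0-9]*$//' ;;
        *)                 echo "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}

# Example use (needs root for smartctl):
#   part=$(partuuid_to_partition ef694f04-9d55-49a2-9ad2-f7d4c419f832)
#   disk=$(partition_to_disk "$part")
#   sudo smartctl -i "$disk" | grep -i 'serial number'
```

Whatever serial this prints, still compare it against the label on the physical drive before pulling anything.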
Use this command, it should work fine: lsblk -o +PARTUUID,NAME,LABEL | grep -E "[a-z0-9]*-" | awk -F" " '{print $7" -> " $8}'
EDIT: A better way, which also gives you the drive serial number (which is what you should use to identify the failed drive): lsblk -o +PARTUUID,NAME,LABEL,SERIAL
This shows you the partition UUID and the serial of each drive, so you can map one to the other.
Then when you are at “replace physical drive” in the instructions, you can power down the server, find the drive with that serial, triple check that’s really the one that failed, and replace it with the new drive.
After which you power it on again and continue with the drive replacement steps in the docs.
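If you would rather have the PARTUUID-to-serial mapping printed in one go, something along these lines might work. It is only a sketch: the function name is made up, and it assumes `lsblk` is run with `-P` (key="value" pairs) so that empty columns do not shift the fields, and that each whole-disk row is listed before its partitions.

```shell
#!/bin/sh
# Reads `lsblk -P -o NAME,PKNAME,PARTUUID,SERIAL` output on stdin and prints
# "PARTUUID disk serial" for every partition that has a PARTUUID.
# The serial is remembered per whole-disk row and looked up through each
# partition's PKNAME (parent kernel device name).
map_partuuid_to_serial() {
    awk '
    function field(key,    re, s) {
        # Pull the value of key="value" out of the current line.
        re = "(^| )" key "=\"[^\"]*\""
        if (match($0, re) == 0) return ""
        s = substr($0, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
        sub(/"$/, "", s)        # drop the closing quote
        return s
    }
    {
        name = field("NAME"); parent = field("PKNAME")
        uuid = field("PARTUUID"); sn = field("SERIAL")
        if (parent == "") serial[name] = sn          # whole-disk row
        else if (uuid != "") print uuid, parent, serial[parent]
    }'
}

# Example use:
#   lsblk -P -o NAME,PKNAME,PARTUUID,SERIAL | map_partuuid_to_serial
```

Look for the line that starts with the UUID `zpool status` marks as FAULTED; the last column on that line should be the serial to check on the drive label.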
I tried rebooting the system (it had been 6 months or more since the last reboot). It resilvered, with the following result:
admin@truenas[~]$ sudo zpool status -v
pool: File Dump
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using ‘zpool clear’ or replace the device with ‘zpool replace’.
see: Message ID: ZFS-8000-9P — OpenZFS documentation
scan: resilvered 6.64G in 00:03:36 with 0 errors on Tue Feb 18 19:00:35 2025
config:
sudo smartctl -x /dev/sdX where X is the disk letter for the disk with the failing partuuid on it.
This will tell us which disk is which in the pool and what type of disk it is (so we can check whether it is SMR or not) and give us the SMART data for that disk.
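The `-x` output is long, so a small filter like the one below could pull out just the identity lines. This is a sketch only: the function name is made up, and the field labels assume a SATA drive, where smartctl prints lines such as `Device Model:` and `Serial Number:`. The model string is what you would look up to check whether the drive is CMR or SMR.

```shell
#!/bin/sh
# Keep only the identity and overall-health lines from smartctl output
# read on stdin. Labels are the ones smartctl prints for ATA/SATA drives.
smart_identity() {
    grep -E '^(Model Family|Device Model|Serial Number|Rotation Rate|SMART overall-health self-assessment test result):'
}

# Example use (replace sdX with the disk in question):
#   sudo smartctl -x /dev/sdX | smart_identity
```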
admin@truenas[~]$ lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
NAME        MODEL ROTA PTTYPE TYPE START    SIZE           PARTTYPENAME             PARTUUID
sda         ST160 1    gpt    disk          16000900661248
├─sda1            1    gpt    part 128      2147418624     Linux swap               04751ea2-b116-41f3-89af-00de2367dc23
└─sda2            1    gpt    part 4194432  15998753095168 Solaris /usr & Apple ZFS a7850746-5736-403e-a29d-778a1db2ebcf
sdb         ST160 1    gpt    disk          16000900661248
├─sdb1            1    gpt    part 128      2147418624     Linux swap               57936b98-5755-4b72-b49c-759ff757778b
└─sdb2            1    gpt    part 4194432  15998753095168 Solaris /usr & Apple ZFS ef694f04-9d55-49a2-9ad2-f7d4c419f832
sdc         ST160 1    gpt    disk          16000900661248
├─sdc1            1    gpt    part 128      2147418624     Linux swap               77a86ba3-821a-4266-8bf7-0b7fc9cee928
└─sdc2            1    gpt    part 4194432  15998753095168 Solaris /usr & Apple ZFS cc8ed6ca-3a43-4228-89b1-20dc761a0235
nvme1n1     PNY C 0    gpt    disk          250059350016
├─nvme1n1p1       0    gpt    part 4096     1048576        BIOS boot                529a5496-fd10-4a91-93ce-3d64be232846
├─nvme1n1p2       0    gpt    part 6144     536870912      EFI System               2adc40dd-fc6e-4321-8565-5cfe89b3e974
├─nvme1n1p3       0    gpt    part 34609152 232339447296   Solaris /usr & Apple ZFS 2c08764c-9b56-4535-9648-dd5336eb11d3
└─nvme1n1p4       0    gpt    part 1054720  17179869184    Linux swap               4fcd654b-aad5-4b01-ae73-f547605849d3
nvme0n1     PNY C 0    gpt    disk          250059350016
├─nvme0n1p1       0    gpt    part 4096     1048576        BIOS boot                65d7ecdf-e708-456c-a2a8-31dfd2b02cae
├─nvme0n1p2       0    gpt    part 6144     536870912      EFI System               dae7c0a9-c46c-4410-9e12-01ad3aba4bf4
├─nvme0n1p3       0    gpt    part 34609152 232339447296   Solaris /usr & Apple ZFS b33e0983-8eb0-4291-8ec9-c3ca85dcb73f
└─nvme0n1p4       0    gpt    part 1054720  17179869184    Linux swap               6fccc6a1-d5c2-4e26-8395-d158793e9488
admin@truenas[~]$
With the formatting fixed, I now spotted something else.
I note that SMART tests have only just been run.
You should set up regular short and long SMART tests on all drives, as well as regular scrubs, and implement @joeschmuck’s (yes, the same Joe Schmuck who fixed the formatting) Multi-Report script to tell you when things start to go wrong.
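This is not a replacement for Multi-Report, but as a rough idea of what the simplest possible check looks like, a filter like this lists any vdev whose state is not ONLINE. The function name is made up, and it assumes the usual `NAME STATE READ WRITE CKSUM` column layout. Note that a pool name containing a space (like "File Dump") will not be matched on its own row, though its member vdevs will be.

```shell
#!/bin/sh
# Print "name state" for every vdev row in `zpool status` output (read on
# stdin) whose STATE column is not ONLINE. Returns 0 if any were found,
# 1 if everything looked healthy.
unhealthy_vdevs() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
             print $1, $2
             found = 1
         }
         END { exit !found }'
}

# Example use:
#   sudo zpool status | unhealthy_vdevs && echo "pool needs attention"
```

`zpool status -x` is a built-in alternative: it only reports pools that have problems and otherwise prints "all pools are healthy".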
Here is the output you requested, hopefully the formatting works fine this time (I tried initially, but couldn’t figure out how to get it in the box):
I have now set up regular tests on the drives; the SMART tests did not want to run on my earlier version of TrueNAS SCALE. I have also upgraded the install to 24.10.2. I had been reluctant to upgrade, to avoid borking an otherwise stable, working system.
admin@truenas[~]$ sudo smartctl -l farm /dev/sdb
[sudo] password for admin:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
Seagate Field Access Reliability Metrics log (FARM) (GP Log 0xa6)
FARM Log Page 0: Log Header
FARM Log Version: 4.19
Pages Supported: 6
Log Size: 98304
Page Size: 16384
Heads Supported: 24
Number of Copies: 0
Reason for Frame Capture: 0
FARM Log Page 1: Drive Information
Serial Number: ZR60NVXT
World Wide Name: 0x5000c500e4ba1806
Device Interface: SATA
Device Capacity in Sectors: 31251759104
Physical Sector Size: 4096
Logical Sector Size: 512
Device Buffer Size: 268435456
Number of Heads: 17
Device Form Factor: 3.5 inches
Rotation Rate: 7200 rpm
Firmware Rev: SN02
ATA Security State (ID Word 128): 0x01629
ATA Features Supported (ID Word 78): 0x016cc
ATA Features Enabled (ID Word 79): 0x0000000000000044
Power on Hours: 15520
Spindle Power on Hours: 14485
Head Flight Hours: 14484
Head Load Events: 617
Power Cycle Count: 17
Hardware Reset Count: 53
Spin-up Time: 10 ms
Time to ready of the last power cycle: 20518 ms
Time drive is held in staggered spin: 21741 ms
Model Number: ST16000NM000J-2TW103
Drive Recording Type: CMR
Max Number of Available Sectors for Reassignment: 16384
Assembly Date (YYWW): 2261
Depopulation Head Mask: 0
FARM Log Page 2: Workload Statistics
Total Number of Read Commands: 2167278580
Total Number of Write Commands: 968363984
Total Number of Random Read Commands: 21666326
Total Number of Random Write Commands: 959333852
Total Number Of Other Commands: 27789091
Logical Sectors Written: 46496452219
Logical Sectors Read: 2873404318438
Number of dither events during current power cycle: 161
Number of times dither was held off during random workloads: 37640
Number of times dither was held off during sequential workloads: 233819
Number of Read commands from 0-3.125% of LBA space for last 3 SMART Summary Frames: 8450999
Number of Read commands from 3.125-25% of LBA space for last 3 SMART Summary Frames: 14811083
Number of Read commands from 25-75% of LBA space for last 3 SMART Summary Frames: 15952368
Number of Read commands from 75-100% of LBA space for last 3 SMART Summary Frames: 9736244
Number of Write commands from 0-3.125% of LBA space for last 3 SMART Summary Frames: 388297
Number of Write commands from 3.125-25% of LBA space for last 3 SMART Summary Frames: 0
Number of Write commands from 25-75% of LBA space for last 3 SMART Summary Frames: 0
Number of Write commands from 75-100% of LBA space for last 3 SMART Summary Frames: 19775087
FARM Log Page 3: Error Statistics
Unrecoverable Read Errors: 0
Unrecoverable Write Errors: 0
Number of Reallocated Sectors: 0
Number of Read Recovery Attempts: 0
Number of Mechanical Start Failures: 0
Number of Reallocated Candidate Sectors: 0
Number of ASR Events: 14
Number of Interface CRC Errors: 0
Spin Retry Count: 0
Spin Retry Count Normalized: 100
Spin Retry Count Worst: 100
Number of IOEDC Errors (Raw): 0
CTO Count Total: 0
CTO Count Over 5s: 0
CTO Count Over 7.5s: 0
Total Flash LED (Assert) Events: 0
Index of the last Flash LED: 0
Flash LED Event 0:
Event Information: 0x0000000000000000
Timestamp of Event 0 (hours): 0
Power Cycle Event 0: 0
Flash LED Event 1:
Event Information: 0x0000000000000000
Timestamp of Event 1 (hours): 0
Power Cycle Event 1: 0
Flash LED Event 2:
Event Information: 0x0000000000000000
Timestamp of Event 2 (hours): 0
Power Cycle Event 2: 0
Flash LED Event 3:
Event Information: 0x0000000000000000
Timestamp of Event 3 (hours): 0
Power Cycle Event 3: 0
Flash LED Event 4:
Event Information: 0x0000000000000000
Timestamp of Event 4 (hours): 0
Power Cycle Event 4: 0
Flash LED Event 5:
Event Information: 0x0000000000000000
Timestamp of Event 5 (hours): 0
Power Cycle Event 5: 0
Flash LED Event 6:
Event Information: 0x0000000000000000
Timestamp of Event 6 (hours): 0
Power Cycle Event 6: 0
Flash LED Event 7:
Event Information: 0x0000000000000000
Timestamp of Event 7 (hours): 0
Power Cycle Event 7: 0
Uncorrectable errors: 0
Cumulative Lifetime Unrecoverable Read errors due to ERC: 0
Cum Lifetime Unrecoverable by head 0:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 1:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 2:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 3:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 4:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 5:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 6:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 7:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 8:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 9:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 10:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 11:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 12:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 13:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 14:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 15:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
Cum Lifetime Unrecoverable by head 16:
Cumulative Lifetime Unrecoverable Read Repeating: 0
Cumulative Lifetime Unrecoverable Read Unique: 0
FARM Log Page 4: Environment Statistics
Current Temperature (Celsius): 42
Highest Temperature: 50
Lowest Temperature: 26
Average Short Term Temperature: 42
Average Long Term Temperature: 42
Highest Average Short Term Temperature: 45
Lowest Average Short Term Temperature: 32
Highest Average Long Term Temperature: 42
Lowest Average Long Term Temperature: 34
Time In Over Temperature (minutes): 0
Time In Under Temperature (minutes): 0
Specified Max Operating Temperature: 60
Specified Min Operating Temperature: 5
Current Relative Humidity: 0
Current Motor Power: 3097
Current 12 volts: 12.189
Minimum 12 volts: 12.131
Maximum 12 volts: 12.206
Current 5 volts: 5.049
Minimum 5 volts: 5.036
Maximum 5 volts: 5.080
12V Power Average: 0.000
12V Power Minimum: 0.000
12V Power Maximum: 0.000
5V Power Average: 0.000
5V Power Minimum: 0.000
5V Power Maximum: 0.000
FARM Log Page 5: Reliability Statistics
Error Rate (SMART Attribute 1 Raw): 0x000000000c6b9530
Error Rate (SMART Attribute 1 Normalized): 83
Error Rate (SMART Attribute 1 Worst): 64
Seek Error Rate (SMART Attr 7 Raw): 0x000000001eca16d8
Seek Error Rate (SMART Attr 7 Normalized): 87
Seek Error Rate (SMART Attr 7 Worst): 60
High Priority Unload Events: 6
Helium Pressure Threshold Tripped: 0
LBAs Corrected By Parity Sector: 0
DVGA Skip Write Detect by Head 0: 0
DVGA Skip Write Detect by Head 1: 0
DVGA Skip Write Detect by Head 2: 0
DVGA Skip Write Detect by Head 3: 0
DVGA Skip Write Detect by Head 4: 0
DVGA Skip Write Detect by Head 5: 0
DVGA Skip Write Detect by Head 6: 0
DVGA Skip Write Detect by Head 7: 0
DVGA Skip Write Detect by Head 8: 0
DVGA Skip Write Detect by Head 9: 0
DVGA Skip Write Detect by Head 10: 0
DVGA Skip Write Detect by Head 11: 0
DVGA Skip Write Detect by Head 12: 0
DVGA Skip Write Detect by Head 13: 0
DVGA Skip Write Detect by Head 14: 0
DVGA Skip Write Detect by Head 15: 0
DVGA Skip Write Detect by Head 16: 0
RVGA Skip Write Detect by Head 0: 0
RVGA Skip Write Detect by Head 1: 0
RVGA Skip Write Detect by Head 2: 0
RVGA Skip Write Detect by Head 3: 0
RVGA Skip Write Detect by Head 4: 0
RVGA Skip Write Detect by Head 5: 0
RVGA Skip Write Detect by Head 6: 0
RVGA Skip Write Detect by Head 7: 0
RVGA Skip Write Detect by Head 8: 0
RVGA Skip Write Detect by Head 9: 0
RVGA Skip Write Detect by Head 10: 0
RVGA Skip Write Detect by Head 11: 0
RVGA Skip Write Detect by Head 12: 0
RVGA Skip Write Detect by Head 13: 0
RVGA Skip Write Detect by Head 14: 0
RVGA Skip Write Detect by Head 15: 0
RVGA Skip Write Detect by Head 16: 0
FVGA Skip Write Detect by Head 0: 0
FVGA Skip Write Detect by Head 1: 0
FVGA Skip Write Detect by Head 2: 0
FVGA Skip Write Detect by Head 3: 0
FVGA Skip Write Detect by Head 4: 0
FVGA Skip Write Detect by Head 5: 0
FVGA Skip Write Detect by Head 6: 0
FVGA Skip Write Detect by Head 7: 0
FVGA Skip Write Detect by Head 8: 0
FVGA Skip Write Detect by Head 9: 0
FVGA Skip Write Detect by Head 10: 0
FVGA Skip Write Detect by Head 11: 0
FVGA Skip Write Detect by Head 12: 0
FVGA Skip Write Detect by Head 13: 0
FVGA Skip Write Detect by Head 14: 0
FVGA Skip Write Detect by Head 15: 0
FVGA Skip Write Detect by Head 16: 0
Skip Write Detect Threshold Exceeded by Head 0: 0
Skip Write Detect Threshold Exceeded by Head 1: 0
Skip Write Detect Threshold Exceeded by Head 2: 0
Skip Write Detect Threshold Exceeded by Head 3: 0
Skip Write Detect Threshold Exceeded by Head 4: 0
Skip Write Detect Threshold Exceeded by Head 5: 0
Skip Write Detect Threshold Exceeded by Head 6: 0
Skip Write Detect Threshold Exceeded by Head 7: 0
Skip Write Detect Threshold Exceeded by Head 8: 0
Skip Write Detect Threshold Exceeded by Head 9: 0
Skip Write Detect Threshold Exceeded by Head 10: 0
Skip Write Detect Threshold Exceeded by Head 11: 0
Skip Write Detect Threshold Exceeded by Head 12: 0
Skip Write Detect Threshold Exceeded by Head 13: 0
Skip Write Detect Threshold Exceeded by Head 14: 0
Skip Write Detect Threshold Exceeded by Head 15: 0
Skip Write Detect Threshold Exceeded by Head 16: 0
Write Power On (hrs) by Head 0: 15591
Write Power On (hrs) by Head 1: 6915
Write Power On (hrs) by Head 2: 7430
Write Power On (hrs) by Head 3: 7000
Write Power On (hrs) by Head 4: 7691
Write Power On (hrs) by Head 5: 7064
Write Power On (hrs) by Head 6: 7586
Write Power On (hrs) by Head 7: 7239
Write Power On (hrs) by Head 8: 7772
Write Power On (hrs) by Head 9: 7591
Write Power On (hrs) by Head 10: 9166
Write Power On (hrs) by Head 11: 7081
Write Power On (hrs) by Head 12: 5224
Write Power On (hrs) by Head 13: 5228
Write Power On (hrs) by Head 14: 40890
Write Power On (hrs) by Head 15: 5047
Write Power On (hrs) by Head 16: 5192
MR Head Resistance from Head 0: 0
MR Head Resistance from Head 1: 0
MR Head Resistance from Head 2: 0
MR Head Resistance from Head 3: 0
MR Head Resistance from Head 4: 0
MR Head Resistance from Head 5: 0
MR Head Resistance from Head 6: 0
MR Head Resistance from Head 7: 0
MR Head Resistance from Head 8: 0
MR Head Resistance from Head 9: 0
MR Head Resistance from Head 10: 0
MR Head Resistance from Head 11: 0
MR Head Resistance from Head 12: 0
MR Head Resistance from Head 13: 0
MR Head Resistance from Head 14: 0
MR Head Resistance from Head 15: 0
MR Head Resistance from Head 16: 0
Second MR Head Resistance by Head 0: 0
Second MR Head Resistance by Head 1: 0
Second MR Head Resistance by Head 2: 0
Second MR Head Resistance by Head 3: 0
Second MR Head Resistance by Head 4: 0
Second MR Head Resistance by Head 5: 0
Second MR Head Resistance by Head 6: 0
Second MR Head Resistance by Head 7: 0
Second MR Head Resistance by Head 8: 0
Second MR Head Resistance by Head 9: 0
Second MR Head Resistance by Head 10: 0
Second MR Head Resistance by Head 11: 0
Second MR Head Resistance by Head 12: 0
Second MR Head Resistance by Head 13: 0
Second MR Head Resistance by Head 14: 0
Second MR Head Resistance by Head 15: 0
Second MR Head Resistance by Head 16: 0
Number of Reallocated Sectors by Head 0: 0
Number of Reallocated Sectors by Head 1: 0
Number of Reallocated Sectors by Head 2: 0
Number of Reallocated Sectors by Head 3: 0
Number of Reallocated Sectors by Head 4: 0
Number of Reallocated Sectors by Head 5: 0
Number of Reallocated Sectors by Head 6: 0
Number of Reallocated Sectors by Head 7: 0
Number of Reallocated Sectors by Head 8: 0
Number of Reallocated Sectors by Head 9: 0
Number of Reallocated Sectors by Head 10: 0
Number of Reallocated Sectors by Head 11: 0
Number of Reallocated Sectors by Head 12: 0
Number of Reallocated Sectors by Head 13: 0
Number of Reallocated Sectors by Head 14: 0
Number of Reallocated Sectors by Head 15: 0
Number of Reallocated Sectors by Head 16: 0
Number of Reallocation Candidate Sectors by Head 0: 0
Number of Reallocation Candidate Sectors by Head 1: 0
Number of Reallocation Candidate Sectors by Head 2: 0
Number of Reallocation Candidate Sectors by Head 3: 0
Number of Reallocation Candidate Sectors by Head 4: 0
Number of Reallocation Candidate Sectors by Head 5: 0
Number of Reallocation Candidate Sectors by Head 6: 0
Number of Reallocation Candidate Sectors by Head 7: 0
Number of Reallocation Candidate Sectors by Head 8: 0
Number of Reallocation Candidate Sectors by Head 9: 0
Number of Reallocation Candidate Sectors by Head 10: 0
Number of Reallocation Candidate Sectors by Head 11: 0
Number of Reallocation Candidate Sectors by Head 12: 0
Number of Reallocation Candidate Sectors by Head 13: 0
Number of Reallocation Candidate Sectors by Head 14: 0
Number of Reallocation Candidate Sectors by Head 15: 0
Number of Reallocation Candidate Sectors by Head 16: 0