Hello there,
After a reboot, my boot pool is reporting errors, even though the SMART self-tests on the drive completed without error and the overall health check says PASSED.
TrueNAS log:
Jun 26 02:52:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:52:22 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:52:48 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:52:48 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:53:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:53:22 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:53:48 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:53:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:54:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:54:22 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:54:49 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:54:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:55:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:55:22 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:55:49 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:55:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:56:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:56:22 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:56:49 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:56:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:57:22 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:57:23 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:57:49 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:57:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Jun 26 02:58:23 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='79'
Jun 26 02:58:23 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='79', time_reopen='60'
Jun 26 02:58:49 njetflix syslog-ng[3248]: Error suspend timeout has elapsed, attempting to write again; fd='82'
Jun 26 02:58:49 njetflix syslog-ng[3248]: Suspending write operation because of an I/O error; fd='82', time_reopen='60'
Output of zpool status -v:
pool: boot-pool
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:08 with 55 errors on Thu Jun 26 01:38:30 2025
config:
NAME        STATE     READ WRITE CKSUM
boot-pool   DEGRADED     0     0     0
  sdd3      DEGRADED     0     0   334  too many errors
errors: Permanent errors have been detected in the following files:
/audit/syslog-ng-00002.rqf
/audit/SYSTEM.db-wal
/audit/MIDDLEWARE.db-wal
/root/.zsh-histfile
/var/log/samba4/log.samba-dcerpcd
/var/log/auth.log
/var/log/sysstat/sa24
/var/log/audit/audit.log
/var/log/syslog
/var/log/kern.log
/var/log/debug
/var/log/samba4/log.wb-TRUENAS
/var/log/audit/audit.log.1
/var/log/sysstat/sa25
/var/log/daemon.log
/var/log/error
/var/log/sysstat/sa23
/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/ba5929460e6b0e402582875c5daa6bc9365206416ecc0b762b9162460faa5f4f
/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db
/var/lib/dhcp/dhclient.leases.enp7s0
/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/9069cad4ed2dcec942d9b889ffc4583a46c38752ccd900c5f5c71b6eddbbb07b
truenas_admin@njetflix[~]$ sudo smartctl -a -x /dev/sdd
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSCKHW120A4
Serial Number: CVDA51320168120Q
LU WWN Device Id: 5 5cd2e4 04bfc1691
Firmware Version: DC31
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available, deterministic
Device is: Not in smartctl database 7.3/5706
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Jun 26 03:06:22 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Disabled
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 33) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: ( 2930) seconds.
Offline data collection
capabilities: (0x7f) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct -O--CK 100 100 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 1371 (67 6 0)
12 Power_Cycle_Count -O--CK 099 099 000 - 1566
170 Unknown_Attribute PO--CK 100 100 010 - 0
171 Unknown_Attribute -O--CK 100 100 000 - 0
172 Unknown_Attribute -O--CK 100 100 000 - 0
174 Unknown_Attribute -O--CK 100 100 000 - 23
183 Runtime_Bad_Block -O--CK 100 100 000 - 9
184 End-to-End_Error PO--CK 100 100 090 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
190 Airflow_Temperature_Cel -O--CK 034 055 000 - 34 (Min/Max -21/55)
192 Power-Off_Retract_Count -O--CK 100 100 000 - 23
199 UDMA_CRC_Error_Count -O--CK 100 100 000 - 0
225 Unknown_SSD_Attribute -O--CK 100 100 000 - 178891
226 Unknown_SSD_Attribute -O--CK 100 100 000 - 65535
227 Unknown_SSD_Attribute -O--CK 100 100 000 - 50
228 Power-off_Retract_Count -O--CK 100 100 000 - 65535
232 Available_Reservd_Space PO--CK 100 100 010 - 0
233 Media_Wearout_Indicator -O--CK 100 100 000 - 0
241 Total_LBAs_Written -O--CK 100 100 000 - 178891
242 Total_LBAs_Read -O--CK 100 100 000 - 184075
249 Unknown_Attribute -O--CK 100 100 000 - 11559
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x04 GPL,SL R/O 1 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL,SL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 16 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xb7 GPL,SL VS 16 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log not supported
SMART Extended Self-test Log Version: 0 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Offline Interrupted (host reset) 10% 1371 -
# 2 Extended offline Completed without error 00% 1370 -
# 3 Extended offline Completed without error 00% 1370 -
# 4 Offline Interrupted (host reset) 10% 1369 -
# 5 Extended offline Completed without error 00% 1369 -
# 6 Short offline Completed without error 00% 1364 -
# 7 Offline Interrupted (host reset) 10% 1363 -
# 8 Offline Interrupted (host reset) 10% 1363 -
# 9 Offline Interrupted (host reset) 10% 1344 -
#10 Short offline Completed without error 00% 1344 -
#11 Offline Interrupted (host reset) 10% 1337 -
#12 Offline Interrupted (host reset) 10% 1320 -
#13 Conveyance offline Completed without error 00% 1320 -
#14 Offline Interrupted (host reset) 10% 1320 -
#15 Offline Interrupted (host reset) 10% 1288 -
#16 Offline Interrupted (host reset) 10% 1284 -
#17 Offline Interrupted (host reset) 10% 1230 -
#18 Offline Interrupted (host reset) 10% 1230 -
#19 Offline Interrupted (host reset) 10% 1185 -
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Offline Interrupted (host reset) 10% 1371 -
# 2 Extended offline Completed without error 00% 1370 -
# 3 Extended offline Completed without error 00% 1370 -
# 4 Offline Interrupted (host reset) 10% 1369 -
# 5 Extended offline Completed without error 00% 1369 -
# 6 Short offline Completed without error 00% 1364 -
# 7 Offline Interrupted (host reset) 10% 1363 -
# 8 Offline Interrupted (host reset) 10% 1363 -
# 9 Offline Interrupted (host reset) 10% 1344 -
#10 Short offline Completed without error 00% 1344 -
#11 Offline Interrupted (host reset) 10% 1337 -
#12 Offline Interrupted (host reset) 10% 1320 -
#13 Conveyance offline Completed without error 00% 1320 -
#14 Offline Interrupted (host reset) 10% 1320 -
#15 Offline Interrupted (host reset) 10% 1288 -
#16 Offline Interrupted (host reset) 10% 1284 -
#17 Offline Interrupted (host reset) 10% 1230 -
#18 Offline Interrupted (host reset) 10% 1230 -
#19 Offline Interrupted (host reset) 10% 1185 -
#20 Offline Interrupted (host reset) 10% 1165 -
#21 Offline Interrupted (host reset) 10% 1115 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 0 (0x0000)
Device State: Active (0)
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: -21/55 Celsius
Lifetime Min/Max Temperature: -21/66 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 0 (Unknown, should be 2)
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 10 minutes
Min/Max recommended Temperature: 0/ 0 Celsius
Min/Max Temperature Limit: 0/ 0 Celsius
Temperature History Size (Index): 0 (410)
Temperature History is empty
SCT Error Recovery Control command not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 1580 --- Lifetime Power-On Resets
0x01 0x010 4 1378 --- Power-on Hours
0x01 0x018 6 13075412317 --- Logical Sectors Written
0x01 0x028 6 13592132470 --- Logical Sectors Read
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 3798 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 34 --- Current Temperature
0x05 0x010 1 36 --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 49 --- Highest Temperature
0x05 0x028 1 21 --- Lowest Temperature
0x05 0x030 1 36 --- Highest Average Short Term Temperature
0x05 0x038 1 31 --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 70 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 3798 --- Number of Hardware Resets
0x06 0x010 4 4532 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 4 --- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 0 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 2 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 0 R_ERR response for host-to-device non-data FIS, non-CRC
0x0002 2 0 R_ERR response for data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS