Hi, and thanks for reading another “pool degraded” topic. I have a 4x 4TB pool in raidz1 on the latest SCALE. After my scheduled scrub, one disk shows as FAULTED and the pool is DEGRADED.
root@files[/home/admin]# zpool status -v pool4x4
  pool: pool4x4
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 04:21:03 with 0 errors on Sun Jun 23 13:34:30 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool4x4                                   DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            b2c1bf84-b8a0-11ed-a86b-107b44191d69  ONLINE       0     0     0
            b2cb2916-b8a0-11ed-a86b-107b44191d69  FAULTED      0    25     0  too many errors
            b2d40fb7-b8a0-11ed-a86b-107b44191d69  ONLINE       0     0     0
            b2e0084f-b8a0-11ed-a86b-107b44191d69  ONLINE       0     0     0

errors: No known data errors
As you can see, the faulted disk has only write errors (25), with no read or checksum errors. I am no expert on smartctl, but it reports PASSED:
SMART overall-health self-assessment test result: PASSED
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1699         -
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2          172  Command failed due to ICRC error
0x0002  2         1244  R_ERR response for data FIS
0x0003  2          177  R_ERR response for device-to-host data FIS
0x0004  2         1067  R_ERR response for host-to-device data FIS
0x0005  2         1822  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2         1822  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2         4584  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2         4606  Device-to-host register FISes sent due to a COMRESET
0x000b  2         1505  CRC errors within host-to-device FIS
0x000d  2         1384  Non-CRC errors within host-to-device FIS
0x000f  2         1025  R_ERR response for host-to-device data FIS, CRC
0x0012  2          480  R_ERR response for host-to-device non-data FIS, CRC
I assume the drive is OK. Any ideas what the cause might be? HBA, cables, RAM, software? Or is it really the drive?
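In case it makes the counters easier to compare between disks, here is a minimal Python sketch that parses rows shaped like the SATA Phy Event Counters table above and lists the nonzero ones. The names `parse_phy_counters`, `nonzero_counters` and `SAMPLE` are made up for illustration and are not part of smartctl. The sample rows are a subset copied from my output, and the script only tallies values; it does not diagnose anything.

```python
import re

# A few rows copied from the "SATA Phy Event Counters" output above.
SAMPLE = """\
0x0001 2 172 Command failed due to ICRC error
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 4584 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 4606 Device-to-host register FISes sent due to a COMRESET
0x000b 2 1505 CRC errors within host-to-device FIS
"""

# Row layout: ID, size in bytes, value, free-text description.
# Assumption: a trailing '+' after the value is tolerated in case the
# tool marks a saturated counter that way.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\+?\s+(.+)$")


def parse_phy_counters(text):
    """Return {counter_id: (value, description)} for every counter row found."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counter_id, _size, value, description = match.groups()
            counters[counter_id.lower()] = (int(value), description)
    return counters


def nonzero_counters(counters):
    """Return the sorted IDs of counters whose value is greater than zero."""
    return sorted(cid for cid, (value, _desc) in counters.items() if value > 0)


if __name__ == "__main__":
    parsed = parse_phy_counters(SAMPLE)
    for cid in nonzero_counters(parsed):
        value, description = parsed[cid]
        print(f"{cid}  {value:>6}  {description}")
```

Running the same parser over the counters from one of the healthy disks in the pool would show whether these values are unusual for this machine or common to every port.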