I have a stock TrueNAS Mini X+ running TrueNAS-SCALE-23.10.2 with two 4-month-old 14TB Red Pro drives configured in a mirror. They have run fine for those 4 months, with regular scrubs and SMART tests passing. Today, midway through a cloud sync task and with both drives still physically inside the enclosure, I got an email alert saying:
ZFS has detected that a device was removed.
impact: Fault tolerance of the pool may be compromised.
eid: 31
class: statechange
state: REMOVED
host: truenas
time: 2024-08-24 15:36:21-0400
vpath: /dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325
vguid: 0x10A1847C42757226
pool: pool-1 (0x3862E2ECDAE59FE6)
and then
Pool pool-1 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk WDC_WD142KFGX-68AFPN0 6AGHKUNX is REMOVED
Current alerts:
Pool pool-1 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk WDC_WD142KFGX-68AFPN0 6AGHKUNX is REMOVED
I ran the diagnostic commands shown below. How do I figure out whether the drive is bad, a connection is loose, or something else is wrong? Thanks.
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 12.7T 0 disk
└─sda1 8:1 0 7.3T 0 part
nvme0n1 259:0 0 232.9G 0 disk
├─nvme0n1p1 259:1 0 260M 0 part
├─nvme0n1p2 259:2 0 216.6G 0 part
└─nvme0n1p3 259:3 0 16G 0 part
└─nvme0n1p3 253:0 0 16G 0 crypt [SWAP]
zpool status
pool: boot-pool
state: ONLINE
status: One or more features are enabled on the pool despite not being
requested by the 'compatibility' property.
action: Consider setting 'compatibility' to an appropriate value, or
adding needed features to the relevant file in
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d.
scan: scrub repaired 0B in 00:00:13 with 0 errors on Mon Aug 19 03:45:14 2024
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
nvme0n1p2 ONLINE 0 0 0
errors: No known data errors
pool: pool-1
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0B in 04:18:12 with 0 errors on Sun Aug 11 06:18:13 2024
config:
NAME STATE READ WRITE CKSUM
pool-1 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
1322e964-85c3-40b4-87ad-5d8f1cbdb325 REMOVED 0 0 0
cf27888e-30ec-4012-b4ef-a864853fc485 ONLINE 0 0 0
errors: No known data errors
dmesg -H
[Aug24 15:35] ata1.00: exception Emask 0x0 SAct 0x400a3102 SErr 0x0 action 0x6 frozen
[ +0.001215] ata1.00: failed command: READ FPDMA QUEUED
[ +0.001192] ata1.00: cmd 60/08:08:08:f5:2b/00:00:36:00:00/40 tag 1 ncq dma 4096 in
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.002539] ata1.00: status: { DRDY }
[ +0.001154] ata1.00: failed command: READ FPDMA QUEUED
[ +0.001111] ata1.00: cmd 60/08:40:a0:12:a9/00:00:26:02:00/40 tag 8 ncq dma 4096 in
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.002326] ata1.00: status: { DRDY }
[ +0.001160] ata1.00: failed command: READ FPDMA QUEUED
[ +0.001169] ata1.00: cmd 60/00:60:70:50:87/05:00:a6:01:00/40 tag 12 ncq dma 655360 in
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.002477] ata1.00: status: { DRDY }
[ +0.001238] ata1.00: failed command: READ FPDMA QUEUED
[ +0.001293] ata1.00: cmd 60/00:68:70:58:87/03:00:a6:01:00/40 tag 13 ncq dma 393216 in
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.001699] ata1.00: status: { DRDY }
[ +0.000830] ata1.00: failed command: READ FPDMA QUEUED
[ +0.000861] ata1.00: cmd 60/00:88:70:5b:87/01:00:a6:01:00/40 tag 17 ncq dma 131072 in
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.001732] ata1.00: status: { DRDY }
[ +0.000873] ata1.00: failed command: READ FPDMA QUEUED
[ +0.000873] ata1.00: cmd 60/40:98:b0:be:54/00:00:5e:02:00/40 tag 19 ncq dma 32768 in
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.001821] ata1.00: status: { DRDY }
[ +0.001011] ata1.00: failed command: READ FPDMA QUEUED
[ +0.000908] ata1.00: cmd 60/00:f0:80:15:5d/08:00:6c:02:00/40 tag 30 ncq dma 1048576 in
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ +0.001914] ata1.00: status: { DRDY }
[ +0.000962] ata1: hard resetting link
[ +5.333491] ata1: link is slow to respond, please be patient (ready=0)
[ +4.627922] ata1: SATA link down (SStatus 0 SControl 300)
[ +0.619080] ata1: hard resetting link
[ +5.364792] ata1: link is slow to respond, please be patient (ready=0)
[ +4.683928] ata1: COMRESET failed (errno=-16)
[ +0.001275] ata1: hard resetting link
[ +4.082624] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ +0.002146] ata1.00: revalidation failed (errno=-2)
[ +5.077768] ata1: hard resetting link
[ +0.318217] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ +0.002313] ata1.00: revalidation failed (errno=-2)
[ +0.000698] ata1.00: disable device
[ +0.006788] sd 0:0:0:0: rejecting I/O to offline device
[ +0.000707] I/O error, dev sdb, sector 9378152200 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 2
[ +0.000730] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=4801611829248 size=462848 flags=1573248
[ +0.001515] I/O error, dev sdb, sector 908887112 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000059] I/O error, dev sdb, sector 778071744 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.000747] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=465348104192 size=4096 flags=1573248
[ +0.000852] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=398370635776 size=4096 flags=1573248
[ +0.001702] I/O error, dev sdb, sector 4624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.002618] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=270336 size=8192 flags=721089
[ +0.001966] I/O error, dev sdb, sector 15628055568 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.001513] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=8001562353664 size=8192 flags=721089
[ +0.002353] I/O error, dev sdb, sector 15628056080 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.001105] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=8001562615808 size=8192 flags=721089
[ +0.696124] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ +0.003413] sd 0:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=60s
[ +0.002118] sd 0:0:0:0: [sdb] tag#1 Sense Key : Not Ready [current]
[ +0.002027] sd 0:0:0:0: [sdb] tag#1 Add. Sense: Logical unit not ready, hard reset required
[ +0.002118] sd 0:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 36 2b f5 08 00 00 00 08 00 00
[ +0.002075] I/O error, dev sdb, sector 908850440 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.002151] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=465329328128 size=4096 flags=1573248
[ +0.004538] sd 0:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=63s
[ +0.002358] sd 0:0:0:0: [sdb] tag#8 Sense Key : Not Ready [current]
[ +0.002270] sd 0:0:0:0: [sdb] tag#8 Add. Sense: Logical unit not ready, hard reset required
[ +0.002334] sd 0:0:0:0: [sdb] tag#8 CDB: Read(16) 88 00 00 00 00 02 26 a9 12 a0 00 00 00 08 00 00
[ +0.002307] I/O error, dev sdb, sector 9238549152 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ +0.002382] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=4730135068672 size=4096 flags=1573248
[ +0.004844] sd 0:0:0:0: [sdb] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=87s
[ +0.002550] sd 0:0:0:0: [sdb] tag#12 Sense Key : Not Ready [current]
[ +0.002555] sd 0:0:0:0: [sdb] tag#12 Add. Sense: Logical unit not ready, hard reset required
[ +0.002530] sd 0:0:0:0: [sdb] tag#12 CDB: Read(16) 88 00 00 00 00 01 a6 87 50 70 00 00 05 00 00 00
[ +0.002576] I/O error, dev sdb, sector 7088853104 op 0x0:(READ) flags 0x0 phys_seg 10 prio class 2
[ +0.002590] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=3629490692096 size=655360 flags=1074267264
[ +0.005365] sd 0:0:0:0: [sdb] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=87s
[ +0.001670] sd 0:0:0:0: [sdb] tag#13 Sense Key : Not Ready [current]
[ +0.001554] sd 0:0:0:0: [sdb] tag#13 Add. Sense: Logical unit not ready, hard reset required
[ +0.001523] sd 0:0:0:0: [sdb] tag#13 CDB: Read(16) 88 00 00 00 00 01 a6 87 58 70 00 00 03 00 00 00
[ +0.001509] I/O error, dev sdb, sector 7088855152 op 0x0:(READ) flags 0x0 phys_seg 6 prio class 2
[ +0.001564] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=3629491740672 size=393216 flags=1074267264
[ +0.003268] sd 0:0:0:0: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=87s
[ +0.001590] sd 0:0:0:0: [sdb] tag#17 Sense Key : Not Ready [current]
[ +0.001572] sd 0:0:0:0: [sdb] tag#17 Add. Sense: Logical unit not ready, hard reset required
[ +0.001585] sd 0:0:0:0: [sdb] tag#17 CDB: Read(16) 88 00 00 00 00 01 a6 87 5b 70 00 00 01 00 00 00
[ +0.001544] zio pool=pool-1 vdev=/dev/disk/by-partuuid/1322e964-85c3-40b4-87ad-5d8f1cbdb325 error=5 type=1 offset=3629492133888 size=131072 flags=1573248
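To cross-check that the timed-out ATA commands are the same reads that later show up as I/O errors on sdb, here is a minimal Python sketch that decodes the `cmd` field of the `ata1.00` lines. It assumes the usual libata print order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count sits in the feature bytes and the tag in the upper bits of nsect. The function name `decode_ncq_taskfile` is made up for illustration.

```python
def decode_ncq_taskfile(cmd: str) -> dict:
    """Decode the 'cmd' field of a libata error line for an NCQ command.

    Assumed layout (hex bytes):
      command/feature:nsect:lbal:lbam:lbah/
      hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    _command, low, high, _device = cmd.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )

    # 48-bit LBA: the "hob" (high order byte) registers hold the upper 24 bits.
    lba = (
        hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
        | lbah << 16 | lbam << 8 | lbal
    )
    # For NCQ reads the transfer length (in 512-byte sectors) is carried in
    # the feature registers; a value of 0 (meaning 65536) is not handled here.
    sectors = hob_feature << 8 | feature
    # The NCQ tag is stored in bits 7:3 of the sector count register.
    tag = nsect >> 3
    return {"lba": lba, "sectors": sectors, "tag": tag}


if __name__ == "__main__":
    # 'cmd' fields copied from the dmesg output above (tags 1, 8 and 12).
    for cmd in (
        "60/08:08:08:f5:2b/00:00:36:00:00/40",
        "60/08:40:a0:12:a9/00:00:26:02:00/40",
        "60/00:60:70:50:87/05:00:a6:01:00/40",
    ):
        info = decode_ncq_taskfile(cmd)
        print(cmd, "->", info, "bytes:", info["sectors"] * 512)
```

For the three commands listed, the decoded LBAs are 908850440, 9238549152 and 7088853104, which are the same sectors reported in the later `I/O error, dev sdb` lines for tags 1, 8 and 12. So the kernel is failing the very reads that timed out before the link reset, rather than reporting unrelated bad sectors. That by itself does not tell a failing drive apart from a bad cable or backplane connection.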