I’ve got a bunch of Western Digital Red HDDs in a zpool. It’s been running reliably for a long time, but recently I’ve noticed the occasional “hiccup” when reading files via Samba. A read will hang briefly (5-10 seconds) and then return to normal operation.
All HDDs pass SMART tests. The last zpool scrub found no errors, and zpool status shows no errors.
I found the following in /var/log/messages. Should I be concerned that the da3 disk is failing?
Nov 4 06:17:27 grimlock (da3:mpr1:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1891 Command timeout on target 9(0x000b), 60000 set, 60.1335316 elapsed
Nov 4 06:17:27 grimlock mpr1: At enclosure level 0, slot 1, connector name ( )
Nov 4 06:17:27 grimlock mpr1: Sending abort to target 9 for SMID 1891
Nov 4 06:17:27 grimlock (da3:mpr1:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1891 Aborting command 0xfffffe015894d968
Nov 4 06:17:28 grimlock (da3:mpr1:0:9:0): READ(16). CDB: 88 00 00 00 00 04 08 40 24 48 00 00 00 08 00 00 length 4096 SMID 1465 Command timeout on target 9(0x000b), 60000 set, 60.1543062 elapsed
Nov 4 06:17:28 grimlock mpr1: At enclosure level 0, slot 1, connector name ( )
Nov 4 06:17:28 grimlock (da3:mpr1:0:9:0): READ(16). CDB: 88 00 00 00 00 03 fc 45 21 80 00 00 00 08 00 00 length 4096 SMID 1182 Command timeout on target 9(0x000b), 60000 set, 60.86710698 elapsed
Nov 4 06:17:28 grimlock mpr1: At enclosure level 0, slot 1, connector name ( )
Nov 4 06:17:28 grimlock (da3:mpr1:0:9:0): READ(16). CDB: 88 00 00 00 00 04 38 4e 61 d0 00 00 00 08 00 00 length 4096 SMID 1713 Command timeout on target 9(0x000b), 60000 set, 60.173122035 elapsed
Nov 4 06:17:28 grimlock mpr1: At enclosure level 0, slot 1, connector name ( )
Nov 4 06:17:30 grimlock mpr1: mprsas_prepare_remove: Sending reset for target ID 9
Nov 4 06:17:31 grimlock mpr1: Controller reported scsi ioc terminated tgt 9 SMID 943 loginfo 31130000 departing
Nov 4 06:17:31 grimlock mpr1: da3 at mpr1 bus 0 scbus13 target 9 lun 0
Nov 4 06:17:31 grimlock da3: <ATA ST16000NM001G-2K SN02> s/n ZL216E8H detached
Nov 4 06:17:31 grimlock Controller reported scsi ioc terminated tgt 9 SMID 1465 loginfo 31130000 departing
Nov 4 06:17:31 grimlock mpr1: Controller reported scsi ioc terminated tgt 9 SMID 1182 loginfo 31130000 departing
Nov 4 06:17:31 grimlock mpr1: Controller reported scsi ioc terminated tgt 9 SMID 1713 loginfo 31130000 departing
Nov 4 06:17:31 grimlock mpr1: No pending commands: starting remove_device
Nov 4 06:17:31 grimlock mpr1: clearing target 9 handle 0x000b
Nov 4 06:17:31 grimlock mpr1: At enclosure level 0, slot 1, connector name ( )
Nov 4 06:17:31 grimlock mpr1: Finished abort recovery for target 9
Nov 4 06:17:31 grimlock (da3:mpr1:0:9:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Nov 4 06:17:31 grimlock (da3:mpr1:0:9:0): CAM status: Command timeout
Nov 4 06:17:31 grimlock (da3:mpr1:0:9:0): Error 5, Periph was invalidated
Nov 4 06:17:31 grimlock (da3:mpr1:0:9:0): Periph destroyed
Nov 4 06:17:32 grimlock mpr1: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000b> enclosureHandle<0x0002> slot 1
Nov 4 06:17:32 grimlock mpr1: At enclosure level 0 and connector name ( )
Nov 4 06:17:32 grimlock ses3: da3,pass7 in 'Slot01', SAS Slot: 1 phys at slot 1
Nov 4 06:17:32 grimlock ses3: phy 0: SATA device
Nov 4 06:17:32 grimlock ses3: phy 0: parent 5003048001a7ccbf addr 5003048001a7cc81
Nov 4 06:17:32 grimlock da3 at mpr1 bus 0 scbus13 target 9 lun 0
Nov 4 06:17:32 grimlock da3: <ATA ST16000NM001G-2K SN02> Fixed Direct Access SPC-4 SCSI device
Nov 4 06:17:32 grimlock da3: Serial Number ZL216E8H
Nov 4 06:17:32 grimlock da3: 1200.000MB/s transfers
Nov 4 06:17:32 grimlock da3: Command Queueing enabled
Nov 4 06:17:32 grimlock da3: 15259648MB (31251759104 512 byte sectors)
It’s normally a sign the heads are aggressively parking themselves, which over time will reduce their life expectancy.
But this is separate to your main issue. It appears da3 is sometimes losing communication. What hardware are you running and how are the drives connected to the mobo?
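If it helps to see how often this is happening, and whether it is only da3, here is a rough Python sketch that tallies command timeouts and detach events per disk from the log. The function name `summarize` and both regular expressions are my own, written against the line formats in the excerpt above; other driver versions may word these messages differently, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# Assumed patterns, based only on the mpr/CAM lines quoted in this thread.
# e.g. "(da3:mpr1:0:9:0): READ(16). ... Command timeout on target 9(0x000b), ..."
TIMEOUT_RE = re.compile(r"\((da\d+):\w+:\d+:\d+:\d+\):.*Command timeout on target")
# e.g. "da3: <ATA ST16000NM001G-2K SN02> s/n ZL216E8H detached"
DETACH_RE = re.compile(r"\b(da\d+): <.*> s/n \S+ detached")


def summarize(lines):
    """Count command timeouts and detach events per da device."""
    timeouts = Counter()
    detaches = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = DETACH_RE.search(line)
        if match:
            detaches[match.group(1)] += 1
    return timeouts, detaches


# Example use against the system log (path taken from the post above):
# with open("/var/log/messages", errors="replace") as fh:
#     timeouts, detaches = summarize(fh)
# for dev in sorted(set(timeouts) | set(detaches)):
#     print(f"{dev}: {timeouts[dev]} timeouts, {detaches[dev]} detaches")
```

If every event lands on the same da device, that points at that drive, its slot, or its cabling. If the events are spread across several disks, the HBA, backplane, or power would be the more likely suspect.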