Hi All,
Update:
I have does some more digging on this and it looks like the sysstat-collect.service tries to get performance information from the SSD but the SSD sometimes hangs resulting in a timeout. Presumably, when an write operation overlaps with this hang event the operation errors out and the pool gets put into a fault state. I have not yet figured out how to correct this issue. The SSD is also not in the smartd database for some reason and that may be related…
I have been kinda ignoring this issue for a while. However, I am trying to solve all my issues before i attempt to virtualizate truenas.
I have a single disk pool on an SSD (p220 listed below). That disk is holds my apps and virtual machines and is the target for the logging etc. so that my spinning disks can spin down most of the time.
I tracked down the following in the syslog but don’t know how to interpret it. Any input guidance or direction for research would be appreciated.
Jun 30 17:17:01 truenas CRON[678631]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 30 17:20:00 truenas systemd[1]: Starting sysstat-collect.service - system activity accounting tool…
Jun 30 17:20:00 truenas systemd[1]: sysstat-collect.service: Deactivated successfully.
Jun 30 17:20:00 truenas systemd[1]: Finished sysstat-collect.service - system activity accounting tool.
Jun 30 17:21:11 truenas kernel: sd 9:0:4:0: attempting task abort!scmd(0x000000007ac1f3dc), outstanding for 30068 ms & timeout 30000 ms
Jun 30 17:21:11 truenas kernel: sd 9:0:4:0: [sde] tag#7835 CDB: Write(10) 2a 00 3d 04 45 a8 00 00 08 00
Jun 30 17:21:11 truenas kernel: scsi target9:0:4: handle(0x0009), sas_address(0x4433221102000000), phy(2)
Jun 30 17:21:11 truenas kernel: scsi target9:0:4: enclosure logical id(0x500605b00cfc4e30), slot(4)
Jun 30 17:21:11 truenas kernel: scsi target9:0:4: enclosure level(0x0000), connector name( )
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: task abort: SUCCESS scmd(0x000000007ac1f3dc)
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7835 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=32s
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7835 CDB: Write(10) 2a 00 3d 04 45 a8 00 00 08 00
Jun 30 17:21:14 truenas kernel: I/O error, dev sde, sector 1023690152 op 0x1:(WRITE) flags 0x100 phys_seg 1 prio class 0
Jun 30 17:21:14 truenas kernel: zio pool=Virtualization vdev=/dev/disk/by-partuuid/c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b error=5 type=2 offset=524127260672 size=4096 flags=3145856
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: attempting task abort!scmd(0x00000000d1c61bd3), outstanding for 31140 ms & timeout 30000 ms
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7837 CDB: Write(10) 2a 00 55 3a 69 b0 00 00 08 00
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: handle(0x0009), sas_address(0x4433221102000000), phy(2)
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: enclosure logical id(0x500605b00cfc4e30), slot(4)
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: enclosure level(0x0000), connector name( )
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: No reference found at driver, assuming scmd(0x00000000d1c61bd3) might have completed
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: task abort: SUCCESS scmd(0x00000000d1c61bd3)
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7837 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7837 CDB: Write(10) 2a 00 55 3a 69 b0 00 00 08 00
Jun 30 17:21:14 truenas kernel: I/O error, dev sde, sector 1429891504 op 0x1:(WRITE) flags 0x100 phys_seg 1 prio class 0
Jun 30 17:21:14 truenas kernel: zio pool=Virtualization vdev=/dev/disk/by-partuuid/c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b error=5 type=2 offset=732102352896 size=4096 flags=3162240
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: attempting task abort!scmd(0x00000000aaeeea19), outstanding for 31144 ms & timeout 30000 ms
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7836 CDB: Write(10) 2a 00 55 3a 69 a8 00 00 08 00
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: handle(0x0009), sas_address(0x4433221102000000), phy(2)
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: enclosure logical id(0x500605b00cfc4e30), slot(4)
Jun 30 17:21:14 truenas kernel: scsi target9:0:4: enclosure level(0x0000), connector name( )
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: No reference found at driver, assuming scmd(0x00000000aaeeea19) might have completed
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: task abort: SUCCESS scmd(0x00000000aaeeea19)
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7836 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
Jun 30 17:21:14 truenas kernel: sd 9:0:4:0: [sde] tag#7836 CDB: Write(10) 2a 00 55 3a 69 a8 00 00 08 00
Jun 30 17:21:14 truenas kernel: I/O error, dev sde, sector 1429891496 op 0x1:(WRITE) flags 0x100 phys_seg 1 prio class 0
Jun 30 17:21:14 truenas kernel: zio pool=Virtualization vdev=/dev/disk/by-partuuid/c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b error=5 type=2 offset=732102348800 size=4096 flags=3162240
Jun 30 17:21:15 truenas kernel: sd 9:0:4:0: Power-on or device reset occurred
Jun 30 17:21:15 truenas zed[687713]: eid=1252 class=io pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=2048 offset=732102348800 priority=3 err=5 flags=0x304080 delay=31144ms bookmark=140833:9323:0:0
Jun 30 17:21:15 truenas zed[687715]: eid=1251 class=delay pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=4096 offset=732102352896 priority=3 err=5 flags=0x304080 delay=31143ms bookmark=140833:5000:0:0
Jun 30 17:21:15 truenas zed[687714]: eid=1250 class=delay pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=4096 offset=524127260672 priority=1 err=5 flags=0x300080 delay=32753ms bookmark=140833:0:-2:11652
Jun 30 17:21:15 truenas zed[687709]: eid=1249 class=delay pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=2048 offset=732102348800 priority=3 err=5 flags=0x304080 delay=31144ms bookmark=140833:9323:0:0
Jun 30 17:21:15 truenas zed[687717]: eid=1253 class=io pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=4096 offset=524127260672 priority=1 err=5 flags=0x300080 delay=32753ms bookmark=140833:0:-2:11652
Jun 30 17:21:15 truenas zed[687719]: eid=1254 class=io pool=‘Virtualization’ vdev=c334f3fa-04c3-447e-b5f4-ab87ff0bdd6b size=4096 offset=732102352896 priority=3 err=5 flags=0x304080 delay=31143ms bookmark=140833:5000:0:0
Jun 30 17:23:17 truenas systemd[1]: session-711.scope: Deactivated successfully.
Jun 30 17:23:17 truenas systemd[1]: session-711.scope: Consumed 6min 55.267s CPU time.
Jun 30 17:23:27 truenas systemd[1]: Stopping user@0.service - User Manager for UID 0…
I also found this… which makes me wonder if it is an issue with the smart feature.
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], Patriot P220 1024GB, S/N:P220NDCB2310230350, WWN:5-000000-000000186, FW:SN15926, 1.02 TB
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], not found in smartd database 7.3/5770.
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], can’t monitor Current_Pending_Sector count - no Attribute 197
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], can’t monitor Offline_Uncorrectable count - no Attribute 198
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], is SMART capable. Adding to “monitor” list.
Jun 29 00:03:08 truenas smartd[3455]: Device: /dev/sde [SAT], state read from /var/lib/smartmontools/smartd.Patriot_P220_1024GB-P220NDCB2310230350.ata.state