I started using the excellent multi-report recently to keep an eye on my SAS disks.
I have 41x 1.6TB SAS SSDs connected via a Dell PERC H310 and an HP SAS expander in an HP DL380 G6. The disks were gifted from a NetApp, with ~65k hours of use on each, and were reformatted to 512-byte sectors.
I scrub monthly, do long tests weekly and short tests daily.
One disk in the system is “stuck” doing a long test that started about a week ago. I noticed today after it failed the multi-report run, as there was no successful long test for today.
I’ve tried issuing ‘smartctl -X /dev/sde’ to abort the test, but get ‘Abort self test failed [unsupported field in scsi command]’.
Disk info gathered just now:
root@eurybia[...l1/plex_transcodes/Transcode/Sessions]# smartctl -a /dev/sde
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.29-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: NETAPP
Product: X439_S16331T6AMD
Revision: NA04
Compliance: SPC-4
User Capacity: 1,600,321,314,816 bytes [1.60 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is resource provisioned, LBPRZ=1
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x5002538a75801c60
Serial number: S20JNWAG800454
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Mon Jun 17 09:13:52 2024 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Percentage used endurance indicator: 0%
Current Drive Temperature: 30 C
Drive Trip Temperature: 60 C
Accumulated power on time, hours:minutes 64795:52
Manufactured in week 31 of year 2015
Accumulated start-stop cycles: 262
Specified load-unload count over device lifetime: 0
Accumulated load-unload cycles: 0
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      23313.519           0
write:         0        0         0         0          0      28652.495           0
verify:        0        0         0         0          0     247014.160           0
Non-medium error count: 60
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   64769                 - [-   -    -]
# 2  Background short  Completed                   -   64745                 - [-   -    -]
# 3  Background short  Completed                   -   64721                 - [-   -    -]
# 4  Background short  Completed                   -   64697                 - [-   -    -]
# 5  Background short  Completed                   -   64673                 - [-   -    -]
# 6  Background short  Completed                   -   64651                 - [-   -    -]
# 7  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 8  Background short  Completed                   -   64601                 - [-   -    -]
# 9  Background short  Completed                   -   64577                 - [-   -    -]
#10  Background short  Completed                   -   64553                 - [-   -    -]
Long (extended) Self-test duration: 3600 seconds [60.0 minutes]
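As a rough sanity check on how long that "in progress" entry has been sitting there, here is a minimal Python sketch that works it out from the output above. The names `estimate_stuck_tests`, `ROW`, `POWER_ON` and `SAMPLE` are my own and purely illustrative, not part of smartctl or multi-report. It assumes the self-test log is listed newest first (which the LifeTime values above suggest) and that the LifeTime column is directly comparable to the accumulated power-on hours.

```python
import re

# One row of the self-test log as smartctl prints it for SAS disks, e.g.
# "# 7 Background long Self test in progress ... - NOW - [- - -]"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+Background\s+(?P<kind>short|long)\s+"
    r"(?P<status>.+?)\s+(?P<segment>-|\d+)\s+(?P<lifetime>NOW|\d+)\s+"
)
POWER_ON = re.compile(r"Accumulated power on time, hours:minutes\s+(\d+):\d+")

# Excerpt of the smartctl -a output from this post.
SAMPLE = """\
Accumulated power on time, hours:minutes 64795:52
# 1 Background short Completed - 64769 - [- - -]
# 5 Background short Completed - 64673 - [- - -]
# 6 Background short Completed - 64651 - [- - -]
# 7 Background long Self test in progress ... - NOW - [- - -]
# 8 Background short Completed - 64601 - [- - -]
# 9 Background short Completed - 64577 - [- - -]
"""


def estimate_stuck_tests(smart_text):
    """Bound how long ago each 'NOW' self-test entry could have started."""
    match = POWER_ON.search(smart_text)
    if match is None:
        raise ValueError("power-on hours not found in smartctl output")
    now_hours = int(match.group(1))

    rows = [ROW.match(line.strip()) for line in smart_text.splitlines()]
    rows = [row for row in rows if row]

    results = []
    for i, row in enumerate(rows):
        if row["lifetime"] != "NOW":
            continue
        # Assumption: newest entries come first, so rows above are newer
        # than the in-progress test and rows below are older.
        newer = [int(r["lifetime"]) for r in rows[:i] if r["lifetime"] != "NOW"]
        older = [int(r["lifetime"]) for r in rows[i + 1:] if r["lifetime"] != "NOW"]
        results.append({
            "num": int(row["num"]),
            "kind": row["kind"],
            "min_hours_ago": now_hours - min(newer) if newer else 0,
            "max_hours_ago": now_hours - max(older) if older else None,
        })
    return results


print(estimate_stuck_tests(SAMPLE))
```

On the log above this puts entry 7 as having started somewhere between 144 and 194 power-on hours ago (roughly six to eight days), against a reported long test duration of 60 minutes, which matches the test having started about a week ago.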
The disk is part of my boot-pool mirror. I've considered swapping the disk out for a spare and then back in again, to see if that would clear the ongoing test. Any other ideas?