OK, for context, I’m pretty new at this NAS stuff. I bought a used T320, updated both the BIOS and the Lifecycle Controller firmware to the latest versions, flashed the controller into “IT” mode, changed the CMOS battery, and got iDRAC set up. I’m also on the latest version of TrueNAS. I only have a boot pool at the moment, and no other storage pools.
The problem is that I have now purchased 10 x 4TB used Seagate SAS drives on eBay. The first thing I did was run a long health scan on 5 drives (I only have 8 bays, so I was planning on doing 2 passes, testing 5 drives each). All 5 drives passed with no problems, but TrueNAS presented me with a critical error saying (DIF) was unsupported on those drives. A quick Google search told me this could be solved by formatting the drives, so I ran “sudo sg_format --format --size=512 /dev/sda” (once per drive) in parallel via tmux, as advised. This worked for 3 of the 5 drives, but I got this error when attempting the other two:
Format unit in progress, 7.57% done
test unit ready:
Descriptor format, <<<deferred>>>; Sense Key: Hardware Error
Additional Sense: Defect list not found
Descriptor type: Information: 0x00000000228183a0
Descriptor type: Field Replaceable unit code: 0x93
Descriptor type: Vendor specific [0x80]
00 00 00 00 00 00 00 00 00 00 00 00 00 00
FORMAT UNIT Complete
truenas_admin@truenas[~]$
(I had to manually type in the above because it’s not letting me paste images, but it should be verbatim apart from indentation and spacing.)
I retried both drives, unplugged them and replugged them into different slots, and removed all the other, non-problematic drives from their bays. I also just pulled the plug on the system because I heard a full power cycle might help, but I’m not expecting much to change on this next attempt. Normally I would chalk this up to the drives being bad, but the fact that they passed the long test with no errors has been bugging me. Anyone have any ideas? Here’s some more info on the drives:
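For reference, the parallel sg_format run described above could be scripted roughly like this. It is only a sketch: the drive list and the tmux session names are made up for illustration, and it prints the commands rather than running them, since sg_format is destructive.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# DRIVES is a hypothetical list; substitute the devices you actually mean to wipe.
DRIVES="/dev/sda /dev/sdb"

# Build the sg_format invocation for one device (same flags as in the post).
format_cmd() {
    printf 'sudo sg_format --format --size=512 %s' "$1"
}

# Build a tmux command that runs the format in its own detached session,
# so each drive can be formatted in parallel and checked on later.
tmux_cmd() {
    name="fmt-$(basename "$1")"
    printf "tmux new-session -d -s %s '%s'" "$name" "$(format_cmd "$1")"
}

for dev in $DRIVES; do
    tmux_cmd "$dev"
    printf '\n'
done
```

Each printed line can be pasted into a shell once the device name has been double-checked; "tmux attach -t fmt-sda" (session name assumed from the sketch) would then show that drive's progress.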
truenas_admin@truenas[~]$ sudo smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST4000NM0023
Revision: XMGH
Compliance: SPC-4
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5006276db6f
Serial number: Z1Z5S1QF0000C510AFWY
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Sun Feb 2 10:15:05 2025 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 38 C
Drive Trip Temperature: 68 C
scsiPrintBackgroundResults Failed [medium or hardware error (serious)]
Manufactured in week 38 of year 2014
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 610
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 3533
Read defect list: asked for grown list but didn't get it
Vendor (Seagate Cache) information
Blocks sent to initiator = 3534769525
Blocks received from initiator = 256483996
Blocks read from cache and sent to initiator = 2498505367
Number of read and write commands whose size <= segment size = 1325140844
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 56937.43
number of minutes until next internal SMART test = 59
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3873026852        0         0  3873026852          0     678553.128           0
write:           0        0         0           0          0     276646.093           0
verify:    6471279        0         0     6471279          0          0.000           0
Non-medium error count: 3586938
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   56886                 - [-   -    -]
# 2  Background short  Completed                   -   18028                 - [-   -    -]
# 3  Background short  Completed                   -   15849                 - [-   -    -]
# 4  Background short  Completed                   -   11302                 - [-   -    -]
# 5  Background short  Completed                   -   10681                 - [-   -    -]
# 6  Background short  Completed                   -    1789                 - [-   -    -]
# 7  Background short  Completed                   -    1646                 - [-   -    -]
Long (extended) Self-test duration: 32700 seconds [9.1 hours]
truenas_admin@truenas[~]$ sudo smartctl -a /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.44-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST4000NM0023
Revision: XMGH
Compliance: SPC-4
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c5006277660b
Serial number: Z1Z5SJ080000C510BZ1Y
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Sun Feb 2 10:16:51 2025 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 37 C
Drive Trip Temperature: 68 C
scsiPrintBackgroundResults Failed [medium or hardware error (serious)]
Manufactured in week 38 of year 2014
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 607
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 3508
Read defect list: asked for grown list but didn't get it
Vendor (Seagate Cache) information
Blocks sent to initiator = 2065260292
Blocks received from initiator = 875576814
Blocks read from cache and sent to initiator = 519023055
Number of read and write commands whose size <= segment size = 1883588032
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 56904.57
number of minutes until next internal SMART test = 59
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1639291679        0         0  1639291679          0     762657.538           0
write:           0        0         0           0          0     373244.293           0
verify:    2630170        0         0     2630170          0          0.000           0
Non-medium error count: 3571678
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   56853                 - [-   -    -]
# 2  Background short  Completed                   -   18028                 - [-   -    -]
# 3  Background short  Completed                   -   15850                 - [-   -    -]
# 4  Background short  Completed                   -   11303                 - [-   -    -]
# 5  Background short  Completed                   -   10682                 - [-   -    -]
# 6  Background short  Completed                   -    1790                 - [-   -    -]
# 7  Background short  Completed                   -    1646                 - [-   -    -]
Long (extended) Self-test duration: 32700 seconds [9.1 hours]
This is an example of what they looked like before the first reformat:
(I can’t post images, but it said user capacity [4.00 TB], logical block size: 512, and “Formatted with type 2 protection, 8 bytes of protection information per logical block”, all of which is now missing from the information sections of these two drives.)
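A quick way to see which drives still report that protection line might look like the sketch below. It assumes the wording in "smartctl -i" output matches the "Formatted with type 2 protection" text quoted above; the helper name has_protection is made up for illustration.

```shell
#!/bin/sh
# Sketch only. Reads `smartctl -i` output on stdin and succeeds if it still
# contains a "Formatted with type N protection" line like the one quoted above.
# (The exact wording is an assumption based on that quote.)
has_protection() {
    grep -qi 'formatted with type [0-9] protection'
}

# Device paths are passed as arguments (run as root); with no arguments
# nothing is queried.
for dev in "$@"; do
    if smartctl -i "$dev" | has_protection; then
        echo "$dev: protection information still reported"
    else
        echo "$dev: no protection line found"
    fi
done
```

Saved under a name of your choosing and run as root with the device paths as arguments, it prints one line per drive. On the two failed drives it would presumably report no protection line, since the whole capacity and format block is missing from their output.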
It is also worth noting that each drive appears to throw the error at around the same percentage on every attempt: one drive fails in the low 20s and the other in the mid 70s. Cheers
Edit: just calling sg_format with -v gave me this, which might be relevant: