Scrutiny is listing 5 drives as failed. All 5 are HGST H7240AS60SUN4.0T 4 TB drives. When I go into the details I see that all status entries are listed as “PASSED”.
sudo smartctl -H /dev/sdk
returns
SMART Health Status: OK
Running a long test still returns Health Status OK (after 10+ hours).
sg_logs for the same device doesn’t show any red flags either.
I have replacement drives on order. Just wondering what other users are seeing.
Scrutiny has different thresholds on some metrics than SMART has. You can set it to use both thresholds, only the Scrutiny ones, or only the SMART ones.
If you don’t trust the Scrutiny values, just use the SMART ones.
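If you go the config-file route, the relevant knob lives under metrics.status in Scrutiny’s web config. A sketch is below; the key names are from memory of the project’s example config, so verify them against the example.scrutiny.yaml that ships with your version:

```yaml
# scrutiny.yaml (web/API config) -- a sketch; check the key names against
# your version's example.scrutiny.yaml before relying on them.
metrics:
  status:
    # which thresholds mark a device as failed:
    #   smart    - only SMART's own thresholds
    #   scrutiny - only Scrutiny's thresholds
    #   both     - either one
    threshold: smart
```

The same choice is also exposed in the web UI settings, which is the easier place to flip it.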
Never believe this indication blindly. The only time to trust it is when it says “FAILED”. The “PASSED” indication is a general summary of the drive electronics, not a summary of the media itself, unless the media has failed very badly.
Have you looked at “Details” in Scrutiny to see what it is complaining about?
Of course I like the flowchart idea, since I wrote it. It will also educate you a little as you use it. Being educated on these SMART things is, I feel, critical if you plan to run a NAS.
Last thing: before replacing any drive, prove to yourself that the drive has actually failed.
If you have any questions about reading the SMART data, post the output here and someone will dissect it for you.
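For anyone doing the dissecting, the headline lines in a SAS drive’s dump can be pulled out with grep. This is just a sketch: the sample string stands in for real smartctl output, and on actual hardware you would pipe `sudo smartctl -x /dev/sdX` through the same filter.

```shell
#!/bin/sh
# Pull the headline lines from a `smartctl -x` dump. The sample text below
# stands in for real output; with hardware you would run something like:
#   sudo smartctl -x /dev/sdk | grep -E 'Health Status|grown defect|Failed in segment'
sample='SMART Health Status: OK
Elements in grown defect list: 0
# 1  Background long   Failed in segment -->       3     17767'

printf '%s\n' "$sample" | grep -E 'Health Status|grown defect|Failed in segment'
```

A healthy drive prints no “Failed in segment” lines and shows 0 elements in the grown defect list.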
I did; I just posted the -H summary as an example. -a and -x only show an abbreviated report. Here’s a full dump for /dev/sdk. I see a few errors, but nothing new (I have the same report from last year). Where are the true red flags? Like I said, I have new drives on order; I’m just trying to understand what is really going on.
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.33-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: H7240AS60SUN4.0T
Revision: A3A0
Compliance: SPC-4
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca05cd2b1c4
Serial number: 001510ERW3DX PCKRW3DX
Device type: disk
Transport protocol: SAS (SPL-4)
Local Time is: Thu Dec 4 10:29:27 2025 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 23 C
Drive Trip Temperature: 85 C
Manufactured in week 10 of year 2015
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 14
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 3242
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 41443833937920
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/      errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      66815        0         0     66815      15792       2892.958           0
write:         0        0         0         0       7261        390.958           0
verify:        1        0         0         1      15939          0.000           0
Non-medium error count: 0
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       3     17767                 - [-   -    -]
# 2  Background short  Failed in segment -->       3     17571                 - [-   -    -]
# 3  Background short  Completed                   -      8032                 - [-   -    -]
Long (extended) Self-test duration: 37452 seconds [10.4 hours]
Background scan results log
Status: waiting until BMS interval timer expires
Accumulated power on time, hours:minutes 17787:36 [1067256 minutes]
Number of background scans performed: 469, scan progress: 0.00%
Number of background medium scans performed: 469
General statistics and performance log page:
General access statistics and performance:
Number of read commands: 156236340
Number of write commands: 2470245
number of logical blocks received: 763589813
number of logical blocks transmitted: 5650307902
read command processing intervals: 0
write command processing intervals: 0
weighted number of read commands plus write commands: 0
weighted read command processing plus write command processing: 0
Idle time:
Idle time intervals: 1245242572
in seconds: 62262128.600
in hours: 17295.035
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 2
number of phys = 1
phy identifier = 0
attached device type: expander device
attached reason: SMP phy control function
reason: unknown
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=1
SAS address = 0x5000cca05cd2b1c5
attached SAS address = 0x500c04f2ce0b2b3f
attached phy identifier = 27
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization count = 0
Phy reset problem count = 0
relative target port id = 2
generation code = 2
number of phys = 1
phy identifier = 1
attached device type: expander device
attached reason: SMP phy control function
reason: unknown
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=1
SAS address = 0x5000cca05cd2b1c6
attached SAS address = 0x500c04f2ce0b2bbf
attached phy identifier = 27
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization count = 0
Phy reset problem count = 0
When I go into the details I see that all status entries are listed as “PASSED”.
By changing the settings I can see the fail status is coming from the SMART data, not the Scrutiny thresholds. I’m really just trying to understand why 5 drives are suddenly showing errors when they didn’t a week ago. Probably just an EOL thing, and replacement is a good idea.
# 1  Background long   Failed in segment -->       3     17767                 - [-   -    -]
# 2  Background short  Failed in segment -->       3     17571                 - [-   -    -]
# 3  Background short  Completed                   -      8032                 - [-   -    -]
Your drive cannot read the media. If a drive fails a SMART long test, the drive is failing. This drive should be replaced.
Edit: This is what I mentioned before about “SMART Health Status: OK” — never trust it for a good status. If it tells you something bad, you can trust that. I have only seen one drive, maybe two, that actually said FAILED.
If you have not been testing your drives, as it appears from this one output, your other drives could have been bad for a long time as well. And you provided no history about these drives. Were they in a computer, or sitting on a shelf due to failure? All I can tell is that this one drive was likely powered on for the majority of its life, but that is an assumption based on the start-stop count.
If all your drives show the same failure type at about the same time, then post the data from smartctl -x /dev/sd? for every drive so we can look for something in common. However, a SMART self-test is internal to the drive, so I doubt there is anything common, but you never know.
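A loop along these lines gathers everything into one file for posting. The device names sdk through sdo are assumptions, so substitute your actual five; it only prints the commands, letting you review the list before running it as root (e.g. piping to `sudo sh`):

```shell
#!/bin/sh
# Print one collection command per suspect drive; review the list, then
# run it as root (for example, pipe it to `sudo sh`).
# The device names below are assumptions -- substitute your own.
for dev in sdk sdl sdm sdn sdo; do
  echo "smartctl -x /dev/$dev >> smart_all.txt"
done
```

The resulting smart_all.txt can be pasted here in one go.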