How reliable is Scrutiny?

Scrutiny is listing 5 drives as failed. All 5 are HGST H7240AS60SUN4.0T 4 TB drives. When I go into the details, every status entry is listed as “PASSED”.

sudo smartctl -H /dev/sdk
returns
SMART Health Status: OK

Running a long test still returns Health Status OK (after 10+ hours).
sg_logs for same device doesn’t show any red flags either.
I have replacement drives on order. Just wondering what other users are seeing.

Thanks,
Kelly

Scrutiny has different thresholds on some metrics than SMART has. You can set it to use both thresholds, only the Scrutiny ones, or only the SMART ones.
If you don’t trust the Scrutiny values, just use the SMART ones.


You may want to analyse the SMART reports yourself: -a or -x rather than -H.

It is also worth running the collector:

docker exec scrutiny /opt/scrutiny/bin/scrutiny-collector-metrics run

to see if errors disappear
(replace scrutiny with your container name, or run the command directly inside the container)

Never believe this indication blindly. The only time to trust it is when it says “FAILED”. The “PASSED” indication is a general summary of the drive electronics, not a summary of the media itself, unless the media has failed very badly.

Have you looked at “details” in Scrutiny to see what it is complaining about?

Of course I like the flowchart idea, since I wrote it. It will also educate you a bit as you use it. Being educated on these SMART things is, I feel, critical if you plan to run a NAS.

Last thing: before replacing any drive, prove to yourself that the drive has actually failed.

If you have any questions about reading the SMART data, post the output here and someone will dissect it for you.


I did; I just posted the -H summary as an example. -a and -x only show an abbreviated report. Here’s a full dump for /dev/sdk. I see a few errors, but nothing new (I have the same report from last year). Where are the true red flags? Like I said, I have new drives on order; I’m just trying to understand what is really going on.

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.33-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7240AS60SUN4.0T
Revision:             A3A0
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca05cd2b1c4
Serial number:        001510ERW3DX        PCKRW3DX
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Thu Dec  4 10:29:27 2025 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     23 C
Drive Trip Temperature:        85 C

Manufactured in week 10 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  14
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3242
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 41443833937920

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:      66815        0         0     66815      15792       2892.958           0
write:         0        0         0         0       7261        390.958           0
verify:        1        0         0         1      15939          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       3   17767                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17571                 - [-   -    -]
# 3  Background short  Completed                   -    8032                 - [-   -    -]

Long (extended) Self-test duration: 37452 seconds [10.4 hours]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 17787:36 [1067256 minutes]
    Number of background scans performed: 469,  scan progress: 0.00%
    Number of background medium scans performed: 469

General statistics and performance log page:
  General access statistics and performance:
    Number of read commands: 156236340
    Number of write commands: 2470245
    number of logical blocks received: 763589813
    number of logical blocks transmitted: 5650307902
    read command processing intervals: 0
    write command processing intervals: 0
    weighted number of read commands plus write commands: 0
    weighted read command processing plus write command processing: 0
  Idle time:
    Idle time intervals: 1245242572
      in seconds: 62262128.600
      in hours: 17295.035
Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 2
  number of phys = 1
  phy identifier = 0
    attached device type: expander device
    attached reason: SMP phy control function
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000cca05cd2b1c5
    attached SAS address = 0x500c04f2ce0b2b3f
    attached phy identifier = 27
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0
relative target port id = 2
  generation code = 2
  number of phys = 1
  phy identifier = 1
    attached device type: expander device
    attached reason: SMP phy control function
    reason: unknown
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=1
    SAS address = 0x5000cca05cd2b1c6
    attached SAS address = 0x500c04f2ce0b2bbf
    attached phy identifier = 27
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization count = 0
    Phy reset problem count = 0

When I go in to details I see that all status entries are listed as “PASSED”.

By changing the settings I can see the fail status is coming from the SMART data, not the Scrutiny thresholds. I’m really just trying to understand why 5 drives are suddenly showing errors when they didn’t a week ago. Probably just an EOL thing, and replacement is a good idea.

Here is the issue:

# 1  Background long   Failed in segment -->       3   17767                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17571                 - [-   -    -]
# 3  Background short  Completed                   -    8032                 - [-   -    -]

Your drive cannot read the media. If a drive fails a SMART long test, then the drive is failing. This drive should be replaced.
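If you want to check for this mechanically instead of eyeballing the log, a quick grep works. A minimal sketch, using the self-test log lines from the report above as sample input (the selftest.log filename is just an example):

```shell
# Count failed background self-tests in a saved smartctl report.
# The sample log below is copied from the report in this thread; on
# a real system it would come from:
#   sudo smartctl -x /dev/sdk > selftest.log
cat > selftest.log <<'EOF'
# 1  Background long   Failed in segment -->       3   17767                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17571                 - [-   -    -]
# 3  Background short  Completed                   -    8032                 - [-   -    -]
EOF
grep -c 'Failed in segment' selftest.log
```

A non-zero count means the drive could not read part of the media during a self-test, regardless of what the overall health status line says.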

Edit: This is what I mentioned before: never trust it for a good status. If it tells you something bad, you can trust that. I have only seen one drive, maybe two, that actually said FAILED.
SMART Health Status: OK


If you have not been testing your drives, as it appears from the one output, your other drives could have been bad for a long time as well. You also provided no history about these drives: were they in a computer, or sitting on a shelf after a failure? All I can tell is that this one drive was likely powered on for the majority of its life, but that is an assumption based on the start-stop count.

If all your drives show the same failure type at about the same time, then post the data for every drive from smartctl -x /dev/sd? so we can take a look and see if there is something common. However, a SMART self-test is internal to the drive, so I doubt there is anything common, but you never know.
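To batch that check across drives, a loop like the one below is a starting point. This is only a sketch: the two smart_sd*.txt sample files are created inline for illustration, where on a real system each would come from sudo smartctl -x /dev/sdX redirected to a file.

```shell
# Flag which per-drive smartctl reports contain a failed self-test.
# Sample reports are fabricated here for illustration; real ones
# would be produced per drive with:
#   sudo smartctl -x /dev/sdX > smart_sdX.txt
printf '# 1  Background long   Failed in segment -->  3  17767\n' > smart_sdk.txt
printf '# 1  Background long   Completed            -   8032\n' > smart_sdl.txt

# List only the reports that log a failed background self-test.
for report in smart_sd*.txt; do
    if grep -q 'Failed in segment' "$report"; then
        echo "${report}: failed self-test logged"
    fi
done
```

If several reports show the same failure at nearly the same power-on hours, that pattern is worth posting here along with the full -x output.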