Unclear regarding drive health, multiple drives. SMART on SAS

Hey all, been researching this for a bit now and can’t find a good answer. Looks like I’ve got 4 drives (of 12) failing, but I can’t really know for sure. Nothing is lining up.

Got alerts that the self-test log error count on /sdb/, /sdd/, /sdh/ and /sdi/ increased from 0 to 1, then 1 to 2, then 2 to 3, but “disks with errors” on the dashboard says 0. “smartctl -a /dev/sdb” says “SMART Health Status: OK” while listing one background short test that failed in segment 3. /sdd/ still says health status OK while having 3 failed self-tests. All in segment 3, and I have no idea what “segment 3” even is…

Normally I’d happily replace the drives and move along, but with pricing right now it’s not really an option. Just about a year ago I bought all 12 of these 8TB drives for a bit under $1k. No way I could do that now.

Can someone answer what segment 3 even is? Do I even need to care?

Here are the outputs of “smartctl -a” for the drives with errors.

/sdb/
sudo smartctl -a /dev/sdb 
[sudo] password for truenas_admin: 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PD51
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23bac4948
Serial number:        001551P1S92V        2EK1S92V
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar 24 13:56:03 2026 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     32 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 17309:50
Manufactured in week 51 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  23
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3188
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 16362107427618816

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       57         0        57    5408297     152036.507           0
write:         0       16         0        16    5012887      57235.701           0
verify:        0        0         0         0     154135          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   17247                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17081                 - [-   -    -]
# 3  Background short  Completed                   -    3261                 - [-   -    -]

Long (extended) Self-test duration: 77460 seconds [21.5 hours]
/sdd/
sudo smartctl -a /dev/sdd
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PD51
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23b9cc428
Serial number:        001549PT6PHV        2EJT6PHV
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar 24 14:08:18 2026 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     31 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 10034:23
Manufactured in week 49 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  14
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2896
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 9352518567985152

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0      753         0       753    1974943     558617.079           0
write:         0       10         0        10    1972411      26950.587           0
verify:        0      858         0       858     573722       5597.638           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Failed in segment -->       3    9972                 - [-   -    -]
# 2  Background long   Failed in segment -->       3    9934                 - [-   -    -]
# 3  Background short  Failed in segment -->       3    9805                 - [-   -    -]

Long (extended) Self-test duration: 72480 seconds [20.1 hours]
/sdh/

truenas_admin@DCStore[~]$ sudo smartctl -a /dev/sdh
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PD51
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23bacb600
Serial number:        001551P1ZJJV        2EK1ZJJV
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar 24 14:09:01 2026 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     34 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 15130:17
Manufactured in week 51 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  20
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3095
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 15909300803207168

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       79         0        79    5605986     151209.877           0
write:         0        0         0         0    4464884      56465.677           0
verify:        0        0         0         0     249518          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Failed in segment -->       3   15068                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   14901                 - [-   -    -]
# 3  Background short  Completed                   -    1082                 - [-   -    -]

Long (extended) Self-test duration: 72480 seconds [20.1 hours]
/sdi/
sudo smartctl -a /dev/sdi
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PD51
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23b92e610
Serial number:        001551PLTGAV        2EJLTGAV
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar 24 14:09:47 2026 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     35 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 17309:38
Manufactured in week 51 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  22
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3186
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 16172273882890240

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       50         0        50    5604459     155774.651           0
write:         0        3         0         3    6848554      57486.022           0
verify:        0        0         0         0     344938          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Failed in segment -->       3   17247                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17080                 - [-   -    -]
# 3  Background short  Completed                   -    3261                 - [-   -    -]

Long (extended) Self-test duration: 66960 seconds [18.6 hours]
/sde/ - no reported errors just for reference
sudo smartctl -a /dev/sde
[sudo] password for truenas_admin: 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7280A520SUN8.0T
Revision:             PD51
Compliance:           SPC-4
User Capacity:        8,001,563,222,016 bytes [8.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca23bad9818
Serial number:        001551P2GL9V        2EK2GL9V
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Tue Mar 24 14:15:37 2026 PDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 17310:30
Manufactured in week 51 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  22
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  3195
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 16374849354072064

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        3         0         3    5679322     151716.935           0
write:         0        2         0         2    6721190      57723.033           0
verify:        0        0         0         0     173541          0.000           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   17248                 - [-   -    -]
# 2  Background short  Completed                   -   17081                 - [-   -    -]
# 3  Background short  Completed                   -    3262                 - [-   -    -]

Long (extended) Self-test duration: 78480 seconds [21.8 hours]

History: Built new server. Did not have these errors before. Installed TrueNAS 25.04.2.6. (VM with HBA fully passthrough.) Reformatted drives to 4K. Began loading data. After about a week this starts happening.

ANY help or insight greatly appreciated, Thanks so very much!

The production date does not really match the power-on time and start-stop cycles. Those are refurbished drives and have been reset, I guess?

Well, they are old.

Accumulated power on time, hours:minutes 10034:23
Manufactured in week 49 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  14
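
A quick way to put numbers on that mismatch, as a rough sketch only: compare the reported power-on hours with the hours elapsed since manufacture. The values are copied from the /sdd/ output above. Treating "week 49 of year 2015" as an ISO week starting on a Monday is an assumption, and `parse_poh` is a hypothetical helper name.

```python
from datetime import date

def parse_poh(text):
    """Convert smartctl's 'hours:minutes' power-on time into fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60

# Assumption: the manufacture week is an ISO week; day 1 is its Monday.
manufactured = date.fromisocalendar(2015, 49, 1)
reported_on = date(2026, 3, 24)  # "Local Time is" from the smartctl output

calendar_days = (reported_on - manufactured).days
calendar_hours = calendar_days * 24
poh_hours = parse_poh("10034:23")
fraction = poh_hours / calendar_hours

print(f"Hours since manufacture: {calendar_hours}")
print(f"Reported power-on hours: {poh_hours:.1f}")
print(f"Share of its life spent powered on: {fraction:.1%}")
```

A low share on its own does not prove a counter reset. A drive that sat on a shelf as a spare for years would look the same.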

“disks with errors” on the dashboard says 0
I guess that counts only ZFS errors, not SMART errors. But that may depend on the TrueNAS version. There have been changes there, and they have been discussed at length elsewhere on this forum.

Maybe run some long smart tests?
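
For reference, a small sketch of what queuing the long tests could look like. It only prints the commands and does not run anything. `smartctl -t long` starts an extended self-test and `smartctl -l selftest` reads the log back. The device list is taken from the alerts in the first post, and the helper names are made up for this example.

```python
import re
import shlex

# Devices named in the alerts; adjust to whatever the system currently reports.
DEVICES = ["/dev/sdb", "/dev/sdd", "/dev/sdh", "/dev/sdi"]

def long_test_command(device):
    """Command that starts a background extended self-test on one drive."""
    return ["smartctl", "-t", "long", device]

def selftest_log_command(device):
    """Command that prints the self-test log once the test has finished."""
    return ["smartctl", "-l", "selftest", device]

def duration_hours(smartctl_text):
    """Read the drive's own estimate of the long test duration, in hours."""
    m = re.search(r"Long \(extended\) Self-test duration: (\d+) seconds", smartctl_text)
    return int(m.group(1)) / 3600 if m else None

for dev in DEVICES:
    print(shlex.join(long_test_command(dev)))

estimate = duration_hours("Long (extended) Self-test duration: 77460 seconds [21.5 hours]")
print(f"Expect roughly {estimate:.1f} hours per drive before checking the log")
```

The tests run inside each drive, so several can run at the same time, though heavy pool activity may stretch the estimate.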

@joeschmuck is the expert, but I’d say that the current sdb, sdh and sdi seem to consistently fail short SMART tests, and that sda is suspicious.
Run long tests and check whether you have a warranty on these refurbished drives.

Wow! Thanks for the quick replies.

@prez02 Agreed, they are old drives. The differences in POH are from me using them in different projects. The 17k hours are from their time in my main server, which could only connect 8 drives. The new chassis I have has a proper backplane and everything, so I’m able to fit 12. 10k is probably the point I got them, and the 15k one was another project drive. I wish I could afford to replace any right now.

I don’t think they are those scammy SMART-wiped drives. They are Sun-branded HGSTs and even came in Sun caddies, stickers and all.

@etorix No warranty. Will do long tests after the format operation is complete. I don’t want to overload the controller. Just doing one format seems to make smartctl checks and such freeze the terminal for minutes while it waits.

Right now: I’ve swapped /sdd/ out with a spare, as that one had the most errors in the ECC log by a good margin. The spare is being reformatted to 4K right now so that it can be added and tested. Then I will run long tests on the other drives. That’s gonna be in about 20 hours from now though, haha. Thinking of also shuffling the drives around to see if the errors follow. Could it be the backplane connection? Something just rubs me the wrong way about 4 of these drives having issues after being put in a new machine.
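
To make the "most errors by a good margin" comparison a bit more concrete, here is a rough sketch that ranks the drives by corrected read errors per terabyte read. The rows are the `read:` lines copied from the outputs above. The function names are hypothetical, and normalizing by data volume is just one reasonable way to compare drives with different workloads.

```python
# "read:" rows from the Error counter log of each drive, copied from the thread.
READ_ROWS = {
    "/dev/sdb": "read:          0       57         0        57    5408297     152036.507           0",
    "/dev/sdd": "read:          0      753         0       753    1974943     558617.079           0",
    "/dev/sdh": "read:          0       79         0        79    5605986     151209.877           0",
    "/dev/sdi": "read:          0       50         0        50    5604459     155774.651           0",
    "/dev/sde": "read:          0        3         0         3    5679322     151716.935           0",
}

def parse_counter_row(line):
    """Split one error-counter row into the fields used below."""
    parts = line.split()
    if len(parts) != 8 or parts[0] not in ("read:", "write:", "verify:"):
        raise ValueError(f"not an error counter row: {line!r}")
    return {
        "corrected": int(parts[4]),    # "Total errors corrected" column
        "gigabytes": float(parts[6]),  # "Gigabytes processed" column
        "uncorrected": int(parts[7]),  # "Total uncorrected errors" column
    }

def corrected_per_tb(row):
    """Corrected errors per terabyte processed; 0.0 when nothing was processed."""
    if row["gigabytes"] == 0:
        return 0.0
    return row["corrected"] / (row["gigabytes"] / 1000)

parsed = {dev: parse_counter_row(line) for dev, line in READ_ROWS.items()}
ranking = sorted(parsed, key=lambda dev: corrected_per_tb(parsed[dev]), reverse=True)

for dev in ranking:
    row = parsed[dev]
    print(f"{dev}: {corrected_per_tb(row):.3f} corrected/TB, {row['uncorrected']} uncorrected")
```

Every drive here still shows zero uncorrected errors, so the ranking only says which drive works hardest to return good data, not that any read has actually failed.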

But very thankful for ideas and insight so far!

First, welcome to the TrueNAS forums. Wish it was under better circumstances.

This is often misleading. All that SMART OK/PASSED means is that the drive’s Power-On Self-Test (POST) diagnostics passed; it does not mean much more than that. This is generally a very basic functional check: is the drive spinning, can the heads move, do the electronics seem to be working?

“Failed in segment 3” means that the third portion of the drive’s self-test failed. That portion is typically the media read that looks for unreadable sectors (the segments are manufacturer-defined, as I understand it). The log does not state here which sectors failed.
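
If you want to pull those rows out of the log automatically, here is a minimal sketch that parses the self-test table as smartctl 7.4 printed it in this thread. The regular expression is fitted to that exact layout, which is an assumption. Other smartctl versions or drives may format the rows differently.

```python
import re

# Matches rows such as:
# "# 2  Background short  Failed in segment -->       3   17081   - [-   -    -]"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>Background \w+)\s+"
    r"(?P<status>Completed|Failed in segment -->)\s+"
    r"(?P<segment>\d+|-)\s+"
    r"(?P<hours>\d+)\s"
)

def parse_self_tests(text):
    """Return one dict per self-test row found in the given smartctl text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        seg = m.group("segment")
        rows.append({
            "num": int(m.group("num")),
            "test": m.group("test"),
            "failed": m.group("status").startswith("Failed"),
            "segment": None if seg == "-" else int(seg),
            "hours": int(m.group("hours")),
        })
    return rows

# Self-test log of /dev/sdb, copied from the first post.
SAMPLE_SDB = """\
# 1  Background short  Completed                   -   17247                 - [-   -    -]
# 2  Background short  Failed in segment -->       3   17081                 - [-   -    -]
# 3  Background short  Completed                   -    3261                 - [-   -    -]
"""

tests = parse_self_tests(SAMPLE_SDB)
failed = [t for t in tests if t["failed"]]
print(f"{len(failed)} of {len(tests)} logged self-tests failed")
for t in failed:
    print(f"  #{t['num']} {t['test']}: segment {t['segment']} at {t['hours']} h")
```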

S/N: 001551P2GL9V 2EK2GL9V appears fine; however, run a SMART Long/Extended test. The logged errors are normal and were corrected.

S/N: 001551PLTGAV 2EJLTGAV absolutely has failures. You can run a SMART Long test here as well, but I’m positive it will also fail. I would replace this drive.

S/N: 001551P1ZJJV 2EK1ZJJV absolutely has failures. You can run a SMART Long test here as well, but I’m positive it will also fail. I would replace this drive.

S/N: 001549PT6PHV 2EJT6PHV has not had a SMART test run on it in forever! It failed really early on. You could have replaced it then, and this would explain all the corrected errors, but you can run a SMART Long test to verify the failure still exists.

The good news: these should all be under warranty if you purchased them new last year. RMA the drives that have failed, and run the Long test on the one drive that appears okay, as it will validate whether the media is still good.

Okay, now for the confusing part for me… These drives were manufactured in 2015? Where did you buy them from?

Check the warranty and see if it is valid. Also, always track the drives by serial number; drive idents (sdb, sdd and so on) can change.
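
One possible way to keep that mapping straight, sketched under the assumption that you save each drive’s `smartctl -a` text: pull the serial number and logical unit id out of the output and key your notes on the serial rather than on the device name. `drive_identity` is a hypothetical helper, and the two excerpts are copied from the outputs in the first post.

```python
def drive_identity(smartctl_text):
    """Extract the serial number and logical unit id from `smartctl -a` text."""
    fields = {}
    for line in smartctl_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Serial number", "Logical Unit id"):
            # Collapse the run of spaces inside the two-part serial field.
            fields[key.strip()] = " ".join(value.split())
    return fields

# Excerpts copied from the outputs posted above.
SDB = """\
Logical Unit id:      0x5000cca23bac4948
Serial number:        001551P1S92V        2EK1S92V
Local Time is:        Tue Mar 24 13:56:03 2026 PDT
"""
SDD = """\
Logical Unit id:      0x5000cca23b9cc428
Serial number:        001549PT6PHV        2EJT6PHV
Local Time is:        Tue Mar 24 14:08:18 2026 PDT
"""

current_names = {"/dev/sdb": drive_identity(SDB), "/dev/sdd": drive_identity(SDD)}

# Index by serial so notes survive a device name change.
by_serial = {info["Serial number"]: name for name, info in current_names.items()}
for serial, name in by_serial.items():
    print(f"{serial} is currently {name}")
```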

Best of luck to you.

I would not claim that for myself but thank you.

Cheers