New Toshiba 18TB SAS - Errors corrected by ECC

I just got my hands on a few new Toshiba 18TB SAS drives (MG09SCP18TA). I’m currently running a burn-in test on them.

I’m only starting to build my familiarity with SAS drives, so this may be completely normal. But I’m trying to work out whether I should be concerned about the ‘Errors corrected by ECC’ counts. Past posts on this forum and others suggest they may not be much of a concern (if they stay low). But I’m seeing 3-9 of these on each drive after only a single short and long SMART test, so I imagine they will climb rather quickly.

I’m not assuming there are issues with the drives, since I’m seeing the same behavior on multiple new drives (although they are from the same production batch). But I am curious whether there is perhaps a compatibility issue between TrueNAS/ZFS and this particular type of drive. And if so, should I take this as an indicator that I should not add them to my primary pool just yet?

SMART output below, FYI –

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 60:29
Manufactured in week 46 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  7
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  8
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        6         0         0          0      48823.577           0
write:         0        0         0         0          0          5.152           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -      44                 - [-   -    -]
# 2  Background long   Self test in progress ...   -     NOW                 - [-   -    -]
# 3  Background short  Completed                   -       2                 - [-   -    -]

Long (extended) Self-test duration: 92940 seconds [25.8 hours]
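
To see whether these counters actually climb, the read and write rows of the error counter log can be pulled out of the text and compared between runs. This is only a minimal sketch in Python: the field names and the `parse_error_counter_log` helper are made up for illustration, and it assumes the column layout matches the smartctl output shown above.

```python
import re

# Illustrative field names, in the column order of the log header above.
FIELDS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
    "algorithm_invocations", "gigabytes_processed", "total_uncorrected",
)

# Counter rows start with the operation name followed by a colon.
ROW = re.compile(r"^(read|write|verify):\s+(.*)$")


def parse_error_counter_log(text):
    """Return a dict like {'read': {...}, 'write': {...}} from smartctl text."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # not a counter row in the expected layout
        rows[match.group(1)] = {
            name: float(value) if name == "gigabytes_processed" else int(value)
            for name, value in zip(FIELDS, values)
        }
    return rows


# The two counter rows copied from the output above.
SAMPLE = """\
read:          0        6         0         0          0      48823.577           0
write:         0        0         0         0          0          5.152           0
"""

counters = parse_error_counter_log(SAMPLE)
read = counters["read"]
tb_read = read["gigabytes_processed"] / 1000  # the log reports units of 10^9 bytes
print(f"delayed ECC corrections: {read['ecc_delayed']} over {tb_read:.1f} TB read")
print(f"uncorrected read errors: {read['total_uncorrected']}")
```

For this drive that works out to 6 delayed ECC corrections over roughly 48.8 TB read, with 0 uncorrected errors. Saving the parsed values after each test makes it easy to compare later runs.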

Post your TrueNAS hardware details. How are the drives attached?

I have them all sitting in my SilverStone RM43-320-RS 4U rack chassis, which has a backplane with multiple Mini-SAS SFF-8643 12 Gb/s interfaces. They’re directly connected to multiple Lenovo 430-16i HBAs on an EPYC-based build with ECC memory. I also have other SATA drives in the same chassis, but the SMART output is much different for those drives.

If I understand correctly, these drives are reporting their own corrections made during normal operation. Kudos to Toshiba for this extra transparency.

Makes sense. So there’s no need to be concerned as long as the drives are handling those types of errors, even if there are many?

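The MG09 series non-recoverable error rate is 10 errors per 10^16 bits read, i.e. 1 error per 10^15 bits.
18 TB is about 0.144 × 10^15 bits (18 × 10^12 bytes × 8).
So after the full volume has been read about 6.9 times, one unrecoverable error may occur.
Considering this, seeing some ECC-corrected errors is reasonable.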
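
The estimate above can be reproduced with a few lines of Python. This is a rough sketch only: the rate of 10 errors per 10^16 bits is taken from the post as quoted, not checked against the datasheet, and the function name is made up for illustration. The result depends on how "18 TB" is read: decimal terabytes (18 × 10^12 bytes) give about 6.9 full reads per expected error, while binary units (18 × 2^40 bytes) give about 6.3.

```python
# Error rate as quoted in the thread: 10 errors per 10^16 bits read.
ERRORS_PER_BIT = 10 / 1e16


def full_reads_per_error(capacity_bytes, errors_per_bit=ERRORS_PER_BIT):
    """Number of full-volume reads before one unrecoverable error is expected."""
    bits_per_full_read = capacity_bytes * 8
    return 1 / (bits_per_full_read * errors_per_bit)


decimal_reads = full_reads_per_error(18 * 10**12)  # 18 TB, decimal units
binary_reads = full_reads_per_error(18 * 2**40)    # 18 TiB, binary units

print(f"decimal 18 TB:  {decimal_reads:.1f} full reads per expected error")
print(f"binary 18 TiB: {binary_reads:.1f} full reads per expected error")
```

Either way this is a statistical expectation derived from the spec figure, not a prediction for any single drive.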
