24 Currently unreadable (pending) sectors - but everything in GUI is green

I keep receiving email alerts from TrueNAS Scale. The first came 11 days ago, then more arrived 5 and 4 days ago, and another one yesterday:

New alerts:

    Device: /dev/sdb [SAT], 24 Currently unreadable (pending) sectors.

Current alerts:

    Device: /dev/sdb [SAT], 24 Currently unreadable (pending) sectors.

However, when I open the GUI, everything seems to be fine with /dev/sdb: all S.M.A.R.T. test results (Short Offline, Extended Offline) are SUCCESS, it shows “Disks with errors: 0 of 2”, etc.

(I was planning to paste screenshots here, but the forum does not allow me to do that, even as links.)

Running smartctl -A shows this:

[00:58:32][truenas_admin@NAS]:~$ sudo smartctl -A /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   064   044    Pre-fail  Always       -       137172312
  3 Spin_Up_Time            0x0003   091   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       28
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   045    Pre-fail  Always       -       53250792
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2886
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   050   040    Old_age   Always       -       38 (Min/Max 37/49)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1217
194 Temperature_Celsius     0x0022   038   050   000    Old_age   Always       -       38 (0 23 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2779 (130 17 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       25630423184
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       36875878123

Note that this is a refurbished drive.

Here is some OS version and hardware info:

[01:13:45][truenas_admin@NAS]:~$ echo "$(</etc/version)"
25.04.2.4
[01:14:06][truenas_admin@NAS]:~$ lspci | grep -i -E "sata|raid|sas|storage|hba"
00:12.0 SATA controller: Intel Corporation Celeron/Pentium Silver Processor SATA Controller (rev 06)
[01:14:14][truenas_admin@NAS]:~$ sudo smartctl -i /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     OOS16000G
Serial Number:    XXXXXXXXXX
LU WWN Device Id: XXXXXXXXXX
Firmware Version: OOS1
User Capacity:    16,000,900,661,248 bytes [16.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5894
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Nov 19 01:14:19 2025 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[01:18:24][truenas_admin@NAS]:~$ uname -r
6.12.15-production+truenas

How should I interpret this? Should I replace this drive immediately? Why does the GUI report everything as OK?

The drive should be considered suspect. I’d personally get a replacement drive & start burning in the replacement, then after the 10,000 hours of badblocks & SMART long tests, I’d replace the suspect drive (if the replacement passed all tests).

Drives have manufacturer-set thresholds for failure. Technically the drive has not reached the threshold & has passed the test, therefore it is ‘okay’. Think of it as going to a doctor who passed their exams with 51% (or whatever the threshold for failure is depending on school): technically everything is fine, but consider a replacement :stuck_out_tongue:

Depending on your risk tolerance/budget for replacement parts you could just let the drive ride & monitor the pending sectors. If they start increasing in quantity then replace asap; otherwise the drive has flagged those sectors & will either recover or remap them the next time they are written.

As for why no data was lost: either those sectors didn’t hold any when they were flagged, or the drive managed to copy the data over to working sectors before blacklisting the failed ones.

tldr; have a replacement ready & either swap asap or monitor & expect to swap when failed sectors increase
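If you go the monitoring route, something like the following could save you from eyeballing the table each time. It is only a rough sketch: the function names are made up for illustration, it assumes the column layout of the smartctl -A output posted above (attribute ID in column 1, raw value in column 10), and the baseline of 24 is simply the count from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of smartmontools).
# Both read `smartctl -A` output on stdin, so they can be tried without a drive.

# Print NAME=RAW for Reallocated (5), Pending (197) and Offline_Uncorrectable (198).
smart_sector_counts() {
    awk '
        # Attribute rows start with the numeric ID; the raw value is column 10.
        $1 == 5 || $1 == 197 || $1 == 198 { printf "%s=%s\n", $2, $10 }
    '
}

# Exit 0 if none of the three raw values is above the given baseline, else exit 1.
sectors_within_baseline() {
    local baseline="$1"
    awk -v max="$baseline" '
        BEGIN { bad = 0 }
        ($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > max { bad = 1 }
        END { exit bad }
    '
}
```

Usage would be along the lines of `sudo smartctl -A /dev/sdb | smart_sector_counts`, or `sudo smartctl -A /dev/sdb | sectors_within_baseline 24 || echo "sector counts went up"` from a cron job.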

Thanks for the insight! In spite of being a refurb, the seller gives 2y warranty, so I think I’ll get the replacement, install it, but then I’ll wipe the faulty one and contact them. Hopefully I’ll end up with a working spare for the next failure :wink:

Could you clarify what you mean by “start burning in”?

Generally it is a good idea to thoroughly test a drive prior to deploying it to avoid early failures & so that seller warranty can be used instead of manufacturer (generally less shipping & headache) for failed drives.

The way I like to do it is to run a SMART long test, check the output, then do a full run of badblocks on the drive. Badblocks will fully write & read the drive several times (three?), checking every single sector for read/write failures. This is a destructive test, so caution must be used if you’re running the badblocks command on any system with data that you wish to keep; no typos allowed! I prefer connecting the drive to a system that doesn’t have any data I care about.

If you do run it in TrueNAS, keep in mind that the console also has to stay open while badblocks runs, so you can’t run it from the shell unless you use tmux; these things would be explained in detail in a proper guide.

Edit: if you’re going to be running badblocks in TrueNAS, pay super extra special attention that you’re running it on the correct drive(s); the sda/sdb/sdc/etc. names can change arbitrarily after a reboot. Pay VERY close attention before executing the command. Do not have the drive in a pool when running the command.
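One way to double-check which physical drive a name like sdb currently refers to is to look at the symlinks under /dev/disk/by-id, which normally embed the model and serial number. A small sketch, where the function name is made up and the directory is a parameter only so it can be tried on a scratch directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list stable by-id names next to the kernel device
# each one currently resolves to. Partition links (…-partN) are skipped.
list_disk_ids() {
    local dir="${1:-/dev/disk/by-id}"
    local link
    for link in "$dir"/*; do
        [ -L "$link" ] || continue                        # only symlinks
        case "$link" in *-part[0-9]*) continue ;; esac    # skip partitions
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Run `list_disk_ids` right before the destructive command and compare the serial in the link name with the label on the drive (or with `smartctl -i`), rather than trusting yesterday's sdX letter.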

Anyway, after badblocks passes, I do another SMART long test & then I consider it ‘good enough’ if it passes everything. These are very long tests; I think it took 5 or 6 days total for my 8 TB drives (per drive, but you can test multiple drives at the same time).
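Spelled out, the sequence above would look roughly like this. The function only prints the commands instead of running them, since badblocks -w wipes the drive; the function name and the 4096 block size default are my own assumptions, not a recommendation from a guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the burn-in commands for one drive.
# WARNING: `badblocks -w` is a destructive write test and erases the device.
burnin_plan() {
    local dev="$1"
    local bs="${2:-4096}"   # assumed block size; the badblocks default is 1024
    cat <<EOF
smartctl -t long $dev
smartctl -l selftest $dev
badblocks -b $bs -wsv $dev
smartctl -t long $dev
smartctl -l selftest $dev
EOF
}
```

For example, `burnin_plan /dev/sdX` prints the five steps in order. Note that `smartctl -t long` only starts the test inside the drive and returns immediately, so you have to wait for it to finish (check with `smartctl -l selftest`) before moving on to the next step.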

This is a quick overview; there are in-depth guides somewhere on the forums…

I just finished the first long SMART test on the new drive, so I figured I’d run badblocks. Unfortunately:

badblocks: Value too large for defined data type invalid end block (15625879552): must be 32-bit value

And that’s by design and won’t be fixed

If you really want to try your luck, you can specify a larger block size for testing, i.e. badblocks -b 16384. But I wouldn’t really trust badblocks on large devices; it wasn’t designed for this purpose.
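For what it's worth, the number in the error is just the drive size divided by the default 1024-byte block size: 16,000,900,661,248 / 1024 = 15,625,879,552 blocks, which does not fit in 32 bits. A quick sketch of the arithmetic, assuming the limit is an unsigned 32-bit block count (the function name is made up; pass 2147483647 as the second argument if the limit turns out to be signed):

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest power-of-two block size, starting from the
# badblocks default of 1024 bytes, at which the block count fits the limit.
min_block_size() {
    local bytes="$1"
    local max_blocks="${2:-4294967295}"   # assumption: unsigned 32-bit limit
    local bs=1024
    while (( bytes / bs > max_blocks )); do
        bs=$(( bs * 2 ))
    done
    echo "$bs"
}
```

With the capacity from the smartctl -i output, `min_block_size 16000900661248` gives 4096 (3,906,469,888 blocks), and with a signed limit it gives 8192. Whether badblocks is trustworthy at that size is a separate question, as the quote above says.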

I wonder if using fio would do the same thing? Care to share your thoughts on the issue?

I mean something like this:

fio --name=burnin \
    --filename=/dev/sdb \
    --rw=write \
    --bs=1M \
    --direct=1 \
    --numjobs=1 \
    --iodepth=32 \
    --loops=1 \
    --size=100% \
    --do_verify=1 \
    --verify=md5 \
    --verify_fatal=1 \
    --time_based=0

It does four write-read cycles. Took five days to complete on my 4x12TB set running the test in parallel, so it’s definitely an exercise in patience.

@shalak if you decide to try badblocks anyway, I had good luck with this project:

GitHub - ezonakiusagi/bht: bulk hdd testing with badblocks script

It uses 32K block size by default.

You could try bbf as a replacement.