Smart tests reporting pending sector errors

jjrushford · March 28, 2025, 12:59am

Greetings,

I’m running ElectricEel-24.10.2 and I have smart short tests running periodically. It seems that when they run, I get alerts to pending sector errors on 1 or 2 ssd’s in my storage pool. I’m wondering if these alerts are bogus because whenever I read the Curent_Pending_Sector raw count from smartctl -x, the count is always 0 for the ssd in question. Am I interpreting this correctly? Could there be a bug in the alerting? Everything else seems fine, I don’t have any other errors with these drives.

thanks
John

joeschmuck · March 28, 2025, 2:58am

You have asked a question without providing any read data for us to examine and give you a proper analysis.

Joes Rules (link in my signature) has a list of data to provide for a given problem type, of which drive issues is one of those. To get some good and accurate help, please provide the required data.

As for the error messages you recieve, paste the exact full error message. Also ensure you track your drives using a serial number, the drive IDs can change on you with each reboot, even though they often remain the same, but they can an will change periodically when you least expect it.

What I’d like to see is:

The full output from each suspect drive of smartctl -x /dev/??? in code brackets.
zpool status -v
The exact error message(s).
You say pending sector errors on 1 or 2 of your SSDs. Is it 1 or 2?
Do not assume we understand you. Assume we are idiots and you have to explain everything in detail. It sounds a bit harsh but the worst thing a person could do is make an assumption that causes more harm.

Well time for me to call it a night. If you post the required data, someone or I will offer assistance.

I don’t know without seeing the data.

Could be, but again, need to see the data.

And then a failure happens when you wished you saw it earlier and could plan for it. It happens to the best of us.

jjrushford · March 28, 2025, 4:41pm

Thanks for taking the time to look at this. I’m running ElectricEel-24.10.2

So I’m getting alert notifications thru the TrueNAS UI with the following messages. They seem to be triggered following the S.M.A.R.T short tests I have scheduled. I don’t see these alerts every day just once or twice a week, the S.M.A.R.T tests run daily at midnight.

Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
Device: /dev/sdh [SAT], 1 Currently unreadable (pending) sectors

My pool seems fine and I do not see any read, write or checksum errors. A scrub completes fine without any problems:

admin@nas01$ zpool status -L zvol
  pool: zvol
 state: ONLINE
  scan: scrub repaired 0B in 00:04:27 with 0 errors on Tue Mar 25 21:56:30 2025
config:

	NAME        STATE     READ WRITE CKSUM
	zvol        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdi1    ONLINE       0     0     0
	    sdj1    ONLINE       0     0     0
	    sdc1    ONLINE       0     0     0
	    sdg1    ONLINE       0     0     0
	    sdh1    ONLINE       0     0     0
	logs
	  mirror-3  ONLINE       0     0     0
	    sdd1    ONLINE       0     0     0
	    sde1    ONLINE       0     0     0
	spares
	  sdf1      AVAIL

errors: No known data errors

When I examine the data from the SSD’a using smartctl, I do not see the Current_Pending_Sector count incrementing. As a matter of fact it is always 0. So, I’m wondering if these alerts are legitimate? I am aware that drive names can change following a reboot so this data is collected from both SSD’s using smartctl -x /dev/sdX

[sdc.txt|attachment](upload://MZCRF9CqB8fRjAmLEL70yB1g7N.txt) (32.7 KB)
[sdh.txt|attachment](upload://hHHM7CXrI2IDL6l0dh86lGnCcos.txt) (18.1 KB)

regards
John Rushford

essinghigh · March 28, 2025, 4:57pm

Files are in a codeblock so not downloadable, sticking them here.

sdc.txt
sdh.txt

jjrushford · March 28, 2025, 4:59pm

Thanks!

essinghigh · March 28, 2025, 5:09pm

/dev/sdc (P220EDCB23102704018) has some ICRC (Interleaved Cyclic Redundancy Check) errors and UDMA CRC errors, could be caused by a bad cable connection. I’d reseat the connection and confirm if this continues. The values are quite low though so may not be of concern.

199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    2

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2           20  Command failed due to ICRC error

/dev/sdh (P220EDCB23102704033) looks fine.

I don’t see anything in relation to bad/pending sectors. Have you rebooted since the alert? Maybe the labels have been shuffled around.

jjrushford · March 28, 2025, 5:14pm

No, I haven’t rebooted and I’m aware that the labels can move around following a reboot. I’m 100% positive that the data shown corresponds to the ssd’s identified in the alerts. I did have some issues with the LSI-9300-16i HBA, I had to change the PCIe bus speed to gen2 in the bios. This seems to have corrected that issue I saw when I initially built this pool. Perhaps that accounts for the CRC errors.

etorix · March 28, 2025, 5:17pm

What’s the firmware version?
sudo sas3flash -list

jjrushford · March 28, 2025, 5:18pm

Here ya go:

admin@nas01$ sudo sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

	Adapter Selected is a Avago SAS: SAS3008(C0)

	Controller Number              : 0
	Controller                     : SAS3008(C0)
	PCI Address                    : 00:00:10:00
	SAS Address                    : 500062b-2-0400-d540
	NVDATA Version (Default)       : 0e.01.00.07
	NVDATA Version (Persistent)    : 0e.01.00.07
	Firmware Product ID            : 0x2221 (IT)
	Firmware Version               : 16.00.01.00
	NVDATA Vendor                  : LSI
	NVDATA Product ID              : SAS9300-8i
	BIOS Version                   : 08.37.00.00
	UEFI BSD Version               : 06.00.00.00
	FCODE Version                  : N/A
	Board Name                     : SAS9300-16i
	Board Assembly                 : 03-25600-01B
	Board Tracer Number            : SP82614431

	Finished Processing Commands Successfully.
	Exiting SAS3Flash.

etorix · March 28, 2025, 5:24pm

Please upgrade to P16.00.12.00, this has fixed some issues with SSDs.

jjrushford · March 28, 2025, 5:25pm

Ok, I’ll do that. Thanks very much, I appreciate your help!

jjrushford · March 28, 2025, 6:10pm

Ok, I flashed my HBA with the 16.00.12.00 firmware, everything looks ok after the reboot. I’m going to monitor for a few days and then I might change the PCIe speed bios setting back to Auto from gen2.

Topic		Replies	Views
Should I be worried about this error message? TrueNAS General Hardware	12	156	November 21, 2025
TrueNAS SCALE random full system hangs, pool sometimes degrades, SMART shows CRC errors and some past read failures, need help isolating root cause TrueNAS General SCALE , Hardware	12	61	February 13, 2026
Two ZFS drives failed at once? Hmm TrueNAS General	22	344	March 27, 2025
LSI - N2225 HBA - SAS3008 - Firmware Version : 16.00.12.00 TrueNAS General SCALE , Hardware	11	523	February 10, 2025
TrueNAS starts but is inaccessible when detaching the original Disk after pool Migration TrueNAS General SCALE , ZFS	18	95	July 3, 2025

Smart tests reporting pending sector errors

Related topics