SMART tests reporting pending sector errors

Greetings,

I’m running ElectricEel-24.10.2 and I have SMART short tests running periodically. It seems that when they run, I get alerts about pending sector errors on one or two SSDs in my storage pool. I’m wondering whether these alerts are bogus, because whenever I read the Current_Pending_Sector raw count from smartctl -x, the count is always 0 for the SSD in question. Am I interpreting this correctly? Could there be a bug in the alerting? Everything else seems fine; I don’t have any other errors with these drives.
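For reference, this is the sort of check I’m doing (/dev/sdc here just stands in for whichever drive the alert names):

# Pull the Current_Pending_Sector line from the extended output; the raw value is the last column
sudo smartctl -x /dev/sdc | grep -i 'Current_Pending_Sector'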

thanks
John

You have asked a question without providing any read data for us to examine and give you a proper analysis.

Joe's Rules (link in my signature) has a list of data to provide for each type of problem, and drive issues are one of them. To get good, accurate help, please provide the required data.

As for the error messages you receive, paste the exact, full error message. Also make sure you track your drives by serial number: drive IDs can change with each reboot. They often stay the same, but they can and will change periodically when you least expect it.
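Something like this ties a device node to its serial number at the moment you collect the data (sdc is only an example node):

# Print the serial number for a given device node
sudo smartctl -i /dev/sdc | grep -i 'Serial Number'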

What I’d like to see is:

  1. The full output of smartctl -x /dev/??? from each suspect drive, in code brackets.
  2. zpool status -v
  3. The exact error message(s).
  4. You say pending sector errors on 1 or 2 of your SSDs. Is it 1 or 2?
  5. Do not assume we understand you. Assume we are idiots and you have to explain everything in detail. It sounds a bit harsh, but the worst thing a person can do is make an assumption that causes more harm.

Well, time for me to call it a night. If you post the required data, either I or someone else will offer assistance.

I don’t know without seeing the data.

Could be, but again, I need to see the data.

And then a failure happens, when you wish you had seen it coming earlier and could have planned for it. It happens to the best of us.


Thanks for taking the time to look at this. I’m running ElectricEel-24.10.2.

So I’m getting alert notifications through the TrueNAS UI with the following messages. They seem to be triggered by the scheduled S.M.A.R.T. short tests. I don’t see these alerts every day, just once or twice a week; the S.M.A.R.T. tests run daily at midnight.

  • Device: /dev/sdc [SAT], 1 Currently unreadable (pending) sectors
  • Device: /dev/sdh [SAT], 1 Currently unreadable (pending) sectors
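For completeness, this is how I’ve been checking the self-test history after an alert fires (the same command for each drive):

# Show the SMART self-test log for one of the flagged drives
sudo smartctl -l selftest /dev/sdc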

My pool seems fine and I do not see any read, write, or checksum errors. A scrub completes without any problems:

admin@nas01$ zpool status -L zvol
  pool: zvol
 state: ONLINE
  scan: scrub repaired 0B in 00:04:27 with 0 errors on Tue Mar 25 21:56:30 2025
config:

	NAME        STATE     READ WRITE CKSUM
	zvol        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdi1    ONLINE       0     0     0
	    sdj1    ONLINE       0     0     0
	    sdc1    ONLINE       0     0     0
	    sdg1    ONLINE       0     0     0
	    sdh1    ONLINE       0     0     0
	logs
	  mirror-3  ONLINE       0     0     0
	    sdd1    ONLINE       0     0     0
	    sde1    ONLINE       0     0     0
	spares
	  sdf1      AVAIL

errors: No known data errors

When I examine the data from the SSDs using smartctl, I do not see the Current_Pending_Sector count incrementing. As a matter of fact, it is always 0. So I’m wondering whether these alerts are legitimate. I am aware that drive names can change after a reboot, so this data was collected from both SSDs using smartctl -x /dev/sdX
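A rough sketch of how I grabbed the dumps, tagging each file with the drive’s serial so the shifting /dev names don’t matter (the smart_<serial>.txt names are just my choice):

# Dump extended SMART data for both suspect drives, named by serial
for d in /dev/sdc /dev/sdh; do
    serial=$(sudo smartctl -i "$d" | awk -F': *' '/Serial Number/{print $2}')
    sudo smartctl -x "$d" > "smart_${serial}.txt"
done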

[sdc.txt|attachment](upload://MZCRF9CqB8fRjAmLEL70yB1g7N.txt) (32.7 KB)
[sdh.txt|attachment](upload://hHHM7CXrI2IDL6l0dh86lGnCcos.txt) (18.1 KB)

regards
John Rushford

The files are reposted here in code blocks so nobody has to download the attachments:

sdc.txt
sdh.txt


Thanks!

/dev/sdc (P220EDCB23102704018) has some ICRC (Interface CRC) errors and UDMA CRC errors, which could be caused by a bad cable connection. I’d reseat the connection and see whether this continues. The values are quite low, though, so it may not be a concern.

199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    2
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2           20  Command failed due to ICRC error
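If you do reseat the cable, something like this re-reads just those counters so you can see whether they keep climbing (sdc as the example):

# Re-read the SATA Phy event counters (GP Log 0x11) quoted above
sudo smartctl -l sataphy /dev/sdc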

/dev/sdh (P220EDCB23102704033) looks fine.

I don’t see anything related to bad/pending sectors. Have you rebooted since the alert? Maybe the labels have been shuffled around.
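A quick way to see which physical drive currently owns which label (the -d flag limits lsblk to whole disks):

# Map current /dev names to model and serial
lsblk -d -o NAME,MODEL,SERIAL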


No, I haven’t rebooted, and I’m aware that the labels can move around after a reboot. I’m 100% positive that the data shown corresponds to the SSDs identified in the alerts. I did have some issues with the LSI 9300-16i HBA; I had to change the PCIe bus speed to Gen2 in the BIOS. That seems to have corrected the issue I saw when I initially built this pool. Perhaps that accounts for the CRC errors.
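For reference, this is how I check what link speed the HBA actually negotiated (vendor ID 1000 selects Broadcom/LSI devices; the grep pulls the link capability and status lines):

# Show PCIe link capability vs. negotiated status for the LSI HBA
sudo lspci -d 1000: -vv | grep -E 'LnkCap|LnkSta'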

What’s the firmware version?
sudo sas3flash -list

Here ya go:

admin@nas01$ sudo sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

	Adapter Selected is a Avago SAS: SAS3008(C0)

	Controller Number              : 0
	Controller                     : SAS3008(C0)
	PCI Address                    : 00:00:10:00
	SAS Address                    : 500062b-2-0400-d540
	NVDATA Version (Default)       : 0e.01.00.07
	NVDATA Version (Persistent)    : 0e.01.00.07
	Firmware Product ID            : 0x2221 (IT)
	Firmware Version               : 16.00.01.00
	NVDATA Vendor                  : LSI
	NVDATA Product ID              : SAS9300-8i
	BIOS Version                   : 08.37.00.00
	UEFI BSD Version               : 06.00.00.00
	FCODE Version                  : N/A
	Board Name                     : SAS9300-16i
	Board Assembly                 : 03-25600-01B
	Board Tracer Number            : SP82614431

	Finished Processing Commands Successfully.
	Exiting SAS3Flash.

Please upgrade to P16.00.12.00; this has fixed some issues with SSDs.
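Roughly, the flash goes something like this; the file names below are only placeholders, they vary with the firmware package you download, so follow the README that ships with it:

# Flash IT firmware and BIOS in advanced mode (placeholder file names)
sudo sas3flash -o -f SAS9300_16i_IT.bin -b mptsas3.rom
# Verify afterwards
sudo sas3flash -list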


Ok, I’ll do that. Thanks very much, I appreciate your help!

OK, I flashed my HBA with the 16.00.12.00 firmware, and everything looks fine after the reboot. I’m going to monitor for a few days, and then I might change the PCIe speed BIOS setting back to Auto from Gen2.
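What I plan to keep an eye on while testing (sdc and sdh are the two drives from the alerts):

# Spot-check the pending-sector and CRC counters on the suspect drives
for d in /dev/sdc /dev/sdh; do
    echo "== $d =="
    sudo smartctl -A "$d" | grep -Ei 'Pending|CRC'
done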
