Failed disk but spare is still listed as spare

I have no idea what’s going on. I had a disk fail (sdax). I had a spare in the system (sdi).

Both sdi and sdax are now listed in one of my vdevs, but sdi is still listed under spares as well. However, the system reports I have an unused disk, sdax… so are they both spares and in the vdev? The vdev is still listed as degraded, but it’s 11 wide, and with sdi in there it should be whole. I don’t know what my next steps are.

The spare (sdi) has done its job, and although it is still listed as a spare, it’s crucially ‘unavailable’ because it’s now a pool member proper. The pool is still degraded as you no longer have an available spare.
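To see this for yourself, look at the spares section of `zpool status`: a spare that has been pulled into a vdev is marked INUSE instead of AVAIL. Here is a minimal sketch, assuming a pool called `tank` (hypothetical) and the usual name/state column layout. `spares_in_use` is a made-up helper, not a ZFS command, and on TrueNAS the device names may show up as partition UUIDs rather than sdX.

```shell
# Print the names of hot spares that ZFS reports as in use.
# Reads `zpool status` text on stdin, so it also works on saved output.
spares_in_use() {
    # Spare lines look like: "  sdi    INUSE     currently in use"
    awk '$2 == "INUSE" { print $1 }'
}

# Usage on a live system (pool name "tank" is an assumption):
#   zpool status tank | spares_in_use
```

If sdi shows up here, it is doing the failed disk's job and is not available to cover another failure.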

The original disk (sdax) was faulted by TrueNAS for some reason, often because it’s producing too many errors or not responding in good time.

Next steps are to replace the failed drive (sdax) NOT the spare. Once resilver is complete the spare should automatically return to being a spare again.

Sdax is not a spare.
First, confirm that sdax is indeed bad. I recommend running sudo smartctl -a /dev/sdax and posting the output.
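If you just want the headline verdict before digging through the full report, something like the following can pull it out. This is only a sketch: `smart_verdict` is a hypothetical helper, and it assumes the health line is worded the way smartctl usually prints it for ATA ("SMART overall-health self-assessment test result") or SCSI/SAS ("SMART Health Status") drives.

```shell
# Extract the overall health verdict from `smartctl -H` (or -a) output on stdin.
# Prints UNKNOWN if no health line is found.
smart_verdict() {
    awk -F': *' '
        /SMART overall-health self-assessment test result/ { print $2; found = 1 }
        /SMART Health Status/                              { print $2; found = 1 }
        END { if (!found) print "UNKNOWN" }
    '
}

# Usage on a live system:
#   sudo smartctl -H /dev/sdax | smart_verdict
```

A PASSED result does not prove the disk is healthy, so the full `-a` output is still worth posting.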

If it is bad, the next steps would be:

  1. Take the disk offline.
  2. Detach the failed disk to promote the hot spare.
  3. Refresh the screen.
  4. Recreate the hot spare VDEV.

The steps above are from the official documentation for version 25.04.

That is correct if you want the Hot Spare (sdi) to permanently replace the failed drive.

If you want the Hot Spare (sdi) to go back to being a Hot Spare, then what @Johnny_Fartpants said is correct.

See Replacing Disks | TrueNAS Documentation Hub for a discussion of both approaches.
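For anyone curious what the two approaches look like at the ZFS level, here is a rough sketch. It assumes a pool named `tank` and a replacement disk `sdnew` (both hypothetical), and the helper names are made up. On TrueNAS the supported route is the UI, so by default this only prints the commands (a dry run) to illustrate the difference.

```shell
# Dry-run helper: prints each command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Option A: replace the failed disk with a new one. Once the resilver onto
# the new disk finishes, the hot spare returns to the spare list.
replace_and_keep_spare() {
    pool=$1; failed=$2; new=$3
    run zpool replace "$pool" "$failed" "$new"
}

# Option B: offline and detach the failed disk so the hot spare becomes a
# permanent member, then add the new disk as the pool's hot spare.
promote_spare() {
    pool=$1; failed=$2; new=$3
    run zpool offline "$pool" "$failed"
    run zpool detach "$pool" "$failed"
    run zpool add "$pool" spare "$new"
}

# Usage (names are assumptions):
#   replace_and_keep_spare tank sdax sdnew
#   promote_spare tank sdax sdnew
```

Either way the pool ends up with a full vdev and one spare; the only difference is which physical disk ends up in which role.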


Or confirm the temporary replacement and make sdi a permanent pool member. Then you can bring in a new spare.


Both options are fine. 🙂

Thanks for all the help. I am so used to RAID systems automatically kicking a drive out and replacing it. I have detached the failed drive (sdax) and everything went green.

I will head out there later today and replace the drive.

From what I understand now, I technically wasn’t in a degraded state, as all drives in the array were there, but it was reporting one because my spare was now missing, as it had been assigned? In other words, I still had the full 2 parity drives in my Z2 vdev, even though it was complaining?

Yep you got it.

I like to order a replacement for the spare and burn it in, moving the spare into active use, otherwise they just get older while idling.

So far I have not seen any troubleshooting to determine whether the drive is bad or whether it is the HBA or a cable (data or power). @neofusion has asked the question to push you in the right direction.

@Scott_Guenther do not assume the drive is bad until you know the drive is bad. Maybe it is bad, but find out.

I have a set of troubleshooting flowcharts in the resources on the forums and linked below for faster access. Prove it before you just replace the drive.
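One quick data point for the cable question, assuming sdax is a SATA drive that reports SMART attribute 199 under the name `UDMA_CRC_Error_Count` (some drives name it differently, and SAS drives do not report it at all): a raw value that keeps climbing usually points at the data cable or connector instead of the disk itself. `crc_error_count` below is a hypothetical helper, sketched under those assumptions.

```shell
# Print the raw UDMA_CRC_Error_Count value from `smartctl -A` output on stdin,
# or "n/a" if the drive does not list that attribute.
crc_error_count() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $NF; found = 1 }
         END { if (!found) print "n/a" }'
}

# Usage on a live system:
#   sudo smartctl -A /dev/sdax | crc_error_count
```

Note the value, reseat or swap the cable, and check again after some I/O; if the count stops increasing, the drive may be fine.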
