The drive failure goblins just visited and TrueNAS did everything perfectly.
With that being said, I’m not sure of the proper cleanup procedure.
Here is the situation: I have an 8-drive vdev with a hot spare. One of the drives in the pool gave up the ghost and TrueNAS replaced it with the hot spare.
I’m now in a situation where the vdev is still showing a problem because my hot spare is no longer there. (At least that’s my thinking.) I have another spare being delivered.
What do I do about the spare that is missing? Do I remove it from the spare vdev? Or do I wait for the new drive and somehow replace the spare with the new drive in the vdev?
Please post the output of `sudo zpool status -v` to confirm the situation, and use the `</>` button to format it.
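For reference, a pool with a hot spare currently in use typically looks something like this (pool and device names here are hypothetical; yours will differ):

```
$ sudo zpool status -v tank
  pool: tank
 state: DEGRADED
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2-0    DEGRADED     0     0     0
            sda       ONLINE       0     0     0
            spare-0   DEGRADED     0     0     0
              sdh     FAULTED      0     0     0
              sdi     ONLINE       0     0     0
        spares
          sdi         INUSE     currently in use
```

The nested `spare-0` entry shows the failed disk and the spare that took over; the `spares` section lists the spare as `INUSE` instead of `AVAIL`.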
Assuming that the spare has kicked in but the ghost of the dead drive still lingers, you would use the GUI to replace the dead drive with the spare to make the change permanent.
When the new drive arrives, test it and then add it as a new spare to the pool.
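A common way to burn in the new drive before trusting it is a long SMART self-test, optionally followed by a destructive `badblocks` pass. A rough sketch (the device name is hypothetical, and `badblocks -w` wipes the disk, so only run it before the drive holds data):

```
# Long SMART self-test (can take many hours on large drives)
sudo smartctl -t long /dev/sdX

# Check the result once the test has finished
sudo smartctl -a /dev/sdX

# Optional destructive write test -- DESTROYS all data on the disk
sudo badblocks -wsv /dev/sdX
```

Once it passes, add it as a spare through the GUI so the TrueNAS middleware stays aware of it.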
@sailing_nut
If you are happy with the current Hot Spare being a permanent replacement, then simply remove the bad disk via software. The Hot Spare becomes permanent, and your Hot Spare category of vDev disappears. I don’t remember if you can do this through the GUI… But command line should be easy enough.
You should go over the documentation for your version of TrueNAS. If something isn’t clear, ask questions on it, or you can submit Feedback on that section. There is a blue Feedback button on the far right of those webpages for that.
@etorix @Arwen If you see something off in the docs, or if clarity could be improved, submitting Feedback to correct it would be good.
You made the decision to have a hot spare when you designed your pool so logic would assume you would like to keep it that way even after a drive failure. Therefore you would replace the failed drive and the hot spare would automatically go back to being a spare again.
Like it’s been mentioned you can make the spare permanent but that’s not the default behaviour.
> You made the decision to have a hot spare when you designed your pool so logic would assume you would like to keep it that way even after a drive failure. Therefore you would replace the failed drive and the hot spare would automatically go back to being a spare again.
> Like it’s been mentioned you can make the spare permanent but that’s not the default behaviour.
If I put in a replacement and then move the original spare to being a spare again (if I understand how this works) I get hit with ANOTHER resilver. Not good for disk life or system performance.
It seems to me that the default behavior SHOULD be to make the hot spare permanent in the vdev, then add a new spare.
Simple example:
Assume you have 2 or more vDevs (Mirrors or RAID-Zx, it does not matter), and you use different size disks in each vDev; then using a Hot Spare that is equivalent to the largest disk in use allows it to be a Hot Spare for any vDev.
It is then up to the SysAdmin to determine if they want to continue with the larger disk. Or, replace the larger Hot Spare with same size as the failing disk.
Remember, this GUI is not just for SOHO free users. It is used by Enterprise Data Center users that will probably have multiple vDevs in a pool, or multiple pools, and may be in the process of upgrading disks in a pool. Thus, they may even have used, but fully working, same-size disks that used to be in another vDev.
Please also note that Hot Spare vDevs are the ONLY vDev type that can be shared across ZFS Pools. Of course, when actually in use, a spare will be tied to 1 pool exclusively until no longer needed.
I think of this a bit like driving a car and having a spare tyre. If you get a blowout then you replace the tyre with your spare but at the earliest opportunity you go get another tyre and put the spare back in the boot to use another day.
No, you don’t resilver twice. Well, I guess you do, but not quite the way you’re saying.
Disk fails, the hot spare kicks in and resilvers. Then the default behaviour is to replace the failed drive, which then resilvers, and once that completes the hot spare drops back automatically to being the spare again with no user interaction needed.
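On the command line, that default path is a plain replace of the failed disk (pool and device names below are hypothetical; on TrueNAS the GUI replace dialog is generally preferred so the middleware stays in sync):

```
# Replace the failed disk with the new one -- this triggers the second resilver
sudo zpool replace tank sdh sdj

# When the resilver finishes, the hot spare automatically returns to the
# spares list as AVAIL, with no further action needed
sudo zpool status tank
```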
Interestingly I was just reading the above docs and it does seem to encourage you down the path of promoting your hot spare and then adding a new spare later. This goes against my understanding of hot spares being temporary as the default behaviour with the option to make permanent.
This little message was tucked away in the docs.
Do I really need to promote the hot spare and then recreate the spare vdev?
> If you have a hot spare inserted into the pool and then follow the instructions in Replacing a Failed Disk Without a Hot Spare, TrueNAS automatically returns the hot spare disk to the existing Spare vdev and ONLINE status.
> However, we do not recommend this method, because it causes two resilver events: one when activating the hot spare and again when replacing the failed disk. Resilvering degrades system performance until completed and causes unnecessary strain on the disk.
> To avoid unnecessary resilvers, promote the hot spare by [detaching the failed disk](#detaching-a-failed-disk), then recreate the hot spare vdev.
> If recreating the spare with a replacement in place of the failed disk, insert the replacement disk now. The new disk must have the same or greater capacity as the failed disk. If recreating the spare with an available disk in the system, proceed to the next step.
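Put together, the docs’ recommended sequence (avoiding the second resilver) would look roughly like this on the command line, with hypothetical pool and device names; again, doing the equivalent through the GUI keeps the TrueNAS middleware in sync:

```
# 1. Promote the hot spare by detaching the failed disk.
#    No resilver is needed -- the spare already holds the data.
sudo zpool detach tank sdh

# 2. Physically install the replacement disk, then recreate the spare vdev
#    by adding the new disk as a hot spare.
sudo zpool add tank spare sdj
```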