I set up a spare HDD of the same size as the other 3 HDDs in my RAIDZ1. One drive was reporting errors from the scheduled SMART tests, and when I ran the tests manually, every test type failed. I did not want to wait for the drive to fail outright, so I started looking through the docs for how to offline the failing drive and replace it with the spare. The device I added as a spare was a single device, and it shows up as a spare vdev. Did I put the spare drive in the wrong place for the pool to pull the hot spare into the vdev after I took the drive offline?
I was able to remove it from the spare vdev and make it a standalone device, then force the offline device to be replaced by the now-available spare. It is now working to rewrite the data from the offline HDD onto the spare. I have things up and running and will order a replacement while I figure out what is wrong with the offline drive. The data is safe, but I have two similar TrueNAS setups and want to fix the config on the other system, which is not currently seeing a drive issue. As it stands, that system would need me to step in and fix a failed drive instead of the NAS replacing the failed drive on its own.
I'm not positive; however, I think your drive needs to fail a SCRUB for the spare to be called into service.
As a side note, if you are going to have an active spare, why not just create a RAIDZ2? That way you already have the built-in redundancy. Just a thought. I know some people would rather not go that route, which is perfectly fine.
@joeschmuck - Thank you for the info. I'll look into the RAIDZ2. The main reason I wanted it to be a hot spare is that if another device, e.g. the camera NVR, needed a drive, I could pull the spare from the NAS to protect the RAID in the NVR. It is not likely that a healthy RAIDZ1 would fail before the hot spare was replaced. I read that forcing the drive offline would pull the spare in, but I might have been wrong. Below are the link and the steps I thought would force it to start.
I pulled that information from the following document: Replacing Disks | TrueNAS Documentation Hub
To replace a disk in a pool with a hot spare:
- Take the disk offline.
- Detach the failed disk to promote the hot spare.
- Refresh the screen.
- Recreate the hot spare VDEV.
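For anyone who wants to see what those GUI steps roughly correspond to underneath, here is a minimal sketch that only builds the equivalent `zpool` command lines and prints them; it does not execute anything. The pool name `tank` and the device names `sdc`, `sde`, and `sdf` are made up for illustration (TrueNAS usually identifies disks by partition IDs rather than `sdX` names), and the explicit `zpool replace` step is my assumption for the case where offlining the disk does not pull the spare in by itself, which matches what happened in this thread. On TrueNAS the GUI is still the recommended way to do this.

```python
from typing import List


def hot_spare_swap_commands(
    pool: str, failing: str, spare: str, new_spare: str
) -> List[List[str]]:
    """Build the zpool commands for a hot-spare swap, in order.

    Nothing is executed here; the caller decides what to do with the
    argument lists. All names passed in are hypothetical examples.
    """
    return [
        # 1. Take the failing disk offline.
        ["zpool", "offline", pool, failing],
        # 2. Assumption: activate the spare by hand if it was not
        #    pulled in automatically.
        ["zpool", "replace", pool, failing, spare],
        # 3. Check progress; wait for the resilver to finish before step 4.
        ["zpool", "status", pool],
        # 4. Detach the failed disk so the spare takes its place for good.
        ["zpool", "detach", pool, failing],
        # 5. Recreate the hot spare with a fresh disk.
        ["zpool", "add", pool, "spare", new_spare],
    ]


if __name__ == "__main__":
    for cmd in hot_spare_swap_commands("tank", "sdc", "sde", "sdf"):
        print(" ".join(cmd))
```

Printing the commands instead of running them is deliberate: it gives you a checklist to review against `zpool status` output before touching the pool.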
It has happened before.
That is fair. You have a plan, don't let me change your mind. If that works for you then keep doing it. There are pros and cons to everything. So long as you have evaluated these things, you should understand the risks of whatever you choose to do.
I personally find it helpful to write my own step-by-step procedure for how I would replace a failing drive. It helps me understand what I am doing better than just reading an online document that can sometimes be vague. Then I print it out and stick it inside the computer case for the day when I need it. Of course I'd have the procedure on my computer as well, but to each his/her own.
Rewriting a procedure is a great way to learn as well.
Glad to see you have solved the problem, your pool is back up to Healthy condition, and you can go to sleep knowing you did a good thing today.
Spot on, young man!
I avoid hot spares and instead have a supply of qualified cold spares that sit next to the NAS, ready for use. My pain tolerance is lower though, so I run a Z3 here. The Z3 should give me more time to replace drives as they wear out, like the 8-yo drive last year that apparently broke its case seal and lost its Helium.
How many drives to keep spinning as spares vs. using them is a different question. I am not a fan of hot spares in an environment where you can readily change drives, such as a NAS in your own home that you can access 24/7. The calculus changes the more remote a system is and the higher the importance / availability of the data on it.
So if the NAS is in a co-location site, data center, etc. and lead times to drive replacements might get long, more hot spares make sense since the work order is likely $$$ and the hassle of getting it done is equally un-fun. Similarly, if you have to drive hours to replace a drive, having a hot spare to give you more time to make the swap makes more sense.
But, hot spares consume energy and wear drive motors (at least for spinners). That can also add up in terms of electricity costs and make the hot spare more susceptible to failure over time once the day comes to resilver the pool.
If the main use case is a WORM archive NAS with data that does not need to be accessed 24/7, then a non-hot-spare option for dealing with the dreaded "pool degraded due to drive failure" message is to:
- offline the offending drive
- shut the NAS down immediately thereafter
- physically replace the bad drive
- boot back up
- import the pool
- "replace" the offlined drive with the spare via the GUI
- start the resilvering process
- run a scrub to clear the "pool degraded" error.
No hot spare needed.
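If you want to turn that list into the kind of printed checklist mentioned earlier in the thread, a rough sketch could look like the following. It pairs each step with the `zpool` command that roughly corresponds to it, or marks it as a manual step; it does not run anything. The pool name `tank` and the disk names `sdc` and `sdg` are placeholders I made up, and on TrueNAS the pool is normally imported at boot and the replace is done in the GUI, so treat the commands as an approximation of what happens underneath.

```python
from typing import List, Optional, Tuple

# A step is a description plus an optional command (None = manual step).
Step = Tuple[str, Optional[List[str]]]


def cold_spare_swap_steps(pool: str, old_disk: str, new_disk: str) -> List[Step]:
    """Build an ordered checklist for a cold-spare swap (nothing is executed)."""
    return [
        ("offline the offending drive", ["zpool", "offline", pool, old_disk]),
        ("shut the NAS down", None),
        ("physically replace the bad drive", None),
        ("boot back up", None),
        # Assumption: only needed if the pool was not imported at boot.
        ("import the pool", ["zpool", "import", pool]),
        ("replace the offlined drive with the spare",
         ["zpool", "replace", pool, old_disk, new_disk]),
        ("watch the resilver until it completes", ["zpool", "status", pool]),
        ("run a scrub", ["zpool", "scrub", pool]),
    ]


if __name__ == "__main__":
    steps = cold_spare_swap_steps("tank", "sdc", "sdg")
    for number, (description, command) in enumerate(steps, start=1):
        action = " ".join(command) if command else "(manual step)"
        print(f"{number}. {description}: {action}")
```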