Hot spare for special vdev

Problem/Justification
(What is the problem you are trying to solve with this feature/improvement or why should it be considered?)

Currently, there is no option to add a hot spare for the metadata/special vdevs. When attempting to add a hot spare, the request is denied, since the pool defers to the primary vdev's storage devices.

Impact
(How is this feature going to impact all TrueNAS users? What are the benefits and advantages? Are there disadvantages?)

It should have no impact on existing users; hot spares require additional configuration to become available to existing pools. The benefit is that hot spares could replace degraded SSDs in a metadata or special vdev and improve the resiliency of any pool with a special vdev enabled.

User Story
(Please give a short description on how you envision some user taking advantage of this feature, what are the steps a user will follow to accomplish it)

Users would allocate a number of SSDs for a number of pools using special vdevs. Should one or more SSDs in those special vdevs fail, the hot spare would stand in until either it is promoted or the degraded SSD is replaced in the special vdev.

If you can have a spare, just add it to the sVDEV to begin with. Two way mirror becomes a three way mirror.
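For reference, growing an existing two-way special mirror into a three-way mirror is a single attach operation (a sketch; the pool name `tank` and the device names are hypothetical, and you name an existing member of the special mirror as the attach target):

```shell
# Turn a two-way special-vdev mirror into a three-way mirror.
# "tank" is a hypothetical pool; sdb is an existing member of the
# special mirror, and sdd is the new disk being attached alongside it.
zpool attach tank sdb sdd

# Confirm the special vdev now lists three mirror members.
zpool status tank
```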


SSDs have write-endurance limitations. Having a hot spare would improve pool resiliency against failure of the special vdev. Adding the disk to an existing special vdev configuration would subject it to writes, more so if you are using the special vdev to store small files/folders.


Say, for example, you buy 3-4 SSDs of the same brand, make, and model, and set up a mirror. All 3-4 disks will receive equal writes and will likely hit their write-endurance failures within a relatively short span of each other.

Can you currently add a spare with ZFS using the command line? If it is not possible with base ZFS, I would say this request won't go anywhere. And if it isn't needed for Enterprise, I don't see this getting pushed as a change to OpenZFS and TrueNAS.


I've not tried the command line, but from the UI it errors out, since it tries to default to the main vdev of the pool and states the drive is not large enough. I will try the command-line method later to see whether it'll allow it.
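For anyone wanting to repeat the experiment, the CLI attempt would look roughly like this (a sketch; `tank` and the device names are hypothetical, and in stock ZFS the spare is attached to the pool as a whole rather than scoped to the special vdev, which is exactly the behavior in question):

```shell
# Add an SSD as a pool-wide hot spare; ZFS offers no syntax to scope
# a spare to the special vdev class. "tank" and devices are hypothetical.
zpool add tank spare sde

# If a special-vdev member (sdb here) degrades, try swapping the spare
# in manually. The spare must be at least as large as the disk it
# replaces, which is where the "not large enough" error can appear.
zpool replace tank sdb sde

# Watch the resilver and the spare's state.
zpool status tank
```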

From an enterprise perspective, I would see this as a critical feature, since you would be dealing with repeated rewrites in the special vdev if it is enabled for small file/folder storage. A hot spare would retain its write endurance until needed, but that comes down to the business use case.

That's one argument to buy used, especially Intel data center SATA SSDs with insane write endurance. In SOHO use like mine, the SSDs will live out the rest of their lives loafing along with nary a worry.

I would focus on getting the redundancy right and worry less about having hot spares. For example, I used to have four SSDs dedicated to the sVDEV and ultimately decided that having a cold spare was more valuable. So it sits there in the stack for the day it's needed, ready to be pushed into place but presently not making electrical contact.


I have 2 spares plugged in just for sVDEVs, but why not allow hot spares for the purpose of automating increased reliability of special vdevs? You would increase fault tolerance by the number of hot spares available. I have 3 pools with sVDEVs; applying 4 hot spares to those 3 pools would be a drastic improvement in reliability. SSDs don't fail from sitting as hot spares; they generally fail due to write endurance. All my SSDs carry a 5-year write-endurance warranty, and I have SSDs over 10 years old still in use.

Are you really writing that much to the sVDEV that either the metadata partition or the small file block partition would ever wear out?

Unless your system is constantly churning through new data, neither of those partitions will be written to much. Use case obviously depends, however.

If your system is largely WORM like mine (Apps, VMs, and Time Machine backups excluded) then wear isn’t going to be a major issue.

However, if something managed to stress one drive in a sVDEV mirror enough to make it quit, do you really want a hot spare that needs a resilver (putting even more stress on the remaining drives), or something ready to go?

That warranty doesn’t care whether the drive is plugged in or not, the clock starts ticking the minute the drive leaves the retailer.

For my use case, I go with a three-way mirror and call it a day. Just as I would build single pools with more VDEVs and not multiple pools with fewer VDEVs. But to each his / her / their own.


Are you making the point that hot spares are a bad idea for sVDEVs, and that buying used SSDs and choosing the right redundancy mitigates the risk?
The arguments you are proposing against hot spares for sVDEVs are the same complaints you could levy against a standard pool setup, and I'm not sure why hot spares for sVDEVs would be a bad idea.

The reason I mention that the SSDs have a 5-year warranty is that it implies a reasonably rated write endurance, generally 600 TBW per terabyte of capacity. I've mixed my branding a bit to stagger possible same-batch failures, and the ones I use have easily surpassed their TBW rating in real-world use.
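To put that rating in numbers, here is a quick back-of-the-envelope sketch using the figures above (600 TBW on a hypothetical 1 TB drive with a 5-year warranty):

```shell
# Hypothetical figures from the discussion: a 1 TB consumer SSD rated
# for 600 TBW across a 5-year warranty period.
TBW=600          # rated terabytes written
CAPACITY_TB=1    # drive capacity in TB
YEARS=5          # warranty length in years

# Sustained write rate the rating allows, in GB/day and drive writes
# per day (DWPD).
awk -v tbw="$TBW" -v cap="$CAPACITY_TB" -v yrs="$YEARS" 'BEGIN {
    gb_per_day = tbw * 1000 / (yrs * 365)
    printf "%.0f GB/day (%.2f DWPD)\n", gb_per_day, gb_per_day / (cap * 1000)
}'
# Prints: 329 GB/day (0.33 DWPD)
```

At roughly a third of a drive write per day, a lightly churned sVDEV would take far longer than the warranty period to exhaust the rating.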

For enterprise appliances, yes. Security data logs, deep packet inspection, SolarWinds, etc. will cycle through small-file writes/rewrites excessively.

You would need to resilver when replacing a drive either way, hot spare or cold spare.

Hot spares have their use case. For me, hot spares of any kind make the most sense in applications where high uptime is paramount and/or you have restricted access to the drives (i.e., colocation / data center). For home use, cold spares with high redundancy (Z3 or a 3-way mirror) make more sense to me.

You can certainly "share" sVDEV hot spares among multiple pools, assuming you've normalized drive capacities to allow that. I agree that besides buying used, you could also buy drives with different expected DWPD ratings so that failures land more randomly. (Presuming that such failure-rate differences are not just market segmentation imposed by marketing.)
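ZFS does support registering a single spare device in more than one pool on the same host; a sketch with hypothetical pool and device names:

```shell
# The same physical disk can be added as a hot spare to multiple pools
# on the same system; whichever pool faults first can claim it.
# "pool1", "pool2", and sdf are hypothetical names.
zpool add pool1 spare sdf
zpool add pool2 spare sdf
```

The capacity caveat still applies: the shared spare must be at least as large as any disk it might stand in for.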

Anyhow, I still struggle with running multiple pools in one enclosure vs. just a single large pool with lots of vdevs to ensure high IOPS, etc. There must be something I'm overlooking. Is this an IT security issue where data has to remain physically segregated?

This is non-prod and for home lab/personal use. The first 2 pools are segregated due to differences in physical drive capacities: Pool 1 consists of 14 TB HDDs, while Pool 2 consists of 24 TB HDDs. The setup allows taking down either pool while still allowing migration of VMs across the pools in addition to the local host. This provides 3 storage failover points for VM migration. Each pool does have its own unique datasets, so those are not available during maintenance. Eventually, Pool 2 will become the primary when Pool 1 retires, and a new pool will be stood up to restart the cycle.

Pool 3 is a test pool for active data with striped vdevs to maximize IOPS. Everything in this pool is considered expendable. Processed data and backups from Pool 3 are migrated to Pool 1 periodically for long-term storage.
Pools 1 & 2 each have a RAIDZ1 sVDEV with 2 SSDs available on standby.

Pool 3 has 4x striped svdevs.

A custom chassis will house 32x 3.5-inch HDDs and 24x 2.5-inch SSDs.


After some research, this appears to be a ZFS limitation: hot spares are allowed only for the primary pool as a whole.