This is a rather odd question about zfs levels and vdevs

This is a random thought while eating lunch…

It is standard for someone to say that with a raidz2 setup, a pool can lose two drives without data loss. That is true.

Pools themselves are made of one or more vdevs, which contain the drives and the drive configuration, such as mirror, z1, z2, or z3.

So when someone talks about having a pool with a raidz2 setup that can lose 2 drives without data loss, that is true to a point. I ask because if a pool consists of two vdevs, and each vdev is configured as raidz2, then the pool itself could lose up to 4 drives without data loss, provided the loss was distributed so that each vdev lost no more than 2 drives. Correct?
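The arithmetic in the question can be sketched as a toy model (a hypothetical helper for illustration, not anything from the ZFS codebase): a pool survives as long as no single vdev loses more drives than its parity level covers.

```python
# Toy model of ZFS pool survival: the pool is lost as soon as any one
# vdev loses more drives than its parity level can cover.
def pool_survives(parity_per_vdev, failures_per_vdev):
    """parity_per_vdev: parity level of each vdev (2 for raidz2).
    failures_per_vdev: number of failed drives in each vdev."""
    return all(f <= p for p, f in zip(parity_per_vdev, failures_per_vdev))

# Two raidz2 vdevs: four failures are survivable if split 2 + 2...
print(pool_survives([2, 2], [2, 2]))   # True
# ...but three failures concentrated in one vdev destroy the whole pool.
print(pool_survives([2, 2], [3, 0]))   # False
```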

Yes.

2 Likes

Correct, but redundancy really is at the vdev level: if you lose a whole vdev, you lose the pool.
So a pool of multiple raidz2 vdevs can be entirely lost by the failure of as few as three drives, no matter how many vdevs there are. More vdevs provide more space and more performance, but not more resiliency.

True.

Is this where hot spares come into play? Temporarily mitigating a drive failure in the pool?

That’s not specifically related to the question, but hot spares are a way to limit the time during which a pool is degraded.
As you can see from some recent threads, raidz2 is supposed to be reasonably resilient… but if one doesn’t react quickly to a first failure, it only takes overheating drives or a port multiplier getting in the way to end up in a bad place.

2 Likes

Indeed.

All the recent posts got me debating if I should turn my cold spare into a hot one…

Maybe.

2 Likes

Unless you don’t have easy access to the server, I wouldn’t. A hot spare is wearing out at the same rate as a disk that’s part of the pool. If you think a “hot spare” is appropriate, consider RAIDZ3 instead of RAIDZ2.
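A rough way to see this point (toy numbers, not a ZFS API): against simultaneous failures, a hot spare adds nothing until its resilver completes, while RAIDZ3 gives you a third parity drive immediately.

```python
# Simultaneous-failure tolerance of a single vdev, before any resilver
# has run: a hot spare only helps *after* it has resilvered in.
def tolerates(parity, simultaneous_failures):
    return simultaneous_failures <= parity

# raidz2 plus a hot spare: three drives dying at once is still fatal,
# because the spare has not resilvered yet.
print(tolerates(2, 3))  # False
# raidz3 with the same total drive count: three simultaneous failures survive.
print(tolerates(3, 3))  # True
```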

4 Likes

Exactly.

1 Like

The recent posts on lost pools are what made me think of the question.

…and that is what BACKUPS (in plural) are for…

2 Likes

This.

@winnielinnie confirmed paying by the word to post

5 Likes

I’m really surprised they don’t have systems with “hot spares” that are actually kept powered down unless they’re needed due to a disk failure.

1 Like

In some ways, that would be an Enterprise-like feature. I am not saying consumer hardware can’t have such a thing, or shouldn’t have such a thing. But I can imagine such a feature would be less wanted in the consumer hardware space if it cost more.

Some HDDs have the ability to stay in reset, based on the old 3.3V SATA/SAS power line. That 3.3V line has been repurposed for a bit of high availability: a disk enclosure can power up drives in sequence to avoid a high current load at power-up, or even reset a drive that appears to be hung. In theory, SAS enclosure services could then allow program access to such a feature.

Another way to look at it is to have several cold-spare HDDs and cycle them through your server, like rotating the tires on your car. (At least back in the bad old days, when your vehicle’s spare tire was the same size and type as the 4 running tires…) Such an HDD rotation might help people to first verify backups before disk changes, and then to practice disk changes before the excrement hits the rotating impellers.

I guess my question is: isn’t this feature common in enterprise hardware? I can understand why it would not be in consumer hardware, but it seems like a no-brainer in enterprise hardware. Surely enterprises aren’t running racks of expensive hard drives for thousands of hours, doing nothing and just waiting for other drives to fail.

I’m unsure about SCALE/CE because I’ve not upgraded yet, but Core has the ability to set an individual disk to spin down. Could this be used to spin down a hot spare?

I imagine a hot spare won’t be accessed unless it’s being used to replace a failed HDD, at which point it would spin up. Or would TrueNAS access it occasionally for other tasks and cause it to spin up again?

As @dan has correctly mentioned, a cold or hot spare does not really make much sense.
If you have already spent your money on that HDD, you should use a RaidZ3 setup, because then in case of any failure your system ALREADY has three parity drives.
With a spare, the system has to resilver the pool, and that is a long-lasting process (it can easily take more than a day for a 16+ TB drive) with high load on the individual HDDs.
That is a really sensitive period, because if you used Z2 in the beginning and the first drive is dead, you are one (and a half) HDD failures away from data loss. If you had set it up as Z3 in the first place, you would be 2 (and a half) failures away.
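The margin during that resilver window can be put in numbers (toy arithmetic for illustration, not ZFS output): what counts is how much parity remains while the rebuild runs.

```python
# Parity remaining while a failed drive resilvers:
# the vdev's parity level minus the drives already lost.
def margin_during_resilver(parity, failed):
    return parity - failed

print(margin_during_resilver(2, 1))  # 1 -- raidz2 with one dead drive
print(margin_during_resilver(3, 1))  # 2 -- raidz3 keeps double protection
```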