Some time ago I mentioned that you can’t replace drives in a zpool while a checkpoint is in place, which rules out the use of hot spares. I’ve just stumbled across another annoying issue.
For context, I have checkpoints automatically created and discarded on my systems every week as a failsafe. I replaced a disk the other day after discarding the checkpoint, which kicked off the resilver. My systems are rather large, so resilvers take days. I was checking in on the resilver most days and it was quite happily getting closer to completion, until the next time I checked and it had gone back to the start. I was puzzled at first and then realised what had happened: my cron job had taken a checkpoint during the resilver. Checkpoint data doesn’t form part of the resilver, as those blocks are skipped. However, once it got to the end of the resilver it decided it needed to start over because the checkpoint blocks were out of sync, essentially creating the image of a dog chasing its tail. The fix was simple: discard the latest checkpoint and suspend the cron job until the resilver completed.
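For anyone running a similar cron job, here is a rough sketch of a guard that skips taking the checkpoint while a resilver is running. It assumes `zpool status` prints a `scan:` line containing "resilver in progress" during a resilver; the function names are hypothetical, and it only covers creating the checkpoint, not the weekly discard.

```python
import subprocess


def resilver_in_progress(status_text: str) -> bool:
    """Return True if `zpool status` output reports an active resilver.

    Assumption: an active resilver shows up on the `scan:` line as
    "resilver in progress since ...".
    """
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:") and "resilver in progress" in stripped:
            return True
    return False


def checkpoint_if_idle(pool: str) -> bool:
    """Take a checkpoint on `pool` unless a resilver is running.

    Returns True if a checkpoint was requested, False if it was skipped.
    Hypothetical helper, not part of any ZFS tooling.
    """
    status = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True,
        text=True,
        check=True,
    ).stdout

    if resilver_in_progress(status):
        # Skip this run; the next scheduled run will try again.
        return False

    # `zpool checkpoint <pool>` creates the checkpoint; it fails if one
    # already exists, so the old one must be discarded beforehand.
    subprocess.run(["zpool", "checkpoint", pool], check=True)
    return True
```

The cron entry would then call something like `checkpoint_if_idle("tank")` (with `tank` standing in for the real pool name) instead of running `zpool checkpoint` directly.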
Anyway thought I’d share this little event with you all.
I will link this post to the thread on checkpoints.
Out of curiosity, what was this resilver for? To bring a degraded pool back to a healthy status with a replacement disk?
In all my years with ZFS, I’ve only ever resilvered intentionally, to expand a pool. I’ve never had a situation where it happens automatically without me knowing about it, and I never use hot spares.
EDIT: The promising thing about this is that it shows you that checkpoints are inherently safe. Such “gotchas” can be inconvenient, but no data was ever at risk and you’re not forced into irreversible or one-way operations.
I’ll take inconvenience any day, as long as I also get to use data-protective features.
I’m sure much of it has to do with using few drives in total.
I also tend to “replace” drives before they fail, thanks to falling prices and increased capacities.
I even have a few 4-TiB drives lying around for which I have no use, simply because they have been replaced with larger ones. I might use them to build a test pool to play with, or as a tertiary backup of less important things.
Maybe I can build the greatest pool in the world when AnyRaid™ arrives! A bunch of 2-TiB, 4-TiB, and maybe 8-TiB drives, for a pool of who knows what capacity! I shall store my most precious data on it.