ZFS Pool "Checkpoints": They work just like seatbelts! (Not really)

What is a ZFS Pool “Checkpoint”?


If you do destructive or experimental actions against files and folders within a filesystem, you can always resort to rolling back to a dataset’s snapshot.

Even if you don’t wish to do a full rollback (which will ultimately lose any new data written after the snapshot’s point-in-time), you can still retrieve old or deleted files from the read-only filesystem (i.e., the snapshot) on an individual basis.
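As a refresher, both approaches look roughly like this (a sketch only; the dataset mypool/mydata and the snapshot before-changes are made-up names for illustration):

# Copy a single file back out of the snapshot's hidden .zfs directory
cp /mypool/mydata/.zfs/snapshot/before-changes/important.txt /mypool/mydata/

# Or roll the whole dataset back to the snapshot (anything written afterwards is lost)
zfs rollback mypool/mydata@before-changes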

But what safeguard exists if you do something outside of a filesystem?

What happens if you…

  • …destroy a snapshot?
  • …destroy an entire dataset?
  • …add a new vdev to the pool that you instantly regret?
  • …create chaos with a poorly vetted batch script that contains zfs commands?
  • …enable pool features that you immediately regret, thus forfeiting backwards compatibility?
  • …rename / shuffle your dataset structure, only to immediately realize it was a bad idea?

This is where a pool checkpoint can come in handy.

You never want to find yourself in a situation where you need to resort to “rewinding” your pool back to a checkpoint, just as you never want to be in a situation where a seatbelt saves your life from a vehicle collision.

Ideally, you never mess up your pool.

Ideally, you never get into a car accident.

But just as seatbelts exist, so do pool checkpoints.

So then what is a ZFS Pool “Checkpoint”?
It is an immutable point-in-time state of the entire ZFS pool.


Managing Checkpoints with the command-line

To check the existence of a pool checkpoint, use the zpool get command, and look for a “size” under the VALUE column.

In this example, the pool “mypool” has no checkpoint:

zpool get checkpoint mypool

NAME       PROPERTY    VALUE    SOURCE
mypool     checkpoint  -        -

To create a checkpoint, use the zpool checkpoint command:

zpool checkpoint mypool

Now we can see the VALUE column has a “size”:

zpool get checkpoint mypool

NAME       PROPERTY    VALUE    SOURCE
mypool     checkpoint  540K     -

If you want to remember when you created a checkpoint:

zpool status mypool | grep checkpoint

checkpoint: created Tue June 4 14:40:30 2024, consumes 540K

:information_source: An empty output means that no checkpoint exists.

To discard a checkpoint, use the -d flag in the command:

zpool checkpoint -d mypool

Now we see that there is no “size” under the VALUE column once again:

zpool get checkpoint mypool

NAME       PROPERTY    VALUE    SOURCE
mypool     checkpoint  -        -

Actually “viewing” or “rewinding” to a checkpoint requires that the pool first be exported and then re-imported.
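The export itself is just the usual command (anything actively using the pool, such as services or shares, must be stopped first, otherwise the export will fail):

zpool export mypool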

To access a pool’s checkpoint in a read-only state (such as retrieving particular data that exists on a dataset you outright destroyed):

zpool import --read-only=on --rewind-to-checkpoint mypool

To rewind to a checkpoint (which will discard everything you did after the checkpoint’s creation), remove the --read-only flag:

zpool import --rewind-to-checkpoint mypool

:warning: Remember, you will lose everything after the checkpoint’s creation (including any newly added vdevs), and you will no longer have an existing checkpoint in the pool post-importation.

:warning: **For TrueNAS systems**, you must also include -R /mnt in the import parameters. This is not needed for vanilla ZFS systems, but it is required for TrueNAS.
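For example, on TrueNAS the rewind import would look something like this (-R /mnt sets the temporary “altroot” that TrueNAS mounts its pools under):

zpool import -R /mnt --rewind-to-checkpoint mypool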


Important caveats about Checkpoints

:warning: Do not treat pool checkpoints as you would dataset snapshots.

There are some important caveats and distinctions:

  • A pool can only have a single checkpoint
  • A checkpoint’s contents cannot be accessed from a (normal) imported pool; you must export and re-import with the --rewind-to-checkpoint option to access checkpoint-exclusive content
  • A scrub on a (normal) imported pool will not check the data that only exists in the checkpoint
  • A checkpoint is pool-wide, thus “rewinding” back to a checkpoint will undo everything in the pool that you’ve done after its creation
  • You are not supposed to “sit” on a checkpoint: After you create one and then do some “stuff”, you should very soon make a decision on whether you want to discard the checkpoint or rewind to it
  • You cannot remove or modify vdevs if a checkpoint exists (see the quick check after this list)
  • You can add a new vdev after creating a checkpoint, in which case rewinding to the checkpoint will act as if the new vdev (including any files saved after its addition) never existed
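Regarding the vdev caveat, a simple guard in any maintenance script is to check the checkpoint property first (a small sketch, using the same mypool from the earlier examples):

# zpool get prints "-" for the checkpoint value when no checkpoint exists
if [ "$(zpool get -H -o value checkpoint mypool)" != "-" ]; then
    echo "mypool has a checkpoint; discard it or rewind before touching vdevs."
    exit 1
fi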

TL;DR: What should I do?

  1. You want to try something that affects the entire pool or dataset(s). This includes “upgrading” pool features, destroying datasets or snapshots, adding a new vdev, trying out a batch script that uses zfs commands, receiving a replication stream to a dedicated backup pool that you might reconsider, and so on
  2. Before doing so, you create a checkpoint with zpool checkpoint mypool
  3. You go ahead and continue with whatever you decided on
  4. You assess the results. You need to make a decision, since it’s unwise to let a checkpoint “sit” in a pool for too long.
    4a. Are you happy with the results? :partying_face: Discard the checkpoint with zpool checkpoint -d mypool
    4b. Are you unhappy with the results? :scream: Export the pool and then rewind to the checkpoint with zpool import --rewind-to-checkpoint mypool
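Put together as a bare command sequence (same mypool as in the examples above):

zpool checkpoint mypool                      # step 2: take the checkpoint
# ...do the risky work...
zpool checkpoint -d mypool                   # step 4a: happy? discard it
# or, if you regret it:
zpool export mypool                          # step 4b: unhappy? export the pool...
zpool import --rewind-to-checkpoint mypool   # ...then rewind (TrueNAS: add -R /mnt)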

Always remember, kids!™

Use pool checkpoints as a safety net, with the mindset that you’ll never have to actually rewind your pool.

Wear seatbelts as a safety measure, with the mindset that you’ll never depend on them to save your life from a car accident.


Here is a forum post feature request for the TrueNAS GUI to incorporate the checkpoint feature.

Here is a Jira ticket you can vote on.

Excellent post! Question: will a checkpoint also work if you add an additional vdev to the pool by mistake?


I believe it’s only possible if the added vdev is a mirror (and all existing vdevs are mirrors).

I remember reading about this very specific condition. I’ll have to retrieve the article. (I think it was posted in the FreeBSD Journal.)

EDIT: Found it. While not from the FreeBSD Journal, this blog post was written by the same author who published the article about checkpoints in the 2018 FreeBSD Journal.

Apparently, there is no distinction on the vdev type. So this is not limited to only mirrors. :slightly_smiling_face:


I updated the guide to reflect what was discovered about adding new vdev(s).

Glad you asked, @somethingweird! :+1:


Well, I finally had a need to try it out. You have to disable the jailmkr startup script and reboot before you can export.

But on reimport, I got this:

root@truenas[~]# zpool import --rewind-to-checkpoint 10077816991006409476
cannot mount '/main': failed to create mountpoint: Read-only file system
Import was successful, but unable to mount some datasets
root@truenas[~]#       

So it was re-imported OK, but NOT mounted.

Trying a reboot now.

That makes sense for vanilla ZFS, but not for TrueNAS.

You’ll need to add -R /mnt to the import parameters. The reason I left this out is that an “altroot” is not necessary for non-TrueNAS systems.

However, I’ll go ahead and edit my post to include -R /mnt as a needed parameter for TrueNAS users.


Added a note to the post. :point_down:


I figured this out after spending 5 hours in agony, wondering why it was trying to mount to / instead of /mnt.

Thank you, this will save people incredible pain.

I just followed your instructions and my system was toast. I had to manually set the mountpoint property for main and for ix-apps. Learned a lot though!

Can you edit the original post to correct it? That’s what people will see.

Thanks!

It’s already edited. :slightly_smiling_face:

It looks the same as the original even after page reload:

There is no -R option added to the import. Am I missing something?


@winnielinnie I’d suggest emboldening “For TrueNAS systems” in your [i] block.

And maybe dropping a link to your “revised” comment.


Bolded.

The thing about TrueNAS vs “vanilla” ZFS is that you jump away from upstream defaults.

-R /mnt is one of them, but so is -d /dev/disk/by-partuuid, and the irregular location of the “cachefile” (/data/zfs/zpool.cache) for TrueNAS, compared to upstream OpenZFS (/etc/zfs/zpool.cache).

That’s why I leave the examples as “vanilla” as possible, since they can be applied and tweaked for any system.
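For instance, a TrueNAS-flavoured rewind of this guide's example pool would likely end up looking something like this (a sketch only; verify the device directory and altroot for your own system):

zpool import -R /mnt -d /dev/disk/by-partuuid --rewind-to-checkpoint mypool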


It also highlights why it’s important for TrueNAS to incorporate these useful features into their GUI, since a lot happens “under the hood”.

My ticket on Jira was “closed” because they want us to use the forums for feature requests instead.
