What is a ZFS Pool “Checkpoint”?
If you do destructive or experimental actions against files and folders within a filesystem, you can always resort to rolling back to a dataset’s snapshot.
Even if you don’t wish to do a full rollback (which will ultimately lose any new data after the point-in-time of the snapshot), you can still retrieve old or deleted files from the read-only filesystem (i.e, "snapshot) on an individual basis.
But what safeguard exists if you do something outside of a filesystem?
What happens if you…
- …destroy a snapshot?
- …destroy an entire dataset?
- …add a new vdev to the pool that you instantly regret?
- …create chaos with a poorly vetted batch script that contains
zfs
commands? - …enable pool features that you immediately regret, thus forfeiting backwards compatibility?
- …rename / shuffle your dataset structure, only to immediately realize it was a bad idea?
This is where a pool checkpoint can come in handy.
You never want to find yourself in a situation where you need to resort to “rewinding” your pool back to a checkpoint, just as you never want to be in a situation where a seatbelt saves your life from a vehicle collision.
Ideally, you never mess up your pool.
Ideally, you never get into a car accident.
But just as seatbelts exist, so do pool checkpoints.
So then what is a ZFS Pool “Checkpoint”?
It is an immutable point-in-time state of the entire ZFS pool.
Managing Checkpoints with the command-line
To check the existence of a pool checkpoint, use the zpool get
command, and look for a “size” under the VALUE
column.
In this example, the pool “mypool” has no checkpoint:
zpool get checkpoint mypool
NAME PROPERTY VALUE SOURCE
mypool checkpoint - -
To create a checkpoint, use the zpool checkpoint
command:
zpool checkpoint mypool
Now we can see the VALUE
column has a “size”:
zpool get checkpoint mypool
NAME PROPERTY VALUE SOURCE
mypool checkpoint 540K -
If you want to remember when you created a checkpoint:
zpool status mypool | grep checkpoint
checkpoint: created Tue June 4 14:40:30 2024, consumes 540K
An empty output means that no checkpoint exists.
To discard a checkpoint, use the -d
flag in the command:
zpool checkpoint -d mypool
Now we see that there is no “size” under the VALUE
column once again:
zpool get checkpoint mypool
NAME PROPERTY VALUE SOURCE
mypool checkpoint - -
To actually “view” or “rewind” to a checkpoint requires that the pool is first exported, and then re-imported.
To access a pool’s checkpoint in a read-only state (such as retrieving particular data that exists on a dataset you outright destroyed):
zpool import --read-only=on --rewind-to-checkpoint mypool
To rewind to a checkpoint (which will discard everything you did after the checkpoint’s creation), remove the --read-only
flag:
zpool import --rewind-to-checkpoint mypool
Remember, you will lose everything after the checkpoint’s creation (including any newly added vdevs), and you will no longer have an existing checkpoint in the pool post-importation.
Important caveats about Checkpoints
Do not treat pool checkpoints as you would dataset snapshots.
There are some important caveats and distinctions:
- A pool can only have a single checkpoint
- A checkpoint’s contents cannot be accessed from a (normal) imported pool; you must export and re-import with the
--rewind-to-checkpoint
option to access checkpoint-exclusive content - A scrub on a (normal) imported pool will not check the data that only exists in the checkpoint
- A checkpoint is pool-wide, thus “rewinding” back to a checkpoint will undo everything in the pool that you’ve done after its creation
- You are not supposed to “sit” on a checkpoint: After you create one and then do some “stuff”, you should very soon make a decision on whether you want to discard the checkpoint or rewind to it
- You cannot remove or modify vdevs if a checkpoint exists
- You can add a new vdev after creating a checkpoint, in which rewinding the checkpoint will act as if the new vdev (including any files saved after its addition) never existed
TL;DR: What should I do?
- You want to try something that affects the entire pool or dataset(s). This includes “upgrading” pool features, destroying datasets or snapshots, adding a new vdev, trying out a batch script that uses
zfs
commands, receiving a replication stream to a dedicated backup pool that you might reconsider, and so on - Before doing so, you create a checkpoint with
zpool checkpoint mypool
- You go ahead and continue with whatever you decided on
- You assess the results. You need to make a decision, since it’s unwise to let a checkpoint “sit” in a pool for too long.
4a. Are you happy with the results?Discard the checkpoint with
zpool checkpoint -d mypool
4b. Are you unhappy with the results?Export the pool and then rewind to the checkpoint with
zpool import --rewind-to-checkpoint mypool
Always remember, kids!™
Use pool checkpoints as a safety net, with the mindset that you’ll never have to actually rewind your pool.
Wear seatbelts as a safety measure, with the mindset that you’ll never depend on them to save your life from a car accident.