ZFS has a feature called “pool checkpoints”.
This is a simple yet powerful feature that can prevent data loss and safeguard against ruining an entire pool.
TrueNAS, which leverages the robustness and power of ZFS, does not provide a way to manage, monitor, and automate pool checkpoints.
While pool checkpoints share similarities with dataset “snapshots”, their scope and usage are very different and highly specific.
Pool checkpoints have some important caveats and they are not for everyone.
Read this topic to understand what they are, why they matter, how to use them, and which caveats to weigh for your use case.
For what it’s worth, I’m currently using them with a custom Cron Job that runs a simple command.
Here is my Cron Job:
Description: Daily checkpoint for my pool
Command: zpool checkpoint -d -w mypool; zpool checkpoint mypool
Run As: root
Schedule: 0 3 * * * (03:00 every day)
This creates a new checkpoint daily at 03:00, so I never sit on a checkpoint for more than 24 hours, while still (hopefully) having enough time to “use” it if something goes wrong. The catch is that I must “rewind” the pool or disable the Cron Job before the clock next hits 03:00. (Otherwise, the Cron Job will replace the good checkpoint with a new one taken after the mistake was made.)
The -w flag is important for the Cron Job, since you want the first half of the command to “exit” only after it finishes discarding the checkpoint. The semicolon is also important, since you want them to run one after the other, even if the first half “fails” because “no checkpoint currently exists”.
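For anyone who prefers a script over a one-liner, here is a minimal sketch of the same two steps as a shell function. The function name refresh_checkpoint and the ZPOOL override variable are my own inventions for illustration (the override only exists so the logic can be dry-run with a stand-in command); the zpool checkpoint flags are the same ones used in the Cron Job above.

```shell
#!/bin/sh
# Sketch only: mirrors "zpool checkpoint -d -w mypool; zpool checkpoint mypool".
# ZPOOL is a hypothetical override so the function can be exercised without a real pool.
ZPOOL="${ZPOOL:-zpool}"

refresh_checkpoint() {
    pool="$1"
    # Discard the current checkpoint and wait (-w) until the discard has finished.
    # This step fails when no checkpoint exists, which is fine, so the failure is ignored.
    "$ZPOOL" checkpoint -d -w "$pool" 2>/dev/null || true
    # Take the fresh checkpoint; its exit status becomes the function's exit status.
    "$ZPOOL" checkpoint "$pool"
}
```

A script built around this would end with a call such as refresh_checkpoint mypool and be run as root on the same schedule. Ignoring the first step's failure plays the same role as the semicolon in the one-liner.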
How should TrueNAS integrate “pool checkpoints” into its product?
This is not easy to answer.
It needs to be available in the GUI for manual operation, management, and review.
It needs to have “automatic triggers” that create a fresh checkpoint immediately before a pool-modification task.
It needs to be available as an automated task in the same way that snapshots can be automated.
It needs to be available in the Pool Import wizard.
It needs to be considered for other parts of the middleware that could error or fail in the presence of an existing checkpoint.
My non-dev proposition
I am not a developer, so I can only explain how checkpoints should be implemented from an end-user’s perspective.
Manual Operation
Add a button that allows the user to manually manage and view a pool’s checkpoint.
This can be placed inside a pool’s page in the GUI.
It can be used whether or not an automated checkpoint exists.
Clicking this button will bring you to a page that shows:
- Whether a checkpoint currently exists
- How much space the checkpoint is holding
- How old the checkpoint is
- Buttons to create or remove a checkpoint
- Disclaimers about the pool if a checkpoint exists
A warning should pop up when taking a new checkpoint: “The current checkpoint will be discarded and replaced with the new one. Only one checkpoint can exist in a pool at any time.”
Note to TrueNAS devs: Information about a checkpoint, such as creation time and size, can be extracted with zpool status <pool> | grep checkpoint and zpool get checkpoint <pool>.
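As a rough illustration of that extraction, the sketch below pulls the creation time and size out of zpool status output. It assumes the checkpoint line looks like "checkpoint: created <date>, consumes <size>", which is what I see on my system but may differ between OpenZFS versions. The function name checkpoint_summary is made up for this example.

```shell
#!/bin/sh
# Sketch only: reads `zpool status <pool>` output on stdin.
# Assumes a line of the form "checkpoint: created <date>, consumes <size>".
# Prints "created=<date> size=<size>", or "none" when no such line is found.
checkpoint_summary() {
    sed -n 's/^[[:space:]]*checkpoint:[[:space:]]*created \(.*\), consumes \(.*\)$/created=\1 size=\2/p' \
        | grep . || echo "none"
}
```

It would be used as zpool status mypool | checkpoint_summary. For the size alone, zpool get -H -o value checkpoint mypool avoids the text parsing. As far as I can tell it prints a dash when no checkpoint exists.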
Automatic Triggers
Borrowed from @dan’s idea in post #2.
Checkpoints should be automatically triggered immediately before a pool-modification action, such as when adding a new vdev.
They should be automatically triggered immediately before “upgrading” a pool or enabling new pool features.
They should be automatically triggered immediately before destroying an entire dataset.
They should be automatically triggered immediately before rolling back a dataset to a snapshot. (Snapshot rollbacks are destructive operations that cannot be reversed.)
There needs to be a pruning policy in place so that a checkpoint does not become “stale” or lose its usefulness. Maybe a maximum one-week life? This will allow the user enough time to rewind their pool in an emergency, without allowing the checkpoint to become too “stale”.
If there is no automatic pruning or “refresh” policy in place, then there should be a visual indicator that a checkpoint exists for the pool. This will leave it up to the user. (Taken from @dan in post #4.)
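To make the one-week idea concrete, here is a small sketch of the staleness test such a pruning policy could apply. Everything in it is hypothetical: the function name, the MAX_AGE_DAYS variable, and the assumption that the middleware already knows the checkpoint's creation time as a Unix timestamp.

```shell
#!/bin/sh
# Sketch only: decides whether a checkpoint has outlived a maximum age.
# MAX_AGE_DAYS is a hypothetical setting; 7 matches the one-week suggestion.
MAX_AGE_DAYS="${MAX_AGE_DAYS:-7}"

# Usage: checkpoint_is_stale <created_epoch> <now_epoch>
# Returns success (0) when the checkpoint is older than MAX_AGE_DAYS.
checkpoint_is_stale() {
    created="$1"
    now="$2"
    max_seconds=$(( MAX_AGE_DAYS * 86400 ))
    [ $(( now - created )) -gt "$max_seconds" ]
}
```

A pruning task could discard the checkpoint when this returns success, or just raise the visual indicator and leave the decision to the user.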
Automatic Schedule
Add a menu that allows a user to create a task that automatically takes a checkpoint on a schedule. (This might not be needed if “Automatic Triggers” are implemented as explained above.)
This can be its own menu or within the “Manage Checkpoint” page for each pool. (Shown above.)
The user should be able to schedule a checkpoint to be taken daily, weekly, or any custom schedule.
The page should include a recommendation of “daily” at 03:00. This allows the pool to always have a “fresh” checkpoint with enough time for the user to rewind the pool before the “good” checkpoint is overwritten the next time the task is run.
There should be a disclaimer that a pool can only have one checkpoint at any given time. It should inform the user that the task should be paused if they do not want the current checkpoint to be lost.
It needs a pause button. Pausing a checkpoint task is very important because, unlike with snapshots, a pool can only have one checkpoint at a time.
Unlike with Automatic Triggers, having a routinely “refreshed” checkpoint can create a safety net for unforeseen emergencies. (I listed some examples in the referenced threads at the end of this post.)
Considerations for TrueNAS Devs
TrueNAS cannot just make a simple button and automatic trigger for pool checkpoints. Its code and middleware must also have “safety checks” for other pool operations, since the presence of a checkpoint is incompatible with certain actions and setups.
The documentation and tooltips must make it clear that rewinding a pool to its checkpoint will destroy everything that was saved after the checkpoint’s creation.
The documentation and tooltips must make it clear that any “hot spares” in the pool will not activate if a checkpoint exists.
If attempting to remove, modify, or expand a vdev, or resilver a drive, the operation should be greyed out with a message that explains it cannot be done until the checkpoint is removed.
The Pool Import wizard should include an option to rewind to a checkpoint. A clear warning must be accepted by the user for them to continue, since rewinding a pool can and will destroy all data that is newer than the checkpoint. Selecting the “Rewind to checkpoint” option should also prompt a “readonly” option if the user does not want to commit to a rollback. (This is useful for recovery purposes.)
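On the command line, the two import choices map onto zpool import with --rewind-to-checkpoint, with or without -o readonly=on. The sketch below wraps both. The function name import_at_checkpoint, its "ro" argument, and the ZPOOL override are illustrative assumptions, not anything TrueNAS provides.

```shell
#!/bin/sh
# Sketch only: imports an exported pool at its checkpointed state.
# ZPOOL is a hypothetical override so the function can be dry-run.
ZPOOL="${ZPOOL:-zpool}"

# Usage: import_at_checkpoint <pool> [ro]
import_at_checkpoint() {
    pool="$1"
    mode="${2:-rw}"
    if [ "$mode" = "ro" ]; then
        # Read-only: inspect the checkpointed state without committing to the rewind.
        "$ZPOOL" import -o readonly=on --rewind-to-checkpoint "$pool"
    else
        # Read-write: commits the rewind; everything newer than the checkpoint is lost.
        "$ZPOOL" import --rewind-to-checkpoint "$pool"
    fi
}
```

The read-only form is the one I would want the wizard to offer first, since the pool can then be exported and imported normally again if the user decides against the rewind.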
Is this feature request good?
Yes
THEN VOTE FOR THIS FEATURE RIGHT NOW.
If pool checkpoints are implemented, this feature needs to be highlighted and advertised by TrueNAS since it can save a lot of new users from permanently losing precious data or messing up their pools.[1][2][3][4][5][6][7][8]
*There is a lot that I did not write in this feature request because it requires more involved discussions. I’ll wait for feedback and questions.


