Staggered snapshots

What is the recommended way to generate staggered snapshots? For example, hourly snapshots that last a couple of days, daily snapshots that last a couple of weeks, weekly snapshots that last a couple of months, etc.

If you just create multiple separate schedules, is there a downside to having multiple snapshots taken simultaneously? If the year starts with a Sunday, I’ll have hourly, daily, weekly, monthly, and yearly snapshots all taken at the same time at 0:00 on January 1st.

I tried to create an hourly schedule that only runs from 1:00 to 23:00 (skipping 0:00), and a daily schedule that skips Sunday, but this approach breaks down because the weekly schedule doesn’t fit exactly into the monthly schedule.

ZFS snapshots are very efficient… so taking several at once should work fine.
Staggering them by 5 or 10 minutes each (12:00, 12:05, 12:10, etc.) would also work.
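
For illustration, here’s a minimal sketch of what that minute-staggering looks like. The tier names and 5-minute offsets are made up for the example, not TrueNAS settings:

```python
from datetime import datetime

# Illustrative offsets, five minutes apart, so no two tiers ever
# fire at the same instant. These tiers and offsets are an example,
# not TrueNAS defaults.
tiers = {"hourly": 0, "daily": 5, "weekly": 10, "monthly": 15}

for name, offset in tiers.items():
    t = datetime(2024, 1, 1, 12, offset)
    print(f"{name:8} fires at {t:%H:%M}")  # 12:00, 12:05, 12:10, 12:15
```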

I try to avoid overlapping the timestamps if I can. Here’s an example: one of our production systems has a schedule like this (datasets with SMB shares):

Hourly - Every hour from 1:00 to 23:00 - Lifetime 1 week
Daily - Every night at 00:00 (excluding the 1st day of each month) - Lifetime 4-12 weeks
Monthly - First of every month at 00:00 - Lifetime 6-12 months

Then, for a really critical SMB share (lots of activity, where a rollback of an hour could cost $$), the dataset has:
Every 10 minutes (excluding on the hour, between 6am and 10pm) - Lifetime 1 day
followed by the rest of the schedule above.

The main reason we avoid running the 10-minute task all day is that it limits the number of snapshots shown in “Previous Versions”, as we had performance issues at various times. On datasets with a 20/30-minute task, we tend to use the whole day.

Edit: You could also limit your sub-hourly task to business days, such as Monday to Friday.
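
If it helps to see that kind of tiering written down, here’s a sketch of the same idea as standard cron expressions. This is illustrative only; the TrueNAS “custom” schedule form may lay the fields out differently:

```python
# Cron fields: minute hour day-of-month month day-of-week.
# Each tier excludes the slot the tier above it owns, so at most one
# snapshot task fires at any given minute.
schedules = {
    "hourly":  "0 1-23 *    * *",   # every hour except midnight
    "daily":   "0 0    2-31 * *",   # midnight, except the 1st of the month
    "monthly": "0 0    1    * *",   # midnight on the 1st of the month
}

for tier, cron in schedules.items():
    print(f"{tier:8} -> {cron}")
```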

I’m not sure if there is an established recommendation.

BUT I’m trialing minutely, hourly, daily, weekly, and monthly tasks.

The key is to have each snapshot task use a different naming scheme, i.e. auto-hourly, auto-daily, etc.

Each is a separate snapshot task, with progressively longer retention.

Minutely is actually every 10 minutes, retained for 2 hours.

Hourly is retained for 2 days (or thereabouts; I forget).

Etc.
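
As a sketch of what such per-task naming schemes might look like (the tier prefixes are my own convention; the timestamp part mirrors the strftime-style patterns TrueNAS naming schemes use):

```python
from datetime import datetime

# One distinct naming scheme per task, so each task's retention only
# ever matches its own snapshots.
schemes = {
    "minutely": "auto-minutely-%Y-%m-%d_%H-%M",
    "hourly":   "auto-hourly-%Y-%m-%d_%H-%M",
    "daily":    "auto-daily-%Y-%m-%d_%H-%M",
    "weekly":   "auto-weekly-%Y-%m-%d_%H-%M",
    "monthly":  "auto-monthly-%Y-%m-%d_%H-%M",
}

now = datetime(2024, 7, 1, 0, 0)
for tier, scheme in schemes.items():
    print(now.strftime(scheme))   # e.g. auto-hourly-2024-07-01_00-00
```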

I replicate based on the hourlies or dailies, and use the “additional schemas” to pick up everything except the minutelies.
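
Conceptually, that matching works something like this sketch, with hypothetical snapshot names; this is not the middleware’s actual code:

```python
import fnmatch

# Hypothetical snapshot names on the source dataset.
snapshots = [
    "auto-minutely-2024-07-01_00-50",
    "auto-hourly-2024-07-01_00-00",
    "auto-daily-2024-07-01_00-00",
    "auto-monthly-2024-07-01_00-00",
]

# The task replicates its own hourlies, and the "additional schemas"
# pull in the other tiers -- everything except the short-lived minutelies.
include = ["auto-hourly-*", "auto-daily-*", "auto-weekly-*", "auto-monthly-*"]

to_send = [s for s in snapshots
           if any(fnmatch.fnmatch(s, pattern) for pattern in include)]
print(to_send)   # the minutely snapshot stays behind
```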

Replication snapshots should have “allow empty” enabled. I disable “allow empty” on the minutely task.

Turn on the replication task options to let it control snapshot lifetime on the destination and to hold pending snapshots.

On the destination I run a separate snapshot task with 10 year retention and a different naming scheme.

Use different names for each snapshot task, such as:

  • auto-weekly_XXXXXX
  • auto-daily_XXXXXX
  • auto-monthly_XXXXXX

It’s also good practice to append the expiration lifetime at the end, such as:

  • auto-weekly_XXXXXX-6m
  • auto-daily_XXXXXX-1w
  • auto-monthly_XXXXXX-2y

This isn’t simply for better organizing, filtering, and searching.

:warning: It will prevent you from inadvertently destroying your snapshots, without even realizing it!
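
To see why, here’s a toy sketch of name-based pruning (not the middleware’s actual implementation): a retention pass identifies snapshots by their naming scheme, so a monthly created under the daily scheme just looks like an expired daily.

```python
from datetime import datetime, timedelta

# Toy snapshot list. The April snapshot was *meant* to be a monthly,
# but it was created under the same scheme as the dailies.
snapshots = [
    "auto-2024-04-01_00-00",
    "auto-2024-06-29_00-00",
    "auto-2024-06-30_00-00",
]

def expired(name: str, now: datetime, lifetime: timedelta) -> bool:
    # Recover the creation time by parsing the naming scheme.
    taken = datetime.strptime(name, "auto-%Y-%m-%d_%H-%M")
    return now - taken > lifetime

now = datetime(2024, 7, 1)
doomed = [s for s in snapshots if expired(s, now, timedelta(weeks=2))]
print(doomed)   # ['auto-2024-04-01_00-00'] -- the would-be monthly is destroyed
```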

Glad I searched this out! On the way into work I was wondering how it knew which snapshots to delete for hourly, weekly, monthly, etc., and I’m glad I wondered. Another thing to tweak tonight. It would be useful if the documentation mentioned the importance of naming the snapshots a bit more prominently, and suggested how to name them; I’ve double-checked, and there is nothing there about naming being important at all that I could see.

Periodic Snapshot Tasks Screens | TrueNAS Documentation Hub

If I may ask a perhaps dumb question:
If you name the staggered snapshots differently, can the system use an existing snapshot for the next “stagger”? E.g. use an hourly snapshot as a monthly one?
Or does it always create a new monthly one?
Thanks,
Etienne

Snapshots always “store” the delta between each other.

It doesn’t matter what they’re called; all that matters is when they were taken in relation to the history of the pool, i.e. their order.
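
You can see that order directly, since every snapshot records the transaction group (TXG) it was created in. A sketch using the standard zfs CLI from Python, with a placeholder dataset name:

```python
import subprocess

# List a dataset's snapshots sorted by creation TXG -- the order ZFS
# actually cares about, independent of what the snapshots are named.
# "tank/data" is a placeholder dataset.
result = subprocess.run(
    ["zfs", "list", "-t", "snapshot", "-s", "createtxg",
     "-o", "name,creation,used", "tank/data"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```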

Are you referring to snapshots taken at the exact same time, yet with different names?

As in, if you have hourly and monthly periodic snapshots configured, what will happen on July 1, 2024, at midnight?

I would expect you’d end up with two snapshots, like this:

  • auto-hourly-2024-07-01_00:00
  • auto-monthly-2024-07-01_00:00

But when I think about it, I’m not sure if ZFS can create two snapshots at the exact same millisecond.

It would entail that you have two snapshots, with different names, created in the same TXG (though each would still have its own GUID). I wonder what the middleware or ZFS would do… :flushed:

I haven’t checked, but I assume there would be multiple threads racing.

One would win. And if they both got there at the same time, one would spin atomically while the other won.

I.e. there would be a defined order between the two snapshots.

“Dungeons and Dragons: ZFS Edition!”

:game_die:
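
For what it’s worth, ZFS itself can take several snapshots atomically: naming them in a single zfs snapshot invocation puts them all in the same TXG. A sketch with placeholder names (two independent TrueNAS tasks would not actually funnel through one call like this):

```python
import subprocess

# A single `zfs snapshot` invocation with several names is atomic:
# all snapshots land in the same transaction group (same TXG), though
# each still gets its own unique GUID. "tank/data" is a placeholder.
subprocess.run(
    ["zfs", "snapshot",
     "tank/data@auto-hourly-2024-07-01_00-00",
     "tank/data@auto-monthly-2024-07-01_00-00"],
    check=True,
)
```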

I mindlessly created my daily, weekly, and monthly snapshots with a single naming scheme and, years later, it just works: only one snapshot is created on Mondays and on the first day of the month, and that snapshot outlives its daily siblings as expected.

If you were to disable, or possibly delete, your monthly snapshot task, then your daily snapshot task would/could/might erase your monthlies, since as far as it’s concerned, those monthlies are daily snapshots.

Or put another way, there is no magic metadata which prevents the deletion of your monthly snapshots, other than the fact that you currently have a monthly snapshot task configured with the right retention.

OR, if you are using very old snapshot tasks, do they still have the retention encoded in the name, i.e. “2y” or “6m” or something like that?

I got this:

auto-2024-05-27_00-00-3Months-daily   0.10 bytes   2024-05-27 00:00:02
auto-2024-05-27_00-00-Week-hourly     0.10 bytes   2024-05-27 00:00:09

So they are queued, taken a few seconds apart rather than simultaneously.

BTW, after I removed “recursive” from snapshot tasks on datasets that are single folders, my older weeklies are now removed automatically.

It is a custom name, without an explicit retention time. But the retention policy works as expected, even when there are gaps because the replication source was off.
What would happen if I were to change the snapshot task is an open question, but I do not intend to change it, and I expect that the backup (“pull” replication) would retain what it holds.

Made a video documenting what I’ve been doing: