Hi,
after using TrueNAS for more than 8 years, it was time to rethink my snapshot and replication concept.
My goals are to reduce the number of snapshot and replication tasks to make the setup simpler and more manageable. I had 50+ snapshot tasks and almost as many replication tasks.
Furthermore, I want a 1:1 replication of specific datasets of my pool, including all snapshots, while excluding others that are too big for the destination I replicate to. The replication destination is another pool in the same server.
Previously, I experimented with recursive tasks and excludes, but this led to some caveats when creating the replication tasks. For example, you can exclude more datasets in the replication than you excluded in the snapshots. But since these datasets are still part of the snapshots (e.g. daily), their data is still processed on the zfs send side, even though it is not written on the zfs receive side.
For large folders like Media this leads to a lot of unnecessary processing and network traffic.
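To make that caveat concrete, here is a minimal sketch that lists the datasets a recursive snapshot task covers but the replication task then excludes again. The pool name `tank`, the dataset paths and the function name are hypothetical, and nothing here talks to TrueNAS; it only computes the set difference from two exclude lists.

```python
def snapshotted_but_not_replicated(datasets, snapshot_excludes, replication_excludes):
    """Return datasets that are snapshotted (not excluded from the snapshot task)
    but excluded from the replication task."""

    def excluded(ds, excludes):
        # A dataset is excluded if it, or one of its ancestors, is in the list.
        # Appending "/" avoids treating "tank/MediaOld" as a child of "tank/Media".
        return any(ds == ex or ds.startswith(ex + "/") for ex in excludes)

    return sorted(
        ds for ds in datasets
        if not excluded(ds, snapshot_excludes) and excluded(ds, replication_excludes)
    )


if __name__ == "__main__":
    # Hypothetical layout under an assumed pool named "tank".
    datasets = ["tank/Documents", "tank/Photos", "tank/Media", "tank/Media/Movies", "tank/temp"]
    print(snapshotted_but_not_replicated(
        datasets,
        snapshot_excludes=["tank/temp"],
        replication_excludes=["tank/temp", "tank/Media"],
    ))
    # → ['tank/Media', 'tank/Media/Movies']
```

Every dataset this returns is one whose snapshots exist only because the snapshot excludes and replication excludes drifted apart.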
Therefore, I created parent datasets for different protection purposes and set up recursive snapshot and replication tasks on them.
I created three parent datasets:
protected_high
- Documents
- Photos
- other important data
- ...
protected_medium
- Media
- Music
- ...
protected_no
- temp
- downloads
- ...
Recursive snapshot tasks:
protected_high:
1. daily (02:00AM every day) with retention time of one week
2. weekly (01:00AM every Sunday) with retention time of one month
3. monthly (12:00AM, i.e. midnight, on day 15 of every month) with retention time of 10 years
protected_medium:
1. daily (02:00AM every day) with retention time of one week
2. weekly (01:00AM every Sunday) with retention time of one month
protected_no:
- no snapshots
So I end up with 5 snapshot tasks, which is much more manageable than 50+ tasks.
For replication, I created 3 corresponding recursive tasks:
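As a rough sanity check on how many snapshots this schedule keeps, here is a small simulation. Because the tasks are recursive, these counts apply to each dataset under the parent. It assumes steady state (the tasks have run for at least 10 years), works with dates only (ignoring the 00:00/01:00/02:00 run times), and assumes a snapshot is kept while its creation date is strictly after the retention cutoff. TrueNAS's exact expiry rule may differ slightly at the boundaries. All names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    m = d.month - 1 + months
    year, month = d.year + m // 12, m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Per task: (does it run on this date?, retention cutoff for a given "today").
# Assumption: a snapshot is kept while its creation date is after the cutoff.
SCHEDULES = {
    "daily": (lambda d: True, lambda t: t - timedelta(weeks=1)),
    "weekly": (lambda d: d.weekday() == 6, lambda t: add_months(t, -1)),  # Sundays
    "monthly": (lambda d: d.day == 15, lambda t: add_months(t, -120)),  # 10 years
}

TIERS = {
    "protected_high": ["daily", "weekly", "monthly"],
    "protected_medium": ["daily", "weekly"],
    "protected_no": [],
}


def kept_snapshots(task, today):
    """Count the snapshots of one task still retained on `today` (steady state)."""
    runs_on, cutoff_of = SCHEDULES[task]
    d, n = cutoff_of(today) + timedelta(days=1), 0
    while d <= today:
        n += runs_on(d)  # True counts as 1, False as 0
        d += timedelta(days=1)
    return n


if __name__ == "__main__":
    today = date(2024, 6, 20)  # arbitrary example date (a Thursday)
    for tier, tasks in TIERS.items():
        counts = {t: kept_snapshots(t, today) for t in tasks}
        print(tier, counts, "total:", sum(counts.values()))
    # protected_high {'daily': 7, 'weekly': 4, 'monthly': 120} total: 131
    # protected_medium {'daily': 7, 'weekly': 4} total: 11
    # protected_no {} total: 0
```

The weekly count varies between 4 and 5 depending on where Sundays fall in the month. The monthly task clearly dominates the total.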
protected_high_daily_replication
- source: protected_high (select only this parent dataset)
- recursive
- snapshot task - protected_high_daily
- run automatically
protected_high_weekly_replication
- source: protected_high (select only this parent dataset)
- recursive
- snapshot task - protected_high_weekly
- run automatically
protected_high_monthly_replication
- source: protected_high (select only this parent dataset)
- recursive
- snapshot task - protected_high_monthly
- run automatically
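Since each replication task is tied to exactly one snapshot task on the same parent dataset, the pairing can be written down as data and checked. The sketch below is a hypothetical model (the classes and field names are illustrative and are not the TrueNAS API or its config schema). It flags replication tasks that reference a missing snapshot task, point at a different source, or are not recursive on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotTask:
    name: str
    dataset: str
    recursive: bool


@dataclass(frozen=True)
class ReplicationTask:
    name: str
    source: str
    recursive: bool
    snapshot_task: str


def check_pairing(snapshot_tasks, replication_tasks):
    """Return a list of problems; an empty list means every replication task is
    driven by a recursive snapshot task on the same source dataset."""
    by_name = {t.name: t for t in snapshot_tasks}
    problems = []
    for r in replication_tasks:
        s = by_name.get(r.snapshot_task)
        if s is None:
            problems.append(f"{r.name}: unknown snapshot task {r.snapshot_task}")
        elif s.dataset != r.source:
            problems.append(f"{r.name}: source {r.source} != snapshot dataset {s.dataset}")
        elif not (s.recursive and r.recursive):
            problems.append(f"{r.name}: snapshot and replication should both be recursive")
    return problems


if __name__ == "__main__":
    pool = "tank"  # hypothetical pool name
    periods = ("daily", "weekly", "monthly")
    snaps = [SnapshotTask(f"protected_high_{p}", f"{pool}/protected_high", True) for p in periods]
    snaps += [SnapshotTask(f"protected_medium_{p}", f"{pool}/protected_medium", True)
              for p in ("daily", "weekly")]
    repls = [ReplicationTask(f"protected_high_{p}_replication", f"{pool}/protected_high",
                             True, f"protected_high_{p}") for p in periods]
    print(check_pairing(snaps, repls))  # → []
```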
So protected_high and protected_medium both have snapshots, but only protected_high is replicated.
Whenever I create a new dataset for any purpose, I create it as a child of the appropriate parent dataset (protected_x), and the existing snapshot and replication tasks include it automatically, with no new tasks needed.
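The "new child is covered automatically" property comes from the recursive flag: a recursive task on a parent applies to the parent and every descendant. Here is a minimal sketch of that rule, with a hypothetical pool name `tank` and a hypothetical dataset-to-task mapping that mirrors the tasks above.

```python
POOL = "tank"  # hypothetical pool name

# Every task in this setup is recursive, so only its source dataset matters.
TASKS = {
    f"protected_high_{p}": f"{POOL}/protected_high" for p in ("daily", "weekly", "monthly")
}
TASKS.update({
    f"protected_high_{p}_replication": f"{POOL}/protected_high"
    for p in ("daily", "weekly", "monthly")
})
TASKS.update({
    f"protected_medium_{p}": f"{POOL}/protected_medium" for p in ("daily", "weekly")
})


def covering_tasks(dataset, tasks):
    """Return the names of recursive tasks whose source is `dataset` or an ancestor."""
    return sorted(
        name for name, source in tasks.items()
        # The trailing "/" keeps e.g. "tank/protected_high_old" from matching.
        if dataset == source or dataset.startswith(source + "/")
    )


if __name__ == "__main__":
    print(covering_tasks(f"{POOL}/protected_medium/Podcasts", TASKS))
    # → ['protected_medium_daily', 'protected_medium_weekly']
    print(covering_tasks(f"{POOL}/protected_no/cache", TASKS))
    # → []
```

A new child such as `tank/protected_high/Recipes` would match all three protected_high snapshot tasks and their three replication tasks. A sibling named `tank/protected_high_old`, created outside the parent, would match nothing.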
What do you think? Does that concept make sense at all? Are there any caveats I haven’t realized yet? Or do you have some improvements for this concept?
Best
macx