Calculating expected Storage Needs - DataSet + Snapshots

zaidpirwani · June 16, 2024, 9:02am

Hello,

TLDR: I am looking for a simplified approximation of future dataset / snapshot storage needs on both the primary and the backup machine. A screenshot of my sheet and a link to it are at the end.

I have been using TrueNAS for some time now, but mostly with default settings and primarily only as a Windows file share. I recently set up snapshots (weekly, with a retention of 4 weeks).

I have also now added a replication task to another TrueNAS machine (with the push setting) and set its retention to 8 weeks.

I am trying to understand the storage used by snapshots and replication.

The first snapshot was created instantly and didn't take any space (I understand that it is just metadata). The first replication copied all data from the primary to the backup machine, which is understandable.

Now I am trying to estimate future usage based on some very simplified parameters. I have created an online Google Sheet, with some help from GPT and from reading some docs and old forum posts. Can someone check my sheet and tell me whether this is a reasonable estimate?

Snapshot of the sheet

Link: TrueNas Storage Calculation - Google Sheets

The data in the sheet is super simplified and only meant for rough estimation - it does not necessarily reflect actual real-world usage/data.

I have added columns for
NEW DATA - added during the week
EDIT OLD DATA - editing of files during the week
DELETE OLD DATA - data deleted during the week

The snapshot is created on the WEEKEND - after new data has been added and any edits and deletions of OLD (previous weeks') data are done.

Also assume that the deleted data is not the same as / does not overlap edited data.

I am still working on the snapshot/replication storage formula.

My assumption is that the first snapshot is the same size as the dataset,
and that each subsequent snapshot is the size of the new data + edited data.

The overall snapshot/replication size would then be the total size of the last 4 snapshots (OR the last 8 snapshots on the backup).
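For reference, the assumption above can be written out as a few lines of Python. This is only a sketch of the simplified model as stated in this post, not of how ZFS actually accounts for space. The weekly figures, the starting dataset size, and the function names `snapshot_sizes` and `retained_total` are all hypothetical and are not taken from the linked sheet.

```python
# Sketch of the simplified model described in the post.
# All numbers are made-up illustration values in GB.
INITIAL_DATASET_GB = 1000

# One tuple per week after the first snapshot: (new, edited, deleted)
WEEKS = [
    (50, 10, 5),
    (20, 30, 0),
    (0, 0, 40),
    (10, 5, 5),
    (60, 0, 0),
]


def snapshot_sizes(initial_gb, weeks):
    """Assumed model: first snapshot = whole dataset,
    each later snapshot = new data + edited data for that week."""
    return [initial_gb] + [new + edited for new, edited, _deleted in weeks]


def retained_total(sizes, retention):
    """Sum of the most recent `retention` snapshot sizes."""
    return sum(sizes[-retention:])


sizes = snapshot_sizes(INITIAL_DATASET_GB, WEEKS)
print("per-snapshot estimate:", sizes)
print("primary, keep 4:", retained_total(sizes, 4))
print("backup, keep 8:", retained_total(sizes, 8))
```

With these numbers the initial 1000 GB drops out of the "keep 4" total as soon as the first snapshot ages out, even though that data may still be on disk. That is a limitation of summing per-snapshot sizes this way.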

zaidpirwani · June 16, 2024, 7:47pm

I think I have got it, though it would be good if an expert could have a look at my file. So far, what I have seems to roughly match what the docs say.

winnielinnie · June 16, 2024, 7:53pm

From a filesystem perspective, yes.

From a “used size” perspective? No. A snapshot only retains space where it references blocks of data that are unique to the snapshot itself.

If a block exists on the live filesystem, then a reference by the snapshot does not retain any additional space that could be freed.

If a block exists in any other snapshot, the same is true.

Only when a block is exclusively referenced by a particular snapshot (i.e., no other snapshots have pointers to it, nor does the live filesystem) will it contribute to the snapshot’s “used space”.
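A minimal sketch of that rule in Python, treating the live filesystem and each snapshot as a plain set of block labels. This is a toy model rather than ZFS's real on-disk accounting, and the block names and the function `snapshot_used` are hypothetical.

```python
def snapshot_used(live, snapshots):
    """For each snapshot, return the blocks that only it references.

    `live` is a set of block labels on the live filesystem.
    `snapshots` maps a snapshot name to its set of block labels.
    """
    used = {}
    for name, blocks in snapshots.items():
        # Everything referenced by the live filesystem or any OTHER snapshot.
        referenced_elsewhere = set(live)
        for other_name, other_blocks in snapshots.items():
            if other_name != name:
                referenced_elsewhere |= other_blocks
        used[name] = blocks - referenced_elsewhere
    return used


# Toy history: B was edited into B2 after week1, then B2 was deleted
# and D was added after week2.
live = {"A", "C", "D"}
snapshots = {
    "week1": {"A", "B"},
    "week2": {"A", "B2", "C"},
}

for name, blocks in snapshot_used(live, snapshots).items():
    print(name, "uniquely holds", sorted(blocks))
```

In this toy history, block A counts toward neither snapshot because the live filesystem still references it. Only the old version B (week1) and the deleted B2 (week2) are held exclusively by a snapshot.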

If you’re a visual person, you might find this helpful.

Hint: White stickers are the live filesystem, and can be removed independently. Color stickers are snapshots, and can only be added as a “set” (to existing white stickers) or completely removed as a “set”.

zaidpirwani · June 17, 2024, 3:31pm

Thanks, I understand that snapshots don't take up space in the usual sense - I am looking at the storage used on the backup server.

I am thinking of writing a small Python-based simulation that takes snapshots hourly, so I can watch the process LIVE.

I will definitely look at the truck, the boxes and the stickers - seems something that will help me understand.

winnielinnie · June 17, 2024, 3:47pm

You’ll notice that when you get rid of a “snapshot” (an entire color sticker set), it often does not physically free up space in the truck. (Because either there’s still a white sticker on the box or another color sticker still remains on the box.)
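The same effect shows up in a small Python sketch, again modelling the live filesystem and the snapshots as sets of block labels. It is a simplification of what really happens on disk, and the block names and the helper `destroy_snapshot` are hypothetical.

```python
def destroy_snapshot(live, snapshots, name):
    """Remove snapshot `name` and return the blocks that actually get freed.

    A block is freed only if neither the live filesystem nor any
    remaining snapshot still references it.
    """
    doomed = snapshots.pop(name)
    still_referenced = set(live).union(*snapshots.values())
    return doomed - still_referenced


# Block B was deleted from the live filesystem, but two snapshots hold it.
live = {"A"}
snapshots = {
    "week1": {"A", "B"},
    "week2": {"A", "B"},
}

freed = destroy_snapshot(live, snapshots, "week1")
print("destroying week1 frees:", sorted(freed))

freed = destroy_snapshot(live, snapshots, "week2")
print("destroying week2 frees:", sorted(freed))
```

Destroying week1 frees nothing here because week2 still references B. The space only comes back once the last snapshot holding B is gone.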