The question of datasets vs. folders is one of the things I struggle with as a newcomer to both TrueNAS and ZFS. The other week, I got some food for thought when it came to organising my container's data.
This time I'm pondering the best option for storing backups of my family's computers. Right now, I have a Backups dataset with sub-datasets for each computer, made available as an SMB share to the owner of each computer. This made a lot of sense when I got started, since it allows me to configure individual settings for each computer's dataset as needed.
But now that I've started looking into how to protect the datasets themselves, with snapshots, replication, and offsite backups, the Data Protection tab in the UI is becoming very crowded. As I add more computers, I have a feeling I will soon lose the overview needed to confirm that everything is configured as it should be.
Looking around on the forums, variants of this question have been asked many times, like in January. The answers almost always come down to personal preference. And I get that.
But as a beginner, there is a lot to learn from others' personal preferences. So I'm curious: how do you manage datasets, sub-datasets, and/or folders for shares used as backup destinations?
I use datasets for this.
I don't want a backup destination to be a folder that other computers can access; that is a recipe for disaster, I think. A dataset can be isolated from the other computers. For example, for Time Machine I have a "main" dataset with sub-datasets per MacBook, and the same for Syncthing, per user.
You can always set different snapshot schedules, like hourly and daily, have them automatically wiped after a week, and then keep only, say, monthly snapshots for a few years.
Yes, you will have many snapshots, but security and restores are then a breeze.
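In TrueNAS you would normally set this up with Periodic Snapshot Tasks in the UI, but the underlying idea can be sketched with plain `zfs` commands. The pool/dataset names below are hypothetical, and the expiry loop assumes GNU `date` (TrueNAS SCALE):

```shell
# Hypothetical names: adapt "tank/backups/laptop" to your layout.
# Hourly snapshot, named so a cleanup job can match on the prefix:
zfs snapshot tank/backups/laptop@hourly-$(date +%Y%m%d-%H%M)

# Monthly snapshot kept long-term:
zfs snapshot tank/backups/laptop@monthly-$(date +%Y%m)

# Expire hourly snapshots older than a week (GNU date assumed):
cutoff=$(date -d '7 days ago' +%Y%m%d)
zfs list -H -t snapshot -o name tank/backups/laptop \
  | grep '@hourly-' \
  | while read -r snap; do
      stamp=${snap##*@hourly-}   # e.g. 20240115-0300
      stamp=${stamp%%-*}         # e.g. 20240115
      [ "$stamp" -lt "$cutoff" ] && zfs destroy "$snap"
    done
```

The Periodic Snapshot Task's retention setting does the same expiry automatically, so a hand-rolled cron job like this is only needed outside the UI.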
Datasets are independent filesystems that can have a variety of different functions applied to them such as quotas (as previously mentioned), compression, block size, dedup, etc. They can also be snapshotted and replicated to another machine.
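Because each dataset is its own filesystem, those properties can be set at creation time and differ per backup target. A minimal sketch, assuming a hypothetical pool named `tank`:

```shell
# Each backup target gets its own dataset, so properties apply independently:
zfs create -o compression=lz4 -o quota=500G tank/backups/alice-laptop
zfs create -o compression=zstd -o recordsize=1M tank/backups/bob-desktop

# Inspect what each dataset is actually using:
zfs get compression,quota,recordsize tank/backups/alice-laptop
```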
As a general rule it's a good idea to have a dataset per share and not to nest multiple datasets off one share point. This is possible, but it's complex and can lead to permission issues, so it's best avoided unless you really must.
I prefer datasets specific to backups for each person, host, or application. Depending on what I’m trying to accomplish and how I’m organizing them (which is mostly opaque and arbitrary based on what I think best at the time) I do have some backup sets sharing a dataset, but only a few. Mostly they’re dedicated datasets.
I like the configurability this allows for, and I also like being able to manage replication by backup use case/workload. I also apply user properties to some backup datasets on my non-TrueNAS pools. In TrueNAS I usually just use the comment to record the same information, since that’s the preferred path in TrueNAS.
Sometimes it can be really useful to set a recordsize that aligns especially well with whatever the backup software does. Quotas are useful, and if you have a dataset that acts as a parent to a bunch of other backup datasets, you might find reservation useful.
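Those three properties can each be set independently per dataset. A sketch with hypothetical names, where a parent `tank/backups` dataset holds per-machine children:

```shell
# Reservation on the parent: the backups as a whole can't be starved
# of space by other data on the pool:
zfs set reservation=2T tank/backups

# Quota per machine caps how much one computer's backups can consume:
zfs set quota=500G tank/backups/alice-laptop

# A large recordsize often suits backup software that writes big
# sequential files:
zfs set recordsize=1M tank/backups/alice-laptop
```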
Do you have a separate snapshot schedule for each dataset? You can just set up a recursive schedule for the whole pool (or for one of the parent datasets). You can then exclude datasets that don't need snapshots (or that need less frequent ones).
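In the TrueNAS UI this corresponds to a recursive Periodic Snapshot Task with its exclude list. At the raw CLI the equivalent looks like this (hypothetical names; note that `zfs snapshot -r` itself has no exclude flag, so exclusion is handled by the task, or by destroying the unwanted snapshot afterwards):

```shell
# Recursive snapshot of the parent covers every child dataset in one task:
zfs snapshot -r tank/backups@daily-2024-01-15

# Drop the snapshot from a child that doesn't need one:
zfs destroy tank/backups/scratch@daily-2024-01-15
```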
I realized there is another thing I wonder about with regard to datasets for computer backups: does it make sense to change to a more aggressive compression setting (and if so, which one? gzip-9?), or is LZ4 the best?
For recordsize I guess the default is good enough? I don't optimize for performance, only storage efficiency.
I tend to start with perfection and a “what if I want to in the future…” in mind, but then, practicality sets in.
Define the requirements: do you need different backup schedules for each family member's computer? If yes, then different datasets are needed. I don't, so subfolders of the same SMB share dataset are OK, and they make replication to a backup pool much easier.
Also, snapshots are not backups. They are versions. They use the same base data on the drives; if the base data becomes corrupted, all snapshots that use it are useless.
It depends on the kind of backup. Use LZ4 or ZSTD-X: LZ4 (or ZLE) for non-compressible backups, ZSTD for compressible ones.
Some examples:
Standard Windows Backup and Restore backups are not very compressible. But if you choose to generate system images, those are highly compressible. I personally use zstd-9; my backup job is slow, but the saved space is worth it. You can go with just zstd-3 and be fine.
macOS Time Machine sparsebundles are not very compressible. I use lz4.
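The slow-job/smaller-output trade-off mentioned above can be illustrated without ZFS at all, using `gzip` levels as a rough stand-in for ZFS's gzip-N (zstd and lz4 show a similar-shaped trade-off, just at different points). The filenames here are throwaway examples:

```shell
# Generate a highly compressible sample file standing in for a system image:
yes "The quick brown fox jumps over the lazy dog." | head -c 10000000 > sample.bin

# Compare output sizes at increasing compression levels; higher levels
# take longer to run but never produce a larger file:
for level in 1 3 9; do
  gzip -c -"$level" sample.bin > "sample.$level.gz"
  echo "gzip -$level: $(wc -c < "sample.$level.gz") bytes (from $(wc -c < sample.bin))"
done
```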
You can just run some tests, moving the data between datasets with different compression levels.
The answer is that it depends on the data. If you have the time and energy, why not create a few datasets, each with a different type of compression, put some sample data in them, and see what the ratio looks like. Perhaps lz4, zstd, and gzip-9.
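That experiment can be sketched directly with `zfs`; the pool and dataset names are hypothetical, and ZFS reports the achieved ratio in the `compressratio` property after you copy the same sample data into each dataset:

```shell
# Throwaway test datasets, one per candidate algorithm:
zfs create -o compression=lz4    tank/ctest-lz4
zfs create -o compression=zstd   tank/ctest-zstd
zfs create -o compression=gzip-9 tank/ctest-gzip9

# ...copy identical sample data into each mountpoint, then compare:
zfs get compressratio tank/ctest-lz4 tank/ctest-zstd tank/ctest-gzip9
```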
But couldn't that be prevented by creating SMB shares for the individual folders instead of the dataset?
That seems to work indeed. And you can set ACL permissions on the folders too.
However, you first need to create the folders from your client machine or via the CLI.
I like to keep TrueNAS as stock as possible, to prevent any issues down the road (upgrades?).
If you have done extensive experimenting or testing with folders, I'd love to hear about it.