Hello,
I was watching some videos and noticed that people created datasets where I would have created a simple folder.
Is there some guidance on when to create a dataset and when to create a folder please? Any information on the impact of a dataset on resources vs the impact of a plain folder?
It’s a good question and often gets asked. I dare say different people will have different opinions on this but here is my view.
Datasets have properties such as quota, compression, snapshots, readonly, etc.
Folders don’t have any of these properties by default, they abide by the properties of their parent dataset.
Datasets are also sandboxed by nature, as each one is essentially an independent filesystem, and with tools like zfs send you can easily move them between ZFS systems for backup or migration purposes.
Datasets can be snapshotted for data protection, and they can be rolled back to a specific point in time (if you have snapshots). A snapshot can also be cloned into a new writable dataset, and that clone can be promoted so that it no longer depends on the original dataset.
If you want any of the above properties to differ for part of your data, that is probably a good case for another dataset instead of a folder within an existing one.
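To make the above a bit more concrete, here is a minimal Python sketch that only builds the zfs command lines (it does not run anything) for creating a dataset with its own properties, snapshotting it and sending it to another system. The names tank/photos, manual-1, backup-nas and backup/photos, and the property values, are made-up placeholders, not anything from this thread. On TrueNAS you would normally do all of this through the UI.

```python
import shlex


def create_dataset_cmd(dataset, properties):
    """Build a 'zfs create' command with one -o per property."""
    cmd = ["zfs", "create"]
    for key, value in properties.items():
        cmd += ["-o", f"{key}={value}"]
    cmd.append(dataset)
    return cmd


def snapshot_cmd(dataset, snap_name):
    """Build a 'zfs snapshot' command for dataset@snap_name."""
    return ["zfs", "snapshot", f"{dataset}@{snap_name}"]


def send_recv_pipeline(dataset, snap_name, ssh_host, target_dataset):
    """Build a shell pipeline string: zfs send ... | ssh host zfs recv ..."""
    send = shlex.join(["zfs", "send", f"{dataset}@{snap_name}"])
    recv = shlex.join(["ssh", ssh_host, "zfs", "recv", target_dataset])
    return f"{send} | {recv}"


# Hypothetical example values, for illustration only.
example_create = create_dataset_cmd(
    "tank/photos", {"compression": "lz4", "quota": "500G"}
)
example_snapshot = snapshot_cmd("tank/photos", "manual-1")
example_send = send_recv_pipeline(
    "tank/photos", "manual-1", "backup-nas", "backup/photos"
)
```

None of this is possible for a plain folder: a folder cannot carry its own quota or compression setting, and it cannot be snapshotted or sent separately from the dataset that contains it.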
It’s also fairly common (although not necessary) to have a one-to-one relationship between a dataset and a share.
In my experience it is possible to have too many datasets (and in turn snapshots), as this has a negative impact on system performance and management. I find that roughly 50 datasets or fewer and no more than 3000 active snapshots works well. This obviously depends on your system specs, so treat it as my personal experience rather than a hard limit.
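If you want to see where your own system stands, a rough Python sketch like the one below could count datasets and snapshots from the output of zfs list. The thresholds of 50 and 3000 are just the personal rule of thumb mentioned above, and the function names are made up for this example.

```python
import subprocess


def parse_names(output):
    """Turn 'zfs list -H -o name' output into a list of names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def zfs_names(kind):
    """List names of one type ('filesystem' or 'snapshot') via zfs list.

    Needs to run on the ZFS host itself with sufficient permissions.
    """
    result = subprocess.run(
        ["zfs", "list", "-H", "-o", "name", "-t", kind],
        capture_output=True, text=True, check=True,
    )
    return parse_names(result.stdout)


def over_budget(n_datasets, n_snapshots, max_datasets=50, max_snapshots=3000):
    """Return warnings when the counts exceed the rule-of-thumb limits."""
    warnings = []
    if n_datasets > max_datasets:
        warnings.append(f"{n_datasets} datasets (rule of thumb: {max_datasets})")
    if n_snapshots > max_snapshots:
        warnings.append(f"{n_snapshots} snapshots (rule of thumb: {max_snapshots})")
    return warnings


# Usage on the NAS itself:
#   datasets = zfs_names("filesystem")
#   snapshots = zfs_names("snapshot")
#   print(over_budget(len(datasets), len(snapshots)))
```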
Hopefully you find this helpful.
It’s generally better to follow one dataset per share and not nest datasets within shares. Many clients do not expect to have new filesystems within a share.
Thanks. It confirms some weird behavior I noticed, but it means that 3 out of 4 videos on YT have to be remade. It’s a drama!
It is very helpful. I’m translating this in my head as directories where I can, datasets if there is a good reason and you mentioned those reasons.
YouTube is a terrible source for TrueNAS (and really a lot of other things). It has been a source of tears for a lot of people here who lost their data following the many bad recommendations in those videos. There is no vetting whatsoever, and just about anyone can upload videos at any time.
I experienced some problems myself. The “problem” is that there is not a lot of guidance in the manuals. They are very good at explaining what you have to do but never go into the why, as most manuals do… I understand that training, knowledge and experience are marketable skills, and it’s great that TrueNAS and volunteers invest so much in the community edition and the support around it, so don’t think I’m disrespectful or ungrateful, but I was grateful I had the videos in the beginning. It’s only after some weeks that I started questioning advice, and by following the forum and recognizing names I’m now starting to be able to pick out “authoritative” videos like those from @Stux and a very few others. Beginning with 24.10 didn’t help the situation.
Other people have told you the technical differences, so I’m going to highlight the biggest differences in practice:
- If you move a file from a folder to another folder in the same dataset, it will happen instantaneously. If you move a file from one dataset into another dataset, it will not happen instantaneously, because you are crossing “filesystem” boundaries and the actual blocks have to be copied rather than just a pointer being updated. This behavior is roughly equivalent to moving a file from one partition to another partition.
- If you nest a dataset inside another dataset, you will notice that if you mount ONLY the parent dataset via NFS, the child dataset will just be an empty folder until you also mount the child dataset separately. However, this is NOT the case with SMB: SMB will happily list the child dataset with just the parent dataset mounted.
- Organizing and coordinating backups through snapshots + ZFS send/recv is much easier if you have your data separated logically by datasets. If you have music as its own dataset and photos as its own dataset, you can back them up separately. If you have them both as 2 plain folders in the same dataset, you will have to back up both of them together, as the snapshot will contain both. Why does this matter? Each backup will be faster if you separate the datasets, because the amount of data to back up will be less than if both were combined. This is particularly useful if you do regular incremental backups.
- If you have nested datasets, you can simply take a recursive snapshot of the parent dataset and it will automatically take a snapshot of all its child datasets. This is convenient for full system backup.
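As a small illustration of the first point in the list (moving files), here is a minimal Python sketch that guesses whether a move will be a quick rename or a full copy. It assumes it runs on the machine where the datasets are mounted locally, and relies on each mounted filesystem (including each ZFS dataset) reporting its own device ID. The function names are made up for this example.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both existing paths live on the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_kind(src_path, dest_dir):
    """Guess how a move from src_path into dest_dir will be carried out."""
    if same_filesystem(src_path, dest_dir):
        # Same filesystem: only directory entries change.
        return "rename"
    # Different filesystem (e.g. another dataset): data is copied, then
    # the source is deleted.
    return "copy"
```

Python's own shutil.move behaves the same way: it tries a rename first and falls back to copy-then-delete when the destination is on a different filesystem. From a client over SMB or NFS the device IDs are those of the client's mounts, so the result there may not reflect the dataset layout on the server.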
Those are all I could think of off the top of my head for now. If I think of more, I will add them to the list.
I am not recommending you to choose one over the other. I’m merely giving you the information you need on their behaviors so you can decide for yourself what meets your use case best.
I’m ever so grateful. The NFS thing had me up the walls.