Organizing container data: Child datasets vs folders

Many of the apps use two or more locations to store their data. During my “learning period”, I created a main dataset for each application, with additional child datasets whenever an app needs more than one location.

But I have come to question whether this is good practice.

The other option would be to create a dataset for each app, with folders for each storage location.

What are the pros and cons of each alternative?

That is exactly how I have started doing it.

Started, but then changed? :)

No - originally didn’t but now do.

For what reason did you decide to go with child datasets?

I do not do that. I simply use the command line (via ssh) to create subfolders and use those as host paths. I don’t have a dataset for any of my 24 apps. I just don’t like all those datasets and see no purpose for them, and they clutter shares and zfs commands. Well, I do have one for mariadb, as it needs different dataset settings.

I decided to just use standard POSIX permissions and not use ACLs at all. That is simpler for me, as I don’t have dozens of users requiring varying permissions.

Not sure there is a perfect answer as it depends what your goal is.

You could do your option 2. No reason you couldn’t have one dataset per app with subfolders, if you have some reason you need the dataset.

I organize mine by backup type, as not all data from any given app needs to be backed up. So, I have:

tank
  Data
  Archive
  NoBackup

And under each of those, I have subfolders as needed for each app. Many apps exist in multiple places, so one could say it’s disorganized, but it keeps my backup strategy simple, as each of those 3 datasets has a different backup strategy and mechanism. But that’s not what you asked, just providing another option, probably no one but me doing so, lol.

When I add a new app, I don’t have to change any backup at all if I put it in the right place. No need to worry about picking or excluding datasets or folders. That makes backups faster and maintenance-free.

I originally organized by app, but it was causing way too many issues with backups.
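As a rough illustration of the layout above, the host path for any piece of app data can be derived from its backup class. This is only a sketch: the `/mnt/tank` mount point, the `host_path` helper, and the `jellyfin` folder names are assumptions made for the example, not details from the post.

```python
from pathlib import PurePosixPath

# Assumption: the pool "tank" is mounted at /mnt/tank.
POOL_ROOT = PurePosixPath("/mnt/tank")

# The three top-level datasets from the layout above, one per backup strategy.
BACKUP_CLASSES = ("Data", "Archive", "NoBackup")


def host_path(backup_class: str, app: str, *parts: str) -> PurePosixPath:
    """Return the folder for an app's data inside the given backup class."""
    if backup_class not in BACKUP_CLASSES:
        raise ValueError(f"unknown backup class: {backup_class!r}")
    return POOL_ROOT.joinpath(backup_class, app, *parts)


# Hypothetical app: its config is backed up, its transcode cache is not.
print(host_path("Data", "jellyfin", "config"))
print(host_path("NoBackup", "jellyfin", "transcodes"))
```

Because the backup class is the first path component, a new app never requires a change to the backup jobs themselves.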


I use separate datasets rather than sub-directories because I want to apply different snapshot rules to e.g. Jellyfin transcoding and Jellyfin config datasets.
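To make the point concrete, here is a minimal sketch of per-dataset snapshot rules. It only builds `zfs snapshot` command lines and does not run them. The dataset names, the `POLICIES` table, and the snapshot label format are all hypothetical.

```python
from datetime import datetime

# Hypothetical datasets: snapshot the config, skip the transcode scratch space.
POLICIES = {
    "tank/apps/jellyfin/config": True,
    "tank/apps/jellyfin/transcodes": False,
}


def snapshot_commands(policies: dict[str, bool], when: datetime) -> list[list[str]]:
    """Build one `zfs snapshot dataset@label` command per dataset that wants one."""
    label = when.strftime("auto-%Y-%m-%d_%H-%M")
    return [
        ["zfs", "snapshot", f"{dataset}@{label}"]
        for dataset, wanted in policies.items()
        if wanted
    ]


for command in snapshot_commands(POLICIES, datetime.now()):
    print(" ".join(command))
```

With plain subfolders inside a single dataset this distinction is not available, since a snapshot always covers the whole dataset.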

I use separate datasets

  • because I can
  • just in case I ever want different snapshot/retention policies
  • for individual rollback


So there doesn’t seem to be a right or wrong way. It just depends on what you decide your needs are.

Yeah, as it often is! :) And that’s why I started this thread: to get input on what route others have chosen, and why. So thanks a lot for all your input @Protopia, @pmh, and @sfatula!

But I’m still a bit undecided, as I have also realized that there is an overlap between this topic and another I started yesterday (TrueCloud Backup Tasks: How to use settings for snapshots and paths?): with nested datasets it’s not possible to take snapshots before doing an offline backup using TrueCloud. And the way I understand that option, that’s something I want to do when backing up my appdata.

While I see the benefit of creating separate datasets for each application, so that the dataset configuration can be tailored for each app, that also means setting up a lot of backup tasks.

So, two new questions:

  1. Those of you who have separate datasets for each app, how do you do backups?
  2. And for those of you who have one dataset with nested folders for each app, how do you manage permissions? @sfatula, if you have the time, could you expand on POSIX vs ACLs? Do you set permissions for each folder manually using the command line?

I recursively replicate to two other TrueNAS systems.

Ah, I see. I don’t have a second TrueNAS system, so will do backups to Storj.

You can use any destination system that runs a sufficiently modern ZFS, like Debian, Ubuntu or FreeBSD, and is reachable via SSH.
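In plain ZFS terms, a recursive replication to such a host can be expressed as a send/receive pipeline over SSH. The sketch below only assembles the command string and does not execute it. The snapshot name, SSH target, and destination dataset are hypothetical, and the replication tasks in the TrueNAS UI handle details (incremental sends, retention) that this leaves out.

```python
import shlex


def replication_pipeline(snapshot: str, ssh_target: str, dest_dataset: str) -> str:
    """Build a `zfs send -R ... | ssh ... zfs receive` shell pipeline as a string."""
    # -R sends the snapshot together with all descendant datasets.
    send = shlex.join(["zfs", "send", "-R", snapshot])
    # -u keeps the received file systems unmounted on the destination.
    receive = shlex.join(["zfs", "receive", "-u", dest_dataset])
    # The remote command is quoted so ssh passes it through as one argument.
    return f"{send} | ssh {shlex.quote(ssh_target)} {shlex.quote(receive)}"


# Hypothetical names for illustration only.
print(replication_pipeline("tank/Data@snap1", "backup@nas2", "backup/Data"))
```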

I set them via the CLI, yes, so my technique is only useful for people who can chown. Most apps run as a single user, so I rarely need to do anything other than a mkdir and a chown to whoever the app runs as. Less complicated than ACLs can be at times.

I cannot possibly back up everything to a single server, which I do not have anyway, and one backup isn’t good enough for a 3-2-1 backup believer. So my Data dataset is the one I back up the most, i.e. most often and with the most techniques. Archive is stuff that is written once and only read after that (think media). And NoBackup is obvious: work spaces, stuff that’s already backed up by the app, temporary stuff, recreate-able stuff, etc. In fact, it’s half my storage once you think about it, and that made so many things possible that were not possible otherwise (or at least not easy).
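The mkdir-plus-chown step can be sketched as a small helper. This is an illustration under assumptions, not the poster's actual procedure: the `prepare_app_dir` name, the default mode of `0o750`, and the path and IDs in the usage comment are all hypothetical. Changing ownership to another user normally requires root.

```python
import os
from pathlib import Path


def prepare_app_dir(path: str, uid: int, gid: int, mode: int = 0o750) -> Path:
    """Create a folder for an app and hand it to the user the app runs as."""
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    os.chown(target, uid, gid)                 # like `chown uid:gid`
    target.chmod(mode)                         # plain POSIX mode bits, no ACLs
    return target


# Hypothetical usage (run as root; the path and IDs are placeholders):
# prepare_app_dir("/mnt/tank/Data/someapp/config", uid=1000, gid=1000)
```

The resulting folder can then be given to the app as a host path.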


I have a similar parent structure, but I use subdatasets instead of subfolders.