Advice: Dataset Configuration for Business Use FileShare

Looking to bounce my plan off people here and get any advice anyone has.

We have just set up a new installation of TrueNAS SCALE 23 on a bare-metal Dell server: dual 8C/16T Xeons and currently 126GB of RAM.

This will primarily be replacing an aging VM running Windows Server 2012 and an NFS/DFS file share.

The old file share’s main purpose was to share access to our client folders (approx. 700 client folders, always growing), all with the exact same subfolder structure: numbered projects and templated sub-project folders within each client.

The top level of the old share went as follows:

Clients
– A Client
– B Client
– C Client
– etc.

In addition to this folder structure and substructure, there was a least-privilege permission model: essential staff can see all client folders as read-only and enter any client folder as read-only, but cannot get any further unless they are part of the project team (project security group) for that given client + project.
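Roughly, the model comes down to ACLs like the sketch below (driven from Python via Windows icacls; the server path, domain, and group names are all made-up examples, and a real setup would also need list rights at the share root):

```python
# Rough sketch of the least-privilege layout via icacls (Windows-side).
# All paths and group names below are hypothetical examples.
import subprocess

def grant(path: str, group: str, rights: str, inherit: bool = True) -> None:
    # (OI)(CI) makes the grant inherit to child files and folders.
    flags = "(OI)(CI)" if inherit else ""
    subprocess.run(["icacls", path, "/grant", f"{group}:{flags}{rights}"],
                   check=True)

client = r"\\server\clients\A Client"

# Staff may list and enter the client folder, but the grant does not
# inherit, so they can go no deeper on their own.
grant(client, r"DOMAIN\Staff", "RX", inherit=False)

# The project security group gets Modify on its project, inherited down.
grant(client + r"\Project-0001", r"DOMAIN\Proj-0001-Team", "M")
```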

In the new TrueNAS system we have six 960GB SSDs configured as two 3-wide RAIDZ1 vdevs in a single pool, for a total usable capacity of 3.47TB. My initial plan is to create a dataset called “clients” and then nest all current and future client “folders” under it as datasets, which means all the “templated” subfolders would just be plain folders/files in the share at the file system level. We plan on making the share an SMB share, as we are a Windows-only shop for laptops.
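To make that concrete, the provisioning side would look something like this sketch (pool and dataset names are made-up examples, and client names need sanitizing since ZFS dataset names can’t contain spaces):

```python
# Minimal sketch: one ZFS dataset per client under a parent dataset.
# Pool and dataset names are hypothetical examples.
import subprocess

clients = ["A Client", "B Client", "C Client"]  # ~700 in reality

# Parent dataset the SMB share would point at (-p creates parents as needed).
subprocess.run(["zfs", "create", "-p", "tank/clients"], check=True)

for name in clients:
    # ZFS dataset names can't contain spaces, so sanitize first. The
    # templated subfolders would then be plain directories inside each
    # dataset's mountpoint.
    safe = name.replace(" ", "_")
    subprocess.run(["zfs", "create", f"tank/clients/{safe}"], check=True)
```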

My questions are:

  1. Is there a limit to the number of datasets TrueNAS can handle, not in terms of data size but simply total count?
  2. If the dataset count is not an issue, am I still potentially creating a problem for myself down the road by taking this approach?
  3. If the above is fine, is there a better approach, and what would its advantages be?

Things not covered above are backups/snapshots/replication. Right now we run a nightly backup of the current system at the VM level, but with this new system we are going to need to rethink our strategy. There is also a possibility in the near future that a “sister” server gets spun up at another location, which we would want to keep in sync via replication tasks. Open to suggestions here as well!

Do you need datasets per client? Wouldn’t you just need a directory?

Datasets allow different snapshot and replication schedules, etc.

But aren’t you going to back up/replicate all clients every time anyway?

Meanwhile, you can use replication to back up from server A to server B, but you can’t use it to keep two servers in two-way sync.

SyncThing can do that I believe.

(1) Essential staff can see all client folders as read-only and enter any client folder as read-only (2) unless they are part of the project team (project security group) for that given client + project

Creating 700 datasets would give the most granular control per client. Might a better structure be to create a dataset per team, and then plain client directories within those on the SMB share?

TeamA (TeamA=RWE, TeamB=ReadOnly, Staff=ReadOnly)
– Client1
– Client3

TeamB (TeamA=ReadOnly, TeamB=RWE, Staff=ReadOnly)
– Client2
– Client4

Then you could move/add users to whichever group(s) are most appropriate, e.g. Admin/Manager = TeamA & TeamB.

Hope I got that right…

I guess that is what I’m trying to determine. I know for sure I could just set up one dataset called “clients” and then manage everything via folder permissions like we have been. I just thought I would explore what TrueNAS/ZFS features I may be missing by going that route.

My thinking for having a dataset per client was the ability to have different backup cadences per client dataset, since at some point we start archiving client folders at a different rate for clients older than 7 years. As well, each client would have a clearly defined top-level parent to control.
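Something like this sketch is what I pictured for the per-dataset cadence (dataset names and the activity data are made-up examples):

```python
# Sketch: snapshot active clients nightly, archived (>7yr) clients monthly.
# Dataset names and last-activity dates below are hypothetical.
import subprocess
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=7 * 365)

client_activity = {  # last-activity date per client dataset (example data)
    "tank/clients/A_Client": datetime(2015, 3, 1),  # stale -> monthly
    "tank/clients/B_Client": datetime.now(),        # active -> nightly
}

def snapshot(dataset: str, label: str) -> None:
    stamp = datetime.now().strftime("%Y%m%d-%H%M")
    subprocess.run(["zfs", "snapshot", f"{dataset}@{label}-{stamp}"],
                   check=True)

for dataset, last_activity in client_activity.items():
    stale = datetime.now() - last_activity > ARCHIVE_AFTER
    snapshot(dataset, "monthly" if stale else "nightly")
```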

I also wondered if there are any performance benefits or drawbacks to this approach.

We’ve talked about that approach in the past. We do have “teams”, but ultimately we are a very multi-functional company, so just because someone works on Team A does not mean they won’t end up working with Team B when a project has a knowledge overlap (this happens often, on almost every project/client). This is why we went the project-group permission route: it lets us create a very customized permission set per project.

I guess my big hesitation is whether 700 datasets vs. 700 parent client folders causes headaches, both administratively and technically, in terms of ZFS management and features.

The issue is, I think, that you go from having ONE dataset, with one backup/replication/snapshot setup, to hundreds.

The GUI is really not designed to manage hundreds of datasets, snapshots, etc.

So now you have to write scripts and management systems.
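Even something basic like snapshot pruning across all the client datasets turns into a script, e.g. (the dataset name and retention window are just illustrative):

```python
# Sketch: prune snapshots older than 30 days across every client dataset.
# "tank/clients" and the retention window are hypothetical examples.
import subprocess
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)

# -p prints the creation property as a raw Unix timestamp; -H drops the
# header and gives tab-separated fields.
out = subprocess.run(
    ["zfs", "list", "-H", "-p", "-t", "snapshot", "-r", "tank/clients",
     "-o", "name,creation"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    name, creation = line.split("\t")
    if datetime.now() - datetime.fromtimestamp(int(creation)) > RETENTION:
        subprocess.run(["zfs", "destroy", name], check=True)
```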

Or you could just have one dataset, and the only thing you need to do is arrange directory permissions.

And with snapshots, you can expose them over SMB as Windows “Previous Versions”, so it’s not like you would have to handle single-client restores yourself.

The replicated backup would be for disaster recovery.

And the beauty of replication is that stagnant/stale data does not change and takes no additional time to back up, unlike, say, rsync backups, which still have to walk and compare the whole tree. So you could run hourly backups of everything quite easily.
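That’s because ZFS replication is incremental send/receive: only the blocks that changed between two snapshots cross the wire. Roughly like this (snapshot, host, and pool names are hypothetical):

```python
# Sketch: incremental replication of one snapshot delta to a second box.
# Snapshot, host, and pool names are hypothetical examples.
import subprocess

prev = "tank/clients@hourly-0100"
curr = "tank/clients@hourly-0200"

# `zfs send -i` streams only the blocks that changed between the two
# snapshots; unchanged data costs nothing to re-send.
send = subprocess.Popen(["zfs", "send", "-i", prev, curr],
                        stdout=subprocess.PIPE)
subprocess.run(["ssh", "backup-host", "zfs", "recv", "-F",
                "backuppool/clients"],
               stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```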

This is the gut check I needed, lol. Even without it, though, 700+ datasets always felt like a Mad Hatter approach without a super good reason.

I tend to lean on the GUI, so managing all the top-level client folders as datasets felt clean to me, but given your list of drawbacks I think I will opt for one dataset and manage things via folders.

Another thing we currently do is use something like Robocopy to spin up a new client and/or project folder with the standard template structure + permissions. So keeping things at the folder level just makes that easier and less fragmented.
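With a single dataset, that provisioning step can stay a plain directory copy, e.g. (paths are made-up examples; on the Windows side, Robocopy’s /SEC switch can carry the NTFS ACLs along):

```python
# Sketch: spin up a new client folder from the standard template.
# Paths are hypothetical; ACLs would come from the template's inheritance
# (or a Robocopy /SEC pass on the Windows side).
import shutil
from pathlib import Path

SHARE_ROOT = Path("/mnt/tank/clients")   # dataset mountpoint (example)
TEMPLATE = Path("/mnt/tank/_template")   # numbered-project skeleton (example)

def new_client(name: str) -> None:
    # Copies the whole templated folder tree into place for a new client.
    shutil.copytree(TEMPLATE, SHARE_ROOT / name)

new_client("D Client")
```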