How do you personally structure your data?

I am curious to hear how others structure their datasets/shares/directories. The obvious answer here is that it all depends on your personal use case.

As @ericloewe has stated in past discussions:

I also refer to Samuel Tai’s “Path to Success for Structuring Datasets in Your Pool”

Just curious to see how others structure things. Personally, I don’t have much organization for my data, but I’m hoping to comb through the years of backed-up data I’ve copied over to my primary system and organize things better.

1 Like

Kind of a hodgepodge for me, really. Every share is a dataset, and apps on SCALE obviously each have their own datasets. Shares are kind of ad hoc: one for Time Machine, one for video (TV and movies, mainly served via Plex, though I want to be able to move stuff around via network clients), one for other media, and NFS shares for Proxmox.

I have two pools: a larger one with four 4 TB disks in RAIDZ2 and a smaller one with two 256 GB SSDs (plus a 128 GB SSD for boot). The SSD pool is where I keep jails, datasets for databases, and “ingest” datasets with lots of traffic. To keep fragmentation to a minimum, data only gets written to the bigger pool after it has landed on the SSD pool first, so all /tmp folders live on the SSD pool as well.
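
A minimal sketch of that ingest-then-move step, assuming hypothetical paths and that something like cron kicks it off once transfers have settled:

#!/bin/sh
# Hypothetical ingest flush: copy finished files from the SSD pool to the HDD pool
# in one sequential pass, then clean up the empty directories left behind.
SRC=/mnt/ssd/ingest      # fast SSD landing zone (assumed path)
DST=/mnt/tank/archive    # large RAIDZ2 pool (assumed path)

rsync -a --remove-source-files "$SRC/" "$DST/"
find "$SRC" -mindepth 1 -type d -empty -delete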

The bigger pool is set up with storage “vaults” for applications and shares for backups, etc. I’m slowly moving away from managing files myself toward using applications: Lightroom for photography still uses SMB shares, but most everything else is in Nextcloud.

Data structures are such a bugbear of mine.
I like the parent/child modality. Each of these is a dataset, and each “top level” dataset is just a list of the child “directories”.

I do this with SMB shares but also with SharePoint sites (Microsoft land is fun).
Here’s an example:

tank
-media
--movies
--tv.shows
--games
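
For reference, that layout is just nested datasets; built from the CLI it would look something like this (TrueNAS would normally do the equivalent through the UI):

zfs create tank/media
zfs create tank/media/movies
zfs create tank/media/tv.shows
zfs create tank/media/games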

Professionally, I’ve had trees that cascade down much deeper than this, with parents and grandparents and great-grandparents. Each level is still just a list of subdirectories: read-only and traverse permissions for whatever groups need them, with write permission only ever inherited at the bottom (see the permission sketch after the tree below).

tank
-accounting
--private
---managers
---templates
--public
---forms
---announcements
-hr
--private
---managers
---templates
--public
---forms
---announcements
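
A rough sketch of that permission pattern using plain POSIX ACLs; TrueNAS itself would normally manage this with NFSv4 ACLs through the UI, and the group names here are made up, so treat it as an illustration of the idea rather than the exact commands:

# read + traverse only, inherited down through the parents
setfacl -R -m g:accounting:rx,d:g:accounting:rx /mnt/tank/accounting

# write granted only on the leaf that actually needs it
setfacl -m g:acct-managers:rwx,d:g:acct-managers:rwx /mnt/tank/accounting/private/managers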

This makes things like ACLs, snapshot retention, shadow copies, etc., all more flexible and straightforward to manage.
Further, I hate mixed case and wacky characters. I once had a department whose folder structure was a series of !!!! in front of each folder. It drove me nuts. All of my shares are always lowercase; if you want to make something show up at the top, throw some numbers in front of the folder names.

2 Likes

Another deciding factor is when a set of data needs its own replication and snapshot policies… which necessitates making it a dataset.
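
For example (dataset names are made up), having the data as its own dataset is what lets you give it its own snapshot cadence and replication target; TrueNAS drives this through periodic tasks, but under the hood it boils down to:

zfs snapshot tank/photos@tuesday
zfs send -i tank/photos@monday tank/photos@tuesday | ssh backup zfs recv backuppool/photos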

Because of the NFS dataset-vs-mount issue, I try not to have sub-datasets inside shares.

Thus a few SMB shares, and the rest of my datasets relate to various services: maybe a pseudo-CDN for development purposes or a Jellyfin media hierarchy.

A GitLab installation, for example, would necessitate a parent dataset and then some sub-datasets at the root level.

1 Like

I’ve never really used NFS in production. VMware is always iSCSI/FC and file shares are always SMB. NFS is useful for super simple things, but I’ve never really bothered. Besides, mount -t cifs is easy enough on Linux :stuck_out_tongue:
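
For anyone following along, a typical cifs mount from a Linux client looks something like this (server, share, mountpoint, and user are placeholders):

sudo mount -t cifs //nas/media /mnt/media -o username=alice,uid=1000,gid=1000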

But yes, to your point, if using NFS, you can’t follow my structure there. You can’t traverse file systems through an NFS mount, meaning sub-datasets wouldn’t work right.

1 Like

I use a similar approach to @nickf1227. I run a single HDD pool with an sVDEV to speed up small-file transfers and metadata. I agree Nick’s structure helps a lot with permissions, allowing fairly logical, granular control over sections/sub-sections as needed. It also helps for similar reasons with snapshot and replication tasks, since some of your datasets may need different retention and/or snapshot lifetimes.
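
(For context, and with made-up device and dataset names: a special vdev gets added to an existing pool roughly like the lines below, and small blocks can then be steered to it per dataset. Keep the sVDEV mirrored, since losing it loses the pool.)

zpool add tank special mirror /dev/sdX /dev/sdY
zfs set special_small_blocks=32K tank/photos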

I don’t obsess about capitalization, however. My use case is different, and the operating systems of most of my NAS users are not case-sensitive. I have also minimized my sharing down to one protocol since Apple no longer supports AFP; using only one protocol per dataset minimizes the potential for mangling of pool contents. Plus, SMB support for Mac systems on ZFS has been amazing thanks to the efforts of @anodos and like demi-gods at iXsystems.

2 Likes

For sure, it isn’t necessary anymore. It is still faster to load directories my way though :stuck_out_tongue:
Performance Tuning - SambaWiki

# treat names as case-sensitive so Samba skips its case-folding directory scans
case sensitive = true
# the case to use for new files when case isn't preserved
default case = lower
# don't keep the case the client sent; force the default case instead
preserve case = no
# same, but for 8.3-style short names
short preserve case = no

Combined with a case-insensitive dataset in ZFS, directory listings go vroooom.
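
(On the ZFS side, casesensitivity can only be set when the dataset is created, roughly like the line below with a made-up dataset name; if memory serves, TrueNAS sets the equivalent for you when you pick the SMB share type at dataset creation.)

zfs create -o casesensitivity=insensitive tank/smbstuff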

Besides, how many versions of file.txt or File.txt or FilE.txt does one really need? :slight_smile:

2 Likes

Hah. The early days of SMB, before proper Mac compatibility, resulted in some truly epic file names as special characters and the like got mangled.

Now, from an Apple user’s perspective, the Finder experience over SMB seems indistinguishable from AFP, other than that SMB is not deprecated. Lots of credit goes to iXsystems for making that possible despite the likely lack of Apple documentation regarding its implementation of SMB.

2 Likes

Because Apple has not learned this lesson, I’m sticking this here for others. .DS_Store files are wonderful.
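
(If you want to keep them off the share entirely, Samba can be told to hide and drop them — a hypothetical smb.conf fragment:)

veto files = /.DS_Store/._*/
delete veto files = yes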

2 Likes

I was using NFS to mount datasets into an Ubuntu VM for Docker, but with sandboxes/jailmaker I’ve now been able to shut down the NFS daemon, as I no longer need a VM for docker compose :slight_smile:

Appreciate everyone sharing so far! It’s been insightful.

In my professional experience, many people don’t give this much thought until things are a total mess. Then they come around wanting to create snapshot schedules or replication jobs, and all you can ask is “which of the 1000(000) directories at the root of the filesystem were you looking to target?” … :disappointed_relieved:

Trying my best to avoid that same lazy trap with my personal data even if it’s not all that critical.

1 Like

THIS :stuck_out_tongue:

2 Likes

True, but the workaround I found useful was to create NFS shares at each dataset level on TrueNAS, then on the client side mount those shares so they mirror the same directory/dataset hierarchy. From the client’s POV the structure looks as if it’s all part of the same dataset.
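
Something like this on the client side (host and paths are hypothetical), one fstab entry per dataset so the mounts line up with the server-side hierarchy:

nas:/mnt/tank/media            /mnt/media            nfs  defaults  0 0
nas:/mnt/tank/media/movies     /mnt/media/movies     nfs  defaults  0 0
nas:/mnt/tank/media/tv.shows   /mnt/media/tv.shows   nfs  defaults  0 0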

2 Likes