Unintentional dataset

integerpoet · May 8, 2024, 9:04pm

Let me tell you a story to make sure I understand something.

I created a pool and this pool of course started with a single dataset. I was happy with this. I set up a single SMB share against this single dataset and configured it as a multi-user time machine. (“Time Machine” is the name of the backup software provided by the other computers.)

I created a user A. I performed a backup of computer A. I looked at the dataset, and sure enough it had grown by a plausible amount.

I created user B. I performed a backup of computer B. I looked at the dataset, and sure enough it had grown by a plausible amount.

However, there was now also a child dataset called B. I didn’t explicitly ask for this dataset. Its existence was not the end of the world, but it made me think I must have made some kind of mistake that I had better understand before risking making things worse.

After sniffing around, I discovered I had given user A a home directory and I had specified a “home directory” for user B of /nonexistent. Why did I do something different for A and B? Who knows?

I deleted the child dataset and gave user B a home directory alongside that of user A. I then started another backup of computer B. It hasn’t yet finished, but I can see its files accumulating in the home directory and there is no child dataset.

It seems to me likely that the child dataset for B got created in the absence of a home directory for B. And that makes a certain amount of sense, I guess, given that I asked for a multi-user time machine.

My “mistake”, if you can call it that, was in giving A a home directory. Had I not done that, I imagine both A and B would have gotten child datasets, and not only would I have been able to manipulate them separately at that level of abstraction but the fact that there was a dataset per user would have induced me to assume everything was working as expected.

Now, in terms of datasets, I have a single big soup, which is what I originally intended because I didn’t know any better, so I’m not annoyed.

But it occurs to me that in future I might want to do things to each user’s data at the dataset level. I mean, in practice, probably not — the users are just me and my wife. But in theory the big soup I have right now may not have been the best way to go.

Make sense so far? If so, what documentation can I consult in order to understand better the relationship between datasets and home directories and directories in general?

pmh · May 8, 2024, 9:19pm

You should never share the top level dataset of your pool but create datasets for sharing underneath.

Requirements: Dataset with Share Type set to SMB.

integerpoet · May 8, 2024, 9:23pm

OK, but that sounds more like a “best practices” sort of recommendation than it does like an answer to my question. I am more than prepared to make and correct all kinds of “best practices” mistakes before considering myself done with this project, but I still want to understand what has happened so far.

pmh · May 8, 2024, 9:34pm

The idea of the multi-user Time Machine share type is to automatically create a dataset for each user when they use the share for the first time.

That’s what my TN CORE does, at least, and what I expected. The home directories of the users are on another dataset, separate from the TM share. I would also consider that best practice: users can log in via SSH to use the system for shell “things” in our organisation, therefore they need a home directory to save their “stuff”. TM backups are a different thing.

integerpoet · May 8, 2024, 10:01pm

That makes sense.

The remaining mystery is why, in my case at least, giving these accounts a home directory seems to have inhibited the child dataset. This might be an academic consideration as long as the intent of the “time machine” configuration is to work as you say, but I’m still curious about what happened.

FWIW, for my case a home directory is superfluous. I would actually prefer to avoid creating one for the sake of tidiness — and not having to wonder why someone thought dotfiles would be a good thing to put alongside a Time Machine backup. (For me, shell access seems kinda beside the point of a NAS. That’s what my Mac is for.)

Stux · May 8, 2024, 10:08pm

I believe the sub-dataset is made when there isn’t already a home directory.

It may be a bug when the home directory is the parent multi-user-Time Machine directory.

Now, I use separate accounts for the Time Machine backup versus the user logins, so, you could create a new user for A’s backup… and that’d start backing up to a new dataset, and you could manually delete the contents of the root Time Machine dataset.

Or you could make a new multi-user TM dataset, then zfs rename the B dataset into it, and zfs rename the root Time Machine dataset into it as “a”.
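
For anyone following along, here is a rough sketch of what that rename sequence might look like. The names tank/tm and tank/tm-new are made up for illustration and do not come from this thread. It assumes the old share dataset is not the pool's own root dataset (that one cannot be moved with zfs rename), and I have not verified how the multi-user Time Machine share treats datasets that were moved in by hand. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of running it.
# Set RUN=1 to execute for real (needs root and a real pool).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical names, not taken from the thread.
OLD=tank/tm        # existing share dataset: A's files at top level, child B
NEW=tank/tm-new    # fresh parent for the per-user datasets

plan_move() {
    run zfs create "$NEW"
    # Move B out first so it does not travel along with its old parent.
    run zfs rename "$OLD/B" "$NEW/B"
    # The old parent, now holding only A's files, becomes A's dataset.
    run zfs rename "$OLD" "$NEW/A"
}

plan_move
```

The order matters: if the parent were renamed first, B would be carried along underneath it and end up nested inside A's dataset.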

integerpoet · May 8, 2024, 10:15pm

I kinda like the idea of renaming the existing users and recreating them. The pool has enough storage behind it that I could just wait for the “initial” backups to complete again and then delete the “old” users and, if necessary, their files. Then I would get the discrete datasets which might eventually become handy without ever having truly un-backed-up users.

integerpoet · May 11, 2024, 4:52pm

FWIW, this strategy seems to have worked.

I also took the opportunity to move the share “down” a level in the dataset hierarchy.

Now I wish I had named the root dataset something more generic.

Stux · May 11, 2024, 9:49pm

Well you can rename your pool or any dataset if you want. And renaming a dataset includes moving it.

integerpoet · May 11, 2024, 10:20pm

When I try to rename a root dataset, I’m told… well, let’s just show it.

root@Foo[/mnt]# ls
md_size Bar
root@Foo[/mnt]# zfs rename Bar Baz
cannot create 'Baz': missing dataset name

I can rename non-root (child) datasets just fine.

Stux · May 11, 2024, 10:25pm

zfs list will list all your datasets.

You’ll find that a dataset is named including its pool and parents.

So tank/parent_dataset/child_dataset

And Bar is a pool.
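
To make the naming rule concrete, here is a small sketch in plain shell. It does not call zfs at all; it just splits a name at the first slash. The function name split_dataset is made up for this illustration.

```shell
#!/bin/sh
# Split a full ZFS dataset name into the pool and the path beneath it.
# Pure string handling; nothing here touches a real pool.
split_dataset() {
    name=$1
    pool=${name%%/*}            # everything before the first slash
    case $name in
        */*) rest=${name#*/} ;; # everything after the first slash
        *)   rest= ;;           # no slash: the name is just a pool
    esac
    printf 'pool=%s child=%s\n' "$pool" "$rest"
}

split_dataset tank/parent_dataset/child_dataset
# prints: pool=tank child=parent_dataset/child_dataset
split_dataset Bar
# prints: pool=Bar child=
```

A bare name like Bar has nothing after the pool part, which lines up with the “missing dataset name” complaint from zfs rename.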

integerpoet · May 11, 2024, 10:30pm

Ah. That would make more sense.

I guess when I look at the pools page I see dataset hierarchies rather than pools.

It probably didn’t help that I named different kinds of object with the same name.

PEBKAC. Thanks again.

pmh · May 12, 2024, 6:46am

To rename a pool:

Export the pool from the UI.
Then in the CLI:

zpool import oldname newname
zpool export newname

Import the pool from the UI.

integerpoet · May 12, 2024, 5:56pm

Yup.

A link to a more-detailed version of those instructions appears above.

They mostly worked.

(I posted details of my experience with those instructions in that other thread.)