ZFS Replication to untrusted location - Options and question

Hi all, I’m mulling over a couple of options for backing up some data using zfs send.

This is not for a business so I don’t have that kind of cash unfortunately.

My only criteria are that I can synchronise it automatically, that it happens at a reasonable frequency, and that there is room to grow.

The data is a large amount of high-res scanned TIFF files of family negatives and photos, 8mm film and VHS digitisations, plus an amassment of digital media taken on more modern cameras, and the usual documents and such. On top of that I have system backups such as a few VMs and Docker containers, plus some purchased software with associated serial numbers, and ISO files that I could re-download but would rather not. Presently this is somewhere in the vicinity of 26TB and growing as more media is scanned in. As you can tell, this will need to be encrypted.

I have some online cloud backups already but am way past the size allocation and it’s getting too expensive to keep increasing. So I’ve been looking at some other options which include:

  1. Tape - I’ve specifically been looking at LTO. The affordable generations seem to be LTO-6 or below, but those are not big enough and are a bit inconvenient in that tapes need to be changed manually. I do like tape, but I don’t think it’s going to work.
  2. Telehousing in a friend’s datacenter - I have two contacts who might be able to arrange this for me; I am yet to approach them. This could be the best option, but it’s unlikely to be entirely free.
  3. Arranging with a friend some kind of TrueNAS-to-TrueNAS encrypted zfs send to his house, in return for receiving backups from him.

What I’m hoping someone can answer is the following: is there a way I can, for example, give him a number of disks and he gives me remote access to TrueNAS, but I only get to see / write to my disks? And vice versa? Clearly I can raw send encrypted volumes, so that takes care of the privacy aspect. I’m going to have a proper look over the weekend, but from a quick look it appears the way to do it is at the dataset level; I don’t see a way to do it at the disk level. It does appear that some Unix-style permissions can be set at the pool level, though I’m assuming only from the console, as there is no permissions edit button at this level in my setup.

So just to clarify that a bit: I’m not concerned with my friend seeing my data, as it is encrypted. What I am concerned with is safeguarding our own systems from mistakes on the other end. If I zfs send something that happens to have the same name as something existing on the receiving side, is there a way to limit these problems with permissions, so that the person sending the dataset can only read from and write to the datasets they are allowed to?

Has anyone done anything like this? Any advice to share?

Thanks very much for reading.

Marshalleq

By “volumes”, I think you mean datasets? Either way, the source dataset must already be encrypted in order to send a “raw stream” that remains encrypted on the destination, without the destination server ever having access to the key. Nor does the destination dataset ever need to be unlocked in order to receive the stream.

So are your datasets already encrypted?


The only way to enforce that your data remains on certain disks is to have your friend create a dedicated pool for you, constructed only from those particular disks.
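As a rough sketch (the pool and device names below are just placeholders), he would build the pool only from the disks you hand him:

```sh
# Hypothetical: a mirrored pool built only from the two disks
# supplied by the remote party. Device names are placeholders.
zpool create guestpool mirror /dev/da4 /dev/da5
```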


It’s hard to interpret what you mean by those questions.

Yes, I meant datasets sorry. And yes, I realise they have to be already encrypted. Anything I send will be on an already encrypted dataset.

I will send my friend some disks, and he will create a pool. We will need to create an account for each other; however, I am unsure how he protects himself from me (and vice versa when he sends me some disks). That is the root of the question.

If he makes me an account - let’s call it Marshalleq - can he stop me from zfs sending stuff to where it doesn’t belong?

Anyone who has physical access to the server (and hence “root user” access) can bypass anything.

He can delete your user account, your datasets, your public SSH key, etc.

Yes, but that’s not what I’m asking. I’m saying: if I have an account to send files to his server, how does he limit what I can do? Do TrueNAS permissions have something in place to protect against this? Can an account be assigned just to that pool, so that I can only access and destroy that pool?

Also, I notice your account says TrueNAS MVP. What is a TrueNAS MVP? I googled it and didn’t find anything. TrueNAS forum flair perhaps? Thanks.

You can do this at a dataset level with zfs allow and zfs unallow.

So theoretically, you can “allow” your user account (on his server) all zfs permissions for only a specific dataset(s). (Assuming you are not granted root user access to his server.)

To simplify matters, you can issue the allow command against the pool’s top-level root dataset, and it will by default apply the “allows” to all child datasets.
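Something along these lines, as a sketch (the pool and user names are placeholders, and the exact permission list may need tweaking):

```sh
# Grant the remote user a broad set of ZFS permissions on the pool's
# top-level dataset; by default this also covers all child datasets.
zfs allow marshalleq create,destroy,mount,receive,send,snapshot,hold,release,userprop guestpool

# Show what has been delegated
zfs allow guestpool

# Revoke everything from that user later, if needed
zfs unallow marshalleq guestpool
```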

Someone who knows nothing about anything, except how to “game” the clout system. :smiling_imp:

When in doubt, make it up!
Most Virtuous Poster
Malignant Vicious Paranoid
Mysterious Virtual Prattle
The combinations are endless… :laughing:

No way! I am going to google this right away, never heard of it! So the theory is: I can zfs allow my user account on his system, and his user account on my system, essentially at the pool level because that’s the root dataset, and that’s that? It means I can only use ZFS commands on the datasets allocated to me, but the system owner can use zfs commands on everything?! This is fantastic. Is that user account just a standard one created in the TrueNAS GUI?

Correct. In theory.

Just be careful. You should probably allow access to every ZFS subcommand, otherwise you might bump into unpredictable issues. (Some “allowance rights” might not be obvious for simple send-recv replications.)


It applies to the user account on his server, which he will need to create. This will be the user specified in the SSH credential settings (not sure what it’s called in SCALE). It doesn’t have to match your server’s username, but you might as well for consistency’s sake.

In fact, as far as I understand, it’s the “root user” on your server that will initiate the send, which will be received by the user account on his server (assuming the proper allowances.)
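Conceptually it’s something like this (dataset, snapshot, user, and host names below are all placeholders): root on your side pushes the raw stream, and the limited account on his side receives it into the delegated pool.

```sh
# Run as root on the sending server. -w sends the raw, still-encrypted
# stream; "marshalleq" only has zfs-allow rights on guestpool.
zfs send -w tank/photos@auto-2024-01-01 | \
    ssh marshalleq@friends-nas zfs recv guestpool/photos
```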

Thanks, we have yet to decide whether we will just create a VPN between us or use SSH. VPN has the advantage of not being as likely to attract endless dictionary attacks, I think. And if I recall correctly, doing SSH the TrueNAS way required quite a lot of permissions I didn’t want to give, but I could be wrong.

I wouldn’t expose a TrueNAS instance to the naked internet.

VPN then SSH+Netcat replication :wink:
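Done by hand inside the VPN, that looks roughly like this (addresses, port, and names are placeholders); the GUI’s SSH+NETCAT transport does roughly the same thing for you, using SSH for control and netcat for the bulk data:

```sh
# On the receiving side, listen on a port that is only reachable
# over the VPN, and pipe whatever arrives into zfs recv:
nc -l 8023 | zfs recv guestpool/photos

# On the sending side, push the raw encrypted stream to that port:
zfs send -w tank/photos@auto-2024-01-01 | nc 10.8.0.2 8023
```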

The added benefit of raw encrypted streams is that even if someone were to intercept the stream, it’s just encrypted blocks of data. (Nothing needs to be encrypted after it is received: the data is already encrypted, and remains so, before, during, and after the transfer.)

Thanks for everyone’s help. I started to test out some scenarios for this and then remembered I have a special vdev added to the source pool, whereby blocks under a certain size are stored there. If I send a raw encrypted stream, what happens to the data on the special vdev? I assume it is left behind?

Because of this, I wonder if I’m better off just re-encrypting on send and not sending a raw stream. Also, I’ve seen the raw send option on the command line, but I didn’t find a GUI option for it, which would be ideal (in case anyone knows).

Thanks.

Why would some files be left behind? Replicating a snapshot sends everything that is in the snapshot, no matter where it is physically stored.

The raw send option is invoked automatically if the source is encrypted and you select either the “Include Dataset Properties” and/or “Full Filesystem Replication” checkboxes.

To elaborate, when dealing with an encrypted source, if you try using the -p and/or -R flags in a zfs send command, but you do not include -w, it will fail with an error message that says you must use -w. It will not automatically “add” the flag for you.

With the GUI, it invokes -w under-the-hood automatically, which is why it doesn’t immediately fail and prompt you to “try again with the raw stream parameter.”
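At the command line the difference looks like this (dataset and snapshot names are placeholders):

```sh
# Encrypted source: -R (or -p) without -w is rejected, per the above
zfs send -R tank/photos@auto-2024-01-01 | zfs recv backup/photos

# Adding -w sends the raw replication stream, which is what the GUI
# does under the hood
zfs send -w -R tank/photos@auto-2024-01-01 | zfs recv backup/photos
```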

I wasn’t sure whether “raw” meant something specific to the disk or the local pool, but from what you’re saying it’s dataset-related. So there should be some flexibility to go from or to various configurations, such as dedup. In that vein, I actually wonder if there’s a way of sending deduped data to the backup system without having to dedup it at the source. And since it’s not actively used in this case, dedup shouldn’t use much memory, right? I mean, it’s just an unmounted dataset, right?

I have successfully sent an encrypted source to an encrypted destination and tested that it is using the source key, so that bit is working well.

What I also need to do is convert the original unencrypted datasets to encrypted ones. There are a few ways of doing that, but ideally I’d keep all the snapshots. It’s “nearly” working; I say nearly because it keeps complaining there are no snapshots to send, when there are. I saw that using .* in the regex field for snapshots is supposed to send them all, but it doesn’t seem to work. I’ll also want to send all snapshots to the destination system once source encryption is enabled, so it will be good to know in both situations. The existing snapshots and the replication task don’t seem easy to align. This seems like it would be easier at the command line.

If it comes down to it though, I’ll just rsync everything over to a new encrypted dataset and create replication tasks that match the snapshots I want from scratch.

Written in a rush, please forgive any mistakes.

OK, I just got the unencrypted to encrypted sync to work with snapshots. Will have a play around and see if I can find the setting that was blocking it later. Going out now. This is fun. :slight_smile:

You cannot replicate to an unmounted dataset. And writing to a deduped dataset involves, for each and every block, looking up the DDT to see whether that block already exists; so either the whole DDT fits in RAM (ca. 5 GB per TB, with default settings) or the system is slow as molasses.
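As a rough back-of-envelope figure using that ratio: for the ~26 TB mentioned earlier, the DDT alone would want on the order of 26 × 5 GB ≈ 130 GB of RAM to stay resident.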

Ah of course, it’s mounted, but not unlocked! So no dedup will be enabled on the foreign system then. Good to know, thanks.