Best way to encrypt existing dataset using GUI

I have a TNC13 system (Online) with an unencrypted dataset that is routinely replicated to another TNC13 system (Local Backup) over the LAN. I now want to back up to a third TNC13 system (Remote Backup) via VPN over the Internet. This third system is not owned or managed by me, so I would like its copy of the dataset to be encrypted. Although I am reasonably adept with the CLI, I would like to use the GUI, as TN is an appliance and skirting the GUI can occasionally compromise the consistency of the system (not the data unless your commands are wrong!).

The obvious approach is to use a replication task and check the “encryption” box for the destination. This requires the destination dataset to be unlocked for every replication except the very first (when the destination dataset does not exist yet). This is not acceptable for the Remote Backup.

Another possibility is to encrypt the source dataset and not check the encryption box for the destination, in which case it will be sent raw and does not need to be unlocked at the destination. This mild compromise (encrypting the source) works fine for me. I would have to use a key, not a passphrase, as I want the dataset to come up unlocked on the Online system. It just means I have to be SURE to keep copies of the key both on and off the system or I’m in deep doodoo if my system crashes!

So this raises the question: what is the best way to encrypt the dataset within the GUI? My best guess is the following:

  1. one time push replication to anywhere (including the same pool) with the encryption box on the destination checked. Save the key on the host system.
  2. delete the dataset on the Online system
  3. click on the Restore button of the replication task on the Online system
  4. unlock the restored dataset with the key from the original replication (noting that the file has the wrong pool/dataset name).
  5. delete the one time replication and the temporary dataset

Then I can just set up a periodic snapshot/replication without the encryption box checked on the destination but with Include Dataset Properties set.

So I ask. Will this sequence work? What about:

  1. Where do I find the dataset key on the Online system?
  2. How will this affect any Windows or NFS shares I have set up on the Online system for this dataset?
  3. Anything else of concern?

This seems pretty simple and fully GUI based. Restore in the case of failure of the pool or dataset would be to click on the Restore button for either replication task (Local Backup or Remote Backup).

Can anyone validate this or, perhaps, tell me a better way to accomplish a replicated, encrypted, locked backup on a third machine?

I have read MANY posts from many people (@winnielinnie standing out!) but few have my GUI based constraint.

You’ll need to employ “raw streams” then, since in order to write new blocks to the destination, the dataset would need to be unlocked. Otherwise, whoever has physical access to the third server will be able to access the data within your dataset.

With a “raw” stream, nothing needs to be unlocked: not the source, not the destination, not even for a single second.

Keep in mind that the person on the other side will be able to read:

  • dataset names
  • snapshot names
  • ZFS properties
  • total capacity used by the dataset and each snapshot

I’m having trouble following those steps. Are you essentially creating a “sister dataset” on the same (local) pool via a full replication (with encryption enabled), and then destroying the original plain dataset?

Do you have enough free space? Do you feel comfortable managing and safeguarding your encryption keys?


Thanks, @winnielinnie for looking at my issue. I apologise for not being clear or concise. Let me try again.

Yes. This I understand (thanks mostly to your posts elsewhere!). This is why I must change the unencrypted dataset on the Online machine to encrypted.

Ah yes, I forgot! Doesn’t affect us in this case but could if our dataset names were informative (they aren’t).

No. My overall intent was ONLY to encrypt the Online dataset so I could then use raw replication to the Remote site. I propose doing this by replicating the Online dataset, with encryption, to a temporary dataset anywhere (step one above), then restoring from the temporary dataset using the restore function of the replication task that created it (steps 2, 3 and 4 above), then deleting the temporary dataset (step 5 above). That should leave me with my original Online dataset, but now encrypted. Nothing more. Hopefully nothing less (e.g. shares).

Yes, I have the space to replicate to. I wish I could do this without keys, but I don’t see any way to make my backup private (locked) without encryption, and that requires a key or a passphrase. A passphrase won’t work for us as it will not allow the system to boot unattended. So I guess we have to safeguard our keys.

Right.

I get lost here. I’m not sure if it’s the terminology or how you’re describing it?

If you delete the “temporary” (encrypted) dataset, you lose the ability to leverage the privacy and security of encrypted raw streams.


If it helps, remove “SSH” and “networking” from the equation.

Imagine you’re speaking over the phone to the person who physically owns and controls the backup server.

You would have to instruct him what commands to use to receive the replication stream.

“Now use the zfs load-key command and I’ll provide the keyfile XXYYZZ to decrypt the dataset. Done? Okay, so the dataset is unlocked? Great. I’m sending the incremental replication now. Once it’s finished sending, I need you to use the zfs unload-key command to lock the dataset again. I would greatly appreciate that. Thank you.”

If you use “raw streams”, there is no unlocking/locking. The dataset remains locked at all times. (Even the source dataset can remain locked.)

But in order to use “raw streams” (and its added security/privacy benefits) for encrypted backups, the source must be encrypted. The source cannot be a plain unencrypted dataset.
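To make the contrast concrete, here is a rough sketch in Python that only builds the two command sequences (it does not run anything). The `-w` flag on `zfs send` is the OpenZFS raw-send option; the function name and the pool, dataset and snapshot names are made up for illustration, and the replication task in the GUI may well issue different commands under the hood.

```python
def replication_steps(src, dst, prev_snap, new_snap, raw):
    """Return (where, command) pairs for one incremental replication.

    Illustrative only: dataset and snapshot names are hypothetical, and
    nothing is executed.
    """
    send = ["zfs", "send"]
    if raw:
        send.append("-w")  # raw send: blocks leave the source still encrypted
    send += ["-i", f"{src}@{prev_snap}", f"{src}@{new_snap}"]
    recv = ["zfs", "recv", dst]

    if raw:
        # Nothing is unlocked on the destination at any point.
        return [("source", send), ("destination", recv)]

    # Non-raw: the destination must hold the key while it receives.
    return [
        ("destination", ["zfs", "load-key", dst]),
        ("source", send),
        ("destination", recv),
        ("destination", ["zfs", "unload-key", dst]),
    ]


if __name__ == "__main__":
    for raw in (False, True):
        print("raw stream" if raw else "non-raw stream")
        for where, cmd in replication_steps(
            "tank/data", "vault/data", "snap1", "snap2", raw
        ):
            print(f"  [{where}] {' '.join(cmd)}")
```

The non-raw sequence is the phone conversation above written down; the raw sequence has no key handling on the destination at all.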

I see why you are confused. What’s in my head should be in the post! My apologies.

You have confirmed that encrypting the Online dataset and using raw replication is the recommended way to keep my data private from Remote, and my current backup replication is already raw. So all I need to do is encrypt the Online dataset and duplicate my existing Backup replication, with appropriate minor edits, for Remote. I will then be backing up to both Backup and Remote, and neither of those needs to be unlocked.

My 5 steps are my proposal for how to encrypt the Online dataset, NOT how to back up the Online dataset to Backup or Remote. It is the conversion to encrypted that I am trying to accomplish without nasty side effects such as an invalid key or lost shares.

So very sorry for the confusion. I hope this makes it clearer. The five steps (and associated three questions) all pertain to the process of encrypting the existing dataset.

I took a page out of the book of @winnielinnie and experimented on a testbed. Here is how to encrypt a dataset from the TNC (13.0-U3.1) GUI with no command line entry (in the spirit of TN Appliance). This procedure replicates the dataset so you need an accessible pool with enough space to hold the replication. As always, make sure you have a good backup.

Warnings:

  1. Backup first.

  2. I did this using two separate pools on one machine. It is possible that there could be an issue with doing it across two machines (Key location?).

  3. Ensure your dataset is unused during this procedure or you will lose all data changes that happen between taking the first snapshot and completing the process.

  4. My system is fairly straightforward and the only dataset dependencies I checked were Samba and NFS shares. Any other dependencies (e.g. Apple shares, dataset part of existing replication etc.) are unknown but I strongly suspect they would be affected and must be manually repaired (like the Samba and NFS shares) so be very careful!

  5. The parent dataset of the replicated dataset was not encrypted (probably matters only when you are decrypting as mentioned below).

  6. The new encrypted dataset is also a new encryptionroot.
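For anyone who wants to double check warnings 5 and 6 afterwards, the relevant ZFS properties are `encryption`, `encryptionroot` and `keystatus`. The sketch below is Python that parses the tab-separated output of a read-only query such as `zfs get -H -o name,property,value encryption,encryptionroot,keystatus <dataset>`. The sample text and the dataset name `tank/data` are invented for illustration, and the helper names are my own.

```python
def parse_zfs_get(output):
    """Parse `zfs get -H -o name,property,value ...` output into nested dicts."""
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = value
    return props


def is_own_encryption_root(props, dataset):
    """True if the dataset is encrypted and is its own encryption root."""
    info = props[dataset]
    return info.get("encryption") != "off" and info.get("encryptionroot") == dataset


# Hypothetical sample output for a dataset named tank/data.
SAMPLE = (
    "tank/data\tencryption\taes-256-gcm\n"
    "tank/data\tencryptionroot\ttank/data\n"
    "tank/data\tkeystatus\tavailable\n"
)

if __name__ == "__main__":
    parsed = parse_zfs_get(SAMPLE)
    print(parsed["tank/data"]["keystatus"])
    print(is_own_encryption_root(parsed, "tank/data"))
```

This only reads properties, so it should not upset the appliance. The same values are also visible in the GUI's dataset details.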

This looks long but takes less than 10 minutes of “work” plus the time for the two transfers to take place.

Here we go!

  1. Create a snapshot for transfer (<1 minute)
    a) Go to Storage | Pools, right click the three dots on the right of the source dataset and choose Create Snapshot
    b) Tick the recursive box then click Create Snapshot

  2. Replicate to a temporary location (1 minute)
    a) go to Tasks | Replication Tasks and click ADD
    b) enter source and destination datasets (and SSH info if either is on a different machine)
    c) check Recursive under the source
    d) check Replicate Custom Snapshots under the source and change “auto” to “manual”
    e) check Encryption under the destination
    f) select Encryption Key Format (Hex or Passphrase) as desired
    g) click Next
    h) click Start Replication (it won’t actually “start”)
    i) click the “>” to the right of your new task then click RUN NOW for this task

You now have a replicated copy of your dataset that is encrypted (and ReadOnly). If you used HEX, your encryption key is stored on your source machine (save a copy off the machine!!!). We are halfway done!

SAVE THE ENCRYPTION KEY!!!
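A quick sanity check on the copy you saved can catch a truncated paste before you delete anything. As far as I know, an OpenZFS hex key is 32 bytes written as 64 hexadecimal characters, so the Python sketch below just checks that shape. How the GUI wraps the key in its exported file is not something I rely on here: the assumption is that you have already pulled out the bare key string yourself, and the function name is my own.

```python
import string


def looks_like_zfs_hex_key(text):
    """Return True if text is 64 hexadecimal characters (a 32-byte key).

    This checks the shape only; it cannot tell whether the key actually
    unlocks the dataset.
    """
    key = text.strip()
    return len(key) == 64 and all(c in string.hexdigits for c in key)


if __name__ == "__main__":
    candidate = "ab" * 32  # placeholder, NOT a real key
    print(looks_like_zfs_hex_key(candidate))
```

The real test is still to unlock the dataset with the saved key, as in the fixup step further down.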

  3. Reverse the Replication (<1 minute)
    a) go to Tasks | Replication Tasks and click the “>” to the right of your new task and choose Edit
    b) swap the source and destination choices
    c) uncheck Encryption under the destination
    d) under “Also Include Naming Schema” (further down below source) type manual-%Y-%m-%d_%H-%M
    e) uncheck scheduled
    f) then click SAVE

  4. Save Dependencies (2 minutes)
    a) for each replication that involves this dataset as a source, edit it to refer to the replicated dataset
    b) for each periodic snapshot that involves this dataset, edit it to refer to the replicated dataset
    c) for each share that involves this dataset, edit it to refer to the replicated dataset
    d) I’m sure there is more…

  5. Delete the original dataset (you DO have another backup don’t you???)

  6. Start reverse replication (10s)
    a) go to Tasks | Replication Tasks and click the “>” to the right of your new task and choose Edit
    b) click RUN NOW for this task

  7. Fixup and Restore Dependencies (2 minutes)
    a) unlock the destination dataset with the encryption key you saved
    b) mark the destination dataset as Read Only = Off (assuming it was not readonly to start)
    c) for each replication that involves the replicated dataset as a source, edit it to refer to the original dataset (now encrypted)
    d) for each periodic snapshot that involves the replicated dataset, edit it to refer to the original dataset (now encrypted)
    e) for each share that involves the replicated dataset, edit it to refer to the original dataset (now encrypted)

  8. Tidy up system (1 minute plus testing)
    a) check over everything
    b) delete the replicated dataset
    c) delete the replication task we created
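A note on the naming schema typed in when reversing the replication: `manual-%Y-%m-%d_%H-%M` uses strftime-style placeholders, so it only picks up snapshots whose names have that exact shape. The small Python sketch below shows what name the schema produces for a given time and how to test whether a snapshot name fits it. It assumes your manual snapshot was left with a name of this form, and the helper names are my own.

```python
from datetime import datetime

SCHEMA = "manual-%Y-%m-%d_%H-%M"  # the schema entered in the replication task


def snapshot_name(when):
    """Format a datetime the way the schema describes."""
    return when.strftime(SCHEMA)


def matches_schema(name):
    """Return True if a snapshot name can be parsed with the schema."""
    try:
        datetime.strptime(name, SCHEMA)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(snapshot_name(datetime(2023, 1, 15, 9, 5)))  # manual-2023-01-15_09-05
    print(matches_schema("auto-2023-01-15_09-05"))     # False
```

If you renamed the snapshot when creating it, the schema in the task would need to match your name instead.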

Note that going from encrypted to unencrypted is almost exactly the same. The big difference is in steps 2e and 2f: instead of checking Encryption under the destination, uncheck Include Dataset Properties under the source.

Hope this helps anyone trying to add or remove encryption.
Keith
