Moving a folder to another dataset == file duplication?

Hi,

I have a question about how ZFS handles duplication of “physical” filesystem objects.

Let’s say I have this layout

  • data_zpool
    • mydata_dataset
      • file_1 (2GB)
      • folder_1 (5GB)

Total space consumed is 7 GB. Now let’s say I take a snapshot of mydata_dataset.

Then I create a new dataset in data_zpool, called mydata2_dataset.

  • data_zpool
    • mydata_dataset
      • file_1 (2GB)
      • folder_1 (5GB)
    • mydata2_dataset

If I then move folder_1 into mydata2_dataset, what happens?

Hypothesis 1: a “hard copy” of the data is made, so total space consumed becomes 2 + 5 + 5 = 12 GB:

  • data_zpool
    • mydata_dataset
      • file_1 (2GB)
      • (folder_1_snapshot (5GB))
    • mydata2_dataset
      • folder_1 (5GB)

Hypothesis 2: a “linked copy” is made, so total space consumed stays at 7 GB:

  • data_zpool
    • mydata_dataset
      • file_1 (2GB)
      • (folder_1_snapshot (0 GB))
    • mydata2_dataset
      • folder_1 (5GB)

Which one is correct?

In a nutshell: I want folder_1 to be a dataset instead of a folder, but I don’t have enough space for a second copy of folder_1.

Thanks!

A new copy of the folder and its contents is made. Datasets are somewhat like partitions in other file systems: moving across them is really a copy followed by a delete.
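If it helps to see this for yourself, here is a rough sketch using the dataset names from the example above (paths assume a TrueNAS-style `/mnt` mountpoint; output will vary). Note that because a snapshot of mydata_dataset exists, the “deleted” blocks stay referenced by the snapshot:

```shell
# Create the second dataset, then "move" the folder.
# Across dataset boundaries, mv copies the data and then deletes the source.
zfs create data_zpool/mydata2_dataset
mv /mnt/data_zpool/mydata_dataset/folder_1 /mnt/data_zpool/mydata2_dataset/

# Inspect space afterwards: the snapshot still pins folder_1's old blocks,
# so the pool-wide usage grows by ~5 GB (Hypothesis 1, unless block cloning
# is active).
zfs list -o name,used,avail,refer -r data_zpool
zfs list -t snapshot -r data_zpool
```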


With block cloning enabled, the “linked copy” of Hypothesis 2 is what happens: the copy in the new dataset references the same blocks on disk.

However, block cloning has been disabled by default in upstream OpenZFS as a precautionary safeguard while the potential for “silent” corruption is investigated.[1]


  1. Such corruption was reproduced in synthetic tests; however, at least one person hit it in regular usage on ZFS while compiling software. The latter is unlikely to affect TrueNAS users, since we normally interact with our data via SMB, NFS, or “slowly” via normal operations. Long story short: you can probably “safely” re-enable block cloning and will likely not be a victim of silent data corruption, but whether that risk is worth it is up to you. ↩︎
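For reference, a sketch of how to check whether the feature is present on the pool and how much it is saving (pool name taken from the thread; the statistics properties and the module parameter below exist in OpenZFS 2.2+, so availability depends on your version):

```shell
# Is the block_cloning pool feature enabled/active?
zpool get feature@block_cloning data_zpool

# Cloning statistics exposed as pool properties (OpenZFS 2.2+):
zpool get bcloneused,bclonesaved,bcloneratio data_zpool

# On Linux, the runtime kill switch is the zfs_bclone_enabled
# module parameter (0 = cloning disabled, copies fall back to real copies):
cat /sys/module/zfs/parameters/zfs_bclone_enabled
```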


Why not do the “move” operation before creating a snapshot on the first dataset?

Is this a real situation or only hypothetical? Tread carefully if it’s something you actually need to do with larger sizes. (Backups and all, you know the lecture…)
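To make that ordering concrete, a sketch of the suggested sequence (names from the thread; the snapshot name is made up). One caveat, worth knowing before running anything: a plain `mv` of a whole directory across datasets copies the entire tree before removing the source, so it can still need ~5 GB free temporarily unless block cloning is active; moving file by file frees space incrementally.

```shell
# 1. Create the target dataset and move the folder *before* snapshotting,
#    so the source blocks are actually freed rather than pinned by a snapshot:
zfs create data_zpool/mydata2_dataset
mv /mnt/data_zpool/mydata_dataset/folder_1 /mnt/data_zpool/mydata2_dataset/

# 2. Only then snapshot the (now smaller) source dataset:
zfs snapshot data_zpool/mydata_dataset@after-move

# 3. Verify that mydata_dataset no longer accounts for folder_1:
zfs list -o space -r data_zpool
```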