Hardlinks across Datasets?

Hello,

I have two datasets on my TrueNAS SCALE system:

  1. Downloads

  2. Videos

Files get downloaded by qBittorrent to the Downloads dataset and should stay there, but additionally they should also be placed in the Videos dataset.

Is this somehow possible without copying the file and taking up additional space?

Across datasets? Yes, as long as it’s within the same pool.[1]

Hardlinks? No. Not across datasets.

Block-cloning? Yes.[2]

  1. Your pool needs to be “upgraded” to support the latest features.
  2. Block-cloning needs to be enabled as a ZFS module parameter. (I believe TrueNAS Core 13.3 and SCALE 24.x enable it by default.)[3]

You can check with the command:

cat /sys/module/zfs/parameters/zfs_bclone_enabled

A 0 means it is disabled. A 1 means it is enabled.
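If you prefer to check this from a script, here is a small Python sketch (my own convenience snippet, not a TrueNAS tool; “tank” is a placeholder pool name) that reads the module parameter and the pool’s feature flag:

    import subprocess
    from pathlib import Path

    # Module parameter: "1" means block cloning is allowed at the ZFS level.
    param = Path("/sys/module/zfs/parameters/zfs_bclone_enabled").read_text().strip()

    # Pool feature flag: "enabled" or "active" means the pool itself supports it.
    feature = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "feature@block_cloning", "tank"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

    print("zfs_bclone_enabled:   ", "enabled" if param == "1" else "disabled")
    print("feature@block_cloning:", feature)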


Otherwise, the other approach is to use “symlinks”.


  1. Block-cloning does not work across datasets if you are using encryption. ↩︎

  2. The standard cp tool should leverage this feature, as well as many other common tools. ↩︎

  3. “Upgrading” a pool is a one-way action. You will not be able to import the pool into an older system. ↩︎

4 Likes

Just to provide a little background to Winnie’s explanation:

AFAIK, there are five ways of duplicating a file without using additional space on a ZFS pool (I am not a Linux / ZFS expert, so I may have some of this wrong):

  1. Hard links - every file has a default hard link from the file name to the inode. Creating a hard link with ln creates a new direct link from the new file name to the same inode - so you have two filenames pointing to the exact same file. [erratum]I am not sure what the restrictions are in ZFS for doing this between 2 datasets or 2 pools, but[/erratum] Because they point to the same inode (which I think is dataset-specific, because each dataset is individually mounted), [erratum]I suspect that[/erratum] they need to be in the same dataset. You can delete the original (or duplicate) filename and the other filename still points to the same data. If the data is modified using either filename, then the other filename sees the updated data. (There is a short sketch of methods 1 and 2 just after this list.)

  2. Soft / symbolic links - this is like a web page redirect but you redirect one filename to another. The symbolic link does not survive intact if the original file is deleted or not mounted or mounted in a different path. Because it is a redirect from one path to a different path I suspect that it can happen across different datasets and pools.

  3. Dataset cloning - this creates a copy of an entire dataset snapshot into another dataset [erratum]across pools[/erratum] in the same pool without using any additional space until the files are changed. It is a bit like a snapshot, except it creates a duplicated set of directories and files elsewhere in the pool’s sub-tree. Like block-cloned copies, if you change either the original dataset or the cloned dataset, then the data starts to diverge.

  4. Block cloning - this is similar to dataset cloning but for individual files rather than an entire dataset and [erratum]can be done across different pools[/erratum] also needs to be in the same pool. Some utilities like the standard Linux cp command do this automatically. If you change either file, then the other file still points to the original data blocks.

  5. De-duplication - this is absolutely NOT recommended - don’t do it!! - but it should create a duplicate without needing extra space.
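For methods 1 and 2, here is a rough sketch of what they look like from Python’s standard library (the paths are invented for illustration; adjust them to your own pool layout). Method 4 is easiest to try with plain cp, as discussed further down the thread.

    import os

    src = "/mnt/tank/Downloads/movie.mkv"   # hypothetical source file

    # Method 1: hard link - a second filename for the same inode.
    # Only works within a single file system / dataset.
    os.link(src, "/mnt/tank/Downloads/movie-hardlink.mkv")

    # Method 2: symlink - a redirect from one path to another.
    # Works across datasets and pools, but breaks if the target is moved or deleted.
    os.symlink(src, "/mnt/tank/Videos/movie-symlink.mkv")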

These methods also differ in how the security permissions on the original affect the copy, and you will need to understand that too.

P.S. I have edited this entry twice to blur the stuff I got wrong and correct it. Thanks to the other users who have put me right where I got it wrong.

3 Likes

Good unraveling, but a couple corrections!

This (dataset cloning) is actually within a pool, not across pools.


This (block cloning) is also within a pool, not across pools. (Each pool has its own Block Reference Table, “BRT”.) The only exception within a pool is encryption: an encrypted dataset cannot use block-cloning across datasets of the same pool (only within its own dataset).

2 Likes

That is what I had intuitively guessed to start with - linking to blocks in another pool sounds extremely difficult and you would have to maintain references to the 2nd pool so that when you change files in the first pool the blocks are not released. But then I read that the clone is based on a snapshot, and the snapshot has a reference that effectively stops it being deleted, so it seemed possible to do it across pools.

And then I read the (Oracle) ZFS documentation, which states: “The new file system or volume can be located anywhere in the ZFS hierarchy.” It did not state “anywhere in the ZFS hierarchy in the same pool.” But of course A) OpenZFS is not the same as Oracle ZFS [1] and B) I might have misunderstood the meaning of “anywhere in the ZFS hierarchy”.

But now that you have pointed this out and I have read further, I see that the zfs promote command is used to transfer ownership of the snapshot that was used for the clone from the original dataset to the cloned dataset. This effectively reverses the hierarchy and allows the original dataset to be deleted without losing the original data blocks as they were at the point of cloning - and of course it can only be done when the clone is in the same pool as the original dataset.

And this is why I heavily caveated my post to point out that I am not a Linux or ZFS expert - and thanks for helping improve my knowledge. (I have blurred the original errata and added corrected text.)

[1] The OpenZFS documentation for dataset cloning uses exactly the same language: “The target dataset can be located anywhere in the ZFS hierarchy”.

Exactly. Hard links are limited to a single file system.

Thank you. I have adjusted what I said about hard links too.

1 Like

Thank you guys for the fast replies.

So block cloning it is.
Both datasets are in the same storage pool.

I think the pool is on the latest available version (I upgraded it some weeks ago), and zpool status doesn’t show a warning saying otherwise.
I ran the cat command and got a 1 as the result.

But after copying a ~70 GiB file, it seems like something is still missing.
Before the transfer the dataset had ~55 GiB of disk usage and afterwards ~127 GiB, and the Used Capacity of the storage also increased by ~70 GiB.

Edit: None of the drives / datasets use encryption.
Edit 2: If I run zpool get all | grep -e bclone -e block_cloning, my storage pool shows the feature as active.

What tool did you use to “copy” the file?

Just plain old cp directly from the TrueNAS console.

The standard cp tool should leverage this feature, as well as many other common tools.

edit:

I had a look at the OpenZFS docs; there it says:

This feature becomes active when first block is cloned

So it seems it has done “something” “sometime” (and saved 110K…), but between the two datasets where I want it to work, it currently doesn’t.

It also says:

under some conditions (like equal recordsize

The recordsize and all other settings (from the TrueNAS web UI) are the same for both datasets.

https://openzfs.github.io/openzfs-docs/man/master/7/zpool-features.7.html

As a test, what if you try --sparse=never with your cp command?

1 Like

That is a good point. cp is a rather complicated tool, so it is possible to use it in a way that does not perform block clones. The tool needs to issue the copy_file_range() syscall. --reflink=always might also help. In principle it is possible to write a Python tool to iterate over two duplicate trees that were copied via an incorrect mechanism and deduplicate the data by issuing the relevant ioctl on every file (though a fresh copy is probably faster).
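Something along these lines is what I have in mind - an untested sketch rather than the ioctl route: it simply re-copies each file through copy_file_range() so that ZFS can clone the blocks. SRC_ROOT and DST_ROOT are placeholders, and it assumes the duplicate tree mirrors the source tree exactly.

    import os

    SRC_ROOT = "/mnt/tank/Downloads"   # placeholder paths
    DST_ROOT = "/mnt/tank/Videos"

    def reclone(src_path, dst_path):
        # Rewrite dst_path as a clone-friendly copy of src_path.
        with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
            remaining = os.fstat(fin.fileno()).st_size
            while remaining > 0:
                copied = os.copy_file_range(fin.fileno(), fout.fileno(), remaining)
                if copied == 0:
                    break
                remaining -= copied

    for dirpath, _dirs, files in os.walk(SRC_ROOT):
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(DST_ROOT, os.path.relpath(src, SRC_ROOT))
            if os.path.isfile(dst):      # only touch files that already exist in both trees
                reclone(src, dst)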

If I run cp with that option, the file transfer finished instantly.
bcloneused now shows 68.8G.
But the Storage Dashboard still shows a usage increase of ~70 GiB.

If you want to know whether block cloning is enabled on a pool, you can run sudo zpool upgrade, which will tell you if there are features not yet enabled.

You should expect to see features missing from the boot pool (which is completely normal - never upgrade the feature sets on the boot-pool) but not on other pools.

OK, it seems like the usage shown on the Storage Dashboard uses the “used” value, which will always increase even though no additional space actually gets used.
If I check with zpool get allocated, no additional space is taken.

So the solution for me looks like running cp with --sparse=never.
(Now I need to figure out how to get radarr to use cp with --sparse=never.)

You also need to check that the source and destination datasets have the same recordsize, otherwise block cloning won’t work.

Here is my guess about how cp handles sparse files depending on the --sparse= setting.

I think that there are 3 types of file:

  • actual sparse files
  • non-sparse files with sparse-like data i.e. reasonably long sections of all null characters
  • non-sparse files with no sparse-like data

and that there are 3 settings for this parameter:

  • --sparse=auto (default) - attempts to detect whether the file is sparse with a crude algorithm at the start of the copy, e.g. by looking for sections which are all nulls in the early parts of the file, and then decides for the whole file whether to look for and replace full blocks of nulls with a sparse equivalent, based on this guess
  • --sparse=always - always replace full blocks of nulls with the sparse equivalent
  • --sparse=never - don’t replace full blocks of nulls with the sparse equivalent but just copy it as-is.

Block cloning happens when the blocks to be written are identical, and I am guessing that with ZFS default compression, if you change anything in a block then you will likely change whatever blocks follow that block too.

So it is hit and miss whether, for a particular file, --sparse=auto is equivalent to --sparse=always or --sparse=never; but if it copies with --sparse=always behaviour, then any long sections of null characters will result in changes to that block and subsequent blocks, and so they won’t get cloned.
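(For anyone unsure what an “actual sparse file” from the first bullet looks like in practice, here is a quick Python illustration - the path is arbitrary and the sizes are just examples:)

    import os

    path = "/tmp/sparse-demo.bin"
    with open(path, "wb") as f:
        f.seek(100 * 1024 * 1024)   # jump 100 MiB ahead without writing anything
        f.write(b"end")             # the skipped range is stored as a "hole"

    st = os.stat(path)
    print("apparent size: ", st.st_size)            # ~100 MiB
    print("allocated size:", st.st_blocks * 512)    # only a few KiB of real blocks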

There is also a cp option --reflink= which has an impact on block cloning. Options are:

  • --reflink=auto (default in coreutils >= 9.0) - try to block clone, but fall back to a full copy if needed.
  • --reflink=always - fail the copy if block-cloning is not possible (not sure when this could be the case).
  • --reflink=never (default in coreutils < 9.0) - always duplicate the data.

So I think the default in SCALE is --reflink=auto which is fine.

How to fix this in radarr? Assuming that radarr actually uses cp under the covers (and I haven’t checked), what you might like to do is something like creating an alias for the UID used by radarr, as follows: alias cp='cp --sparse=always'. But of course this normally applies only to interactive calls to cp and not to those made from a programme or within a script, so it probably wouldn’t work with radarr. Other than this, I have no ideas.

1 Like

Probably because of the different ways of estimating “used” storage when multiple datasets are involved. After all, you have two “different” files on two different datasets that are referencing the same blocks of data. If you delete one of these files, you’ll see a drop in “used space” of only that dataset, but not the other… and yet zero change in the actual amount of data consumed by the pool as a whole.


Not familiar with Radarr, myself. Where in the application do you tell it to “copy” media?

Typically, non-trivial applications (things that aren’t collections of shell scripts) don’t use cp. Various programming languages provide wrappers around the Unix syscall interfaces. In order to make a block clone, you simply need to use the syscall copy_file_range(), which was explicitly developed on FreeBSD and Linux to provide an efficient mechanism for copying from one file to another without passing through userspace.

ZFS block cloning is wired together such that copy_file_range() simply does a block clone (which is why cp basically worked without us having to touch it).

Not all applications and languages use the most modern / correct mechanism to perform a fast copy. For instance, Python’s shutil.copytree will actually open the source and destination and use sendfile to perform the write (which means no block cloning).
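If the application in question happens to be Python and uses shutil.copytree, one workaround (my own sketch, not something the standard library does for you) is to pass a copy_function that goes through copy_file_range(), so ZFS gets the chance to clone the blocks; the paths below are placeholders:

    import os
    import shutil

    def clone_copy(src, dst, *, follow_symlinks=True):
        # Copy src to dst via copy_file_range() so ZFS can block-clone it.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            remaining = os.fstat(fin.fileno()).st_size
            while remaining > 0:
                copied = os.copy_file_range(fin.fileno(), fout.fileno(), remaining)
                if copied == 0:
                    break
                remaining -= copied
        shutil.copystat(src, dst, follow_symlinks=follow_symlinks)
        return dst

    shutil.copytree("/mnt/tank/Downloads/show", "/mnt/tank/Videos/show",
                    copy_function=clone_copy, dirs_exist_ok=True)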

The way to fix this for various applications is to:

  1. determine that they’re using an older implementation (or non-ideal one)
  2. file a bug report / feature request with the project to implement the newer syscall
    or
  3. write your own patch, validate it works, and submit upstream
2 Likes

That’s what I had guessed, but on the Storage page, where it shows the total Usable Capacity / Used and Available, the increasing used space is shown, not the actual amount used by the pool as a whole.

You don’t really tell it; you tell it where the source files are and the destination, and it then copies the files and renames them properly.

Radarr is written in C#.

C# and dotnet were updated to use copy_file_range() in January 2022, for dotnet 7.0.0, which was released in November 2022.

Unfortunately, according to the radarr developers’ wiki, radarr is still using .NET 6.

I have no idea whether .NET 7+ is source-code compatible with .NET 6, or whether it will be relatively easy to make the TN …arr apps work with .NET 7+ on your own system. You will need to research that for yourself.

3 Likes