Hi, when sending/receiving a dataset to a single hard disk drive for backup, I noticed that it’s difficult to estimate how much space the compressed dataset will eventually use once recompressed on the new pool, even with the same settings and the same content.
For example, I have a dataset with 14.03TB used, but logically referenced it’s 17.1TB, recordsize 1MB. Sending this dataset to my drive with 14.7TB free (I have used -p to verify bytes), it always fails with “no more space”, even though the compression ratio is identical. I have tried raw and -Lec; it makes no difference. It’s only one snapshot.
When I make sure the free space at the destination is higher than logicalreferenced, it always works.
But shouldn’t the same dataset with the same content, recordsize, settings, and compression use the same space on two different drives when sending raw? Where is my misunderstanding? (BTW: there are people who suggest that send/recv will not change the dataset at all, which in my findings is untrue.)
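One way to take the guesswork out of this is a dry-run send, which reports the stream size in exact bytes before anything is written. A minimal sketch, assuming a placeholder snapshot `pool/ds@snap` and destination pool `backup`:

```shell
#!/bin/sh
# The actual ZFS commands (need root; shown as comments for reference):
#   zfs send -w -nP pool/ds@snap          # -n = dry run, -P = parsable output;
#                                         #   the "size" line is the stream size in bytes
#   zpool get -Hp -o value free backup    # destination free space in bytes
# With the two byte counts in hand, a plain comparison answers "will it fit":
stream_fits() {
  # $1 = estimated stream bytes, $2 = destination free bytes
  [ "$2" -ge "$1" ]
}
if stream_fits 15400000000000 16100000000000; then echo fits; else echo "too small"; fi
```

Note this compares against the pool’s free property only; the destination may still need some slack for its own metadata, so treating the result as exact to the last gigabyte would be optimistic.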
It should. Not sure what happens in your example above.
Just to be sure, you are talking about the destination dataset, and you are talking about Replication tasks, right?
That is a setting. AFAIK either “include dataset properties” or “Full filesystem replication” will do this. Default zfs send probably does not.
Ah, I see: you mean I can try with -R (or at least with -p) from Scale? I’ll check. Thank you!
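For reference, the two variants being discussed look roughly like this (dataset and pool names are placeholders; `-w` sends the raw blocks as stored, so compression carries over):

```shell
#!/bin/sh
# Full recursive replication, child datasets and properties included:
#   zfs send -w -R pool/ds@snap | zfs recv -F backup/ds
# Single dataset, but with its properties included:
#   zfs send -w -p pool/ds@snap | zfs recv -F backup/ds
# Tiny helper that just assembles the flag set, to make the difference explicit:
send_flags() {
  # $1 = "recursive" or "single"
  if [ "$1" = recursive ]; then echo "-w -R"; else echo "-w -p"; fi
}
echo "zfs send $(send_flags recursive) pool/ds@snap"
```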
Not sure I get it. You mean copying at any time? That would be beside the point for a filesystem, wouldn’t it? For sure I copied files on the source pool, but the snapshot was taken fresh with no changes. And I certainly hope it’s block cloning… otherwise I could have kept using rsync, but I want to get away from that.
I have now moved approximately 210GB off the dataset four times; this is getting tedious.
Last time I monitored with zfs get all -p | grep TV on the destination, and logicalreferenced went up to 17.03TB; that was with 14.6TB free initially on the destination, while the source dataset reported 14.03TB used (as above). So I assume that if I move another 400GB to reach 15TB free, as you suggest, logicalreferenced will reach the 17.1TB of the source, and I am pretty sure this will work (only 70GB are missing to 17.1TB).
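To take rounding out of that comparison, both sides can be read in exact bytes, using logicalreferenced as the conservative upper bound this thread has observed. A sketch with placeholder names (`pool/ds`, `backup`):

```shell
#!/bin/sh
# Exact byte values instead of rounded TB figures (need ZFS; reference only):
#   src_lref=$(zfs get -Hp -o value logicalreferenced pool/ds)
#   dst_free=$(zpool get -Hp -o value free backup)
# How many more bytes must be freed on the destination before the receive fits:
shortfall() {
  # $1 = source logicalreferenced bytes, $2 = destination free bytes
  if [ "$2" -ge "$1" ]; then echo 0; else echo $(( $1 - $2 )); fi
}
shortfall 17100000000000 14600000000000   # → 2500000000000 (2.5 TB still to free)
```

With a raw send the real on-disk usage should end up closer to USED than to logicalreferenced, so this is a worst-case bound, not a prediction.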
But that’s a huge difference between “USED” and my estimate. I check in the TrueNAS Scale GUI under Datasets, see a figure, and compare it with what Unraid shows as “FREE” on my backup disk. If I cannot follow this process and/or can’t find a procedure to calculate how to fill my backup disks efficiently, the whole backup send/recv is pointless.
I seem to have a misunderstanding somewhere, but I cannot figure out where.
I did not expand it. The pool consists of 10 spindles in a RAIDZ2 configuration, and I am pretty sure I started fresh with the pool using all 10 drives from the beginning.
This might be a reason, but only if you’re cutting it close between actual space consumed from data blocks in the source RAIDZ and free space on the destination mirror or stripe.
I just confirmed it should work with two test pools that each have a capacity of 3.75 GB.
On the source pool, I created a dataset with ZSTD compression and saved 4.71 GB of data on it, which was transparently compressed to consume only 3.53 GB. USED = 3.53 GB, LUSED = 4.71 GB.
I took a snapshot and replicated it to the 3.75 GB destination pool, using the -w and -R flags.
It completed successfully, even though the LUSED on the source dataset exceeded the capacity of the destination pool. The end result was a dataset on the destination pool with LUSED of 4.71 GB and USED of 3.53 GB.
This is why I think the RAIDZ → single-drive stripe layout change, plus the 5 GB of cloned blocks, might be just enough to make it slightly too big to fit in the free space of the destination.
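The test above can be sketched end to end with file-backed pools (paths, names, and sizes are my assumptions; the pool commands need root and ZFS, so they are shown as reference only):

```shell
#!/bin/sh
# File-backed test pools (reference only):
#   truncate -s 4G /tmp/src.img /tmp/dst.img
#   zpool create srcpool /tmp/src.img && zpool create dstpool /tmp/dst.img
#   zfs create -o compression=zstd srcpool/data
#   ... write ~4.71 GB of compressible data to /srcpool/data ...
#   zfs snapshot srcpool/data@t1
#   zfs send -w -R srcpool/data@t1 | zfs recv -F dstpool/data
#   zfs get -p used,logicalreferenced dstpool/data
# Sanity check mirroring the numbers above: stored/logical as an integer percent.
stored_pct() {
  # $1 = USED bytes, $2 = LUSED bytes
  echo $(( $1 * 100 / $2 ))
}
stored_pct 3530000000 4710000000   # → 74 (≈1.33x compression, matching the test)
```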