Disclaimer: this question is not directly connected to TrueNAS. Let’s assume it is a general ZFS question.
On one of my systems (Proxmox) I had a pool with ashift=9 (0 actually, i.e. auto, but zdb showed an effective value of 9). This pool consisted of a single 2-way NVMe mirror. The NVMe drives were formatted to 512e (being 4Kn-capable); thus, as I understand it, causing the automatic ashift of 9 for the pool (and vdev).
All datasets/zvols had a recordsize (or volblocksize) of 128K or less.
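In case it helps, this is the sort of check that shows both the sector format in use and the ashift ZFS actually picked (device and pool names here are placeholders):

```
# List the namespace's LBA formats; the one marked "in use" is the current sector size
nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"

# Dump the cached pool config, which includes the ashift chosen per vdev
zdb -C rpool | grep ashift
```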
I’ve:
renamed this “old” pool (with export/import).
created a new ashift=12 pool (with a 4Kn-formatted drive underneath) with the same name.
taken a recursive (-r) snapshot of my encrypted root dataset.
sent this snapshot to the new pool with -R --raw (sketched below).
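Roughly this sequence, with placeholder names (oldpool, rpool, ROOT, @migrate) standing in for the real ones:

```
# Recursive snapshot of the encrypted root dataset on the (renamed) old pool
zfs snapshot -r oldpool/ROOT@migrate

# Replicate the whole snapshot tree as a raw stream, so it stays encrypted in transit;
# -u avoids auto-mounting the received (still locked) datasets
zfs send -R --raw oldpool/ROOT@migrate | zfs receive -u rpool/ROOT
```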
After rebooting and unlocking the dataset (on the new pool) everything is working ok so far.
However! zpool list -v is showing almost 20% more allocated space (79G vs 67G) on the new pool.
While I have some assumptions about possible reasons, I would like to hear opinions/insights of ZFS-using veterans.
My first hunch is that something didn’t transfer over to the new pool or maybe block-cloning or deduplication, if applicable, was involved.
What if you compare the datasets to each other and check the block-cloning stats with zpool on both pools?
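Assuming OpenZFS 2.2 or newer (block cloning didn't exist before that), the pool-level properties should tell you whether cloning was involved at all; pool names below are placeholders:

```
# Block-cloning accounting; all zeros means cloning played no role in the difference
zpool get bcloneused,bclonesaved,bcloneratio oldpool rpool
```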
EDIT: It’s possible that the “less efficient” 4096-byte minimum writes could contribute to more space being used, but 20% seems too much.
EDIT 2: In terms of actual disk usage, I don’t think it would make a difference on drives with 4K physical sectors. No matter if you use ashift=9 or ashift=12, the drive cannot physically write less than a 4096-byte unit.
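Just to put made-up numbers on the accounting side: a record that compresses to, say, 4,608 bytes rounds up to nine 512-byte sectors under ashift=9 but to two 4 KiB sectors under ashift=12, so the asize charged for the very same logical data can grow quite a bit:

```
# Hypothetical 4,608-byte compressed record, rounded up to the allocation unit
echo $(( (4608 + 511)  / 512  * 512  ))    # ashift=9  -> 4608 bytes allocated
echo $(( (4608 + 4095) / 4096 * 4096 ))    # ashift=12 -> 8192 bytes allocated
```

How much of that shows up pool-wide depends entirely on the block-size distribution, which is what a zdb histogram would reveal.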
Not sure about block cloning; I haven't reviewed that part of ZFS yet. I do not use deduplication (even though maybe I should).
I've already compared the datasets but forgot to mention it. All datasets (at least the ones I've randomly checked) show this discrepancy in USED and, what is more interesting, in REFER.
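Something I still need to try: logicalused/logicalreferenced count the data before compression and allocation rounding, so if they match across the pools while USED/REFER differ, the gap should be pure allocation overhead (pool names are placeholders):

```
# On-disk vs logical accounting, per dataset, on both pools
zfs list -r -o name,used,refer,logicalused,logicalreferenced,compressratio oldpool
zfs list -r -o name,used,refer,logicalused,logicalreferenced,compressratio rpool
```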
Again, I don't know how to check/troubleshoot block cloning.
Yes, that is exactly my thought as well. I once read (IIRC in a post by @mav) that a block is only stored compressed if compression saves at least one ashift-sized sector. Thus, with raw replication, there could be old "tail" chunks smaller than 4K. But I doubt they would eat up 20%.
On second thought, perhaps I should look at the block-size histograms for the old and new pools.
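If memory serves, something like this prints the block statistics including the block-size histogram with its psize/lsize/asize columns (extra b's increase verbosity, and it walks every block pointer, so it can take a while on a large pool); pool names are placeholders:

```
# Block statistics and block-size histogram for both pools
zdb -bbb oldpool
zdb -bbb rpool
```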
Yeah, that is my assumption about the reason as well: ZFS thinks the data occupies less space, and the NVMe drives simply don't report "true" values in the 512e case.
Welp, it was a mirror of two NVMe drives. I detached one drive, formatted it to 4K, and created a single-drive pool with ashift=12. Now I have two single-drive pools. And yes, I have backups.
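For the record, the sequence was roughly the following; the device, the LBA-format index, and the pool names are placeholders, and nvme format wipes the drive, hence the backups:

```
# Split the mirror and reformat the detached drive to 4K-native sectors
zpool detach oldpool /dev/nvme1n1
nvme format /dev/nvme1n1 --lbaf=1   # the index of the 4096-byte LBA format differs per drive

# Create the new single-drive pool with an explicit 4K ashift
zpool create -o ashift=12 rpool /dev/nvme1n1
```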
So psize and lsize match (within a margin of error / space newly used since the migration), but asize is off, which points at allocation overhead rather than missing data.
Same compression? But it wouldn't make sense for the new pool to use a less efficient algorithm than the old pool…
I can't guarantee 100% identical settings for the pools. However, I have the pool-creation command in the Proxmox datacenter notes, with ashift=12, which IIRC I added later, after this auto-ashift=9 "discovery". And I used this exact command to create the new pool.
So let's say I'm 99% sure the pools have the same settings (apart from ashift).
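To get from 99% to certain, the property lists can simply be diffed (placeholder pool names again; a handful of properties like size, free, and guid will always differ, of course):

```
# Any remaining line is a pool property that actually differs between the two
diff <(zpool get -H -o property,value all oldpool) \
     <(zpool get -H -o property,value all rpool)

# Same idea for the root datasets' properties
diff <(zfs get -H -o property,value all oldpool) \
     <(zfs get -H -o property,value all rpool)
```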