Dataset sizes differ after replication

Hello.
I am confused about how a zfs send/receive replication could result in the size of a dataset differing.
Could you please explain, or point me to resources?
Below is a description of what I did and what has me confused:

I have two TrueNAS SCALE machines, both running the latest TrueNAS SCALE 25.04-RC.1.
Each has a RAIDZ2 pool - one is my old machine, one the newer one.
Because I originally did not create the ZFS pool with TrueNAS, it is not set up quite right, so I wanted to copy all the data onto my old system, re-create the whole pool (the right way this time, without the partitions) and then migrate all the datasets back.

Here I noticed something strange:
While the data inside the datasets is identical (including the total size of all files), the size of each ZFS dataset as displayed by TrueNAS/zfs is different (larger after replication):

Source:

Dataset Name			Used / Available
RAIDZ2_Pool			11.42 TiB / 802.62 GiB

Backups				1.13 TiB / 802.62 GiB
Images				322.14 GiB / 802.62 GiB
VM_Backups			3.59 TiB / 802.62 GiB
XFS_RAID6_Content	6.38 TiB / 802.62 GiB

Destination:

Dataset Name			Used / Available
RaidZ2				11.59 TiB / 2.81 TiB

Backups				1.17 TiB / 2.81 TiB
Images				332.22 GiB / 2.81 TiB
VM_Backups			3.7 TiB / 2.81 TiB
XFS_RAID6_Content	6.4 TiB / 2.81 TiB

All datasets are unencrypted; the destination pool was freshly created with TrueNAS for the purpose of this data migration.

I did all the replication with the following commands:

zfs snapshot -r RAIDZ2_Pool/<dataset-name>@migration
zfs send -R RAIDZ2_Pool/<dataset-name>@migration | ssh root@TargetTruenasMachine "zfs receive -Fu RaidZ2/<dataset-name>"
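
As a side note, adding -n and -v to the same send makes it a dry run that only prints an estimated stream size and transfers nothing; that estimate is for the stream itself and does not have to match the USED values on either pool:

zfs send -R -nv RAIDZ2_Pool/<dataset-name>@migration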

I have done some checks and it seems the data was copied properly and all files are identical; all snapshots were copied as well.

As you can see, the VM_Backups dataset in particular is noticeably larger (at least as reported by TrueNAS). How can it use roughly 100 GiB more, or at least report it that way?

When running du -s on a dataset folder, the sizes are identical with --apparent-size, but the bytes taken up on disk differ (I checked the Images dataset, and there the actual bytes taken up are smaller than the apparent size on both machines/pools, but slightly larger on the target).
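
For comparison, the logicalused/logicalreferenced properties report the data size before compression and on-disk overhead, so something like the following (run with the respective pool name on each machine) should show whether the logical sizes at least match:

zfs get -r -d1 used,referenced,logicalused,logicalreferenced,compressratio RAIDZ2_Pool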

Thank you

What do these commands output on both pools?

zpool status -v <nameofpool>

zpool list -v -o name,size,alloc,free,bcloneused,frag,cap,dedup <nameofpool>

zfs list -t filesystem,volume -r -d1 -o space,compression,ratio <nameofpool>

Just about any change in pool geometry, record size, or compression algorithm would cause the same data to use a different amount of space on disk.
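
For example, filtering for properties that are locally set or were received with a replication stream shows at a glance whether record size or compression differ for a given dataset on the two pools (dataset name is just a placeholder):

zfs get -s local,received all <nameofpool>/<dataset>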


Since this is rather lengthy, here is a link to the outputs of the commands on pastebin:

Output for Source Pool
Source ZFS Pool information:

root@StormwingTrueNAS:/home/truenas_admin# lsblk -o NAME,FSTYPE,SIZE,STATE,TYPE,LABEL,MODEL
NAME   FSTYPE              SIZE STATE   TYPE LABEL           MODEL
sda                        120G running disk                 QEMU HARDDISK
├─sda1                       1M         part
├─sda2 vfat                512M         part EFI
└─sda3 zfs_member        119.5G         part boot-pool
sdb                       12.7T running disk                 ST14000NE0008-2JK101
├─sdb1 linux_raid_member   6.4T         part stormwing-nas:0
└─sdb2 zfs_member          6.4T         part RAIDZ2_Pool
sdc                       12.7T running disk                 ST14000NE0008-2JK101
├─sdc1 linux_raid_member   6.4T         part stormwing-nas:0
└─sdc2 zfs_member          6.4T         part RAIDZ2_Pool
sdd                       12.7T running disk                 ST14000NE0008-2JK101
├─sdd1 linux_raid_member   6.4T         part stormwing-nas:0
└─sdd2 zfs_member          6.4T         part RAIDZ2_Pool
sde                       12.7T running disk                 ST14000NE0008-2JK101
├─sde1 linux_raid_member   6.4T         part stormwing-nas:0
└─sde2 zfs_member          6.4T         part RAIDZ2_Pool

root@StormwingTrueNAS:/home/truenas_admin# zpool status -v
  pool: RAIDZ2_Pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 04:14:06 with 0 errors on Sun Mar  9 04:38:07 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        RAIDZ2_Pool                               ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            43817a70-2f39-493c-a2c2-c2e75607cd77  ONLINE       0     0     0
            37f9ad44-99dd-4f6d-ab97-f301cb5824be  ONLINE       0     0     0
            sdd2                                  ONLINE       0     0     0
            sde2                                  ONLINE       0     0     0

errors: No known data errors

root@StormwingTrueNAS:/home/truenas_admin# zpool list -v -o name,size,alloc,free,bcloneused,frag,cap,dedup RAIDZ2_Pool
NAME                                       SIZE  ALLOC   FREE  BCLONE_USED   FRAG    CAP  DEDUP
RAIDZ2_Pool                               25.5T  23.6T  1.88T            0     1%    92%  1.00x
  raidz2-0                                25.5T  23.6T  1.88T        -         -     1%  92.6%      -    ONLINE
    43817a70-2f39-493c-a2c2-c2e75607cd77  6.37T      -      -        -         -      -      -      -    ONLINE
    37f9ad44-99dd-4f6d-ab97-f301cb5824be  6.37T      -      -        -         -      -      -      -    ONLINE
    sdd2                                  6.37T      -      -        -         -      -      -      -    ONLINE
    sde2                                  6.37T      -      -        -         -      -      -      -    ONLINE

root@StormwingTrueNAS:/home/truenas_admin# zfs list -t filesystem,volume -r -d1 -o space,compression,ratio RAIDZ2_Pool
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  COMPRESS        RATIO
RAIDZ2_Pool                     803G  11.4T        0B    174K             0B      11.4T  on              1.02x
RAIDZ2_Pool/.system             803G  42.4M        0B    174K             0B      42.2M  on              1.50x
RAIDZ2_Pool/Backups             803G  1.13T     2.98G   1.13T             0B         0B  on              1.04x
RAIDZ2_Pool/Images              803G   322G      163K    322G             0B         0B  off             1.00x
RAIDZ2_Pool/VM_Backups          803G  3.59T     2.28T   1.31T             0B         0B  on              1.00x
RAIDZ2_Pool/XFS_RAID6_Content   803G  6.38T        0B   6.38T             0B         0B  on              1.03x

Commands to replicate the Pool (executed for each dataset):

zfs snapshot -r RAIDZ2_Pool/<dataset-name>@migration
zfs send -R RAIDZ2_Pool/<dataset-name>@migration | ssh root@<TargetTruenasMachine> "zfs receive -Fu RaidZ2/<dataset-name>"
Output for Destination Pool
Destination ZFS Pool:

root@GeorgNAS:/home/truenas_admin# lsblk -o NAME,FSTYPE,SIZE,STATE,TYPE,LABEL,MODEL
NAME        FSTYPE       SIZE STATE   TYPE LABEL     MODEL
sda                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sda1      zfs_member   3.6T         part RaidZ2
sdb                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sdb1      zfs_member   3.6T         part RaidZ2
sdc                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sdc1      zfs_member   3.6T         part RaidZ2
sdd                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sdd1      zfs_member   3.6T         part RaidZ2
sde                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sde1      zfs_member   3.6T         part RaidZ2
sdf                      3.6T running disk           WDC WD40EFRX-68N32N0
└─sdf1      zfs_member   3.6T         part RaidZ2
nvme0n1                476.9G live    disk           Samsung SSD 950 PRO 512GB
└─nvme0n1p1 zfs_member 474.9G         part SSD
nvme1n1                232.9G live    disk           Samsung SSD 970 EVO 250GB
├─nvme1n1p1                1M         part
├─nvme1n1p2 vfat         512M         part EFI
└─nvme1n1p3 zfs_member 232.4G         part boot-pool

root@GeorgNAS:/home/truenas_admin# zpool status -v
  pool: RaidZ2
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        RaidZ2                                    ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            ff88d1ef-0d99-477f-9654-85b7319c7e49  ONLINE       0     0     0
            104d3931-4680-49e9-aa05-d94ff5c36dc0  ONLINE       0     0     0
            fe419981-fe00-4e4d-90ce-9bf54d778d6a  ONLINE       0     0     0
            8e1ba9ba-84c8-427f-93e7-1490272fc587  ONLINE       0     0     0
            52b0729a-59d8-4a16-a642-126de65708bf  ONLINE       0     0     0
            5d0321a5-347e-4463-939d-ff4a253ac501  ONLINE       0     0     0

errors: No known data errors

root@GeorgNAS:/home/truenas_admin# zpool list -v -o name,size,alloc,free,bcloneused,frag,cap,dedup RaidZ2
NAME                                       SIZE  ALLOC   FREE  BCLONE_USED   FRAG    CAP  DEDUP
RaidZ2                                    21.8T  17.4T  4.41T            0     0%    79%  1.00x
  raidz2-0                                21.8T  17.4T  4.41T        -         -     0%  79.8%      -    ONLINE
    ff88d1ef-0d99-477f-9654-85b7319c7e49  3.64T      -      -        -         -      -      -      -    ONLINE
    104d3931-4680-49e9-aa05-d94ff5c36dc0  3.64T      -      -        -         -      -      -      -    ONLINE
    fe419981-fe00-4e4d-90ce-9bf54d778d6a  3.64T      -      -        -         -      -      -      -    ONLINE
    8e1ba9ba-84c8-427f-93e7-1490272fc587  3.64T      -      -        -         -      -      -      -    ONLINE
    52b0729a-59d8-4a16-a642-126de65708bf  3.64T      -      -        -         -      -      -      -    ONLINE
    5d0321a5-347e-4463-939d-ff4a253ac501  3.64T      -      -        -         -      -      -      -    ONLINE

root@GeorgNAS:/home/truenas_admin# zfs list -t filesystem,volume -r -d1 -o space,compression,ratio RaidZ2
NAME                      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  COMPRESS        RATIO
RaidZ2                    2.81T  11.6T        0B    224K             0B      11.6T  lz4             1.02x
RaidZ2/Backups            2.81T  1.17T     3.85G   1.16T             0B         0B  lz4             1.02x
RaidZ2/Images             2.81T   332G      208K    332G             0B         0B  off             1.00x
RaidZ2/VM_Backups         2.81T  3.70T     2.34T   1.35T             0B         0B  lz4             1.00x
RaidZ2/XFS_RAID6_Content  2.81T  6.40T        0B   6.40T             0B         0B  lz4             1.03x

My plan now is to wipe the disks where the original RAIDZ2_Pool pool is and re-create it, since it is not formatted in a way that is ideal for TrueNAS (see the lsblk output above for what I mean; the partition structure there is a bad idea).
I will make one big 4x 14TB RAIDZ2 pool out of it.

How can I create the new pool so it is more efficient in its use of storage space?
It seems the compression ratio is better on the original pool,
but that's probably not the only thing that affects the size of the datasets?
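
I assume it mostly comes down to the properties I set per dataset when re-creating them, something like the sketch below (lz4 and 1M are just the values already on my pools, not a recommendation), but I am not sure which of these actually matter for space efficiency:

zfs create -o compression=lz4 -o recordsize=1M RAIDZ2_Pool/Images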

Thanks!

I think this is it. :point_up:

Did you ever “expand” the RAIDZ vdev by adding extra drives to it?

No need for expansion. Just raidz2 with different widths would result in different padding and different used space for the same data.
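
A rough sketch of the padding, assuming ashift=12 (4 KiB sectors) and ignoring the normalization ZFS applies before reporting USED: raidz2 writes 2 parity sectors per stripe row and rounds every allocation up to a multiple of 3 sectors, so a single 16 KiB block allocates

4-wide raidz2: 4 data + 4 parity sectors, padded to 9 sectors = 36 KiB
6-wide raidz2: 4 data + 2 parity sectors = 24 KiB (already a multiple of 3)

The reported numbers are further adjusted per vdev, so the exact USED figures differ again, but the same blocks simply do not occupy the same number of sectors on the two layouts.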

As John Cleese could have said: “I’m not especially qualified to confuse cats users, but I can recommend an extremely good file system.”


Just about any change in pool geometry

Do you mean, for instance, changing to RAIDZ1, or are you talking about something different? I am not sure what is meant by pool geometry.

Did you ever “expand” the RAIDZ vdev by adding extra drives to it?

No. The original pool was created on an OpenMediaVault system, and the general structure of the pool and datasets has not changed since setup.

Anyway, I am still not sure I understand why the dataset sizes differ after a replication.
Is it because of the way I set up the pool?

The pool information looks almost identical.
When I created the new pool on TrueNAS (the target one, called RaidZ2), I used standard settings, except that I raised the record size to 1M since I mostly store big files (videos, images and audio) and the performance would otherwise not be good.
However, the original pool's datasets use the same record size as the target; it is just that the pool default there is 128K rather than 1M.
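
If in doubt, the source column of zfs get should show whether the received datasets kept the record size from the stream (shown as received) or fell back to the pool default:

zfs get -r -d1 -o name,property,value,source recordsize RaidZ2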

I have added the output of the commands
zfs get all and zfs get all /VM_Backups to the pastebin.

Just about anything about the layout. Mirror vs. raidz1 vs. raidz2 vs. raidz3 (they take increasing amounts of space to store the same data), for sure. But even the raidz width can matter.

Here is another difference. Just a little bit of padding with larger records would do it.

Scrub and trust ZFS that the data is sane and safe. But do not attempt to rely on reported sizes to compare.
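
That is, something along the lines of the following on the destination, then check that the scrub completes with no errors in the status output:

zpool scrub RaidZ2
zpool status -v RaidZ2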