24.10 RC2 RAIDZ expansion caused miscalculated available storage

Isn’t ZFS wonderful? :innocent:

So if my pool shows “73 GiB available space remaining”, I could still save a 103-GiB video file, if I wanted to.

Don’t ask how! Something to do with parity, calculations, block-striping, vdev width, new vs old math, “compression” (for a video file, of all things), and… good ol’ ZFS magic!

It’s actually very intuitive.

2 Likes

Video files do not compress like that. It’s not possible.

2 Likes

This is all because ZFS does not want to recalculate parity data (which I forced it to do with the funny script anyhow), so ZFS has to use old values when calculating available storage, leaving my pool in a zombie state where who knows how much space anything actually takes up.
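For anyone who hasn’t seen it, the “funny script” approach is basically just rewriting every file in place, which forces ZFS to re-allocate the blocks at the new data-to-parity ratio. A minimal bash sketch of the idea; the path is a placeholder, and the real community scripts add safety checks (checksum comparison, hardlink/snapshot handling) that this leaves out:

# rewrite each file in place so its blocks are re-allocated at the new stripe width
find /mnt/tank/mydata -type f -print0 | while IFS= read -r -d '' f; do
    cp -a "$f" "$f.rebalance" && mv "$f.rebalance" "$f"
done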

2 Likes

Then it sounds like “RAIDZ expansion” isn’t a fleshed-out feature. It does what it’s meant to do… and that’s it. It stops there. From a developer’s and engineer’s perspective, the work is finished. No follow-up on real-world usage.

From an end-user’s perspective, it’s important (and expected) to be able to intuitively and pragmatically understand how much space is being consumed by a file, how large the pool’s total capacity is, and how much free space is available.

A 30% discrepancy is not acceptable.
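To be fair, all three of those numbers can at least be pulled straight from the shell, which makes the discrepancy easy to demonstrate; pool and dataset names below are placeholders:

zpool list tank                                 # raw capacity, parity included
zfs list -o space tank                          # usable space as ZFS accounts for it
zfs get used,available,referenced tank/mydata   # per-dataset view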

4 Likes

Right, but as I said elsewhere, this is the first release with this feature, and I personally wouldn’t use it until a few ZFS versions later to let the bugs be ironed out. Same reason I won’t install Eel 24.10.0. It is not surprising at all to me that there may be issues, possibly even data loss. Just my opinion.

1 Like

Yep, exactly: I made a ZFS target with cheap SAS drives, replicated over just the data I actually care about, destroyed the 5-wide and made an 8-wide, then replicated the other way.

Between replication and burn in it took about 2 weeks.

The RAIDZ expansion route looks more comfortable, and doesn’t need a second system to replicate to. If/when the display oddities are fixed, it should be solid. Another 3 years? :sweat_smile:

1 Like

That’s the equivalent of one week in “GIMP 3 development time”.

What’s going to arrive first? The “Year of the Linux Desktop” or the “Year of GIMP 3’s Release”?

3 Likes

If there were data loss with basic usage of the system at this point in the development, I would be very concerned. For me it’s more of an annoyance that storage usage and availability aren’t reported correctly. As long as 24.10.0, or an iteration or two later, fixes the issue, the tradeoff is worth it for me to get the extra space from RAIDZ expansion. Yes, it is annoying that it appears there’s less space available, but unless it’s a production system I’m not worried about it.

If I had the drive space to do this with ~15 TB, I would have done it this way. That being said, I’m not buying extra drives just to temporarily copy a bunch of data to them before recreating the new pool. This becomes even tougher as data usage increases.
Also, please let it be MUCH less than 3 years 😅

I don’t work in tech, but if I reported that something was using X amount of a resource when I knew it was actually using Y amount, and it caused a negative outcome, I’d expect to get fired.

OTOH, the reality is that no boss of mine would ever figure out what had gone wrong, so :person_shrugging:

2 Likes

So… what ya do, is you work out how many raidz stripes are filled… then you multiply that by the original raidz width… and now you know the apparent size.

Or something dumb like that.

Does ZFS not track the parity ratio of every file written since the expansion finished, well enough to account for those files at the new ratio?

Shouldn’t the new parity ratio be used when working out the available size?
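For illustration, here’s roughly what the old-versus-new ratio difference looks like for a single chunk of data; toy numbers only, and just my understanding of the accounting:

# On the original 3-wide RAIDZ1 (2 data + 1 parity), 1 GiB of data occupies
# about 1.5 GiB of raw space; blocks written before the expansion keep that layout.
echo "scale=2; 1 * 3/2" | bc   # 1.50 GiB raw at the old 2+1 ratio
echo "scale=2; 1 * 5/4" | bc   # 1.25 GiB raw if the same data were rewritten at 4+1 (5-wide)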

1 Like

It’s actually not that complicated. There’s a quick formula you can use to demystify how much total capacity you actually have and how much space is available.

Try a 2wZ1 to 10wZ1 expansion…

2 Likes

The correct implementation in TrueNAS should have been to expand the vdev, run the scrub, and then immediately run the rebalance script (not as a random user, but as part of the expansion process), and then force the recalculation of pool size.
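In shell terms, the sequence being suggested is roughly the following; pool and device names are placeholders, and the rebalance step is the community rewrite-in-place script rather than anything iX ships today:

zpool attach tank raidz1-0 /dev/disk/by-partuuid/NEW-DISK   # kick off the RAIDZ expansion
zpool status tank   # wait until the "expand:" line reports the reflow as finished
zpool scrub tank    # then scrub to verify the pool post-expansion
# ...and only then rewrite the existing data (the "rebalance" step) so old blocks
# pick up the new data:parity ratio before the pool is handed back to the user.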

Now someone will come along and say “but we couldn’t do that”. This might be a ZFS expansion issue, but it’s solvable for the appliance OS that iX is building.

And if the rebalance script (or an implementation created by iX) still has this capacity-accounting issue, then I think iX has a big problem with a major part of this SCALE release from a marketing perspective.

1 Like

Regarding “allocation size” in Samba: this is what is happening. ZFS gets a count of 512-byte blocks for the specified object and returns it as st_blocks in the stat(2) output. Samba and other applications multiply this by 512 to determine the “allocation size”, as opposed to st_size.

So no Samba bug here.

That said, ZFS space accounting is complicated.
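For a concrete check, both numbers are visible from the shell with GNU stat; the path is just an example:

stat -c 'st_size=%s  st_blocks=%b  block_size=%B' /mnt/tank/video.mkv
# "allocation size" as reported over SMB = st_blocks * 512; on RAIDZ that figure
# includes parity and padding overhead, so it can sit well above st_size.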

1 Like

Understatement of the century.

2 Likes

I’ve just created a VM with EE RC2 installed… and ten 1 TiB virtual disks.

Created a 3wZ1 pool, then extended twice…

root@truenas[/home/truenas_admin]# zpool status test_pool
  pool: test_pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 0 errors on Thu Oct 17 20:27:26 2024
expand: expanded raidz1-0 copied 6.02M in 00:00:01, on Thu Oct 17 20:27:26 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        test_pool                                 ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            115cd1b7-a793-414e-a129-45e2b957b1bd  ONLINE       0     0     0
            3e4a8cf3-9b3e-4e62-86e8-130ef583a51c  ONLINE       0     0     0
            c68390ef-3625-42ba-a584-aa9d6f6b3516  ONLINE       0     0     0
            d3cc02b6-de80-4c5e-a53a-1ff18ddb6e58  ONLINE       0     0     0
            e224f009-c449-4ca9-852c-c792f6e86a0e  ONLINE       0     0     0

errors: No known data errors

Okay, that looks fine.

root@truenas[/home/truenas_admin]# zpool list test_pool
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test_pool  4.98T  5.02M  4.98T        -         -     0%     0%  1.00x    ONLINE  /mnt

Okay, 5w so 5T “free”

root@truenas[/home/truenas_admin]# zfs list test_pool
NAME        USED  AVAIL  REFER  MOUNTPOINT
test_pool  3.32M  3.22T   128K  /mnt/test_pool

But you’d expect to see 4T available…
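For what it’s worth, that 3.22T lines up almost exactly with the old 3-wide ratio being applied to the new raw size, rather than the 4/5 you’d intuitively expect; back-of-the-envelope numbers, not an authoritative explanation:

echo "scale=2; 4.98 * 2/3" | bc   # 3.32T at the old 2-of-3 data fraction; minus ~3% slop ~= 3.2T (what zfs list shows)
echo "scale=2; 4.98 * 4/5" | bc   # 3.98T at the new 4-of-5 data fraction (what you'd expect)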

2 Likes

Extended to 6w

root@truenas[/home/truenas_admin]# zpool list test_pool
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test_pool  5.98T  5.41M  5.98T        -         -     0%     0%  1.00x    ONLINE  /mnt
root@truenas[/home/truenas_admin]# zfs list test_pool
NAME        USED  AVAIL  REFER  MOUNTPOINT
test_pool  3.51M  3.86T   128K  /mnt/test_pool

Adds another two-thirds of a TiB, which matches the original 3-wide vdev’s data fraction (2/3).

Add another disk… now 7w

root@truenas[/home/truenas_admin]# zpool list test_pool && zfs list test_pool
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
test_pool  6.98T  6.30M  6.98T        -         -     0%     0%  1.00x    ONLINE  /mnt
NAME        USED  AVAIL  REFER  MOUNTPOINT
test_pool  3.82M  4.53T   128K  /mnt/test_pool

Again, only ~0.67 TiB gained (two-thirds of a TiB)… it should be closer to 1 TiB.
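Same pattern on each step: every added 1 TiB disk is still counted at the original two-thirds data fraction rather than at the vdev’s actual width; again, just my arithmetic:

echo "scale=2; 1 * 2/3" | bc   # ~0.67 TiB gained per added disk at the old ratio (what zfs list shows)
# at the real width, nearly the full 1 TiB of each added disk should become usable space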

1 Like

Which doesn’t even make sense, since you didn’t write much (if anything) to the pool, even before expanding the RAIDZ1 vdev.

That takes “rebalancing” completely out of the equation.

So is ZFS “lying”? Can you, as the user, take the values and capacities it reports at face value?

Rebalancing has nothing to do with it.

1 Like