24.10 RC2 Raidz expansion caused miscalculated available storage

well if you insist :stuck_out_tongue:

The actual consumed space went from 2.8G → 2.1G, or in other words a 25% reduction in space when rewriting from 3wZ1 to 5wZ1 … I know it’s not to the same extent, but my experiment was admittedly a lot smaller in scope.

I have a little more data, from tests run on a little more data.

3wZ1, data ingested.

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
expanding  47.5G  41.2G  6.34G        -         -     0%    86%  1.00x    ONLINE  /mnt
NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
expanding           3.24G  27.4G        0B    128K             0B      27.4G        -
expanding/expandme  3.24G  27.4G        0B   27.4G             0B         0B        -

Expanding to 4wZ1 added ~16G to FREE in zpool list but only ~10G to AVAIL in zfs list:

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
expanding  63.5G  41.2G  22.3G        -         -     0%    64%  1.00x    ONLINE  /mnt
NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
expanding           13.6G  27.4G        0B    128K             0B      27.4G        -
expanding/expandme  13.6G  27.4G        0B   27.4G             0B         0B        -

Same for expanding again to 5wZ1:

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
expanding  79.5G  41.2G  38.3G        -         -     0%    51%  1.00x    ONLINE  /mnt
NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
expanding           23.9G  27.4G        0B    128K             0B      27.4G        -
expanding/expandme  23.9G  27.4G        0B   27.4G             0B         0B        -
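
As a rough sanity check, the AVAIL deltas line up with the new space still being counted at the original 3wZ1 ratio of 2 data sectors per 3 raw sectors; the one-liner below just does that arithmetic, assuming ~16G of raw space per added disk.

# Each added disk contributes ~16G of raw space to FREE, but AVAIL only grows
# by the pre-expansion data fraction of a 3-wide RAIDZ1 (2 data : 1 parity)
awk 'BEGIN { printf "expected AVAIL gain per disk: %.1fG\n", 16 * 2 / 3 }'
# -> ~10.7G, in the same ballpark as the ~10G jumps from 3.24G to 13.6G to 23.9G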

After rebalancing we gain a decent chunk of both FREE and AVAIL space.

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
expanding  79.5G  34.3G  45.2G        -         -     0%    43%  1.00x    ONLINE  /mnt
NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  VOLSIZE
expanding           28.5G  22.8G        0B    128K             0B      22.8G        -
expanding/expandme  28.5G  22.8G        0B   22.8G             0B         0B        -
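
The post-rebalance ALLOC also matches simple parity arithmetic: 3wZ1 stores 2 data sectors per 3 raw, 5wZ1 stores 4 per 5, so a full rewrite should shrink the raw allocation by a factor of (5/4)/(3/2) = 5/6. A rough check, assuming all blocks stripe the full width:

# 41.2G allocated at the old 3wZ1 geometry, rewritten at the 5wZ1 geometry
awk 'BEGIN { printf "expected ALLOC after rewrite: %.1fG\n", 41.2 * (5/4) / (3/2) }'
# -> ~34.3G, which is what zpool list reports above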

I already used the zfs-inplace-rebalance script to rewrite all of my files, and it did recover about 2 TiB of space. I know it’s intended for rebalancing across vdevs, but it just copies everything and deletes the original, which is effectively the same thing. I should be getting accurate accounting from zfs list already.

If it’s triggering block cloning…
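
If block cloning is the worry, one way to check (on OpenZFS 2.2+; the pool name here is just a placeholder) is the pool’s block-clone accounting properties. Non-zero values mean some of those “copies” still share blocks with the originals instead of having been rewritten:

zpool get bcloneused,bclonesaved,bcloneratio tank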

I would use dataset replication personally.

I agree. It’s not like you filled your pool to 80% capacity before you expanded your RAIDZ2. It was only about 25% full, right?

So even if you did not do any in-place rebalancing, you should still expect to see the total usable storage capacity much higher than 47 TiB.

Well, keeping in mind this is a brand new feature of zfs, one that I wouldn’t dare use at this point (glad others are testing!). Not too surprising given that.

Out of curiosity, do you happen to have snapshots on your datasets? If you use that script, you actually double the used space, because your snapshot holds all the old data as well.
I would recommend copying your whole dataset with zfs send/recv, deleting the old dataset, and renaming the new one to the same name the old one had.
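
Roughly along these lines, as a sketch with placeholder names (snapshot, send/receive onto the expanded layout, then swap the names once you have verified the copy):

zfs snapshot -r tank/media@migrate                         # point-in-time source for the copy
zfs send -R tank/media@migrate | zfs recv tank/media-new   # rewrites the data with the new width
zfs destroy -r tank/media                                  # only after checking the new dataset
zfs rename tank/media-new tank/media
zfs destroy -r tank/media@migrate                          # drop the migration snapshot when done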

Thank you for the suggestion, but with this server’s role for hosting media, I don’t keep snapshots. I did have some when I used replication instead of zfs send | receive, but those were cleared before I ran the script.

I do not wish for this to come off as rude, but tone is difficult to convey via text. If you look at the 2 pictures in my OP, you will note that the “used” space value decreases after the script was run, yet in both, they still show an incorrect value for total capacity.

I expanded a 6-wide RAIDZ2 vdev to 10 wide using 16 TB drives. I’ve found the same problem: zpool list -v shows available space increasing after running the in-place rebalancing script (I watched the free space increase and the capacity used decrease while it was taking place), but the dashboard widget never changed (96 TB capacity reported in the widget, while zpool reported 117 TB capacity).

Watching with interest; if there is any information I can offer, happy to help.

This confirms there is a bug somewhere… please report it. Did the problems appear after the 1st drive?
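
If it helps the report, the usual outputs to attach (pool name is a placeholder) would be along the lines of:

zpool status -v tank                   # vdev layout plus any expansion/scrub state
zpool list -v tank                     # raw SIZE/ALLOC/FREE per vdev
zfs list -o space -r tank              # AVAIL/USED accounting as the datasets see it
zpool get all tank | grep -i expand    # expansion-related properties and feature flags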

Will do, but I did this expansion when on the Beta (it literally took a week to add the four disks and then about 60 hours to run the ZFS rebalance script on around 24 TB of data), and it finished the weekend before RC1 was released. One complication: I also added a metadata vdev prior to running the rebalance script, so that’s something else to factor into the troubleshooting.

Apologies for double post: bug report filed as requested

If my memory serves me correctly, a ZFS scrub is mandatory after each expansion. This is because the re-striping does not verify checksums, in order to speed up the process.

So, start a scrub of the pool and don’t reboot until it is finished.

To be clear, I am not sure this is the source of the problem. But, as I said, I remember a ZFS Scrub being automatically started after a RAID-Zx column expansion.
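
For anyone following along, that’s just (pool name is a placeholder):

zpool scrub tank     # kick off the scrub
zpool status tank    # watch progress; wait for "scrub repaired ... with 0 errors"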

I finally got word back from my Jira ticket. Apparently the usable space looks weird not because of any parity ratio mismatch, or because a drive wasn’t inserted properly, but because zfs/truenas is deliberately using pre-expansion values to calculate available storage space. Space used is underrepresented proportionally, so the used capacity % is accurate, as seen here.
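
That also fits the smaller test pool earlier in the thread: after expanding to 5wZ1, zpool list showed 41.2G allocated while zfs list showed 27.4G used, and 41.2G scaled by the pre-expansion 3wZ1 data fraction comes out to about the same number:

# ALLOC from zpool list scaled by the old 3wZ1 data fraction (2/3)
awk 'BEGIN { printf "%.1fG\n", 41.2 * 2 / 3 }'   # -> 27.5G, close to the 27.4G USED in zfs list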

Lol, I’m so dumb I don’t even understand the solution :slight_smile:

Think of it like this:

You go into a store to purchase a USB flash drive. There’s no capacity printed on the box. You ask the sales rep “How much data can this hold?” They just shrug their shoulders and say “Not sure lol. It’s a lot though. You’ll find out when you get home. Or maybe you won’t. Let’s just say 512 GiB. Okay, maybe more like 700 GiB, but it depends how you plan to use it.”

I think I’m just as confused as you.

In short… this isn’t an error with zfs, it’s just the UI being funky with how it reports stuff.

I mean, that’s kind of impacted by the inline compression as well. You might store 100G of “logical data” on there, but if it compacts 2:1, you might have “100G used: 462G free” and wonder where your extra 50G of space came from.
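
If you want to see that effect on a real dataset, the logical-vs-physical accounting is exposed directly as properties (dataset name is a placeholder):

zfs get used,logicalused,compressratio tank/media
# "used" is the space actually allocated on disk, "logicalused" is the
# uncompressed size of the data, and compressratio is the ratio between them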

@HoneyBadger about to market and sell the first consumer USB sticks pre-formatted with ZFS pools? :hushed:

Badger Stick™:
Pre-formatted with a ZFS pool, with LZ4 compression.

Badger Stick™ Plus:
Pre-formatted with a ZFS pool, with ZSTD-19 compression, and deduplication enabled.[1]


  1. “At Badger Labs, our methodical research shows that our ‘Plus’ series can store an extra 50% data per stick! More data, same great Badger technology!”

I think I understand what you’re saying, but is this the intended behavior? If I download what’s reported as a 10GB file, will truenas report it as smaller than that? And consequently the same behavior for drives: I purchase a 5TB drive, but once expanded, the pool only reports some fraction of that space as available? This behavior seems misleading and counterintuitive to me, although I know there is some complex compression and zfs magic going on behind the curtains.