I have a server with 6x18TB (16.37 TiB) drives in RAIDZ2, and due to hardware limitations I needed to start with 4 drives and expand later.
Using 24.10 RC2, I expanded my vdev twice, and during the second expansion I rebooted the system (the ZFS documentation says it is safe to reboot during expansion, and I had a need to reboot). After all expansion was completed, the usable capacity value doesn’t seem to have been updated to properly reflect the new capacity.
I did try the ZFS in-place rebalancing script, which reduced the used capacity value from 10 TiB to around 8 TiB (which matches my estimates), but the usable capacity value was unchanged.
I have so far tried rebooting the system and exporting/importing the pool, to little effect. Is there any shell command I can run to force TrueNAS to reassess how much space there actually is in the pool?
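For anyone who wants to compare against what ZFS itself reports, a few read-only checks along these lines should show the raw numbers (the pool name “tank” is just a placeholder):

```
# Read-only checks of what ZFS reports; replace "tank" with your pool name
zpool list -v tank          # raw SIZE / ALLOC / FREE per pool and per vdev
zfs list -o space tank      # usable space as the filesystem layer sees it
zpool get expandsize tank   # unclaimed capacity ZFS knows it could still grow into
zpool status -v tank        # vdev layout plus the expansion/scan status line
```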
OK, it seems the UI is accurately reflecting what ZFS thinks; the only issue is that ZFS is wrong.
With a size of 98.2T, the pool shows the total raw capacity of all six disks, but it reports 39.2T of available space, which is three disks’ worth, when I should have four disks’ worth. Is there any way to fix this?
It’s as if only one of the two newly added drives expanded the pool’s capacity. (Hence the ~48 TiB instead of the expected ~64 TiB.)
After all, a RAIDZ2 vdev composed of five 18-TB drives yields about 48 TiB of usable capacity.
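As a rough sketch of that expectation (ignoring metadata and slop-space overhead, so the real numbers land a bit lower):

```
# Approximate data capacity of a RAIDZ2 vdev of 18 TB (~16.37 TiB) drives,
# ignoring metadata and slop-space overhead
for width in 4 5 6; do
  echo "${width}-wide RAIDZ2: ~$(echo "($width - 2) * 16.37" | bc) TiB"
done
# Prints roughly: 4-wide ≈ 32.7, 5-wide ≈ 49.1, 6-wide ≈ 65.5 TiB
```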
Then there’s this:
expand: expanded raidz2-0 copied 23.0T in 1 days 06:17:31, on Tue Oct 8 20:23:56 2024
So did it start to auto-expand your pool (“expand RAIDZ2”), but then get interrupted (by the reboot), and then after the reboot it “finished” the expansion?
Considering that RAIDZ expansion is fairly new, I wonder if perhaps you’re not meant to reboot in the middle of this process?
“The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off.”
Of course, that’s kind of irrelevant, since I’m evidently in a weird state. Perhaps there’s a command that normally should be run at the end of the process, which my reboot interrupted?
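The closest candidates I can think of are the device-growth commands, and I have no idea whether they even apply to RAIDZ expansion (they seem aimed at the case where a disk is swapped for a larger one):

```
# Possibly unrelated to RAIDZ expansion, but cheap to check ("tank"/"sdX" are placeholders)
zpool get autoexpand tank   # whether the pool auto-grows when its devices get bigger
zpool online -e tank sdX    # ask ZFS to expand a device to use its full size
```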
Okay, so it seems there’s an expected loss of capacity with RAIDZ expansion, based on how much data already exists (and thus gets rebalanced). But you only had about one drive’s worth of data stored on your entire pool.
This feature (for OpenZFS 2.3.x) is too new for me.
If that’s really the expectation, then RAIDZ expansion is kind of disappointing, especially to those who want to expand their pools… because their pool is getting full.
I have doubts that the expected loss of capacity is what’s going on here; I only have 8 TiB of data. Even in my starting config of 4x18TB RAIDZ2, where I was effectively running at a 1:1 parity ratio, the lost capacity shouldn’t come anywhere near an 18 TB drive in size.
I also ran the zfs-inplace-rebalance script, which reduced the used space by a few TiB, as seen in the difference between my first and second pictures (in the OP), so I don’t think the excess from the parity-ratio mismatch is even counted in the “usable capacity”.
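Back-of-the-envelope, the stripe-width difference on ~8 TiB of data just isn’t big enough (ignoring compression and allocation padding):

```
# Rough raw-space cost of ~8 TiB of data at the old vs. new stripe width
# (RAIDZ2 raw usage ≈ logical * width / (width - 2))
echo "4-wide: $(echo "8 * 4 / 2" | bc) TiB raw"   # old layout: ~16 TiB on disk
echo "6-wide: $(echo "8 * 6 / 4" | bc) TiB raw"   # rewritten layout: ~12 TiB on disk
# The difference is ~4 TiB of raw space, nowhere near a whole 18 TB drive
```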
That’s why I find it odd that it’s acting “as if” you only expanded from 4-wide to 5-wide (of 18-TB drives).
To lose out on a whole drive’s worth of capacity seems too much.
That was my initial assumption.
But the GUI (and the zfs command) is showing you the usable capacity of a 5-wide RAIDZ2 vdev (not 6-wide) of 18-TB drives. (With parity and rebalancing taken into account, you’d lose more than just the parity itself, but I agree… surely not nearly an entire drive’s worth of space.)
EDIT: There might indeed be a zpool command to force it to properly expand the RAIDZ vdev, but like I said, the feature is too new, and I don’t feel comfortable “winging it”.
Hopefully the other users I pinged can chime in and unravel this mystery.
Yes, the first expansion went as expected… I don’t remember the usable capacity it displayed, however; I just fired off the second expansion and then cancelled the automatic scrub (intending to run it after the second one).
After the reboot, the expansion entry in the “Jobs” drop-down list disappeared, but disk activity continued for a few more hours. I considered the second expansion ‘complete’ when disk activity returned to normal (which it did, after about the same amount of time as the first).
I have rebooted and tried exporting/importing the pool to resolve this; neither seems to affect the reported storage space.
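For reference, the CLI equivalent of that export/import cycle is roughly:

```
# Export and re-import the pool ("tank" is a placeholder pool name)
zpool export tank
zpool import tank
```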
To clarify: I waited for the first expansion’s entry in the job log to disappear before starting the second expansion. Unbeknownst to me, it had started a scrub. I then kicked off the second expansion and stopped the scrub when I noticed it was causing a severe performance hit to the expansion.
Since the scrub should only be done after the expansion (it would otherwise be useless), I doubt it affected anything other than risking the integrity of my data.
I’d be more than happy to look over logs and find out otherwise, or provide them if needed.
In a basic test, it would allow me to “enqueue” the second expansion (select a second disk and hit the “Expand/Confirm” button), but it sat waiting for both the expand and scrub processes to complete before proceeding.
So you may need to rewrite your files in place in order to get proper “accounting” from zfs list commands that query the filesystem rather than the pool structure; something like the sketch below.
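A minimal sketch of what “rewrite in place” means here, which is essentially what the rebalancing script automates (paths are placeholders, and you’ll want backups before doing this in bulk):

```
# Rewrite every file so its blocks are re-allocated at the new 6-wide layout.
# This is roughly what the rebalancing script does, minus its safety checks.
find /mnt/tank/dataset -type f -print0 | while IFS= read -r -d '' f; do
    cp -a "$f" "$f.rebalance.tmp"   # the fresh copy is written at the new stripe width
    mv "$f.rebalance.tmp" "$f"      # replace the original with the rewritten copy
done
```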