[Not Accepted] Add full rebalance feature in the GUI

I created a 3-disk raidz1 vdev, transferred the data from a 3-disk Windows JBOD onto it, then added the 3 Windows disks to the TrueNAS vdev (the existing vdev was extended 3 times).

First, instead of displaying 64 TB of usable space (raidz1 with 5*16 TB disks), TrueNAS displays 54 TB, which is a 10 TB error. Please fix this.

Second, I would like to do a full rebalance after extending the vdev 3 times, to spread the data equally across all disks.

There is an example script on GitHub to do this:

It would be nice to add a “full rebalance” feature directly as a button/option somewhere in the graphical interface.

This operation makes sense after extending an existing RAID with new disks (rewriting all the data so that it uses the new parity ratio and the pool reaches full capacity).

Please consider adding this, and make it easy to use.

Best regards :slight_smile:

Rebalancing explained by Microsoft Copilot

Rebalancing a RAIDZ pool after extending it is important to ensure optimal performance and efficient use of the new storage capacity. When you add new disks to a RAIDZ pool, the existing data is not automatically redistributed across the new disks. This can lead to an imbalance where the new disks remain underutilized while the old disks are heavily loaded.

Why Rebalance?

  1. Performance: An imbalanced pool can lead to performance issues, as the older disks may become bottlenecks while the new disks are underutilized.
  2. Efficiency: Rebalancing ensures that the data is evenly distributed across all disks, maximizing the storage efficiency and reducing the risk of any single disk becoming a point of failure.
  3. Longevity: Even distribution of data helps in spreading the wear and tear evenly across all disks, potentially extending the lifespan of the disks.

I am 99% confident that iX will NOT do that, because incorporating this script would mean assuming responsibility for its behaviour and would require taking this additional function into account for testing.

User responsibility here!

In fact, the real idea is to develop a feature inside TrueNAS, using this script as an example. Of course, simply linking to a GitHub script is not a viable option :slight_smile:

Use the CLI to check space until this known issue is fixed, eventually.
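For anyone who wants to script that check, here is a minimal sketch, assuming the `zfs` command is on the PATH. It relies on `zfs list -Hp -o name,used,available` (scripted, tab-separated output with exact byte counts). The helper names and the sample line are made up for illustration; the pool name and numbers are not from this thread.

```python
import subprocess


def parse_zfs_list(text: str) -> dict[str, dict[str, int]]:
    """Parse tab-separated `zfs list -Hp -o name,used,available` output."""
    datasets = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, used, available = line.split("\t")
        datasets[name] = {"used": int(used), "available": int(available)}
    return datasets


def read_zfs_space() -> dict[str, dict[str, int]]:
    """Run `zfs list` in scripted (-H), exact-bytes (-p) mode and parse it."""
    result = subprocess.run(
        ["zfs", "list", "-Hp", "-o", "name,used,available"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_zfs_list(result.stdout)


def to_tib(n_bytes: int) -> float:
    """Convert bytes to TiB (2**40 bytes)."""
    return n_bytes / 2**40


# Illustrative, made-up output line: 1 TiB used, 3 TiB available.
sample = "tank\t1099511627776\t3298534883328\n"
space = parse_zfs_list(sample)
print(to_tib(space["tank"]["used"]), to_tib(space["tank"]["available"]))
```

On a real system you would call `read_zfs_space()` instead of parsing the sample string.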

Did you check the data distribution after doing your expansion? You should only ‘need’ to do that if you are adding another VDEV to a pool.

Rebalancing after raidz vdev expansion will rewrite data with the new stripe width instead of merely redistributing data, and reclaim some lost space.
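To put rough numbers on the stripe-width point, here is an idealized sketch. It assumes full-width stripes and ignores padding, metadata and allocation overhead, and the widths (3 disks before, 5 after) and the 16 TB disk size are only an example loosely based on the first post. It is not a statement of how TrueNAS computes the figure it displays.

```python
from fractions import Fraction


def raidz_data_fraction(width: int, parity: int) -> Fraction:
    """Share of raw space that holds data in one full-width stripe (idealized)."""
    if width <= parity:
        raise ValueError("width must be larger than the parity count")
    return Fraction(width - parity, width)


old = raidz_data_fraction(3, 1)  # blocks written before expansion: 2 data + 1 parity
new = raidz_data_fraction(5, 1)  # blocks written after expansion: 4 data + 1 parity

raw_tb = 5 * 16  # assumed raw capacity in TB
print(round(float(raw_tb * old), 1))  # 53.3, if everything kept the old ratio
print(round(float(raw_tb * new), 1))  # 64.0, if everything used the new ratio
```

Data that is only reflowed keeps the old ratio; only data that is rewritten picks up the new one, which is where the reclaimed space comes from.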

I guess a visual representation of expansion and data distribution would be helpful. The way the docs have it gives me a different image on Raid-Z(1,2,3)

Reading through another thread. Maybe that will help me

I’ve done it also, would be cool if you could do it through the GUI (even if it’s under advanced / expert).

It is a nice idea; however, what really needs to happen is that ZFS itself includes that feature. That is the only way I can see a button saying “Rebalance Pool” ever coming into existence. As @etorix said, iXsystems is unlikely to take responsibility for implementing a script like that. Remember, they have people paying them a lot of money for the corporate product and it must be flawless.

With all that said, it would be nice to have a button to rebalance the pool. An option for you is to just use the script via the CLI as it says. After all, how often do you need to rebalance a pool?

BUT!!! Here are the cautions and why I likely would not do this:

So use at your own risk.

I don’t see this as really dangerous.
The operation consists of copying each file (one by one) so that it is written with the new parity ratio, then deleting the old file (once the copy has succeeded), and of course keeping track of what has already been rewritten and what has not (in case of a reboot or abort during the operation).
iXsystems has smart engineers; they can probably handle a file copy with persistent tracking of the progress (via a log or text file that survives a reboot).
But if they prefer to wait for this feature to be implemented at the raidz maintainers’ level, that would of course be understandable.
Like other experimental features, this could be implemented as an option for advanced users, with a clear warning and explanation in the GUI.
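
The copy, verify, replace and track-progress loop described in this post could look roughly like the sketch below. It is not the GitHub script mentioned earlier and not anything iX ships; `TMP_SUFFIX`, `rewrite_file` and `rebalance` are hypothetical names. It assumes no snapshots, no hard links, nobody modifying files during the run, and a progress log kept outside the tree being rewritten.

```python
import hashlib
import os
import shutil
from pathlib import Path

TMP_SUFFIX = ".rebalance-tmp"  # hypothetical suffix for in-flight copies
CHUNK = 1024 * 1024


def copy_and_hash(src: Path, dst: Path) -> str:
    """Copy src to dst by reading and writing every byte; return src's SHA-256."""
    digest = hashlib.sha256()
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK):
            digest.update(chunk)
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())
    return digest.hexdigest()


def file_hash(path: Path) -> str:
    """Return the SHA-256 of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def rewrite_file(path: Path) -> None:
    """Replace path with a freshly written, verified copy of itself."""
    tmp = path.with_name(path.name + TMP_SUFFIX)
    try:
        expected = copy_and_hash(path, tmp)
        if file_hash(tmp) != expected:
            raise OSError(f"verification failed for {path}")
        shutil.copystat(path, tmp)  # keep timestamps and permission bits
        os.replace(tmp, path)  # atomic rename over the original
    finally:
        tmp.unlink(missing_ok=True)  # remove the copy if anything above failed


def rebalance(root: Path, progress_file: Path) -> int:
    """Rewrite every regular file under root once; return how many were rewritten."""
    done = set(progress_file.read_text().splitlines()) if progress_file.exists() else set()
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    rewritten = 0
    with progress_file.open("a") as log:
        for path in files:
            if str(path) in done or path.name.endswith(TMP_SUFFIX):
                continue
            rewrite_file(path)
            log.write(f"{path}\n")  # record progress so a restart can resume
            log.flush()
            rewritten += 1
    return rewritten
```

A second run with the same progress file skips everything already logged, which is the "resume after reboot" behaviour. Leftover temporary copies from a crash are skipped but not cleaned up, and ownership, ACLs and extended attributes are not fully preserved, so this is a sketch rather than something to point at real data.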

Having the same parity ratio for all files in the system is important in my opinion,
so this feature deserves the effort.

Best regards :slight_smile:

Rewriting and deleting all data in the pool…
iX and yours truly have a different understanding of “dangerous”.

As you can see from the presentation posted by @Stux, the decision to reflow rather than rewrite data was taken early in the design stage of raidz expansion, a long time ago, and was taken by one of the main authors of ZFS. It’s going to take A LOT of effort to move on from that.

It is pretty normal to implement the safe option first; I understand that.
But rewriting files one by one in a background task is not so difficult; it is just a file copy.
If implemented as an “experimental feature” with appropriate explanation text,
and task progress monitoring, as already exists for other features,
this should not do any harm.
They could first publish this as experimental for homelabbers who will test it.
I can take the risk on my files because they are not production critical.
If published, I will test.

If this is not implemented, we will see everybody adding disks to their existing vdevs via the new expand option,
and we will see performance, efficiency, and hardware degradation over time due to data imbalance across disks,
and finally this will harm TrueNAS systems and the community worldwide.

Best regards :slight_smile:

While I will agree with you that it is a simple process of copying, deleting, renaming, I don’t think you are thinking of this as a business product.

Oh hell no. If we home lab folks want to take a risk, we run the script manually. If iXsystems included this in TrueNAS and people lost their valuable data, that would become a significant stain on iXsystems’ reputation. They have had enough of those in the past; I know they don’t want another one.

Anyway, we can agree to disagree, I’m good with that. We all have opinions as you well know.

Don’t forget that any “rebalancing” (rewriting files back to the dataset) will make its current snapshots moot. So you would have to destroy the existing snapshots in order to avoid a doubling of your dataset’s space usage.

:warning: Users really need to understand this caveat.

“Rebalancing” was never some built-in, low-level technology for ZFS. It’s all done at a file-based level, whereas ZFS is block-based and “copy-on-write” (CoW).
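
A toy model of why the snapshot caveat matters, under simplifying assumptions: files are just sets of block numbers, one snapshot references every block that existed when it was taken, and a rewrite always lands in new blocks (copy-on-write). The names and block counts are invented for illustration; real ZFS accounting is more involved.

```python
def space_used(live: dict[str, set[int]], snapshot_blocks: set[int]) -> int:
    """Count blocks that must stay allocated: referenced by a live file or by the snapshot."""
    live_blocks = set().union(*live.values()) if live else set()
    return len(live_blocks | snapshot_blocks)


# Three files of four blocks each, and a snapshot taken before the rebalance.
live = {"a": {0, 1, 2, 3}, "b": {4, 5, 6, 7}, "c": {8, 9, 10, 11}}
snapshot = set().union(*live.values())
before = space_used(live, snapshot)

# "Rebalance": rewrite every file; copy-on-write puts the new data in new blocks.
next_block = 12
for name in list(live):
    size = len(live[name])
    live[name] = set(range(next_block, next_block + size))
    next_block += size

after = space_used(live, snapshot)  # the snapshot still pins all the old blocks
after_destroy = space_used(live, set())  # the snapshot has been destroyed

print(before, after, after_destroy)  # 12 24 12
```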

Thank you, important point.
It should be mentioned in the interface if a rebalance feature is added.
It will be helpful for people who are not experts in this stuff.

I just started my TrueNAS instance; I am at the beginning of the learning curve with TrueNAS and ZFS/raidz.

Although it may be possible to implement an effective rebalance in ZFS, it seems unlikely that someone will.

This is the relevant comment from the author of RAIDZ Expansion (and one of the original ZFS authors, too)

BUT the trick is to do it “properly”, i.e. atomic and safe, as detailed in Point 1 of the link.

Doing this properly - online, working with other existing features (snapshots, clones, dedup), without requiring tons of extra space - would require incrementally changing snapshots, which is a project of similar scale to RAIDZ Expansion.

And that is a huge amount of work, and will probably not be done.

I would not expect it to be integrated into TrueNAS unless it was done “properly”.

The rebalance script is neither atomic nor 100% safe. I.e. if you crash… then it’s not safe. If you modify a file as it’s being rebalanced, then it’s not safe… If there are snapshots… then the snapshots preserve the old location. If a file is larger than free space, then it will fail…

There are a bunch of reasons why, imo, it should NOT be included in TrueNAS. It’s just not robust enough.

BUT if you understand the issues, then go ahead and use it; it’s a valid solution.

And generally for a home user, it’s probably fine. Just don’t have any snapshots… and make sure not to crash during the process… and if you do, fix it… and try not to modify any files that are being rebalanced… while the rebalance is going on.

so you have to:

  • deactivate/suspend snapshots while rebalancing (only on the one datastore that is being rebalanced), then reactivate snapshots once rebalancing is done for this datastore (this could be automatic)
  • have more free space than the biggest file in your system (this could also be automatic: the rebalance script could check free space before copying each file and, if there is not enough free space with a comfortable margin, skip the file and log it in the rebalancing report, so you know some files were skipped)
  • note that in case of a reboot, the copy of the file that was being duplicated is interrupted and you need to restart the copy of that one file (this could also be automatic: while rebalancing, just log the progress to a file, and after a reboot restart rebalancing from the last unfinished file)
  • while rebalancing, skip and log files locked by users, then report them in the final log, and possibly offer to restart rebalancing for those locked files only (because there is a chance they have been unlocked by users after some time)
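
The free-space check from the second bullet could be sketched like this. The 10% margin, the function names and the example sizes are assumptions for illustration, and the sketch assumes files are rewritten one at a time with the old copy released after each one (so no snapshots holding old blocks).

```python
import shutil
from pathlib import Path


def usable_budget(free_bytes: int, margin_percent: int = 10) -> int:
    """Free space we allow a single copy to consume, keeping a safety margin."""
    return free_bytes * (100 - margin_percent) // 100


def plan_rebalance(
    sizes: dict[str, int], free_bytes: int, margin_percent: int = 10
) -> tuple[list[str], list[str]]:
    """Split files into those that fit in the budget and those to skip and report."""
    budget = usable_budget(free_bytes, margin_percent)
    to_rewrite = [name for name, size in sizes.items() if size <= budget]
    skipped = [name for name, size in sizes.items() if size > budget]
    return to_rewrite, skipped


def scan(root: Path) -> tuple[dict[str, int], int]:
    """Collect file sizes under root and the free space of its filesystem."""
    sizes = {str(p): p.stat().st_size for p in root.rglob("*") if p.is_file()}
    return sizes, shutil.disk_usage(root).free


# Made-up sizes in bytes; with 100 bytes free and a 10% margin the budget is 90.
to_rewrite, skipped = plan_rebalance({"small": 10, "big": 95, "huge": 200}, free_bytes=100)
print(to_rewrite, skipped)  # ['small'] ['big', 'huge']
```

On a real dataset, `scan(Path(...))` would supply the two inputs. The skipped list is what would go into the final report.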

no big deal, I don’t see anything complicated there

You’re still only looking at this from a consumer level.

In an enterprise environment, which is what I do for a living, the risks in this far outweigh the benefits and I would never want to see this feature available when every file in my system means massive amounts of revenue.

It looks easy and low risk to you but, as everyone here has tried to explain in numerous ways, it is the opposite. You have the script to run from the CLI, so you’re not prevented from achieving the result you seek. That’s worth something in itself and is a great reason why TrueNAS is a flexible solution for all types of users, with a much more robust filesystem, unlike competing products. You can take consolation in that.

I understand the enterprise standpoint:
when it works, don’t touch it.

Best regards :slight_smile:

We will not be implementing a feature like this in userspace, for all the reasons other people tried to explain here.

This would have to be implemented transparently in ZFS, which at this time we are not willing to do due to the complexity involved. Maybe there will be movement upstream which we can leverage. The data shuffling that usually happens automatically over long periods mitigates this, and people who really need this can run these scripts at their own risk.
