Enhance the user experience during RAIDZ expansion

Over the last two weeks, I tried out the RAIDZ expansion feature with an 18 TB drive in my main pool and ran into a few issues that could be handled better. The current process could be improved on two fronts: the information shown to the user and system performance during the expansion.

Estimated Duration Information:
Clearly inform the user of the estimated duration for the RAIDZ expansion process. This can help set realistic expectations and plan accordingly (~50 MB/s, https://openzfs.org/w/images/5/5e/RAIDZ_Expansion_2023.pdf).
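Even a back-of-the-envelope estimate up front would help. Using the figures from the example later in this post (roughly 51 TiB to rewrite at ~50 MiB/s, both just illustrative numbers, not a guaranteed rate):

    # 50.9 TiB ≈ 53,372,518 MiB; divided by ~50 MiB/s and 86,400 s per day:
    echo $(( 53372518 / 50 / 86400 ))    # prints 12 (days)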

Automatic Deactivation of Scrubs:
During the expansion process, automatically deactivate scrubs for the affected pool. This prevents significant I/O wait, which can severely impact system performance. Once the expansion is complete, scrubs can be reactivated.
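Until that is automated, a running scrub can be paused or cancelled by hand before starting the expansion (standard zpool commands; the pool name is a placeholder):

    zpool scrub -p tank    # pause the scrub (resume later with: zpool scrub tank)
    zpool scrub -s tank    # or stop it outright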

Progress Display in Task Manager:
Integrate detailed progress information from zpool status into the Task Manager. This includes the current data transfer rate, percentage completed, and estimated time to completion. For example:

expand: expansion of raidz1-0 in progress since Wed Jun 19 01:12:14 2024
        1.99T / 50.9T copied at 54.8M/s, 3.91% done, 10 days 19:48:35 to go
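Newer OpenZFS releases also expose the expansion as a zpool wait activity, which would give the middleware a clean way to block until it finishes instead of scraping the zpool status output (check the zpool-wait man page on your build; the pool name is a placeholder):

    zpool wait -t raidz_expand tank    # returns when the expansion has completed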




I agree with all your points. I couldn’t even find the option; I kept trying from Expand and other entry points like the ‘you have 1 unused disk’ tile.

Until TrueNAS 24.10 hits BETA, it should not be assumed it’s complete.
After it does, please provide your feedback.

Can you provide details of the system before and after: how much data was in the pool, and how long the RAIDZ expansion took?

That data might be useful for making the UI more informative.

Actually, most/all of the data in the vdev gets shuffled around. It stays at the same offset within the vdev, but it gets shuffled around to match the new geometry.

Good catch.

  • It preserves existing parity, so it keeps the data.
  • It allows for online expansion, so reads and writes can continue immediately.
  • But it shuffles the data to free the space, and that takes a while.

Great video here: https://www.youtube.com/watch?app=desktop&v=yF2KgQGmUic

I’ll fix my post.

We do have a tutorial for the process now, available here: Managing Pools | TrueNAS Documentation Hub

It took ~11 days (~55 MB/s). The RAIDZ1 pool was 75% full, originally 4x 18 TB, now 5x 18 TB. I’m on the nightly train. It would have taken longer if I hadn’t stopped the scrub. In the Netdata screenshot you can see exactly what happened after I stopped the scrub.

Also a good video: https://www.youtube.com/watch?v=tqyNHyq0LYM

Out of curiosity, can you post the output of zpool list -v POOLNAME?


root@truenas[~]# zpool list -v Media
NAME                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Media                                     81.8T  51.3T  30.5T        -         -     2%    62%  1.00x    ONLINE  /mnt
  raidz1-0                                81.8T  51.3T  30.5T        -         -     2%  62.7%      -    ONLINE
    3d0bc877-03e7-4c8d-ab07-7fc46a39d179  16.4T      -      -        -         -      -      -      -    ONLINE
    7eeedebb-b6cb-42d9-8c09-a49929d3009d  16.4T      -      -        -         -      -      -      -    ONLINE
    78cb2807-3d85-4ab9-8579-3a87a907a84b  16.4T      -      -        -         -      -      -      -    ONLINE
    6ecb6553-780f-4448-8c5b-4db5204e3bbd  16.4T      -      -        -         -      -      -      -    ONLINE
    15c58918-9e74-4ee0-afdc-7b57f35d1873  16.4T      -      -        -         -      -      -      -    ONLINE
cache                                         -      -      -        -         -      -      -      -         -
  faba655c-d81c-495d-99ad-dd20d28bb30e     466G   466G  68.0M        -         -     0%  100.0%      -    ONLINE

The stable build came out today. Unfortunately, the user experience has not been improved. I have the feeling that TrueNAS is working on expanding my 4x 4 TB to 5x 4 TB Z1 pool, but the interface is stuck at the same screen as the OP’s.

Wonder if there is a way to check the intermediate progress manually?

sudo zpool status in the shell
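If you want it to refresh on its own, something along these lines works (the pool name is a placeholder):

    sudo watch -n 60 "zpool status tank | grep -A1 expand:"    # rate, % done and ETA, once a minute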

I am running a RAIDz expansion (says it’s going to take 10 days!) and I also had poor UI experiences.

First, the job sat all day at 25%. It happened that I needed to restart the server, and since I had no idea how long this was going to take I just bit the bullet and did it.

When it came back the drive showed as unavailable. Ok, my server is kind of weird, I can accept that.

Upon rebooting again, all drives were visible. The pool showed the same capacity as before, but with all 5 drives. There was no indication that the expansion was still happening.

I confirmed via the shell that the expansion is in fact still happening, though it is going to take a very long time >_>

Per the discussion below, Scrubs should be prevented until RAID-Zx expansion has completed.

Further, any Scrub in progress should probably prevent RAID-Zx expansion unless the user agrees to:

There is a Scrub in progress; starting RAID-Zx expansion would stop it.

  • Scrubs on this pool will be prevented until RAID-Zx expansion is done.
  • A Scrub will automatically be run after the RAID-Zx expansion completes.

Okay?
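In middleware terms that could start as a simple pre-flight check, something like this sketch (the pool name and the exact status wording are assumptions; in current OpenZFS the expansion itself is started with zpool attach):

    # Hypothetical pre-flight check before kicking off a RAID-Zx expansion:
    if zpool status tank | grep -q "scrub in progress"; then
        echo "A scrub is running on tank - ask the user before stopping it and continuing."
    fi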


Here is the discussion:

I’m a TrueNAS newb and a LONG-time Synology user. My build has been a work in progress. Today I added a 2 TB drive to my existing 4-drive vdev. After an hour I get the following:

  pool: p1
 state: ONLINE
expand: expansion of raidz1-0 in progress since Tue Mar  4 18:38:45 2025
	373G / 1.26T copied at 124M/s, 28.90% done, 02:06:43 to go
config:

	NAME                                      STATE     READ WRITE CKSUM
	p1                                        ONLINE       0     0     0
	  raidz1-0                                ONLINE       0     0     0
	    3ea502fa-c4d0-4c55-a33f-3c865b0b8a68  ONLINE       0     0     0
	    f439046e-5a86-402c-b098-ef71721c0101  ONLINE       0     0     0
	    ad7dc358-6e22-4f70-98d1-f4134fb850b6  ONLINE       0     0     0
	    fb73c4f5-7531-41e4-9983-d738a42f5bea  ONLINE       0     0     0
	    e52390cc-a664-4b22-b27d-5d9ffd0334c9  ONLINE       0     0     0
	logs
	  9f9c3599-40c2-4c0f-b4ca-5b23a8ba8dcc    ONLINE       0     0     0
	cache
	  0a9dd1f0-b579-4c40-81f5-bb88ed209e07    ONLINE       0     0     0

errors: No known data errors

I agree with most of the comments about improving the UI; TrueNAS could use a lot of improvement. Most operations have been painful or confusing coming over from DSM, and virtualization is especially troublesome. Either the shares on this pool are offline during expansion, or my first attempt at creating one Samba share and one Time Machine share has failed miserably: they are not connecting from my Mac (mini on latest Sequoia).

What I will say on virtualization is that my Synology stopped working with any Red Hat newer than version 8, seemingly because they dropped kernel support for the old Atom processor. TrueNAS on my 3-year-old i7/64 GB IS running Rocky 9, after a LOT of fiddling with settings to find a combination that worked.

With regard to my failed shares, it seems both Time Machine and SMB shares will not work unless the execute bit is set. Weird.
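For what it’s worth, the execute bit on a directory is what allows it to be traversed, so if the dataset was created without it, something like this on the mountpoint is the usual fix (hypothetical path; on TrueNAS you would normally do this through the dataset’s ACL/permissions editor instead):

    chmod o+x /mnt/p1/timemachine    # let other users traverse the share's directory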

Your p1 pool will lose all its data if your LOGS device goes bad. You usually want the LOGS to have at least the redundancy of the main pool. You should have a mirror pair.
EDIT
I thought the LOGS listing was not a SLOG.

You also should check your hard drives to make sure they are CMR and not SMR.
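A quick way to pull the model numbers so you can check them against the manufacturers’ CMR/SMR tables (there is no flag that reports drive-managed SMR directly):

    lsblk -d -o NAME,MODEL,SIZE    # look up each model on Seagate's / WD's CMR-vs-SMR lists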

BASICS

iX Systems pool layout whitepaper

Special VDEV (sVDEV) Planning, Sizing, and Considerations

SMR vs CMR ServeTheHome

What? Neither SLOG nor L2ARC is essential to the pool; the data will be just fine (with the exception of anything in flight during an unexpected loss of power) with failure of either or both of them. There’s not likely to be any good reason for SLOG with a parity RAID pool, but it still isn’t dangerous.

There’s no need for SLOG to be redundant. Data is still written from RAM, not from the copy in SLOG. If the NAS has an unclean shutdown AND the SLOG fails on reboot, then you possibly lose data from the last txgs—pool can still be imported by discarding transactions and/or using -m. If the SLOG fails but the NAS keeps running, ZFS will revert to its default ZIL: No data is lost, just some performance.
And the ZIL is a non-redundant stripe across pool drives.

The actual question is whether a raidz1 of smallish drives needs a SLOG and L2ARC to begin with… I do not know of a workload which requires sync writes and for which raidz1 is an appropriate geometry.
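If the answer is that they are not needed, both devices can be removed from the pool online (standard zpool remove; the device names are the ones shown in the zpool status above):

    zpool remove p1 9f9c3599-40c2-4c0f-b4ca-5b23a8ba8dcc    # the log device
    zpool remove p1 0a9dd1f0-b579-4c40-81f5-bb88ed209e07    # the cache device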

My drives are Seagate IronWolf (not Pro), the green ones.

Apple Timemachine Backups :wink: