"Force Resilver Now" for parallel resilver

Problem/Justification

When multiple disks are replaced, ZFS may resilver them sequentially, but it is often preferable to resilver them in parallel.

The only way to do this currently is to issue a zpool resilver <pool> command from the shell.

This was discussed in this thread

The zpool status output below shows a replacement that is waiting on a previous resilver operation:

# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Oct 24 14:10:40 2024
	1.84T / 20.9T scanned at 13.8G/s, 0B / 19.4T issued
	0B resilvered, 0.00% done, no estimated completion time
config:

	NAME                                        STATE     READ WRITE CKSUM
	tank                                        ONLINE       0     0     0
	  mirror-0                                  ONLINE       0     0     0
	    1b2a34e0-e57c-4a5b-9033-ca0fe03f51a3    ONLINE       0     0     0
	    26609e36-6148-48b2-9c12-309a385d17a6    ONLINE       0     0     0
	  mirror-1                                  ONLINE       0     0     0
	    a167e02d-4382-4b04-9010-31036b3a29b5    ONLINE       0     0     0
	    37d7938e-d4dd-4e94-8005-7a4358c4ee17    ONLINE       0     0     0
	  mirror-2                                  ONLINE       0     0     0
	    85648b65-c328-4e25-b243-2cc2b9a6952e    ONLINE       0     0     0
	    46b01e54-2d59-4eb4-97a3-3629d6fcd48b    ONLINE       0     0     0
	  mirror-3                                  ONLINE       0     0     0
	    replacing-0                             ONLINE       0     0     0
	      3048fff0-c7b0-4c84-98b4-b9b0d19f3999  ONLINE       0     0     0
	      b9bef6ea-b826-4e5c-80cf-d18c8ff930ba  ONLINE       0     0     0
	    aa20557c-b462-4129-a45c-a6f859a4bbdd    ONLINE       0     0     0
	  mirror-4                                  ONLINE       0     0     0
	    replacing-0                             ONLINE       0     0     0
	      28ccfe09-ff3d-4095-9e55-7c3686dc18ef  ONLINE       0     0     0
	      a2a37f56-b979-4658-bf02-6ec89f90d1fb  ONLINE       0     0     0  (awaiting resilver)
	    8038480a-73f3-43d6-b4af-2601fcb7cf86    ONLINE       0     0     0
	  mirror-5                                  ONLINE       0     0     0
	    8b6c50d7-8227-4175-a41b-25bf3f8f49e7    ONLINE       0     0     0
	    sdk2                                    ONLINE       0     0     0
	  mirror-6                                  ONLINE       0     0     0
	    cb39e5a9-1c39-4a74-96d7-662644a4df9f    ONLINE       0     0     0
	    sdu2                                    ONLINE       0     0     0
	  mirror-7                                  ONLINE       0     0     0
	    sdg2                                    ONLINE       0     0     0
	    1c11a1f1-678a-4b7c-a2ab-0feaec78acb2    ONLINE       0     0     0
	  mirror-8                                  ONLINE       0     0     0
	    1f18dc41-5935-4059-823b-bf0874742deb    ONLINE       0     0     0
	    sdi2                                    ONLINE       0     0     0
	  mirror-15                                 ONLINE       0     0     0
	    2de3cf07-af92-4ba6-ba1d-b5da5f5316de    ONLINE       0     0     0
	    bf819af9-1504-4d59-bc4a-c65606c0f0fd    ONLINE       0     0     0
	logs	
	  nvme0n1p1                                 ONLINE       0     0     0
	cache
	  sdw1                                      ONLINE       0     0     0
	  sdx1                                      ONLINE       0     0     0

errors: No known data errors

The key point is that the second disk replacement waits for the first replacement to complete before it starts, even though the second disk is in a separate vdev.
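
As a rough sketch of how that state could be detected from a script, the helper below scans zpool status text for the "(awaiting resilver)" marker shown in the output above. The function name deferred_resilver_devices is hypothetical, and the sketch assumes the marker text and column layout match that output.

```shell
# Hypothetical helper: reads `zpool status` text on stdin and prints the
# name (first column) of every device flagged "(awaiting resilver)".
deferred_resilver_devices() {
    awk '/\(awaiting resilver\)/ { print $1 }'
}

# Usage: zpool status tank | deferred_resilver_devices
```

An empty result means no replacement is currently deferred.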

ZFS does this to avoid restarting the initial resilver, which may have been running for hours or days… but of course, in this case it's only been running for a minute or so…

Additionally, TrueNAS gets confused when trying to figure out the ETA in this situation…

Issuing a manual resilver command triggers the parallel resilver:

# zpool resilver tank
# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Oct 24 14:18:48 2024
	3.47T / 20.9T scanned at 27.6G/s, 0B / 18.3T issued
	0B resilvered, 0.00% done, no estimated completion time
config:

	NAME                                        STATE     READ WRITE CKSUM
	tank                                        ONLINE       0     0     0
	  mirror-0                                  ONLINE       0     0     0
	    1b2a34e0-e57c-4a5b-9033-ca0fe03f51a3    ONLINE       0     0     0
	    26609e36-6148-48b2-9c12-309a385d17a6    ONLINE       0     0     0
	  mirror-1                                  ONLINE       0     0     0
	    a167e02d-4382-4b04-9010-31036b3a29b5    ONLINE       0     0     0
	    37d7938e-d4dd-4e94-8005-7a4358c4ee17    ONLINE       0     0     0
	  mirror-2                                  ONLINE       0     0     0
	    85648b65-c328-4e25-b243-2cc2b9a6952e    ONLINE       0     0     0
	    46b01e54-2d59-4eb4-97a3-3629d6fcd48b    ONLINE       0     0     0
	  mirror-3                                  ONLINE       0     0     0
	    replacing-0                             ONLINE       0     0     0
	      3048fff0-c7b0-4c84-98b4-b9b0d19f3999  ONLINE       0     0     0
	      b9bef6ea-b826-4e5c-80cf-d18c8ff930ba  ONLINE       0     0     0
	    aa20557c-b462-4129-a45c-a6f859a4bbdd    ONLINE       0     0     0
	  mirror-4                                  ONLINE       0     0     0
	    replacing-0                             ONLINE       0     0     0
	      28ccfe09-ff3d-4095-9e55-7c3686dc18ef  ONLINE       0     0     0
	      a2a37f56-b979-4658-bf02-6ec89f90d1fb  ONLINE       0     0     0
	    8038480a-73f3-43d6-b4af-2601fcb7cf86    ONLINE       0     0     0
	  mirror-5                                  ONLINE       0     0     0
	    8b6c50d7-8227-4175-a41b-25bf3f8f49e7    ONLINE       0     0     0
	    sdk2                                    ONLINE       0     0     0
	  mirror-6                                  ONLINE       0     0     0
	    cb39e5a9-1c39-4a74-96d7-662644a4df9f    ONLINE       0     0     0
	    sdu2                                    ONLINE       0     0     0
	  mirror-7                                  ONLINE       0     0     0
	    sdg2                                    ONLINE       0     0     0
	    1c11a1f1-678a-4b7c-a2ab-0feaec78acb2    ONLINE       0     0     0
	  mirror-8                                  ONLINE       0     0     0
	    1f18dc41-5935-4059-823b-bf0874742deb    ONLINE       0     0     0
	    sdi2                                    ONLINE       0     0     0
	  mirror-15                                 ONLINE       0     0     0
	    2de3cf07-af92-4ba6-ba1d-b5da5f5316de    ONLINE       0     0     0
	    bf819af9-1504-4d59-bc4a-c65606c0f0fd    ONLINE       0     0     0
	logs	
	  nvme0n1p1                                 ONLINE       0     0     0
	cache
	  sdw1                                      ONLINE       0     0     0
	  sdx1                                      ONLINE       0     0     0

errors: No known data errors

Notice that the second replacement is no longer awaiting a resilver.
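
Until there is a button for this, the manual step above could be wrapped in a small script. This is only a sketch: the function name resilver_now_if_deferred is hypothetical, and it assumes the "(awaiting resilver)" marker appears in zpool status exactly as in the first output. It uses only the zpool status and zpool resilver commands already shown.

```shell
# Hypothetical helper: start the deferred resilvers on a pool, but only
# when `zpool status` reports at least one device awaiting resilver.
resilver_now_if_deferred() {
    pool=$1
    if zpool status "$pool" | grep -qF '(awaiting resilver)'; then
        # Restarts the running resilver so it also covers the deferred devices.
        zpool resilver "$pool"
    else
        echo "no deferred resilver on $pool"
    fi
}

# Usage: resilver_now_if_deferred tank
```

Checking first avoids needlessly restarting a resilver that already covers every replaced disk.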

Ideally, the “Scrub” button, which is disabled during a “resilver” operation, could be overloaded to “force resilver”. Alternatively, when a device is “awaiting resilver”, a button to start all resilvers could appear.

I find that wanting multiple resilvers running at once is fairly common, especially when upgrading disks…

Perhaps wording such as “Additional resilvers deferred” with a “resilver now” option would work well.