VDEV Expand Operation Succeeded, but disk is now missing

Hello,

I currently have a data pool with a single vdev that is 1x RAIDZ2, 17 wide, with 8TB (7.28 TiB) disks. I am attempting to use the new “expand” feature, moving my 16 wide vdev to 17 wide. I will note the new disks are 10TB, and I accept that I will be “wasting” some of that space, but they were cheaper for me to get locally. I intend to add 4 more disks if I can get this method to work, bringing my storage to a 20 wide vdev.

I added a disk and expanded. Upon completion the vdev showed degraded. Investigating further, the new disk was showing a status of “UNAVAIL” with ZFS errors of “No errors”. I attempted a scrub, a SMART scan, and a reboot; no matter what, it showed no errors but was still UNAVAIL and the vdev degraded. Since the device name was showing a long numeric string instead of the typical device location, I assumed I just had a bad disk, so I removed the “missing” disk, added another, and used a replace operation. I performed the replace through the UI, replacing the missing disk. After the process and a subsequent scrub finished, I now see a nested “Replacing” section in the vdev, which shows both the replaced disk AND the new disk as unavailable, with no errors. If I select “replace” in the GUI, I can see the member disk I just added listed as an available disk with the tag sdr (9.1 TiB). It appears that after the resilver completed, the disk was “forgotten” by the OS for some reason.
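For anyone hitting the same symptom: when a member shows up as a long numeric GUID instead of a device name, it is worth mapping the GUID back to hardware before pulling a disk. A rough sketch, assuming the pool is named datapool (the flags below are standard OpenZFS, but verify on your version):

```shell
# Print the pool layout with member GUIDs, then again with full device
# paths, so the UNAVAIL GUID can be lined up against real devices.
zpool status -g datapool
zpool status -P datapool

# Cross-check which disks the OS actually sees right now, with serials,
# to distinguish a disk that dropped off the bus from one that died.
lsblk -o NAME,SIZE,SERIAL,MODEL
ls -l /dev/disk/by-id/
```

If the serial of the "missing" disk still appears in lsblk, the problem is more likely cabling/backplane or a stale label than a dead drive.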

Last Scan: Finished Resilver on 2025-08-11 00:04:23
Last Scan Errors: 0
Last Scan Duration: 2 days 1 hour 59 minutes 43 seconds

I will also note that the pool does report the increased space is now present, even though the added disk is “missing”. Also, I am not currently using a metadata, log, cache, spare, or dedup vdev. I could add one if anyone feels it would help with this design, as I have spare SSD and NVMe drives ready to go; I just haven’t messed around with it.
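On the space question, the per-vdev view makes it easier to see exactly what the pool picked up from the expansion; a sketch, assuming the pool is named datapool:

```shell
# Per-vdev breakdown of size, allocated, free, and expandable space.
# The EXPANDSZ column shows capacity a member has that the vdev is not
# yet using (e.g. the extra ~2 TB on each 10 TB drive in a 8 TB-sized vdev).
zpool list -v datapool
```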

My largest concern at the moment is getting the pool healthy and expanding my storage. I have no idea what to try at this point. Any advice would be appreciated.

I would suggest that a 20 wide Z1 vdev is umm brave, very brave of you.

Not that that helps you in the circumstances.

Can you post your hardware please - especially how the disks are connected to the motherboard, and the PSU?

Thank you for your help. It is a RAIDZ2, so two disks of parity; that seems fine? I also back it up to a separate NAS.

That said, I am running a Supermicro expansion bay via HBA, attached to an old Supermicro 2U server.

TrueNAS Scale ElectricEel-24.10.0
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz x2
313.6 GiB (320GB) total available (ECC)
Free: 82.5 GiB
ZFS Cache: 218.2 GiB
Services: 13 GiB
2x connected 10 GB Ethernet
12 bay Supermicro chassis
HBA Card attached to a 16 bay Supermicro expansion
DataPool is a RAIDZ2 | 17 wide | 7.28 TiB
The pool is an intermix of various Exos and WD Red drives.
ssdpool is used for apps and vms and is a RAIDZ1 | 8 wide | 931.51 GiB and backs up to the datapool
2x NVMe mirrored boot pool
Data pool replicates to secondary NAS.

I don’t run any Metadata vdevs, log vdevs, cache vdevs, spare vdevs, or dedup vdevs.

Looking at my details again, I suppose I am not on the latest release. Does anyone know if upgrading to Fangtooth might help with my situation? Again, any other thoughts are appreciated.

So, I dropped the old disk that was showing missing. I used the CLI to drop the disk I had just replaced, which was also showing unavailable, then reformatted that disk and started a new replace operation with the following two commands:

zpool labelclear -f /dev/sdr1
zpool replace -f datapool 8306642534797259757 /dev/sdr1
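For anyone following along, the resulting resilver can be watched without re-running status by hand; a sketch, again assuming the pool name datapool:

```shell
# Refresh the scan section of 'zpool status' every minute.
watch -n 60 'zpool status datapool | sed -n "/scan:/,/config:/p"'

# Or simply block until the resilver completes.
zpool wait -t resilver datapool
```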

Checking on my progress for the new resilver, I saw something unexpected.

zpool status datapool | sed -n '/scan:/,/config:/p'

scan: resilver in progress since Mon Aug 11 20:15:14 2025
3.23T / 105T scanned at 19.0G/s, 0B / 105T issued
0B resilvered, 0.00% done, no estimated completion time
expand: expansion of raidz2-0 in progress since Sun Aug 3 14:53:32 2025
5.82T / 105T copied at 8.58M/s, 5.54% done, paused for resilver or clear
config:

It looks like the expansion I started back on the 3rd never actually finished, even though the UI implied it had. I will let this run for a while and keep an eye on it. The previous resilver took two days.
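As an aside, the paused-expansion state is easy to miss in the UI but trivial to detect from the status text; a minimal sketch in shell, run here against the exact scan section captured above:

```shell
# Scan section captured from 'zpool status datapool' above, as sample input.
status='scan: resilver in progress since Mon Aug 11 20:15:14 2025
expand: expansion of raidz2-0 in progress since Sun Aug 3 14:53:32 2025
5.82T / 105T copied at 8.58M/s, 5.54% done, paused for resilver or clear'

# An expansion is still outstanding if the "expand:" line says "in progress".
if printf '%s\n' "$status" | grep -q '^expand:.*in progress'; then
    echo "expansion still pending"
fi
```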

Again, any support or feedback from anyone is highly appreciated.

Yes, RAID-Zx vDev expansion is a bit strange, and certain things don’t work ideally. No data loss issues that I know of, just an imperfect user interface.

The RAID-Zx vDev expansion will automatically pause an expansion on disk failure, until the disk is fully replaced. (Or cleared as it says…)
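For reference, that replace-then-resume cycle can also be driven from the CLI instead of watching the UI; a sketch, assuming an OpenZFS build new enough to support the raidz_expand wait activity (2.3-era, as shipped with RAIDZ expansion):

```shell
# Block until the active resilver (the replace) finishes...
zpool wait -t resilver datapool
# ...then until the automatically resumed raidz expansion finishes.
zpool wait -t raidz_expand datapool
```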

Further, a pool scrub is always run after the expansion phase, because of the methodology used to speed up the expansion.

Ty. Here’s hoping that it completes this time around. I have about another day left on the resilver.