I added another drive to expand my VDEV a few days ago, but it’s stuck…
sudo zpool status -v
pool: archive
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: resilvered 102M in 00:00:06 with 0 errors on Mon Jan 20 06:35:28 2025
expand: expansion of raidz3-0 in progress since Thu Jan 16 15:27:00 2025 4.22T / 6.52T copied at 14.1M/s, 64.76% done, paused for resilver or clear
config:
NAME STATE READ WRITE CKSUM
archive ONLINE 0 0 0
raidz3-0 ONLINE 0 0 0
004cd7d6-04a5-4ada-a490-04612cd362f9 ONLINE 0 0 0
44815e96-698c-4f4f-90a5-43a568988da0 ONLINE 0 0 0
7822b77f-e4f0-4769-a016-1c32bdc7404a ONLINE 257 0 0 too many errors
f87745a5-f59c-44ad-90c2-2d9dd6f8b493 ONLINE 0 0 0
6383d27f-ff64-4896-8480-50a2e1625097 ONLINE 0 0 0
56342a97-5208-4adb-b080-107d400b4939 ONLINE 0 0 0
After running
sudo zpool clear archive
the "too many errors" message was cleared, but the status is still stuck saying “paused for resilver or clear”.
The progress does not seem to be moving, but when I look at the UI Reports, there is constant CPU and DISK activity on all the disks in that VDEV.
Does anyone have any insight into what is going on here?
How can I get this expansion to unpause and complete?
If I shut down the system to add another spare disk, will this corrupt the expansion?
Okay, rebooting the system seemed to help??? Not what I expected.
I guess it is safe to restart TrueNAS. And sometimes necessary.
pool: archive
state: ONLINE
scan: resilvered 2.58M in 00:00:01 with 0 errors on Mon Jan 20 16:58:21 2025
expand: expansion of raidz3-0 in progress since Thu Jan 16 15:27:00 2025
4.33T / 6.52T copied at 12.9M/s, 66.41% done, 2 days 01:24:19 to go
config:
I also added three more 8 TB drives, so I plan to replace all the 4 TB drives in the ‘archive’ VDEV with 8 TB drives, and expand it with 3 more, for a total of eight 8TB drives.
So, it looks like expanding my VDEV requires about four days.
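As a rough sanity check on the "to go" figure, the remaining time can be recomputed from the numbers in the expand line. This is only a sketch: `expansion_eta` is a hypothetical helper name, and it assumes the `T` and `M` in the status output are binary units (TiB and MiB).

```shell
#!/bin/sh
# Hypothetical helper: estimate remaining expansion time from zpool status figures.
# Arguments: copied (TiB), total (TiB), copy rate (MiB/s).
# Assumption: zpool status reports binary units (1 TiB = 1024 * 1024 MiB).
expansion_eta() {
    awk -v copied="$1" -v total="$2" -v rate="$3" 'BEGIN {
        remaining_mib = (total - copied) * 1024 * 1024   # TiB -> MiB
        secs = int(remaining_mib / rate)                 # whole seconds left
        printf "%d days %02d:%02d:%02d\n",
            int(secs / 86400), int(secs % 86400 / 3600),
            int(secs % 3600 / 60), secs % 60
    }'
}

# Figures from the status output after the reboot.
expansion_eta 4.33 6.52 12.9
```

With the figures above this prints `2 days 01:26:54`, which is within a few minutes of the `2 days 01:24:19` that zpool reported. The small gap is expected, since the displayed sizes and rate are rounded.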
If the disk having trouble:
7822b77f-e4f0-4769-a016-1c32bdc7404a ONLINE 257 0 0 too many errors
is one of the 4 TB drives you intend to replace with an 8 TB drive, you might have been better served by replacing it instead of rebooting.
The issue is that if the drive has problems, they may come back, which would pause your RAID-Zx expansion again.
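One way to keep an eye on a drive like that is to filter the status output for devices with non-zero error counters. The following is a minimal sketch, not an official tool: `devices_with_errors` is a hypothetical helper, and it assumes the usual `NAME STATE READ WRITE CKSUM` column layout shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print the name of
# every device whose READ, WRITE or CKSUM counter is non-zero.
# Assumption: columns are NAME STATE READ WRITE CKSUM, as in the output above.
devices_with_errors() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($3 + $4 + $5) > 0 { print $1 }'
}

# Typical use on a live system (pool name taken from this thread):
#   sudo zpool status archive | devices_with_errors
```

Run against the first status output in this thread, it should print only `7822b77f-e4f0-4769-a016-1c32bdc7404a`.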
Regarding rebooting a server with ZFS: the original design of ZFS was to be always consistent on disk, even after an OS crash, unexpected power loss, or a user hitting the power button. So a simple reboot mostly does what it should, which is to resume what it was doing before (scrub, disk replacement, RAID-Zx expansion, etc.).
Obviously, any data that had not finished writing would be lost on those unexpected reboots, just like with any other file system. However, zero loss of existing data is not just expected of ZFS; it was designed into ZFS.
It still should not have been necessary to restart TrueNAS. However, RAID-Zx expansion is still new, so it is possible there is a bug in the restarting of the expansion.
Or it could be that the disk with the errors had to be resilvered, meaning brought up to date. This happens when a disk is offline or faulted and then brought back online. The status message sort of indicates that is what happened.
The disk failed again, and this time I was not able to recover the expansion as before. Whether it was black magic or blind luck the first time, I tried several times… The expansion was almost finished.
However, I was able to replace the failed disk with a larger healthier one, and the expansion completed.
After that, I replaced two more 4 TB disks with 8 TB disks, and after all the resilvering, the VDEV has twice the capacity it had before. I have two more 8 TB disks to expand the VDEV with.
So, the expansion process needs improvement, but replacement and resilvering work great.
@Arwen I really appreciate your help and insights. Thanks.