Paused for resilver or clear

Eric_Kolotyluk · January 20, 2025, 3:06pm

I am using TrueNAS Scale ElectricEel-24.10.1

I added another drive to expand my VDEV a few days ago, but it’s stuck…

sudo zpool status -v

  pool: archive
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 102M in 00:00:06 with 0 errors on Mon Jan 20 06:35:28 2025
expand: expansion of raidz3-0 in progress since Thu Jan 16 15:27:00 2025 4.22T / 6.52T copied at 14.1M/s, 64.76% done, paused for resilver or clear

config:

   NAME                                      STATE     READ WRITE CKSUM
   archive                                   ONLINE       0     0     0
     raidz3-0                                ONLINE       0     0     0
       004cd7d6-04a5-4ada-a490-04612cd362f9  ONLINE       0     0     0
       44815e96-698c-4f4f-90a5-43a568988da0  ONLINE       0     0     0
       7822b77f-e4f0-4769-a016-1c32bdc7404a  ONLINE     257     0     0    too many errors
       f87745a5-f59c-44ad-90c2-2d9dd6f8b493  ONLINE       0     0     0
       6383d27f-ff64-4896-8480-50a2e1625097  ONLINE       0     0     0
       56342a97-5208-4adb-b080-107d400b4939  ONLINE       0     0     0

After doing

sudo clear archive

It cleared the message “too many errors”, but the expansion is still stuck saying “paused for resilver or clear”.

The progress does not seem to be moving, but when I look at the UI Reports, there is constant CPU and DISK activity on all the disks in that VDEV.

Does anyone have any insight into what is going on here?

How can I get this expansion to unpause and complete?

If I shut down the system to add another spare disk, will this corrupt the expansion?

Arwen · January 21, 2025, 1:15am

One of your disks is having trouble. You need to fix that, or clear the error, before RAID-Zx expansion will continue. This more or less covers it:

action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.

However, that is the ZFS command line. You can do the replacement from the GUI, and possibly the pool clear as well.
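For anyone who wants to spot the troubled disk without reading the table by eye, here is a minimal sketch that scans pasted `zpool status` text for rows with non-zero READ/WRITE/CKSUM counters. The function name `devices_with_errors` and the sample constant are made up for illustration, and it assumes the counters are printed as plain integers, as in the output above.

```python
import re

# Sample rows copied from the `zpool status -v` output earlier in this thread.
STATUS = """\
   NAME                                      STATE     READ WRITE CKSUM
   archive                                   ONLINE       0     0     0
     raidz3-0                                ONLINE       0     0     0
       004cd7d6-04a5-4ada-a490-04612cd362f9  ONLINE       0     0     0
       7822b77f-e4f0-4769-a016-1c32bdc7404a  ONLINE     257     0     0    too many errors
"""

# Device name, state, then three integer counters; any trailing note is ignored.
# Assumption: counters are plain integers (no abbreviated values such as "1.2K").
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    bad = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header row or a line that is not part of the table
        name, _state, read, write, cksum = m.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            bad.append((name, *counts))
    return bad


for name, read, write, cksum in devices_with_errors(STATUS):
    print(f"{name}: READ={read} WRITE={write} CKSUM={cksum}")
```

This only reads text; it does not run `zpool` itself, so it is safe to try on a copy of the output.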

Eric_Kolotyluk · January 21, 2025, 1:20am

Okay, rebooting the system seemed to help??? Not what I expected.

I guess it is safe to restart TrueNAS. And sometimes necessary.

  pool: archive
 state: ONLINE
  scan: resilvered 2.58M in 00:00:01 with 0 errors on Mon Jan 20 16:58:21 2025
expand: expansion of raidz3-0 in progress since Thu Jan 16 15:27:00 2025
        4.33T / 6.52T copied at 12.9M/s, 66.41% done, 2 days 01:24:19 to go
config:

    NAME                                      STATE     READ WRITE CKSUM
    archive                                   ONLINE       0     0     0
      raidz3-0                                ONLINE       0     0     0
        004cd7d6-04a5-4ada-a490-04612cd362f9  ONLINE       0     0     0
        44815e96-698c-4f4f-90a5-43a568988da0  ONLINE       0     0     0
        7822b77f-e4f0-4769-a016-1c32bdc7404a  ONLINE       0     0     0
        f87745a5-f59c-44ad-90c2-2d9dd6f8b493  ONLINE       0     0     0
        6383d27f-ff64-4896-8480-50a2e1625097  ONLINE       0     0     0
        56342a97-5208-4adb-b080-107d400b4939  ONLINE       0     0     0

errors: No known data errors

I also added three more 8 TB drives, so I plan to replace all the 4 TB drives in the ‘archive’ VDEV with 8 TB drives, and expand it with 3 more, for a total of eight 8TB drives.

So, it looks like expanding my VDEV requires about four days.
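As a rough cross-check of the "to go" figure, here is a small sketch that recomputes the remaining time from the copied/total/rate numbers in the expand line. The helper names are hypothetical, and it assumes the T and M suffixes are binary units (powers of 1024). Because the printed figures are rounded, the result only approximately matches what `zpool status` reports.

```python
import re

# Assumption: the size suffixes are binary multiples (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a figure such as '4.33T' or '12.9M' to a byte count."""
    return float(text[:-1]) * UNITS[text[-1]]


def remaining_seconds(line):
    """Estimate seconds left from 'X / Y copied at Z/s' in an expand line."""
    m = re.search(r"([\d.]+[KMGT]) / ([\d.]+[KMGT]) copied at ([\d.]+[KMGT])/s", line)
    if m is None:
        raise ValueError("no 'copied at' figures found")
    copied, total, rate = (to_bytes(g) for g in m.groups())
    return (total - copied) / rate


# Line taken from the status output above.
LINE = "4.33T / 6.52T copied at 12.9M/s, 66.41% done, 2 days 01:24:19 to go"
secs = remaining_seconds(LINE)
print(f"about {secs / 86400:.1f} days remaining")
```

The estimate assumes the copy rate stays constant, which it will not if the expansion pauses again.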

Arwen · January 21, 2025, 1:30am

If the disk having trouble:
7822b77f-e4f0-4769-a016-1c32bdc7404a ONLINE 257 0 0 too many errors
is one of the 4TB drives you intend to replace with an 8TB drive, you might have been better served by replacing it, instead of rebooting.

The issue is that if the drive has problems, they may come back, which would again pause your RAID-Zx expansion.
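If you want to keep an eye on whether the expansion has paused again, a sketch like the following could check the `expand:` line from saved `zpool status` output. The function name `expansion_state` is invented for illustration, and the "paused" wording is simply the string seen in the first post; other versions might word it differently.

```python
import re

# The expand line as it appeared in the first post of this thread.
EXPAND_LINE = (
    "expand: expansion of raidz3-0 in progress since Thu Jan 16 15:27:00 2025 "
    "4.22T / 6.52T copied at 14.1M/s, 64.76% done, paused for resilver or clear"
)


def expansion_state(line):
    """Return (percent_done, paused) parsed from an 'expand:' status line."""
    m = re.search(r"([\d.]+)% done", line)
    if m is None:
        raise ValueError("no progress figure found")
    # Assumption: a paused expansion is reported with this exact phrase.
    paused = "paused for resilver or clear" in line
    return float(m.group(1)), paused


percent, paused = expansion_state(EXPAND_LINE)
print(f"{percent}% done, paused={paused}")
```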

Regarding rebooting a server with ZFS: ZFS was originally designed to be always consistent on disk, even after an OS crash, an unexpected power loss, or a user hitting the power button. So a simple reboot mostly does what it should: it resumes whatever was in progress before (scrub, disk replacement, RAID-Zx expansion, etc.).

Obviously, any data that had not finished writing would be lost in those unexpected reboots, just as with any other file system. However, zero loss of existing data is not just expected of ZFS; it was designed into ZFS.

Eric_Kolotyluk · January 21, 2025, 1:30am

Interestingly enough, I did not have to replace the drive; I just restarted the system.

Could it be that after a zpool clear it’s necessary to restart TrueNAS?

Arwen · January 21, 2025, 1:33am

No.

The below shows you did not clear the ZFS error, just something else.

As I showed, the correct command is:

sudo zpool clear archive

Eric_Kolotyluk · January 21, 2025, 2:05am

That is the command I used, but there was no way to go back and edit my post.

Arwen · January 21, 2025, 6:57am

Uh, okay.

It was still not necessary to restart TrueNAS. However, RAID-Zx expansion is still new, so it is possible there is a bug in the restarting of the expansion.

Or it could be that the disk with the errors had to be resilvered, meaning brought up to date. This happens when a disk is offline or faulted and then brought back online. This message sort of indicates that is what happened.

Eric_Kolotyluk · January 21, 2025, 8:47pm

The disk failed again, but I was not able to recover the expansion as before. Black magic or blind luck, I tried several times… The expansion was almost finished.

However, I was able to replace the failed disk with a larger healthier one, and the expansion completed.

After that, I replaced two more 4 TB disks with 8 TB ones, and after all the resilvering, the VDEV has twice the capacity it had before. I have two more 8 TB disks to expand the VDEV with.

So, the expansion process needs improvement, but replacement and resilvering work great.

@Arwen I really appreciate your help and insights. Thanks.