Unable to add new drive on existing pool

Hi everyone,

Before seeking help from ChatGPT, I thought I’d ask here to avoid messing with my pool. :sweat_smile:

Following that post, I’m trying to replace a drive in a 3-drive RAIDZ1 pool.

Here’s what I’ve done so far:

  1. Removed the drive with pending sectors.

  2. Wiped the new drive with the following command:

wipefs -a /dev/sdb
  3. Attempted to replace the removed disk with the new one using the TrueNAS Scale web interface.
     I checked the “Force” checkbox but encountered the following error:
2077 is not a valid Error
Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 183, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/pool_actions.py", line 131, in replace
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 534, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/pool_actions.py", line 142, in replace
    target.replace(newvdev)
  File "libzfs.pyx", line 2348, in libzfs.ZFSVdev.replace
  File "libzfs.pyx", line 663, in libzfs.ZFS.get_error
  File "/usr/lib/python3.11/enum.py", line 717, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/enum.py", line 1133, in __new__
    raise ve_exc
ValueError: 2077 is not a valid Error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 488, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 533, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 179, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 49, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/replace_disk.py", line 94, in replace
    await self.middleware.call('zfs.pool.replace', pool['name'], options['label'], new_devname)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1626, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1465, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1471, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1377, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 2077 is not a valid Error

I can’t do it from the command line either.

Can someone please help?

Thanks a lot

Why use a command? You can do this in the GUI with a “quick wipe”.


Physically removed from the case? Or only “removed” (“offlined”) from the pool’s vdev? (If the former, did you bypass the pool/vdev operations? I don’t recommend just pulling a drive from the system without first “offlining” it from the vdev.)

What does this say:

zpool status -v poolname

Are you not able to leave the “bad” drive installed, and then use the “Replace” option with the new drive also installed? Are there not enough SATA ports available to do this?
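
(For reference, the offline step can also be done from the shell before pulling the disk; this is just a rough sketch, where the pool name and the disk label are placeholders you’d take from the zpool status output:)

# take the failing member offline before physically unplugging it
zpool offline poolname <label-of-failing-disk>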


Since you’re doing a remove/replace, you must first discard the checkpoint (and also disable the daily checkpoint task). :wink:
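
(If you’re not sure whether a checkpoint is currently active, the pool’s read-only checkpoint property will tell you; “poolname” is a placeholder:)

# shows how much space the checkpoint consumes, or “-” if there is none
zpool get checkpoint poolname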

This is why I think it would be nice to integrate checkpoint management officially into TrueNAS’s GUI and middleware, so it would work in tandem with other pool operations.


More convenient, but yeah, I should have used the GUI.

In fact I tried this twice:
The first time by offlining it before physically removing it.
The second time by just removing it.

I have just 4 SATA ports: 1 for the boot drive and 3 for the pool drives, so I have no choice.

I swear I did! Once bitten, twice shy! :sweat_smile:

zpool status -v TankPrincipal

root@truenas[~]# zpool status -v poolname
cannot open 'poolname': no such pool
root@truenas[~]# zpool status -v TankPrincipal
  pool: TankPrincipal
 state: ONLINE
  scan: resilvered 1.08M in 00:00:01 with 0 errors on Sat Dec  7 13:56:40 2024
    scan warning: skipped blocks that are only referenced by the checkpoint.
checkpoint: created Sat Dec  7 11:22:34 2024, consumes 5.79M
config:

        NAME                                      STATE     READ WRITE CKSUM
        TankPrincipal                             ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            dd378696-0361-11e8-a56d-d017c288bd12  ONLINE       0     0     0
            ddfea46c-0361-11e8-a56d-d017c288bd12  ONLINE       0     0     0
            debee879-0361-11e8-a56d-d017c288bd12  ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#

I put back the bad drive :point_up:

Disable the daily task (if you have one) that creates a checkpoint.

Then zpool checkpoint -d TankPrincipal

Then try to do the replace for the vdev with the new drive.

“Offline” the failing drive in the vdev → power off (if the system doesn’t support/enable hot-swapping) → unplug the bad drive → install the new drive → power on → “Replace” with the new drive → Let it resilver

(You can go back to daily checkpoints after this is all over.)
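
(Under the hood, the GUI “Replace” roughly boils down to the sketch below; on TrueNAS it’s best to let the web UI do it so the new disk gets partitioned like the others — the label and device path here are placeholders:)

# roughly what happens behind the GUI: attach the new device in place of the old label
zpool replace TankPrincipal <label-of-failing-disk> /dev/disk/by-partuuid/<new-partition-uuid>

# then watch the resilver until it completes
zpool status -v TankPrincipal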


:warning: DISCLAIMER: Make sure you’re dealing with the correct drives. Not sure if you have stickers to label them or if you are able to see their serial numbers before unplugging.
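
(If the stickers aren’t readable, the serials can also be checked from software before shutting down; a quick sketch, with sdb only as an example device name:)

# map kernel device names to serial numbers and models
lsblk -o NAME,SERIAL,SIZE,MODEL

# or read one disk’s serial straight from SMART
smartctl -i /dev/sdb | grep -i serial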

Wow, now it’s working! So having a checkpoint set prevents you from adding new drives?

Yes, my faulty drive was sdb. So with

lsblk -o NAME,SERIAL

I checked the serial number displayed next to sdb against the label affixed to the physical drive.

Thanks @winnielinnie


It does not prevent you from adding new vdevs. You can still do that.

You cannot remove vdevs.

You cannot detach, offline, or remove drives from a vdev, either.
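
(And once the resilver has finished, you can take a fresh checkpoint again at any time:)

# create a new checkpoint of the pool’s current state
zpool checkpoint TankPrincipal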
