TrueNAS Scale 25.04.0 Upgrade disaster

Post the output of each of these in their own code brackets, in the same order.

zdb -l /dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267

zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f

zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f

You might have to use “sudo” if it says command not found or that you don’t have sufficient privileges.

admin@truenas[~]$ sudo zdb -l /dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'WizzPool'
    state: 0
    txg: 11851573
    pool_guid: 3518430912930309335
    errata: 0
    hostid: 1780220649
    hostname: 'truenas'
    top_guid: 11842154841165029957
    guid: 18340154160196067035
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 11842154841165029957
        nparity: 1
        metaslab_array: 133
        metaslab_shift: 34
        ashift: 12
        asize: 11995904212992
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 18340154160196067035
            path: '/dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267'
            whole_disk: 0
            DTL: 90826
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 15295080939741160980
            path: '/dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f'
            whole_disk: 0
            DTL: 90825
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 1389250103858833512
            path: '/dev/disk/by-partuuid/c1e813a1-9f62-4450-88e0-1a7c64def8a3'
            whole_disk: 0
            DTL: 90824
            create_txg: 4
            degraded: 1
            aux_state: 'err_exceeded'
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
admin@truenas[~]$ sudo zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'WizzPool'
    state: 0
    txg: 11856176
    pool_guid: 3518430912930309335
    errata: 0
    hostid: 1780220649
    hostname: 'truenas'
    top_guid: 11842154841165029957
    guid: 15295080939741160980
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 11842154841165029957
        nparity: 1
        metaslab_array: 133
        metaslab_shift: 34
        ashift: 12
        asize: 11995904212992
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 18340154160196067035
            path: '/dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267'
            whole_disk: 0
            DTL: 90826
            create_txg: 4
            faulted: 1
        children[1]:
            type: 'disk'
            id: 1
            guid: 15295080939741160980
            path: '/dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f'
            whole_disk: 0
            DTL: 90825
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 1389250103858833512
            path: '/dev/disk/by-partuuid/c1e813a1-9f62-4450-88e0-1a7c64def8a3'
            whole_disk: 0
            DTL: 90824
            create_txg: 4
            degraded: 1
            aux_state: 'err_exceeded'
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3 

Last one:

admin@truenas[~]$ sudo zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'WizzPool'
    state: 0
    txg: 11856176
    pool_guid: 3518430912930309335
    errata: 0
    hostid: 1780220649
    hostname: 'truenas'
    top_guid: 11842154841165029957
    guid: 15295080939741160980
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 11842154841165029957
        nparity: 1
        metaslab_array: 133
        metaslab_shift: 34
        ashift: 12
        asize: 11995904212992
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 18340154160196067035
            path: '/dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267'
            whole_disk: 0
            DTL: 90826
            create_txg: 4
            faulted: 1
        children[1]:
            type: 'disk'
            id: 1
            guid: 15295080939741160980
            path: '/dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f'
            whole_disk: 0
            DTL: 90825
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 1389250103858833512
            path: '/dev/disk/by-partuuid/c1e813a1-9f62-4450-88e0-1a7c64def8a3'
            whole_disk: 0
            DTL: 90824
            create_txg: 4
            degraded: 1
            aux_state: 'err_exceeded'
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3 

I didn’t mean in separate posts. :sweat_smile:


One of your ZFS members has a different TXG. This is similar to another user who could not import their pool, regardless of the version of SCALE or even on Core.

e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267
txg: 11851573 ← behind by 3 TXGs

02f5451d-1e09-400a-9e08-2ba73150618f
txg: 11856176

02f5451d-1e09-400a-9e08-2ba73150618f
txg: 11856176


@HoneyBadger since this is a RAIDZ1, and two of the ZFS members of the vdev share the same TXG, should an import be attempted in a degraded state for only the two that have TXG 11856176?
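
(A rough sketch of what such an attempt could look like, purely for illustration: the partuuids are placeholders, and the read-only and altroot options are added here just so nothing gets written while testing.)

# hypothetical: point the importer only at the two members that share the newer TXG,
# so the stale member is treated as missing and the RAIDZ1 comes up degraded
sudo zpool import -o readonly=on -R /mnt \
    -d /dev/disk/by-partuuid/&lt;member-with-newer-txg-1&gt; \
    -d /dev/disk/by-partuuid/&lt;member-with-newer-txg-2&gt; \
    WizzPool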

In this situation, it seems the one that lagged behind is the “odd one out”.

Why did this happen? Do you think there’s a common denominator or pattern?

It’s possible that the “more recent” TXG in the other user’s unimportable mirror pool is also the “correct” one. If they had a three-way mirror, we might have seen two of the drives sharing the more recent TXG, rather than the older one.

It was just easy to do separate replies. I can do it in one if it’s better…

It’s too late. I already replied and the issue is clear. :+1:

I meant in the future. LOL. All is good.


So, what is the solution to this?

I don’t know.

Here’s another user with the same problem, and it hasn’t yet been resolved.

Waiting to see if it’s safe for them to try to force a degraded import with the other (“more recent”) TXG.

If that works, then it might be possible to do the same with you, and then it’s a matter of resilvering the pool back into a healthy state.
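
(Sketching what that last step would roughly look like, with a placeholder for whichever disk had been left out; the exact commands would depend on how the degraded import was done.)

# bring the stale member back into the pool and let ZFS resilver it
sudo zpool online WizzPool /dev/disk/by-partuuid/&lt;stale-member-partuuid&gt;
sudo zpool status -v WizzPool   # watch resilver progress
sudo zpool scrub WizzPool       # after the resilver finishes, verify the whole pool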

What caused this in the first place? I don’t know. It was noted that four different people seemed to have a similar issue of “pool cannot import with any version of TrueNAS anymore” after upgrading/sidegrading to 24.10 or 25.04.

You and @JGordonT are two of them. The other two have moved on, one of whom wiped and reused their drives, so there’s no way to diagnose any further.

Ok. I see. Thank you for all the help so far. What do you think would happen if I just did:

sudo zpool import WizzPool

You’ll be met with an I/O error.

I suppose it’s safe to try? @HoneyBadger mentioned that ZFS tries to match as far back as 3 TXGs into the past. Your TXGs are only skewed by 3, so…

:thinking:
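
(For the record, zpool import also has a dry-run form of the rewind, -Fn, which only reports whether recovery would succeed without changing anything on disk; something like this would have been the most cautious first step.)

sudo zpool import -Fn WizzPool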

OK. Wasn’t sure.

admin@truenas[~]$ sudo zpool import WizzPool
[sudo] password for admin: 
cannot import 'WizzPool': I/O error
	Recovery is possible, but will result in some data loss.
	Returning the pool to its state as of Tue May 20 18:20:09 2025
	should correct the problem.  Approximately 4 seconds of data
	must be discarded, irreversibly.  Recovery can be attempted
	by executing 'zpool import -F WizzPool'.  A scrub of the pool
	is strongly recommended after recovery.
admin@truenas[~]$ sudo zpool import -F WizzPool
cannot mount '/WizzPool': failed to create mountpoint: Read-only file system
Import was successful, but unable to mount some datasets
admin@truenas[~]$ 
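
(A note on the “failed to create mountpoint” message: with no altroot, the datasets tried to mount at /WizzPool on SCALE’s read-only root filesystem. An import done the way the middleware does it, under /mnt, would look roughly like the lines below; shown only to explain the error, not as the next step here.)

sudo zpool export WizzPool
sudo zpool import -R /mnt WizzPool   # -R sets an altroot so datasets mount under /mnt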


Biscuits and gravy! The crazy badger was right!

Export the pool again, but don’t reimport it just yet.

Check the following three once more.

Let’s see if it did in fact rewind back to the earlier TXG of 11851573

zdb -l /dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267

zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f

zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f
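
(Equivalently, after the export, each of those labels can be filtered down to just the pool-level TXG line, e.g.:)

sudo zpool export WizzPool
sudo zdb -l /dev/disk/by-partuuid/e4bb08a2-9a5d-4a24-9b8b-3ab220bb0267 | grep -m1 'txg:'
sudo zdb -l /dev/disk/by-partuuid/02f5451d-1e09-400a-9e08-2ba73150618f | grep -m1 'txg:'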


When I went into datasets, I got this error:

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 76, in get_quota
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 534, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 78, in get_quota
    quotas = resource.userspace(quota_props)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libzfs.pyx", line 3834, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for WizzPool: dataset is busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 116, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 47, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 41, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 80, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for WizzPool
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py", line 323, in process_method_call
    result = await method.call(app, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 40, in call
    result = await self.middleware.call_with_audit(self.name, self.serviceobj, methodobj, params, app)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 883, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 692, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 174, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/dataset_quota.py", line 48, in get_quota
    quota_list = await self.middleware.call(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 977, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 700, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 706, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 612, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 596, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for WizzPool
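
(That traceback is the middleware failing to read per-user quota data from the hand-imported pool; a rough CLI equivalent of the call that failed is the query below, which can be used to check whether the dataset is still reported as busy. This assumes the pool is still imported.)

sudo zfs userspace WizzPool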

See my reply. Export the pool, but we’re checking one more thing before you get back on track.

Just do:

sudo zpool export WizzPool

???

You don’t see it as exportable in the web GUI? Use the GUI.

I didn’t look. I will try that.