SOLVED: Zpool import shows insufficient replicas and invalid label for main pool

(Screenshot: the “invalid label” entries from zpool import.)

I had these “invalid label” entries on all my truenas systems.

I noticed this after exporting the pool in order to import it at a checkpoint. When I tried to import main, it said “main” was ambiguous and I had to import by numeric ID.

I tried:
zpool labelclear -f /dev/zd....
which will clear the entry, but that is NOT recommended and it didn’t work anyway.

THE SOLUTION
So this turned out to be a misleading error message and nothing was wrong. The reason I couldn’t just import “main” is that I have a zvol containing a ZFS pool also named “main”, which caused the ambiguity and led me to run zpool import to find the correct numeric ID. If you don’t try to run truenas in truenas, you’ll never run into this problem :slight_smile:
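
For anyone who hits the same thing, the workaround is simply to import by the numeric ID that zpool import prints instead of by name (a sketch; use whatever ID your own listing shows for the pool you actually want):

zpool import                     # lists importable pools with their numeric IDs
zpool import <numeric-id>        # import by ID when the name is ambiguous
zpool import <numeric-id> main2  # or import it under a different name entirely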

Here’s what happened.

Note: Most people will never run into this.

I created a VM to install truenas on truenas (a virtualized truenas running on my truenas host). It worked flawlessly, which is great. I named the VM’s pool “main”, just like on my main system. That is where the confusion happens. Most people aren’t running truenas in a truenas VM and reusing the same pool name.

The VM itself was not the problem, though.

The problem was that the local backup copies of the zvol backing that pool show up with the message above.

My main system (called truenas) has a pool called main. The virtualized truenas also has a pool called main (stored on a zvol on the real truenas).

I have three backup systems: one is an SSD pool attached to my main system, and the other two are offsite systems to which I replicate the main pool (ZFS replication into a dataset on each backup device).

So the main truenas system was showing the SSD backup of the zvol that holds the VM’s pool, and of course the other systems had replicas of the same zvol.

So all three systems had a backup copy of the same zvol used by the VM.

This caused zpool import to give me the very misleading error message, which isn’t even documented (if you follow the ZFS-8000-EY link in the output, it doesn’t describe the error I got).

So there was absolutely nothing wrong; it was a misleading error message caused by the particular choices I made.

But I learned a lot during the process.

I undid the labelclear mistake simply by rolling back to an earlier snapshot, and I now realize I had only cleared the label on the backup copy, not the original.

Next time, I’ll look up the /dev/zd### device under /dev/zvol first to see which zvol it actually is. That would’ve saved a lot of time.
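
Something like this is all it takes (a sketch; zd208 is the device from my listing further down, so adjust for yours):

# list every zvol symlink and the zd device it points to
ls -lR /dev/zvol

# or go straight for the device that zpool import complained about
ls -lR /dev/zvol | grep -w zd208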

Why are you (forcefully) clearing the label on a disk/partition?

Are you intending to destroy the pool and wipe the disk?

If you look up man zpool labelclear, you will see related commands are destroy, detach, remove, and replace.

Why exactly are you running this command? I’m actually curious.

that is NOT my main pool.

The reason I even looked is because when I followed your instructions to import the pool, I got a message saying “main” was ambiguous and to use the numeric ID. That’s when I noticed the weirdness.

I verified that you get this on a fresh install as well.

This cannot be some ZFS hidden label, because it is ZFS itself saying “invalid label”

Is this a bug or a feature?

Because import wouldn’t let me just use “main”; it said the pool name was ambiguous. That really surprised me. It said I had to use the long numeric ID.

The question is, are you trying to wipe a drive?

Are you actually intending to destroy the pool?

Are you trying to “start fresh” with a drive that was previously part of a pool, which you want to use for something else instead?


The reason I’m asking is because zpool labelclear is a destructive action. It has nothing to do with “names”. The word “label” is akin to a partition or filesystem label. (Not a “name” label, such as a pool name.)

As an example, in the parted utility there’s an action called mklabel (“make label”). This action will destroy the entire partition table, even though the word “label” might sound as harmless as renaming a folder.
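
For instance (hypothetical disk, don’t run this on anything you care about):

# "label" here means the partition table; this wipes it and starts over with GPT
parted /dev/sdX mklabel gpt

zpool labelclear is in the same family: it erases the ZFS metadata labels on a device, not a friendly name.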

I’m not sure where on these forums you read that you should use zpool labelclear for anything. (Unless you want to wipe/destroy a disk.)

I’m trying to follow your instructions, which don’t work here because you have to specify a number, not a pool name, on the import. It simply won’t work with the name, because the import said the name was ambiguous.

zpool import showed TWO “main” pools: one was unimportable, the other was the exported pool.

So I was thinking something must be wrong with my system, but all my other truenas installs show these INVALID label entries for a main pool. What’s up with that?

My main system (main is the main pool name for all my systems):

root@truenas[/mnt/main/user/stk]# zpool import
  pool: rpool
    id: 4534497686456552008
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        rpool       ONLINE
          zd128p3   ONLINE

  pool: boot-pool
    id: 7621744736219908709
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: The pool can be imported using its name or numeric identifier.
config:

        boot-pool   ONLINE
          zd48p3    ONLINE

  pool: main
    id: 413467148470438577
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        main        UNAVAIL  insufficient replicas
          zd208     UNAVAIL  invalid label

  pool: rpool
    id: 9640816830726672797
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        rpool                                            ONLINE
          ata-SuperMicro_SSD_SMC0515D90717C6K5595-part3  ONLINE
root@truenas[/mnt/main/user/stk]#                                             

My backup system:

root@backup[/home/truenas_admin]# zpool import
  pool: main
    id: 413467148470438577
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        main        UNAVAIL  insufficient replicas
          zd128     UNAVAIL  invalid label
root@backup[/home/truenas_admin]#                        

My other backup system (you can never have too many backup systems):

root@qnap[/mnt/main/user/stk]# zpool import
  pool: main
    id: 413467148470438577
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        main        UNAVAIL  insufficient replicas
          zd80      UNAVAIL  invalid label
root@qnap[/mnt/main/user/stk]#                           

A pool name will work just fine with an import, so long as there are no conflicts.

To cover every nuance would go beyond the scope of a specific tool or guide.


Same name? Same ID? Different locations? Both having been “last accessed by another system”?

Did a hypervisor and (virtualized) TrueNAS ever attempt to import this pool at the same time?

No. The backup systems are fresh installs, one bare metal, the other on Xen. The main pool has NEVER been exported.

How do you have two “different” pools with the same exact GUID?

413467148470438577

Those are not two different pools. It is the same pool.

Probably some partition cloning shenanigans that were left out in the description.
The ZFS-8000-EY error code is telling, although typically seen when a VM is involved.

My wild guess is that the intent was to rapidly prototype something, possibly ZFS-related, and an LLM was employed to write a script to help do so. This is the fallout.

According to this, you have the same pool ID on not just two but three systems. :thinking:

My backup systems are all fresh installs, with “main” created from the GUI and no scripts being run.

How do you have a pool with the same GUID across three different hosts?

That can’t be due to random chance.

I have two backup systems and they both show this artifact. The backup systems store a full filesystem backup of my main pool.

I have another truenas install running under Proxmox, and when I checked there, zpool import shows nothing. I never used that one to back up my main system.

So I think this must have something to do with the backup systems being recipients of a full backup (via replication) of my main pool, given the hint that the pool was last accessed by another system. Could that be my main truenas host, which backs up to these systems (and is why it says “The pool was last accessed by another system”)?

On my main truenas system:

root@truenas[~]# zpool import 413467148470438577
cannot import 'main': pool was previously in use from another system.
Last accessed by truenas (hostid=584594f) at Wed Dec 31 16:00:00 1969
The pool can be imported, use 'zpool import -f' to import the pool.
root@truenas[~]#      

So that is interesting! I love the date! (That’s the Unix epoch rendered in Pacific time, i.e. no real last-access timestamp was ever recorded.)

And what’s interesting is that I get the exact same message from my backup system, which is evidently receiving a replica of the same label, which is why the message matches.

So presumably if I get rid of the bogus pool on my primary truenas, the backups will go away as well, but we’ll see.
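
Side note: if you want to confirm which box that hostid=584594f actually belongs to, you should be able to run hostid on each system and compare it with the value in the import message (a sketch; leading zeros may be dropped, and on Linux the ZFS hostid normally comes from /etc/hostid, so a reinstall or zgenhostid can change it):

hostid    # prints this system's host id in hex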

On my backup system:

root@backup[/home/truenas_admin]# zpool import
  pool: main
    id: 413467148470438577
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        main        UNAVAIL  insufficient replicas
          zd128     UNAVAIL  invalid label
root@backup[/home/truenas_admin]# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sr0      11:0    1  1024M  0 rom
xvda    202:0    0   200G  0 disk
├─xvda1 202:1    0     1M  0 part
├─xvda2 202:2    0   512M  0 part
└─xvda3 202:3    0 199.5G  0 part
xvdb    202:16   0     2T  0 disk
└─xvdb1 202:17   0     2T  0 part
xvdc    202:32   0     2T  0 disk
└─xvdc1 202:33   0     2T  0 part
zd0     230:0    0     1T  1 disk
zd16    230:16   0   150G  1 disk
zd32    230:32   0    50G  1 disk
zd48    230:48   0   256G  1 disk
zd64    230:64   0    10G  1 disk
zd80    230:80   0     2G  1 disk
zd96    230:96   0     2T  1 disk
zd112   230:112  0    10G  1 disk
zd128   230:128  0     1G  1 disk
zd144   230:144  0     8G  1 disk
root@backup[/home/truenas_admin]#      

which is pretty interesting. A 1G disk eh? hmmm…

OK, so I’ve figured out that these zd devices are zvols, and if you look under /dev/zvol you can see the truenas name behind each one.

BINGO: To be creative, I had used a VM to install truenas on truenas. To create the VM, I created two zvols: one for the installation and another for the pool. In the VM, I called my pool “main”.

So this is why the copy exists on the backup systems: the replication backs up all my VMs, including my truenas VM.

So basically, that zvol looks like an importable pool last used by truenas, no surprise. Makes perfect sense.

More later, but I’m getting really close now. I think I’ll destroy the main pool in the VM and re-create it under a new name :wink:
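
Or, if all I want is a different name, I shouldn’t even need to destroy it; a rename via export/import should do. A sketch from inside the VM, where “main-vm” is just an example name (on truenas you’d normally do the export/import through the GUI so the middleware stays in sync):

zpool export main
zpool import main main-vm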

Are these three fully independent systems?
How are the systems cabled?

Do you have an SFF-8xxx cable or two going to a shared HBA or similar?
Or some kind of HA setup connecting the systems together?

Very, very concerned that this is going to end in data loss.

Yes, they are fully independent. See Zpool import shows insufficient replicas and invalid label for main pool - #15 by stk for the explanation of the mystery!

That would not explain why three different systems have a pool that shares the exact same GUID.[1]

Pools are not replicated. Pools are created.

Datasets are replicated (whether independent, nested parents and children, or the “entire thing” via a root dataset replication).

Normal (“safe”) usage of ZFS should not result in multiple hosts trying to import and “use” the same pool with the same GUID.


  1. If you have a host with a pool that replicates filesystems (datasets) to two other hosts, then you will still have three different pools with three different GUIDs. (The pool on the first host, and then the pools on the receiving hosts.) It doesn’t matter if the three pools are “identical” in that they contain the same “pool names”, datasets, snapshots, and filesystem hierarchies. Each pool will still have a unique GUID. ↩︎
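
To make that concrete, a sketch with made-up pool/dataset names (“main” on the source host, an existing pool “backup” on a receiving host):

# on the source host
zfs snapshot -r main/vms@rep1
zfs send -R main/vms@rep1 | ssh backuphost zfs recv backup/vms

# each pool keeps its own identity
zpool get guid main                    # source pool GUID
ssh backuphost zpool get guid backup   # different GUID on the receiver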

Unless some sort of Xen hypervisor “replication” is being used, when we’re all thinking about ZFS replication.