Can't import pool [SOLVED]

I solved the following problem, which occurred after a few restarts of my TrueNAS SCALE 22.04.2.

My pool “data” was not imported after a reboot. It was visible in the “Storage dashboard”, but no disks were attached to it. When I pressed the “Disks” button in the “Storage dashboard”, the pool's disks were visible and the “Pool” column showed “data (Exported)”, so ZFS knows the disks belong to the pool “data”. The disks are “/dev/sdb” to “/dev/sdf”.
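The same membership can be double-checked from a shell with lsblk (read-only; the column list and the /dev/sd[b-f] glob are my additions, not something from the GUI):

```shell
# Read-only: list the suspect disks with partition UUIDs and fs types.
# Adjust /dev/sd[b-f] to the device names on your own system.
if command -v lsblk >/dev/null 2>&1; then
  lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTUUID /dev/sd[b-f] 2>/dev/null || true
fi
```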

When I tried “Import pool” from the “Storage dashboard”, the list of available pools to import was empty.

I tried from the command line:

root@truenas[/etc/zfs]# zpool status data
cannot open 'data': no such pool

so no information about the pool was available.

Then I tried scanning for importable pools:

root@truenas[/etc/zfs]# zpool import -d /dev/disk/by-id/
   pool: data
     id: 12627903007845579206
  state: UNAVAIL
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        data                                          UNAVAIL  insufficient replicas
          raidz2-0                                    UNAVAIL  insufficient replicas
            wwn-0x50014ee21666dd5a                    UNAVAIL  invalid label
            wwn-0x5000cca0b0ccd04a                    UNAVAIL  invalid label
            ata-WDC_WD60EFPX-68C5ZN0_WD-WX62D2409J6S  UNAVAIL  invalid label
            wwn-0x50014ee26b9c3a4a                    UNAVAIL  invalid label

so the pool information is still stored on the disks somehow.

I tried to get information about the ZFS metadata on disk wwn-0x50014ee21666dd5a:

root@truenas[/etc/zfs]# zdb -l /dev/disk/by-id/wwn-0x50014ee21666dd5a
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
    version: 5000
    name: 'data'
    state: 0
    txg: 2124991
    pool_guid: 12627903007845579206
    errata: 0
    hostid: 1601898500
    hostname: 'truenas'
    top_guid: 8880454056440280913
    guid: 8183342154247264152
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 8880454056440280913
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 23996082946048
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8183342154247264152
            path: '/dev/disk/by-partuuid/5b51fea1-2f6e-460d-987e-1af4ee5d2ff6'
            whole_disk: 0
            DTL: 99
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 3311350377944452613
            path: '/dev/disk/by-partuuid/6b80f859-f6c0-4ea9-a291-cd07c1a88210'
            whole_disk: 0
            DTL: 921
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 3183920643935737096
            path: '/dev/disk/by-partuuid/e23335a0-1e3c-4864-bad3-88be84b3c59f'
            whole_disk: 0
            DTL: 114
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 12828138325581573469
            path: '/dev/disk/by-partuuid/c307b771-367d-4d25-aec5-7742646bb403'
            whole_disk: 0
            DTL: 113
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 2 3

The result shows that the labels stored on the disk are damaged, but the pool information is still there. The situation was similar on all the other disks.
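To check every disk at once, a small read-only loop over zdb -l can summarise the label state (the by-id globs are an assumption based on the device names shown above; adjust them to your disks):

```shell
# Read-only: print a short zdb label summary for each candidate disk.
if command -v zdb >/dev/null 2>&1; then
  for d in /dev/disk/by-id/wwn-* /dev/disk/by-id/ata-*; do
    [ -e "$d" ] || continue          # skip if the glob matched nothing
    printf '== %s ==\n' "$d"
    zdb -l "$d" 2>&1 | grep -E 'failed to unpack|Bad label|labels =' || true
  done
fi
```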

Solution

I have backups of the important data, so I did not create copies of all the disks and allowed myself to be more aggressive.

I checked the partition table first:

fdisk -l /dev/sdf
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdf: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: WDC WD102KFBX-68
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9085DF32-2965-4808-9704-79E906A9F608

Device       Start         End     Sectors  Size Type
/dev/sdf1     2048     4196352     4194305    2G Linux swap
/dev/sdf2  4198400 19532873694 19528675295  9.1T Solaris /usr & Apple ZFS

The primary partition table is corrupt, but the backup is correct.

I backed up the partition table in case there were problems during recovery:

root@truenas[/home/admin]# sgdisk --backup=gpt_backup-sdf.sgdisk /dev/sdf
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: ERROR
Backup partition table: OK

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
The operation has completed successfully.
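To do the same for every pool disk before repairing anything, a loop like this can save one backup file per disk (the disk names and the backup directory are assumptions; adjust them to your system):

```shell
# Save one sgdisk GPT backup file per disk before touching anything.
if command -v sgdisk >/dev/null 2>&1; then
  mkdir -p /root/gpt-backups
  for d in sdb sdc sdd sde sdf; do          # assumed pool members; adjust
    [ -b "/dev/$d" ] || continue            # skip disks that do not exist
    sgdisk --backup="/root/gpt-backups/gpt_backup-$d.sgdisk" "/dev/$d"
  done
fi
```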

Then I restored the backup GPT:

root@truenas[/home/admin]# gdisk /dev/sdf
GPT fdisk (gdisk) version 1.0.9

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: ERROR
Backup partition table: OK

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdf.
The operation has completed successfully.

Then I tested whether the partition table was OK:

root@truenas[/home/admin]# fdisk -l /dev/sdf
Disk /dev/sdf: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: WDC WD102KFBX-68
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9085DF32-2965-4808-9704-79E906A9F608

Device       Start         End     Sectors  Size Type
/dev/sdf1     2048     4196352     4194305    2G Linux swap

So the disk is OK now.

Checking the ZFS metadata showed that the label problems still occurred:

root@truenas[/home/admin]# zdb -l /dev/sdf
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
    version: 5000
    name: 'data'
    state: 0
    txg: 2124991
    pool_guid: 12627903007845579206
    errata: 0
    hostid: 1601898500
    hostname: 'truenas'
    top_guid: 8880454056440280913
    guid: 3311350377944452613
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 8880454056440280913
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 23996082946048
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8183342154247264152
            path: '/dev/disk/by-partuuid/5b51fea1-2f6e-460d-987e-1af4ee5d2ff6'
            whole_disk: 0
            DTL: 99
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 3311350377944452613
            path: '/dev/disk/by-partuuid/6b80f859-f6c0-4ea9-a291-cd07c1a88210'
            whole_disk: 0
            DTL: 921
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 3183920643935737096
            path: '/dev/disk/by-partuuid/e23335a0-1e3c-4864-bad3-88be84b3c59f'
            whole_disk: 0
            DTL: 114
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 12828138325581573469
            path: '/dev/disk/by-partuuid/c307b771-367d-4d25-aec5-7742646bb403'
            whole_disk: 0
            DTL: 113
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 2
failed to unpack label 3

So I fixed the GPT on the remaining pool disks using the same steps.
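The interactive gdisk session can also be scripted with sgdisk. As the warnings above show, sgdisk falls back to the intact secondary table when the main one is bad, so the backup files written earlier contain the good table, and restoring them rewrites the damaged main table. A sketch, with assumed disk and file names:

```shell
# Non-interactive GPT repair: restore each disk's table from the sgdisk
# backup file saved earlier, then verify the result.
if command -v sgdisk >/dev/null 2>&1; then
  for d in sdb sdc sdd sde; do              # assumed remaining disks; adjust
    [ -b "/dev/$d" ] || continue            # skip disks that do not exist
    sgdisk --load-backup="/root/gpt-backups/gpt_backup-$d.sgdisk" "/dev/$d"
    sgdisk --verify "/dev/$d"
  done
fi
```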

Then I tried to import the pool again:

root@truenas[/home/admin]# zpool import data
root@truenas[/]# zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:11 with 0 errors on Sat Dec 14 03:45:12 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0

errors: No known data errors

  pool: data
 state: ONLINE
  scan: scrub in progress since Thu Dec 19 16:34:07 2024
        655G / 10.9T scanned at 4.09G/s, 0B / 10.9T issued
        0B repaired, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        data                                      ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            5b51fea1-2f6e-460d-987e-1af4ee5d2ff6  ONLINE       0     0     0
            6b80f859-f6c0-4ea9-a291-cd07c1a88210  ONLINE       0     0     0
            e23335a0-1e3c-4864-bad3-88be84b3c59f  ONLINE       0     0     0
            c307b771-367d-4d25-aec5-7742646bb403  ONLINE       0     0     0

errors: No known data errors

and my pool “data” is back again.

Cause of the problem

It seems that the cause of the problem was a corrupted GPT on the disks. In the “Storage dashboard” everything looked like it should work and there was no obvious need to import the pool, but it had to be done manually.

I hope this troubleshooting can help someone.

TBH I am surprised that it worked: according to your fdisk -l output, the GPT partition table only has the swap partition and not a ZFS partition.

That said, I have had the same issues with my USB SSD boot drive when it was connected to a SATA DOM port, but not since it has been connected to an external port instead.

For the future, you can also try zpool import -d /dev/disk/by-partuuid, because TrueNAS SCALE uses ZFS partitions rather than whole drives, and because your zdb -l output clearly suggests that this is the way to do it.
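For example:

```shell
# Scan for importable pools by partition UUID; the scan itself is read-only.
if command -v zpool >/dev/null 2>&1; then
  zpool import -d /dev/disk/by-partuuid || true
  # If the pool shows up healthy, import it by name:
  # zpool import -d /dev/disk/by-partuuid data
fi
```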

I did not copy the whole fdisk -l output. The correct output is:

Device       Start         End     Sectors  Size Type
/dev/sdf1     2048     4196352     4194305    2G Linux swap
/dev/sdf2  4198400 19532873694 19528675295  9.1T Solaris /usr & Apple ZFS

There is also a normal ZFS partition on each disk.

I did not try zpool import -d /dev/disk/by-partuuid. I am not familiar with rescuing pools.