Invalid partition labels and failed import after resilver/extend and reboot

Just as a quick intro, I believe both the cause and the fix are going to be similar to this thread from a couple of months back: Possible to reattach a disconnected drive to an empty pool? | TrueNAS Community

I have 2 separate single-drive pools this is occurring on. Temp data, nothing too important, but I’d prefer to fix this and re-attach rather than rebuild. I’m confident the drives are still good and that the damage is due to procedure, not hardware failure.

Note: any mention of “freenas” in my code blocks below is simply the hostname. The system has gone through numerous full upgrades since the first build, including FreeNAS > TrueNAS > TrueNAS SCALE, but the original hostname remains.

The original procedure that led to this is as follows (same for both drives):

The original pool was a single 6TB drive, replaced with a 10TB using the Replace function through the UI. The full resilver completed successfully. The re-sizing was not automatic (apparently prevented by the pool being in use somehow; both pools have multiple shares and some applications that used them directly). The message that popped up when attempting the Extend function said a reboot would be required for the kernel to recognize the new size. The pool still showed 6TB usable, but the Data VDEVs line showed a ? for size instead of the old 5.46TB or the new ~9TB.
The pools sat in this state for several days, functioning normally, until I had an unexpected UPS overload (too much draw for more than 2 minutes, not a surge) followed by a full power failure for all systems on the box.
On reboot, both drives are visible in the list of disks, but are listed with pool N/A and as unused. I ended up doing an export/disconnect (without deleting settings) on one of the two offline pools simply to test a clean import, but to no avail.

A zpool import finds no pools.

This led me to test the following:

root@freenas[/tmp]# zdb -l /dev/sdbg
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@freenas[/tmp]# ls -ls /dev/disk/by-id
<truncated>
0 lrwxrwxrwx 1 root root 10 May  6 00:20 wwn-0x5000cca26c194f04 -> ../../sdbg
0 lrwxrwxrwx 1 root root 11 May  6 00:20 wwn-0x5000cca26c194f04-part1 -> ../../sdbg1
root@freenas[/tmp]# zhack label repair -cu /dev/disk/by-id/wwn-0x5000cca26c194f04
Calculated filesize to be 10000831348736
error: label 0: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
error: label 1: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
error: label 2: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
error: label 3: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
label 0: uberblock: skipped checksum: skipped
label 1: uberblock: skipped checksum: skipped
label 2: uberblock: skipped checksum: skipped
label 3: uberblock: skipped checksum: skipped

And

root@freenas[/dev/disk/by-id]# zhack label repair -cu /dev/sdbg
Calculated filesize to be 6001175126016
error: cannot unpack nvlist label 0
error: cannot unpack nvlist label 1
error: label 2: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
error: label 3: Expected the nvlist checksum magic number to not be zero
There should already be a checksum for the label.
label 0: uberblock: skipped checksum: skipped
label 1: uberblock: skipped checksum: skipped
label 2: uberblock: skipped checksum: skipped
label 3: uberblock: skipped checksum: skipped

Looking directly at /dev/sdbg, zhack can apparently see labels 0 and 1 but fails to unpack them, while looking at the same drive by ID it can’t see anything at all. Also interestingly, the /dev/sdbg run shows the calculated size as the original 6TB, not 10.
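To put numbers on that size oddity, here's a quick sanity check in Python. The two figures are just the two "Calculated filesize" values from the zhack runs above; the interpretation (that they line up with the two drives' raw decimal capacities) is my own guess:

```python
by_id_size = 10000831348736   # zhack via /dev/disk/by-id/wwn-...
sdbg_size = 6001175126016     # zhack via /dev/sdbg

# Both are plausible raw drive capacities in decimal TB:
print(by_id_size / 1000**4)   # ~10.0 -> matches the new 10TB drive
print(sdbg_size / 1000**4)    # ~6.0  -> matches the old 6TB drive
```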

If I manually pull the section starting 4096 512-byte sectors in, the first two labels are readable. They also show an asize matching the original 6TB drive, not the 10TB replacement.

root@freenas[/dev/disk/by-id]# sudo dd if=/dev/sdbg bs=512 iseek=4096 count=8192 of=/tmp/img.sdbg
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0359062 s, 117 MB/s
root@freenas[/dev/disk/by-id]# zdb -l /tmp/img.sdbg
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'Temp6TB-1'
    state: 0
    txg: 14827860
    pool_guid: 18333573255668817670
    errata: 0
    hostid: 2046751772
    hostname: 'freenas'
    top_guid: 3218190718435304907
    guid: 3218190718435304907
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 3218190718435304907
        path: '/dev/disk/by-partuuid/96b7de78-98f9-4757-9093-6c548078b28e'
        whole_disk: 0
        metaslab_array: 38
        metaslab_shift: 35
        ashift: 12
        asize: 5999023357952
        is_log: 0
        DTL: 9476
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1
failed to unpack label 2
failed to unpack label 3
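For what it's worth, here's the offset arithmetic as a sketch, assuming the standard ZFS vdev label layout (four 256 KiB labels, L0/L1 at the front of the vdev and L2/L3 at the end). The sector numbers are just the ones from the commands above:

```python
# Where the dd above actually read from on the raw disk:
dd_offset = 4096 * 512     # bs=512 iseek=4096 -> 2 MiB in
dd_length = 8192 * 512     # count=8192        -> 4 MiB window

# Where the current GPT partition starts (4Kn disk, start sector 4096):
part_offset = 4096 * 4096  # 16 MiB in

print(dd_offset)           # 2097152  (2 MiB)
print(part_offset)         # 16777216 (16 MiB)
# Finding labels 0 and 1 at the 2 MiB mark suggests the vdev data
# begins well before the partition's current 16 MiB start point.
```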

At this point, I expect the fix will probably involve manually rebuilding the partition table, as was done in the other thread I found; I’m just not confident what the new values should be.

I am also curious whether the power failure was the final cause here. Is there normally a final step on a clean shutdown after this type of replace/resilver/extend that would have written the partition table successfully? Or was this behavior going to happen regardless of how the system was shut down?

I do still have the original 6TB drive from one of the two pools I upgraded (the other original has been wiped and overwritten since then). It also won’t import (not unexpected), but if it’s easier to recover the partitions/labels from that drive, that’s also an option. I also have spare drives in both sizes if I end up needing to clone anything to do this.

A few other outputs that seem relevant:

root@freenas[/tmp]# sfdisk -d /dev/sdbg
label: gpt
label-id: BA21B11A-B8FE-4D2F-A0A3-A90EFBBF0177
device: /dev/sdbg
unit: sectors
first-lba: 6
last-lba: 2441609210
sector-size: 4096

/dev/sdbg1 : start=        4096, size=  2441605115, type=6A898CC3-1DD2-11B2-99A6-080020736631, uuid=96B7DE78-98F9-4757-9093-6C548078B28E
root@freenas[.../Temp6TBRecovery]# sgdisk -p /dev/sdbg
Disk /dev/sdbg: 2441609216 sectors, 9.1 TiB
Model: HUH721010AL4200
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): BA21B11A-B8FE-4D2F-A0A3-A90EFBBF0177
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 2441609210
Partitions will be aligned on 256-sector boundaries
Total free space is 4090 sectors (16.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            4096      2441609210   9.1 TiB     BF01
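In case it helps, here's my rough arithmetic for what the "old" partition geometry may have been. The start sector is purely an assumption derived from where the labels were found, and the size is only a lower bound taken from the label's asize (it ignores label/boot-block overhead); this is a sketch of candidate values, not something I've applied:

```python
SECTOR = 4096              # 4Kn drive: logical/physical 4096 bytes
asize = 5999023357952      # from the recovered label
label_offset = 4096 * 512  # labels found 2 MiB into the raw disk

# If the vdev started at the 2 MiB mark, the old partition
# presumably started there too:
start_sector = label_offset // SECTOR
print(start_sector)        # 512

# Rough lower bound on the partition size needed to hold the vdev:
min_sectors = asize // SECTOR
print(min_sectors)         # 1464605312
```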