Backup drive - invalid label, cannot import

Hello

I wanted to restore the pool from backup, but now I cannot import the backup drive. I am using TrueNAS CORE 13.0-U6.7. Here is the output of zpool import:

# zpool import
   pool: usb_bkp
     id: 16136177666929402786
  state: UNAVAIL
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        usb_bkp     UNAVAIL  insufficient replicas
          ada0      UNAVAIL  invalid label

Dumping the labels:

# zdb -ll /dev/ada0
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
    version: 5000
    name: 'usb_bkp'
    state: 1
    txg: 2796501
    pool_guid: 16136177666929402786
    errata: 0
    hostid: 677086199
    hostname: 'jt-nas.local'
    top_guid: 13417891300543037129
    guid: 13417891300543037129
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 13417891300543037129
        path: '/dev/gptid/64e94f32-52ed-11ed-a081-94de80a78ddd'
        metaslab_array: 128
        metaslab_shift: 34
        ashift: 12
        asize: 19998435966976
        is_log: 0
        DTL: 84970
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 2 


ZFS Label NVList Config Stats:
  1068 bytes used, 113580 bytes free (using  0.9%)

   integers:   18    660 bytes (61.80%)
    strings:    4    192 bytes (17.98%)
   booleans:    2     92 bytes ( 8.61%)
    nvlists:    3    124 bytes (11.61%)


failed to unpack label 3

Please help…

Your drive should be partitioned.

What does this reveal?

zdb -ll /dev/ada0p2

The drive holds about 14TB of data, which I copied using a replication task. It was originally connected via USB, and it took a week to back up all the data.

Here is the output:

# zdb -ll /dev/ada0p2

cannot open '/dev/ada0p2': No such file or directory

# zdb -ll /dev/ada0p0

cannot open '/dev/ada0p0': No such file or directory

# zdb -ll /dev/ada0p1

cannot open '/dev/ada0p1': No such file or directory

You might want to run a short or long SMART self-test on the drive. If the drive is failing, that could be the reason for the bad label and would explain what you’re seeing.

How long have you been using the USB drive?

If the drive was set up within TrueNAS, I would expect to see partitions. If it’s using the whole disk, that seems unusual.

Can you show the result of gpart list ada0 ?

Yeah, the drive was set up within TrueNAS.

# gpart list ada0
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 4294967294
first: 63
entries: 4
scheme: MBR
Consumers:
1. Name: ada0
   Mediasize: 20000588955648 (18T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

The drive was in a USB enclosure for about 2 months or so. It was not actively used, only for a one-time backup.
Now I have connected the drive to a SATA port. Here is the SMART output:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       205488793
  3 Spin_Up_Time            0x0003   090   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       106
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       109323439
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9931
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       44
 18 Unknown_Attribute       0x000b   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8590065666
190 Airflow_Temperature_Cel 0x0022   069   048   000    Old_age   Always       -       31 (Min/Max 24/38)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   070   070   000    Old_age   Always       -       61889
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   100   000    Old_age   Offline      -       3335 (80 242 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       31094970992
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       270653652407

Odd, I thought that even the newest CORE was still using slices/partitions. For it to do a whole-disk is unusual in my mind.

If we can back up your labels by dding them to a separate file first, we might be able to zhack label repair this one - but without a backup of your backup I’m a bit hesitant to suggest that.

dd if=/dev/ada0 bs=1M count=32 of=first32M.bin
dd if=/dev/ada0 bs=1M count=32 skip=20000555401216 iflag=skip_bytes of=last32M.bin

This should copy the first and last 32M of your disk, respectively, to those files in your current working directory. Then you might be able to take a shot at letting zhack rebuild a correct checksum on that disk.
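Before touching the labels it may be worth confirming that each dump really is 32 MiB, since a short read would leave an incomplete backup. A minimal sketch, assuming the two file names used above; check_dump_size is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# check_dump_size FILE EXPECTED_BYTES
# Hypothetical helper: compares a dump's size against the expected byte count.
check_dump_size() {
    # wc -c may pad its output with spaces on some systems, so strip them.
    actual=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "ok: $1 is $actual bytes"
    else
        echo "size mismatch: $1 is $actual bytes, expected $2" >&2
        return 1
    fi
}

# 32 blocks of 1 MiB, matching bs=1M count=32 in the dd commands.
expected_bytes=$((32 * 1024 * 1024))

for f in first32M.bin last32M.bin; do
    if [ -e "$f" ]; then
        check_dump_size "$f" "$expected_bytes"
    else
        echo "skipping $f (not found in current directory)"
    fi
done
```

Keeping a copy of both files somewhere other than the pool being repaired is probably a good idea as well.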

I found this odd too.

Could not back up the last 32M, got this error:

# dd if=/dev/ada0 bs=1M count=32 seek=20000555401216 of=last32M.bin

dd: seek offsets cannot be larger than 9223372036854775807

Is this the right seek point?
20000588955648 - 32*2048 = 20000588890112 blocks

But even this gives the same error:

# dd if=/dev/ada0 bs=1M count=32 seek=20000588890112 of=last32M.bin

dd: seek offsets cannot be larger than 9223372036854775807

With bs=1M, dd treats each block as 1 MiB in size. The seek value is counted in those blocks, not in bytes, so it is affected as well.

I thought seek would just go by bytes. Apparently it’s something other than that.

Okay, I got it - we need skip not seek.

dd if=/dev/ada0 bs=1M count=32 skip=20000588890112 iflag=skip_bytes of=last32M.bin

This should make it behave as intended.

I approve of the above command. :+1:

I totally did not edit my post after looking like a fool because I am infallible.

I totally never wrote this.

I believe skip is also dictated by the blocksize (bs).

hence iflag=skip_bytes :wink:

Once you get them both dumped, pick into them a bit with

hexdump -C first32M.bin | head -n 40

and look for the telltale ZFS headers and labels:

root@truenas[/home/truenas_admin]# hexdump -C first32M.bin | head -n 40
00000000  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30  |0000000000000000|
*
00002000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003fd0  00 00 00 00 00 00 00 00  11 7a 0c b1 7a da 10 02  |.........z..z...|
00003fe0  3f 2a 6e 7f 80 8f f4 97  fc ce aa 58 16 9f 90 af  |?*n........X....|
00003ff0  8b b4 6d ff 57 ea d1 cb  ab 5f 46 0d db 92 c6 6e  |..m.W...._F....n|
00004000  01 01 00 00 00 00 00 00  00 00 00 01 00 00 00 24  |...............$|
00004010  00 00 00 20 00 00 00 07  76 65 72 73 69 6f 6e 00  |... ....version.|
00004020  00 00 00 08 00 00 00 01  00 00 00 00 00 00 13 88  |................|
00004030  00 00 00 20 00 00 00 20  00 00 00 04 6e 61 6d 65  |... ... ....name|
00004040  00 00 00 09 00 00 00 01  00 00 00 04 68 75 72 72  |............hurr|
00004050  00 00 00 24 00 00 00 20  00 00 00 05 73 74 61 74  |...$... ....stat|
00004060  65 00 00 00 00 00 00 08  00 00 00 01 00 00 00 00  |e...............|
00004070  00 00 00 00 00 00 00 20  00 00 00 20 00 00 00 03  |....... ... ....|
00004080  74 78 67 00 00 00 00 08  00 00 00 01 00 00 00 00  |txg.............|
00004090  00 08 8f 43 00 00 00 28  00 00 00 28 00 00 00 09  |...C...(...(....|
000040a0  70 6f 6f 6c 5f 67 75 69  64 00 00 00 00 00 00 08  |pool_guid.......|

Your error is complaining about a bad label checksum, which is what zhack label repair is designed to rebuild, but I want to make sure we back up what’s there first so we can revert if need be.
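Rather than scrolling through the hexdump by eye, a dump can also be searched for the key names that appear in a label's config (pool_guid, txg and so on, as in the sample above). A rough sketch; count_label_keys is a made-up helper name and the key list is just a handful taken from the zdb output earlier in the thread:

```shell
#!/bin/sh
# count_label_keys FILE
# Hypothetical helper: counts how often a few label key names occur in a dump.
count_label_keys() {
    # -a treats the binary file as text, -o prints each match on its own line.
    LC_ALL=C grep -a -o -E 'pool_guid|vdev_tree|hostname|txg' "$1" | sort | uniq -c
}

for f in first32M.bin last32M.bin; do
    if [ -e "$f" ]; then
        echo "== $f"
        count_label_keys "$f"
    fi
done
```

No matches in a file would suggest that no label config sits inside that 32M window, which is useful to know before attempting any repair.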

# dd if=/dev/ada0 bs=1M count=32 skip=20000588890112 iflag=skip_bytes of=last32M.bin

dd: unknown iflag skip_bytes

Looking at the man page, the only options listed for iflag are fullblock and direct.

Might be a FreeBSD vs Linux thing?

Here is the output of hexdump:

# hexdump -C first32M.bin | head -n 40
00000000  fc 31 c0 8e c0 8e d8 8e  d0 bc 00 0e be 1a 7c bf  |.1............|.|
00000010  1a 06 b9 e6 01 f3 a4 e9  00 8a be 2d 06 eb 07 bb  |...........-....|
00000020  07 00 b4 0e cd 10 ac 84  c0 75 f4 eb fe 54 68 69  |.........u...Thi|
00000030  73 20 69 73 20 61 20 46  72 65 65 4e 41 53 20 64  |s is a FreeNAS d|
00000040  61 74 61 20 64 69 73 6b  20 61 6e 64 20 63 61 6e  |ata disk and can|
00000050  20 6e 6f 74 20 62 6f 6f  74 20 73 79 73 74 65 6d  | not boot system|
00000060  2e 20 20 53 79 73 74 65  6d 20 68 61 6c 74 65 64  |.  System halted|
00000070  2e 00 9d 6b bd 83 41 7f  dc 11 be 0b 00 15 60 b8  |...k..A.......`.|
00000080  4f 0f 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |O...............|
00000090  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
000001b0  90 90 90 90 90 90 90 90  00 00 00 00 00 00 00 00  |................|
000001c0  02 00 ee ff ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00001010  14 44 47 ba 00 00 00 00  01 00 00 00 00 00 00 00  |.DG.............|
00001020  fe ff 0b 23 01 00 00 00  06 00 00 00 00 00 00 00  |...#............|
00001030  f9 ff 0b 23 01 00 00 00  d9 7b 92 64 ed 52 ed 11  |...#.....{.d.R..|
00001040  a0 81 94 de 80 a7 8d dd  02 00 00 00 00 00 00 00  |................|
00001050  80 00 00 00 80 00 00 00  bf de 7e 5d 00 00 00 00  |..........~]....|
00001060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  b5 7c 6e 51 cf 6e d6 11  8f f8 00 02 2d 09 71 2b  |.|nQ.n......-.q+|
00002010  ce d4 ad 64 ed 52 ed 11  a0 81 94 de 80 a7 8d dd  |...d.R..........|
00002020  80 00 00 00 00 00 00 00  7f 00 08 00 00 00 00 00  |................|
00002030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002080  ba 7c 6e 51 cf 6e d6 11  8f f8 00 02 2d 09 71 2b  |.|nQ.n......-.q+|
00002090  32 4f e9 64 ed 52 ed 11  a0 81 94 de 80 a7 8d dd  |2O.d.R..........|
000020a0  80 00 08 00 00 00 00 00  f9 ff 0b 23 01 00 00 00  |...........#....|
000020b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
02000000

That looks like the beginning of the drive.

EDIT: I need coffee. I keep missing key words in people’s posts. :persevere:
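One possible reading of that dump, offered as a sketch and not a diagnosis: the "EFI PART" signature sits at offset 0x1000, and a GPT header lives at LBA 1, so the table may have been written with 4096-byte sectors, which would explain why gpart only reports the MBR with Sectorsize 512. The second partition entry (offset 0x2080) stores its first and last LBA as little-endian values, and its unique GUID bytes at 0x2090 appear to match the gptid 64e94f32-52ed-11ed-a081-94de80a78ddd from the zdb output. The snippet below only decodes those numbers; the 4096-byte sector size is an assumption, and le_to_dec is a helper name invented here:

```shell
#!/bin/sh
# le_to_dec BYTE...
# Hypothetical helper: converts little-endian hex bytes (as shown by hexdump -C)
# into a decimal number.
le_to_dec() {
    hex=""
    for byte in "$@"; do
        hex="$byte$hex"    # prepend, so the last byte ends up most significant
    done
    echo $((0x$hex))
}

# ASSUMPTION: GPT header at LBA 1 found at byte 0x1000, so one sector = 0x1000 bytes.
sector_size=$((0x1000))

# Second partition entry, copied from the dump at 0x20a0 (first LBA) and 0x20a8 (last LBA).
p2_first_lba=$(le_to_dec 80 00 08 00 00 00 00 00)
p2_last_lba=$(le_to_dec f9 ff 0b 23 01 00 00 00)

p2_start_bytes=$((p2_first_lba * sector_size))
p2_size_bytes=$(((p2_last_lba - p2_first_lba + 1) * sector_size))
p2_start_512=$((p2_start_bytes / 512))

echo "assumed sector size: $sector_size bytes"
echo "partition 2 LBA range: $p2_first_lba to $p2_last_lba"
echo "partition 2 start: byte $p2_start_bytes (512-byte sector $p2_start_512)"
echo "partition 2 size: $p2_size_bytes bytes"
```

If the assumption holds, the ZFS data would start about 2 GiB into the disk (byte 2148007936) rather than at the start of ada0, so the first 32M dump would not be expected to contain any ZFS label.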

Without the iflag I got the following error:

# dd if=/dev/ada0 bs=1M count=32 skip=20000588890112 of=last32M.bin

dd: seek offsets cannot be larger than 18446744073709551615
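That error is consistent with skip being counted in blocks: 20000588890112 blocks of 1 MiB is far beyond what a 64-bit offset can hold. Since this dd has no skip_bytes flag, one workaround is to express the offset in 1 MiB blocks instead. A small sketch of the arithmetic, using the Mediasize from the gpart output above; it only prints the resulting command so it can be reviewed before running:

```shell
#!/bin/sh
# Mediasize in bytes, from the gpart list ada0 output in this thread.
disk_bytes=20000588955648
bs_bytes=1048576     # one dd block with bs=1M
tail_blocks=32       # we want the last 32 MiB

total_blocks=$((disk_bytes / bs_bytes))
remainder=$((disk_bytes % bs_bytes))       # bytes left over after the last whole block
skip_blocks=$((total_blocks - tail_blocks))
skip_bytes=$((skip_blocks * bs_bytes))

echo "disk is $total_blocks blocks of 1 MiB, remainder $remainder bytes"
echo "skip $skip_blocks blocks (byte offset $skip_bytes)"
# Printed, not executed: check the numbers first.
echo "dd if=/dev/ada0 of=last32M.bin bs=1M skip=$skip_blocks count=$tail_blocks"
```

The byte offset this works out to is the 20000555401216 figure from the original suggestion. The 20000588890112 value tried earlier is only 65536 bytes short of the end of the disk, because it subtracts 32*2048 from a byte count.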