I solved the following problem, which occurred after a few restarts of my TrueNAS SCALE 22.04.2.
My pool “data” was not imported after a reboot. It was visible in the “Storage dashboard” menu, but no disks were attached to it. When I pressed the “Disks” button in “Storage dashboard”, the disks of the pool were visible and the “Pool” column contained “data (Exported)”, so ZFS knew the disks belonged to pool “data”. The disks are “/dev/sdb” to “/dev/sdf”.
When I tried “Import pool” from “Storage dashboard”, the list of pools available for import was empty.
I then tried from the command line:
root@truenas[/etc/zfs]# zpool status data
cannot open 'data': no such pool
so no information about the pool was available.
Next I scanned the disks for importable pools:
root@truenas[/etc/zfs]# zpool import -d /dev/disk/by-id/
pool: data
id: 12627903007845579206
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:
data UNAVAIL insufficient replicas
raidz2-0 UNAVAIL insufficient replicas
wwn-0x50014ee21666dd5a UNAVAIL invalid label
wwn-0x5000cca0b0ccd04a UNAVAIL invalid label
ata-WDC_WD60EFPX-68C5ZN0_WD-WX62D2409J6S UNAVAIL invalid label
wwn-0x50014ee26b9c3a4a UNAVAIL invalid label
so some information about pool “data” is still stored on the disks.
I then tried to read the ZFS metadata on disk wwn-0x50014ee21666dd5a:
root@truenas[/etc/zfs]# zdb -l /dev/disk/by-id/wwn-0x50014ee21666dd5a
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
version: 5000
name: 'data'
state: 0
txg: 2124991
pool_guid: 12627903007845579206
errata: 0
hostid: 1601898500
hostname: 'truenas'
top_guid: 8880454056440280913
guid: 8183342154247264152
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 8880454056440280913
nparity: 2
metaslab_array: 256
metaslab_shift: 34
ashift: 12
asize: 23996082946048
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 8183342154247264152
path: '/dev/disk/by-partuuid/5b51fea1-2f6e-460d-987e-1af4ee5d2ff6'
whole_disk: 0
DTL: 99
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 3311350377944452613
path: '/dev/disk/by-partuuid/6b80f859-f6c0-4ea9-a291-cd07c1a88210'
whole_disk: 0
DTL: 921
create_txg: 4
children[2]:
type: 'disk'
id: 2
guid: 3183920643935737096
path: '/dev/disk/by-partuuid/e23335a0-1e3c-4864-bad3-88be84b3c59f'
whole_disk: 0
DTL: 114
create_txg: 4
children[3]:
type: 'disk'
id: 3
guid: 12828138325581573469
path: '/dev/disk/by-partuuid/c307b771-367d-4d25-aec5-7742646bb403'
whole_disk: 0
DTL: 113
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
com.klarasystems:vdev_zaps_v2
labels = 2 3
The output shows that labels 0 and 1 cannot be read and label 2 has a bad checksum, but the pool information is still stored on the disk. The situation was similar on all the other disks.
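To compare the disks quickly, the same check can be scripted. This is only a sketch: it assumes zdb prints a trailing "labels = ..." line as in the output above, and the helper name readable_labels is made up for illustration.

```shell
#!/bin/sh
# Read `zdb -l` output on stdin and print the label numbers zdb could read,
# taken from the trailing "labels = ..." line, or "none" if there is no such line.
# (readable_labels is a hypothetical helper, not part of ZFS.)
readable_labels() {
    awk '/^ *labels = / { sub(/^ *labels = */, ""); sub(/ *$/, ""); found = $0 }
         END { if (found == "") print "none"; else print found }'
}

# Usage (as root), devices given as arguments, for example:
#   sh check_labels.sh /dev/disk/by-id/wwn-0x50014ee21666dd5a
for dev in "$@"; do
    printf '%s: ' "$dev"
    zdb -l "$dev" 2>&1 | readable_labels
done
```

A healthy pool member normally reports all four labels (0 1 2 3), so anything less is worth a closer look.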
Solution
I have backups of the important data, so I did not make copies of all the disks first and took a more aggressive approach.
First I checked the partition table:
fdisk -l /dev/sdf
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdf: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: WDC WD102KFBX-68
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9085DF32-2965-4808-9704-79E906A9F608
Device Start End Sectors Size Type
/dev/sdf1 2048 4196352 4194305 2G Linux swap
/dev/sdf2 4198400 19532873694 19528675295 9.1T Solaris /usr & Apple ZFS
So the primary partition table is corrupt, but the backup table is intact.
I saved a copy of the partition table to a file in case the recovery went wrong:
root@truenas[/home/admin]# sgdisk --backup=gpt_backup-sdf.sgdisk /dev/sdf
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: ERROR
Backup partition table: OK
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
The operation has completed successfully.
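Since every disk of the pool needs the same treatment, the backup step can be looped. A minimal sketch, assuming the same file naming as above; the helper gpt_backup_name is hypothetical and the disk names are passed in by the caller.

```shell
#!/bin/sh
# Build the backup file name used above, e.g. /dev/sdf -> gpt_backup-sdf.sgdisk
# (gpt_backup_name is a hypothetical helper for this sketch).
gpt_backup_name() {
    printf 'gpt_backup-%s.sgdisk\n' "$(basename "$1")"
}

# Back up the GPT of every disk given as an argument (run as root), for example:
#   sh backup_gpt.sh /dev/sdc /dev/sdd
for disk in "$@"; do
    sgdisk --backup="$(gpt_backup_name "$disk")" "$disk"
done
```

Note that sgdisk reported above that it had loaded the backup partition table, so the saved file presumably reflects that table rather than the damaged primary one.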
Then I restored the GPT: gdisk loaded the intact backup table from the disk, and I wrote it back with the w command:
root@truenas[/home/admin]# gdisk /dev/sdf
GPT fdisk (gdisk) version 1.0.9
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: ERROR
Backup partition table: OK
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: damaged
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdf.
The operation has completed successfully.
Then I checked whether the partition table was OK:
root@truenas[/home/admin]# fdisk -l /dev/sdf
Disk /dev/sdf: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: WDC WD102KFBX-68
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9085DF32-2965-4808-9704-79E906A9F608
Device Start End Sectors Size Type
/dev/sdf1 2048 4196352 4194305 2G Linux swap
fdisk no longer reports a corrupt primary GPT, so the partition table is OK now.
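The CRC status can also be checked without the interactive tool. The sketch below assumes that sgdisk --verify prints the same status lines as in the transcripts above ("Main partition table: ERROR" and so on) when a structure is damaged; gpt_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Read sgdisk/gdisk output on stdin and print each GPT structure flagged ERROR,
# based on status lines such as "Main partition table: ERROR".
# (gpt_errors is a hypothetical helper for this sketch.)
gpt_errors() {
    awk '/^ *(Main|Backup) (header|partition table): ERROR/ {
             sub(/^ */, ""); sub(/: ERROR.*$/, ""); print
         }'
}

# Usage (as root), disks given as arguments, for example:
#   sh check_gpt.sh /dev/sdc /dev/sdd
for disk in "$@"; do
    bad=$(sgdisk --verify "$disk" 2>&1 | gpt_errors | tr '\n' ';')
    if [ -n "$bad" ]; then
        echo "$disk: damaged: $bad"
    else
        echo "$disk: no GPT CRC errors reported"
    fi
done
```

Running it before and after the repair shows which disks still need the gdisk step.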
Checking the ZFS metadata again showed that the label problems were still there:
root@truenas[/home/admin]# zdb -l /dev/sdf
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
version: 5000
name: 'data'
state: 0
txg: 2124991
pool_guid: 12627903007845579206
errata: 0
hostid: 1601898500
hostname: 'truenas'
top_guid: 8880454056440280913
guid: 3311350377944452613
vdev_children: 1
vdev_tree:
type: 'raidz'
id: 0
guid: 8880454056440280913
nparity: 2
metaslab_array: 256
metaslab_shift: 34
ashift: 12
asize: 23996082946048
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 8183342154247264152
path: '/dev/disk/by-partuuid/5b51fea1-2f6e-460d-987e-1af4ee5d2ff6'
whole_disk: 0
DTL: 99
create_txg: 4
children[1]:
type: 'disk'
id: 1
guid: 3311350377944452613
path: '/dev/disk/by-partuuid/6b80f859-f6c0-4ea9-a291-cd07c1a88210'
whole_disk: 0
DTL: 921
create_txg: 4
children[2]:
type: 'disk'
id: 2
guid: 3183920643935737096
path: '/dev/disk/by-partuuid/e23335a0-1e3c-4864-bad3-88be84b3c59f'
whole_disk: 0
DTL: 114
create_txg: 4
children[3]:
type: 'disk'
id: 3
guid: 12828138325581573469
path: '/dev/disk/by-partuuid/c307b771-367d-4d25-aec5-7742646bb403'
whole_disk: 0
DTL: 113
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
com.klarasystems:vdev_zaps_v2
labels = 2
failed to unpack label 3
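One possible follow-up, which was not run in this session: the labels list the pool members as /dev/disk/by-partuuid/... paths, so it may be more telling to check whether those partition paths exist and to point zdb -l at them instead of at the whole disk. The sketch below only extracts the paths from zdb output and tests for their presence; label_paths is a hypothetical helper name.

```shell
#!/bin/sh
# Read `zdb -l` output on stdin and print the member paths recorded in the
# labels (the quoted values of "path: '...'" lines), one per line, deduplicated.
# (label_paths is a hypothetical helper for this sketch.)
label_paths() {
    sed -n "s/^ *path: '\(.*\)' *\$/\1/p" | sort -u
}

# Usage (as root), devices given as arguments, for example:
#   sh check_paths.sh /dev/sdf
for dev in "$@"; do
    zdb -l "$dev" 2>/dev/null | label_paths | while read -r p; do
        if [ -e "$p" ]; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
done
```

If a path is reported as missing, the system does not currently expose that partition, which would fit a damaged partition table.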
So I repaired the GPT on the other three disks of the pool using the same steps.
Then I tried to import the pool again:
root@truenas[/home/admin]# zpool import data
root@truenas[/]# zpool status
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:11 with 0 errors on Sat Dec 14 03:45:12 2024
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
sda3 ONLINE 0 0 0
errors: No known data errors
pool: data
state: ONLINE
scan: scrub in progress since Thu Dec 19 16:34:07 2024
655G / 10.9T scanned at 4.09G/s, 0B / 10.9T issued
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
5b51fea1-2f6e-460d-987e-1af4ee5d2ff6 ONLINE 0 0 0
6b80f859-f6c0-4ea9-a291-cd07c1a88210 ONLINE 0 0 0
e23335a0-1e3c-4864-bad3-88be84b3c59f ONLINE 0 0 0
c307b771-367d-4d25-aec5-7742646bb403 ONLINE 0 0 0
errors: No known data errors
and my pool “data” is back again.
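After an import like this it is worth confirming that every pool really is ONLINE. A small sketch that parses zpool status output in the format shown above; unhealthy_pools is a hypothetical helper name.

```shell
#!/bin/sh
# Read `zpool status` output on stdin and print "<pool> <state>" for every
# pool whose state is not ONLINE. Prints nothing if all pools are ONLINE.
# (unhealthy_pools is a hypothetical helper for this sketch.)
unhealthy_pools() {
    awk '$1 == "pool:"  { pool = $2 }
         $1 == "state:" && $2 != "ONLINE" { print pool " " $2 }'
}

# Only query ZFS when the zpool command is available on this machine.
if command -v zpool >/dev/null 2>&1; then
    zpool status 2>/dev/null | unhealthy_pools
fi
```

For a quick manual check, zpool status -x gives a similar summary and reports only pools that have problems.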
Cause of problem
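It seems that the cause of the problem was a corrupted primary GPT on the disks. The ZFS labels refer to the pool members as partitions (/dev/disk/by-partuuid/...), so with the primary GPT damaged the partitions were probably not visible to the system and ZFS could not find valid labels. In “Storage dashboard” everything looked as if it should work and there was no apparent reason to import the pool, but after repairing the GPT the import had to be done manually.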
I hope this troubleshooting can help someone.