A TrueNAS Scale upgrade left a RAIDZ1 pool unimportable despite all disks being present and readable: every import attempt fails with "EZFS_BADDEV - devices unavailable" even though zpool import lists the pool and all of its devices as ONLINE.

After upgrading TrueNAS Scale, my MainPool (RAIDZ1, 3x 4TB Seagate ST4000NM0035) won’t import. zpool import shows the pool as ONLINE with all devices present, but every import attempt fails with “cannot import ‘MainPool’: one or more devices is currently unavailable” (error code EZFS_BADDEV).

System Info:

  • TrueNAS Scale latest version (kernel 6.12.15)

  • Pool: MainPool, ID: 7508289598988259504

  • Configuration: RAIDZ1 with 3x 4TB drives (sdb, sdc, sdd)

  • All disks on pci-0000:03:00.0 SATA controller

What Works:

  • All disks are readable: dd if=/dev/sdb1 of=/dev/null works on all 3

  • ZFS metadata intact: zdb -l /dev/sdb1 shows valid MainPool structure

  • Dry-run succeeds: sudo zpool import -nFX MainPool completes without errors

  • Pool scan shows ONLINE: zpool import lists all 3 devices as ONLINE

What Fails:

$ sudo zpool import MainPool
cannot import 'MainPool': one or more devices is currently unavailable

$ sudo zpool import -FX MainPool
cannot import 'MainPool': one or more devices is currently unavailable

$ sudo midclt call pool.import_pool '{"guid": "7508289598988259504"}'
[EZFS_BADDEV] Failed to import 'MainPool' pool

What I’ve Tried:

  • Cleared /etc/zfs/zpool.cache multiple times

  • Tried on both upgraded and rolled-back boot environments

  • Used recovery flags: -F, -FX, -f, -m

  • Tried different device paths: /dev, /dev/disk/by-id, /dev/disk/by-path (see the sketch after this list)

  • Set zfs_recover=1

  • Used TrueNAS midclt commands

  • Multiple reboots
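
For reference, a minimal sketch of what the device-path, recovery-flag and zfs_recover attempts above look like (the by-partuuid directory is just one example; substitute any of the directories listed):

$ sudo zpool import -d /dev/disk/by-partuuid                              # scan an explicit device directory
$ sudo zpool import -d /dev/disk/by-partuuid -f -m MainPool               # force import, tolerate a missing log device
$ sudo zpool import -d /dev/disk/by-partuuid -o readonly=on -F MainPool   # recovery mode, read-only
$ echo 1 | sudo tee /sys/module/zfs/parameters/zfs_recover                # the zfs_recover toggle mentioned above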

Additional Context:

  • One disk (sdc1, guid: 12799823983626328722) shows degraded: 1, aux_state: 'err_exceeded' in zdb output but is still readable (see the SMART check sketch at the end of this post)

  • This started immediately after the upgrade - pool was working fine before

  • TrueNAS GUI shows “3.64 TiB HDD x3 (MainPool)” as exported/unassigned disks

  • GUI Import Pool dialog shows “No options”

Key Question: Why would zpool import -nFX succeed (dry-run) but the actual import fail with EZFS_BADDEV when all devices are present and readable? Is this a known bug in recent TrueNAS Scale versions?

Data recovery is critical. Any help would be greatly appreciated!
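
For completeness, a hedged sketch of how the SMART status of the disk reporting err_exceeded can be checked (smartmontools assumed installed; the device letter is taken from the list above and may shift after a reboot):

$ sudo smartctl -H /dev/sdc    # overall health self-assessment
$ sudo smartctl -a /dev/sdc    # full SMART report, including reallocated/pending sector counts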


Please show us the zdb -l output from all 3 disks. We have seen import failures caused by transaction group (TXG) differences between the member disks. Sometimes this can be overcome by importing at a TXG number common to all of them.
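
A minimal sketch for gathering those labels in one go (device names are assumptions based on your first post; point zdb at whichever partition carries the ZFS data on each disk):

$ for p in /dev/sdb1 /dev/sdc1 /dev/sdd1; do echo "== $p =="; sudo zdb -l "$p" | grep -E 'name:|state:|txg:|guid:'; done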

Also, please describe how the disks are connected to the server.

Is this a virtual TrueNAS CE / SCALE server?


I apologize for the confusion in my initial post. After restoring TrueNAS and reconnecting the drives, the device letters have shifted; here is the full zdb -l output for all three disks, along with the current import status:

All Data:

==========================================
MainPool - zdb -l Output for All 3 Disks
==========================================

Physical Drive Mapping:
sda = Serial ZC19EG3P (PARTUUID 026760de...)
sdb = Serial ZC1A4Z3P (PARTUUID 51880de8...)
sdd = Serial ZC19B0LL (PARTUUID 04d45059...)

==========================================
=== sda1 (026760de...) ===
==========================================
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'MainPool'
    state: 0
    txg: 9248975
    pool_guid: 7508289598988259504
    errata: 0
    hostid: 1145057306
    hostname: 'homelab'
    top_guid: 2952053271173347707
    guid: 12821753558201250095
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 2952053271173347707
        nparity: 1
        metaslab_array: 134
        metaslab_shift: 34
        ashift: 12
        asize: 12002338013184
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13599292403454264645
            path: '/dev/disk/by-partuuid/51880de8-d9e5-4da9-a0d7-3be42262df32'
            whole_disk: 0
            DTL: 80376
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 12821753558201250095
            path: '/dev/disk/by-partuuid/026760de-f155-4343-ba66-16c076560dd4'
            whole_disk: 0
            DTL: 80375
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 12799823983626328722
            path: '/dev/disk/by-partuuid/04d45059-b38c-40b7-910f-1a8c91e6b7cd'
            whole_disk: 0
            DTL: 80374
            create_txg: 4
            degraded: 1
            aux_state: 'err_exceeded'
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3 

==========================================
=== sdb1 (51880de8...) ===
==========================================
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3

==========================================
=== sdd1 (04d45059...) ===
==========================================
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'MainPool'
    state: 0
    txg: 9272794
    pool_guid: 7508289598988259504
    errata: 0
    hostid: 1145057306
    hostname: 'homelab'
    top_guid: 2952053271173347707
    guid: 12799823983626328722
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 2952053271173347707
        nparity: 1
        metaslab_array: 134
        metaslab_shift: 34
        ashift: 12
        asize: 12002338013184
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 13599292403454264645
            path: '/dev/disk/by-partuuid/51880de8-d9e5-4da9-a0d7-3be42262df32'
            whole_disk: 0
            DTL: 80376
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 12821753558201250095
            path: '/dev/disk/by-partuuid/026760de-f155-4343-ba66-16c076560dd4'
            whole_disk: 0
            DTL: 80375
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 12799823983626328722
            path: '/dev/disk/by-partuuid/04d45059-b38c-40b7-910f-1a8c91e6b7cd'
            whole_disk: 0
            DTL: 80374
            create_txg: 4
            degraded: 1
            aux_state: 'err_exceeded'
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3 

==========================================
Current Pool Import Status:
==========================================
   pool: TmpOnly
     id: 13138442689823688648
  state: FAULTED
status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        TmpOnly                                 FAULTED  corrupted data
          5d3bbb27-e36e-4660-b9f4-88628bc46961  ONLINE

   pool: MainPool
     id: 7508289598988259504
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        MainPool                                  FAULTED  corrupted data
          raidz1-0                                DEGRADED
            51880de8-d9e5-4da9-a0d7-3be42262df32  UNAVAIL  invalid label
            026760de-f155-4343-ba66-16c076560dd4  ONLINE
            04d45059-b38c-40b7-910f-1a8c91e6b7cd  ONLINE
admin@homelab[~]$ 

This is likely the problem (more than 1 problem disk on a RAID-Z1 prevents import):

sda1 (026760de...)
    txg: 9248975

sdb1 (51880de8...)
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3

sdd1 (04d45059...)
    txg: 9272794

With a failed disk AND a TXG mismatch between the only 2 surviving disks, the pool will not import.

Generally there are 2 causes for TXG differences:

  1. Incorrect virtualization of TrueNAS. You still have not answered whether this instance of TrueNAS is virtualized.
  2. Hardware RAID (or USB-attached storage in a poorly chosen enclosure). Again, you have not answered how the disks are wired to the server.

As for the fix: there is a difference of 23,819 ZFS write transactions between the 2 surviving disks. A difference of more than, say, 16 or 32 is bad; over 10,000 could be fatal.

You could try:

zpool import -o readonly=on -fT 9248975 MainPool

But be very clear: you are throwing out 23,819 writes, and it is likely the pool will be corrupt. I’ve included the read-only option just in case.

  1. TrueNAS is running on bare metal, not in a virtual machine.
    It is installed directly on an ODROID H4 Ultra.
  2. All disks are connected directly to the ODROID (direct SATA connections). There is no hardware RAID controller, no HBA, and no USB storage involved — the disks are presented directly to the OS.

I’m not saying the hardware itself is inherently bad or at fault, but from my perspective this is not a particularly reliable platform for a NAS (reports online suggest occasional issues with drive handling and detection on this platform).

While the disks may be directly connected to the ODROID H4 Ultra, this is not the same as being attached to a native chipset SATA controller. AFAIK, the H4 uses a PCIe-to-SATA bridge (ASM1064 or similar, sometimes poorly cooled), which means all SATA ports share a single PCIe lane. This can lead to bandwidth limitations and potential I/O instability, unlike a standard motherboard SATA controller where each port is typically fully managed by the chipset.
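
A quick, hedged way to confirm which bridge is actually in use and its negotiated PCIe link (the 03:00.0 address comes from the original post):

$ sudo lspci -nnk -s 03:00.0                             # controller model, subsystem and kernel driver
$ sudo lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'    # PCIe link capability vs. negotiated speed/width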

Of course, it remains a question why it apparently worked before and only started showing problems after an upgrade.


Then I don’t know what caused the TXGs to vary so dramatically between the 2 surviving RAID-Z1 pool disks. You could try the command I listed…

Thanks everyone.

Is there a recommended tool to safely rebuild the pool, or software that can restore the files? I know the data exists; the pool configuration is the problem.

Thanks.

Otherwise known as a “SATA controller”. The ASM1064 is old and an ASM11x4 would be better but should be at least OK-ish.
It could be a RAM issue with the N305.
Or any of the above on top of a drive failure…

Can you show

lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME

to check that it’s not a partition issue with sdb (ZFS data not in sdb1?).
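
A couple of complementary checks, as a hedged sketch (sgdisk comes from the gdisk package), to see whether sdb's partition table still matches its siblings and whether any ZFS signature remains visible:

$ sudo sgdisk -p /dev/sdb            # print sdb's GPT; compare against sda and sdd
$ sudo blkid /dev/sdb /dev/sdb1      # look for a zfs_member signature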

Unless we can resurrect sdb, I’m afraid there’s no safe way.
If you can import read-only by discarding ~24k TXGs, back up everything to another drive.
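
A hedged sketch of that read-only rescue path, assuming the rewind import suggested earlier succeeds and that /mnt/rescue is a hypothetical destination with enough free space:

$ sudo zpool import -o readonly=on -f -T 9248975 -R /mnt/recovered MainPool   # datasets mount read-only under the altroot
$ rsync -aH --progress /mnt/recovered/MainPool/ /mnt/rescue/                  # source path is illustrative; copy from the actual mounted datasets
$ sudo zpool export MainPool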

Klennet ZFS Recovery
Runs from Windows. Scanning is free, but if it can recover files you’d need to purchase the $299 license.

Thanks, I’m testing UFS Explorer RAID Recovery. I ran a quick scan and I can see my exact folders and data. Hopefully I can recover them.


Please report the results either way.
At 130 € and also running on macOS and Linux, UFS Explorer RAID Recovery could be a valuable alternative to Klennet.


From what I’m seeing, Klennet does not show me the exact folder structure, while UFS Explorer RAID Recovery shows it correctly. I am still waiting for the scan to complete.
