Pool won't Import

I’ve been having problems with my main pool, FusionPool. It has two spinning vdevs, both RAIDZ2, and a mirrored metadata vdev. One of the NVMe metadata drives appeared to be missing, so I replaced it. I thought the resilver had finished, so I planned to upgrade the second one as well. When I rebooted, FusionPool was listed as exported, so through the GUI I exported the pool and tried to re-import it, which has worked in the past. When I try to import via the GUI I get this message:

I have adequate disks available, with only one of the mirrored NVMe drives missing.

When I go to manually import the pool via the shell, I get:

When I run zpool import I get the following output:

root@truenas[~]# zpool import               
  pool: FusionPool
    id: 7105278952023681001
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:

        FusionPool                                FAULTED  corrupted data
          raidz2-0                                ONLINE
            f9a95cd1-86c4-4c2d-bef5-01e020f29abd  ONLINE
            a88e61ec-82c2-4255-92b0-298de1335798  ONLINE
            bebea15b-3341-4c5a-8cb5-2a04b726e64e  ONLINE
            8d3a4f69-a526-4bf0-ae6c-efc851d43bfe  ONLINE
            5b712775-ed2d-46cf-81cb-1936a6f27936  ONLINE
            a34b923d-b7ff-4aca-98e9-ef7ef22d28c7  ONLINE
          mirror-1                                DEGRADED
            840bed05-be73-4e33-95bf-7d0374c1a70e  ONLINE
            a2b880e0-d3a5-4d15-99c0-bec7e5436d12  UNAVAIL
          raidz2-2                                ONLINE
            ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83  ONLINE
            8e9793b2-8d5f-4fcb-9aef-2012536184e8  ONLINE
            63599f5d-7f6a-4f42-99ce-64517c6984ef  ONLINE
            4cb91005-5dc1-453e-942f-5ad6ecdddefe  ONLINE

Any suggestions on how I can try to revive this pool?

Do you mean a “metadata vdev”, which is generically referred to as a “special vdev”?

From your zpool import output, mirror-1 is a data vdev, not a special vdev.
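
If mirror-1 had been created as a special vdev, it would be listed under its own “special” heading in the zpool status / zpool import config rather than among the ordinary data vdevs, roughly like this (illustrative only, not your actual output):

        FusionPool
          raidz2-0                                ONLINE
            ...
          raidz2-2                                ONLINE
            ...
        special
          mirror-1                                DEGRADED
            ...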

It was intended to be a special vdev when I originally set it up. Either way, I can’t re-import now.

:warning: You’re very close to losing all your data. I hope you have backups. I also suggest redoing this pool if possible.

Because a device is missing from a vdev, you’ll need to force an import of a degraded pool. From here you can resilver the degraded mirror-1 vdev.

The command to force an import of a degraded pool would be:

zpool import -f -R /mnt FusionPool

Use at your own risk.
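
For reference, once a forced import succeeds, the resilver of mirror-1 would look something like this (the replacement device path is a placeholder; substitute whatever lsblk shows for your new NVMe):

# Confirm the pool imported and identify the UNAVAIL member
zpool status -v FusionPool

# Replace the missing mirror member with the new device
# (/dev/nvme1n1 here is only an example path)
zpool replace FusionPool a2b880e0-d3a5-4d15-99c0-bec7e5436d12 /dev/nvme1n1

# Watch the resilver progress
zpool status -v FusionPool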


How new is this pool? How recently did you add the mirror? Do you have a recent backup?


The force flag is not required to import a pool with a degraded vdev.

The I/O error is the problem.

Is it expected that one of the NVMe drives is offline? It sounded like you had already completed a resilver.
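
A quick way to check whether that missing mirror member is visible to the OS at all (the partuuid is taken from your zpool import output; nvme list assumes nvme-cli is available):

# Does the missing partition UUID show up anywhere?
ls -l /dev/disk/by-partuuid/ | grep a2b880e0-d3a5-4d15-99c0-bec7e5436d12

# List every NVMe device the kernel can see
nvme list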


NO WAY! Are you THE Allan Jude?


The critical data is backed up, but the media that makes up the majority of the pool isn’t.
I thought I had added one of the 1TB NVMes to the pool; it even noted that the vdev was mixed-size, but it’s not showing as part of the pool now.
What are the possible causes of the I/O error?

Are all devices available?

Without trying to import the pool, what is the output of this?

lsblk -o NAME,PARTUUID,SIZE,MODEL,PTTYPE,TYPE,PARTTYPENAME

I had the same problem last week and thought I had lost all my data.
With the help of my brother-in-law, we were able to recover all the data from the shell by importing the pool read-only and copying all the datasets to another pool in my system.
We used these commands:

To mount the pool (my pool was named BACKUP; 4910952484766769216 is my disk ID):
sudo zpool import -R /mnt -o readonly=on -d /dev/disk/by-uuid/4910952484766769216 -f BACKUP

To copy to another pool in my system:
sudo rsync -av --info=progress2 --no-owner --no-group /mnt/BACKUP/Backup2 /mnt/SharedFiles/SharedFiles/

my pools were called: BACKUP and SharedFiles
my datasets were called: Backup2 and SharedFiles

good luck!!!
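
As an aside, if the damaged pool already has snapshots, zfs send/receive is another way to copy data off a read-only import, and it preserves dataset properties and permissions. A read-only pool cannot take new snapshots, so this only works with snapshots that already exist; the snapshot and target names below are made-up placeholders:

# List existing snapshots on the read-only imported pool
zfs list -t snapshot -r BACKUP/Backup2

# Send one to the healthy pool (-u: don't mount the received dataset)
zfs send -v BACKUP/Backup2@example-snap | zfs receive -u SharedFiles/Backup2-copy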

root@truenas[~]# lsblk -o NAME,PARTUUID,SIZE,MODEL,PTTYPE,TYPE,PARTTYPENAME  
NAME        PARTUUID                               SIZE MODEL                 PTTYPE TYPE PARTTYPENAME
sda                                                1.9T T-FORCE 2TB           gpt    disk 
├─sda1      50bc92cb-8159-43a8-b735-b045ae1c92ef     2G                       gpt    part Linux swap
└─sda2      92468ff4-3bfc-4a28-b6fa-9b4c935eb7eb   1.9T                       gpt    part Solaris /usr & Apple ZFS
sdb                                                1.9T T-FORCE 2TB           gpt    disk 
├─sdb1      1debc02a-279e-46f1-ab58-9729f42b1e46     2G                       gpt    part Linux swap
└─sdb2      1ade89c1-67aa-468b-8a3e-7c12fa4efefc   1.9T                       gpt    part Solaris /usr & Apple ZFS
sdc                                              119.2G Lexar SSD NS100 128GB gpt    disk 
├─sdc1      cc01cb61-b193-4e3e-9266-6bee0acd2a2c     1M                       gpt    part BIOS boot
├─sdc2      c53e05fc-7a94-458a-b135-bc66829039f5   512M                       gpt    part EFI System
└─sdc3      17babc35-b377-4f72-9bec-f1c836dfc05f 118.7G                       gpt    part Solaris /usr & Apple ZFS
sdd                                               14.6T ST16000NM001J-2TW113  gpt    disk 
├─sdd1      f0a6c8ae-2175-401e-9b6f-103b6e529b8f     2G                       gpt    part Linux swap
└─sdd2      a88e61ec-82c2-4255-92b0-298de1335798  14.6T                       gpt    part Solaris /usr & Apple ZFS
sde                                               12.7T ST14000NM001G-2KJ103  gpt    disk 
├─sde1      f562288d-4b5a-43bf-b8f1-8e6dbe553d44     2G                       gpt    part Linux swap
└─sde2      8e9793b2-8d5f-4fcb-9aef-2012536184e8  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdf                                               14.6T ST16000NM000J-2TW103  gpt    disk 
├─sdf1      f5d62139-9472-4669-a2c5-cac8634a6a1b     2G                       gpt    part Linux swap
└─sdf2      8d3a4f69-a526-4bf0-ae6c-efc851d43bfe  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdg                                               14.6T ST16000NM000J-2TW103  gpt    disk 
├─sdg1      8855cc9f-06d6-45f5-b241-23eb8dc25b58     2G                       gpt    part Linux swap
└─sdg2      bebea15b-3341-4c5a-8cb5-2a04b726e64e  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdh                                               14.6T ST16000NM018J-2WT103  gpt    disk 
└─sdh1      a34b923d-b7ff-4aca-98e9-ef7ef22d28c7  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdi                                               14.6T ST16000NM018J-2WT103  gpt    disk 
└─sdi1      5b712775-ed2d-46cf-81cb-1936a6f27936  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdj                                               16.4T WDC WD180EDGZ-11B2DA0 gpt    disk 
└─sdj1      f9a95cd1-86c4-4c2d-bef5-01e020f29abd  16.4T                       gpt    part Solaris /usr & Apple ZFS
sdk                                               12.7T ST14000NM001G-2KJ103  gpt    disk 
├─sdk1      f7c8e660-cfb3-465a-9532-aca31a50adf6     2G                       gpt    part Linux swap
└─sdk2      ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdl                                               12.7T ST14000NM000J-2TX103  gpt    disk 
├─sdl1      9e189ff1-4c44-44d3-819e-5a9a574090fc     2G                       gpt    part Linux swap
└─sdl2      4cb91005-5dc1-453e-942f-5ad6ecdddefe  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdm                                              119.2G Lexar SSD NS100 128GB gpt    disk 
├─sdm1      ed0395f1-c434-446a-8869-aa02bd87d44e     1M                       gpt    part BIOS boot
├─sdm2      787bbefb-67a7-489e-897e-b190fb627c8e   512M                       gpt    part EFI System
└─sdm3      d1155f5d-72ec-4e22-a163-b1b1c3dbeb1f 118.7G                       gpt    part Solaris /usr & Apple ZFS
sdn                                               12.7T ST14000NM001G-2KJ103  gpt    disk 
├─sdn1      59f872e2-c3c3-4694-ab84-22052c2d5509     2G                       gpt    part Linux swap
└─sdn2      63599f5d-7f6a-4f42-99ce-64517c6984ef  12.7T                       gpt    part Solaris /usr & Apple ZFS
zd0                                              907.7M                              disk 
zd16                                               5.8M                              disk 
zd32                                               5.9G                       gpt    disk 
nvme4n1                                          476.9G PCIe SSD              gpt    disk 
└─nvme4n1p1 840bed05-be73-4e33-95bf-7d0374c1a70e 476.9G                       gpt    part Solaris /usr & Apple ZFS
nvme0n1                                            1.8T CT2000T500SSD8        gpt    disk 
└─nvme0n1p1 11bf8b83-7c5b-4e9e-bd38-5eb1fa10b886   1.8T                       gpt    part Solaris /usr & Apple ZFS
nvme1n1                                          931.5G KINGSTON SNV3S1000G          disk 
nvme2n1                                          931.5G KINGSTON SNV3S1000G   gpt    disk 
└─nvme2n1p1 d7411367-c719-4bfc-b938-477518f30b4e 476.9G                       gpt    part Solaris /usr & Apple ZFS
nvme3n1                                            1.8T CT2000P3PSSD8         gpt    disk 
└─nvme3n1p1 903f4483-9450-415a-9be7-584ca7c8bfe1   1.8T                       gpt    part Solaris /usr & Apple ZFS

When I originally set up this pool, it was a hybrid. I had four 14TB drives in RAIDZ2, and the mirrored NVMe drives were designated for metadata via the wizard. This was set up about 5 or 6 years ago.


Prior to pool expansion, I added another vdev with four 16TB drives. Later, with ZFS RAIDZ expansion, I added two spare drives to the 4x16TB vdev. I hadn’t had any issues prior to this week. It started with an alert for 3 uncorrectable errors on startup, but the drive it flagged kept changing, and pool scrubs didn’t indicate any issue, so I ignored it. If that could be causing the I/O error, I can disconnect that drive.

Errors reported where? From SMART or ZFS? Was it for a spinner or the NVMes?
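
If it was SMART, something along these lines would show the relevant raw counters (sdX is a placeholder for the disk named in the alert):

# Dump SMART data and filter for the usual error-related attributes
smartctl -a /dev/sdX | grep -iE 'uncorrect|pending|reallocat'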


Out of curiosity, what does this show. It requires sudo/root privileges:

for dev in {f9a95cd1-86c4-4c2d-bef5-01e020f29abd,\
a88e61ec-82c2-4255-92b0-298de1335798,\
bebea15b-3341-4c5a-8cb5-2a04b726e64e,\
8d3a4f69-a526-4bf0-ae6c-efc851d43bfe,\
5b712775-ed2d-46cf-81cb-1936a6f27936,\
a34b923d-b7ff-4aca-98e9-ef7ef22d28c7,\
840bed05-be73-4e33-95bf-7d0374c1a70e,\
ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83,\
8e9793b2-8d5f-4fcb-9aef-2012536184e8,\
63599f5d-7f6a-4f42-99ce-64517c6984ef,\
4cb91005-5dc1-453e-942f-5ad6ecdddefe,\
};
do zdb -l /dev/disk/by-partuuid/$dev | grep -E ^"    "txg;
done

I only tested this syntax on Core (bash). It should be able to run as a single line. Adjust accordingly.


Your output makes it look like you added the two NVMe drives as a data vdev in a two-way mirror.


To verify, you tried this command which results in the same I/O error?

zpool import -f -R /mnt FusionPool

First, thank you for your help on a Saturday; I really appreciate it. I’m trying to download all my critical data from a cloud backup to a local external drive in case I can’t re-import this pool. I don’t keep backups of the media, but it’s replaceable.

root@truenas[~]# zpool import -f -R /mnt FusionPool  
cannot import 'FusionPool': I/O error
        Destroy and re-create the pool from
        a backup source.

The error is for a spinning disk; it shows up in the alerts on reboot.


If you think it’s related to that I/O error, I can try disconnecting that drive and importing the pool. I shuffled all the NVMe drives around, but that didn’t resolve the I/O error.

I didn’t know what to replace in your commands, so I ran them exactly as written. This was the output:

    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23720432
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623

No wonder. The TXG for your remaining NVMe is lagging by a lot. I don’t believe you can do an emergency import at this point. It’s not just a few TXG… it’s lagging by over 80,000!

That’s so ridiculous that it seems impossible. I almost want to think that it’s the TXG for a different pool.

Let us confirm. Post the full output of this:

zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
root@truenas[~]# zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'FusionPool'
    state: 0
    txg: 23720432
    pool_guid: 7105278952023681001
    errata: 0
    hostid: 785381139
    hostname: 'truenas'
    top_guid: 6333384889662361424
    guid: 10719641310859050811
    vdev_children: 3
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 6333384889662361424
        metaslab_array: 256
        metaslab_shift: 32
        ashift: 12
        asize: 512105381888
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10719641310859050811
            path: '/dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e'
            DTL: 396
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 6951971175218986677
            path: '/dev/disk/by-partuuid/a2b880e0-d3a5-4d15-99c0-bec7e5436d12'
            DTL: 395
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
        org.openzfs:raidz_expansion
    labels = 0 1 2 3 

I hate to say it, but it looks like you cannot import your pool, even if you tried an emergency import with -F.

A TXG that is lagged by over 80,000 is ridiculous.
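
For completeness, the “emergency import” being ruled out here would look roughly like the following; the -Fn dry run doesn’t modify anything, but an actual rewind across that many transactions is very unlikely to succeed:

# Dry run: ask ZFS whether discarding recent transactions would make
# the pool importable, without writing anything to disk
zpool import -Fn FusionPool

# The actual (risky) attempt, kept read-only to avoid further damage
zpool import -F -o readonly=on -R /mnt FusionPool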

I don’t know what happened to your other NVMe or how you ended up like this. Where is the other NVMe? What do you mean by “shuffling” your NVMes around?

@HoneyBadger: Is there any hope?

Probably more details than you care to know, but I started to move TN into a Proxmox environment as a VM. I put in two 2TB NVMe drives for VM storage, but I never figured out NFS shares between Proxmox and TN, so I shelved that idea. Going back to bare-metal TN, I put those drives into the first two NVMe slots but didn’t use that pool for anything. One NVMe drive for FusionPool was in the 3rd NVMe slot and the other was connected via a PCIe card. I removed the larger NVMe drives and connected the original FusionPool NVMe drive directly to the motherboard, along with the two new NVMe drives I planned to migrate to. I can try moving the NVMe drive from the pool back to the card and rerun the command that checked for the delay.

Maybe I didn’t seat the NVMe drive properly. I’ll move it around again and try to re-import. If I can’t import, I’ll rerun those commands and post the output again. Once all the critical data is downloaded and tested (probably a couple of days), I will recreate the pool.


The output looks the same to me. I put the old NVMe drive back in, but I had removed it from the pool, so I doubt it will be recognized. Guess I’m out of options and will recreate the pool next week.

root@truenas[~]# for dev in {f9a95cd1-86c4-4c2d-bef5-01e020f29abd,\
a88e61ec-82c2-4255-92b0-298de1335798,\
bebea15b-3341-4c5a-8cb5-2a04b726e64e,\
8d3a4f69-a526-4bf0-ae6c-efc851d43bfe,\
5b712775-ed2d-46cf-81cb-1936a6f27936,\
a34b923d-b7ff-4aca-98e9-ef7ef22d28c7,\
840bed05-be73-4e33-95bf-7d0374c1a70e,\
ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83,\
8e9793b2-8d5f-4fcb-9aef-2012536184e8,\
63599f5d-7f6a-4f42-99ce-64517c6984ef,\
4cb91005-5dc1-453e-942f-5ad6ecdddefe,\
};
do zdb -l /dev/disk/by-partuuid/$dev | grep -E ^"    "txg;
done
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23720432
    txg: 23803623
    txg: 23803623
    txg: 23803623
    txg: 23803623
root@truenas[~]# zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
------------------------------------
LABEL 0 
------------------------------------
    version: 5000
    name: 'FusionPool'
    state: 0
    txg: 23720432
    pool_guid: 7105278952023681001
    errata: 0
    hostid: 785381139
    hostname: 'truenas'
    top_guid: 6333384889662361424
    guid: 10719641310859050811
    vdev_children: 3
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 6333384889662361424
        metaslab_array: 256
        metaslab_shift: 32
        ashift: 12
        asize: 512105381888
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10719641310859050811
            path: '/dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e'
            DTL: 396
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 6951971175218986677
            path: '/dev/disk/by-partuuid/a2b880e0-d3a5-4d15-99c0-bec7e5436d12'
            DTL: 395
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
        org.openzfs:raidz_expansion
    labels = 0 1 2 3 

When did you remove the other NVMe from the pool? Did you do that in the GUI intentionally? Is that when you “replaced” an NVMe? What happened to the replacement?
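
If the old NVMe is physically back in the system, it may also be worth checking whether any NVMe partition still carries ZFS labels for this pool before giving up. A rough sketch (it simply scans whatever NVMe partitions exist; adjust the glob to match your lsblk output):

# Look for ZFS labels on each NVMe partition and grep out the
# name, guid, and txg fields from any label found
for p in /dev/nvme*n1p*; do
  echo "== $p =="
  zdb -l "$p" | grep -E "name:|pool_guid:|txg:"
done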