I’ve been having problems with my main pool, FusionPool. It has two spinning vdevs, both RAIDZ2, and a mirrored NVMe metadata vdev. One of the NVMe metadata drives appeared to be missing, so I replaced it; I thought the resilver had finished, so I planned to upgrade the second one as well. When I rebooted, FusionPool was listed as exported, so through the GUI I exported the pool and tried to re-import it, which has worked in the past. When I try to import via the GUI I get this message:
I have all the necessary disks available; only one of the mirrored NVMe drives is missing.
When I try to import the pool manually via the shell with zpool import, I get the following output:
root@truenas[~]# zpool import
   pool: FusionPool
     id: 7105278952023681001
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        FusionPool                                FAULTED  corrupted data
          raidz2-0                                ONLINE
            f9a95cd1-86c4-4c2d-bef5-01e020f29abd  ONLINE
            a88e61ec-82c2-4255-92b0-298de1335798  ONLINE
            bebea15b-3341-4c5a-8cb5-2a04b726e64e  ONLINE
            8d3a4f69-a526-4bf0-ae6c-efc851d43bfe  ONLINE
            5b712775-ed2d-46cf-81cb-1936a6f27936  ONLINE
            a34b923d-b7ff-4aca-98e9-ef7ef22d28c7  ONLINE
          mirror-1                                DEGRADED
            840bed05-be73-4e33-95bf-7d0374c1a70e  ONLINE
            a2b880e0-d3a5-4d15-99c0-bec7e5436d12  UNAVAIL
          raidz2-2                                ONLINE
            ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83  ONLINE
            8e9793b2-8d5f-4fcb-9aef-2012536184e8  ONLINE
            63599f5d-7f6a-4f42-99ce-64517c6984ef  ONLINE
            4cb91005-5dc1-453e-942f-5ad6ecdddefe  ONLINE
Any suggestions on how I can try to revive this pool?
Do you mean a “metadata vdev”, which is generically referred to as a “special vdev”?
From your zpool import output, mirror-1 is a data vdev, not a special vdev.
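For comparison, a special vdev would be listed under its own “special” heading in the config, something like this (illustrative layout only, not your actual output):

        tank
          raidz2-0                                ONLINE
            ...
        special
          mirror-1                                ONLINE
            <nvme-partuuid>                       ONLINE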
It was intended to be a special vdev, when I originally set it up. Either way I can’t reimport now.
You’re very close to losing all your data. I hope you have backups. I also suggest redoing this pool if possible.
Because a device is missing from a vdev, you’ll need to force an import of the degraded pool. From there you can resilver the degraded mirror-1 vdev.
The command to force an import of a degraded pool would be:
zpool import -f -R /mnt FusionPool
Use at your own risk.
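If the forced import does succeed, here is a minimal sketch of the resilver step for mirror-1, assuming a fresh NVMe shows up as /dev/nvme1n1 (a hypothetical device name; adjust to your system):

# Confirm mirror-1 is the DEGRADED vdev
zpool status FusionPool
# Replace the UNAVAIL member (partuuid taken from your zpool import output)
zpool replace FusionPool a2b880e0-d3a5-4d15-99c0-bec7e5436d12 /dev/nvme1n1
# Watch resilver progress
zpool status -v FusionPool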
How new is this pool? How recently did you add the mirror? Do you have a recent backup?
The force flag is not required to import a pool with a degraded vdev.
The I/O error is the problem.
Is it expected that one of the NVMe drives is offline? It sounded like you had already completed a resilver.
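For illustration, a degraded-but-otherwise-healthy pool would import with a plain:

zpool import -R /mnt FusionPool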
NO WAY! Are you THE Allan Jude?
The critical data is backed up, but the media that makes up the majority of the pool isn’t.
I thought I had added one of the 1 TB NVMe drives to the pool; it even warned that the vdev had mixed sizes, but it’s not showing as part of the pool now.
What are the possible causes of the I/O error?
Are all devices available?
Without trying to import the pool, what is the output of this?
lsblk -o NAME,PARTUUID,SIZE,MODEL,PTTYPE,TYPE,PARTTYPENAME
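As an optional cross-check, you can also list the by-partuuid symlinks ZFS uses, to confirm every member device node is actually present:

ls -l /dev/disk/by-partuuid/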
I had the same problem last week and thought I had lost all my data.
With the help of my brother-in-law, we were able to recover all the data from the shell by importing the pool read-only and copying all the datasets to another pool in my system.
We used these commands.
To import the pool (my pool was named BACKUP; 4910952484766769216 is my disk ID):
sudo zpool import -R /mnt -o readonly=on -d /dev/disk/by-uuid/4910952484766769216 -f BACKUP
To copy to another pool in my system:
sudo rsync -av --info=progress2 --no-owner --no-group /mnt/BACKUP/Backup2 /mnt/SharedFiles/SharedFiles/
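(Note for anyone reusing this: without a trailing slash on the source path, rsync copies the Backup2 directory itself into the destination, not just its contents.)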
My pools were called BACKUP and SharedFiles; my datasets were called Backup2 and SharedFiles.
Good luck!!!
root@truenas[~]# lsblk -o NAME,PARTUUID,SIZE,MODEL,PTTYPE,TYPE,PARTTYPENAME
NAME          PARTUUID                               SIZE MODEL                 PTTYPE TYPE PARTTYPENAME
sda                                                  1.9T T-FORCE 2TB           gpt    disk
├─sda1        50bc92cb-8159-43a8-b735-b045ae1c92ef     2G                       gpt    part Linux swap
└─sda2        92468ff4-3bfc-4a28-b6fa-9b4c935eb7eb   1.9T                       gpt    part Solaris /usr & Apple ZFS
sdb                                                  1.9T T-FORCE 2TB           gpt    disk
├─sdb1        1debc02a-279e-46f1-ab58-9729f42b1e46     2G                       gpt    part Linux swap
└─sdb2        1ade89c1-67aa-468b-8a3e-7c12fa4efefc   1.9T                       gpt    part Solaris /usr & Apple ZFS
sdc                                                119.2G Lexar SSD NS100 128GB gpt    disk
├─sdc1        cc01cb61-b193-4e3e-9266-6bee0acd2a2c     1M                       gpt    part BIOS boot
├─sdc2        c53e05fc-7a94-458a-b135-bc66829039f5   512M                       gpt    part EFI System
└─sdc3        17babc35-b377-4f72-9bec-f1c836dfc05f 118.7G                       gpt    part Solaris /usr & Apple ZFS
sdd                                                 14.6T ST16000NM001J-2TW113  gpt    disk
├─sdd1        f0a6c8ae-2175-401e-9b6f-103b6e529b8f     2G                       gpt    part Linux swap
└─sdd2        a88e61ec-82c2-4255-92b0-298de1335798  14.6T                       gpt    part Solaris /usr & Apple ZFS
sde                                                 12.7T ST14000NM001G-2KJ103  gpt    disk
├─sde1        f562288d-4b5a-43bf-b8f1-8e6dbe553d44     2G                       gpt    part Linux swap
└─sde2        8e9793b2-8d5f-4fcb-9aef-2012536184e8  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdf                                                 14.6T ST16000NM000J-2TW103  gpt    disk
├─sdf1        f5d62139-9472-4669-a2c5-cac8634a6a1b     2G                       gpt    part Linux swap
└─sdf2        8d3a4f69-a526-4bf0-ae6c-efc851d43bfe  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdg                                                 14.6T ST16000NM000J-2TW103  gpt    disk
├─sdg1        8855cc9f-06d6-45f5-b241-23eb8dc25b58     2G                       gpt    part Linux swap
└─sdg2        bebea15b-3341-4c5a-8cb5-2a04b726e64e  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdh                                                 14.6T ST16000NM018J-2WT103  gpt    disk
└─sdh1        a34b923d-b7ff-4aca-98e9-ef7ef22d28c7  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdi                                                 14.6T ST16000NM018J-2WT103  gpt    disk
└─sdi1        5b712775-ed2d-46cf-81cb-1936a6f27936  14.6T                       gpt    part Solaris /usr & Apple ZFS
sdj                                                 16.4T WDC WD180EDGZ-11B2DA0 gpt    disk
└─sdj1        f9a95cd1-86c4-4c2d-bef5-01e020f29abd  16.4T                       gpt    part Solaris /usr & Apple ZFS
sdk                                                 12.7T ST14000NM001G-2KJ103  gpt    disk
├─sdk1        f7c8e660-cfb3-465a-9532-aca31a50adf6     2G                       gpt    part Linux swap
└─sdk2        ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdl                                                 12.7T ST14000NM000J-2TX103  gpt    disk
├─sdl1        9e189ff1-4c44-44d3-819e-5a9a574090fc     2G                       gpt    part Linux swap
└─sdl2        4cb91005-5dc1-453e-942f-5ad6ecdddefe  12.7T                       gpt    part Solaris /usr & Apple ZFS
sdm                                                119.2G Lexar SSD NS100 128GB gpt    disk
├─sdm1        ed0395f1-c434-446a-8869-aa02bd87d44e     1M                       gpt    part BIOS boot
├─sdm2        787bbefb-67a7-489e-897e-b190fb627c8e   512M                       gpt    part EFI System
└─sdm3        d1155f5d-72ec-4e22-a163-b1b1c3dbeb1f 118.7G                       gpt    part Solaris /usr & Apple ZFS
sdn                                                 12.7T ST14000NM001G-2KJ103  gpt    disk
├─sdn1        59f872e2-c3c3-4694-ab84-22052c2d5509     2G                       gpt    part Linux swap
└─sdn2        63599f5d-7f6a-4f42-99ce-64517c6984ef  12.7T                       gpt    part Solaris /usr & Apple ZFS
zd0                                                907.7M                              disk
zd16                                                 5.8M                              disk
zd32                                                 5.9G                       gpt    disk
nvme4n1                                            476.9G PCIe SSD              gpt    disk
└─nvme4n1p1   840bed05-be73-4e33-95bf-7d0374c1a70e 476.9G                       gpt    part Solaris /usr & Apple ZFS
nvme0n1                                              1.8T CT2000T500SSD8        gpt    disk
└─nvme0n1p1   11bf8b83-7c5b-4e9e-bd38-5eb1fa10b886   1.8T                       gpt    part Solaris /usr & Apple ZFS
nvme1n1                                            931.5G KINGSTON SNV3S1000G          disk
nvme2n1                                            931.5G KINGSTON SNV3S1000G   gpt    disk
└─nvme2n1p1   d7411367-c719-4bfc-b938-477518f30b4e 476.9G                       gpt    part Solaris /usr & Apple ZFS
nvme3n1                                              1.8T CT2000P3PSSD8         gpt    disk
└─nvme3n1p1   903f4483-9450-415a-9be7-584ca7c8bfe1   1.8T                       gpt    part Solaris /usr & Apple ZFS
When I originally set up this pool, it was a hybrid. I had four 14 TB drives in RAIDZ2, and the mirrored NVMe drives were designated for metadata via the wizard. This was set up about 5 or 6 years ago.
Before RAIDZ expansion was available, I added another vdev with four 16 TB drives. Later, with RAIDZ expansion, I added two spare drives to the 4x16 vdev. I hadn’t had any issues prior to this week. It started with a recurring “3 uncorrectable errors” alert on startup, but the drive it pointed to kept changing, and pool scrubs didn’t indicate any issue, so I ignored it. If that could be causing the I/O error, I can disconnect that drive.
Errors reported where? From SMART or ZFS? Was it for a spinner or the NVMes?
Out of curiosity, what does this show? It requires sudo/root privileges:
# Print the txg recorded in each pool member's ZFS label
for dev in {f9a95cd1-86c4-4c2d-bef5-01e020f29abd,\
a88e61ec-82c2-4255-92b0-298de1335798,\
bebea15b-3341-4c5a-8cb5-2a04b726e64e,\
8d3a4f69-a526-4bf0-ae6c-efc851d43bfe,\
5b712775-ed2d-46cf-81cb-1936a6f27936,\
a34b923d-b7ff-4aca-98e9-ef7ef22d28c7,\
840bed05-be73-4e33-95bf-7d0374c1a70e,\
ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83,\
8e9793b2-8d5f-4fcb-9aef-2012536184e8,\
63599f5d-7f6a-4f42-99ce-64517c6984ef,\
4cb91005-5dc1-453e-942f-5ad6ecdddefe};
do zdb -l /dev/disk/by-partuuid/$dev | grep -E '^[[:space:]]+txg';
done
I only tested this syntax on Core (bash). It should be able to run as a single line. Adjust accordingly.
Your output makes it look like you added two NVMe drives as a data vdev in a two-way mirror.
To verify: you tried this command, and it results in the same I/O error?
zpool import -f -R /mnt FusionPool
First, thank you for your help on a Saturday; I really appreciate it. I’m trying to download all my critical data from a cloud backup to a local external drive in case I can’t re-import this pool. I don’t keep backups of the media, but it’s replaceable.
root@truenas[~]# zpool import -f -R /mnt FusionPool
cannot import 'FusionPool': I/O error
        Destroy and re-create the pool from
        a backup source.
The error is for a spinning disk; it shows up in alerts on reboot.
If you think it’s related to that I/O error, I can try disconnecting that drive and importing the pool. I shuffled all the NVMe drives around, but that didn’t resolve the I/O error.
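In the meantime I can pull that drive’s SMART stats with something like this (a sketch; /dev/sdX is a placeholder for whichever spinner the alert names):

# Overall health plus the error-related attributes
smartctl -a /dev/sdX | grep -iE 'result|reallocated|pending|uncorrect'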
I didn’t know what to replace in your commands, so I ran them exactly; this was the output:
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23720432
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
No wonder. The TXG for your remaining NVMe is lagging by a lot. I don’t believe you can do an emergency import at this point. It’s not just a few TXGs… it’s lagging by over 80,000!
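The gap, computed from the two txg values in your output:

echo $((23803623 - 23720432))   # prints 83191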
That’s so ridiculous that it seems impossible. I almost want to think that it’s the TXG for a different pool.
Let us confirm. Post the full output of this:
zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
root@truenas[~]# zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'FusionPool'
    state: 0
    txg: 23720432
    pool_guid: 7105278952023681001
    errata: 0
    hostid: 785381139
    hostname: 'truenas'
    top_guid: 6333384889662361424
    guid: 10719641310859050811
    vdev_children: 3
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 6333384889662361424
        metaslab_array: 256
        metaslab_shift: 32
        ashift: 12
        asize: 512105381888
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10719641310859050811
            path: '/dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e'
            DTL: 396
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 6951971175218986677
            path: '/dev/disk/by-partuuid/a2b880e0-d3a5-4d15-99c0-bec7e5436d12'
            DTL: 395
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
        org.openzfs:raidz_expansion
    labels = 0 1 2 3
I hate to say it, but it looks like you cannot import your pool, even if you tried an emergency import with -F.
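For reference, the emergency rewind import being ruled out here would look something like this; the -n flag makes it a dry run that only reports whether a rewind could work, without actually importing:

zpool import -F -n -R /mnt FusionPool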
A TXG that lags by over 80,000 is ridiculous.
I don’t know what happened to your other NVMe or how you ended up like this. Where is the other NVMe? What do you mean by “shuffling” your NVMes around?
@HoneyBadger: Is there any hope?
Probably more details than you care to know, but I had started to move TrueNAS into a Proxmox environment as a VM. I put in two 2 TB NVMe drives for VM storage, but I never figured out NFS shares between Proxmox and TrueNAS, so I shelved that idea. Going back to bare-metal TrueNAS, I put those drives into the first two NVMe slots but didn’t use that pool for anything. One NVMe drive for FusionPool was in the third NVMe slot, and the other was connected via a PCIe card. I removed the larger NVMe drives and connected the original FusionPool NVMe drive directly to the motherboard, along with the two new NVMe drives I planned to migrate to. I can try moving the pool’s NVMe drive to the card and rerun the command that checked for the lag.
Maybe I didn’t seat the NVMe drive properly. I’ll move it around again and try to re-import. I will rerun and post those commands again if I can’t import. Once all the critical data is downloaded and tested, probably in a couple of days, I will recreate the pool.
The output looks the same to me. I put the old NVMe drive back in, but since I had removed it from the pool, I doubt it will be recognized. Guess I’m out of options and will recreate the pool next week.
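If it would help, I could also check whether the old drive still carries a FusionPool label with something like this (a sketch; the device name is a placeholder for whichever slot the old NVMe landed in):

# Dump any ZFS labels left on the old NVMe
zdb -l /dev/nvmeXn1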
root@truenas[~]# for dev in {f9a95cd1-86c4-4c2d-bef5-01e020f29abd,\
a88e61ec-82c2-4255-92b0-298de1335798,\
bebea15b-3341-4c5a-8cb5-2a04b726e64e,\
8d3a4f69-a526-4bf0-ae6c-efc851d43bfe,\
5b712775-ed2d-46cf-81cb-1936a6f27936,\
a34b923d-b7ff-4aca-98e9-ef7ef22d28c7,\
840bed05-be73-4e33-95bf-7d0374c1a70e,\
ef1bf023-d6db-438a-b9d1-7d2bdbb2ed83,\
8e9793b2-8d5f-4fcb-9aef-2012536184e8,\
63599f5d-7f6a-4f42-99ce-64517c6984ef,\
4cb91005-5dc1-453e-942f-5ad6ecdddefe};
do zdb -l /dev/disk/by-partuuid/$dev | grep -E '^[[:space:]]+txg';
done
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23720432
txg: 23803623
txg: 23803623
txg: 23803623
txg: 23803623
root@truenas[~]# zdb -l /dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'FusionPool'
    state: 0
    txg: 23720432
    pool_guid: 7105278952023681001
    errata: 0
    hostid: 785381139
    hostname: 'truenas'
    top_guid: 6333384889662361424
    guid: 10719641310859050811
    vdev_children: 3
    vdev_tree:
        type: 'mirror'
        id: 1
        guid: 6333384889662361424
        metaslab_array: 256
        metaslab_shift: 32
        ashift: 12
        asize: 512105381888
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10719641310859050811
            path: '/dev/disk/by-partuuid/840bed05-be73-4e33-95bf-7d0374c1a70e'
            DTL: 396
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 6951971175218986677
            path: '/dev/disk/by-partuuid/a2b880e0-d3a5-4d15-99c0-bec7e5436d12'
            DTL: 395
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
        org.openzfs:raidz_expansion
    labels = 0 1 2 3
When did you remove the other NVMe from the pool? Did you do that in the GUI intentionally? Is that when you “replaced” an NVMe? What happened to the replacement?