Uncorrectable I/O Error hangs system, prevents pool import

Hello!

I run a TrueNAS Core setup with two 6TB drives in a mirror (used for SMB). One of them started showing errors, so I decided to upgrade rather than swap in a replacement. I bought two 16TB drives, planning to create a new pool and move the data over.

I set up a new pool with the 16TB drives and started moving data with a replication task. It got to about 71GB, then failed with an I/O error. I figured it was the failing drive, so I detached it from the old pool via the GUI. Trying again gave me the same error. I found a post saying that scrubbing the pool should help, so I did that, but the progress hung at 30.5% and the NAS became unresponsive, so I rebooted it. It then refused to boot, stating (with a monitor plugged directly into it) that it had encountered an uncorrectable I/O failure and had been suspended.

The only way to get it to boot was to detach the drives. After that I removed the pool in the GUI and re-attached the drives. Trying to import the pool back results in an I/O error every time, even after using various zpool import options, like -fFX.
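
For completeness, these are roughly the recovery imports I attempted; poolname stands in for the real pool name:

# Dry run: ask whether a rewind recovery could work, without changing anything (-n)
zpool import -F -n poolname
# Force the import (-f), attempting recovery by discarding the last few
# transactions (-F); -X allows an extreme rewind to even older states
zpool import -fFX poolname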

After trying many times, re-installing TrueNAS Core, and even trying TrueNAS SCALE with the same result, the one command that made a difference was

zpool import -o readonly=on

This made the GUI unable to see any pools at all, but I couldn't tell whether the pool had actually been imported, or how to see and read data off it. I tried zpool export, which made it visible in the GUI again, but importing it just repeats the same cycle.
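
Even when the GUI shows nothing, the shell can confirm whether a read-only import actually worked; poolname is a placeholder:

zpool list               # the pool appears here if the import succeeded
zpool status poolname    # vdev health and per-device error counters
zfs list -r poolname     # datasets and where they would mount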

I have a backed-up config and keys from before I started this process if that helps.

Thank you in advance for any and all help.

Figured out how to save my data:

Make a new, healthy pool with healthy drive(s).

Shut down the system and unplug the faulty drive(s).

Turn it on, export/disconnect the faulty pool in the GUI and shut the system down again.

Turn it on, go to the shell, and enter zpool import. The faulty pool should be listed.

Enter zpool import -o readonly=on poolname
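
A plain shell import mounts the pool relative to /, which is why the later steps cd into /poolname. If you would rather keep everything under /mnt the way TrueNAS normally mounts pools, the import can take an altroot instead (poolname is still a placeholder):

zpool import -R /mnt -o readonly=on poolname

If you do that, adjust the paths in the later steps accordingly.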

Since my pool is encrypted, the command zfs mount poolname says encryption key not loaded.

Load the encryption key as a passphrase with the command zfs load-key -r -L prompt poolname

Then copy the hex key from the dataset_poolname_key.json file you hopefully have and paste it at the prompt.
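
If you would rather not copy the key by hand, the hex string can be pulled out of the exported JSON first. A sketch, assuming jq is available and the file is a flat JSON object mapping dataset names to hex keys; adjust the filter if yours is laid out differently:

# Print the hex key for one dataset from the exported key file
# (assumes a flat {"poolname/datasetname": "<hex>"} layout)
jq -r '."poolname/datasetname"' dataset_poolname_key.json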

Enter zfs mount poolname

This mounts the pool, and you should be able to cd /poolname

Now mount the dataset with zfs mount poolname/datasetname

cd into the dataset to check if your folders are there

In my case, all folders were there, but I was only interested in copying one “main” folder. If your folders don’t appear (like mine didn’t at first), make sure to also decrypt the dataset with zfs load-key -r -L prompt poolname/datasetname
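
To check at a glance which datasets still have their keys unloaded or aren't mounted yet, zfs can report both as properties (poolname is a placeholder):

zfs get -r keystatus,mounted poolname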

Having made sure that my files were on the drive, I navigated into my healthy pool and dataset and created a new folder with mkdir foldername (checking where the dataset was mounted with zfs list).

Then cp -r -f /poolname/datasetname/folder_I_want_copied /mnt/healthypool/dataset/foldername and just let it copy.
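
If the drive has corrupted files, it helps to keep a record of exactly which ones fail to copy; stderr can be redirected to a log (same placeholder paths as above):

# Any "Input/output error" lines end up in the log instead of scrolling past
cp -r -f /poolname/datasetname/folder_I_want_copied /mnt/healthypool/dataset/foldername 2>/mnt/healthypool/cp_errors.log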


I presume you meant cp. Great job saving your pool!

It is situations like this that are why I implemented a Z3 VDEV and a 4-way mirror for the sVDEV in my pool. There is too much opportunity for a single-drive failure to turn pool recovery into an “interesting” experience. Congratulations again.


Thank you! I did indeed mean cp; I've edited the post.

It is just these sorts of experiences that make you learn the most, though!

I believe you can replicate a dataset that uses native ZFS encryption even while it is still locked.
However, I don’t know whether the I/O error would still occur when the pool is imported with readonly=on.
You might still get the I/O error with the regular cp command.
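
For reference, that would be a raw send, which streams the encrypted blocks as-is, so no key needs to be loaded. A sketch with placeholder names, assuming a snapshot already exists (a readonly import won't let you create a new one):

# Raw send (-w) preserves the encryption; the key is only needed later
# to unlock the received copy. Requires an existing snapshot.
zfs send -w poolname/datasetname@existing_snap | zfs receive healthypool/rescued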

I did indeed still get I/O errors using the cp command; however, they only occurred for specific corrupted files. Even with the pool set to readonly, send/receive would copy about 70GB and then the I/O error would bring the whole system to a halt.
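
For anyone else in this spot: a copy tool that skips unreadable files and keeps going can rescue everything except the damaged files. A sketch with rsync, using the same placeholder paths as above:

# rsync reports unreadable source files and moves on to the next one
# instead of aborting; the log shows exactly which files were lost
rsync -av /poolname/datasetname/folder_I_want_copied/ /mnt/healthypool/dataset/foldername/ 2>rsync_errors.log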