Need help to get my ZFS pool running again

I think I messed up. I have a ZFS pool with two raidz1 vdevs of 4 HDDs each.

Today I changed the controller because it was giving me lots of errors. After changing it, my pool showed up, but one drive kept removing itself and coming back online, over and over. So I swapped that HDD out and started a resilver, since it looked like the drive itself was defective (it happened even after rebooting, rewiring, etc., so it must be a defect).

Resilvering was in progress. After a while I heard a sort of squeak, the sound the Toshiba MG09 makes when it powers off. It seems two drives randomly turned off and back on for a fraction of a second. Now there wasn't enough parity left, since both HDDs had been unplugged for a short time, so I restarted the PC to re-initialize everything.

The PC got stuck, so I forced a shutdown (maybe my big mistake). After the restart, one of the working HDDs showed as missing, even though it was plugged in. I checked with lsblk and saw that no partitions were shown (the forced restart broke the partition table).

So I used TestDisk to recover the partitions. Now it shows only one partition (the main ZFS partition), but the other 8 MiB partition is missing.

Now I'm trying to clone this drive and shrink the ZFS partition to make room for the 8 MiB partition, since after the recovery only 3 MiB are left for some reason.

So now I'm standing here with one half-defective drive that needs to be resilvered, and a second drive with a messed-up partition table. Unfortunately ZFS shows the drive with the messed-up partitions as missing. I'm afraid to detach and re-attach it, because once it's detached there's no way to attach it again while something else is missing.

The irony is that both drives are in the same raidz1, which means my data will be fckd up, since there's no longer sufficient parity. Is there some magic rescue I'm missing? Who can save me? I don't want this to be my villain arc. I hope some ZFS magician can help me out.

TiA

No, there is little you can do. If you can't get at least 3 working drives per 4-drive RAID-Z1, your pool is toast.

On the practical side, you can run zdb -l /dev/X9 against the failing / missing disks in the pool. Replace "X" with the device name, like "sda", and "9" with the ZFS partition number. You should see some information about labels and TXGs, as well as vdev and pool information.
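As a concrete sketch (device name and partition number are placeholders; on TrueNAS the data partition is often partition 2, but check yours with lsblk first):

```shell
# List the block devices and their partitions to find the ZFS data partition.
lsblk -o NAME,SIZE,PARTTYPENAME /dev/sdc

# Read the ZFS labels from the data partition of the suspect disk
# (read-only, safe to run). Replace sdc2 with your actual partition.
zdb -l /dev/sdc2

# A readable label prints, among other things, the pool name, the last
# TXG, the pool GUID, and the vdev tree this disk belongs to.
# "failed to unpack label" on all four labels usually means the
# partition offset is wrong or the label area was overwritten.
```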

On occasion, we have seen partition tables disappear due to booting foreign OSes. Restoring the "normal" partition layout can restore the disk's ability to be part of the vdev. Now, what counts as "normal" is extremely specific to your disk sizes and your version of TrueNAS.

If needed, here are 2 external options:

  • Paid recovery service
  • Paid recovery tool, Klennet, (which seems to have a free scanner)

I managed to recover the pool. I copied the partition table via sgdisk from a working HDD of the pool (same HDD model, same revision) and "transplanted" it onto the HDD with the messed-up partition table. After that I gave it new UUIDs and put it back into the pool; the HDD was recognized as if nothing had happened. Now I'm back at 3 drives and can resilver the defective one. It took me lots of research and asking kind of stupid questions to Perplexity, but I learned something new. :crossed_fingers: for the resilver
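For anyone who finds this later, the transplant described above looks roughly like this (a sketch, not a recipe; /dev/sdGOOD and /dev/sdBAD are placeholders for the healthy donor disk and the damaged disk, which must be the identical model and size):

```shell
# Back up the donor's partition table to a file first (read-only, safe).
sgdisk --backup=donor-table.bin /dev/sdGOOD

# Replicate the donor's table onto the damaged disk. Note the argument
# order: --replicate names the DESTINATION, the positional device is
# the SOURCE.
sgdisk --replicate=/dev/sdBAD /dev/sdGOOD

# The copy duplicates the donor's GUIDs, so randomize them on the
# target to avoid two disks carrying identical disk/partition GUIDs.
sgdisk --randomize-guids /dev/sdBAD

# Ask the kernel to re-read the new table, then verify the layout.
partprobe /dev/sdBAD
lsblk /dev/sdBAD
```

This only works because ZFS keeps its own labels inside the data partition; restoring the exact same partition boundaries lets ZFS find those labels again.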


Hey, take a deep breath. It's stressful, but don't panic. First thing: make a full clone of the drive with the messed-up partitions using ddrescue or something similar, and leave the original drive untouched. Let the partially defective drive finish resilvering. Once you have a clone, you can safely try zpool import -F or -D on the clone to see if it recovers. Avoid detaching or reattaching anything on the original drives for now. Take it slow. ZFS can be really forgiving if you're careful.
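The clone-then-import approach can be sketched like this (all names are placeholders: /dev/sdBAD is the damaged source, /dev/sdNEW is a blank disk at least as large, and "tank" stands in for your pool name):

```shell
# First pass: copy everything readable, skipping the slow scraping
# phase (-n). The log file lets an interrupted copy resume.
ddrescue -f -n /dev/sdBAD /dev/sdNEW rescue.log

# Second pass: go back and retry only the bad areas, 3 times each.
ddrescue -f -r3 /dev/sdBAD /dev/sdNEW rescue.log

# With the original disk disconnected, try importing from the clone.
# -F rewinds to the last consistent TXG; -n makes it a dry run that
# only reports whether the rewind would succeed.
zpool import -F -n tank

# If the dry run looks good, do it for real.
zpool import -F tank
```

The dry run (-n) is the important part: it tells you whether the rewind can work before anything is written to the pool.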