Corrupted Pool

Hi,

My TrueNAS server’s only pool just got corrupted. I discovered it when my NFS shares froze.

After a reboot, two services hung during startup and the boot could not complete:

ix-netif.service/start
ix-zfs.service/start

To allow a full boot, I unplugged everything except the boot drives from the motherboard. The problem came back as soon as the pool drives were plugged back in.

I tried rolling back to a previous OS version; that didn’t work.
I tried rolling back to a previous snapshot of the top dataset; that didn’t work either.

At one point the boot was successful, but the Storage Dashboard showed the pool with an error / as problematic.

I checked the pool status from the command line, but couldn’t tell what this line really means, since no files were listed:

errors: Permanent errors have been detected in the following files:

I decided to export the pool and import it read-only to finish a critical backup and avoid losing files:

sudo zpool import -o readonly=on RZ1_3x16TB -R /mnt
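For reference, the whole sequence can be sketched as a dry run that prints each command for review instead of executing it (the backup tool and destination below are assumptions, not what I necessarily used):

```shell
#!/bin/sh
# Dry-run sketch of the export -> read-only import -> backup sequence.
# The pool name is from this post; the rsync destination is a placeholder.
POOL=RZ1_3x16TB
STEPS="sudo zpool export $POOL
sudo zpool import -o readonly=on $POOL -R /mnt
rsync -aHAX /mnt/$POOL/ /path/to/backup/"
echo "$STEPS"   # print for review; run each line by hand
```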

Is there anything I can do to get this pool back on track and fix the corruption it now has?

Here’s my TrueNAS system information:

  • Platform: Generic
  • Version: ElectricEel-24.10.0.2
  • Data VDEVs: 2 x RAIDZ1 | 3 wide | 14.55 TiB
  • Log VDEVs: 1 x MIRROR | 2 wide | 931.51 GiB
  • Auto TRIM: On
  • Compression Level: LZ4
  • ZFS Deduplication: OFF

Thanks a lot in advance!

Were you able to back up your data, and if so, have you finished?

I am not a pool expert, so if you’d rather wait for confirmation or other advice, I understand.

My first observation: it’s odd that your log mirror has identical cksum errors on both devices. Did the power drop? Was there a sudden reboot?

Here is what I would do:

  1. While your pool is still mounted read-only, run a SMART long test on all of your drives: smartctl -t long /dev/sda, then sdb… When the tests finish, examine the data using smartctl -x /dev/sda to ensure the extended tests passed without error. Next, examine the rest of the SMART data for key indicators of drive failure. If you do not know how to decipher it, post it for each drive. I do not expect drive failure if you never received a failure message beforehand.
  2. If all looks good in the SMART data, then I’d try to import your pool and, if that works, run a scrub. See what files have errors, if any. If no files are listed, run zpool clear RZ1_3x16TB to clear the errors, then zpool status RZ1_3x16TB to check that the errors are cleared.
  3. If all looks good, power off, wait 10–20 seconds, power on, and make sure everything is working properly.
  4. You may need to remove the log devices, wipe them, and add them back; however, I would only do that as a last resort.
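Steps 1 and 2 can be sketched as a dry run that prints the commands so you can review them before running anything. The drive names sda–sdf are assumptions (take the real ones from lsblk); the pool name is from the original post:

```shell
#!/bin/sh
# Dry-run plan for the SMART tests and the scrub/clear sequence.
# Nothing is executed; the commands are only printed for review.
POOL=RZ1_3x16TB
PLAN=$(
  # Long self-tests run inside the drive firmware; expect hours on 16 TB disks.
  for d in sda sdb sdc sdd sde sdf; do
    echo "sudo smartctl -t long /dev/$d"
  done
  echo "sudo zpool scrub $POOL"
  echo "zpool status -v $POOL    # -v lists any files with permanent errors"
  echo "sudo zpool clear $POOL   # only if no failing files are listed"
)
echo "$PLAN"
```

Once you have reviewed the plan, run each line by hand, waiting for the self-tests and the scrub to finish before moving on.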

Best of luck to you.


Post your hardware details. How are your drives attached to the motherboard? Are you using SATA expanders? For examples, expand the arrow sections at the bottom of our posts.

Run the following additional commands and post the output:

  • lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
  • lspci
  • sas2flash -list
  • sas3flash -list
  • smartctl -x /dev/* for each drive in the lsblk output.
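For the last item, one way to generate the per-drive commands is to feed the lsblk disk names through a small pipeline (a dry-run sketch: it prints the commands rather than running them, and falls back to sample names sda/sdb if lsblk reports nothing):

```shell
#!/bin/sh
# Build one `smartctl -x` command per whole disk reported by lsblk.
# Dry run: the commands are printed for review, not executed.
NAMES=$(lsblk -dno NAME 2>/dev/null)
# Fallback to sample names (an assumption) if lsblk is unavailable here.
[ -n "$NAMES" ] || NAMES="sda
sdb"
CMDS=$(printf '%s\n' "$NAMES" | sed 's|^|sudo smartctl -x /dev/|')
echo "$CMDS"
```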

Also, in addition to your hardware details, please say whether you use VMs, apps, etc. in addition to NFS.