Hi,
My TrueNAS server's only pool just got corrupted. I discovered it when my NFS shares froze.
After a reboot, I saw two services hang during startup, preventing the boot from completing:
ix-netif.service/start
ix-zfs.service/start
To allow a full boot, I unplugged everything except the boot drives from the motherboard. The problem reappeared as soon as the data drives were plugged back in.
I tried rolling back to a previous OS version; that didn't work.
I tried rolling back to a previous snapshot of the top-level dataset; that didn't work either.
Once the boot was successful, though, the Storage Dashboard showed the pool as errored/problematic.
I did check the pool status from the command line, but I couldn't tell what this line really means, since no files were listed below it:
errors: Permanent errors have been detected in the following files:
I decided to export the pool and import it read-only to finish a critical backup and avoid losing files:
sudo zpool import -o readonly=on RZ1_3x16TB -R /mnt
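For reference, this is how I've been checking the pool since the read-only import; as far as I understand, the -v flag is supposed to list the files with permanent errors, but nothing shows up in my case:

# Check pool health after the read-only import; -v lists any files
# flagged with permanent errors (none are shown here)
sudo zpool status -v RZ1_3x16TB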
Is there anything I can do to get this pool back on track and fix the corruption it now has?
Here's my TrueNAS system information:
- Platform: Generic
- Version: ElectricEel-24.10.0.2
- Data VDEVs : 2 x RAIDZ1 | 3 wide | 14.55 TiB
- Log VDEVs : 1 x MIRROR | 2 wide | 931.51 GiB
- Auto TRIM: On
- Compression Level: LZ4
- ZFS Deduplication: OFF
Thanks a lot in advance!
Were you able to back up your data, and if so, have you finished?
I'm not a pool expert, so if you'd rather wait for confirmation or other advice, I understand.
My first comment: it's odd that your log mirror shows identical cksum errors on both devices. Did power drop? Was there a sudden reboot?
Here is what I would do:
- While your pool is still mounted read-only, run a SMART long test on all of your drives:
smartctl -t long /dev/sda
and sdb, etc. Then examine the results using
smartctl -x /dev/sda
to ensure the extended tests passed without error. Next, examine the rest of the data for key indicators of drive failure. If you don't know how to decipher the rest of the SMART data, post it for each drive. I don't expect you to have drive failure if you never received a failure message beforehand. (A sketch of this step follows the list.)
- If all looks good from the SMART data, then I'd try to import your pool and, if that works, run a scrub. See which files have errors, if any. If no files are listed, run
zpool clear RZ1_3x16TB
to clear the errors, then run
zpool status RZ1_3x16TB
to check that the errors are cleared. (Second sketch below.)
- If all looks good, power off, wait 10-20 seconds, power on, and make sure everything is working properly.
- You may need to remove the log devices, wipe them, and add them back; however, I would only do that as a last resort. (Third sketch below.)
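A minimal sketch of the SMART step, assuming the drives are sda through sdf; adjust the device list to whatever lsblk shows on your system:

# Kick off a long self-test on each drive (tests run inside the drive and can take hours)
for d in /dev/sd{a..f}; do sudo smartctl -t long "$d"; done
# Once the tests finish, review the results one drive at a time:
sudo smartctl -x /dev/sda | less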
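A sketch of the import-and-scrub step; the pool name comes from your post, the rest is standard zpool usage:

sudo zpool export RZ1_3x16TB           # drop the read-only import first
sudo zpool import -R /mnt RZ1_3x16TB   # normal read-write import
sudo zpool scrub RZ1_3x16TB            # start the scrub
sudo zpool status -v RZ1_3x16TB        # watch progress; -v lists any files with errors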
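And the general shape of the last-resort log step; mirror-2 and the sdX/sdY device names are placeholders, so take the real vdev and device names from zpool status before running anything:

sudo zpool status RZ1_3x16TB             # note the log mirror's vdev name (e.g. mirror-2)
sudo zpool remove RZ1_3x16TB mirror-2    # remove the log vdev from the pool
# wipe the two SSDs (TrueNAS UI or wipefs), then re-add them as a mirrored log:
sudo zpool add RZ1_3x16TB log mirror /dev/sdX /dev/sdY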
Best of luck to you.
Post your hardware details. How are your drives attached to the motherboard? Are you using SATA expanders? For examples, expand the arrow sections at the bottom of our posts.
Run the following additional commands and post the output:
lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
lspci
sas2flash -list
sas3flash -list
smartctl -x /dev/*
for each drive in the lsblk output.
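If it helps, here's one way to capture the SMART output for every whole disk in one pass (a sketch; the output file names are arbitrary):

# Loop over every whole-disk device lsblk reports and save its full SMART report
for d in $(lsblk -ndo NAME,TYPE | awk '$2=="disk" {print $1}'); do
  sudo smartctl -x "/dev/$d" > "smart_${d}.txt"
done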
Also, in addition to your hardware details, please say whether you use VMs, Apps, etc. on top of NFS.