Hi All,
Sorry for the long first post. I’ve been having intermittent problems with random files being reported as “permanent errors” (see export below).
This seems to impact:
- Small, old files such as MP3s that have been carried across various ext3, ext4, and NTFS file systems over the last 25 years. On review, the copies I restore from the pre-ZFS backup may already have been corrupted before migration into TrueNAS. I have scrubbed several times since install without error, and would have expected any such corruption to be detected then (if it was going to be detected at all).
- New files, although usually only ISO images, and not every time. These files are usually downloaded to the “download” directory and then moved to another location within the same pool (but a different dataset). Day-to-day documents, photos, etc. have not been affected (yet?).
The errors appear with zero READ, WRITE, or CKSUM counts against every device. When they occur, they affect all previous snapshots containing the affected files, not just the latest snapshots.
To clear the errors I have deleted the affected files AND all previous snapshots that included them, then run two scrubs (each stopped after a few minutes).
I have done this several times over the last 6 months, and randomly more errors pop up and files are lost.
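For anyone wanting to reproduce the clean-up described above, it could be written down roughly as follows. This is only a dry-run sketch: `cleanup_plan` is a hypothetical helper name, it only builds and prints the commands rather than running them, and the example path and snapshot name are taken from the export below.

```python
import shlex


def cleanup_plan(pool, live_paths, snapshots):
    """Build (but do NOT run) the commands for the clean-up described above.

    pool       -- pool name, e.g. "datastore"
    live_paths -- affected files in the live file system
    snapshots  -- full snapshot names (dataset@snapshot) that contain them
    """
    commands = []
    # 1. Delete the affected files from the live dataset.
    for path in live_paths:
        commands.append(["rm", "--", path])
    # 2. Destroy every snapshot that still references those files.
    for snapshot in snapshots:
        commands.append(["zfs", "destroy", snapshot])
    # 3. Two scrubs, each started and then stopped again
    #    (in practice the stop comes a few minutes after the start).
    for _ in range(2):
        commands.append(["zpool", "scrub", pool])
        commands.append(["zpool", "scrub", "-s", pool])
    return commands


plan = cleanup_plan(
    "datastore",
    ["/mnt/datastore/storage/family/user1/images/systemrescue-11.00-amd64.iso"],
    ["datastore/storage/family/user1@DAILY-2024-03-28_01-00"],
)
for command in plan:
    # Printed for review only; nothing is executed.
    print(shlex.join(command))
```

Printing the plan first makes it easier to check that only the intended snapshots would be destroyed before anything is actually run.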
Output from "sudo zpool status -v":
  pool: datastore
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub canceled on Thu Mar 14 22:28:16 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        datastore                                 ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            1a962e00-4c94-45dd-a560-2b8dffd62cbd  ONLINE       0     0     0
            52d950c3-8a5f-4cfa-8881-79e1456b59a9  ONLINE       0     0     0
            fb23a123-3f22-45a4-9b4b-6f9b4b1354f7  ONLINE       0     0     0
            0fe2b142-a647-44ee-81e8-39be5dd5f85d  ONLINE       0     0     0
            18ecb100-ff58-4306-b488-e21fff64dbe0  ONLINE       0     0     0
            2ebbce27-c886-4011-8048-dcba479e18a3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
datastore/storage/family/user1@DAILY-2024-03-28_01-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/family/user1@DAILY-2024-03-28_01-00:/images/clonezilla-live-3.1.2-9-i686.iso
datastore/storage/family/user1@DAILY-2024-03-28_01-00:/images/clonezilla-live-3.1.2-9-amd64.iso
datastore/storage/family/user1@DAILY-2024-04-03_01-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/family/user1@DAILY-2024-04-03_01-00:/images/clonezilla-live-3.1.2-9-i686.iso
datastore/storage/family/user1@DAILY-2024-04-03_01-00:/images/clonezilla-live-3.1.2-9-amd64.iso
datastore/storage/media@MONTHLY-2024-01-01_03-00:/Music/Mp3/amusicfile.cfa
datastore/storage/media@MONTHLY-2024-01-01_03-00:/Shows/show1/show1 Season 3/show1.mp4
datastore/storage/media@MONTHLY-2024-01-01_03-00:/Music/musicvideo.mp4
datastore/storage/family/user1@DAILY-2024-04-02_01-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/family/user1@DAILY-2024-04-02_01-00:/images/clonezilla-live-3.1.2-9-i686.iso
datastore/storage/family/user1@DAILY-2024-04-02_01-00:/images/clonezilla-live-3.1.2-9-amd64.iso
datastore/storage/family/user1@MONTHLY-2024-04-10_02-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/family/user1@MONTHLY-2024-04-10_02-00:/images/clonezilla-live-3.1.2-9-i686.iso
datastore/storage/family/user1@MONTHLY-2024-04-10_02-00:/images/clonezilla-live-3.1.2-9-amd64.iso
/mnt/datastore/storage/family/user1/images/systemrescue-11.00-amd64.iso
/mnt/datastore/storage/family/user1/images/clonezilla-live-3.1.2-9-i686.iso
/mnt/datastore/storage/family/user1/images/clonezilla-live-3.1.2-9-amd64.iso
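Since the same file shows up once per snapshot, the list above is easier to read when grouped by file. Below is a minimal sketch of how that could be done. The function names are hypothetical, and it assumes snapshot entries have the form `dataset@snapshot:/path` and that the pool is mounted under `/mnt`, as in the output above.

```python
from collections import defaultdict


def parse_error_paths(status_text):
    """Return the entries listed after the 'errors:' line of `zpool status -v`."""
    entries = []
    in_errors = False
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("errors:"):
            in_errors = True
            continue
        if in_errors and line:
            entries.append(line)
    return entries


def group_by_file(entries, mount_prefix="/mnt/"):
    """Map each affected file to the snapshots (or 'live') that reference it.

    Keys are 'dataset/path/inside/dataset'. Live entries are matched to the
    same key by stripping mount_prefix (an assumption about the mountpoint).
    """
    grouped = defaultdict(list)
    for entry in entries:
        if "@" in entry and ":/" in entry:
            # Snapshot entry: dataset@snapshot:/path/inside/dataset
            dataset_and_snapshot, path = entry.split(":/", 1)
            dataset, snapshot = dataset_and_snapshot.split("@", 1)
            grouped[dataset + "/" + path].append(snapshot)
        elif entry.startswith(mount_prefix):
            # Live entry: absolute path under the mountpoint
            grouped[entry[len(mount_prefix):]].append("live")
        else:
            grouped[entry].append("live")
    return dict(grouped)


sample = """errors: Permanent errors have been detected in the following files:
datastore/storage/family/user1@DAILY-2024-03-28_01-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/family/user1@DAILY-2024-04-03_01-00:/images/systemrescue-11.00-amd64.iso
datastore/storage/media@MONTHLY-2024-01-01_03-00:/Music/musicvideo.mp4
/mnt/datastore/storage/family/user1/images/systemrescue-11.00-amd64.iso
"""

for file_key, references in group_by_file(parse_error_paths(sample)).items():
    print(file_key, "->", ", ".join(references))
```

Run against the full output, this should make it clear that only a handful of distinct files are involved, each referenced by several snapshots plus the live dataset.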
Hardware details
- HP Prodesk G2 i5 6500, (in ATX case with 600W PSU)
- OS Version: TrueNAS-SCALE-23.10.0.1
- Intel i5-6500 CPU @ 3.20GHz
- 32GB DDR4 non-ECC 2133MHz
- UPS
Storage:
- 1x 120GB SSD boot pool (motherboard SATA controller)
- LSI 9211-8i SAS controller in IT Mode (all other drives)
- Datastore Pool: RaidZ2, 6x6TB, mix of Seagate Ironwolf and WD Red (all CMR).
- Native ZFS encryption: automatically unlocked on boot from a passkey stored on a USB key, as per the TrueNAS Community thread “SOLVED - ZFS Encryption USB auto unlock and mount with pass-phrase as I do with GLI.”
I understand it’s not ideal TrueNAS hardware, but I wouldn’t have expected these repeated errors.
I have completed the following troubleshooting:
- MemTest86, multiple passes over about 24 hours, no errors
- Checked IT mode on LSI 6Gbps SAS HBA 9211-8i
- Checked, reseated all HDD cables, nothing obvious
- SMART extended offline, all drives - no errors
- SMART short (daily) - no errors
I hope someone can give me some pointers on what to try next.
The only similar reports I can find point to a bug in ZFS native encryption (link below). Could this be related?
https://github.com/openzfs/zfs/issues/12014
Thanks in advance,
AdrianTheFifth