Specific dataset snapshots corrupted, but pool scrub shows no errors?

This one is strange. I’ve been running 25.04.2.5 with ECC RAM for a while now with no issues. I noticed recently that zpool status -v shows corruption only in snapshots of one encrypted dataset. I’ve deleted the offending snapshots, but every nightly snapshot taken since also shows up as corrupted. The snapshots are 0 bytes in size, as this drive is rarely accessed or written to. No issues with any other drives or datasets. A scrub ran last night without detecting errors.

Has anyone seen this issue before? I’m not sure how to proceed.
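For anyone retracing my steps, this is roughly the sequence I’ve been re-checking with (a sketch; `POOL` is the pool shown below, and note `zpool clear` only resets the error counters and log, it doesn’t repair anything):

```shell
# Reset the error counters and the permanent-error list
zpool clear POOL

# Re-scrub; stale entries are typically only dropped from the error
# log after a completed scrub (sometimes it takes a clear plus a full
# scrub cycle), so check status again once the scrub finishes.
zpool scrub POOL
zpool status -v POOL
```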

└>[λ] zpool status POOL -v
  pool: POOL
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 11:39:11 with 0 errors on Mon Mar 16 11:38:13 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        POOL                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            b7d41040-4b21-4ad3-a355-3da496af0aab  ONLINE       0     0     0
            6f1fe9bf-0308-4e02-b816-7447d489c87c  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            040ea455-3720-4ea1-b057-18b0e41236e7  ONLINE       0     0     0
            da8ce55c-cc90-492f-8372-b7d2ec2c060b  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        POOL/TANK:<0x1>
        POOL/TANK@auto-2026-03-13_03-10:<0x1>
        POOL/TANK@auto-2026-03-14_03-10:<0x1>
        POOL/TANK@auto-2026-03-11_09-00:<0x1>
        POOL/TANK@auto-2026-03-15_03-10:<0x1>
        POOL/TANK@auto-2026-03-12_03-10:<0x1>
        POOL/TANK@auto-2026-03-16_03-10:<0x1>

The bottom 6 errors in that list are in snapshots, so you can just delete the snapshots and the issue will go away. Easy.
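Using the snapshot names from the status output above, deleting them is straightforward (a sketch; confirm the names with `zfs list` first, and drop the `-n` dry-run flag once the output looks right):

```shell
# Confirm the affected snapshot names first
zfs list -t snapshot -o name,used POOL/TANK

# Destroy the snapshots flagged in the error list
# (-n = dry run, -v = verbose; remove -n to actually destroy)
for snap in auto-2026-03-11_09-00 auto-2026-03-12_03-10 auto-2026-03-13_03-10 \
            auto-2026-03-14_03-10 auto-2026-03-15_03-10 auto-2026-03-16_03-10; do
    zfs destroy -nv "POOL/TANK@${snap}"
done
```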

They are metadata errors.

Your problem is that the top error is an issue with metadata in your pool and NOT in a snapshot. To my knowledge there is no way of fixing this other than trashing the pool and restoring data (less than ideal). I would hazard a guess that any snapshot you take from now on will have the same error.

What does zpool status -v say on your Backup Server?

Also - your RAID card (9240-8i) is a MegaRAID card. It can (I think) be flashed to IT mode, but has it been? Or are you running JBOD mode?

The backup NAS is clean with no errors. My HBA has always been in IT mode. I’ve tested all of the files on the dataset and found no errors. This smells like a bug.
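For reference, “tested all of the files” means forcing a full read of everything so ZFS has to verify every block checksum. A sketch, assuming the dataset is mounted at /mnt/POOL/TANK (the path is a guess based on a default SCALE layout):

```shell
# Read every file end-to-end; a checksum failure would surface as an
# I/O error here and as a new entry in `zpool status -v`.
find /mnt/POOL/TANK -type f -print0 \
  | xargs -0 -I{} sh -c 'cat "{}" > /dev/null || echo "READ ERROR: {}"'
```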

BACKUP NAS OUTPUT

└>[$] zpool status -v
  pool: POOL
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 18:57:28 with 0 errors on Mon Mar  2 20:42:31 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        POOL                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            90859f98-9b79-4257-8f4a-e451f92a6ff0  ONLINE       0     0     0
            72ed84e6-b668-4f08-8496-88799d08ded8  ONLINE       0     0     0

errors: No known data errors

This user appears to have had the same issue, but maybe a little different.

I discussed this with Claude and went through some testing. Here’s what it had to say.

It’s a reporting bug, not real corruption. The <0x1> permanent errors in zpool status -v are phantom entries that ZFS is incorrectly flagging on encrypted datasets in TrueNAS SCALE 25.04. The actual data and metadata are completely intact.

The evidence is conclusive:

  • :white_check_mark: Scrub with dataset unlocked = 0 errors
  • :white_check_mark: All files readable
  • :white_check_mark: Replication working
  • :white_check_mark: Matches known bug pattern in 25.04
  • :white_check_mark: Errors vanish to “permission denied” when locked
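The locked/unlocked behavior in that last point can be reproduced directly (a sketch; `zfs unload-key` requires the dataset to be unmounted first):

```shell
# With the key loaded, the data scrubs clean:
zfs load-key POOL/TANK      # prompts for the passphrase if not loaded
zpool scrub POOL

# With the dataset locked, reading the flagged objects fails with
# "permission denied" rather than a checksum error:
zfs unmount POOL/TANK
zfs unload-key POOL/TANK
zpool status -v POOL
```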

That’s a real corner-case bug. It is possible that your pool was created on an old ZFS version, expanded on a new ZFS version, and filled nearly full. Back up and recreate the pool if possible. A scrub won’t fix it, and a reboot might leave your pool unable to be imported again.

This pool has never had a ZFS upgrade, if you’re referring to the alert “New ZFS version or feature flags are available for pool 'TANK'. Upgrading pools is a one-time process that can prevent rolling the system back to an earlier TrueNAS version.” It’s only at 50% capacity.

All snapshots are replicated to my backup NAS and there are no errors on the replicated snapshots.
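A quick way to confirm both sides hold the same snapshot set is to diff the snapshot lists (a sketch; `backup-nas` is a placeholder hostname for the backup machine):

```shell
# Compare snapshot names on the primary vs. the backup NAS
diff <(zfs list -H -t snapshot -o name POOL/TANK) \
     <(ssh backup-nas zfs list -H -t snapshot -o name POOL/TANK)
```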

What evidence do you have to support the claim that it would affect the pool being imported? Have you had a similar issue where this was the case?

I had a 6x HDD raidz2 pool created on zfs-2.2.8-pve1, later imported into a TrueNAS Goldenfish virtual machine, where I enabled some features and expanded it to an 8x HDD raidz2 for capacity. When the pool had less than 10G of usable space, it began reporting metadata 0x0 errors.

At that point I tried running zpool import on the hypervisor, and it reported that the pool cannot be imported due to damaged devices or data. So I am sure it could not have survived a reboot, even though the structure and zfs send seemed OK.

So I suspect metadata 0x1 damage can cause the same effect.

However, I must apologize for my blunt guessing. I don’t know what’s in these metadata blocks, but I think they are important structural data that may affect import.

The ZFS update I referred to is the host’s ZFS version, not the pool’s.

Yeah, I’m not sure your issue is relevant to this. I’m trying to figure out why only snapshots show the corruption, on only one dataset in the pool, and why all the snapshots replicated to my backup NAS are fine, with no corruption.

I’m still facing this issue. Does anyone have any idea what is going on here?

Where are the GURUS! Or have you all lost yourselves to AI! Don’t give up the good fight. In any case, these corrupted snapshots, which replicate over without corruption, are still showing up every night and rotating out on schedule. No clue what’s going on. Can’t pin it down. I need me some MVP and resident GURU insight, y’all. :face_with_monocle: