ZFS Permanent Errors - How to resolve

I’m currently running into checksum errors on my App pool, and I can’t figure out how to resolve them or even identify a clear next step.

These are on 2 SATA SSDs in a mirror (Micron 1300 256GB). The output from zpool status -v is below.

The single metadata file that is referenced at the end is not needed and as far as I can tell was removed when my Plex server cleaned up old bundles (all the unused uploaded posters). Either way, I don’t need this file so I don’t really care that it’s gone.

For the other file errors, I have no idea how to track down what they were supposed to be or whether I should care.

Things I’ve done so far:

  1. Reseated all the cables
  2. SMART tests on both drives - no errors
  3. Multiple zpool clears and zpool scrubs. Clear does nothing.
  4. Started a scrub and immediately cancelled it. This actually clears the errors, but running a new scrub brings them back.

The pool has a replication task set up every Sunday to a different pool, so worst case I blow it out and, in theory, rebuild from the replication, but I’m not positive on how to do that.

Thank you for any guidance here. I’m not very technical, but slowly learning.

  pool: App_Pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:24 with 4 errors on Wed Feb 11 14:05:54 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0    16
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0    16

errors: Permanent errors have been detected in the following files:

        App_Pool/applications/plex:<0x7106a>
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fee>
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5353b6a06c84639c4793fe47586601035434013.bundle/Contents/_combined/posters/tv.plex.agents.movie_b4abc30582f04e220e521983f481f3099d1c1813
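One possible way to look into the `<0x…>` entries above: they are dataset object numbers in hex, and `zdb -dddd <dataset> <object>` can dump an object and, if it still exists, its path. The sketch below only builds the commands from the entries shown in the output and does not run them. The helper name `zdb_command` is made up for illustration. It assumes zdb is run as root, and on some systems zdb may also need `-U` pointed at a non-default pool cache file. If the object was already deleted, zdb will likely report an error rather than a path.

```python
import re
import shlex

# Entries copied from the `zpool status -v` output above.
ENTRIES = [
    "App_Pool/applications/plex:<0x7106a>",
    "App_Pool/applications/plex:<0x7107e>",
    "App_Pool/applications/plex:<0x70fee>",
]

# Matches "dataset:<0xHEX>"; plain file paths do not match.
OBJECT_REF = re.compile(r"^(?P<dataset>[^:]+):<0x(?P<obj>[0-9a-fA-F]+)>$")


def zdb_command(entry):
    """Build a zdb argument list for a 'dataset:<0xNN>' entry (hypothetical helper).

    Returns None when the entry is a regular file path instead.
    """
    match = OBJECT_REF.match(entry.strip())
    if match is None:
        return None
    # Convert the hex object number to decimal before passing it to zdb.
    object_id = int(match.group("obj"), 16)
    return ["zdb", "-dddd", match.group("dataset"), str(object_id)]


for entry in ENTRIES:
    cmd = zdb_command(entry)
    if cmd is not None:
        # Print the command so it can be reviewed and run by hand.
        print(shlex.join(cmd))
```

Entries that are already full paths (like the poster file) need no lookup; whether they still exist can be checked directly on the filesystem.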

I know it sounds pointless, and it’ll probably come back clean, but it doesn’t hurt to run a few memtest passes just to rule out the RAM.

Have you scrubbed the backup pool?

No errors after scrubbing the backup target pool. Digging through my pile of things to find a USB drive I can run memtest from.

It’s more weird that it cares about a file I am fine with being deleted.

If memtest comes back fine, is there a “for dummies” process to rebuild this pool from a backup? I just have all my apps running on here, but would like to not lose my metadata or the minecraft world I play with my kid.

If the file is actually there where the path says it is, you could SSH into the server and run sudo -s to switch to the root user. Then either change to that directory and remove the file (for example: rm filename.txt), or, maybe easier, type mc at the command prompt to open Midnight Commander, navigate to the file, and delete it. SSH will let you right-click and paste the path, since it is complicated; cd / gets you to the root of the file system, where /mnt is, and you can go from there.

The pain just piles on. Memtest says my ram is failing. Murphy’s law running full force.. got this set just before the insanity.. a replacement will be 4x what I paid a year ago. :sob:

This needs to be addressed before anything else. You should fully pass multiple memtest runs on your NAS server.

Are you using any overclocking or XMP settings in your BIOS? You might be able to resolve this by changing the settings in the BIOS.

That’s my next test. I intentionally chose to not use XMP, but will manually underclock and re-test. Just trying to at least let one full pass of memtest go through.

Either way I will also bite the bullet and update my CPU. First gen Ryzen is flaky with RAM. This was super stable for years as my main rig and then in the server with a 16gb kit of TridentZ, but this Corsair 32gb kit has only been installed since November.

Looks like the ram is at least one issue. Swapped the two sticks to both get a good re-seat and in case RamThings. Downclocked to 2666 and still threw errors on the first test.

Stuck my old 16gb kit back in and so far it’s clean on test #6 of the first pass. Will let it run for a few passes before bringing the server back up and trying to re-scrub.

Now to scour trustworthy sites for used ram because new prices are :face_vomiting:

16 GB of RAM that passes multiple tests is better than 32 GB that fails. At least you have something that can be used in the meantime, assuming it passes.


Why purchase new RAM? Isn’t your Corsair kit eligible for an RMA?

Because they were purchased used off ebay.

Is the seller trustworthy enough to contact and ask for an exchange for RAM that passes? Maybe it’s an honest person or company that will work with you.

It sounds like you were sold used sticks that were already failing, even if unknown by the seller.

I doubt it. They have a bunch of random things listed and probably not worth the headache and time vs just buying another kit from a more trusted source (I’ve already reached out to a few sellers on r/homelabsales).

That was my first foray into used hardware once my TrueNAS made it clear that it wanted more than 16GB to run my apps.

The lost $ isn’t as bad as the dead 14TB SATA drive I got from a seller off ebay. The replacement was dead as well but they have screenshots showing it was fine (my guess is they shipped them poorly) but ebay thinks they did what they needed to do.

Out of kindness I would send you a spare 32 GB DDR4 RAM kit, but ironically it’s also bad.

Thank you for that, but I can afford a new kit no problem, just generally averse to spending money unless I need or want to.

You can try Jawa.

If you purchase something from Jawa, you must immediately run memtests the day your package is delivered. Their return policy only allows refunds/returns within 48 hours of delivery. If you catch bad RAM early, then you can file for a return.

Are both Corsair sticks bad? If one is good, you could still have a total of 32 GB in your system: 16 GB Corsair + the old 16 GB stick

You likely will lose out on dual-channel support.

Winnie - just for once, can you be wrong about memtest? We’re going to have to automate it as a response when people post problems at this rate…

I have two hypotheses.

  1. RAM is not built or tested as rigorously as it used to be, which is why bad RAM is not rare to see on these forums.

  2. We’re not seeing more cases of bad RAM compared to the past, but because of ZFS bringing corrupted data to our attention, we are more likely to test the RAM and verify it is bad.

  3. Both could be true.

Casual PC users in the past might have been hit with bad RAM just as often as we see today, but they may not have known about it and simply dismissed their problems as buggy software, a failing drive, file-system corruption, a bad GPU, and so on. (Gamers might have already upgraded their PC or parts before they would have seen their RAM failing.)

I’ll potentially test it at some point. My old kit is a 2x8GB kit. I’m just going to turn off some less critical apps for the next week or so.

But now my checksum errors have actually gone up now that I’ve got stable ram installed… :face_exhaling:

Something may be up with my Plex install. The errors are all isolated to 3 metadata bundles within the same folder but will be “fun” trying to find the specific media those are related to so I can do the plex dance on them.. maybe that will help.

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0 1.09M
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0 1.09M

errors: Permanent errors have been detected in the following files:

        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5c56c3f91621f71827c14024fac43c790a97c68.bundle/Contents/com.plexapp.agents.localmedia
        App_Pool/applications/plex:<0x7106a>
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5c56c3f91621f71827c14024fac43c790a97c68.bundle/Contents/com.plexapp.agents.none/Info.xml
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5d4bc2d824d9f1b6020b6d02d66b0475f87d0cf.bundle/Contents
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fed>
        App_Pool/applications/plex:<0x70fee>
        App_Pool/applications/plex:<0x70ff9>
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5353b6a06c84639c4793fe47586601035434013.bundle/Uploads

When you see checksum errors into the millions, it’s a sign of poor cable connections or an LBA issue.

Possibly related, you had been writing, reading, and checksumming data while you had bad/failing RAM installed, possibly as far back as November. This means that if “corrupted” data was written to disk (because of bad RAM), it will not show up as corruption with a ZFS scrub. The checksum will confirm that the block written to disk has the expected checksum.
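To illustrate that point, here is a minimal sketch of why a scrub cannot flag data that was already damaged in RAM before its checksum was computed. It uses SHA-256 from Python's hashlib purely as a stand-in for whatever checksum the pool is configured with, and the helper names and sample bytes are made up for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    # Stand-in for the ZFS block checksum (assumption: SHA-256, for illustration only).
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of block with one bit inverted, simulating a memory error."""
    damaged = bytearray(block)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)


original = b"poster image bytes as the application produced them"

# Case 1: the bit flips in RAM BEFORE the checksum is computed.
# The checksum is calculated over the damaged bytes, so it matches them.
damaged_in_ram = flip_bit(original, 5)
stored_block, stored_sum = damaged_in_ram, checksum(damaged_in_ram)
scrub_ok_case1 = checksum(stored_block) == stored_sum
data_intact_case1 = stored_block == original

# Case 2: the checksum is computed over good data and the block is damaged afterwards.
# Re-reading the block no longer reproduces the stored checksum.
stored_sum = checksum(original)
stored_block = flip_bit(original, 5)
scrub_ok_case2 = checksum(stored_block) == stored_sum

print("case 1: scrub passes =", scrub_ok_case1, "| data intact =", data_intact_case1)
print("case 2: scrub passes =", scrub_ok_case2)
```

In the first case the verification passes even though the stored data is wrong, which is the situation described above. In the second case the mismatch is detected, which is what shows up as a checksum error.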


An app has no effect on the data and checksums that were already written via ZFS.