Which is interesting, because those didn’t show up until after I ran memtest on the failing and the old non-failing RAM. I didn’t touch my cables at all. It was sitting at 16 as shown in my initial post and was consistent across a few scrubs, even after cleaning up old (over a year) snapshots.
Not sure what my next troubleshooting steps are. I can re-seat the cables.. maybe try new ones while I’m at it?
It’s a very consistent set of failed folders that are listed when I run zpool status along with the hex codes. These are files that are ok if they’re gone - I can fix metadata. I just want to figure out the cause and how I can clear them out. The physical disks appear to be fine.
Maybe this isn’t your case, but over time I’ve learned not to dismiss RAM that generates Memtest errors as defective unless I get the same results with another motherboard and CPU combo. Besides, two modules dying at the same time always strikes me as rather strange (imho).
A common point of failure on some entry-level motherboards is running dual-rank sticks in every slot, just to give an example.
Take the time to test them one by one and, if you have one, in another system. At these prices, it’s well worth making sure you don’t trash good sticks.
Re-seat the cards and cables. Leave only the old 16 GB kit of RAM that fully passes memtests.
Issue zpool clear to your pool.
Check if the checksum and error status has been reset.
Run a full scrub on the pool.
Check the results again.
You want to know that your pool will pass a scrub with a working RAM kit.
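The sequence above can be sketched as shell commands. This is a minimal sketch, assuming the pool name `App_Pool` from the output later in the thread; it is guarded so it does nothing on a machine without ZFS, and on the real NAS the `zpool` commands should be run as root:

```shell
# Clear the error counters, verify, scrub, then verify again.
# Guarded no-op on machines without ZFS installed.
if command -v zpool >/dev/null 2>&1; then
    zpool clear App_Pool          # reset READ/WRITE/CKSUM counters
    zpool status -v App_Pool      # confirm the counters are back at 0
    zpool scrub App_Pool          # start a full scrub
    # Wait for the scrub to finish before judging the results.
    while zpool status App_Pool 2>/dev/null | grep -q "scrub in progress"; do
        sleep 60
    done
    zpool status -v App_Pool      # final verdict with the known-good RAM
    result="ran"
else
    echo "zpool not available on this machine; nothing to do"
    result="skipped"
fi
```

If the counters stay at 0 after the scrub, the remaining "permanent errors" are leftovers from the bad-RAM period rather than an ongoing hardware problem.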
There is also the possibility that when you had bad RAM, good data was written to a block, yet a bad checksum was generated for it. This will always return as a “checksum error” on good data for those blocks, even with healthy RAM and healthy drives.[1]
EDIT: This is why ECC RAM is recommended for NAS servers. It’s not that corruption is more likely on a NAS server, but that a NAS usually prioritizes data integrity and reliable archiving more than we expect of a desktop PC. If you’re already going out of your way to set up a NAS, you might as well consider raising your standards.
If cost is an issue and you can’t afford ECC, then you should always run full memtest passes on your new build. Even if the RAM sticks are good, certain settings in the BIOS could cause it to fail. This is especially bad if it doesn’t fail on reading OS or boot files, since the system will happily keep running without alerting you to a problem.
One way to confirm this is to take a known-good copy of a file and compare its SHA256 hash with that of the file on the NAS server. If they match, then it doesn’t matter that ZFS claims the file is supposedly “corrupt”. If you do not have a good copy of the file elsewhere, you can manually inspect it for any obvious errors. (A corrupted JPEG photo will usually have a random horizontal line somewhere.) ↩︎
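The hash comparison can be done with `sha256sum`. The two file paths below are fabricated for the demonstration; in practice the second one would be the file on the pool’s SMB share or mountpoint:

```shell
# Compare a known-good copy of a file against the copy on the NAS.
# Both files here are temp stand-ins created just for this demo.
workdir=$(mktemp -d)
printf 'same bytes' > "$workdir/good_copy.jpg"   # known-good copy
printf 'same bytes' > "$workdir/nas_copy.jpg"    # stand-in for the NAS copy

good=$(sha256sum "$workdir/good_copy.jpg" | awk '{print $1}')
nas=$(sha256sum "$workdir/nas_copy.jpg"  | awk '{print $1}')

if [ "$good" = "$nas" ]; then
    echo "MATCH: the data on the pool is intact despite the ZFS error"
else
    echo "MISMATCH: the file on the pool really is corrupt"
fi
```

A match means the corruption was in the stored checksum, not the data; a mismatch means the file itself took the hit.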
Well, the big count is gone, so that’s good. Interestingly, the hex entries are the same, and two of the files that had shown up consistently finally dropped off after a clear.
After a scrub run via the CLI:
  pool: App_Pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:25 with 4 errors on Fri Feb 13 17:36:20 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0    16
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0    16

errors: Permanent errors have been detected in the following files:

        App_Pool/applications/plex:<0x7106a>
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fee>
        App_Pool/applications/plex:<0x70ff9>
Immediately after zpool clear:
  pool: App_Pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:25 with 4 errors on Fri Feb 13 17:36:20 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0     0
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5c56c3f91621f71827c14024fac43c790a97c68.bundle/Contents/com.plexapp.agents.localmedia
        App_Pool/applications/plex:<0x7106a>
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5d4bc2d824d9f1b6020b6d02d66b0475f87d0cf.bundle/Contents
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fed>
        App_Pool/applications/plex:<0x70fee>
        App_Pool/applications/plex:<0x70ff9>
The struggle there is the cost of getting into an ECC platform where the power draw still makes sense. Getting into that space means way more CPU than I really need, at somewhat higher power draw, for a big chunk of money. I’ve tried to find unregistered ECC RAM, since Ryzen supports that (and I know it’s not a common config at all), but I’ve had a hard time validating whether what I find really meets that requirement.
The paths that zpool status flags as having errors are just folders, not files. When I navigate to those folders in my SMB share, there are no files in them. As for the hex references, I have no idea what those point to. The best I could find in some Reddit posts was that they reference old snapshot files? But I’m not sure on that.
Thank you for all your help so far. I’m definitely learning a lot more about ZFS and getting more comfortable with the CLI. I’m a non-technical person who came from a cruddy QNAP trying to keep my old hardware from becoming e-waste and own my media (I buy physical disks both for movies and music).
More info… running two status checks back to back gives me a different set of permanent file errors.
admin@truenas[~]$ sudo zpool status App_Pool -v
  pool: App_Pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:25 with 4 errors on Fri Feb 13 17:41:59 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0     8
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        App_Pool/applications/plex:<0x7106a>
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fee>
        App_Pool/applications/plex:<0x70ff9>

admin@truenas[~]$ sudo zpool status App_Pool -v
  pool: App_Pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:25 with 4 errors on Fri Feb 13 17:41:59 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        App_Pool                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            c5140fd5-5f19-469e-b799-7dae329a50cd  ONLINE       0     0     8
            662f4e95-9b32-4ba9-962e-23af445db904  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5c56c3f91621f71827c14024fac43c790a97c68.bundle/Contents/com.plexapp.agents.localmedia
        App_Pool/applications/plex:<0x7106a>
        /mnt/App_Pool/applications/plex/Metadata/Movies/e/5d4bc2d824d9f1b6020b6d02d66b0475f87d0cf.bundle/Contents
        App_Pool/applications/plex:<0x7107e>
        App_Pool/applications/plex:<0x70fed>
        App_Pool/applications/plex:<0x70fee>
        App_Pool/applications/plex:<0x70ff9>
This is with the RAM that passes memtests? My hunch is that you’re seeing this:
It’s likely that a read of a file/folder (browsing in SMB or Plex running in the background) forces a checksum verification on the specific blocks. Because the block does not match its expected checksum, it gets flagged as a checksum error.
You could have good data that is incorrectly being flagged as “corrupt” by ZFS, since the checksum that was saved to disk (when you had bad RAM) does not match the checksum of the block when computed during a read or scrub.
What is this pool? Does it contain irreplaceable data or media? Is it only used for app installs, configs, and “app metadata”, such as thumbnails and library info for Plex?
This is app installs, configs, and metadata. I’m ok rebuilding it, just would want to understand the “for dummies” way to not have to reconfigure everything from the ground up.
I know I can copy out the Minecraft world my kid and I share and then reload it, but the configs I’m not sure on. Retaining the Plex metadata is a very nice-to-have, as I have a number of concerts and ballets that don’t appear in IMDB, so their metadata had to be generated manually.
Because it’s not a simple data pool, to rebuild it will take some patience and manual intervention. Unfortunately, you cannot replicate it to another pool because it will just copy the corruption and/or bad checksums exactly as they are.
What I would do is copy anything important or irreplaceable to a temporary dataset. You can create a new dataset on the data pool with the name “migration”. Give it the most permissive permissions. In the command-line, rsync (with “archive mode”) everything to it. From here, you can rsync everything back over to the new apps pool, after you configure it with your apps again. Do not destroy the “migration” dataset until you are 100% sure that you have everything you need saved on the new apps pool.
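The migration copy described above boils down to a couple of rsync calls. The sketch below uses temp directories as stand-ins for the dataset mountpoints; on the real system the source and destination would be paths like `/mnt/App_Pool/applications` and `/mnt/<storage_pool>/migration` (both hypothetical names here), and rsync would be run as root via sudo so ownership is preserved:

```shell
# Temp dirs stand in for the source dataset and the "migration" dataset.
SRC=$(mktemp -d)
DST=$(mktemp -d)
mkdir -p "$SRC/plex/config"
echo "dummy" > "$SRC/plex/config/settings.xml"

if command -v rsync >/dev/null 2>&1; then
    # -a (archive mode): recurse and preserve permissions, ownership,
    # timestamps, and symlinks. Trailing slash on SRC copies its contents.
    rsync -a "$SRC/" "$DST/"
else
    cp -a "$SRC/." "$DST/"   # rough fallback for this demo only
fi
```

Copying back to the rebuilt apps pool is the same command with the paths reversed, run only after the apps are configured again.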
This assumes you’re doing everything with the 16 GB of good RAM.
Tracked down a file under one of the directories: an XML file Plex created that the system somehow thinks is well over 1 petabyte, when this pool only has 258 GiB in a single VDEV.
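For anyone wanting to hunt down such files, `find -size` can flag anything whose apparent size is absurd for the pool. The mechanics differ from the corruption above (the demo below makes an honest sparse file rather than corrupted ZFS metadata), but the search technique is the same. The filename is made up for the demonstration:

```shell
# A sparse file claims a huge apparent size while using almost no disk,
# mimicking the "1 PB file on a 258 GiB pool" situation.
demo=$(mktemp -d)
truncate -s 2T "$demo/bogus.xml"          # 2 TiB apparent size, ~0 on disk

du -h "$demo/bogus.xml"                   # actual usage: near zero
# GNU find has no 'T' suffix, so express 1 TiB as 1024G.
hit=$(find "$demo" -type f -size +1024G)
echo "oversized files found: $hit"
rm -rf "$demo"
```

On the real pool you would point the `find` at the dataset mountpoint, e.g. `/mnt/App_Pool/applications/plex`.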
Can you help verify if I have the steps down correctly?:
Copy the whole pool somewhere else just in case so I have a full copy? I do have a replication task currently to another pool, does that cover it?
Go through my apps and figure out what smaller set of data I actually need to create a copy somewhere (probably by looking at documentation on how to migrate servers)
Copy those folders to a different dataset (will probably copy them to a different pool to be safer)
Take screenshots of all my apps and their config details so I can rebuild them (seems I need to vote on a feature request to back up app configs because I don’t have the time to properly learn custom YAML stuff yet)
Unset the Pool as my Apps pool
Destroy the Pool? VDEV only? Not sure specifically what to do here to clean it out.
Re-create the Pool and/or VDEV
Re-set the new Pool as my Apps pool
Re-install and manually reconfigure all my apps
Copy over the specific backup files needed for specific apps for a “Migration”
Or digging into this further… the root folder that all the errors are under seems to be a vestige, possibly from when I migrated off my old cruddy QNAP and maybe copied that metadata to the wrong location (which would explain why I had to manually rebuild so much). Plex documentation points to a different folder where metadata bundles are stored.
Would deleting that whole folder help? If the files aren’t there, the bad checksums should be gone?
I can stomach jumping through the rebuild hoops if I need, but if this is a less painful way, it would save me a lot of time. I can still rsync it somewhere just in case it borks something up.
You also have corrupted ZFS metadata which you cannot delete with user tools, like @etorix said.
It’s just not worth it to try to delete specific files/folders.
Looks like filesystem corruption. I’m not sure what will happen when such a file gets copied with rsync. After all, the size of a file (in bytes, as set by the filesystem) determines where the end of a file is.
A ZFS replication will retain the corrupted data and/or checksums.
This is tedious, but it might be your best shot.
You can make a dataset on your storage pool to temporarily hold everything outside of your apps pool, which you’re going to end up rebuilding.
Only after you are 100% sure the data you need is safely copied to another pool. I would use the -a flag with rsync, and run it as “sudo” or “root” to make sure the permissions and ownerships are the same as before. The temporary dataset should have open permissions, so that it doesn’t get in the way of any granular permissions below it.
There is a “quick wipe” option. After you export a pool, its disks can be wiped. Be careful that you don’t accidentally wipe the wrong disks.
“ZFS + non-ECC” is still better than “non-ZFS + non-ECC”.
It intuitively feels like ZFS is more prone to data corruption and integrity issues without ECC RAM, but that is only because it is making the user aware of corruption from bad RAM. If you were not using ZFS, you would be sitting on silent corruption, and possibly be in an even worse situation.
ECC RAM is not required for ZFS. ZFS is not somehow more dependent on ECC to work properly, even if it seems like it. If this were a generic Linux home-server forum, the recommendation for ECC RAM on an Ext4 or XFS Linux server would be the same.
In terms of cost and feasibility, you can have a setup where only your main NAS server has ECC. Your backup (non-ECC) ZFS server will receive replications that will immediately make you aware of any issues because of a checksum mismatch. Once the bad RAM is replaced on the backup server, you can continue your replications. You won’t be stuck in a situation where corrupted data/checksums are replicated “intact” to a backup.
Um, why does this matter? When I had to RMA RAM before, they didn’t care when I purchased it or from who. It is a lifetime warranty. Try to get the RAM RMA’d. You have nothing to lose.
How many times did you let MemTest86+ run? I recommend a minimum of 5 complete passes, but since you are seeing failures, double that. It doesn’t matter how long it takes; it matters that it works.
Also, run a CPU Stress Test for 4 hours. Your RAM issues could also be tied to the CPU, or motherboard.
I would not use this computer until you have established it is rock solid or you are just risking more pain.
Can you provide some specific details on your hardware?
–Motherboard: Make, Model, BIOS version
–CPU: Make, Model, BIOS Clock Speed
–RAM (the failing stuff): Make, Model (please provide a photo of both sides of the stick, there are a lot of numbers and unless you know which one they are, this is the easy way to post it).
I have enough I think on what you have tried with respect to RAM clock speed.
–Any additional hardware you have plugged into the motherboard.
Provide these details, please. We can then see what might be happening. Also, is the RAM on the QVL for your motherboard? Eh, we will check that once you provide the data requested.
@winnielinnie Good catch on the RAM Test. My jaw dropped when you started saying “memtest”, waiting for a link to a Meme.
Getting into a mainboard/CPU combo that supports ECC will use more than the 65 W my current CPU maxes out at. This machine mostly sits idle except for a Plex watch, or my son and I playing on the Minecraft server maybe once every few weeks. Threadripper can idle low, but the entry price is high; cheaper Intel stuff uses more power. So it’s not the RAM itself, but the overall system. It’s sitting in my office, which is already ~8 degrees warmer than the rest of the house due to the machines in it (this server, my personal rig, my work laptop, an old micro PC running my home automation, another micro PC for my kid to play Minecraft on, and a laser printer). So the power use is important to me.
Because Corsair’s RMA process requires proof of purchase from either them or an authorized reseller? Yes, some people have gotten lucky, but not everyone, and it takes months to go through an RMA process. I may try, but my focus now is getting things stable again.
On the 32 GB eBay kit that failed? 3 passes, but the errors were very clear and significant. The old 16 GB kit (G.Skill Trident Z F4-3200C16D-16GTZB) I ran 5 times with no issues. I purchased it new back in 2017, and it ran in this server for 2 years with no displayed errors before I “upgraded” to the 32 GB kit this past November.
This is all in my signature except my bios version. Currently it’s at P7.10.
It’s basically impossible to find anything on the QVL because the board is so old and the kits they tested weren’t available for sale even at launch. QVL doesn’t really mean anything anyway, just that ASRock chose to test those kits over others.
What I may do is take a spare drive I have, add it in, set it as my app pool, set up all the apps and test moving files so I can verify running the apps with as much data retention as I can before clearing out and rebuilding the old pool. Will still copy the more critical stuff elsewhere, but being able to in theory flip back and forth while verifying the “migration” process feels less risky.
Your BIOS is very out of date. I recommend you update it, but read the warnings on the ASRock website. There is a known CPU compatibility issue, but I believe that affects pre-Ryzen CPUs. I don’t know if this will fix your issue, but it might.
Do they have the heatsinks still installed? And of course the RAM could have been defective when the person sold it to you, unless you performed a MemTest on the RAM once you first installed it.
DDR4 RAM? What country do you live in? If USA, I can look to see what I have, if I have any. Send me a DM if interested.
I’ve traditionally stuck to the old adage of “unless you need to update your BIOS, don’t”. My CPU is first-gen Ryzen (Summit Ridge, not the Athlon parts that the newer BIOS versions drop support for), and none of the newer versions carry new microcode for it, but they do have newer AGESA versions, which could help.
I doubt it will help with the huge RAM failures, but I can still update.
Yes, they do.
I honestly don’t remember. I remember running memtest at some point, but I was eyeball deep in work and exhausted. Learning experience for me. Thankfully the bad RAM wasn’t that expensive (it was a decent deal in November, but a steal today). I’ll still try the RMA route in case I get lucky; then I’ll have a spare set around.
Yes DDR4, yes USA. I’ll DM. Thanks for checking what you have! My motherboard and Ryzen support ECC UDIMM which would be ideal in this situation.
I’m planning to keep running with the older stable 16 GB and try to find some ECC UDIMMs, or, if that takes too long, bite the bullet and get into an EPYC platform (at least the 170 W TDP is better than Threadripper’s 250 W).
Nonsense. You’re confusing “TDP” for “actual power use”.
An Intel C2xx or Wx80 motherboard with ECC would actually idle lower than your current Ryzen system without ECC.
Nope it can’t: The I/O die prevents that.
Same for your Ryzen CPU by the way. In AMD’s lineup only the monolithic APUs have low idle power to compete with Intel Core or Xeon E. (“ECC” doesn’t have to be Xeon Scalable or SP3/SP5/SP6 EPYC.)