Something has been happening here occasionally: ZFS Metadata corruption.
The tl;dr is that ZFS Metadata corruption should not occur unless the user has a non-redundant pool AND has changed the default of redundant_metadata=all, or has bad hardware. It certainly should not occur at the rate we seem to be seeing here in the TrueNAS forums.
Some history and design notes.
Sun seemed to consider standard Metadata, like directory entries, more important than regular data. This makes sense, in that a single bad disk block in the directory tree could take out an entire file. Having 2 or more copies of this standard Metadata allows ZFS to work around that bad block. It can even repair it! On a non-redundant pool!!!
Note that this extra copy of standard Metadata is in addition to any vDev redundancy in the pool. For example, a simple 2-way Mirror would end up with 4 copies of standard Metadata, 2 per Mirror device.
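The copy arithmetic above can be sketched in a few lines of Python. This is a simplified model that assumes Mirror vDevs only (RAIDZ stores parity rather than full copies, so it does not multiply the same way); the names `DITTO_COPIES` and `physical_copies` are hypothetical, chosen just for illustration.

```python
# Hypothetical sketch: Mirror vDevs only; RAIDZ uses parity, not full copies.
# Logical copies ZFS keeps per block type, with the default redundant_metadata=all.
DITTO_COPIES = {"regular data": 1, "standard metadata": 2}


def physical_copies(kind: str, mirror_width: int) -> int:
    """On-disk copies of one block stored on a single Mirror vDev of the given width."""
    if mirror_width < 1:
        raise ValueError("mirror_width must be at least 1")
    # Each logical copy is itself mirrored onto every disk in the vDev.
    return DITTO_COPIES[kind] * mirror_width


for width in (1, 2, 3):
    counts = {kind: physical_copies(kind, width) for kind in DITTO_COPIES}
    print(f"{width}-disk vDev: {counts}")
```

The 2-disk line prints `{'regular data': 2, 'standard metadata': 4}`, matching the 4 copies described above. The 1-disk line shows why a non-redundant pool still has 2 copies of standard Metadata to repair from.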
There is a ZFS Dataset property that can reduce the overhead of these extra Metadata copies. In general, I can’t see many people changing it from the default. See this manual page and the redundant_metadata entry for details:
https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html#redundant_metadata
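If you want to check whether any dataset in a pool has been changed from the default, one approach is to feed the output of `zfs get` into a small script. The `zfs get` options in the comment (`-H` for tab-separated output with no header, `-o name,value` to pick columns, `-r` to recurse) are standard; the function name and the sample output are hypothetical, made up for this sketch.

```python
from __future__ import annotations

# Input is the text printed by (run on the NAS, replace <pool> with your pool name):
#   zfs get -H -o name,value -r redundant_metadata <pool>
# -H gives tab-separated lines with no header; -o picks the columns.


def non_default_redundant_metadata(zfs_get_output: str) -> dict[str, str]:
    """Return {dataset: value} for every dataset not using the default 'all'."""
    flagged: dict[str, str] = {}
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.split("\t", 1)
        if value.strip() != "all":
            flagged[name] = value.strip()
    return flagged


# Made-up sample output, for illustration only.
sample = "tank\tall\ntank/media\tmost\ntank/vm\tall\n"
print(non_default_redundant_metadata(sample))  # {'tank/media': 'most'}
```

An empty result means every dataset still uses redundant_metadata=all.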
As for critical Metadata, by default there are 3 copies, again because loss of critical Metadata could impact much more than a single file.
One last design note on ZFS Metadata. If the pool consists of more than 1 vDev, the extra copies are spread out. So a pool of two 2-way Mirror vDevs would have one standard Metadata copy on each Mirror. Besides the redundancy benefit of spreading the Metadata around, this also balances usage across the vDevs.
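The "spread the copies across vDevs" rule can be illustrated with a simple round-robin model. This is a hypothetical sketch, not ZFS's actual allocator: real ZFS weighs free space and metaslab state when choosing where to write. The sketch only captures the idea that copies go to different vDevs whenever possible.

```python
from __future__ import annotations


def place_copies(num_copies: int, num_vdevs: int, first_vdev: int = 0) -> list[int]:
    """Assign each logical Metadata copy to a top-level vDev index.

    Hypothetical round-robin model: copy i goes to vDev (first_vdev + i) % num_vdevs,
    so two copies only share a vDev once every vDev already holds one.
    """
    if num_copies < 1 or num_vdevs < 1:
        raise ValueError("num_copies and num_vdevs must be positive")
    return [(first_vdev + i) % num_vdevs for i in range(num_copies)]


# Standard Metadata (2 copies) on a pool of two 2-way Mirrors: one copy per Mirror.
print(place_copies(2, 2))  # [0, 1]
# Critical Metadata (3 copies) on the same pool: one Mirror must hold two copies.
print(place_copies(3, 2))  # [0, 1, 0]
# Standard Metadata on a single-vDev pool: both copies land on vDev 0.
print(place_copies(2, 1))  # [0, 0]
```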
So how do we get ZFS Metadata corruption?
It may seem simple: a power outage caused it. Lots of people in recent years seem to have pool import problems immediately after a power loss. But it is wrong to blame the power loss directly:
On other file systems, yes, an OS crash or power loss could potentially corrupt the file system. But ZFS was specifically designed to avoid this problem through copy-on-write. Data is either fully written, or not. No in between.
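Here is a toy illustration of why copy-on-write gives "fully written, or not". ZFS groups changes into transaction groups and makes them live by writing a new uberblock last; this sketch models only that idea, with `root` standing in for the uberblock pointer. The class and method names are hypothetical. Note that the guarantee assumes the storage stack actually writes data in the order ZFS asks and honors cache flushes, which is exactly what misbehaving controllers can break.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCowStore:
    """Toy copy-on-write store; `root` plays the role of the uberblock pointer."""

    blocks: dict[int, bytes] = field(default_factory=dict)
    root: int | None = None
    next_addr: int = 0

    def _alloc(self, data: bytes) -> int:
        # Always write to a fresh address; live blocks are never overwritten.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        return addr

    def commit(self, data: bytes, power_loss_before_root_update: bool = False) -> None:
        new_addr = self._alloc(data)
        if power_loss_before_root_update:
            # The new block is orphaned; root still points at the old, intact version.
            return
        # One small, final update switches readers to the new version.
        self.root = new_addr

    def read(self) -> bytes | None:
        return None if self.root is None else self.blocks[self.root]


store = ToyCowStore()
store.commit(b"directory v1")
store.commit(b"directory v2", power_loss_before_root_update=True)
print(store.read())  # b'directory v1'
store.commit(b"directory v2")
print(store.read())  # b'directory v2'
```

After the simulated power loss, a reader still sees the complete old version, never a half-written mix.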
So, back to how?
The known causes are these:
- Non-ECC RAM suffering a bit flip after the Metadata block was created and checksummed in RAM, but before it was written to storage (which affects both copies).
- Some LSI HBAs can seem to write corrupt data when they overheat. (Note the wording: “can seem to”.)
- Hardware RAID controllers that do elevator seeking & writing (reordering writes), AND a power loss that occurs during such an event. (Thus, “it worked for years without problems!!!” Well, of course it did; this is a rare event!)
- Power supplies that are on the edge of reliability.
- Multi-disk USB enclosures, which may contain hardware RAID controller chips, though generally with reduced functionality.
- USB-attached storage, which might have firmware or logic bugs that cause problems.
- It appears (again, note the word “appears”) that SATA Port Multipliers are not well supported in software, so perhaps they can cause problems too.
- And the rare, but possible, ZFS bug.
I can’t think of any more at present. Add comments if you think of a real potential cause.
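The Non-ECC RAM item is the one that defeats Metadata redundancy, and a short simulation shows why. This sketch uses a simplified Fletcher-4 style checksum, modeled on ZFS's default fletcher4 (four 64-bit running sums over 32-bit words); the little-endian word order, the 64-byte block, and the helper names are assumptions for illustration. Because both copies are written from the same already-corrupted buffer, neither copy matches the stored checksum, so there is no good copy to repair from.

```python
from __future__ import annotations

import struct

MASK64 = (1 << 64) - 1


def fletcher4(buf: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4 style checksum: four 64-bit running sums over 32-bit words."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", buf):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted (a simulated memory error)."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


# 1. The Metadata block is built and checksummed in RAM.
block = bytes(range(64))
stored_checksum = fletcher4(block)

# 2. Non-ECC RAM flips one bit AFTER checksumming but BEFORE the write.
in_ram = flip_bit(block, 100)

# 3. Both redundant copies are written from the same, now corrupt, buffer.
copies_on_disk = [in_ram, in_ram]
print(sum(fletcher4(c) == stored_checksum for c in copies_on_disk))  # 0

# Contrast: a bad disk block damages only one copy; the other still verifies.
copies_on_disk = [flip_bit(block, 100), block]
print(sum(fletcher4(c) == stored_checksum for c in copies_on_disk))  # 1
```

The contrast case is the normal situation that Metadata redundancy is designed for: one copy fails its checksum, the other passes, and ZFS can repair from the good one.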
A Guess on one cause
My personal belief, without real evidence, is that SOME (and I truly mean a few, definitely NOT all) ZFS Metadata corruptions could be caused by transient memory errors. Here in the forums we have quite a few people running with Non-ECC RAM, so it is possible. Remember, I said that I don’t have real evidence, just a hunch.
Part of the reason I say this is that Enterprise users don’t appear to have the same Metadata corruption problem. Otherwise there would be a lot of complaining on that side.
With TrueNAS being one of the bigger free ZFS-based NAS options for small businesses and homes, this makes some sense. Some people build their TrueNAS systems with consumer hardware that does not support ECC RAM. Even when a system board does support ECC RAM, the user must also select a CPU with such support AND buy ECC RAM, then hope the BIOS implements ECC correctly.
Server-grade hardware would, I assume, have fully tested ECC RAM support.
Personal experience with Metadata redundancy
Something odd happened to me a few years ago. My low-power, miniature media server has 2 storage slots: an mSATA slot and a 2.5" disk bay. I installed a 1TB mSATA SSD and a 2TB HDD in this computer. Because the OS was going to be small compared to the media, I took a 50GB partition from each and made a Mirrored root pool for the OS.
However, with good backups, I did not see the need for redundancy on my media, so I striped the remaining space of the 2 storage devices. Occasionally over the years I lost a video file; given their size, video files were statistically more likely to be hit. ZFS scrubs told me which file, and I would restore it from backups.
One day I noticed a read error with a short resilver (if I remember correctly). But it was not accompanied by a file name. In fact, the pool status reported “errors: No known data errors”. I puzzled over this for a while. Eventually I figured the bad block was likely in redundant Metadata, so ZFS could repair it from the other copy.
In some ways I wish I had better reporting from ZFS. Maybe ZFS does log these types of errors. But that was a one-time event that I did not investigate more thoroughly.
Afterword
Your thoughts?
Any useful info to add?
Suggestions on things I should fix?