Which ZFS SCRUB errors specifically indicate that a file is corrupt and its data is lost to the user, meaning that if this was the user’s only copy of the data, that person no longer has it?
I’d like to preface this conversation by saying that paying a data recovery company is off the table, as the average person would not be able to afford it.
This topic comes up almost daily when someone reports an error message about a corrupt piece of data. A forum thread I read this morning had some contradictions and interesting points, which led me to read into it a bit more.
Before this morning I would have said that any data corruption reported by a SCRUB meant the data was no longer available to the user. Now I think it may be a bit more complicated than that and may depend on the pool layout, though maybe not.
Specifically, if we have a MIRROR and a SCRUB says one of the drives has CKSUM errors and then lists a corrupt file name, does that mean the file is corrupt on just the one drive, on both drives, or for the pool as a whole?
It would be nice to make a simple chart that lists pool layouts and the various scenarios, and states for each whether the file is corrupt and, if it is not, where the good data resides.
For example: a RAIDZ2 using 5 drives, where a SCRUB reports permanent errors for file “ABCD” and CKSUM has a value of 34 for one of the drives, while all other drives have no errors. What does this mean?
In the past I would have said it meant that the file “ABCD” was corrupt and not recoverable. But what about a MIRROR? Is it the same thing? I don’t know.
Based on how I read the statement below, if ZFS reports a corrupted file, it is corrupt for the entire pool, even if only one drive is showing CKSUM errors. And let’s keep in mind that it does not matter how the corruption occurred, only that the corruption does exist.
From an Oracle Document:
Data corruption errors are always fatal. Their presence indicates that at least one application experienced an I/O error due to corrupt data within the pool. Device errors within a redundant pool do not result in data corruption and are not recorded as part of this log. By default, only the number of errors found is displayed.
Now, if a SCRUB does not report a file error but does report CKSUM errors for a drive, that means the data is still intact but the drive may be having a problem. Again, this is my interpretation.
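To make that interpretation easier to check and argue with, here is a minimal sketch of it in Python. It encodes only my reading so far, not confirmed ZFS behavior. The `ScrubReport` class and the `interpret` function are hypothetical names made up for illustration, and nothing here parses real `zpool status` output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScrubReport:
    """Hypothetical summary of a scrub result (not a real ZFS structure)."""
    # CKSUM counter per drive, e.g. {"sda": 34, "sdb": 0}
    cksum_by_drive: Dict[str, int] = field(default_factory=dict)
    # Files listed under permanent errors
    permanent_error_files: List[str] = field(default_factory=list)


def interpret(report: ScrubReport) -> str:
    """Apply the interpretation from this post (an assumption, open to correction)."""
    # A listed file is treated as corrupt for the whole pool,
    # no matter how many drives show CKSUM errors.
    if report.permanent_error_files:
        return "file corrupt for the entire pool"
    # CKSUM errors without a listed file: data intact, drive is suspect.
    if any(count > 0 for count in report.cksum_by_drive.values()):
        return "data intact, drive may have a problem"
    return "no errors reported"


# The RAIDZ2 example: 5 drives, one with CKSUM 34, file "ABCD" listed.
raidz2 = ScrubReport(
    cksum_by_drive={"d1": 34, "d2": 0, "d3": 0, "d4": 0, "d5": 0},
    permanent_error_files=["ABCD"],
)
print(interpret(raidz2))
```

If the discussion shows that the answer depends on pool layout, the function would need a layout argument, and the chart I mentioned would effectively become its lookup table.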
Feel free to post any factual data you can locate along with your interpretation of that data, but the most important thing is to keep it simple and easy to read. I’d like to add the results of this discussion to @Arwen ZFS pools & power loss Resource once this is clear, since unstable systems and power loss seem to be blamed for a lot of these kinds of errors.