Per various discussions in other threads, let's discuss ECC versus Non-ECC memory for HOME style servers.
I am of 2 minds about this.
First, it appears I have not experienced any memory errors on my home computers in the last 25 years. I have never over-clocked anything on these computers, not CPU, GPU or RAM. They range from smaller desktops to miniature PCs and laptops. I have owned real servers, for example a Sun X2200 M2 AMD dual Opteron, which did have ECC. But they too never reported any RAM errors (ECC correctable errors).
Of course my sample size is so trivially small, it is worth discounting 100%.
It is also worth noting that vendors, like Intel, give us limited choices for hardware that supports ECC at the low end, like desktops, laptops or home servers. So it is not surprising that Non-ECC RAM is used most often in the home, and even in some small businesses.
Now on the other side. I have experienced unexplained crashes and data corruption on disk (that is, NOT caused by a bad disk block). This was when I used something other than OpenZFS.
Was that due to Non-ECC RAM bit flips?
Don’t know.
One thing that I’ve seen in the last 10 years here in the FreeNAS and now TrueNAS forums is unexplained ZFS pool corruption. We have several known reasons why a pool can get corrupt. Most appear to be related to either virtualizing the NAS or using hardware RAID controllers.
Yet for those who are not virtualizing TrueNAS, not using hardware RAID disk controllers, nor doing any of the other known items, we never got a good explanation for the pool corruption. Could it have been memory errors working their way into the pool? Perhaps. There is no hard evidence that is the case.
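To make that "memory errors working their way into the pool" question a bit more concrete, here is a toy Python sketch. It is NOT how OpenZFS is implemented. The `flip_bit` and `checksum` helpers are made up for illustration, and SHA-256 is just a stand-in for whatever checksum the pool uses. It only shows the general idea: a bit flip that happens after a checksum is computed gets caught, while a flip that happens in RAM before the checksum is computed gets checksummed as if it were good data.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Hypothetical helper: return a copy of data with one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def checksum(data: bytes) -> str:
    """Stand-in checksum for this sketch (SHA-256 via hashlib)."""
    return hashlib.sha256(data).hexdigest()


original = b"family photos, tax records, backups"

# Case 1: the flip happens AFTER the checksum was computed
# (for example on disk or in transit). The mismatch is detectable.
stored_sum = checksum(original)
damaged_later = flip_bit(original, 10)
detected_after = checksum(damaged_later) != stored_sum

# Case 2: the flip happens in RAM BEFORE the checksum is computed.
# The checksum is taken over already-damaged data, so it matches.
damaged_in_ram = flip_bit(original, 10)
stored_sum_2 = checksum(damaged_in_ram)
detected_before = checksum(damaged_in_ram) != stored_sum_2

print("flip after checksum detected: ", detected_after)
print("flip before checksum detected:", detected_before)
```

In this toy model the first case prints True and the second prints False. Whether anything like the second case is behind the unexplained pool corruption reports is exactly the open question.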
One thing to note. People have assumed that an unexpected power loss could cause ZFS pool corruption. Nothing is further from the truth! ZFS was SPECIFICALLY designed and TESTED to survive unexpected power losses and NOT lose any data (except data in flight). Further, ZERO pool corruption is EXPECTED after unexpected power losses.
Of course hardware can fail during a graceless power loss. Like a disk or disk controller… Or maybe a regression was introduced to OpenZFS, breaking its programmatically sound recovery from a graceless power loss.
So, lots to think about.
Polite discussions please.