I used to run a non-ECC build, and to avoid problems I relied on two things:
- a full memtest every month or two
- a specific backup rotation for the most important files (family photos, documents): in short, only adding new files without touching anything else
This strategy seemed to work well in the short term, but in the long term, as the number of files grew, it became painful, boring and error-prone.
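For anyone who wants to script that "only add new files" rotation instead of doing it by hand, here is a minimal Python sketch. The function name `add_new_files` and the directory layout are my own assumptions for illustration, not what the poster actually ran. It copies files that are missing from the backup and never overwrites or deletes anything that is already there.

```python
import shutil
from pathlib import Path
from typing import List


def add_new_files(source: Path, backup: Path) -> List[Path]:
    """Copy files missing from `backup`; never overwrite or delete existing ones."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        dst = backup / src.relative_to(source)
        if dst.exists():
            continue  # an existing backup copy is left untouched
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(dst)
    return copied
```

The point of never overwriting is that a file that later gets corrupted on the source cannot replace the good copy in the backup. The price is that legitimate edits to existing files are not picked up either.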
Like you, I was thinking “ECC support costs too much”… but in reality I just wasn’t looking at what the market offers. Just search for older components: I have 2 ECC builds based on 7th-gen Intel, and the offsite one cost me less than 100€ (with 2 small disks).
…I’ve used non-ECC for years w/o a KNOWN problem, but maybe that is because all my NASs use RaidZ2?
Let us be clear: unless you verified the files AFTER you put them in the RAID-Zx / Mirroring pool, by comparing them to the original source, the error could easily have been introduced BEFORE ZFS check-summed and added parity to one of the file’s blocks.
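As a rough illustration of that kind of after-the-fact verification, here is a small Python sketch that hashes every file under the source and compares it with the copy on the NAS. The names `sha256_of` and `find_mismatches` are made up for this example. Keep in mind that a read-back done right after the write may be served from cache rather than from disk, and that the hashing itself runs in RAM too, so treat it as a sanity check, not a guarantee.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source: Path, copy: Path) -> List[Path]:
    """Return relative paths whose copy is missing or differs from the source."""
    bad = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = copy / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(rel)
    return bad
```

An empty list from `find_mismatches(Path("/path/to/source"), Path("/path/to/nas/copy"))` means every source file has a byte-identical copy (both paths here are placeholders).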
Sequence of events:
1. Source file from another computer
2. Write file to NAS
3. File blocks are in memory
4. File blocks are prepared for writing into Pool’s vDev, with checksum and possibly RAID-Zx parity
5. File is written to Pool’s vDev
The Non-ECC memory corruption can occur during step 3.
Worse, step 3 is actually multiple steps:
a. Network stack brings in IP packets
b. Network stack organizes the IP packets into a data stream for the application, (NFS, SMB, iSCSI, etc…)
c. The application, (NFS, SMB, iSCSI, etc…), compiles the file blocks into a block to be written to storage, (aka Pool).
d. ZFS may copy the file blocks into separate writes per disk in a vDev
e. ZFS may copy the file blocks into the ARC
So, as you can see, while ZFS protects data once it is in a pool with redundant vDevs, (aka RAID-Zx or Mirrors), it cannot protect against memory bit flips that happen before the checksum is computed.
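To make the ordering problem concrete, here is a toy Python sketch. It is not how ZFS is implemented: SHA-256 stands in for whatever checksum the pool uses, and `store`, `verify` and `flip_bit` are hypothetical helpers. It only shows that a checksum computed over already-corrupted data will happily validate that data forever, while a flip that happens after checksumming is caught.

```python
import hashlib
from typing import Tuple


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with a single bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def store(block: bytes) -> Tuple[bytes, str]:
    """Stand-in for a checksumming filesystem: checksum whatever arrives."""
    return block, hashlib.sha256(block).hexdigest()


def verify(block: bytes, checksum: str) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return hashlib.sha256(block).hexdigest() == checksum


original = b"family photo bytes"

# Case 1: the bit flips in RAM BEFORE the block is checksummed (step 3).
in_ram = flip_bit(original, 3)
stored, checksum = store(in_ram)
print(verify(stored, checksum))   # True: the checksum matches the bad data
print(stored == original)         # False: it is not what the client sent

# Case 2: the bit flips AFTER checksumming (e.g. on disk).
on_disk = flip_bit(stored, 10)
print(verify(on_disk, checksum))  # False: this kind of corruption is detected
```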
This does not even touch upon the source computer potentially corrupting the file while reading it into its own Non-ECC memory.
Some people think I am paranoid. But when computers really are out to get you, that’s not paranoia but taking justifiable precautions.
When ECC options cost as little extra as they do, it’s a bit like the folk who drive around without their seatbelt on. Statistically speaking, they might get away with it for a whole lifetime without serious injury.
But if they have a serious accident, wearing a seatbelt can frequently be the difference between walking away from the accident and being carried away in a body bag.
It’s no different with our CPUs. Many OEMs like Apple have simply de-prioritized ECC because non-ECC RAM is cheaper and the incidence rate re: bit flips is low enough that you can blame issues on other factors as well (such as the lack of comprehensive checksums in the current edition of the Apple file system).
But just like the students who discovered that their ‘free’ Google accounts can just vanish (and with them, everything the kids had worked on for years), serious data issues can crop up in unexpected places, just like car crashes. I have no realistic choice re: ECC on my laptop given my OS preferences, but once the data is on the server, it should stay good.
Identify a couple of boards of interest, then save your search on eBay. Or even save a search for the major server / motherboard manufacturers like Supermicro, ASRock Rack, Gigabyte, etc. Stuff comes up for sale more rarely in Europe, but it comes up for sale over there too.
I’m French, based in the Netherlands, so my hardware costs are pretty much the same as yours, @darkbouny.
How many drives? What network speed?
Like @Constantin has his favourite X10SDV-2C-7TP4F (and if he can source these easily on his side of the Atlantic, he’s very right to push these for pure storage uses), I have two cheap ECC proposals for you that I’ve been repeating for some time:
Gigabyte MC12-LE0 µATX, 6 SATA + boot M.2; takes an ECC-capable AM4 Ryzen of your choice (PRO APU, like a 4650G, 5350G or 5650G, which you can find on eBay, for lowest idle power; just about any desktop CPU to use the x16 in x4x4x4x4 mode for some cool little NVMe pool); 10G NIC possible in the x4 (CPU!) slot
Gigabyte MJ11-EC1 mITX, 8 SATA (including 4 from an extra SFF-8654 4i breakout cable, easily available on eBay), 1 M.2, 1 GbE NIC… and you can’t do much about it; takes cheap RDIMM (64 GB DDR-2400 is 60€ here)
More tinkering (3D-printing your I/O shield), but you can’t beat the price!
Please don’t tell me these genuine server boards and their ECC RAM are too expensive.
Why do people ask this question so often? Are they hoping someone will tell them that Non-ECC is just as good as ECC? I even use ECC RAM in my main computer for the same reason. ECC vs. Non-ECC comes down to, in my opinion, the value of your data.
If you can live with corrupt data turning up at, of course, the worst possible moment, then Non-ECC is fine. If you would kick yourself in the ass over and over again when you lost that data, then use ECC. The same thing goes for drive redundancy.
As for a best practice, no one who uses Non-ECC will like my answer: Turn the computer off, unplug it, stick it into a corner until you can afford ECC RAM. Anything else comes with a risk, even though it may be a small risk.
If you owned 1000 bitcoins, would you be okay with the blockchain servers that validate your bitcoins using Non-ECC RAM? This may be a poor example, as bitcoin blockchains are better protected than that, but it is an example.
I actually wrote up a system to checksum memory blocks, to try and detect corruption by malware. But, it in theory could also detect “some” types of memory errors. Like multi-bit errors on ECC RAM or any type of error on Non-ECC RAM.
Now this would not be “live” error detection on reads like ZFS. It would require a “scrub” program to be run regularly. Nor would it apply to all used memory. Only R/O blocks of code or data. And it would have to be integrated into the OS. Potentially even into the ELF files.
All in all, a major effort. However, one which might both improve security and reliability.
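A user-space toy version of the idea, just to show the shape of it. This is not the system described above (which would live in the OS, possibly in the ELF files); `BlockRegistry` and its methods are invented for this sketch, and `bytearray` objects stand in for read-only memory blocks. Checksums are recorded once, and a periodic `scrub()` reports the blocks that no longer match.

```python
import hashlib
from typing import Dict, List, Tuple


class BlockRegistry:
    """Records checksums of supposedly read-only blocks and re-checks them."""

    def __init__(self) -> None:
        # name -> (live buffer, digest recorded at registration time)
        self._blocks: Dict[str, Tuple[bytearray, bytes]] = {}

    def register(self, name: str, buffer: bytearray) -> None:
        """Remember the current checksum of a block that should never change."""
        self._blocks[name] = (buffer, hashlib.sha256(buffer).digest())

    def scrub(self) -> List[str]:
        """Return the names of blocks whose contents changed since registration."""
        return [
            name
            for name, (buffer, digest) in self._blocks.items()
            if hashlib.sha256(buffer).digest() != digest
        ]


registry = BlockRegistry()
code_block = bytearray(b"\x90" * 64)     # pretend this is read-only code
registry.register("libfoo.text", code_block)
clean_result = registry.scrub()          # nothing has changed yet
code_block[10] ^= 0x04                   # simulate a bit flip or malware patch
dirty_result = registry.scrub()          # the modified block is reported
```

As noted above, this only detects changes at scrub time, not on every read, and only for blocks that are never legitimately written.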
This old thread discusses the debug flag ZFS_DEBUG_MODIFY, which triggers checksumming in memory. I suppose one could enable it on non-ECC systems, but that only highlights that the best solution is to use ECC RAM in the first place.
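If you want to see whether that flag is currently set, here is a hedged Python sketch. It rests on two assumptions of mine that are not from the thread: that you are on Linux OpenZFS, where the debug flags are exposed as the `zfs_flags` module parameter under `/sys/module/zfs/parameters/`, and that ZFS_DEBUG_MODIFY is bit value 16 (0x10) in that parameter. Check both against the documentation for your platform and version before relying on it.

```python
from pathlib import Path
from typing import Optional

# Assumption: bit value of ZFS_DEBUG_MODIFY within the zfs_flags parameter.
ZFS_DEBUG_MODIFY = 0x10
# Assumption: location of the parameter on Linux OpenZFS.
ZFS_FLAGS_PATH = Path("/sys/module/zfs/parameters/zfs_flags")


def modify_check_enabled(flags: int) -> bool:
    """True if the ZFS_DEBUG_MODIFY bit is set in a zfs_flags value."""
    return bool(flags & ZFS_DEBUG_MODIFY)


def read_zfs_flags(path: Path = ZFS_FLAGS_PATH) -> Optional[int]:
    """Read the current flags value, or None if it is unavailable."""
    try:
        return int(path.read_text().strip(), 0)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    flags = read_zfs_flags()
    if flags is None:
        print("zfs_flags not readable (ZFS module not loaded, or not Linux?)")
    else:
        print("ZFS_DEBUG_MODIFY enabled:", modify_check_enabled(flags))
```

This only reads the setting. Expect a performance cost if you do turn the flag on, since it is a debug feature.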
I’ve recently built a new system with ECC memory on a very tight budget. This is what I got:
Motherboard Supermicro X9DRL-iF (~$75)
– 10 SATA ports (only 2 are SATA III, but SATA II is enough for hard drives)
– LGA 2011 socket, Intel C602 chipset (2012)
CPU Intel Xeon E5-2667v2 (8 cores, 16 threads | 3.3GHz w/ boost up to 4GHz) (~$24)
And no, I don’t live in the USA or something. I don’t even have eBay in my country.
I understand you can’t use ATX and therefore this exact motherboard is not suitable for you, but my point is you don’t have to break the bank to build an ECC system.