Alright gamers, I’m very new and just trying to see what I need to do after getting an alert and running a few SMART tests and scrubs.
It’s unusual to have checksum failures like that across your entire pool.
It indicates the data got written incorrectly, which could happen if you’re not using ECC memory, or if your pool is connected in an unusual way, etc.
What is your hardware and how is it configured?
Meanwhile, I’d suggest restarting and then doing a scrub again to see if the issue goes away.
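If it helps, here is a rough sketch of the "scrub again and check" step, wrapped in Python. It assumes the standard `zpool scrub` and `zpool status -v` commands; the pool name `tank` and the helper names are made up for illustration. Note that `zpool scrub` returns straight away and the scrub runs in the background, so the status output is what tells you when it has finished.

```python
import shutil
import subprocess


def scrub_commands(pool: str) -> list[list[str]]:
    """Return the zpool commands for starting a scrub and inspecting the result."""
    return [
        ["zpool", "scrub", pool],         # start a scrub of the pool
        ["zpool", "status", "-v", pool],  # -v also lists files with permanent errors
    ]


def run_scrub(pool: str) -> None:
    """Run the commands above, but only if the zpool binary is available."""
    if shutil.which("zpool") is None:
        print("zpool not found; run this on the NAS itself")
        return
    for cmd in scrub_commands(pool):
        subprocess.run(cmd, check=True)


# Example call ("tank" is a hypothetical pool name):
# run_scrub("tank")
```

Re-run `zpool status -v` once the scrub completes and compare the checksum counts with the earlier run.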
It’s interesting as I feel like nowadays ECC is treated as optional in hardware recommendations, and rejecting a build list for lack of ECC support gets you branded an elitist.
I’m a fan of following Worst Practices so don’t have a UPS on my setup at my current temporary location.
Only time I’ve ended a scrub with corrupted data is when a faulty kitchen appliance was plugged in that tripped the circuit breaker and knocked out power. I assume all data that hadn’t been written from RAM to disk was lost.
I’ve also had a HBA glitch and drop all disks on that controller, which of course knocked the pool offline. I didn’t experience any corruption, but it was probably because nothing was being written to the pool.
Along with the previous suggestions, were the corrupted files being written to the disk at the time? Was there possibly a power outage or crash that you missed? The dmesg log could help.
Power outages shouldn’t cause corruption with ZFS if the hardware is performing correctly.
Lost data that is not sync written, yes. But not corruption.
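To make the sync/async distinction concrete, here is a minimal sketch from the application side. It assumes nothing ZFS-specific: an ordinary write can still be sitting in RAM when the power goes, while a write followed by `os.fsync` does not return until the OS reports the data is on stable storage. The function name `write_durably` is just for illustration.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and block until the OS reports it is on stable storage."""
    with open(path, "wb") as f:
        f.write(data)         # lands in Python's buffer, then the OS cache
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to commit to disk before returning
```

Without the `flush` and `fsync` calls the write is asynchronous, and anything not yet committed when power is lost is simply gone. That is lost data, not corrupted data.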
How did you transfer your data into that pool?
And check your cabling.
I had a situation like this when I transferred data to my old non-ECC system over the network from a client with an external USB drive.
Perhaps bad verbiage choice on my part, but I received the same list of files that were ‘incorrect’ due to a power loss preventing async data from being written. Any data on-disk would not be corrupted by power loss.
Personal feeling?
ECC is arguably “less essential” than a proper SATA/SAS controller, but if you’re serious about your data it is not really optional. I’d say a first build without ECC (e.g. recycling an older desktop as a first foray into NAS) is possibly acceptable, depending on other components, but that a further TrueNAS build should really feature ECC.
Obviously ECC is better than non-ECC, but just how essential is ECC memory?
There are differing views on the importance of ECC. For example, @lawrencesystems Tom Lawrence (of Lawrence Systems, a widely-regarded TrueNAS YouTuber) thinks not: https://www.youtube.com/watch?v=J4TXNnJYhQY.
More like experience. Not here, but other places online when someone asks “how 'bout this motherboard”. If they were just taking something they had lying around it would be one thing, but these are new builds.
Just because it is a new build does not automatically make ECC affordable.
If you are already buying a MB and processor which support ECC, it would be stupid IMO not to buy ECC memory for it.
If you are about to buy a high-end MB and processor which does not support ECC, think again and buy a similar setup with ECC because the marginal cost % is going to be quite small.
If you are about to buy a low-end MB / processor, then the marginal cost % to upgrade to ECC might be pretty high.
In the end, it will depend on your budget and risk-acceptance profile.
Anecdote: My own TrueNAS system is c. 1 year old, does not have ECC, and I have not yet had a single checksum error or any data corruption. It’s a second-hand NAS appliance - and a new low-end self-build would have cost 3-4x as much - and ECC probably 5-6x as much.
ZFS verifies a checksum on every block it reads, which increases the relative importance of ECC.
There were posts explaining this in simple terms.
Actually what that article says is the opposite. Here is a direct quote:
- Myth #2 - ECC RAM must be used.
- All filesystems benefit from ECC RAM and ZFS is no different here.
- ZFS without ECC RAM is safer than other file systems with ECC RAM (checksums).
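A toy illustration of both sides of this argument, in case it helps. This is not how ZFS is implemented; it is a sketch that uses SHA-256 as a stand-in checksum, and every function name in it is made up. It shows that a checksum catches a bit flipped after the checksum was computed, but not one flipped in RAM before it was computed.

```python
import hashlib


def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Return (stored data, checksum), with the checksum computed at write time."""
    return data, hashlib.sha256(data).digest()


def read_block(stored: bytes, checksum: bytes) -> bytes:
    """Return the data, or raise if it no longer matches its checksum."""
    if hashlib.sha256(stored).digest() != checksum:
        raise IOError("checksum mismatch")
    return stored


def flip_bit(data: bytes, index: int = 0) -> bytes:
    """Simulate a single-bit error in the byte at `index`."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


original = b"important data"

# Case 1: the bit flips on disk, after the checksum was computed.
# The mismatch is detected on read.
stored, checksum = write_block(original)
try:
    read_block(flip_bit(stored), checksum)
    detected = False
except IOError:
    detected = True

# Case 2: the bit flips in RAM before the checksum is computed.
# The bad data gets a valid checksum and reads back without complaint.
stored, checksum = write_block(flip_bit(original))
silently_wrong = read_block(stored, checksum) != original

print(detected, silently_wrong)
```

Case 1 is the protection checksums give you regardless of memory type. Case 2 is the gap that ECC narrows, and it exists on any filesystem.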
I understand that you do not understand my intentions.
Note: I use a translation website.
Therefore, rather than explain it in my own words
I have chosen to quote.