Data protection when not using ECC

I’m planning on setting up a NAS with TrueNAS. I am very, very new to this; so far I have only built one PC myself and that was a “regular” one, so please excuse any naivety.

Since I have my old gaming rig sitting around, I’d like to reuse its motherboard, CPU, and memory; as they’re free, that’s a lot cheaper than buying new parts. This means I won’t be able to use ECC memory. Are there any good ways to still protect my data? I’m thinking of good habits, like not changing important data and rewriting it to disk unnecessarily, or ways to check whether memory sticks are still all right or are becoming error-prone.

I’m planning on backing up the most important things, mainly family photos and similar, to rsync (the cloud service, not the command-line utility). This would help with total failures, but not with minor corruption that adds up here and there.

Also, I have heard that RAID-Z2 is heavily recommended. I am planning on buying 8 TB IronWolf Pro drives, and that’s 400€ (2 drives) in parity alone. I was thinking of starting with 3 drives in RAID-Z1 and adding 2 more depending on how fast I fill my storage. Is RAID-Z2 really that important? If I buy my drives separately, at multiple stores, they will probably come from different batches and are unlikely to fail within the same few days, unless something happens to the whole NAS, right?
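To make the trade-off in the question concrete, here is a rough capacity and parity-cost calculation for the layouts mentioned. It is a back-of-the-envelope sketch, not TrueNAS output: it assumes 8 TB drives at 200 EUR each (the post’s 400 EUR for two) and a vdev created at the given width, and it ignores ZFS metadata, padding, and reserved space, so real usable capacity will be somewhat lower. `raidz_summary` is a hypothetical helper.

```python
# Rough RAID-Z capacity and parity-cost arithmetic (illustrative assumptions only).
DRIVE_TB = 8      # assumed: every drive is 8 TB
DRIVE_EUR = 200   # assumed: 400 EUR for 2 drives, as stated in the question


def raidz_summary(n_drives: int, parity: int) -> dict:
    """Approximate usable capacity and parity cost of one RAID-Z vdev.

    Ignores ZFS metadata, padding and reserved space, so real usable
    capacity will be somewhat lower than these figures.
    """
    if n_drives <= parity:
        raise ValueError("a RAID-Z vdev needs more drives than parity disks")
    data_drives = n_drives - parity
    return {
        "usable_tb": data_drives * DRIVE_TB,
        "parity_eur": parity * DRIVE_EUR,
        "parity_share": parity / n_drives,
    }


if __name__ == "__main__":
    for n, p in [(3, 1), (5, 1), (5, 2)]:
        s = raidz_summary(n, p)
        print(f"RAID-Z{p} with {n} drives: ~{s['usable_tb']} TB usable, "
              f"{s['parity_eur']} EUR of parity ({s['parity_share']:.0%} of raw space)")
```

Under these assumptions, 3 drives in RAID-Z1 give roughly 16 TB, 5 drives in RAID-Z1 roughly 32 TB, and 5 drives in RAID-Z2 roughly 24 TB, with the second parity drive costing another 200 EUR.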

Well, if your data is already written intact and then never changes, you can consider it protected. Personal photo archives are one typical example.

Before deploying TrueNAS, you should run memtest86(+) for several hours.
Others might also suggest that you:

  1. run memtest every month on a non-ECC system
  2. turn off memory and cpu overclocking

RAID(z) is not a backup. Strictly speaking, raid(z) is a tool for high availability (aka high uptime). You should always have backups of your important data. Ideally you should implement a 3-2-1 backup strategy.
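The 3-2-1 rule is usually stated as: keep at least 3 copies of your data, on at least 2 different types of media, with at least 1 copy off-site. The check below is a hypothetical illustration of that rule; the `Copy` class and its fields are made up for this example, and treating cloud storage as its own medium is a common convention rather than a strict definition. A RAID-Z pool counts as one copy, however many drives it has.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of the data (hypothetical model for illustration)."""
    location: str  # free-form label, e.g. "NAS" or "cloud"
    medium: str    # type of storage, e.g. "hdd" or "cloud"
    offsite: bool  # True if the copy lives away from home


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, on at least 2 media types, at least 1 of them off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    # A RAID-Z pool counts as one copy, no matter how many drives it has.
    plan = [Copy("NAS (RAID-Z)", "hdd", False), Copy("cloud backup", "cloud", True)]
    print(satisfies_3_2_1(plan))                                      # False: only 2 copies
    print(satisfies_3_2_1(plan + [Copy("USB drive", "hdd", False)]))  # True
```

For a NAS plus a cloud backup of the important files, the check fails with only two copies; adding a third copy, such as an external USB drive, would satisfy it.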

I personally think that raidz1 is ok, although I’ve only used mirrors with ZFS.

AIUI (as I understand it), yes.


Well, if your data is already written intact and then never changes, you can consider it protected. Personal photo archives are one typical example.

True. I should probably avoid touching the metadata then, but otherwise that’s a good point.

Before deploying TrueNAS, you should run memtest86(+) for several hours.

Alright, will do, thanks! By “before deploying” do you mean that I should set up TrueNAS and then run it before doing other things, or should I live boot a Linux distro and do it that way?

turn off memory and cpu overclocking

I am not planning on overclocking the CPU. Memory overclocking would be XMP, correct? I’d usually enable that, but I suppose I’ll avoid it here then.

RAID(z) is not a backup. Strictly speaking, raid(z) is a tool for high availability (aka high uptime).

Ah, I may not have been clear enough, sorry. Yes, I am planning on using backups: as I mentioned, I will back up the most important things to rsync. However, it would be rather expensive to back everything up there, so I’m concerned about RAID-Z for the “less important” data (which I’d still prefer not to lose).

I personally think that raidz1 is ok.

Alright, cool!

My wording was imprecise; I meant “before storing any valuable data on the TrueNAS system”.

Memtest86(+) doesn’t need any distro; it boots directly from UEFI/BIOS. Just flash it onto a USB stick.

Perhaps. I don’t have an opinion on overclocking myself; I’m just repeating what others say about non-ECC setups.


Run memtest86 before installing anything.

Relying on RAM of unknown status any more than you absolutely must is unnecessary, since you can easily run memtest from a USB stick prepared on a known-good system. The entire TrueNAS install can wait until after you’ve verified stability.


That depends on how paranoid you are and how much you want to avoid ever having to resort to the backups, so it is highly subjective. But yes, two degrees of redundancy is a common comfort zone.
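To put a rough number on the “two degrees of redundancy” question, the sketch below estimates the chance of losing a pool while it resilvers after one drive has already failed. It is a simple binomial model under loud assumptions: failures are independent, and each surviving drive fails during the resilver window with the same hypothetical probability `p_fail`. Same-batch drives, heat, or a power event break the independence assumption, and the model ignores unreadable blocks, which a degraded RAID-Z1 also cannot repair.

```python
from math import comb


def p_loss_during_resilver(n_drives: int, parity: int, p_fail: float) -> float:
    """Chance that, with one drive already failed, enough of the remaining drives
    also fail before the resilver completes to exhaust the remaining redundancy.

    Assumes independent failures, each surviving drive failing with the same
    hypothetical probability `p_fail` during the resilver window.
    """
    survivors = n_drives - 1
    needed = parity  # RAID-Z1: 1 more failure is fatal; RAID-Z2: 2 more
    return sum(
        comb(survivors, k) * p_fail**k * (1 - p_fail) ** (survivors - k)
        for k in range(needed, survivors + 1)
    )


if __name__ == "__main__":
    for parity in (1, 2):
        p = p_loss_during_resilver(n_drives=5, parity=parity, p_fail=0.01)
        print(f"RAID-Z{parity}, 5 drives: {p:.4%}")
```

With 5 drives and an assumed `p_fail` of 1%, this gives about 3.9% for RAID-Z1 versus about 0.06% for RAID-Z2. The absolute numbers are made up, but the roughly 65-fold gap is what the second parity drive buys under this model.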


I have one more question, actually! I heard people mention non-ECC memory can screw up RAID configs. I’d assume those are something that gets written more often. Are you aware of this, and if so, could you explain the risks, please? Would it “just” mess with the parity, or would that actually destroy data that wasn’t even written in the corrupt operation? And would such a thing be noticed compared to a regular bad write?

You’re looking in the wrong direction.
Misbehaving RAM rarely destroys data. Rather, it appears as random, hard to diagnose disk errors. For data to be corrupted, either the bit flip has to occur after the data is received but before it is checksummed, or the flip has to hit especially critical metadata and trigger a “Scrub of Death” (which takes some exceptionally misbehaving RAM).
Notice that I didn’t mention ECC… All RAM can misbehave, but ECC detects the failure and either corrects it or reports it, which saves you the pain of troubleshooting random errors; non-ECC RAM just leaves you in the dark.
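The difference between a bit flip before and after checksumming can be illustrated with a toy model. This is not how ZFS is implemented: SHA-256 stands in for the block checksum (ZFS defaults to fletcher4), and `flip_bit` and `simulate_write` are hypothetical helpers that invert a single bit to mimic a RAM error.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for a ZFS block checksum (ZFS defaults to fletcher4; SHA-256 is used here).
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of `block` with one bit inverted, mimicking a RAM error."""
    buf = bytearray(block)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def simulate_write(original: bytes, flip_before_checksum: bool) -> tuple[str, bool]:
    """Return (what a read reports, whether the stored data still equals the original)."""
    data = original
    if flip_before_checksum:
        data = flip_bit(data, 3)  # RAM error before the checksum is computed
    stored_sum = checksum(data)
    # If the flip did not happen earlier, it happens after checksumming instead.
    stored_data = data if flip_before_checksum else flip_bit(data, 3)
    status = "read OK" if checksum(stored_data) == stored_sum else "checksum error"
    return status, stored_data == original


if __name__ == "__main__":
    print(simulate_write(b"family photo", flip_before_checksum=True))   # ('read OK', False)
    print(simulate_write(b"family photo", flip_before_checksum=False))  # ('checksum error', False)
```

A flip before the checksum is computed yields a block that reads back “OK” but no longer matches the original: the silent-corruption case. A flip after checksumming is detected on read, and with redundancy ZFS can repair it.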

Go ahead with your plan to have a first go at ZFS by recycling your older non-ECC motherboard.
If you’re worried, there’s a ZFS debug flag to force checksumming of data in RAM as well.
Or consider investing in second-hand server-grade components from the start… :wink:


Rather, it appears as random, hard to diagnose disk errors.

How do those errors manifest then, if not by corrupting data? Or do they just show up as error messages with no actual effect?

Or consider investing in second-hand server-grade components from the start… :wink:

I am considering it; however, I am utterly unfamiliar with everything there. I got all my knowledge of consumer-grade stuff by watching channels like Linus Tech Tips, GamersNexus, Level1Techs, and the like over the years, but with server-grade stuff I feel like someone who has only ever bought prebuilt gaming rigs trying to put a custom one together. Not that there’s anything wrong with that, but it’s tough.

Plus, for now, it’s cheaper to use my old stuff. I can upgrade to secondhand server parts when I can tell this is no longer enough, whether that’s due to too little CPU power or RAM messing things up.

In the meantime, I can hopefully learn. Do you know any good resources for learning about server-grade hardware on the side?

Mostly, yes: ZFS checksumming should catch them, and redundancy allows ZFS to correct them.
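Detection plus redundancy is what lets ZFS repair such errors. The sketch below shows the idea for single parity, where the parity column is the byte-wise XOR of the data columns, as with the first parity in RAID-Z. It is deliberately simplified: it keeps one checksum per column, whereas real RAID-Z stores one checksum per block and tries reconstructing columns until the block checksum matches. The function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks: the single-parity calculation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def read_with_self_heal(columns: list[bytes], parity: bytes,
                        sums: list[bytes]) -> list[bytes]:
    """Return the data columns, rebuilding at most one that fails its checksum."""
    bad = [i for i, col in enumerate(columns) if digest(col) != sums[i]]
    if not bad:
        return list(columns)
    if len(bad) > 1:
        raise OSError("more damaged columns than single parity can rebuild")
    i = bad[0]
    survivors = [col for j, col in enumerate(columns) if j != i]
    repaired = list(columns)
    repaired[i] = xor_blocks(survivors + [parity])  # XOR of the rest restores the lost column
    return repaired


if __name__ == "__main__":
    cols = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(cols)              # written alongside the data
    sums = [digest(c) for c in cols]       # stored checksums
    damaged = [cols[0], b"XXXX", cols[2]]  # column 1 read back corrupted
    print(read_with_self_heal(damaged, parity, sums) == cols)  # True
```

With one damaged column, XOR-ing the remaining columns with the parity rebuilds it. With two damaged columns, single parity is not enough; that is the case RAID-Z2’s second parity covers.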

Nothing complicated. It’s either:

  • Xeon Scalable/EPYC 5000+ stuff, which is likely of no relevance (overpowered);
  • Xeon E/EPYC 4000, which is basically the consumer stuff you know about, with (official) ECC support (but without the gaming-style bells and whistles);
  • embedded boards (Xeon D, Atom C3000, EPYC 3000): very relevant (though mostly older generations for a NAS) and probably very exotic to you, but not that hard to understand (limited flexibility; they do well what they are designed for, but don’t try to teach a pig to sing).

Alright, cool!

And thanks for the website link, I’ll skim around through that!

Some remarks on top of the others:

  • I would run Memtest for at least a week, repeatedly.
  • Replace the thermal paste on your CPU to avoid any thermal issues.
  • You did not write much about your actual hardware configuration, but another relatively cost-effective way of building an ECC system is to use the AliExpress X79 (DDR3 ECC RAM) or X99 (DDR4 ECC RAM) setups. (I already have at least 6 full systems deployed at home that work without any problems.)
  • You can buy those motherboards from 30+ EUR apiece, a reasonable Xeon E5-16xx/26xx CPU from about 10 EUR, a decent LGA 2011 CPU cooler for 20+ EUR, and 2x8 GB of DDR3 ECC RAM from 25+ EUR.
  • In total, this is less than 100 EUR (about 85 EUR at the minimum prices above).
  • Look for a motherboard with at least 6 SATA ports onboard; otherwise, you will also have to buy an HBA in IT mode if you want to use more HDDs.
  • This will be a real server-grade CPU with full ECC support. The motherboard is brand new; only the chipset is recycled from previously discarded PCs. The CPU is second-hand, but they usually work fine (I have ordered 20+ and never had a faulty one). The RAM is new, new old stock, recycled, or used, depending on the price, so most of the parts will be reliable.
  • However, there are some things to consider about these configurations:
  • The components are not new.
  • The X79 platform is easily 12+ years old, but that is fine if you only use it as a simple NAS.
  • Yes, the power consumption will be significantly higher than, let’s say, an Intel Atom system, but this is a hobby and you save a LOT by using this older architecture. Furthermore, cheap enterprise-grade gear will most likely be built on exactly this family of old Xeons.
  • You don’t need high-end CPUs with 20+ cores/threads; a decent 4-core/8-thread CPU is more than enough for a NAS-only task.
  • I recommend buying the E5-xxxx v4 Xeons (these need the X99 platform and DDR4 ECC RAM); they are the last generation for this socket.
  • If you only use the system as a NAS, you can even go for the “L” (low-power) CPUs, which have a TDP of around 50-70 W. Their TDP is low because they clock much lower than the non-L variants, but for a NAS-only system that will not be a problem.
  • Speed will not be top tier, but it will be sufficient for your NAS purposes.
  • Neither these motherboards nor the CPUs have an integrated GPU, so you will need a discrete GPU during installation (buy one if you don’t have any lying around). After you set up the system, you can remove the GPU, since you will use the web UI.
  • These systems do not implement IPMI, so remote out-of-band management is off the table. (However, there are a number of such solutions available on the market for less than 100 EUR.)
  • If you go for enterprise gear, avoid 1U systems! They look cool and cute, but they have a lot of issues that make them a big pain in the lower region (heat management, noise, few alternative sources for replacement parts, not enough HDD bays, etc.). A 2U is mostly OK (you can only install half-height cards in it), but 3U and up is the most compatible with consumer-grade parts.
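Adding up the minimum prices quoted in the list gives a quick sanity check on the budget. The figures are the post’s “from” prices; shipping, power supply, case, boot device, and drives are not included, and real listings will vary.

```python
# Minimum ("from") prices quoted in the post, in EUR; real listings vary.
MIN_PRICES_EUR = {
    "X79 motherboard": 30,
    "Xeon E5-16xx/26xx CPU": 10,
    "LGA 2011 CPU cooler": 20,
    "2x8 GB DDR3 ECC RAM": 25,
}

total = sum(MIN_PRICES_EUR.values())
print(f"Minimum total: {total} EUR")  # Minimum total: 85 EUR
```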

So, in summary, in my opinion these old Xeons are a good alternative to second-hand enterprise gear.
I have been using them for about 5 years now and have had very few issues with them.
