Best practices on non-ECC system

My point is that silent data corruption is an even worse consequence of lacking ECC, since it will contaminate the backups too.

3 Likes

Your frequent reminder that this article is still true:

https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/

Plus my personal corollary:

  • ECC is better than no ECC in any setting
  • ZFS is better than any other filesystem, ECC or not, regarding safety of your data

It really boils down to this. If you are a hobbyist or SMB with some tech know-how and a time budget to build your own - why would you settle for anything but ECC memory? Take your time, go shopping … if it’s business critical and you need it now, there is still the absolutely awesome A2SDi line of mainboards by Supermicro.

Or if $business is of sufficient size, just buy a real server, for crying out loud. ECC is standard with these. But that is of course not the main audience on this forum.

Kind regards,
Patrick

6 Likes

Why is ECC so important for TrueNAS?

I have never heard that my MacBook uses ECC; why is it not needed there?

I know that you did your research before posting, so I know that I will not be able to brainwash you :person_shrugging:
Hmmm, just keep the files on your Mac.

ECC is beneficial for ANY computer and ANY operating system.
But again: One presumably uses ZFS because one cares about data integrity. ZFS code assumes that RAM is reliable, i.e. it has a built-in assumption for ECC.
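To make the "RAM is assumed reliable" point concrete, here is a minimal Python sketch. It is a toy model, not ZFS code: `write_block`, `read_block` and `flip_bit` are hypothetical helpers, and SHA-256 is only a stand-in for whatever checksum the filesystem is configured to use. A checksum catches a bit that flips after the checksum was computed (for example on disk), but a bit that flips in RAM before the checksum is computed gets checksummed and stored as if it were valid data.

```python
import hashlib


def write_block(data: bytes) -> tuple[bytes, str]:
    """Store a block with a checksum computed from whatever is in memory."""
    return data, hashlib.sha256(data).hexdigest()


def read_block(data: bytes, checksum: str) -> bytes:
    """Return the block only if it still matches its stored checksum."""
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError("checksum mismatch: corruption detected")
    return data


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a single-bit error at the given bit offset."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


original = b"tax records 2023"

# Case 1: the bit flips on disk, after the checksum was computed.
stored, checksum = write_block(original)
try:
    read_block(flip_bit(stored, 3), checksum)
    detected_on_disk = False
except IOError:
    detected_on_disk = True

# Case 2: the bit flips in RAM, before the checksum was computed.
stored, checksum = write_block(flip_bit(original, 3))
silently_wrong = read_block(stored, checksum) != original

print(detected_on_disk, silently_wrong)
```

In the second case the checksum verifies perfectly, which is exactly why the bad block would also be replicated into backups without complaint.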

Mac Pro and iMac Pro have ECC. And I certainly wish that my MacBook Pro had it as well.

3 Likes

Yes. Me too. And I wish that APFS checksummed data and not just metadata.

(I don’t have hard proof for that statement, but it’s what I’ve heard)

2 Likes

One of the issues with the Apple File System is just how undocumented the whole thing is. Folk who make recovery / RAID software have complained bitterly re: the lack of documentation. But I heard similar things there re: checksums, i.e. that it currently only checksums basic directory data, not the files themselves.

I wish my Apple hardware had ECC, but Tim Cook likes to make things as cheaply as possible, so it won’t be made that way unless it’s a pro desktop. So clearly the chips and software can handle ECC; the pro laptops just don’t get it.

1 Like

Hi There,

I finally built it. Following your advice, I opted for an ECC-capable system.

Motherboard Supermicro M11SDV-8CT-LN4F (310€)

  • AMD EPYC 3201 8-cores (11 Watts idle)
  • PCIe 3.0
  • 4x SATA ports
  • 1x PCIe x16 (supports 8x8 or 4x4x4)
  • 4x ECC RDIMM slots

Parts :

  • 4x16GB @ 2666 MHz ECC RDIMM Samsung M393A2G40EB2-CTD (70€)
  • 2x240GB Crucial BX500 SATA SSD (boot disk) (50€)
  • 1x M.2 to 5x SATA ports controller JMB585 (46€)
  • Case Jonsbo N2 (187€)
  • PCIe network adapter 10Gb SFP 82599 (X520-DA1) (45€)

Total without data disk : 708€
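As a quick sanity check on that total, here is a short Python sum of the prices listed above. It assumes each figure is the price of the whole line item (for example 70€ for all four DIMMs together), which is how the total appears to have been computed.

```python
# Prices in euros, copied from the parts list above.
prices_eur = {
    "Supermicro M11SDV-8CT-LN4F": 310,
    "4x16GB ECC RDIMM": 70,
    "2x240GB Crucial BX500": 50,
    "M.2 to 5x SATA JMB585": 46,
    "Jonsbo N2 case": 187,
    "10Gb SFP 82599 (X520-DA1)": 45,
}

total_eur = sum(prices_eur.values())
print(total_eur)  # 708
```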

Installed TrueNAS SCALE; all devices are fully recognized. :+1:

Note: I spent 2 hours trying to figure out why the IPMI Gb port was not coming up… => By default there is a jumper setting that disables that port.

6 Likes

Nice. Looks like a good choice. But I think you need to add a fan to the CPU heatsink.

1 Like

Well, the CPU is below 50°C (the system has no disks for now).
I’ll monitor that once I receive my HDDs.

Run Prime95 in stress test mode and see what happens.

1 Like

…or, polish the metal to remove the paint. Then use it to cook breakfast :laughing:

1 Like

That CPU heatsink was made for a case with reasonable airflow to keep it cool.

50C at idle is pretty hot. As @Stux has said, run Prime95, see what happens then. You are going to hit the 95C thermal cap and throttling of the CPU will start.

This is a photo of the motherboard that is sold with a CPU fan installed.

I’m not saying that you must mount a fan directly on the heatsink (though that would work much better); however, before you complete your build, consider adding a fan to blow across the CPU heatsink. Or make a plan for what to do if you start to notice throttling. Cooler components tend to last longer.
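If you want to watch for this while a stress test runs, a minimal Python sketch along these lines could poll the Linux hwmon sensors. Treat it as an assumption-laden example: it relies on the kernel's `/sys/class/hwmon/hwmon*/temp*_input` files (values in millidegrees Celsius), which sensor corresponds to the CPU differs from board to board, and the 95°C cap is simply the figure quoted above, so check the actual limit for your CPU. The function names are made up for illustration.

```python
from pathlib import Path

THERMAL_CAP_C = 95.0  # the cap quoted above; verify it for your own CPU


def read_temps_c(hwmon_root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Read every tempN_input file under hwmon_root, in degrees Celsius."""
    temps = {}
    for sensor in sorted(Path(hwmon_root).glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or empty; skip it
        temps[f"{sensor.parent.name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


def at_thermal_cap(temps: dict[str, float], cap_c: float = THERMAL_CAP_C) -> bool:
    """True if any sensor has reached the cap."""
    return any(t >= cap_c for t in temps.values())


if __name__ == "__main__":
    readings = read_temps_c()
    for name, temp in readings.items():
        print(f"{name}: {temp:.1f} °C")
    print("at thermal cap:", at_thermal_cap(readings))
```

Running it in a loop during Prime95 or memtest would show whether the readings keep climbing toward the cap or level off.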

One thing I liked about this board is the fact that you can bifurcate the x16 PCIe slot into 2x2x2x2 (according to the video I watched, however if I were to need it, I would research it much more). That is very nice to have on such a small board.

I finally attached a 120x120x10 mm fan under the ventilation grille of the case.

Before, during memtest, the CPU temperature was 95°C; now it tops out at 60°C.

5 Likes

Please tell me where you bought that stuff at that price!
In Italy those prices are more than double.

Hey, thanks for listing these things out, like this. I’m duplicating your build.

Any lessons-learned? Does it run especially hot? slow? anything? I can see that the old VGA port is going to be slightly annoying, but I have a bucket of adapters going back 30 years… should work out fine.

ANSWER the question people!

Been running non-ECC systems for many years.
Not one data issue.

Secret?

UPS with good enough battery and connected to TN so it can detect power outages, plus the status of the battery and other things!

I did answer the general question:

That you know of.

ZFS is not magic, just good engineering that many Open Source projects can’t afford. See BTRFS, HAMMER1/2 and BCacheFS as examples. All aim for a feature set close to, or better than, ZFS. And none come close, yet. (Hopefully that will change…)

I am not saying Non-ECC is certain to damage files or corrupt pools. I personally run 4 computers at home without ECC RAM, each with 2 pools, (root pool and data pool). Other than the non-redundant media pool, I have had no known disk errors. Even with lots of unexpected power offs.

UPSes were never required for ZFS. Never. Sun Microsystems specifically designed ZFS to survive any unexpected power loss or OS crash without corrupting existing data. That is COW, Copy On Write, in action. Plus, no boot time file system check required.

Of course, unexpected power loss can damage hardware. Or an unexpected power loss could expose a hardware RAID controller’s out of order writes which CAN corrupt a pool.
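For anyone unfamiliar with the idea, here is a toy Python sketch of copy-on-write. It is not how ZFS is implemented (the `CowStore` class and its fields are invented for illustration); it only shows the principle that new data goes to a fresh block and the live pointer is switched afterwards, so a crash in between leaves the old data intact.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self, initial: bytes) -> None:
        self.blocks = {0: initial}  # block id -> contents
        self.root = 0               # id of the currently live block
        self.next_id = 1

    def update(self, new_data: bytes, crash_before_commit: bool = False) -> None:
        # Step 1: write the new version to a fresh block.
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = new_data
        if crash_before_commit:
            # Simulated power loss: the root pointer was never switched.
            return
        # Step 2: switch the root pointer in a single step.
        self.root = new_id

    def read(self) -> bytes:
        return self.blocks[self.root]


store = CowStore(b"version 1")
store.update(b"version 2", crash_before_commit=True)
after_crash = store.read()   # still the old, intact data
store.update(b"version 2")
after_commit = store.read()  # the new data, once the pointer has moved
print(after_crash, after_commit)
```

The interrupted write is lost, but nothing that was already on disk gets damaged, which is why no file system check is needed after a crash.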

That all said, YES, UPSes help with uptime, and keep brownouts from damaging hardware. (That damage could cause data loss.) I even have 2 UPSes at home.

3 Likes

Got to be honest here, the extra cost of ECC RAM is small if you want to do the most to protect your data. You typically spend more on your consumable hard drives. But people take risks all the time. Most of us here think with more of a corporate mindset. Is the data invaluable? Protect it as best as possible, as the business depends on it. My personal data may not be that important, but if I were audited by the tax people, my data is available now. I hope to never have that audit again because they suck and put a lot of stress on you.

Best practice for tax records is also to retain print copies in a pest controlled environment. Yes, file cabinets still have a use case in this data age. I keep such records and supporting documents dating from the 1970s for myself and parents in a safe location.