Upgrading my NAS setup - hardware choices

Hi all, my plans have been somewhat thrown out of the window by several things I’ve read about here, so I’m seeking advice from the professionals :wink:

Current setup

My current setup is a TerraMaster F4-210 with 4x 4TB WD Red in RAID5, along with a ‘media PC’ which is running Plex/docker/etc duty.

Use cases

My use cases are:

  • Plex media library
  • Immich photo backup for myself and others (including AI tagging/search)
  • Various small services running in jailed docker

Hardware

Currently available

Everything in my current setup, plus:

  • a spare 250GB 2.5" SSD (Samsung Evo ###)
  • an AMD Ryzen 5 5600X

I have purchased 2x 16TB Seagate IronWolf Pro drives and aim to get a third soon, with the goal of having 32TB usable space in RAIDZ1. According to this post that should leave me with a failure chance of 0.016%.

Original plan

I was trying to spec out an i3 13100 in a Mini ITX board so that it would fit in the Node 304, but a) the i3 13100 does not support ECC memory (except a mysterious TE variant I can’t find for sale anywhere?), and b) with Mini ITX if I want a dedicated GPU for CUDA, I’m stuck without any PCIe SATA expansion cards.

If I want QuickSync for Plex transcodes, I’m limited to Intel, but if I want ECC I’m limited to Ryzen (or Intel’s server chips). I’m a little concerned about NVENC performance if GPU utilisation is high whilst doing ML tasks for Immich (I don’t know whether this is a reasonable concern or not).

I was thinking about leaving the 4x 4TB drives in the TerraMaster enclosure and turning them into a 4-wide RAID1 (or possibly JBOD mode and the equivalent managed within TrueNAS), or at least RAID 10. This would be used for photos, with the larger, lower-redundancy drives holding the media library (which would be a pain to lose, but not The Worst™).

New plan (help!)

I would mostly like the hardware to be both cheap (used) and low power. I have a spare 5600X knocking around, but the idle power draw on it is high (much higher than the 13100 I was eyeing up).

What are my options for cheap, low-power processors that support ECC memory? Then I just need a board and a case…

Storage

I’m having a bit of trouble thinking about how many/what capacity SSDs to buy, and indeed how to use them.

Another 120GB drive, to mirror the boot? I don’t need L2ARC, but would a (mirrored) sVDEV be useful?

Would it be better to retire the TerraMaster entirely and get a big enough case/PCIe SATA expansion card to hold all drives?



If you’ve stuck with me this far, thanks—and congrats for reaching the end!

You seem to be very constrained by your ECC memory requirement. I too was ECC or die until DDR5 came out. Random bit flips are much less likely, and even the experts say you don't really need it. Watch Truenas Tech Talk EP7 - they do a great job of covering this topic. I moved away from ECC and I am very happy with my systems. There is nothing inherent in ZFS that requires ECC memory IMHO (and in the opinion of many others WAY smarter than me).

Thanks, I’d forgotten that DDR5 comes with some amount of error correction. As a newer platform it is pricier though, overall I would prefer to buy older (and used) DDR4 kit.

It would perhaps be simplifying to get the 13100 and forgo ECC—but is that to say there are no low idle-power Ryzen or Intel server chips that support it?

  • Atom C3xxx (DDR4 RDIMM ECC) very low idle power
  • Xeon D-15xx (DDR4 RDIMM ECC) low idle power
  • Xeon E-2200/2300/2400 (DDR4/5 UDIMM ECC) lowish idle power

Mine is a Terramaster F5-221.

I know you are talking about a new MB, but just in case you didn’t realise it, the F4-210 uses an ARM processor and cannot run TrueNAS.

If these are the EFAX models then they are SMR drives which are totally unsuitable for ZFS redundant vDevs. Knowing how SMR works, I am unclear how they can be suitable for hardware RAID1/5 either, but it sounds like you are planning to replace them anyway.

Can you please explain this calculation, because having looked at that post it doesn’t sound right?

You can mirror the boot, but since getting going again after losing a boot drive is as simple as reinstalling on a new SSD and uploading your configuration file, most home users don’t bother.

The normal route would be to add a mirror pair of SSDs as a separate pool to hold your apps and their data (so e.g. Plex and the Plex metadata), keeping the Plex media files on HDD. But several people have suggested elsewhere that a special allocation vDev for small files can be a good way of combining SSDs and HDDs into a single pool.
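For reference, a special allocation vDev can be added to an existing pool from the shell; this is only a sketch - the pool name (`tank`), dataset name, and device paths are placeholders, and it should be tried on a scratch pool first, since a special vDev cannot be removed from a RAIDZ pool:

```shell
# Add a mirrored pair of SSDs as a special vdev to the pool "tank".
zpool add tank special mirror /dev/sdx /dev/sdy

# Route metadata AND small file blocks (up to 32K here) to the SSDs,
# per dataset; 0 (the default) means metadata only.
zfs set special_small_blocks=32K tank/photos
```

The `special_small_blocks` threshold is the knob that turns a metadata-only special vDev into a small-file accelerator; it must be less than the dataset's recordsize or everything lands on the SSDs.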

Thanks for the concern, I’m installing it on the ‘media PC’ setup though (or some upgraded version thereof) - so entirely separate build.

3 are EFRX, and 1 is unfortunately the SMR EFAX model. I understand that rebuilding onto an SMR drive takes forever, but in a 4-way RAID1 array with 2 CMR drives (assuming one of the CMRs fails) the SMR drive would only be read from, which should be OK I think. If the SMR drive dies I would not replace it with another SMR one, so slow rebuilds onto it are irrelevant. I’ve seen reports of (but no reasons why) ZFS not liking SMR drives; in that case I may leave them in RAID1 and mount them over the network.

Perhaps I’m using ambiguous language; I mean the probability of unrecoverable data loss in the vdev due to 2 or more drives dying at the same time. I used p = (24 * 365.25) / <manufacturer MTBF in hours> as the annual per-drive failure probability, then 1 - P(0) - P(1) as in my follow-up post.

I was trusting Davvo’s original formula for calculating the probability of X drives failing P(X) = C(n,X) * (p)^X * (1-p)^(n-X); I have not personally validated it.
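For anyone who wants to check the arithmetic, the formula is just the binomial distribution. A small Python sketch (the 1.2M-hour MTBF is my assumption for the IronWolf Pros, taken from the spec sheet as I recall it) lands on roughly the 0.016% figure:

```python
from math import comb

# Assumed numbers: 3-wide RAIDZ1, 1.2M-hour manufacturer MTBF, 1-year window.
n = 3                      # drives in the RAIDZ1 vdev
mtbf_hours = 1_200_000     # manufacturer MTBF (assumption)
hours_per_year = 24 * 365.25

p = hours_per_year / mtbf_hours   # per-drive annual failure probability

def p_failures(x: int, n: int = n, p: float = p) -> float:
    """Binomial probability of exactly x of n drives failing in the window."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# RAIDZ1 loses data if 2 or more drives fail in the same window:
p_loss = 1 - p_failures(0) - p_failures(1)
print(f"p = {p:.6f}, P(data loss) = {p_loss:.4%}")
```

This prints a data-loss probability of about 0.016%, matching my earlier number (for whatever that model is worth).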

If I create a jail (or later something with Incus) to run Docker will that non-UI configuration I’ve done also be backed up?

My gut reaction was “not ok” but when I think about it in a 2-way mirror you may be right:

  • If the CMR fails, then you will be reading from the SMR and the write performance shouldn’t be a problem.
  • If the SMR fails, then assuming you replace it with a CMR, there won’t be a problem.

But this does assume that the drive fails and you don’t need to resilver for any other reason. And it is only true for mirrors and NOT for RAIDZ.

Simple version: SMR bulk write performance sucks, so any bulk write - and especially a resilver - is very, very slow. (This is NOT, IME, a ZFS thing - I have seen it on non-redundant configurations on my laptop, for example.)

Technical version: SMR disks have a CMR cache, so short bursts of writes are written to the cache at CMR speeds, and then later, when the drive is idle, they are destaged back to the SMR area, which is very slow. When you are doing bulk writes, the CMR cache fills up, and then all writes happen at SMR speeds.

Unfortunately I don’t think this is the right way of looking at this. The question is actually:

What is the probability of a second drive failing in the time period between the first drive failing and the replacement finishing resilvering?

This is thus the probability of just one (more) drive failing, and it will depend on:

  1. How long it takes for you to get a replacement drive?
  2. How long the resilver takes?
  3. How old the other drives are (which has to be deducted from the MTBF).
  4. Whether the stress of resilvering increases the chance of a 2nd failure (because MTBF assumes a certain average level of workload).
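To make the contrast concrete, here is a rough Python sketch of that framing. The MTBF, replacement time, and resilver time are placeholder assumptions, and it deliberately ignores points 3 and 4 (drive age and resilver stress), so treat it as a lower bound rather than a real answer:

```python
from math import exp

# Placeholder assumptions - substitute your own numbers.
mtbf_hours = 1_200_000        # manufacturer MTBF for one drive
surviving_drives = 2          # remaining drives in a 3-wide RAIDZ1
window_hours = 48 + 24        # 48h to source a replacement + 24h resilver

# Exponential failure model: chance ONE surviving drive fails in the window.
p_one = 1 - exp(-window_hours / mtbf_hours)

# Chance that AT LEAST ONE of the survivors fails during the window,
# which is what actually loses the RAIDZ1 vdev.
p_second_failure = 1 - (1 - p_one) ** surviving_drives
print(f"P(second failure during resilver window) = {p_second_failure:.4%}")
```

Even this naive version makes the point: the number depends far more on how long the pool spends degraded than on the raw annual failure rate.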