VDEV Topology for existing hardware

I’m finally pulling the plug on my Synology dependence and setting up a proper NAS. Each time I needed a larger pool I ended up just replacing each drive, and because I can’t throw out perfectly good hardware I’ve ended up with a pile of drives that should be put to use.

I’ll temporarily host TrueNAS SCALE on an old computer while shopping around for server/workstation-grade components - fingers crossed I will get a free decommissioned server from work. It’s been installed and seems to be running well with 3 drives in a RAIDZ1 test pool.

Temporary components
• MB: ASUS Prime B350-Plus (AM4, DDR4)
• CPU: AMD Ryzen 7 5800X (8-Core, 16-Thread)
• RAM: Corsair Vengeance DDR4 64GB (4x16GB) (will be upgrading quickly to ECC)
• 10Gb NIC
• PCIe SATA Expansion/Bifurcation Cards (these I will need to buy as the MB only supports 4 SATA connections)

Available Drives (excludes boot drive):
• 1 x 1TB WD SN580 M.2 NVMe
• 1 x 275GB Crucial MX300 M.2 NVMe
• 2 x 240GB Kingston A1000 M.2 NVMe
• 2 x 128GB Unknown/Generic M.2 NVMe from i7 Intel NUCs
• 2 x 8TB Seagate IronWolf (CMR, ST8000VN004)
• 4 x 4TB WD Red NAS (CMR, WD40EFRX)
• 1 x 3TB Seagate Skyhawk Surveillance (CMR, ST3000VX010)
• 4 x 4TB Seagate Barracuda (SMR, not suitable for ZFS)
• 2 x 2TB Seagate Barracuda (SMR, not suitable for ZFS)

My use case is mostly SMB shares, media storage, and system backups, but I prefer systems that are a jack of all trades and master of none. I’m approaching 12TB of data, so I would like 16TB minimum to last a couple more years.

How do you think I should configure the VDEVs to best utilize these drives? If any drive fails (looking at you barracudas) I’ll buy a larger replacement (WD Red Plus or IronWolf).

Updated to include CMR/SMR drive information.

Make that a SAS HBA (LSI 2008/2308/3008) and not SATA. Better: use fewer drives and be happy with motherboard ports.

I see no obvious use for all these SSDs, except a small one for boot. (275/240 GB may be a bit small for a special vdev, depending on the threshold for small files, and I’m not sure I’d trust low end consumer NVMe for such a critical role.)

HDD layout for SMB/media would be raidz2 (bulk storage), but mirrors do better at accommodating a mix of different sizes. The Barracudas and 4 TB WD Red (not Plus?) could be SMR, and not suitable for ZFS.
Can you get some more 8+ TB drives instead?

If you’re in Europe, a Gigabyte MC12-LE0 is the obvious candidate for a cheap server board.

First thing you need to do is to work out which HDDs are SMR and put those aside, as they are not of any real use in a ZFS system. The WD Reds are likely to be SMR (unless they are Red Pro or Red Plus) - no idea about the others.

Then tell us what you still have that is useful.

They are WD Red NAS drives, which seem to be lumped in the same category as the Plus drives. WD40EFRX. The Barracudas are definitely SMR. I’ll update the post to reflect that. Thank you.

I included them because I didn’t want to assume they would be useless as I’m new to ZFS pools and TrueNas. If they are useless (for this use case) then they won’t be used.

Old WD Reds are tricky: the older ones are CMR, later models are SMR.
You’re lucky here. With 2x8 + 4x4 you have potentially 16 TB of raw space in a stripe of 3 mirrors. The lone Skyhawk has no obvious good use. 2-way mirrors of large drives are not the safest setting, and you may use only about 75-80% of these 16 TB before you need more space, so 16 TB raw is barely more than 12 TB usable.

6x8 TB in raidz2 would be 32 TB of raw space. 6x6, 24 TB raw. (Assuming a future B450/B550 motherboard with 6 ports, to avoid an HBA.)
2x8 + 4x4 TB in a 6-wide raidz2 is 16 TB raw (6x4 TB), so barely enough for now.

There is no conversion from mirror to raidz2, so you need to decide upfront on the layout. In any case, you’ll need to bring in bigger drives soon.
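
To make the arithmetic above concrete, here is a quick sketch of the raw-capacity numbers, using the same convention as above (raw TB after parity/mirroring, ignoring metadata overhead, TB-vs-TiB conversion and the 75-80% fill guideline). The helper functions are just illustrations, not anything ZFS-specific:

```python
def mirror_stripe_raw(vdevs):
    """Raw capacity of a stripe of mirrors: each mirror vdev contributes
    its smallest member, and the pool is the sum of its vdevs."""
    return sum(min(v) for v in vdevs)

def raidz_raw(drives, parity):
    """Raw capacity of a single raidz vdev: every member counts as the
    smallest drive, minus the parity drives."""
    return min(drives) * (len(drives) - parity)

# Stripe of three 2-way mirrors: 2x8 + 2x4 + 2x4 TB
print(mirror_stripe_raw([(8, 8), (4, 4), (4, 4)]))  # 16 TB raw

# 6-wide raidz2 mixing 2x8 + 4x4 TB (every member counts as 4 TB)
print(raidz_raw([8, 8, 4, 4, 4, 4], parity=2))      # 16 TB raw

# 6-wide raidz2 of 8 TB or 6 TB drives
print(raidz_raw([8] * 6, parity=2))                 # 32 TB raw
print(raidz_raw([6] * 6, parity=2))                 # 24 TB raw
```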

I was going to make some suggestions on how best to use these, but once I started thinking about it I realised that there is a whole bunch of requirements we don’t know that would drive such a design.

  1. I am unclear what your intention is once you get your final server hardware? Will your final hardware come with disks? Or do you intend to move your disks over and reboot? Or to build the new server and then transfer all your data over the network from one NAS to the other? If you want to build afresh using some of your existing drives, then you should reserve the best of them for the new build and see what you can do with the rest of them.

  2. The reason that SMR drives are not recommended is because their write performance makes them unsuitable to resilver onto if you get a degraded pool. Following text is blurred because it is wrong…However so long as you have matching sized CMR drives to use as replacements there is no reason IMO why you shouldn’t use them to start off with (but I may be very very wrong here). Or if you decide that this current build is temporary and you will live with (say) a degraded RAIDZ3 pool and not resilver, and so long as you can live with slow bulk writes when you migrate your data over to the new NAS, then again I think you could use SMR disks. Only you can decide whether the risks are worth it.

  3. I am also unclear whether you plan to run VMs and / or use iSCSI? Or if you are going to use it for write intensive workloads from Macs or over NFS?

  4. Because it makes space management easier, it is normal to try to have the fewest number of pools, subject to not mixing vDevs of different performance classes. So whilst it is not a hard and fast rule, you probably shouldn’t mix SSDs and HDDs in a single pool, nor mix Mirror vDevs with RAIDZx vDevs in a single pool. Only you can decide whether you want to mix Mirror and RAIDZx vDevs in a single pool.

Here are some comments on hardware:

  1. Buy LSI HBA PCIe cards and not any other sort of SATA expansion card.

  2. The ASUS Prime B350-Plus has (according to this specification web page) 6 SATA-3 ports not 4 as you stated. Not sure who is right.

  3. The MB has only 1 PCIe 3.0 x16 slot. The other 2 PCIe slots are 2.0 (slower) and either x4 or x2. If you want a GPU for e.g. transcoding video and a high-performance 8-, 12- or 16-port SATA expansion HBA, then you might have to choose carefully.

  4. This MB apparently doesn’t support ECC - so no upgrade possible until you get a new MB. That said, non-ECC memory is still reasonably reliable. So it depends what the consequences would be if a memory bit-swap caused a crash or caused wrong data to be written. Only you can decide whether non-ECC memory is OK in this temporary build.

  5. A Ryzen 5800X is way over-powered for a plain NAS - but you already have it, so why not use it in the temporary box. But unless you are going to have a reasonably heavy VM or CPU transcoding workload, you don’t need to wait for something that high-powered for your permanent box.

  6. 10Gb NIC doesn’t mean much unless your network is 10Gb end-to-end. If it is 100Mb then definitely think about an upgrade. If it is 1Gb then you may not need an upgrade.

  7. There are ZERO NVMe slots on this MB. I am not an expert but I suspect that NVMe PCIe extenders will at best fall short of native NVMe performance, and may have other issues with ZFS. Others may be able to advise further - but my gut reaction is to save your NVMe boards for MBs with native slots.

  8. Remember that when you look at useable space, there are quite a lot of overheads (aside from redundancy). For example a 6x 4TB RAIDZ2 has 4x 4TB drives of raw space. You are probably measuring your data in TiB (2^40 bytes) whilst the drives are sold in TB (10^12 bytes), which is a fair bit smaller per unit. Then you need to factor in the metadata overhead. And ZFS says you need to limit data to no more than 80% (or 75% to give a little headroom) of the space. So for 4x 4TB drives of raw space you can probably only store 10-12TB of data anyway (see the worked numbers below). I would suggest that you plan for at least 24TB-30TB of raw space in order to store your 12TB of data and have 50% growth space.
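
A rough sketch of that usable-space estimate, assuming nothing beyond the numbers in point 8 (raw capacity after redundancy, vendor TB of 10^12 bytes vs TiB of 2^40 bytes, and the 75-80% fill guideline; metadata overhead is ignored, so treat the result as an upper bound):

```python
VENDOR_TB = 10**12  # how drive vendors define a terabyte
TIB = 2**40         # how your data is probably being measured

def usable_tib(raw_tb, fill=0.80):
    """Convert raw (post-redundancy) vendor TB to TiB, then apply the fill limit."""
    return raw_tb * VENDOR_TB / TIB * fill

# 6x 4TB RAIDZ2 -> 4x 4TB = 16 TB of raw space after parity
print(round(usable_tib(16, 0.80), 1))  # ~11.6 TiB at 80% full
print(round(usable_tib(16, 0.75), 1))  # ~10.9 TiB at 75% full
```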

Edit: Following text is blurred because it is wrong

Now assuming that your permanent hardware will come with disks or you will buy new ones for it, and so you can use as many of the existing disks as you might usefully use in this interim build, then I would probably do the following:

Pool 1 - use all the 4TB Seagate drives and 2 of the 4TB WD Red drives to create a RAIDZ2 pool and use the remaining 2x 4TB WD Red drives as spare. Or similar but use 3x 4TB WD Reds with RAIDZ3 and keep 1x 4TB WD Red as spare. This would give you c. 16TB raw useable space. Despite using SMR drives, I think this might be suitably performant and resilient - but if an SMR drive drops out of the pool for any reason other than it actually failing (i.e. a cable issue) then you won’t be able to add it back in again.

Pool 2 - use 2x 8TB IronWolf drives as a mirrored vDev. Useable space another 8TB. This is resilient (though not as resilient as a RAIDZ2/3), IronWolf drives are pretty reliable, and if one fails you will need to buy a replacement.

Finally, whilst you have a trial RAIDZ1 pool, try pulling a drive out of the pool, clearing the GPT partition table on another system, and putting it back in. This will give you a chance to experience a pool recovery when there is no real data on it.
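
If you want to see what the command-line side of that recovery looks like, here is a minimal sketch. The pool name ("tank") and device node ("/dev/sdd") are placeholders for your RAIDZ1 test pool, and on TrueNAS SCALE you would normally do the equivalent through the web UI rather than calling zpool yourself:

```python
import subprocess

POOL = "tank"        # placeholder: your test pool's name
DEVICE = "/dev/sdd"  # placeholder: the wiped-and-reinserted drive

# Show which pools are unhealthy (a healthy system reports all pools healthy).
subprocess.run(["zpool", "status", "-x"], check=True)

# Ask ZFS to rebuild onto the reinserted drive; this starts a resilver.
subprocess.run(["zpool", "replace", POOL, DEVICE], check=True)

# Watch the resilver until the pool returns to ONLINE.
subprocess.run(["zpool", "status", POOL], check=True)
```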

(That’s my brain dump - but I am not an expert so get some input from others.)

Edited to blur incorrect stuff.

Any remaining SMR drive in a raidz# pool will make resilver slow. Very slow. Excessively slow. (ZFS writes to all disks in raidz resilver.)
So let’s assume there’s a drive failure—drives all eventually die of old age, and these are already used drives, aren’t they? Because of the remaining SMR, resilver takes days to complete, under sustained high strain for all drives. If further (old, used) drives fail during the process, this will not end well.
Do not use SMR drives with ZFS. And do not allow a degraded pool to keep running in degraded state; always replace drives ASAP.

I agree that a 5800X is overkill for a simple storage NAS, and that 10G is overkill for a handful of spinners. But the CPU is already owned, and the unspecified NIC is likely an improvement over the Realtek NICs which come with most Ryzen boards.

I was right!!! I did say that I might be very very wrong. :slight_smile:

The plan is to move the disks over and reboot.

I don’t plan to run VMs within TrueNAS. I’ve been using Proxmox for VMs, and would consider running TrueNAS within Proxmox to take advantage of the overkill CPU. I’ve never played with iSCSI before - something I should look into?

Does that apply to cache/log vDevs as well?

For the hardware components:

  1. Definitely will buy LSI HBA cards.
  2. It does have 6 ports, but 2 share bandwidth with the M.2 SATA (for the boot drive) and can’t be used.
  3. My thought is to use the 3.0x16 slot for the LSI HBA card, and the 2.0x4 slot for the 10Gb NIC.
  4. It does support ECC (when used with a Ryzen processor), unless I’m missing something.
  5. Reason why using TrueNas within proxmox might be the way to go (due to my other VMs).
  6. I do have 10Gb networking - a UI Dream Wall with a Flex 10GbE. I needed a weird setup at my previous house.
  7. Good point. I wasn’t expecting a good use case for this MB and those drives without native NVMe slots.
  8. Didn’t realize the ZFS headroom - thanks for the explanation!

I didn’t know the drives could be used like that in a mirror. Interesting thought. Wouldn’t I need a third “drive” to make it a 3-way mirror at 16TB? 2x8 (16TB) x 4x4 (16TB) x ?. If I needed to increase size though, wouldn’t I need to replace every drive? The performance benefit of a 3-way mirror is obvious.

To reach 16 TB, that would actually be 2x8 TB + 2x4 TB + 2x4 TB (a stripe of three 2-way mirrors). Sorry for the lack of clarity.
Vdevs in a pool need not be identical. There’s no point in having different geometries, because the pool would end up being limited by the least-performing vdev and the least resilient vdev, but vdevs of different sizes are perfectly possible.
Replacing all drives in a vdev suffices to increase size. For example, evolving to 2x8 + 2x10 + 2x4 would be 22 TB raw.

3-way or 4-way mirrors are perfectly possible, but still have the capacity of a single drive. The main benefit is resiliency rather than performance (losing a large drive in a 2-way mirror puts quite a lot of data at risk, without further resiliency, and raises the same considerations as in “RAID5 is dead”).
3-way mirrors are about as resilient as raidz2, and much more flexible, but not as space-efficient—and a lot more expensive.

OK, then you definitely need a SAS HBA to pass through to TrueNAS.

Not unless you absolutely need it for an application which really insists on having its storage exposed as iSCSI. Because then you’re dealing with block storage rather than bulk storage: mirrors everywhere (no raidz#) and under 50% occupancy.

There’s no point in mixing mirrors and raidz# in the same pool for data storage, but L2ARC/SLOG would come as single SSDs or SSD mirrors even with a raidz# HDD pool. Raidz# HDD + a special vdev as an SSD mirror is possible as well, but then you’d need at least a 3-way mirror to match the resiliency of a raidz2 HDD pool, which would be a lot of NVMe lanes for a Ryzen platform.

Thanks for the link to the manual.

Usually this means SATA lanes (from the chipset) are shared, so using an M.2 SATA drive disables a SATA port but an M.2 NVMe drive can be used while keeping the SATA port. Here, however, the lanes come from the CPU I/O die and the wording in the manual suggests that everything is shared, so that even using an M.2 NVMe drive would disable the SATA port.
You do have two PCIe 2.0 x1 slots though, and cheap adapters would let you use NVMe drives in there—enough for a boot drive.

It’s not clear whether it actually supports ECC RAM as ECC, or whether the board merely works with ECC RAM in non-ECC mode.

PCIe bifurcation and PCIe switches are perfectly fine with ZFS. If you did not need the x16 slot for an HBA, you could have an ASUS Hyper M.2 in there, or its generic Shenzhen equivalent, and host 4 NVMe drives (possibly for Proxmox rather than TrueNAS, as you still have no clear use case for NVMe storage with ZFS).
Still, you could use an x8/x4/x4 riser to have a pair of M.2 drives and a half-height HBA.

I can’t help you here as I don’t run Proxmox or VMs myself - but IMO you should ask around to see if running VMs under TrueNAS is better or worse than running TrueNAS alongside VMs under Proxmox. (My uninformed guess is that disk management may be easier and more flexible if all done under TrueNAS - but I can’t say from personal experience or research.)

If you don’t know you need it, then you almost certainly don’t need it.

No - because there is no point in having HDDs for either of these, nor is there much point in using SSDs for these if the main pool drives are already SSDs - so they are only worth having when you have an HDD pool.

You will only need:

  1. Full L2ARC if you have a LOT of memory.

  2. Metadata-only L2ARC may be useful for all HDD vDevs - I am still researching this for my own system.

  3. SLOG if you are doing write intensive workload to HDDs from Macs or using NFS (or VMs or iSCSI to HDD mirrors).

I think your spec is better than the one I found.

It looks like it does support M.2 NVMe (as well as M.2 SATA - they are different and you should use NVMe because it is MUCH faster).

You might be better off performance-wise buying a small SATA SSD for the boot drive and keeping the M.2 NVMe for e.g. a metadata-only NVMe drive.

I wrote a whole Wiki page on this at Uncle Fester’s Basic TrueNAS Configuration Guide - Planning Data Volumes. :slight_smile:

I would’ve used strikethrough

Wrap in ~~

Would need to do that individually for every paragraph rather than as a single block.