Suggestions / config setup for 4x24TB build

Looking to build a new NAS to replace a dying Synology RackStation.
I've been wanting to try TrueNAS for a while, but Synology has always been 'good enough'. Now my hand is being forced to build anew, and the warranty expired right before I could even make use of it.

The current Synology rig has 4x14TB in RAID10 as a Btrfs volume, plus 2x512GB NVMe SSDs for read caching + metadata cache.

Looking to build a 4x24TB TrueNAS system, and just trying to figure out what is optimal. I'm expecting to have 128GB of ECC RAM in here, or potentially more if it makes sense. Is there a guide or calculator for how much RAM should be available depending on the volume size? Is 1GB of RAM per 1TB of space still the rule of thumb for sizing the ARC?

This will mostly be storing media files, pictures, and documents, but it will also be used for VM backups, potentially some secondary VM storage, and assorted large and small file transfers between servers / temporary storage.

SLOG, L2ARC, and metadata vdevs all sound of interest, but I see so much conflicting discussion, and it's tempered by the fact that most topics that come up in searches date from anywhere between 2012 and 2020, and surely there have been changes/improvements to ZFS and hardware in that time. If there's any sort of guide or calculator on how big SLOG/L2ARC/metadata devices should be, that would be helpful.
I routinely see it said that a SLOG doesn't need to be very big? Under 32GB in size? Or that it should be sized at the speed of writing to the volume (MB/s) * 5-10s?

I haven't decided where I could make use of my two existing NVMe SSDs (they're 970 Pros) yet, and haven't decided what to do for a boot drive either. I'm also considering purchasing some Intel Optane drives, if they'd make more sense than the 970s for boot drive / SLOG / L2ARC / metadata vdev purposes.

The other big question would be the layout/config of ZFS on the 4x24TB drives themselves. I'm currently looking to recreate a RAID10-like setup in ZFS; the web suggests that creating two 2-disk mirror vdevs, which ZFS will then stripe across, would be equivalent/good? How would this compare to RAID-Z2, both in usable space and in read speed, write speed, and how many drives could be lost? Is it still up to 2 drives lost for both, and only 1 drive (potentially, depending on which fails) for the faux RAID10 setup? Thanks all.

That sounds like awfully light usage; 128GB is likely way over what is needed. I seriously doubt that with that workload you would have any need of L2ARC, like most users. Is the majority of the space media or pictures? I personally would do a stripe of mirrors for ease of adding more space later, which given your config is likely, lol. Raidz2 would have to be expanded in groups of 4 drives, would be slower, and would have about the same usable space given you have 4 drives.

Why a SLOG? What specific workload do you have that makes you think you might need it? NFS?

You just need a small cheap boot drive, you can mirror that if you wish. Mine were like $12.

I am running 24 apps, 4 VMs, media, pictures, etc., and was running just fine with 64GB of RAM, with a 99.something ARC hit rate. No need for any special devices for me. The machine is still basically idle. It screams.

My advice is to run it and then determine if you need any of those special devices, my guess, no.

If you ever plan on running any VMs, those 970s would be good for that.


Yes, but it relaxes as size grows, so 128 GB for ~48 TB of usable storage looks like massive overkill. Nothing wrong with it, but without any apps you'll likely be fine with 32 GB.
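
For a rough sense of the numbers, here's a back-of-the-envelope sketch (illustrative only; the old "1 GB per 1 TB" figure is a guideline, not a hard ZFS requirement, and the ~32 GB floor is just the estimate above):

```python
# Back-of-the-envelope ARC/RAM sizing sketch for a 4x24 TB pool.
RAW_TB = 4 * 24            # 96 TB of raw capacity
old_rule_gb = RAW_TB * 1   # classic "1 GB RAM per 1 TB raw" guideline

print(f"Old rule of thumb: ~{old_rule_gb} GB of RAM for {RAW_TB} TB raw")
print("Realistic floor for a mostly-media pool with no apps: ~32 GB")
```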

Nothing has changed since 2020. Unless you're doing iSCSI for external VMs you don't need a SLOG, at all. And you're not likely to benefit from an L2ARC if you have massive RAM.

Correct: a SLOG only needs to hold about two transaction groups (default: 5 s each) at network speed.
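
As a rough sketch of that sizing rule (assuming the default 5 s transaction group interval; plug in your own link speed):

```python
# SLOG sizing sketch: the SLOG only needs to absorb roughly two
# transaction groups' worth of sync writes arriving at network speed.
def slog_size_gb(link_gbit: float, txg_seconds: float = 5.0, txgs: int = 2) -> float:
    link_gbyte_per_s = link_gbit / 8      # Gbit/s -> GB/s (roughly)
    return link_gbyte_per_s * txg_seconds * txgs

for link in (1, 10, 25):
    print(f"{link:>2} GbE -> ~{slog_size_gb(link):.1f} GB of SLOG is plenty")
```

Even at 25 GbE that comes out around 32 GB, which is why "a SLOG doesn't need to be big" keeps coming up.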

Any cheap SSD on whichever interface you need LESS of for your storage. (So typically a small M.2 NVMe, to keep all SATA ports for HDDs.)

“Real” Optane for SLOG, which you do NOT need. I use 16 GB M10 drives as boot drives for CORE because they perfectly fit “small and cheap”, but these might be a bit small for SCALE.

Stripe of mirrors:
space: 2 (drive units)
read: 4
write: 2
IOPS: 2
resiliency: 1… and losing one drive puts the entire pool at risk if the other member of that mirror fails

Raidz2:
space: 2
read: 2
write: 2
IOPS: 1
resiliency: 2 (can lose one drive with some safety, but do NOT wait for a second failure before acting)

I would go straight for raidz2, or else 3-way mirrors—no less.
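
To put concrete numbers on that comparison for 4x24 TB (a rough sketch only; it ignores ZFS overhead, padding, and the TB vs TiB difference):

```python
# Usable capacity and fault tolerance for 4x24 TB, rough sketch.
N_DRIVES, DRIVE_TB = 4, 24

# Two 2-way mirror vdevs striped together: half the raw space is usable.
mirror_usable = (N_DRIVES // 2) * DRIVE_TB     # 48 TB
# Raidz2 over 4 drives: two drives' worth of parity, so also half usable.
raidz2_usable = (N_DRIVES - 2) * DRIVE_TB      # 48 TB

print(f"Stripe of mirrors: ~{mirror_usable} TB usable; "
      "survives any 1 failure, a 2nd only if it hits the other mirror vdev")
print(f"Raidz2:            ~{raidz2_usable} TB usable; survives any 2 failures")
```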

Correct… but with Electric Eel, raidz2 could also be extended in width.

I haven’t checked prices, but I suspect that there’s a premium on 24 TB drives because this is the highest available capacity and that the lowest price per TB should be on 18-20 TB drives. If so, the best plan might be to get six drives at the sweet spot—and the case for raidz2 becomes clearer.


I would say that's the purpose of backups. 2-way mirrors are fine in my book. If one always has a spare drive on hand, it shouldn't be an issue. The resilvers are so much faster than raidz.

You are right about Eel and expansion. I'd add some caution about it being the first version with that functionality, though; I personally would not use it. I'm more conservative on new things.


Thank you all for the help so far. One other concern on the RAID "10" vs z2 setup is rebuild times. These are pretty large drives; even at max theoretical speed sustained the whole time, it would take over 24 hours just to copy from one drive to another (more than likely closer to two days).
What would rebuild times look like for raidz2 with 4x24TB? I'm reading about drives less than half this size needing multiple days to rebuild, all while putting all drives under stress versus just the corresponding mirrored drive. Would one this size need almost a week to rebuild?

Also, how is ZFS scrub speed affected by these two different layouts? Would there be a noticeable difference in the time it takes to run, or in the performance impact on the system while it's running, between the two?

No significant difference in scrub time, but raidz2/raidz3 can tolerate further incidents during a resilver while raidz1 cannot.

You should have some backups. The question is: how willing are you to restore from backup in case of failure? If unwilling, go raidz2 or higher. If comfortable with it, follow @sfatula and go for mirrors.

Mirror resilvers are vastly faster.
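
For a rough lower bound on what "rebuild time" means with 24 TB drives (a sketch only, assuming a sustained per-drive throughput in the 150-250 MB/s range; real resilvers run slower under load and with fragmentation, and raidz resilvers read from every remaining drive rather than one mirror partner):

```python
# Minimum time to copy an entire 24 TB drive at a sustained rate.
DRIVE_TB = 24
for mb_per_s in (150, 200, 250):
    hours = DRIVE_TB * 1_000_000 / mb_per_s / 3600
    print(f"{mb_per_s} MB/s sustained -> ~{hours:.0f} h (~{hours / 24:.1f} days)")
```

So "over 24 hours, more likely closer to two days" is about right as the floor even for a mirror resilver.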

There is always a tradeoff. As @etorix hints at, you have to pick one priority: do you want speed and fast resilver times, or do you not want to have to restore (though you still need backups)? You can't have both. In a business, I likely would not choose multiple mirror vdevs, as uptime is typically paramount.

I would add that raidz2 or raidz3 also allow better recovery from errors compared to mirrors and raidz1, as they still have parity remaining if just one drive fails. With a degraded mirror or raidz1, there is nothing left to repair from if a checksum issue turns up. A mirror resilver, on the other hand, has less impact on the drives and the system.

Also consider how rare disk failures are. A quick resilver is great if your uptime/performance needs are high, but in a SOHO setting you can usually let the machine keep serving content while it resilvers in the background.

I'd sleep better with a 6-disk z2, but that's my use case.

You can also tune the resilver with various parameters so it isn't as much of a slowdown, if you wish.
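
For example (a sketch only, assuming TrueNAS SCALE / Linux OpenZFS; the exact set of tunables varies by OpenZFS version), one of the relevant knobs is exposed as a module parameter that you can at least inspect before deciding whether to touch it:

```python
# Peek at one OpenZFS resilver tunable on Linux (e.g. TrueNAS SCALE).
# zfs_resilver_min_time_ms: minimum time per txg spent on a resilver;
# raising it prioritises the resilver, lowering it favours foreground I/O.
from pathlib import Path

param = Path("/sys/module/zfs/parameters/zfs_resilver_min_time_ms")
if param.exists():
    print(f"{param.name} = {param.read_text().strip()} ms")
else:
    print("OpenZFS module parameters not found (not a Linux/OpenZFS system?)")
```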

I am a home user, so uptime is not a concern for me. And, as you noted, drive failures are not exactly an every-year occurrence. It would be different if I were scraping together old hardware. Actually, in all the years hardware RAID has existed (I used it from the beginning), which is likely much longer than I think, I've only experienced 2 or so drive failures ever. I realize others have experienced many, but I don't use ancient drives and I tend to monitor and replace them.


Exactly. The biggest issue I ran into was not disk failure so much as silent corruption slowly eating up my datasets. That in turn requires considerable effort to rebuild from old backups, i.e. re-consolidating data that was already consolidated.

While it may be a fun exercise to re-connect and show that you can still mount MO disks from back in the 1990s, it is a waste of time. Detecting bit rot (alongside proper backups) is what prevents this issue.
