vDev recommendations, 27*20TB drives

Hi all, new to the forum here!

We are upgrading our file server, and in about 12 weeks a 90-bay Supermicro will arrive that, out of the gate, will have 27*20TB SAS 3 drives. I have been trialling TrueNAS SCALE here in a test environment and am pleased with what it offers. I have also tested failure scenarios; it recovers well and all seems very logical.

The unit will serve a studio of VFX artists (maxing out around 40, I would say) and of course be expanded as time goes by and projects grow in size and number. Not all the artists will be pulling in files all the time; the Nuke artists tend to cache the EXR sequences locally per shot, but there is a render farm and so on. Our current 24-bay Synology has done good service for around 30 artists, but we need more space and better performance.

The Supermicro has 512GB ECC RAM, two 16-core Xeons, and initially a 10Gb SFP+ connection to the switch, with 1GbE to most artists over Windows SMB (a few have 10GbE). The unit will also have a UPS attached for safe shutdown if the power goes out for a while.

My initial vDev thoughts (to serve a single pool with around 6 datasets) are 3 vDevs of 9 drives in RAIDZ2. Then on each upgrade we would add another multiple of 9 drives, depending on the needs at that moment.

My main question is: would we be better served (performance-wise) by buying one more drive and having 4 vDevs of 7 drives out of the gate?

The first option has 328TB usable, which is enough to get us started, but the 4*7 vDev option makes better use of the space at 360TB (all options RAIDZ2). Would there be any meaningful performance gains with the 7*4 vs the 9*3 vDev layout, and would that have any subsequent benefits/drawbacks down the line when expanding in 9-disk chunks?
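For a raw back-of-the-envelope comparison of the two layouts (data-drive count only; a proper RAIDZ calculator additionally subtracts allocation padding and slop space and converts TB to TiB, which is why quoted usable figures come out lower than these):

```shell
drive_tb=20
# Option A: 3 vdevs of 9-wide RAIDZ2 (27 drives, 2 parity drives per vdev)
a=$((3 * (9 - 2) * drive_tb))
# Option B: 4 vdevs of 7-wide RAIDZ2 (28 drives)
b=$((4 * (7 - 2) * drive_tb))
echo "3x9 RAIDZ2: ${a}TB raw data capacity"
echo "4x7 RAIDZ2: ${b}TB raw data capacity"
```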

Many thanks indeed, most appreciated,

The pool layout sounds fine, and it’s great you have that much room to expand with. Also, bravo for letting folk do their work with local DAS rather than expecting a server to hold all the data all the time. Your approach significantly reduces the workload on the server, snapshots, etc.

I’d also have a look at sVDEVs, or at least a metadata-only, persistent L2ARC, to speed up directory traversals. With your Z2 pool, I’d suggest a three-way mirror of quality enterprise SSDs if you were to consider a sVDEV.
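As a rough sketch of those two suggestions (the pool name `tank` and all device paths are hypothetical placeholders, not from the post):

```shell
# Sketch only: "tank" and the device paths are placeholders.

# Option 1: sVDEV (special vdev) as a three-way mirror of enterprise SSDs.
# With RAIDZ data vdevs it cannot be removed from the pool later,
# which is why its redundancy matters so much.
zpool add tank special mirror /dev/sdx /dev/sdy /dev/sdz

# Option 2: L2ARC restricted to metadata, to speed up directory traversals.
zpool add tank cache /dev/nvme1n1
zfs set secondarycache=metadata tank
# On OpenZFS 2.x the L2ARC persists across reboots by default
# (module parameter l2arc_rebuild_enabled=1).
```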

Between dataset adjustments re record size (from the default 128K to 1M) and a sVDEV, my upload speed to an 8-wide He10 Z3 pool with a sVDEV of four S3610s jumped from 250MB/s to 400MB/s. My use case is SOHO quasi-WORM.
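For reference, the record size change is set per dataset and only applies to newly written blocks; a minimal example (the dataset name is assumed):

```shell
# Hypothetical dataset name. Existing files keep their old record size
# until they are rewritten; only new writes use the 1M records.
zfs set recordsize=1M tank/renders
zfs get recordsize tank/renders
```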

You may need a faster switch and connection to same in the future. Once you’re expanding beyond the 4th 9-wide VDEV, network saturation may become an issue.


More vdevs would provide more IOPS, but is that a meaningful limit for the workload?

Consider whether you may want hot spares in the mix. If not, 9-wide vdevs look like a reasonable way to divide 90 bays.


Depends too on whether an admin is on site (i.e. use qualified but cold spares vs. hot spares)?

Also, how will the boot drives, etc., be hosted? SATADOM? Are there spare SATA ports or PCIe lanes left to hold NVMe SSDs, an L2ARC, and perhaps a SLOG (if and only if sync writes are needed)?

Ditto a 25GbE network card if the 10GbE gets hopelessly saturated. Two Xeons suggest plenty of PCIe slots.

Hi all, thanks for the replies; appreciated!

The local DAS working mostly applies to reading the EXR files, which are only read and never changed. Having them on the artist’s local scratch SSD is nicely managed by Nuke and allows them to play back 4K EXRs a lot quicker, only hammering the network on the first localisation. The work files (.nk files) are tiny really, <5MB, and artists likely only hit save every 5 minutes or so. New renders go straight to the NAS from the farm.

Some shots require huge Houdini simulation caches, but those involve a far more limited set of artists, myself included.

Hot spare(s) sound like a good idea, and easy to implement based on my testing. I am often on site and can get there pretty quickly too, but it would be nice to be able to start a rebuild remotely if a drive fails, rather than wait for a drive to arrive or for me to be there in person.
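A sketch of what that looks like (pool and device names are placeholders): spares can be attached up front, and a replacement can be kicked off over SSH without touching the hardware:

```shell
# Placeholders throughout: "tank", /dev/sdx, /dev/sdy, /dev/sdfail.

# Attach two hot spares to the pool:
zpool add tank spare /dev/sdx /dev/sdy

# If a drive fails and a spare has not kicked in automatically,
# start the rebuild remotely:
zpool replace tank /dev/sdfail /dev/sdx
zpool status tank    # watch resilver progress
```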

The boot OS will be installed mirrored on two separate internal enterprise NVMe drives for redundancy. Obviously the config and encryption keys will be backed up securely off the NAS unit, and all that.

A 25GbE card comes with the unit, and we will upgrade the main switches to match as and when, down the line.

So I guess my last check is: will the initial setup of 3*9 (20TB) vDevs offer decent performance out of the gate? I know decent is a relative term, but as long as it is good enough to serve up files at a decent rate, it will only improve as we expand down the line.


8-wide would mean a greater share of capacity lost to parity, but leaves two spare drive bays for hot spares, I suppose (i.e. 11*8 + 2). With that many disks, having two spares may not be a bad idea, whether they are qualified and then kept in cold standby storage or in use as hot spares.

Until recently, I had two qualified cold spares handy. Now I have ten, because I figured that after 5 years the first units would start dropping out from old age, so having a complete replacement set makes sense as long as prices for He10s are reasonable (I buy used He10s).

I still would look into a sVDEV for such a large system. Besides speeding up directory traversals, it also takes metadata writes off the backs of HDDs, which don’t handle small files as well as SSDs do.

Define “decent”…
Basic expectations for the pool would be the IOPS of 3 drives and the throughput of 3*7 drives on sequential operations.
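To put a rough number on “decent”, assuming ~250MB/s sequential per modern 20TB SAS drive (an assumed figure, not a measurement from this system):

```shell
per_drive=250                 # MB/s per drive, assumed
vdevs=3
data_per_vdev=7               # 9-wide RAIDZ2 minus 2 parity drives
seq_mb=$((vdevs * data_per_vdev * per_drive))
echo "Sequential ceiling: ~${seq_mb}MB/s"
# A single 10GbE uplink tops out around 1250MB/s, so for streaming
# reads the network, not the pool, is the bottleneck here.
```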

Not necessarily…
ZFS allocates data proportionally to the free space on each vdev. At first, data will be nicely spread over 3 vdevs. If the pool is expanded before it fills up, data will remain more or less evenly distributed. But if you wait until the pool is well filled before adding an extra vdev, that vdev will take most of the later writes and you will end up with uneven performance: old data on the oldest drives (IOPS and throughput as above); new data mostly on the new vdev, with the corresponding performance of one vdev rather than a stripe.
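A quick illustration of that free-space weighting, with assumed fill levels (three original vdevs at 90% full, one freshly added vdev of the same usable size):

```shell
# Assumed numbers for illustration only.
free_old=14       # TB free on each of three 90%-full vdevs (~140TB usable each)
free_new=140      # TB free on the freshly added, empty vdev
total_free=$((3 * free_old + free_new))
share_new=$((100 * free_new / total_free))
echo "New vdev receives ~${share_new}% of incoming writes"
```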


Super, thanks for the info all. Most helpful.

11x 8-wide RAIDZ2 = 88 drives, leaving two bays spare.

1 hot spare and a drive being burnt in for a second spare.

Also, 8-wide goes nicely into 24/48/72-bay expansion chassis.


Could this be addressed over a weekend by running a rebalancing script like I did to get everything into the sVDEV?

Your description suggests not on the first pass: content would be preferentially written to the new vDev until it’s as full as the rest, and only then would content start being spread equally?
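The copy-and-replace core of such a rebalancing script can be sketched like this (illustrative only; a real script would also handle snapshots, verification, and files that are in use):

```shell
# Rewriting a file forces ZFS to allocate fresh blocks, which spreads
# them across all current vdevs (and into a sVDEV, where applicable).
# Caveat: blocks still referenced by snapshots are not freed until
# those snapshots expire, so space usage can temporarily grow.
rebalance_file() {
  f=$1
  cp -p -- "$f" "$f.rebalance.tmp" && mv -- "$f.rebalance.tmp" "$f"
}

# Usage (hypothetical path): rebalance_file /mnt/tank/projects/shot010.exr
```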