Advice - New build

I’m just about to purchase one of the following two systems:

Spinning Rust:
Dell R760xs

  • 1x Intel Gold 5415+ 8Core 2.9 GHz CPU
  • 256GB DDR5-5600MHz RAM
  • 2x10Gb/s RJ45 NICs (aggregated)
  • 2x1Gb/s RJ45 NICs LOMB (for console access)
  • BOSS w/ 2x 480GB NVMe in RAID1 (boot)
  • 12x 4TB 3.5" SATA (6-way striped mirror pool)
  • 2x 480GB 2.5" SSD (rear mount, for L2ARC/SLOG, if needed)
  • HBA 355i

Solid-State:
Dell R660

  • 1x Intel Gold 5415+ 8Core 2.9 GHz CPU
  • 256GB DDR5-5600MHz RAM
  • 2x10Gb/s RJ45 NICs (aggregated)
  • 2x1Gb/s RJ45 NICs LOMB (for console access)
  • BOSS w/ 2x 480GB NVMe in RAID1 (boot)
  • 10x 1.92TB 2.5" SATA (5-way striped mirror pool)
  • PERC H755 in HBA mode (can’t get the HBA 355i in this chassis)

Will probably use TrueNAS Core for the build. The major difference is the storage itself - mechanical vs solid state. The server will play host for VMware VMs via iSCSI. In the grand scheme, it’ll be light-duty work (Windows AD, DNS, file and print services, a web server or two, and a Veeam system to back it all up) to serve a small engineering company - AutoCAD, Office 365 documents, PDFs, pictures, TwinMotion renderings, random video creations, etc.

My goal is to be able to saturate 10Gb/s links to the desktop PCs should the need arise in the future.

My question is: Given the hardware, is my goal achievable? If not, please offer suggestions.

Thanks.

There are many factors that affect your ability to saturate a 10Gb/s link, and a quick braindump includes:

  • HDD vs SSD vs NVMe
  • Pool block size
  • Dataset record size
  • Mirror vdevs vs. RAIDZ
  • I/Os per second vs. GB/s throughput
  • For writes, synchronous (and where the ZIL lives, i.e. whether you have a SLOG) vs. asynchronous, plus a couple of ZFS tunables governing how much memory can hold pending (dirty) writes before the network is slowed to match disk write speed (see the sketch after this list)
  • Memory size, for ARC and for storing pending writes
  • Size and number of files (i.e. how much metadata needs to be read, whether sequential pre-fetch gets triggered)
  • Number of PCIe lanes your CPU has and how they are allocated to the disk controllers
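
To make the dirty-write point above a bit more concrete, here is a rough sketch (plain Python arithmetic, nothing TrueNAS-specific) of how long asynchronous writes can arrive at full 10Gb/s before OpenZFS has to throttle the network down to pool speed. The zfs_dirty_data_max tunable is a real OpenZFS parameter, but the pool write speed below is an assumed figure for illustration, not a measurement of this build:

```python
# Rough estimate: how long can async writes arrive at 10Gb/s before ZFS has
# to throttle incoming data to match what the pool can actually write?
# Assumption: recent OpenZFS defaults cap pending ("dirty") writes at
# zfs_dirty_data_max = min(RAM / 10, 4GiB). Treating GiB ~= GB for rough math.

ram_gb        = 256                      # RAM in the proposed servers
dirty_max_gb  = min(ram_gb / 10, 4)      # default dirty-data cap

net_in_gbs    = 10 / 8                   # 10Gb/s link ~= 1.25 GB/s of incoming writes
pool_out_gbs  = 0.9                      # ASSUMED sustained pool write speed in GB/s

# The backlog of dirty data grows at the difference between inflow and outflow.
backlog_rate  = net_in_gbs - pool_out_gbs
burst_seconds = dirty_max_gb / backlog_rate

print(f"dirty-data cap: {dirty_max_gb:.1f} GB")
print(f"full-speed burst absorbed for ~{burst_seconds:.0f} s before throttling")
```

Once that window is used up, the write speed the clients see drops to whatever the vdevs can sustain, which is why memory size and pool layout both appear in the list above.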

For a general "I want to be able to max it out in all circumstances" requirement - i.e. for short bursts and for sustained transfers, for reads and for writes, for extremely large files and for thousands of small ones, immediately after a server boot and after it has been running without a reboot for months, etc. - there probably isn't a single solution that will do all of this. However, if you have a very specialised and specific workload for which you want to max out the 10Gb link to your desktop, then it might be possible to give specific tuning advice.

That said, you will be surprised just how fast TrueNAS / ZFS can be even with limited resources - I can saturate my network (c. 600Mb/s end to end) with a pretty slow 2-core CPU, too. Thanks to a smallish core dataset and sequential pre-fetch for media streaming, I get an ARC hit rate of over 99% with only 2.5GB of ARC.

I don’t know why you’d want to saturate your network. As a general rule, that will just trigger congestion control and slow you down anyway. I’d think of it differently: how can you optimize your network usage?

If you go that route, you could apply VLANs, QoS settings, DSCP values, IPv6 aliases, and jumbo frames to push high volumes of traffic through your network without creating needless congestion for other applications or services. That seems like a better way to go.
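
As a small illustration of the DSCP part of that (my own example, not from the posts above): an application can mark its own traffic so that switches and routers configured for QoS can prioritise it appropriately. The marking does nothing by itself; the network gear has to be configured to honour it. A minimal Python sketch, assuming a Linux host:

```python
import socket

# Mark a TCP socket's outgoing packets with a DSCP value (AF21 = 18 here,
# commonly used for "low-latency data"). DSCP occupies the top six bits of
# the old TOS byte, so the value is shifted left by two.
DSCP_AF21 = 18
tos = DSCP_AF21 << 2

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# From here on, connect() and send() as usual; switches and routers that
# trust DSCP markings can queue this flow according to their QoS policy.
# sock.connect(("192.0.2.10", 3260))   # placeholder address / iSCSI port
```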

On the other hand, if your question is just “how much data can a given array read/write to the network at a time”, then your backplanes and striping schemes will have an impact and you’ll need to do some capacity planning.

SATA-III maxes out at 6 Gbps per drive, and SAS at 12 Gbps, so assuming the backplane(s) on your “spinning rust” box are SATA-III and configured with a 6-wide stripe, back-of-the-envelope math would suggest your max throughput for sequential writes to the network would be <= 36 Gbps, minus all sorts of things like disk seek times, mirrored reads, cache misses, parity and checksum verification times, metadata lookups, and packet encoding and fragmentation.

Read caches and metadata vdevs may improve the apparent times for some of this, but those activities still have to be done, and your cache drives are presumably SATA-III as well, so they may limit your transfers to the max speed of the caches; I don’t know enough about the caching internals to say for sure what happens in that particular case. If your cache drives have M.2 or PCIe connections they may be faster than your aggregated stripe speeds, but cache misses for reads and lookups that aren’t already local would still be limited by the time required to retrieve that information from the HDDs.
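
To put rough numbers on that back-of-the-envelope reasoning, here is a small sketch of the same math (the sustained per-drive figure is my own assumption for illustration, not something taken from Dell's specs):

```python
# Back-of-envelope throughput for the "spinning rust" pool: 12 drives laid
# out as a 6-wide stripe of mirrors. All per-drive figures are assumptions.

sata3_interface_gbps = 6.0     # per-drive SATA-III interface ceiling
hdd_sustained_mbs    = 220     # assumed sustained sequential rate of a 4TB SATA HDD
stripe_width         = 6       # six mirror vdevs striped together

# Theoretical interface ceiling (the <= 36 Gbps figure above):
interface_ceiling_gbps = sata3_interface_gbps * stripe_width

# What the platters can realistically sustain for sequential writes
# (each mirror vdev writes at roughly the speed of a single drive):
sustained_gbps = hdd_sustained_mbs * 8 / 1000 * stripe_width

print(f"interface ceiling  : {interface_ceiling_gbps:.0f} Gbps")
print(f"sustained (approx) : {sustained_gbps:.1f} Gbps vs a 10 Gbps link")
```

In other words, the interface math looks generous, but the platters themselves land the purely sequential case only a little above a single 10 Gbps link, and random I/O will be far below that.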

The tl;dr is that with the right backplanes it seems like you could saturate a 10 or 20 Gbps link with highly sequential, cache-optimized reads/writes even with SATA-III HDDs, but in practice I’d be surprised if you did with non-sequential reads/writes and varied usage spread across server-side files and services (and assuming non-abusive clients, of course).

Your best bet would be to ask your Dell contact what the max throughput is of your system’s backplane(s), and talk to your network admin about how congestion control and QoS will impact the TrueNAS systems and vice versa. Assuming you can do it, pushing 20 Gbps (including packet overhead, acks, IP control packets, etc.) through typical switched Ethernet fabric can result in dropped packets, queue overflows, large data retransmissions, latency-sensitive application timeouts, potential denial of service for additional inbound connections to the TrueNAS servers, and other problems. So, even if you can do it, I’d treat maxing out your full TrueNAS Ethernet bandwidth as something to avoid rather than something to strive for. As always, your mileage may vary.

Spinning rust drives are not normally able to sustain 6Gb/s SATA-III speeds for any significant amount of time. Some have small onboard write caches to help with bursts of writes, but these are easily filled by sustained writes, and then things slow down to what the platters can actually do.
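
A quick worked example of how little that on-drive cache buys you (the cache size and sustained rate below are typical figures I'm assuming, not specs for the drives in this build):

```python
# How long does a typical HDD's onboard cache absorb writes arriving at the
# full SATA-III interface speed? All figures are illustrative assumptions.
cache_mb      = 256           # assumed onboard DRAM cache of a 4TB SATA HDD
incoming_mbs  = 6000 / 8      # 6Gb/s SATA-III interface ~= 750 MB/s
sustained_mbs = 200           # assumed sustained write speed to the platters

# The cache fills at the difference between inflow and what the platters drain.
fill_seconds = cache_mb / (incoming_mbs - sustained_mbs)
print(f"cache absorbs a full-rate burst for ~{fill_seconds:.2f} s")   # roughly half a second
```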

As @CodeGnome says, it isn’t usually possible to completely max out a network connection, but switched ethernet is very different from old-style coaxial ethernet in terms of how much you can load it up. A typical maximum on coax ethernet before you got collisions was c. 60%, whereas on a switched network you can reasonably expect to achieve 90%+ of the rated wire bandwidth given the right circumstances (i.e. a direct connection between two boxes) - it is just that the circumstances can very easily not be right. Cheap unmanaged switches don’t have a lot of buffering and can more easily drop packets if 2 or more machines are competing for the single 10Gb/s of bandwidth between the switch and TrueNAS. A single network stream may also need to wait for I/Os to complete (though sequential pre-fetch can often eliminate that), or wait for an acknowledgement before continuing to send data.
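
Part of why even a clean, dedicated switched path tops out above 90% rather than at 100% is plain protocol overhead; a quick calculation with standard framing sizes (my own arithmetic, not a measurement) shows the effect, and why jumbo frames help:

```python
# Payload efficiency of a 10GbE link for bulk TCP transfers, standard vs jumbo MTU.
# Per-frame overhead on the wire: 14B Ethernet header + 4B FCS, plus 20B of
# preamble/SFD/inter-frame gap; inside the frame, 20B IPv4 + 20B TCP headers.
def payload_efficiency(mtu: int) -> float:
    ip_tcp_headers = 20 + 20              # IPv4 + TCP, no options
    frame_overhead = 14 + 4 + 20          # Ethernet header + FCS + preamble/IFG
    payload = mtu - ip_tcp_headers
    on_wire = mtu + frame_overhead
    return payload / on_wire

for mtu in (1500, 9000):
    eff = payload_efficiency(mtu)
    print(f"MTU {mtu}: {eff:.1%} payload -> ~{eff * 10:.2f} Gb/s of data on a 10Gb/s link")
```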

But in the end, performance comes down either to responsiveness (time to get a small-ish file) or throughput (bulk speed), and for most situations with a human at the end, it is responsiveness that is normally important.

In the end - as I said before - it comes down to the specifics of the workload, and tuning for it.