Sustained Write Benchmarking on CORE

Good afternoon,
I wonder if the general speeds of CORE have changed significantly due to ongoing development at iXsystems.

Specifically, my recollection of ZFS write speeds was that one would expect a single-VDEV pool to write approximately as fast as a single drive, i.e. about 250MB/s for my He10s per the OEM spec sheet. Achieving 250MB/s on a sustained write was about right for my transfers over 10GbE in the past.

I recently decided to take better advantage of my sVDEV by adjusting recordsizes to 1M for image/video/archive datasets, rebalancing the pool, etc. The result is somewhat flummoxing: writes now go into the pool at a sustained 400MB/s (20GB test file), all over 10GbE fiber using a QNAP Thunderbolt 3 to 10GbE adapter.

I have yet to replicate the same transfers using my older Sonnet 10GbE adapter, but I suspect this has less to do with the adapter and more to do with the recordsizes, the sVDEV, and changes iXsystems has made under the hood. The pool is quite empty (20% full) and I’m using SMB. Snapshots are currently off.

Is it the combination of recordsizes and the sVDEV that caused this speed increase, or can pools now simply write significantly faster than I remember?

Caching in general got quite a substantial rework in ZFS itself recently; on iX’s side, IIRC there is some serious optimization work in the upcoming releases of both CORE and SCALE.


If the point of this thread is to celebrate iXsystems’ contributions to ZFS, you can start by checking out the long list of contributions from this saint of a man:
amotin (Alexander Motin) · GitHub



The 1M recordsize probably has the most significant effect for large, sequential writes. (And reads too.)

Think about it. Compared to the default 128K recordsize, there are 8 times fewer ZFS metadata operations for every file.

Every block needs a pointer, a generated checksum, and an attempted (or aborted) compression pass. (And, optionally, an encryption pass.)

A file that requires 1,000 of these operations at 1M recordsize would require 8,000 of these operations at 128K recordsize. (Same file, same size.)
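To put numbers on that, here’s a minimal sketch of the block-count arithmetic (the 20GB file size is taken from the post above; the calculation ignores compression and partial tail blocks, purely for illustration):

```python
# Back-of-the-envelope: per-block metadata operations at two recordsizes.
# Assumes a fully populated, incompressible sequential file -- illustrative only.

def block_ops(file_bytes: int, recordsize_bytes: int) -> int:
    """Number of blocks, and thus per-block metadata operations
    (block pointer, checksum, compression attempt), for one file."""
    return -(-file_bytes // recordsize_bytes)  # ceiling division

GiB = 1024 ** 3
file_size = 20 * GiB  # the 20GB test file from the post

for rs in (128 * 1024, 1024 * 1024):  # default 128K vs. 1M
    print(f"recordsize {rs // 1024:>4}K -> {block_ops(file_size, rs):,} blocks")

# recordsize  128K -> 163,840 blocks
# recordsize 1024K -> 20,480 blocks  (8x fewer per-block operations)
```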


Those are very good insights. Thank you!

Along similar lines, if all the metadata for lots of little bitty blocks goes straight to an SSD-based sVDEV, then the latency on all that overhead is significantly reduced vs. writing it to an HDD pool. SSDs don’t have heads to traverse to new sectors and all that. But I really did not expect a 50% boost over my standard large-file write performance.
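A grossly simplified back-of-the-envelope of that latency point (the access-time figures below are assumptions for illustration, not measurements, and real ZFS writes are batched into transaction groups rather than issued one at a time):

```python
# Assumed average random-access latencies -- NOT measurements.
HDD_ACCESS_S = 0.010    # ~10 ms seek + rotation for a 7200 rpm HDD (assumption)
SSD_ACCESS_S = 0.0001   # ~100 us for a SATA SSD (assumption)

metadata_ops = 20_480   # block count for a 20GiB file at 1M recordsize (see above)

print(f"HDD: {metadata_ops * HDD_ACCESS_S:.1f} s of cumulative access latency")
print(f"SSD: {metadata_ops * SSD_ACCESS_S:.1f} s of cumulative access latency")
# HDD: 204.8 s vs. SSD: 2.0 s -- wildly pessimistic for ZFS, which coalesces
# writes, but it shows why offloading small-block latency to the sVDEV helps.
```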

Thanks again, I appreciate it.
