I’m about to recreate my pool on empty disks, and send/receive the original data across.
But before doing that, I’d like to test CPU time for various compression and checksum options on my specific hardware using a temp pool on the empty disks, rather than picking based on hearsay/guesswork.
That means testing the usual suspects: read vs. write, small vs. large vs. mixed block sizes, 0/50/100% compressible data (but not just zeros), and keeping a warm ARC or existing on-disk structures from polluting the results. (The CPU has AVX2 but not SHA extensions.)
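To make the data side concrete, here's roughly what I have in mind for generating ~50% compressible (but not all-zero) test files: interleave incompressible random chunks with a highly repetitive pattern. (fio's `--buffer_compress_percentage` would be the more polished route if a fio-based answer exists; file name and sizes below are just placeholders.)

```shell
# Build a 64 MiB test file that is roughly 50% compressible (no zero runs):
# alternate 1 MiB of incompressible random data with 1 MiB of a repeating
# text pattern that any compressor will crush.
out=testdata.bin
pattern=$(mktemp)
yes 'abcdefgh' | head -c $((1024 * 1024)) > "$pattern"  # 1 MiB repeating pattern
: > "$out"
for i in $(seq 1 32); do
    dd if=/dev/urandom bs=1M count=1 status=none >> "$out"  # ~incompressible half
    cat "$pattern" >> "$out"                                # ~fully compressible half
done
rm -f "$pattern"
ls -l "$out"
```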
I can’t find a resource for this, but I’m sure it’s a question many have asked, and one that serious enthusiasts and users have put time into.
Are there any scripts or test methods available to an ordinary SCALE user that do this in one neat package? What do others do?
Alternatively, if I need to do it from scratch, how should I approach it so that I time what I actually want to time, without being misled by ZFS taking shortcuts, special-cased fast paths, time spent outside the algorithms, and so on? Is it a huge job?
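If there's no canned tool, this is the skeleton I'd expect, as a sketch only - the devices, dataset names, algorithm lists, and the test file name (`testdata.bin`) are all placeholders. `primarycache=metadata` keeps data blocks out of the ARC so read tests exercise the decompress/checksum path every time:

```shell
# Scratch pool on the empty disks; one dataset per algorithm combination.
zpool create -o ashift=12 testpool mirror /dev/sdX /dev/sdY  # placeholder devices

for comp in lz4 zstd zstd-9 gzip-6; do
  for sum in fletcher4 sha256 sha512 blake3; do
    zfs create -o compression="$comp" -o checksum="$sum" \
               -o primarycache=metadata \
               "testpool/${comp}-${sum}"
  done
done

# Write test: time a copy of a pre-generated test file, forcing it to disk.
time sh -c 'cp testdata.bin /mnt/testpool/zstd-sha256/ && zpool sync testpool'

# Read test: export/import first so nothing is served from memory.
zpool export testpool && zpool import testpool
time cat /mnt/testpool/zstd-sha256/testdata.bin > /dev/null
```

The export/import between runs is the crude but reliable way I know to guarantee a cold start for that pool.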
Instrumentation options are valid
While I’m not looking for extreme solutions, I’m not averse to narrowly instrumenting the read/write pipeline with perf or bpftrace if, in others’ honest view, that’s the easiest way to measure exactly what I’m after (or close to it). I had to do that on CORE to optimise it cleanly, too. But I don’t know this system, so I’d need an outline of what to instrument and how, the key functions or lines of code to hook, and what to look for in the output, if that’s what’s needed.
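For reference, this is the sort of minimally invasive profiling I mean, translated to Linux as a sketch - the exact OpenZFS symbol names vary by build and version, so the examples in the comments are guesses to grep for, not gospel; `testdata.bin` and the dataset path are placeholders:

```shell
# Sample all CPUs with call graphs while a test copy runs.
perf record -a -g -o zfs-bench.data -- \
    cp testdata.bin /mnt/testpool/zstd-sha256/

# Report time attributed to the zfs kernel module only. Look for symbols
# along the lines of lz4_compress_zfs, zfs_zstd_compress, fletcher_4_*,
# SHA256* - exact names depend on the OpenZFS build.
perf report -i zfs-bench.data --dsos '[zfs]' --stdio | head -40
```

That would isolate algorithm CPU time from the rest of the pipeline, which is the part I can't get from wall-clock timing alone.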
System/pool info
If relevant: the pool was 40 TB full size, 14 TB after dedup on CORE, and the system was designed for a dedup workload and I/O: 2015-era Xeon/Supermicro, huge RAM (256 GB) with no other workloads (no apps or jails), many cores, fast mirrors, fast SSD special vdevs.
CPU is 8C/16T (or maybe 16C/16T?). Family 6, model 63 (Haswell-EP): AVX2 but no SHA instructions, a large 35 MB L3 cache, 3.2 GHz max.
It ran OK on CORE 13 U5.3, and can only run better with modern parameters, careful tuning, fast dedup, and a clean install of SCALE 25.10.
But it does mean that carefully selecting the compression, checksum, and dedup algorithms before the first write is critical - more so than usual - to reduce the risk of CPU starvation and make efficient use of the hardware.
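For completeness, this is the shape of what I'd eventually set on the real pool once the numbers are in - `tank/data` is a placeholder and the values are purely illustrative, not recommendations:

```shell
# Illustrative only - substitute whatever the benchmarks favour.
zfs set compression=zstd tank/data     # or lz4, zstd-N, gzip-N, ...
zfs set checksum=blake3 tank/data      # no SHA extensions on this CPU, so a
                                       # SIMD-friendly hash may win
zfs set dedup=blake3,verify tank/data  # dedup checksum can differ from the
                                       # dataset checksum
```

Getting these right before the send/receive is the whole point of the exercise, since rewriting 14 TB afterwards isn't appealing.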