While I do plan on making things more configurable at some point, the default threading behavior was intentional.
As an example, I have an all-NVMe system with a large number of threads and a lot of RAM bandwidth (top example: 2x Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz) and another all-NVMe system (bottom example: 1x AMD Ryzen 5 5600G with Radeon Graphics) with a lot less RAM and RAM bandwidth, but much faster cores.
###################################
# DD Benchmark Results for Pool: fire #
###################################
# Threads: 1 #
# 1M Seq Write Run 1: 237.04 MB/s #
# 1M Seq Write Run 2: 223.83 MB/s #
# 1M Seq Write Avg: 230.43 MB/s #
# 1M Seq Read Run 1: 2756.02 MB/s #
# 1M Seq Read Run 2: 2732.92 MB/s #
# 1M Seq Read Avg: 2744.47 MB/s #
###################################
# Threads: 10 #
# 1M Seq Write Run 1: 2076.10 MB/s #
# 1M Seq Write Run 2: 2092.43 MB/s #
# 1M Seq Write Avg: 2084.26 MB/s #
# 1M Seq Read Run 1: 6059.59 MB/s #
# 1M Seq Read Run 2: 6060.71 MB/s #
# 1M Seq Read Avg: 6060.15 MB/s #
###################################
# Threads: 20 #
# 1M Seq Write Run 1: 2925.10 MB/s #
# 1M Seq Write Run 2: 2871.85 MB/s #
# 1M Seq Write Avg: 2898.48 MB/s #
# 1M Seq Read Run 1: 6406.70 MB/s #
# 1M Seq Read Run 2: 6442.41 MB/s #
# 1M Seq Read Avg: 6424.56 MB/s #
###################################
# Threads: 40 #
# 1M Seq Write Run 1: 2923.48 MB/s #
# 1M Seq Write Run 2: 2969.82 MB/s #
# 1M Seq Write Avg: 2946.65 MB/s #
# 1M Seq Read Run 1: 6514.30 MB/s #
# 1M Seq Read Run 2: 6571.73 MB/s #
# 1M Seq Read Avg: 6543.02 MB/s #
###################################
###################################
# DD Benchmark Results for Pool: inferno #
###################################
# Threads: 1 #
# 1M Seq Write Run 1: 411.17 MB/s #
# 1M Seq Write Run 2: 412.88 MB/s #
# 1M Seq Write Avg: 412.03 MB/s #
# 1M Seq Read Run 1: 6762.11 MB/s #
# 1M Seq Read Run 2: 5073.43 MB/s #
# 1M Seq Read Avg: 5917.77 MB/s #
###################################
# Threads: 3 #
# 1M Seq Write Run 1: 1195.91 MB/s #
# 1M Seq Write Run 2: 1193.22 MB/s #
# 1M Seq Write Avg: 1194.56 MB/s #
# 1M Seq Read Run 1: 4146.25 MB/s #
# 1M Seq Read Run 2: 4161.19 MB/s #
# 1M Seq Read Avg: 4153.72 MB/s #
###################################
# Threads: 6 #
# 1M Seq Write Run 1: 2060.54 MB/s #
# 1M Seq Write Run 2: 2058.62 MB/s #
# 1M Seq Write Avg: 2059.58 MB/s #
# 1M Seq Read Run 1: 4209.25 MB/s #
# 1M Seq Read Run 2: 4212.84 MB/s #
# 1M Seq Read Avg: 4211.05 MB/s #
###################################
# Threads: 12 #
# 1M Seq Write Run 1: 2353.74 MB/s #
# 1M Seq Write Run 2: 2184.07 MB/s #
# 1M Seq Write Avg: 2268.91 MB/s #
# 1M Seq Read Run 1: 4191.27 MB/s #
# 1M Seq Read Run 2: 4199.91 MB/s #
# 1M Seq Read Avg: 4195.59 MB/s #
###################################
The pools are not apples-to-apples: different drives, different vdev topology. But what I found interesting is that sweeping the thread counts (1 thread, 1/4 of the threads in your system, 1/2 of the threads, and all of them) helps show where your bottleneck is.
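To make the sweep concrete, here is a minimal sketch of the idea in Python. It is not the tool's actual implementation; the mountpoint, per-thread file size, and dd flags are assumptions you would adjust for your own pool.

```python
#!/usr/bin/env python3
# Sketch only: run N parallel dd writers/readers against a pool mountpoint
# and report aggregate throughput. Paths, sizes, and flags are assumptions.
import os
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

POOL_MOUNT = "/mnt/testpool"  # hypothetical mountpoint, point at your pool
FILE_MB = 4096                # per-thread file size in MiB (assumption)

def dd_write(idx: int) -> None:
    # /dev/urandom keeps ZFS compression from inflating the numbers, though
    # the random generator itself can become the bottleneck on fast pools
    subprocess.run(
        ["dd", "if=/dev/urandom", f"of={POOL_MOUNT}/bench_{idx}.bin",
         "bs=1M", f"count={FILE_MB}", "conv=fsync"],
        check=True, capture_output=True)

def dd_read(idx: int) -> None:
    # Note: reads may be served from ARC unless you drop caches or
    # export/import the pool between the write and read passes
    subprocess.run(
        ["dd", f"if={POOL_MOUNT}/bench_{idx}.bin", "of=/dev/null", "bs=1M"],
        check=True, capture_output=True)

def run_pass(fn, threads: int) -> float:
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fn, range(threads)))
    return threads * FILE_MB / (time.monotonic() - start)  # aggregate MiB/s

if __name__ == "__main__":
    total = os.cpu_count() or 1
    # the sweep described above: 1, 1/4, 1/2, and all of the system's threads
    for threads in sorted({1, max(1, total // 4), max(1, total // 2), total}):
        w = run_pass(dd_write, threads)
        r = run_pass(dd_read, threads)
        print(f"{threads:>3} threads: write {w:8.2f} MiB/s, read {r:8.2f} MiB/s")
```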
Looking at Threads = 1 on those two all-NVMe systems, higher CPU frequency and higher IPC appear to have a pretty substantial impact.
With only 25% more disks (both RAIDZ1, 5 disks vs 4), the bottom system delivers roughly 140% of the top system's per-disk write performance and roughly 170% of its per-disk read performance at a single thread.
Then, when the thread count climbs, the results flip on their head because of the additional RAM capacity and RAM bandwidth. Comparing each system at its full thread count (40t vs 12t), the system with 25% more disks falls way behind: per disk it is only about 61% as fast at writes and roughly 51% as fast at reads.
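For reference, here is the per-disk arithmetic behind those percentages, taken straight from the averages above (assuming the 4-vs-5 split means fire has 4 disks and inferno has 5):

```python
# Per-disk throughput ratios from the benchmark averages above
# (assumption: fire = 4-disk RAIDZ1, inferno = 5-disk RAIDZ1)
fire_disks, inferno_disks = 4, 5

# Threads = 1: inferno vs fire
print((412.03 / inferno_disks) / (230.43 / fire_disks))    # ~1.43x write per disk
print((5917.77 / inferno_disks) / (2744.47 / fire_disks))  # ~1.72x read per disk

# Full thread count: inferno @ 12t vs fire @ 40t
print((2268.91 / inferno_disks) / (2946.65 / fire_disks))  # ~0.62 -> ~61% write
print((4195.59 / inferno_disks) / (6543.02 / fire_disks))  # ~0.51 -> ~51% read
```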
You can also see different scaling characteristics between the two systems. Particularly interesting is that one of them barely improves past 1/4 of its threads, whereas the other keeps scaling at the higher thread counts, which probably speaks pretty loudly to the lack of memory capacity and bandwidth on the smaller box…