Well, I did try increasing the number of jobs, but sadly that didn't change the throughput:
root@storage01[~]# fio --name=write-test --directory=/mnt/Pool1/Dataset1 --rw=write --bs=1M --size=20G --numjobs=10 --iodepth=64 --direct=1 --ioengine=libaio --group_reporting
write-test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 10 processes
write-test: Laying out IO file (1 file / 20480MiB)
write-test: Laying out IO file (1 file / 20480MiB)
write-test: Laying out IO file (1 file / 20480MiB)
write-test: Laying out IO file (1 file / 20480MiB)
write-test: Laying out IO file (1 file / 20480MiB)
write-test: Laying out IO file (1 file / 20480MiB)
Jobs: 10 (f=10): [W(10)][99.6%][w=830MiB/s][w=830 IOPS][eta 00m:01s]
write-test: (groupid=0, jobs=10): err= 0: pid=58976: Sun Jun 29 03:51:56 2025
write: IOPS=838, BW=839MiB/s (879MB/s)(200GiB/244172msec); 0 zone resets
slat (usec): min=428, max=61031, avg=11904.08, stdev=2744.52
clat (usec): min=13, max=997987, avg=750696.62, stdev=98354.82
lat (msec): min=2, max=1010, avg=762.60, stdev=99.75
clat percentiles (msec):
| 1.00th=[ 86], 5.00th=[ 701], 10.00th=[ 718], 20.00th=[ 735],
| 30.00th=[ 743], 40.00th=[ 751], 50.00th=[ 760], 60.00th=[ 768],
| 70.00th=[ 776], 80.00th=[ 785], 90.00th=[ 810], 95.00th=[ 827],
| 99.00th=[ 919], 99.50th=[ 944], 99.90th=[ 969], 99.95th=[ 986],
| 99.99th=[ 995]
bw ( KiB/s): min=573440, max=5916905, per=99.82%, avg=857308.86, stdev=23800.06, samples=4870
iops : min= 560, max= 5778, avg=837.04, stdev=23.24, samples=4870
lat (usec) : 20=0.01%, 50=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.04%, 100=1.06%
lat (msec) : 250=0.56%, 500=0.37%, 750=38.49%, 1000=59.45%
cpu : usr=0.87%, sys=6.87%, ctx=207319, majf=0, minf=24171
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,204800,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=839MiB/s (879MB/s), 839MiB/s-839MiB/s (879MB/s-879MB/s), io=200GiB (215GB), run=244172-244172msec
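If it helps narrow things down, I can watch how the writes are spread across the vdevs while fio is running; a minimal check, assuming the pool is really named Pool1 as the dataset path suggests:

root@storage01[~]# zpool iostat -v Pool1 1

If one mirror (or one disk inside a mirror) shows noticeably lower write bandwidth than the rest, that would point at a slow or misbehaving device rather than a pool-wide limit.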
@SmallBarky: Hm, interesting. But that’s about what I’d expect:
1x 4TB, single drive, 3.7 TB: w=108MB/s, rw=50MB/s, r=204MB/s
24x 4TB, 12 striped mirrors, 45.2 TB: w=696MB/s, rw=144MB/s, r=898MB/s
Not quite 12x, but still about 6x. I, on the other hand, am only seeing about a 1.5x speedup. Their backplane is connected with 8 lanes, sure, but only at 6G each, while mine uses 4 lanes at 12G, so the total link bandwidth is identical (8 × 6G = 4 × 12G = 48G) and I'd expect my setup to scale roughly on par with theirs.
Any idea where to look for the bottleneck? I'm somewhat stumped, having verified that the PCIe link matches the card's rating and that all the components at least appear to be fully functional.
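For reference, this is roughly how I'd re-verify the negotiated PCIe link on the HBA (the 01:00.0 address is just a placeholder; substitute the actual device from lspci):

root@storage01[~]# lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'

LnkSta should report the same speed and width as LnkCap; a downtrained link (say, an x8 card negotiating x2) would cap aggregate throughput no matter how healthy the individual disks look.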