Hmmmm. Not following the relationship of PLP with SLOG write speed. We’re talking about PLP = Power Loss Protection, right?
Correct. When a sync write comes into ZFS, ZFS won’t reply until the underlying device has committed the write to non-volatile storage - enforced by sending the SCSI SYNCHRONIZE_CACHE command to the SLOG device.
A consumer SSD without PLP - and a volatile write cache - will get this command and have to actually program the NAND cells with the requested data before responding OK.
An enterprise SSD with PLP knows that its onboard supercapacitors have enough power to flush its volatile cache to NAND, so it will simply reply OK immediately and begin programming the cells - even if power is cut immediately after the OK is sent, there’s enough power stored in the drive itself to complete the programming process.
The difference between the two can be immense - we’re talking about the difference between “hundreds of megabytes per second” and “single digits, maybe a dozen MB/s” at the smaller record sizes.
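If you want to see that on your own hardware, a queue-depth-1 sync-write run against the candidate SLOG device shows it clearly. This is just a sketch - the device path is an assumption, and writing to a raw device destroys whatever is on it:
# QD1 sync writes, roughly the pattern the ZIL puts on a SLOG device
# WARNING: destructive; /dev/nvme0n1 is a placeholder scratch device
fio --name=slog-sync --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write \
  --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based
A PLP drive shrugs this off; a consumer drive without PLP is where the collapse into single-digit MB/s shows up.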
OK. In my testing, I’m seeing hundreds of MB/sec (~350 MB/s?) from my Optane 900p without “real” PLP.
Optane drives hold so little in-flight data that they all have PLP in practice, although only the more expensive “DC” variants are officially specified with the feature. For home use, consumer Optane 900p/905p drives do qualify as a valid, PLP-enabled SLOG. For business use, play by the professional book and get a DC.
That is my understanding too. Sounds like the PLP rating on the commercial drives is mostly about compliance, though I’m sure “real” PLP does add some protection. Whether it’s a practical amount of additional protection is questionable - I just don’t know. But there does seem to be a substantial difference between a 900p and a standard SSD, with the 900p substantially closer to the enterprise-PLP end of the spectrum.
OK. Getting some test results from these two servers and wanting to sanity check them - particularly the NFS performance, which seems abysmal.
TrueNAS Server Specs:
- Dell R720xd
- 256G ECC RAM
- 2x E5-2680 V2 2.8GHz 20/40 cores/threads total
- LSI SAS2308 (Dell H710p Mini Mono in IT mode) running at PCIe 3.0 speeds (x8 I think)
- 12x HGST 4TB 7.2k SAS, 512 native sectors
- Intel Optane 900p PCIe card
- Intel X520 dual port 10G NIC
- Intel I350 dual port 1G NIC
- TrueNAS DragonFish 24.04
- ZFS pool is 2x 6-drive raidz2 vdevs with an Optane SLOG.
Test Client Specs:
- Dell R720
- 256G ECC RAM
- 2x E5-2640 v2 2GHz 16/32 cores/threads total
- LSI SAS2308 (Dell H710 Mini Mono in IT mode) running at PCIe 3.0 speeds (x8 I think)
- 8x HGST 4TB 7.2k SAS, 512 native sectors (not used); boots off a couple of 256G software-RAIDed SATA M.2 SSDs
- Intel X520 dual port 10G NIC
- Intel I350 dual port 1G NIC
- Ubuntu Server 22.04.4
The servers are connected via a 10G LAG/bond that reliably iperfs at about 19.5 Gbps throughput. No switch, just DAC cables. MTU 9000 on everything; that seems stable and did increase the iperf scores a bit. Tried 1500 and 9000 and saw no discernible difference in NFS performance. Pretty happy with the networking, although throughput does fluctuate sometimes.
So, one would think that NFS performance would be somewhat close to what I see on the server itself.
On the pool, I made 3 datasets with the 3 sync levels (creation commands sketched below):
- standard: sync=standard,
- sync: sync=always, and
- async: sync=disabled.
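The creation commands, roughly (dataset names are just what I picked; the pool is pool1, as in the mount path below):
zfs create -o sync=standard pool1/standard
zfs create -o sync=always pool1/sync
zfs create -o sync=disabled pool1/async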
Then I ran fio for random read/write tests in each of those datasets. Here’s the fio command I used:
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1
Here’s a summary of the results:
- standard:
- read: IOPS=2551, BW=638MiB/s (669MB/s)
- write: IOPS=2549, BW=637MiB/s (668MB/s)
- sync:
- read: IOPS=796, BW=199MiB/s (209MB/s)
- write: IOPS=807, BW=202MiB/s (212MB/s)
- async:
- read: IOPS=4596, BW=1149MiB/s (1205MB/s)
- write: IOPS=4589, BW=1147MiB/s (1203MB/s)
Here’s the summary of the same fio command run from the NFS client against the sync=standard dataset across that 2x10G bonded link:
- nfs:
- read: IOPS=409, BW=102MiB/s (107MB/s)
- write: IOPS=416, BW=104MiB/s (109MB/s)
- nfs mount command:
mount -t nfs4 -o proto=tcp,hard,intr,rw,noatime 10.8.8.10:/mnt/pool1 /mnt/pool1
Doesn’t the NFS result seem pretty lame? Less than 20% of what I saw on the server! Definitely doesn’t seem like network or ZFS performance. Feels like something on the client side. Tried various NFS rsize/wsize settings on the mount command.
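For reference, the variations were along these lines - the exact rsize/wsize values here are just examples, and nconnect is one more knob worth noting (it needs a reasonably recent client kernel, which 22.04 should have):
mount -t nfs4 -o proto=tcp,hard,rw,noatime,rsize=1048576,wsize=1048576,nconnect=8 \
  10.8.8.10:/mnt/pool1 /mnt/pool1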
I also did a local rsync on the client of a 10 GB file from the local file system to the NFS-mounted standard dataset and got about 180 MB/s. During the transfer it reports ~560 MB/s, but I guess it has to sync at the end. Of course this is a synchronous write, but 180 MB/s still seems lame. Feels like some kind of client NFS bottleneck, because iperf is great and server-side ZFS performance is much better.
UPDATE: I scp’d the 10 GB file from client to server and only got about 115 MB/s. Slow!
UPDATE 2: I transferred the 10 GB file with iperf and it starts out fast, but degrades. I’ve seen that in my fio NFS testing too.
root@vm-1:~# iperf3 -F ./10gfile -c 10.8.8.10
Connecting to host 10.8.8.10, port 5201
[ 5] local 10.8.8.11 port 38384 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.08 GBytes 9.28 Gbits/sec 0 1.71 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.71 MBytes
[ 5] 2.00-3.00 sec 929 MBytes 7.79 Gbits/sec 0 1.79 MBytes
[ 5] 3.00-4.00 sec 421 MBytes 3.53 Gbits/sec 0 1.79 MBytes
[ 5] 4.00-5.00 sec 371 MBytes 3.12 Gbits/sec 0 1.79 MBytes
[ 5] 5.00-6.00 sec 368 MBytes 3.08 Gbits/sec 0 1.79 MBytes
[ 5] 6.00-7.00 sec 361 MBytes 3.03 Gbits/sec 0 1.79 MBytes
[ 5] 7.00-8.00 sec 355 MBytes 2.98 Gbits/sec 0 1.79 MBytes
[ 5] 8.00-9.00 sec 348 MBytes 2.91 Gbits/sec 0 1.79 MBytes
[ 5] 9.00-10.00 sec 348 MBytes 2.92 Gbits/sec 0 1.79 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.65 GBytes 4.85 Gbits/sec 0 sender
Sent 5.65 GByte / 10.0 GByte (56%) of ./10gfile
[ 5] 0.00-10.00 sec 5.64 GBytes 4.85 Gbits/sec receiver
But something’s rotten in Denmark. Here’s the other direction:
root@truenas[~]# iperf3 -F ./10gfile.iperf -c 10.8.8.11
Connecting to host 10.8.8.11, port 5201
[ 5] local 10.8.8.10 port 59954 connected to 10.8.8.11 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 57.9 MBytes 485 Mbits/sec 1 1.09 MBytes
[ 5] 1.00-2.00 sec 53.8 MBytes 451 Mbits/sec 2 1.09 MBytes
[ 5] 2.00-3.00 sec 52.5 MBytes 440 Mbits/sec 1 1.09 MBytes
[ 5] 3.00-4.00 sec 53.8 MBytes 451 Mbits/sec 0 1.09 MBytes
[ 5] 4.00-5.00 sec 52.5 MBytes 440 Mbits/sec 1 1.09 MBytes
[ 5] 5.00-6.00 sec 51.2 MBytes 430 Mbits/sec 1 1.09 MBytes
[ 5] 6.00-7.00 sec 51.2 MBytes 430 Mbits/sec 0 1.09 MBytes
[ 5] 7.00-8.00 sec 47.5 MBytes 398 Mbits/sec 2 1.09 MBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 4 559 KBytes
[ 5] 9.00-10.00 sec 1.25 MBytes 10.5 Mbits/sec 2 559 KBytes
[ 5] 10.00-11.00 sec 0.00 Bytes 0.00 bits/sec 3 271 KBytes
[ 5] 11.00-12.00 sec 0.00 Bytes 0.00 bits/sec 2 419 KBytes
[ 5] 12.00-13.00 sec 1.25 MBytes 10.5 Mbits/sec 2 419 KBytes
[ 5] 13.00-14.00 sec 0.00 Bytes 0.00 bits/sec 1 419 KBytes
[ 5] 14.00-15.00 sec 0.00 Bytes 0.00 bits/sec 0 210 KBytes
[ 5] 15.00-16.00 sec 0.00 Bytes 0.00 bits/sec 0 184 KBytes
UPDATE 3: Got rid of the LAG/bond. Just one 10G port now. It iperfs rock solid at 9.9 Gbps, but the exact same thing happens when I use iperf to transfer file data. I have a pair of different X520 10G cards I could try… Maybe one is bad?
Don’t have a clue. Any ideas? Something overheating?
Long fio output follows.
On TrueNAS server, standard dataset sync=standard:
random-read: (groupid=0, jobs=8): err= 0: pid=2425253: Fri May 3 13:19:56 2024
read: IOPS=2551, BW=638MiB/s (669MB/s)(28.1GiB/45061msec)
slat (usec): min=60, max=520533, avg=2879.46, stdev=17868.62
clat (usec): min=5, max=3694.3k, avg=99236.27, stdev=329789.34
lat (usec): min=158, max=3774.5k, avg=102115.74, stdev=339367.08
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 13], 50.00th=[ 15], 60.00th=[ 17],
| 70.00th=[ 21], 80.00th=[ 27], 90.00th=[ 53], 95.00th=[ 718],
| 99.00th=[ 1871], 99.50th=[ 2198], 99.90th=[ 2802], 99.95th=[ 3071],
| 99.99th=[ 3272]
bw ( KiB/s): min=72841, max=5504143, per=98.53%, avg=643505.08, stdev=105843.06, samples=711
iops : min= 284, max=21497, avg=2513.28, stdev=413.42, samples=711
write: IOPS=2549, BW=637MiB/s (668MB/s)(28.0GiB/45061msec); 0 zone resets
slat (usec): min=81, max=12977, avg=243.28, stdev=362.90
clat (usec): min=5, max=3502.7k, avg=98209.99, stdev=325182.38
lat (usec): min=128, max=3503.1k, avg=98453.27, stdev=325203.24
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 13], 50.00th=[ 15], 60.00th=[ 17],
| 70.00th=[ 21], 80.00th=[ 27], 90.00th=[ 53], 95.00th=[ 718],
| 99.00th=[ 1871], 99.50th=[ 2165], 99.90th=[ 2735], 99.95th=[ 2970],
| 99.99th=[ 3339]
bw ( KiB/s): min=78485, max=5527477, per=98.64%, avg=643852.04, stdev=105950.41, samples=706
iops : min= 306, max=21588, avg=2514.67, stdev=413.83, samples=706
lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=20.95%, 20=48.40%, 50=20.26%
lat (msec) : 100=2.39%, 250=0.47%, 500=0.73%, 750=2.03%, 1000=1.54%
lat (msec) : 2000=2.45%, >=2000=0.78%
cpu : usr=0.95%, sys=14.22%, ctx=37616, majf=0, minf=49203
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=114954,114893,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=28.1GiB (30.1GB), run=45061-45061msec
WRITE: bw=637MiB/s (668MB/s), 637MiB/s-637MiB/s (668MB/s-668MB/s), io=28.0GiB (30.1GB), run=45061-45061msec
On TrueNAS server, sync dataset sync=always:
random-read: (groupid=0, jobs=8): err= 0: pid=2427906: Fri May 3 13:21:20 2024
read: IOPS=796, BW=199MiB/s (209MB/s)(8967MiB/45014msec)
slat (usec): min=80, max=331218, avg=5349.10, stdev=10130.80
clat (usec): min=8, max=972879, avg=312242.91, stdev=92485.24
lat (msec): min=8, max=985, avg=317.59, stdev=94.83
clat percentiles (msec):
| 1.00th=[ 146], 5.00th=[ 203], 10.00th=[ 236], 20.00th=[ 262],
| 30.00th=[ 279], 40.00th=[ 288], 50.00th=[ 300], 60.00th=[ 309],
| 70.00th=[ 321], 80.00th=[ 338], 90.00th=[ 397], 95.00th=[ 472],
| 99.00th=[ 726], 99.50th=[ 776], 99.90th=[ 885], 99.95th=[ 911],
| 99.99th=[ 961]
bw ( KiB/s): min=62783, max=338432, per=99.45%, avg=202845.84, stdev=6429.46, samples=712
iops : min= 245, max= 1322, avg=791.83, stdev=25.09, samples=712
write: IOPS=807, BW=202MiB/s (212MB/s)(9086MiB/45014msec); 0 zone resets
slat (usec): min=702, max=20222, avg=4596.19, stdev=1917.04
clat (usec): min=8, max=978503, avg=313615.08, stdev=93091.16
lat (msec): min=3, max=983, avg=318.21, stdev=92.85
clat percentiles (msec):
| 1.00th=[ 144], 5.00th=[ 207], 10.00th=[ 239], 20.00th=[ 264],
| 30.00th=[ 279], 40.00th=[ 292], 50.00th=[ 300], 60.00th=[ 313],
| 70.00th=[ 321], 80.00th=[ 338], 90.00th=[ 401], 95.00th=[ 481],
| 99.00th=[ 726], 99.50th=[ 785], 99.90th=[ 877], 99.95th=[ 911],
| 99.99th=[ 961]
bw ( KiB/s): min=56832, max=322125, per=99.63%, avg=205913.07, stdev=6192.60, samples=712
iops : min= 222, max= 1258, avg=803.79, stdev=24.18, samples=712
lat (usec) : 10=0.01%, 20=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.07%, 100=0.10%
lat (msec) : 250=14.07%, 500=81.41%, 750=3.58%, 1000=0.73%
cpu : usr=0.54%, sys=7.04%, ctx=110618, majf=0, minf=7705
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=35866,36342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=199MiB/s (209MB/s), 199MiB/s-199MiB/s (209MB/s-209MB/s), io=8967MiB (9402MB), run=45014-45014msec
WRITE: bw=202MiB/s (212MB/s), 202MiB/s-202MiB/s (212MB/s-212MB/s), io=9086MiB (9527MB), run=45014-45014msec
On TrueNAS server, async dataset sync=disabled:
random-read: (groupid=0, jobs=8): err= 0: pid=2420660: Fri May 3 13:17:34 2024
read: IOPS=4596, BW=1149MiB/s (1205MB/s)(50.5GiB/45013msec)
slat (usec): min=49, max=317261, avg=1588.74, stdev=10755.39
clat (usec): min=4, max=968570, avg=54624.69, stdev=82414.62
lat (usec): min=94, max=982937, avg=56213.43, stdev=84275.58
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 10], 40.00th=[ 18], 50.00th=[ 30], 60.00th=[ 42],
| 70.00th=[ 55], 80.00th=[ 72], 90.00th=[ 111], 95.00th=[ 243],
| 99.00th=[ 359], 99.50th=[ 514], 99.90th=[ 760], 99.95th=[ 818],
| 99.99th=[ 919]
bw ( MiB/s): min= 260, max= 3937, per=99.12%, avg=1138.90, stdev=177.27, samples=712
iops : min= 1040, max=15747, avg=4554.35, stdev=709.08, samples=712
write: IOPS=4589, BW=1147MiB/s (1203MB/s)(50.4GiB/45013msec); 0 zone resets
slat (usec): min=48, max=14383, avg=137.98, stdev=95.76
clat (usec): min=4, max=966159, avg=54952.42, stdev=82926.17
lat (usec): min=111, max=966369, avg=55090.40, stdev=82932.57
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 18], 50.00th=[ 31], 60.00th=[ 42],
| 70.00th=[ 55], 80.00th=[ 72], 90.00th=[ 111], 95.00th=[ 243],
| 99.00th=[ 363], 99.50th=[ 542], 99.90th=[ 768], 99.95th=[ 818],
| 99.99th=[ 877]
bw ( MiB/s): min= 273, max= 3887, per=99.10%, avg=1137.00, stdev=175.85, samples=712
iops : min= 1092, max=15547, avg=4546.76, stdev=703.37, samples=712
lat (usec) : 10=0.01%, 20=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=31.92%, 20=9.29%, 50=25.30%
lat (msec) : 100=21.64%, 250=7.15%, 500=4.16%, 750=0.40%, 1000=0.12%
cpu : usr=1.83%, sys=15.90%, ctx=15978, majf=0, minf=25909
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=206886,206570,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1149MiB/s (1205MB/s), 1149MiB/s-1149MiB/s (1205MB/s-1205MB/s), io=50.5GiB (54.2GB), run=45013-45013msec
WRITE: bw=1147MiB/s (1203MB/s), 1147MiB/s-1147MiB/s (1203MB/s-1203MB/s), io=50.4GiB (54.2GB), run=45013-45013msec
On NFS client, standard dataset sync=standard:
random-read: (groupid=0, jobs=8): err= 0: pid=2823: Fri May 3 18:24:34 2024
read: IOPS=409, BW=102MiB/s (107MB/s)(4684MiB/45722msec)
slat (usec): min=22, max=901, avg=71.64, stdev=14.62
clat (msec): min=2, max=1661, avg=648.52, stdev=188.88
lat (msec): min=2, max=1662, avg=648.60, stdev=188.88
clat percentiles (msec):
| 1.00th=[ 161], 5.00th=[ 451], 10.00th=[ 493], 20.00th=[ 531],
| 30.00th=[ 558], 40.00th=[ 584], 50.00th=[ 609], 60.00th=[ 642],
| 70.00th=[ 676], 80.00th=[ 735], 90.00th=[ 944], 95.00th=[ 1045],
| 99.00th=[ 1217], 99.50th=[ 1301], 99.90th=[ 1418], 99.95th=[ 1452],
| 99.99th=[ 1620]
bw ( KiB/s): min=18944, max=246162, per=100.00%, avg=105220.03, stdev=3986.11, samples=717
iops : min= 74, max= 959, avg=410.83, stdev=15.55, samples=717
write: IOPS=416, BW=104MiB/s (109MB/s)(4759MiB/45722msec); 0 zone resets
slat (usec): min=27, max=473, avg=88.65, stdev=15.87
clat (msec): min=4, max=1228, avg=581.48, stdev=164.22
lat (msec): min=4, max=1228, avg=581.57, stdev=164.23
clat percentiles (msec):
| 1.00th=[ 124], 5.00th=[ 430], 10.00th=[ 464], 20.00th=[ 493],
| 30.00th=[ 514], 40.00th=[ 535], 50.00th=[ 550], 60.00th=[ 575],
| 70.00th=[ 600], 80.00th=[ 634], 90.00th=[ 860], 95.00th=[ 978],
| 99.00th=[ 1070], 99.50th=[ 1133], 99.90th=[ 1200], 99.95th=[ 1200],
| 99.99th=[ 1217]
bw ( KiB/s): min=16253, max=260076, per=100.00%, avg=106890.97, stdev=4256.59, samples=718
iops : min= 63, max= 1013, avg=417.34, stdev=16.60, samples=718
lat (msec) : 4=0.01%, 10=0.06%, 20=0.07%, 50=0.19%, 100=0.30%
lat (msec) : 250=1.78%, 500=14.90%, 750=68.13%, 1000=8.89%, 2000=5.67%
cpu : usr=0.29%, sys=1.55%, ctx=37526, majf=0, minf=236
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=0.3%, 32=0.7%, >=64=98.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=18736,19035,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=4684MiB (4912MB), run=45722-45722msec
WRITE: bw=104MiB/s (109MB/s), 104MiB/s-104MiB/s (109MB/s-109MB/s), io=4759MiB (4990MB), run=45722-45722msec
I’ve never used Optane drives personally, but the sync performance looks normal to me if I compare it to consumer SSD stuff… well, better really - consumer SSDs can go as low as 200 IOPS or worse. Furthermore, less than 20% sounds about right for fsync performance. That’s also what I’ve observed with my enterprise SSDs, though the base numbers are much higher.
For comparison, here are my numbers for Intel DC S-3500:
fsync/s (always) = 4024.87
fsync/s (disabled) = 28248.21
Notice that even for enterprise-level SSDs, sync writes in general incur a really heavy performance cost. BTW, this is “just a lowly” SATA SSD, which is also at 60%+ wearout. I’d imagine NVMe numbers would be more impressive, but I’m too cheap for that. But this is why I always prefer enterprise SATA SSDs for server workloads over even consumer NVMe ones.
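If you want a comparable fsync-rate number out of fio (just a sketch, not necessarily the tool that produced the figures above, and the target directory is a placeholder), sync after every write and read the write IOPS line as roughly fsync/s:
# 4k writes with an fdatasync after each one; reported write IOPS ~ fsync/s
fio --name=fsync-rate --directory=/mnt/pool1/standard --size=1G --rw=write --bs=4k \
  --ioengine=psync --fdatasync=1 --runtime=30 --time_based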
Which is also why one should not follow the last piece of “advice” from that Proxmox thread and “re-enable drive cache” to magically make sync writes faster.
Is the client ESXi?
To be fair, it does indeed make things faster, and some people don’t mind the risk - it’s a workable solution depending on your risk tolerance. Personally, I’m not a risk taker, but everyone’s different.
Just disable sync if you don’t want sync.
Of course. That just makes it obvious what you’ve done, as opposed to an invisible per-drive setting.
Nope. Just Ubuntu 22.04.4. Will be xcp-ng, eventually. Just trying to get the hardware wrung out.
And you’ve tested the network perf with iperf3 between the client and server?
Run iperf3 -s in the shell to start the server on TrueNAS.
I think the test just shreds the system more than the disks. This is sync=never on a pool of 2 mirrors of 960 GB Optane 905P drives with a Micron NVDIMM SLOG.
This system is in production doing other things tho.
Sync Never
root@prod[/mnt/optane_vm/fio]# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 8 processes
Jobs: 8 (f=8): [m(8)][6.5%][r=1499MiB/s,w=1498MiB/s][r=5994,w=5990 IOPS][eta 00m:43s]
Jobs: 8 (f=8): [m(8)][8.7%][r=1508MiB/s,w=1529MiB/s][r=6032,w=6115 IOPS][eta 00mJobs: 8 (f=8): [m(8)][10.9%][r=1106MiB/s,w=1115MiB/s][r=4425,w=4461 IOPS][eta 00m:41s]
Jobs: 8 (f=8): [m(8)][13.0%][r=1055MiB/s,w=1043MiB/s][r=4221,w=4170 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][15.2%][r=1028MiB/s,w=1049MiB/s][r=4112,w=4196 IOPS][eta 00Jobs: 8 (f=8): [m(8)][17.4%][r=1124MiB/s,w=1092MiB/s][r=4496,w=4366 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=699MiB/s,w=682MiB/s][r=2796,w=2728 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][21.7%][r=467MiB/s,w=460MiB/s][r=1867,w=1838 IOPS][eta 00m:36s]
Jobs: 8 (f=8): [m(8)][23.9%][r=368MiB/s,w=381MiB/s][r=1471,w=1525 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][26.1%][r=352MiB/s,w=353MiB/s][r=1409,w=1412 IOPS][eta 00m:34s]
Jobs: 8 (f=8): [m(8)][28.3%][r=344MiB/s,w=332MiB/s][r=1377,w=1329 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][30.4%][r=342MiB/s,w=326MiB/s][r=1369,w=1302 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][32.6%][r=339MiB/s,w=341MiB/s][r=1355,w=1362 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][34.8%][r=335MiB/s,w=334MiB/s][r=1341,w=1335 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][37.0%][r=335MiB/s,w=334MiB/s][r=1339,w=1335 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][39.1%][r=313MiB/s,w=330MiB/s][r=1250,w=1320 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][41.3%][r=327MiB/s,w=306MiB/s][r=1309,w=1225 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][43.5%][r=337MiB/s,w=333MiB/s][r=1347,w=1331 IOPS][eta 00m:26s]
Jobs: 8 (f=8): [m(8)][45.7%][r=328MiB/s,w=339MiB/s][r=1310,w=1356 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=359MiB/s,w=328MiB/s][r=1435,w=1310 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][50.0%][r=327MiB/s,w=327MiB/s][r=1308,w=1309 IOPS][eta 00m:23s]
Jobs: 8 (f=8): [m(8)][52.2%][r=320MiB/s,w=327MiB/s][r=1278,w=1307 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][54.3%][r=330MiB/s,w=346MiB/s][r=1321,w=1385 IOPS][eta 00m:21s]
Jobs: 8 (f=8): [m(8)][56.5%][r=352MiB/s,w=322MiB/s][r=1406,w=1289 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][58.7%][r=344MiB/s,w=339MiB/s][r=1374,w=1356 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][60.9%][r=345MiB/s,w=335MiB/s][r=1380,w=1339 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][63.0%][r=328MiB/s,w=323MiB/s][r=1313,w=1291 IOPS][eta 00m:17s]
Jobs: 8 (f=8): [m(8)][65.2%][r=326MiB/s,w=352MiB/s][r=1303,w=1409 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][67.4%][r=336MiB/s,w=334MiB/s][r=1344,w=1335 IOPS][eta 00m:15s]
Jobs: 8 (f=8): [m(8)][69.6%][r=346MiB/s,w=337MiB/s][r=1385,w=1346 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][71.7%][r=326MiB/s,w=336MiB/s][r=1303,w=1342 IOPS][eta 00m:13s]
Jobs: 8 (f=8): [m(8)][73.9%][r=337MiB/s,w=336MiB/s][r=1348,w=1344 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][76.1%][r=343MiB/s,w=327MiB/s][r=1370,w=1306 IOPS][eta 00m:11s]
Jobs: 8 (f=8): [m(8)][78.3%][r=348MiB/s,w=342MiB/s][r=1393,w=1369 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][80.4%][r=367MiB/s,w=378MiB/s][r=1469,w=1510 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.6%][r=335MiB/s,w=327MiB/s][r=1338,w=1307 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][84.8%][r=305MiB/s,w=292MiB/s][r=1221,w=1169 IOPS][eta 00m:07s]
Jobs: 8 (f=8): [m(8)][87.0%][r=297MiB/s,w=287MiB/s][r=1186,w=1147 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][89.1%][r=312MiB/s,w=322MiB/s][r=1247,w=1289 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][91.3%][r=296MiB/s,w=307MiB/s][r=1182,w=1229 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.5%][r=325MiB/s,w=313MiB/s][r=1300,w=1252 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][97.8%][r=299MiB/s,w=317MiB/s][r=1195,w=1266 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=302MiB/s,w=326MiB/s][r=1209,w=1303 IOPS][eta 00mJobs: 8 (f=0): [f(8)][100.0%][r=378MiB/s,w=377MiB/s][r=1511,w=1509 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1352671: Sat May 4 01:13:13 2024
read: IOPS=2085, BW=521MiB/s (547MB/s)(22.9GiB/45003msec)
slat (usec): min=155, max=412127, avg=1666.49, stdev=3125.01
clat (usec): min=7, max=1065.9k, avg=120696.94, stdev=84191.89
lat (msec): min=3, max=1070, avg=122.36, stdev=85.21
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 19],
| 30.00th=[ 51], 40.00th=[ 81], 50.00th=[ 150], 60.00th=[ 174],
| 70.00th=[ 186], 80.00th=[ 194], 90.00th=[ 207], 95.00th=[ 218],
| 99.00th=[ 241], 99.50th=[ 284], 99.90th=[ 978], 99.95th=[ 1011],
| 99.99th=[ 1053]
bw ( KiB/s): min=215790, max=4558510, per=98.98%, avg=528328.35, stdev=74239.19, samples=710
iops : min= 839, max=17802, avg=2062.44, stdev=289.97, samples=710
write: IOPS=2085, BW=521MiB/s (547MB/s)(22.9GiB/45003msec); 0 zone resets
slat (usec): min=115, max=288020, avg=2150.51, stdev=2780.52
clat (usec): min=7, max=1069.5k, avg=120935.42, stdev=84379.07
lat (msec): min=2, max=1075, avg=123.09, stdev=85.77
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 19],
| 30.00th=[ 50], 40.00th=[ 80], 50.00th=[ 153], 60.00th=[ 176],
| 70.00th=[ 186], 80.00th=[ 197], 90.00th=[ 209], 95.00th=[ 218],
| 99.00th=[ 243], 99.50th=[ 271], 99.90th=[ 634], 99.95th=[ 1011],
| 99.99th=[ 1053]
bw ( KiB/s): min=155833, max=4676081, per=98.67%, avg=526936.13, stdev=75387.78, samples=712
iops : min= 606, max=18262, avg=2056.96, stdev=294.47, samples=712
lat (usec) : 10=0.01%, 20=0.01%, 250=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=20.80%, 50=9.13%, 100=13.64%
lat (msec) : 250=55.65%, 500=0.63%, 750=0.04%, 1000=0.03%, 2000=0.07%
cpu : usr=0.86%, sys=10.46%, ctx=186809, majf=0, minf=79934
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=93835,93876,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=521MiB/s (547MB/s), 521MiB/s-521MiB/s (547MB/s-547MB/s), io=22.9GiB (24.6GB), run=45003-45003msec
WRITE: bw=521MiB/s (547MB/s), 521MiB/s-521MiB/s (547MB/s-547MB/s), io=22.9GiB (24.6GB), run=45003-45003msec
Sync Always:
root@prod[/mnt/optane_vm/fio]# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 8 processes
Jobs: 8 (f=8): [m(8)][8.7%][r=620MiB/s,w=642MiB/s][r=2480,w=2566 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][10.9%][r=618MiB/s,w=611MiB/s][r=2473,w=2444 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][13.0%][r=658MiB/s,w=659MiB/s][r=2632,w=2635 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][15.2%][r=618MiB/s,w=628MiB/s][r=2473,w=2510 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][17.4%][r=478MiB/s,w=497MiB/s][r=1911,w=1986 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=437MiB/s,w=419MiB/s][r=1747,w=1674 IOPS][eta 00m:37s]
Jobs: 8 (f=8): [m(8)][21.7%][r=1813MiB/s,w=1836MiB/s][r=7251,w=7343 IOPS][eta 00Jobs: 8 (f=8): [m(8)][23.9%][r=603MiB/s,w=634MiB/s][r=2413,w=2537 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][26.1%][r=453MiB/s,w=451MiB/s][r=1812,w=1804 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][28.3%][r=535MiB/s,w=501MiB/s][r=2141,w=2002 IOPS][eta 00m:33s]
Jobs: 8 (f=8): [m(8)][30.4%][r=520MiB/s,w=515MiB/s][r=2078,w=2058 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][32.6%][r=500MiB/s,w=505MiB/s][r=2000,w=2021 IOPS][eta 00m:31s]
Jobs: 8 (f=8): [m(8)][34.8%][r=464MiB/s,w=440MiB/s][r=1854,w=1759 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][37.0%][r=513MiB/s,w=517MiB/s][r=2051,w=2066 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][39.1%][r=531MiB/s,w=549MiB/s][r=2124,w=2197 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][41.3%][r=459MiB/s,w=439MiB/s][r=1834,w=1754 IOPS][eta 00m:27s]
Jobs: 8 (f=8): [m(8)][43.5%][r=427MiB/s,w=409MiB/s][r=1706,w=1637 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][45.7%][r=453MiB/s,w=454MiB/s][r=1813,w=1817 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=513MiB/s,w=481MiB/s][r=2050,w=1925 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][50.0%][r=469MiB/s,w=478MiB/s][r=1874,w=1910 IOPS][eta 00m:23s]
Jobs: 8 (f=8): [m(8)][52.2%][r=431MiB/s,w=437MiB/s][r=1722,w=1748 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][54.3%][r=423MiB/s,w=406MiB/s][r=1693,w=1623 IOPS][eta 00m:21s]
Jobs: 8 (f=8): [m(8)][56.5%][r=445MiB/s,w=456MiB/s][r=1780,w=1825 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][58.7%][r=360MiB/s,w=346MiB/s][r=1440,w=1383 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][62.2%][r=328MiB/s,w=331MiB/s][r=1311,w=1324 IOPS][eta 00m:17s]
Jobs: 8 (f=8): [m(8)][64.4%][r=689MiB/s,w=667MiB/s][r=2756,w=2666 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][66.7%][r=382MiB/s,w=364MiB/s][r=1529,w=1456 IOPS][eta 00m:15s]
Jobs: 8 (f=8): [m(8)][68.9%][r=433MiB/s,w=428MiB/s][r=1731,w=1711 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][71.1%][r=387MiB/s,w=398MiB/s][r=1549,w=1593 IOPS][eta 00m:13s]
Jobs: 8 (f=8): [m(8)][73.3%][r=354MiB/s,w=355MiB/s][r=1417,w=1421 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][75.6%][r=328MiB/s,w=321MiB/s][r=1313,w=1283 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][77.8%][r=336MiB/s,w=339MiB/s][r=1342,w=1357 IOPS][eta 00m:10s]
Jobs: 8 (f=8): [m(8)][80.0%][r=318MiB/s,w=313MiB/s][r=1271,w=1252 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.2%][r=364MiB/s,w=355MiB/s][r=1454,w=1421 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][84.4%][r=304MiB/s,w=333MiB/s][r=1217,w=1333 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][86.7%][r=309MiB/s,w=313MiB/s][r=1235,w=1251 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][88.9%][r=344MiB/s,w=337MiB/s][r=1377,w=1349 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][91.1%][r=302MiB/s,w=316MiB/s][r=1208,w=1265 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.3%][r=332MiB/s,w=340MiB/s][r=1327,w=1359 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][95.6%][r=314MiB/s,w=313MiB/s][r=1256,w=1252 IOPS][eta 00m:02s]
Jobs: 8 (f=8): [m(8)][97.8%][r=334MiB/s,w=332MiB/s][r=1336,w=1326 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][100.0%][r=319MiB/s,w=326MiB/s][r=1274,w=1303 IOPS][eta 00m:00s]
Jobs: 2 (f=0): [f(2),E(2),_(1),E(3)][100.0%][r=348MiB/s,w=347MiB/s][r=1390,w=1388 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1362291: Sat May 4 01:16:39 2024
read: IOPS=2033, BW=508MiB/s (533MB/s)(22.3GiB/45006msec)
slat (usec): min=162, max=192464, avg=1554.48, stdev=1912.72
clat (usec): min=6, max=449191, avg=123960.02, stdev=62939.75
lat (msec): min=4, max=452, avg=125.51, stdev=63.58
clat percentiles (msec):
| 1.00th=[ 26], 5.00th=[ 27], 10.00th=[ 29], 20.00th=[ 69],
| 30.00th=[ 97], 40.00th=[ 111], 50.00th=[ 126], 60.00th=[ 142],
| 70.00th=[ 161], 80.00th=[ 178], 90.00th=[ 197], 95.00th=[ 215],
| 99.00th=[ 305], 99.50th=[ 334], 99.90th=[ 368], 99.95th=[ 384],
| 99.99th=[ 418]
bw ( KiB/s): min=197743, max=2254298, per=100.00%, avg=521300.33, stdev=45736.58, samples=712
iops : min= 766, max= 8805, avg=2035.13, stdev=178.67, samples=712
write: IOPS=2035, BW=509MiB/s (534MB/s)(22.4GiB/45006msec); 0 zone resets
slat (usec): min=279, max=203222, avg=2350.38, stdev=3088.49
clat (usec): min=6, max=440506, avg=123735.29, stdev=63756.01
lat (msec): min=2, max=448, avg=126.09, stdev=64.89
clat percentiles (msec):
| 1.00th=[ 26], 5.00th=[ 27], 10.00th=[ 29], 20.00th=[ 65],
| 30.00th=[ 96], 40.00th=[ 111], 50.00th=[ 125], 60.00th=[ 142],
| 70.00th=[ 161], 80.00th=[ 180], 90.00th=[ 199], 95.00th=[ 218],
| 99.00th=[ 305], 99.50th=[ 338], 99.90th=[ 372], 99.95th=[ 384],
| 99.99th=[ 409]
bw ( KiB/s): min=218234, max=2323916, per=100.00%, avg=521989.43, stdev=47153.31, samples=712
iops : min= 847, max= 9077, avg=2037.83, stdev=184.21, samples=712
lat (usec) : 10=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=18.96%, 100=13.38%
lat (msec) : 250=65.64%, 500=1.99%
cpu : usr=1.02%, sys=15.01%, ctx=587686, majf=0, minf=111559
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=91500,91597,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=508MiB/s (533MB/s), 508MiB/s-508MiB/s (533MB/s-533MB/s), io=22.3GiB (24.0GB), run=45006-45006msec
WRITE: bw=509MiB/s (534MB/s), 509MiB/s-509MiB/s (534MB/s-534MB/s), io=22.4GiB (24.0GB), run=45006-45006msec
root@prod[/mnt/optane_vm/fio]#
That seems fairly low, especially on writes - are those NVDIMM-P or -N?
Edit - Ah, that’s what you meant when you said it taxes the system more than the drives.
In general the test seems a bit weird as an NFS storage test:
- Are you sure XCP-ng uses a 256K blocksize?
- A QD of 64 seems a bit excessive unless each VM is really going to thrash the pool; 16 should be more than enough. Better to increase workers/numjobs to the rough number of VMs you’re planning to run instead.
In general it is expected that remote tests are significantly worse than local tests; many gray hairs have been caused by this.
My personal hope was always that RDMA would help, but that’s not coming soon, and might not be as beneficial as I hope.
In the meantime, don’t fret about the differences - just try to get it fast enough. I’ll run some tests for more comparison points too.
Edit2
All tests on Core, with posixaio
TNC 13U6, Xeon 1245v6, 64G, 5 x pm863a in Z1, P1600X slog, sync always, local test
READ: bw=712MiB/s (746MB/s), WRITE: bw=712MiB/s (746MB/s)
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=posixaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=posixaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][8.7%][r=587MiB/s,w=630MiB/s][r=2346,w=2519 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][13.0%][r=791MiB/s,w=787MiB/s][r=3165,w=3148 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][17.4%][r=545MiB/s,w=552MiB/s][r=2181,w=2209 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=730MiB/s,w=748MiB/s][r=2919,w=2990 IOPS][eta 00m:37s]
Jobs: 8 (f=8): [m(8)][23.9%][r=673MiB/s,w=691MiB/s][r=2691,w=2763 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][28.3%][r=881MiB/s,w=906MiB/s][r=3525,w=3623 IOPS][eta 00m:33s]
Jobs: 8 (f=8): [m(8)][32.6%][r=551MiB/s,w=500MiB/s][r=2204,w=1999 IOPS][eta 00m:31s]
Jobs: 8 (f=8): [m(8)][37.0%][r=701MiB/s,w=692MiB/s][r=2804,w=2768 IOPS][eta 00m:29s]
Jobs: 8 (f=8): [m(8)][41.3%][r=744MiB/s,w=739MiB/s][r=2977,w=2956 IOPS][eta 00m:27s]
Jobs: 8 (f=8): [m(8)][45.7%][r=674MiB/s,w=652MiB/s][r=2696,w=2608 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=722MiB/s,w=722MiB/s][r=2886,w=2888 IOPS][eta 00m:24s]
Jobs: 8 (f=8): [m(8)][52.2%][r=852MiB/s,w=842MiB/s][r=3407,w=3368 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][56.5%][r=731MiB/s,w=684MiB/s][r=2923,w=2734 IOPS][eta 00m:20s]
Jobs: 8 (f=8): [m(8)][60.9%][r=728MiB/s,w=730MiB/s][r=2912,w=2918 IOPS][eta 00m:18s]
Jobs: 8 (f=8): [m(8)][65.2%][r=794MiB/s,w=816MiB/s][r=3175,w=3264 IOPS][eta 00m:16s]
Jobs: 8 (f=8): [m(8)][69.6%][r=671MiB/s,w=678MiB/s][r=2682,w=2711 IOPS][eta 00m:14s]
Jobs: 8 (f=8): [m(8)][73.9%][r=720MiB/s,w=708MiB/s][r=2880,w=2833 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][78.3%][r=645MiB/s,w=705MiB/s][r=2579,w=2819 IOPS][eta 00m:10s]
Jobs: 8 (f=8): [m(8)][82.6%][r=674MiB/s,w=642MiB/s][r=2695,w=2568 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][87.0%][r=793MiB/s,w=806MiB/s][r=3172,w=3223 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][91.3%][r=641MiB/s,w=615MiB/s][r=2565,w=2458 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.5%][r=859MiB/s,w=849MiB/s][r=3435,w=3397 IOPS][eta 00m:03s]
Jobs: 8 (f=8): [m(8)][97.8%][r=621MiB/s,w=602MiB/s][r=2483,w=2406 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][79.3%][r=670MiB/s,w=675MiB/s][r=2680,w=2698 IOPS][eta 00m:12s]
random-read: (groupid=0, jobs=8): err= 0: pid=4485: Sat May 4 09:58:30 2024
read: IOPS=2846, BW=712MiB/s (746MB/s)(31.3GiB/45060msec)
slat (nsec): min=434, max=5142.7k, avg=1386.89, stdev=17402.87
clat (msec): min=8, max=548, avg=97.45, stdev=85.20
lat (msec): min=8, max=548, avg=97.45, stdev=85.20
clat percentiles (msec):
| 1.00th=[ 51], 5.00th=[ 57], 10.00th=[ 61], 20.00th=[ 65],
| 30.00th=[ 67], 40.00th=[ 70], 50.00th=[ 72], 60.00th=[ 75],
| 70.00th=[ 81], 80.00th=[ 87], 90.00th=[ 109], 95.00th=[ 376],
| 99.00th=[ 447], 99.50th=[ 489], 99.90th=[ 514], 99.95th=[ 518],
| 99.99th=[ 531]
bw ( KiB/s): min=149713, max=1110304, per=100.00%, avg=731992.00, stdev=31485.74, samples=712
iops : min= 578, max= 4330, avg=2856.36, stdev=123.04, samples=712
write: IOPS=2847, BW=712MiB/s (746MB/s)(31.3GiB/45060msec); 0 zone resets
slat (usec): min=2, max=5304, avg=14.95, stdev=39.01
clat (msec): min=3, max=533, avg=82.13, stdev=86.23
lat (msec): min=3, max=533, avg=82.15, stdev=86.23
clat percentiles (msec):
| 1.00th=[ 44], 5.00th=[ 47], 10.00th=[ 49], 20.00th=[ 52],
| 30.00th=[ 54], 40.00th=[ 56], 50.00th=[ 58], 60.00th=[ 61],
| 70.00th=[ 64], 80.00th=[ 68], 90.00th=[ 79], 95.00th=[ 368],
| 99.00th=[ 443], 99.50th=[ 485], 99.90th=[ 510], 99.95th=[ 514],
| 99.99th=[ 518]
bw ( KiB/s): min=152780, max=1148383, per=100.00%, avg=732064.47, stdev=32959.23, samples=712
iops : min= 590, max= 4479, avg=2856.45, stdev=128.79, samples=712
lat (msec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=8.08%, 100=82.15%
lat (msec) : 250=2.88%, 500=6.66%, 750=0.22%
cpu : usr=0.84%, sys=1.06%, ctx=235503, majf=1, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=1.2%, 32=84.5%, >=64=14.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=93.6%, 8=2.6%, 16=2.7%, 32=1.0%, 64=0.1%, >=64=0.0%
issued rwts: total=128247,128298,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=31.3GiB (33.6GB), run=45060-45060msec
WRITE: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=31.3GiB (33.6GB), run=45060-45060msec
TNC 13U6, Xeon Gold 5317 Virtualized 4Cores, 256G, 6 x pm863a in 3mirror, P5800X slog, sync always, local test
READ: bw=909MiB/s WRITE: bw=910MiB/s (954MB/s)
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=posixaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=posixaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][6.7%][r=927MiB/s,w=963MiB/s][r=3709,w=3851 IOPS][eta 00m:4 2s]
Jobs: 8 (f=8): [m(8)][8.9%][r=839MiB/s,w=826MiB/s][r=3355,w=3304 IOPS][eta 00m:4 1s]
Jobs: 8 (f=8): [m(8)][11.1%][r=964MiB/s,w=958MiB/s][r=3857,w=3831 IOPS][eta 00m: 40s]
Jobs: 8 (f=8): [m(8)][13.3%][r=907MiB/s,w=913MiB/s][r=3627,w=3653 IOPS][eta 00m: 39s]
Jobs: 8 (f=8): [m(8)][15.6%][r=917MiB/s,w=948MiB/s][r=3669,w=3791 IOPS][eta 00m: 38s]
Jobs: 8 (f=8): [m(8)][17.8%][r=1002MiB/s,w=1001MiB/s][r=4009,w=4004 IOPS][eta 00 m:37s]
Jobs: 8 (f=8): [m(8)][20.0%][r=834MiB/s,w=861MiB/s][r=3335,w=3444 IOPS][eta 00m: 36s]
Jobs: 8 (f=8): [m(8)][22.2%][r=991MiB/s,w=1007MiB/s][r=3964,w=4029 IOPS][eta 00m :35s]
Jobs: 8 (f=8): [m(8)][24.4%][r=931MiB/s,w=894MiB/s][r=3724,w=3577 IOPS][eta 00m: 34s]
Jobs: 8 (f=8): [m(8)][26.7%][r=1002MiB/s,w=982MiB/s][r=4009,w=3926 IOPS][eta 00m :33s]
Jobs: 8 (f=8): [m(8)][28.9%][r=919MiB/s,w=934MiB/s][r=3676,w=3736 IOPS][eta 00m: 32s]
Jobs: 8 (f=8): [m(8)][31.1%][r=983MiB/s,w=960MiB/s][r=3932,w=3841 IOPS][eta 00m: 31s]
Jobs: 8 (f=8): [m(8)][33.3%][r=1049MiB/s,w=1016MiB/s][r=4196,w=4063 IOPS][eta 00 m:30s]
Jobs: 8 (f=8): [m(8)][35.6%][r=824MiB/s,w=834MiB/s][r=3297,w=3335 IOPS][eta 00m: 29s]
Jobs: 8 (f=8): [m(8)][37.8%][r=950MiB/s,w=930MiB/s][r=3800,w=3721 IOPS][eta 00m: 28s]
Jobs: 8 (f=8): [m(8)][40.0%][r=855MiB/s,w=859MiB/s][r=3418,w=3437 IOPS][eta 00m: 27s]
Jobs: 8 (f=8): [m(8)][42.2%][r=931MiB/s,w=892MiB/s][r=3723,w=3567 IOPS][eta 00m: 26s]
Jobs: 8 (f=8): [m(8)][44.4%][r=959MiB/s,w=963MiB/s][r=3836,w=3850 IOPS][eta 00m: 25s]
Jobs: 8 (f=8): [m(8)][46.7%][r=829MiB/s,w=821MiB/s][r=3315,w=3285 IOPS][eta 00m: 24s]
Jobs: 8 (f=8): [m(8)][48.9%][r=853MiB/s,w=837MiB/s][r=3412,w=3346 IOPS][eta 00m: 23s]
Jobs: 8 (f=8): [m(8)][51.1%][r=553MiB/s,w=576MiB/s][r=2210,w=2305 IOPS][eta 00m: 22s]
Jobs: 8 (f=8): [m(8)][53.3%][r=625MiB/s,w=621MiB/s][r=2498,w=2485 IOPS][eta 00m: 21s]
Jobs: 8 (f=8): [m(8)][55.6%][r=905MiB/s,w=935MiB/s][r=3620,w=3739 IOPS][eta 00m: 20s]
Jobs: 8 (f=8): [m(8)][57.8%][r=978MiB/s,w=970MiB/s][r=3911,w=3881 IOPS][eta 00m: 19s]
Jobs: 8 (f=8): [m(8)][60.0%][r=880MiB/s,w=863MiB/s][r=3520,w=3450 IOPS][eta 00m: 18s]
Jobs: 8 (f=8): [m(8)][62.2%][r=954MiB/s,w=976MiB/s][r=3816,w=3903 IOPS][eta 00m: 17s]
Jobs: 8 (f=8): [m(8)][64.4%][r=890MiB/s,w=900MiB/s][r=3561,w=3601 IOPS][eta 00m: 16s]
Jobs: 8 (f=8): [m(8)][66.7%][r=971MiB/s,w=950MiB/s][r=3885,w=3800 IOPS][eta 00m: 15s]
Jobs: 8 (f=8): [m(8)][68.9%][r=967MiB/s,w=957MiB/s][r=3868,w=3827 IOPS][eta 00m: 14s]
Jobs: 8 (f=8): [m(8)][71.1%][r=850MiB/s,w=853MiB/s][r=3401,w=3411 IOPS][eta 00m: 13s]
Jobs: 8 (f=8): [m(8)][73.3%][r=921MiB/s,w=922MiB/s][r=3682,w=3688 IOPS][eta 00m: 12s]
Jobs: 8 (f=8): [m(8)][75.6%][r=889MiB/s,w=883MiB/s][r=3555,w=3532 IOPS][eta 00m: 11s]
Jobs: 8 (f=8): [m(8)][77.8%][r=941MiB/s,w=949MiB/s][r=3764,w=3797 IOPS][eta 00m: 10s]
Jobs: 8 (f=8): [m(8)][80.0%][r=964MiB/s,w=989MiB/s][r=3854,w=3954 IOPS][eta 00m: 09s]
Jobs: 8 (f=8): [m(8)][82.2%][r=877MiB/s,w=880MiB/s][r=3508,w=3520 IOPS][eta 00m: 08s]
Jobs: 8 (f=8): [m(8)][84.4%][r=956MiB/s,w=972MiB/s][r=3824,w=3886 IOPS][eta 00m: 07s]
Jobs: 8 (f=8): [m(8)][86.7%][r=879MiB/s,w=877MiB/s][r=3517,w=3506 IOPS][eta 00m: 06s]
Jobs: 8 (f=8): [m(8)][88.9%][r=950MiB/s,w=975MiB/s][r=3798,w=3901 IOPS][eta 00m: 05s]
Jobs: 8 (f=8): [m(8)][91.1%][r=1015MiB/s,w=984MiB/s][r=4058,w=3935 IOPS][eta 00m :04s]
Jobs: 8 (f=8): [m(8)][93.3%][r=853MiB/s,w=838MiB/s][r=3410,w=3353 IOPS][eta 00m: 03s]
Jobs: 8 (f=8): [m(8)][95.6%][r=923MiB/s,w=918MiB/s][r=3690,w=3670 IOPS][eta 00m: 02s]
Jobs: 8 (f=8): [m(8)][97.8%][r=853MiB/s,w=866MiB/s][r=3410,w=3463 IOPS][eta 00m: 01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=955MiB/s,w=940MiB/s][r=3819,w=3760 IOPS][eta 00m :00s]
random-read: (groupid=0, jobs=8): err= 0: pid=53561: Sat May 4 10:00:09 2024
read: IOPS=3637, BW=909MiB/s (954MB/s)(40.0GiB/45073msec)
slat (nsec): min=455, max=12061k, avg=3598.22, stdev=70556.96
clat (usec): min=107, max=3383.8k, avg=68959.25, stdev=43918.22
lat (usec): min=379, max=3383.8k, avg=68962.85, stdev=43915.91
clat percentiles (msec):
| 1.00th=[ 30], 5.00th=[ 43], 10.00th=[ 47], 20.00th=[ 54],
| 30.00th=[ 57], 40.00th=[ 59], 50.00th=[ 61], 60.00th=[ 65],
| 70.00th=[ 70], 80.00th=[ 74], 90.00th=[ 86], 95.00th=[ 123],
| 99.00th=[ 213], 99.50th=[ 253], 99.90th=[ 498], 99.95th=[ 735],
| 99.99th=[ 1318]
bw ( KiB/s): min=525144, max=1204546, per=100.00%, avg=931932.30, stdev=16073.91, samples=704
iops : min= 2048, max= 4703, avg=3636.75, stdev=62.80, samples=704
write: IOPS=3638, BW=910MiB/s (954MB/s)(40.0GiB/45073msec); 0 zone resets
slat (nsec): min=1533, max=31224k, avg=27789.67, stdev=206370.74
clat (usec): min=105, max=3305.3k, avg=71049.60, stdev=44811.65
lat (usec): min=537, max=3305.3k, avg=71077.39, stdev=44807.35
clat percentiles (msec):
| 1.00th=[ 31], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 55],
| 30.00th=[ 57], 40.00th=[ 60], 50.00th=[ 63], 60.00th=[ 68],
| 70.00th=[ 71], 80.00th=[ 75], 90.00th=[ 90], 95.00th=[ 131],
| 99.00th=[ 218], 99.50th=[ 268], 99.90th=[ 550], 99.95th=[ 802],
| 99.99th=[ 1250]
bw ( KiB/s): min=557144, max=1192748, per=100.00%, avg=932302.81, stdev=15661.49, samples=704
iops : min= 2172, max= 4656, avg=3638.10, stdev=61.19, samples=704
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.02%, 4=0.05%, 10=0.14%, 20=0.22%, 50=13.12%
lat (msec) : 100=79.06%, 250=6.81%, 500=0.46%, 750=0.06%, 1000=0.02%
lat (msec) : 2000=0.02%, >=2000=0.01%
cpu : usr=0.90%, sys=0.41%, ctx=129219, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=1.2%, 16=8.7%, 32=81.3%, >=64=8.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.6%, 8=1.2%, 16=1.3%, 32=1.1%, 64=0.8%, >=64=0.0%
issued rwts: total=163958,164006,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=909MiB/s (954MB/s), 909MiB/s-909MiB/s (954MB/s-954MB/s), io=40.0GiB (43.0GB), run=45073-45073msec
WRITE: bw=910MiB/s (954MB/s), 910MiB/s-910MiB/s (954MB/s-954MB/s), io=40.0GiB (43.0GB), run=45073-45073msec
A VM on the 5317 system (ESXi 8 AIO build, TNC as a VM, so basically VM-to-VM networking only). Note this heavily benefits from the TNC box’s memory due to the limited test size (80G total only).
READ: bw=981MiB/s (1028MB/s) WRITE: bw=981MiB/s (1029MB/s)
fiobuntu:/mnt/sdb# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][8.7%][r=1194MiB/s,w=1188MiB/s][r=4776,w=4752 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][13.0%][r=1194MiB/s,w=1195MiB/s][r=4776,w=4780 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][17.4%][r=989MiB/s,w=1012MiB/s][r=3957,w=4046 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][21.7%][r=1059MiB/s,w=1038MiB/s][r=4237,w=4151 IOPS][eta 00m:36s]
Jobs: 8 (f=8): [m(8)][26.1%][r=929MiB/s,w=913MiB/s][r=3715,w=3650 IOPS][eta 00m:34s]
Jobs: 8 (f=8): [m(8)][30.4%][r=966MiB/s,w=960MiB/s][r=3863,w=3841 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][34.8%][r=1020MiB/s,w=1008MiB/s][r=4078,w=4031 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][39.1%][r=1055MiB/s,w=1016MiB/s][r=4221,w=4064 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][43.5%][r=1013MiB/s,w=1031MiB/s][r=4052,w=4122 IOPS][eta 00m:26s]
Jobs: 8 (f=8): [m(8)][47.8%][r=1029MiB/s,w=1058MiB/s][r=4114,w=4230 IOPS][eta 00m:24s]
Jobs: 8 (f=8): [m(8)][52.2%][r=948MiB/s,w=907MiB/s][r=3793,w=3628 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][57.8%][r=874MiB/s,w=899MiB/s][r=3497,w=3595 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][60.9%][r=993MiB/s,w=989MiB/s][r=3973,w=3955 IOPS][eta 00m:18s]
Jobs: 8 (f=8): [m(8)][65.2%][r=1038MiB/s,w=1026MiB/s][r=4151,w=4105 IOPS][eta 00m:16s]
Jobs: 8 (f=8): [m(8)][69.6%][r=971MiB/s,w=967MiB/s][r=3883,w=3869 IOPS][eta 00m:14s]
Jobs: 8 (f=8): [m(8)][73.9%][r=829MiB/s,w=825MiB/s][r=3315,w=3301 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][80.0%][r=911MiB/s,w=905MiB/s][r=3642,w=3621 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.6%][r=901MiB/s,w=895MiB/s][r=3602,w=3581 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][88.9%][r=1058MiB/s,w=1032MiB/s][r=4231,w=4126 IOPS][eta 00m:05s]
Jobs: 8 (f=8): [m(8)][91.3%][r=1067MiB/s,w=1039MiB/s][r=4267,w=4157 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][97.8%][r=969MiB/s,w=991MiB/s][r=3875,w=3963 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=938MiB/s,w=920MiB/s][r=3753,w=3679 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1312: Sat May 4 08:19:56 2024
read: IOPS=3922, BW=981MiB/s (1028MB/s)(43.2GiB/45092msec)
slat (usec): min=5, max=184094, avg=905.55, stdev=3506.97
clat (usec): min=848, max=230061, avg=42303.05, stdev=23675.01
lat (usec): min=1085, max=230126, avg=43208.98, stdev=23960.42
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 15], 10.00th=[ 20], 20.00th=[ 26],
| 30.00th=[ 31], 40.00th=[ 36], 50.00th=[ 40], 60.00th=[ 44],
| 70.00th=[ 49], 80.00th=[ 55], 90.00th=[ 64], 95.00th=[ 73],
| 99.00th=[ 163], 99.50th=[ 178], 99.90th=[ 207], 99.95th=[ 213],
| 99.99th=[ 222]
bw ( KiB/s): min=577536, max=1397183, per=100.00%, avg=1006226.64, stdev=18481.59, samples=712
iops : min= 2256, max= 5456, avg=3930.31, stdev=72.17, samples=712
write: IOPS=3923, BW=981MiB/s (1029MB/s)(43.2GiB/45092msec); 0 zone resets
slat (usec): min=7, max=168675, avg=1093.04, stdev=3921.09
clat (usec): min=1440, max=696119, avg=85996.22, stdev=41493.09
lat (msec): min=2, max=696, avg=87.09, stdev=41.66
clat percentiles (msec):
| 1.00th=[ 21], 5.00th=[ 34], 10.00th=[ 43], 20.00th=[ 54],
| 30.00th=[ 64], 40.00th=[ 72], 50.00th=[ 82], 60.00th=[ 90],
| 70.00th=[ 100], 80.00th=[ 111], 90.00th=[ 129], 95.00th=[ 155],
| 99.00th=[ 236], 99.50th=[ 255], 99.90th=[ 347], 99.95th=[ 388],
| 99.99th=[ 485]
bw ( KiB/s): min=632320, max=1429113, per=100.00%, avg=1005733.66, stdev=18055.90, samples=712
iops : min= 2470, max= 5582, avg=3928.31, stdev=70.51, samples=712
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.99%, 20=5.16%, 50=38.29%
lat (msec) : 100=39.86%, 250=15.39%, 500=0.29%, 750=0.01%
cpu : usr=0.86%, sys=1.57%, ctx=88331, majf=0, minf=122
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=176894,176928,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=981MiB/s (1028MB/s), 981MiB/s-981MiB/s (1028MB/s-1028MB/s), io=43.2GiB (46.4GB), run=45092-45092msec
WRITE: bw=981MiB/s (1029MB/s), 981MiB/s-981MiB/s (1029MB/s-1029MB/s), io=43.2GiB (46.4GB), run=45092-45092msec
So, I think I figured out the iperf issue. Thankfully, it makes total sense.
BLUF: When iperf is just sending data and not transferring files, NO DISKS ARE INVOLVED!!! DUH!
So, the two hosts are now both running TrueNAS. I figured that plain vanilla Ubuntu Server might not be as finely tuned for this stuff as TrueNAS would be. Dunno. Using TrueNAS on both sides DID seem to make things a bit better.
One server is named nas-1 and the other vm-1. Both are Dell R720s with 256G RAM. No ZFS pools involved except their mirrored boot drives (SSD on nas-1, HDD on vm-1).
So here is a plain iperf from vm-1 to nas-1:
root@vm-1[~]# iperf3 -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 58576 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.16 GBytes 9.92 Gbits/sec 44 1.55 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.55 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.57 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.57 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.91 Gbits/sec 1 1.57 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.60 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.66 MBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.66 MBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 10.00-11.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 11.00-12.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.91 MBytes
[ 5] 12.00-13.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.91 MBytes
[ 5] 13.00-14.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 14.00-15.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 15.00-16.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 16.00-17.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.91 MBytes
[ 5] 17.00-18.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.91 MBytes
[ 5] 18.00-19.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 19.00-20.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 23.1 GBytes 9.90 Gbits/sec 49 sender
[ 5] 0.00-20.00 sec 23.1 GBytes 9.90 Gbits/sec receiver
Basically wirespeed.
So here it is when I transfer a 10 GB file AND write it to disk on the other side.
On nas-1:
root@nas-1[~]# iperf3 -F /root/10gfile.iperf2 -s
On vm-1:
root@vm-1[~]# iperf3 -F /root/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 47482 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.08 GBytes 9.27 Gbits/sec 22 1.57 MBytes
[ 5] 1.00-2.00 sec 962 MBytes 8.07 Gbits/sec 1 1.57 MBytes
[ 5] 2.00-3.00 sec 995 MBytes 8.35 Gbits/sec 0 1.57 MBytes
[ 5] 3.00-4.00 sec 680 MBytes 5.70 Gbits/sec 2 1.57 MBytes
[ 5] 4.00-5.00 sec 405 MBytes 3.39 Gbits/sec 0 1.57 MBytes
[ 5] 5.00-6.00 sec 389 MBytes 3.27 Gbits/sec 0 1.57 MBytes
[ 5] 6.00-7.00 sec 388 MBytes 3.25 Gbits/sec 0 1.57 MBytes
[ 5] 7.00-8.00 sec 380 MBytes 3.19 Gbits/sec 0 1.57 MBytes
[ 5] 8.00-9.00 sec 372 MBytes 3.12 Gbits/sec 0 1.57 MBytes
[ 5] 9.00-10.00 sec 415 MBytes 3.48 Gbits/sec 0 1.57 MBytes
[ 5] 10.00-11.00 sec 401 MBytes 3.37 Gbits/sec 0 1.57 MBytes
[ 5] 11.00-12.00 sec 349 MBytes 2.93 Gbits/sec 0 1.57 MBytes
[ 5] 12.00-13.00 sec 354 MBytes 2.97 Gbits/sec 0 1.57 MBytes
[ 5] 13.00-14.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 14.00-15.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 15.00-16.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 16.00-17.00 sec 382 MBytes 3.21 Gbits/sec 0 1.57 MBytes
[ 5] 17.00-18.00 sec 379 MBytes 3.18 Gbits/sec 0 1.57 MBytes
[ 5] 18.00-19.00 sec 319 MBytes 2.68 Gbits/sec 0 1.57 MBytes
[ 5] 19.00-20.00 sec 299 MBytes 2.50 Gbits/sec 0 1.57 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 9.44 GBytes 4.06 Gbits/sec 25 sender
Sent 9.44 GByte / 10.0 GByte (94%) of /root/10gfile.iperf
[ 5] 0.00-20.00 sec 9.44 GBytes 4.05 Gbits/sec receiver
THAT’s the slowdown I was seeing, although not as bad as with Ubuntu in the mix.
But it changes if I don’t write it to the disk on the iperf server side:
On nas-1:
root@nas-1[~]# iperf3 -s
On vm-1:
root@vm-1[~]# iperf3 -F /root/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 46922 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 975 MBytes 8.17 Gbits/sec 45 1.56 MBytes
[ 5] 1.00-2.00 sec 1.07 GBytes 9.22 Gbits/sec 2 1.56 MBytes
[ 5] 2.00-3.00 sec 975 MBytes 8.18 Gbits/sec 0 1.56 MBytes
[ 5] 3.00-4.00 sec 1.03 GBytes 8.85 Gbits/sec 0 1.61 MBytes
[ 5] 4.00-5.00 sec 1.07 GBytes 9.21 Gbits/sec 0 1.61 MBytes
[ 5] 5.00-6.00 sec 974 MBytes 8.16 Gbits/sec 1 1.61 MBytes
[ 5] 6.00-7.00 sec 972 MBytes 8.16 Gbits/sec 2 1.61 MBytes
[ 5] 7.00-8.00 sec 976 MBytes 8.18 Gbits/sec 0 1.61 MBytes
[ 5] 8.00-9.00 sec 1.03 GBytes 8.85 Gbits/sec 0 1.61 MBytes
[ 5] 9.00-9.95 sec 1.04 GBytes 9.34 Gbits/sec 1 1.61 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-9.95 sec 10.0 GBytes 8.63 Gbits/sec 51 sender
Sent 10.0 GByte / 10.0 GByte (100%) of /root/10gfile.iperf
[ 5] 0.00-9.95 sec 10.0 GBytes 8.63 Gbits/sec receiver
Almost wirespeed, but not quite - I assume because it has to read from the disk on vm-1 (vm-1 has a 4TB SAS HDD mirror boot drive).
So, then I wanted to rule out the disk reading and writing, so I created RAM disks on both machines and used those to read from and write to.
mkdir /tmp/ramdisk
chmod 777 /tmp/ramdisk
mount -t tmpfs -o size=20G myramdisk /tmp/ramdisk
On nas-1:
iperf3 -F /tmp/ramdisk/10gfile.iperf2 -s
On vm-1:
root@vm-1[~]# iperf3 -F /tmp/ramdisk/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 41210 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.09 GBytes 9.33 Gbits/sec 20 1.54 MBytes
[ 5] 1.00-2.00 sec 1.12 GBytes 9.62 Gbits/sec 1 1.54 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.56 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 5 1.56 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.56 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.60 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 5 1.60 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.60 MBytes
[ 5] 8.00-8.95 sec 899 MBytes 7.97 Gbits/sec 0 1.60 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-8.95 sec 10.0 GBytes 9.60 Gbits/sec 32 sender
Sent 10.0 GByte / 10.0 GByte (100%) of /tmp/ramdisk/10gfile.iperf
[ 5] 0.00-8.95 sec 10.0 GBytes 9.60 Gbits/sec receiver
So, basically a wirespeed transfer.
I think I’m happy with the networking now. At least it’s making sense.
Now I can move back to NFS performance testing.
Yeah. I’m a newb when it comes to fio and this storage stuff in general. I just grabbed an example fio command line from an Oracle site. I think they were testing database storage, which is probably a totally different set of requirements than VMs.
Definitely eager for suggestions of a test to simulate XCP-ng VM storage over NFS.
No idea how XCP-ng does things, but VMware uses 64K blocks…
I’d use that with a qd of maybe 4 and 16 jobs to simulate 16 vms.
Check with a read write ratio of maybe 70:30 (or vice versa depending on how much activity there really is on the VMs).
You can also try to see the difference between random and streaming activity to get a feel for it.
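Something along these lines as a starting point (the path, size, and exact mix are placeholders to adjust):
# 16 "VMs" doing 64k random I/O at a modest queue depth, 70% reads / 30% writes
fio --name=vm-sim --directory=/mnt/pool1/standard --size=10G --direct=1 --rw=randrw \
  --rwmixread=70 --bs=64k --ioengine=libaio --iodepth=4 --numjobs=16 \
  --runtime=45 --time_based --group_reporting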
But the best test is to simply move your VMs on it and just see how it works. At least a few test VMs. Just make sure you can go back;)
Thanks. Definitely doing more tests. Waiting for some large files to copy on the servers being tested, but I’ll post more results. Looks like I get the most IOPS and throughput (bandwidth) from bs=3k. I started at 128k and kept halving the block size until IOPS and bandwidth suffered. The number of jobs (tried 8 and 16) changed the bandwidth and IOPS but didn’t seem to move the 3k sweet spot. Need to be more scientific about it tho.
I don’t understand iodepth enough to specify a value, so I just took it off. Maybe that makes my results useless, I dunno. Something about having a bunch of I/O transactions out there so the OS can pick the most convenient one. I’ve never seen an iodepth setting outside of benchmarking software, so I don’t know if it’s something I can control.
iodepth is the number of operations that are waiting to be processed by a disk.
You basically hand the disk a stack of papers to work through instead of one at a time, which reduces overhead and keeps the disk from idling while it waits for the scheduler to hand it new work.
For maxing out disks it’s a good thing, but not necessarily for an accurate representation of your workload.
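If you want to see the effect in isolation, run the same job at two depths and compare - just a sketch, and the target path is a placeholder:
# identical 64k random-read jobs, queue depth 1 vs 16
fio --name=qd1 --directory=/mnt/pool1/standard --size=4G --direct=1 --rw=randread \
  --bs=64k --ioengine=libaio --iodepth=1 --runtime=30 --time_based
fio --name=qd16 --directory=/mnt/pool1/standard --size=4G --direct=1 --rw=randread \
  --bs=64k --ioengine=libaio --iodepth=16 --runtime=30 --time_based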
3k or 4k? 3k = 3072 bytes would be a very odd peak.
No idea if ashift is still a thing nowadays - having the proper value was all the rage a couple of years ago, but I haven’t followed things. Maybe some of the other guys have more recent info on that. Just mentioning it because of this weird peak-performance blocksize…