Hmmmm. Not following the relationship of PLP with SLOG write speed. We’re talking about PLP = Power Loss Protection, right?
Correct. When a sync write comes into ZFS, ZFS won’t reply until the underlying device has committed the write to non-volatile storage - enforced by sending the SCSI SYNCHRONIZE_CACHE command to the SLOG device.
A consumer SSD without PLP - and a volatile write cache - will get this command and have to actually program the NAND cells with the requested data before responding OK.
An enterprise SSD with PLP knows that its onboard supercapacitors have enough power to flush its volatile cache to NAND, so it will simply reply OK immediately and begin programming the cells - even if power is cut immediately after the OK is sent, there’s enough power stored in the drive itself to complete the programming process.
The difference between the two can be immense - we’re talking about the difference between “hundreds of megabytes per second” and “single digits, maybe a dozen MB/s” at the smaller record sizes.
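If you want to see that on your own hardware, a queue-depth-1 sync-write run against the candidate SLOG device shows it clearly. This is just a sketch - the device path is an assumption, and writing to a raw device destroys whatever is on it:
# QD1 sync writes, roughly the pattern the ZIL puts on a SLOG device
# WARNING: destructive; /dev/nvme0n1 is a placeholder scratch device
fio --name=slog-sync --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write \
  --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based
A PLP drive shrugs this off; a consumer drive without PLP is where the collapse into single-digit MB/s shows up.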
OK. In my testing, I’m seeing hundreds of MB/sec (~350 MB/s?) from my Optane 900p without “real” PLP.
Optane drives hold so little in-flight data that they all have PLP in practice, although only the more expensive “DC” variants are officially specified with the feature. For home use, consumer Optane 900p/905p drives do qualify as a valid, PLP-enabled SLOG. For business use, play by the professional book and get a DC.
That is my understanding too. Sounds like the PLP rating on the commercial drives is mostly about compliance, though I’m sure “real” PLP does add some protection. Whether it’s a practical amount of additional protection is questionable - I just don’t know. But there does seem to be a substantial difference between a 900p and a standard SSD, with the 900p substantially closer to the enterprise-PLP end of the spectrum.
OK. Getting some test results from these two servers and wanting to sanity check them - particularly the NFS performance, which seems abysmal.
TrueNAS Server Specs:
- Dell R720xd
- 256G ECC RAM
- 2x E5-2680 V2 2.8GHz 20/40 cores/threads total
- LSI SAS2308 (Dell H710p Mini Mono in IT mode) running at PCIe 3.0 speeds (x8 I think)
- 12x HGST 4TB 7.2k SAS, 512 native sectors
- Intel Optane 900p PCIe card
- Intel X520 dual port 10G NIC
- Intel I350 dual port 1G NIC
- TrueNAS DragonFish 24.04
- ZFS pool is 2x 6-drive raidz2 vdevs with an Optane SLOG.
Test Client Specs:
- Dell R720
- 256G ECC RAM
- 2x E5-2640 v2 2GHz 16/32 cores/threads total
- LSI SAS2308 (Dell H710 Mini Mono in IT mode) running at PCIe 3.0 speeds (x8 I think)
- 8x HGST 4TB 7.2k SAS, 512 native sectors (not used); boots off a couple of 256G software-RAIDed SATA M.2 SSDs
- Intel X520 dual port 10G NIC
- Intel I350 dual port 1G NIC
- Ubuntu Server 22.04.4
The servers are connected via a 10G LAG/bond that reliably iperfs at about 19.5 Gbps throughput. No switch, just DAC cables. MTU 9000 on everything; that seems stable and did increase the iperf scores a bit. Tried 1500 and 9000 and saw no discernible difference in NFS performance. Pretty happy with the networking, although throughput does fluctuate sometimes.
So, one would think that NFS performance would be somewhat close to what I see on the server itself.
On the pool, I made 3 datasets with the 3 sync levels (creation commands sketched below):
- standard: sync=standard,
- sync: sync=always, and
- async: sync=disabled.
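The creation commands, roughly (dataset names are just what I picked; the pool is pool1, as in the mount path below):
zfs create -o sync=standard pool1/standard
zfs create -o sync=always pool1/sync
zfs create -o sync=disabled pool1/async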
Then I ran fio for random read/write tests in each of those datasets. Here’s the fio command I used:
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1
Here’s a summary of the results:
- standard:
- read: IOPS=2551, BW=638MiB/s (669MB/s)
- write: IOPS=2549, BW=637MiB/s (668MB/s)
- sync:
- read: IOPS=796, BW=199MiB/s (209MB/s)
- write: IOPS=807, BW=202MiB/s (212MB/s)
- async:
- read: IOPS=4596, BW=1149MiB/s (1205MB/s)
- write: IOPS=4589, BW=1147MiB/s (1203MB/s)
Here’s the summary of the same fio command run from the NFS client against the sync=standard dataset across that 2x10G bonded link:
- nfs:
- read: IOPS=409, BW=102MiB/s (107MB/s)
- write: IOPS=416, BW=104MiB/s (109MB/s)
- nfs mount command:
mount -t nfs4 -o proto=tcp,hard,intr,rw,noatime 10.8.8.10:/mnt/pool1 /mnt/pool1
Doesn’t the NFS result seem pretty lame? Less than 20% of what I saw on the server! Definitely doesn’t seem like network or ZFS performance. Feels like something on the client side. Tried various NFS rsize/wsize settings on the mount command.
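For reference, the variations were along these lines - the exact rsize/wsize values here are just examples, and nconnect is one more knob worth noting (it needs a reasonably recent client kernel, which 22.04 should have):
mount -t nfs4 -o proto=tcp,hard,rw,noatime,rsize=1048576,wsize=1048576,nconnect=8 \
  10.8.8.10:/mnt/pool1 /mnt/pool1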
I also did a local rsync on the client of a 10 GB file from the local file system to the NFS-mounted standard dataset and got about 180 MB/s. During the transfer it reports ~560 MB/s, but I guess it has to sync at the end. Of course this is a synchronous write, but 180 MB/s still seems lame. Feels like some kind of client NFS bottleneck, because iperf is great and server-side ZFS performance is much better.
UPDATE: I scp’d the 10 GB file from client to server and only got about 115 MB/s. Slow!
UPDATE 2: I transferred the 10 GB file with iperf and it starts out fast, but degrades. I’ve seen that in my fio NFS testing too.
root@vm-1:~# iperf3 -F ./10gfile -c 10.8.8.10
Connecting to host 10.8.8.10, port 5201
[ 5] local 10.8.8.11 port 38384 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.08 GBytes 9.28 Gbits/sec 0 1.71 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.71 MBytes
[ 5] 2.00-3.00 sec 929 MBytes 7.79 Gbits/sec 0 1.79 MBytes
[ 5] 3.00-4.00 sec 421 MBytes 3.53 Gbits/sec 0 1.79 MBytes
[ 5] 4.00-5.00 sec 371 MBytes 3.12 Gbits/sec 0 1.79 MBytes
[ 5] 5.00-6.00 sec 368 MBytes 3.08 Gbits/sec 0 1.79 MBytes
[ 5] 6.00-7.00 sec 361 MBytes 3.03 Gbits/sec 0 1.79 MBytes
[ 5] 7.00-8.00 sec 355 MBytes 2.98 Gbits/sec 0 1.79 MBytes
[ 5] 8.00-9.00 sec 348 MBytes 2.91 Gbits/sec 0 1.79 MBytes
[ 5] 9.00-10.00 sec 348 MBytes 2.92 Gbits/sec 0 1.79 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.65 GBytes 4.85 Gbits/sec 0 sender
Sent 5.65 GByte / 10.0 GByte (56%) of ./10gfile
[ 5] 0.00-10.00 sec 5.64 GBytes 4.85 Gbits/sec receiver
But something’s rotten in Denmark. Here’s the other direction:
root@truenas[~]# iperf3 -F ./10gfile.iperf -c 10.8.8.11
Connecting to host 10.8.8.11, port 5201
[ 5] local 10.8.8.10 port 59954 connected to 10.8.8.11 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 57.9 MBytes 485 Mbits/sec 1 1.09 MBytes
[ 5] 1.00-2.00 sec 53.8 MBytes 451 Mbits/sec 2 1.09 MBytes
[ 5] 2.00-3.00 sec 52.5 MBytes 440 Mbits/sec 1 1.09 MBytes
[ 5] 3.00-4.00 sec 53.8 MBytes 451 Mbits/sec 0 1.09 MBytes
[ 5] 4.00-5.00 sec 52.5 MBytes 440 Mbits/sec 1 1.09 MBytes
[ 5] 5.00-6.00 sec 51.2 MBytes 430 Mbits/sec 1 1.09 MBytes
[ 5] 6.00-7.00 sec 51.2 MBytes 430 Mbits/sec 0 1.09 MBytes
[ 5] 7.00-8.00 sec 47.5 MBytes 398 Mbits/sec 2 1.09 MBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 4 559 KBytes
[ 5] 9.00-10.00 sec 1.25 MBytes 10.5 Mbits/sec 2 559 KBytes
[ 5] 10.00-11.00 sec 0.00 Bytes 0.00 bits/sec 3 271 KBytes
[ 5] 11.00-12.00 sec 0.00 Bytes 0.00 bits/sec 2 419 KBytes
[ 5] 12.00-13.00 sec 1.25 MBytes 10.5 Mbits/sec 2 419 KBytes
[ 5] 13.00-14.00 sec 0.00 Bytes 0.00 bits/sec 1 419 KBytes
[ 5] 14.00-15.00 sec 0.00 Bytes 0.00 bits/sec 0 210 KBytes
[ 5] 15.00-16.00 sec 0.00 Bytes 0.00 bits/sec 0 184 KBytes
UPDATE 3: Got rid of the LAG/bond. Just one 10G port now. It iperfs rock solid at 9.9 Gbps, but the exact same thing happens when I use iperf to transfer file data. I have a pair of different X520 10G cards I could try… Maybe one is bad?
Don’t have a clue. Any ideas? Something overheating?
Long fio output follows.
On TrueNAS server, standard dataset sync=standard:
random-read: (groupid=0, jobs=8): err= 0: pid=2425253: Fri May 3 13:19:56 2024
read: IOPS=2551, BW=638MiB/s (669MB/s)(28.1GiB/45061msec)
slat (usec): min=60, max=520533, avg=2879.46, stdev=17868.62
clat (usec): min=5, max=3694.3k, avg=99236.27, stdev=329789.34
lat (usec): min=158, max=3774.5k, avg=102115.74, stdev=339367.08
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 13], 50.00th=[ 15], 60.00th=[ 17],
| 70.00th=[ 21], 80.00th=[ 27], 90.00th=[ 53], 95.00th=[ 718],
| 99.00th=[ 1871], 99.50th=[ 2198], 99.90th=[ 2802], 99.95th=[ 3071],
| 99.99th=[ 3272]
bw ( KiB/s): min=72841, max=5504143, per=98.53%, avg=643505.08, stdev=105843.06, samples=711
iops : min= 284, max=21497, avg=2513.28, stdev=413.42, samples=711
write: IOPS=2549, BW=637MiB/s (668MB/s)(28.0GiB/45061msec); 0 zone resets
slat (usec): min=81, max=12977, avg=243.28, stdev=362.90
clat (usec): min=5, max=3502.7k, avg=98209.99, stdev=325182.38
lat (usec): min=128, max=3503.1k, avg=98453.27, stdev=325203.24
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 13], 50.00th=[ 15], 60.00th=[ 17],
| 70.00th=[ 21], 80.00th=[ 27], 90.00th=[ 53], 95.00th=[ 718],
| 99.00th=[ 1871], 99.50th=[ 2165], 99.90th=[ 2735], 99.95th=[ 2970],
| 99.99th=[ 3339]
bw ( KiB/s): min=78485, max=5527477, per=98.64%, avg=643852.04, stdev=105950.41, samples=706
iops : min= 306, max=21588, avg=2514.67, stdev=413.83, samples=706
lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=20.95%, 20=48.40%, 50=20.26%
lat (msec) : 100=2.39%, 250=0.47%, 500=0.73%, 750=2.03%, 1000=1.54%
lat (msec) : 2000=2.45%, >=2000=0.78%
cpu : usr=0.95%, sys=14.22%, ctx=37616, majf=0, minf=49203
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=114954,114893,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=28.1GiB (30.1GB), run=45061-45061msec
WRITE: bw=637MiB/s (668MB/s), 637MiB/s-637MiB/s (668MB/s-668MB/s), io=28.0GiB (30.1GB), run=45061-45061msec
On TrueNAS server, sync dataset sync=always:
random-read: (groupid=0, jobs=8): err= 0: pid=2427906: Fri May 3 13:21:20 2024
read: IOPS=796, BW=199MiB/s (209MB/s)(8967MiB/45014msec)
slat (usec): min=80, max=331218, avg=5349.10, stdev=10130.80
clat (usec): min=8, max=972879, avg=312242.91, stdev=92485.24
lat (msec): min=8, max=985, avg=317.59, stdev=94.83
clat percentiles (msec):
| 1.00th=[ 146], 5.00th=[ 203], 10.00th=[ 236], 20.00th=[ 262],
| 30.00th=[ 279], 40.00th=[ 288], 50.00th=[ 300], 60.00th=[ 309],
| 70.00th=[ 321], 80.00th=[ 338], 90.00th=[ 397], 95.00th=[ 472],
| 99.00th=[ 726], 99.50th=[ 776], 99.90th=[ 885], 99.95th=[ 911],
| 99.99th=[ 961]
bw ( KiB/s): min=62783, max=338432, per=99.45%, avg=202845.84, stdev=6429.46, samples=712
iops : min= 245, max= 1322, avg=791.83, stdev=25.09, samples=712
write: IOPS=807, BW=202MiB/s (212MB/s)(9086MiB/45014msec); 0 zone resets
slat (usec): min=702, max=20222, avg=4596.19, stdev=1917.04
clat (usec): min=8, max=978503, avg=313615.08, stdev=93091.16
lat (msec): min=3, max=983, avg=318.21, stdev=92.85
clat percentiles (msec):
| 1.00th=[ 144], 5.00th=[ 207], 10.00th=[ 239], 20.00th=[ 264],
| 30.00th=[ 279], 40.00th=[ 292], 50.00th=[ 300], 60.00th=[ 313],
| 70.00th=[ 321], 80.00th=[ 338], 90.00th=[ 401], 95.00th=[ 481],
| 99.00th=[ 726], 99.50th=[ 785], 99.90th=[ 877], 99.95th=[ 911],
| 99.99th=[ 961]
bw ( KiB/s): min=56832, max=322125, per=99.63%, avg=205913.07, stdev=6192.60, samples=712
iops : min= 222, max= 1258, avg=803.79, stdev=24.18, samples=712
lat (usec) : 10=0.01%, 20=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.07%, 100=0.10%
lat (msec) : 250=14.07%, 500=81.41%, 750=3.58%, 1000=0.73%
cpu : usr=0.54%, sys=7.04%, ctx=110618, majf=0, minf=7705
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=35866,36342,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=199MiB/s (209MB/s), 199MiB/s-199MiB/s (209MB/s-209MB/s), io=8967MiB (9402MB), run=45014-45014msec
WRITE: bw=202MiB/s (212MB/s), 202MiB/s-202MiB/s (212MB/s-212MB/s), io=9086MiB (9527MB), run=45014-45014msec
On TrueNAS server, async dataset sync=disabled:
random-read: (groupid=0, jobs=8): err= 0: pid=2420660: Fri May 3 13:17:34 2024
read: IOPS=4596, BW=1149MiB/s (1205MB/s)(50.5GiB/45013msec)
slat (usec): min=49, max=317261, avg=1588.74, stdev=10755.39
clat (usec): min=4, max=968570, avg=54624.69, stdev=82414.62
lat (usec): min=94, max=982937, avg=56213.43, stdev=84275.58
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 10], 40.00th=[ 18], 50.00th=[ 30], 60.00th=[ 42],
| 70.00th=[ 55], 80.00th=[ 72], 90.00th=[ 111], 95.00th=[ 243],
| 99.00th=[ 359], 99.50th=[ 514], 99.90th=[ 760], 99.95th=[ 818],
| 99.99th=[ 919]
bw ( MiB/s): min= 260, max= 3937, per=99.12%, avg=1138.90, stdev=177.27, samples=712
iops : min= 1040, max=15747, avg=4554.35, stdev=709.08, samples=712
write: IOPS=4589, BW=1147MiB/s (1203MB/s)(50.4GiB/45013msec); 0 zone resets
slat (usec): min=48, max=14383, avg=137.98, stdev=95.76
clat (usec): min=4, max=966159, avg=54952.42, stdev=82926.17
lat (usec): min=111, max=966369, avg=55090.40, stdev=82932.57
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 18], 50.00th=[ 31], 60.00th=[ 42],
| 70.00th=[ 55], 80.00th=[ 72], 90.00th=[ 111], 95.00th=[ 243],
| 99.00th=[ 363], 99.50th=[ 542], 99.90th=[ 768], 99.95th=[ 818],
| 99.99th=[ 877]
bw ( MiB/s): min= 273, max= 3887, per=99.10%, avg=1137.00, stdev=175.85, samples=712
iops : min= 1092, max=15547, avg=4546.76, stdev=703.37, samples=712
lat (usec) : 10=0.01%, 20=0.01%, 100=0.01%, 250=0.01%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=31.92%, 20=9.29%, 50=25.30%
lat (msec) : 100=21.64%, 250=7.15%, 500=4.16%, 750=0.40%, 1000=0.12%
cpu : usr=1.83%, sys=15.90%, ctx=15978, majf=0, minf=25909
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=206886,206570,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1149MiB/s (1205MB/s), 1149MiB/s-1149MiB/s (1205MB/s-1205MB/s), io=50.5GiB (54.2GB), run=45013-45013msec
WRITE: bw=1147MiB/s (1203MB/s), 1147MiB/s-1147MiB/s (1203MB/s-1203MB/s), io=50.4GiB (54.2GB), run=45013-45013msec
On NFS client, standard dataset sync=standard:
random-read: (groupid=0, jobs=8): err= 0: pid=2823: Fri May 3 18:24:34 2024
read: IOPS=409, BW=102MiB/s (107MB/s)(4684MiB/45722msec)
slat (usec): min=22, max=901, avg=71.64, stdev=14.62
clat (msec): min=2, max=1661, avg=648.52, stdev=188.88
lat (msec): min=2, max=1662, avg=648.60, stdev=188.88
clat percentiles (msec):
| 1.00th=[ 161], 5.00th=[ 451], 10.00th=[ 493], 20.00th=[ 531],
| 30.00th=[ 558], 40.00th=[ 584], 50.00th=[ 609], 60.00th=[ 642],
| 70.00th=[ 676], 80.00th=[ 735], 90.00th=[ 944], 95.00th=[ 1045],
| 99.00th=[ 1217], 99.50th=[ 1301], 99.90th=[ 1418], 99.95th=[ 1452],
| 99.99th=[ 1620]
bw ( KiB/s): min=18944, max=246162, per=100.00%, avg=105220.03, stdev=3986.11, samples=717
iops : min= 74, max= 959, avg=410.83, stdev=15.55, samples=717
write: IOPS=416, BW=104MiB/s (109MB/s)(4759MiB/45722msec); 0 zone resets
slat (usec): min=27, max=473, avg=88.65, stdev=15.87
clat (msec): min=4, max=1228, avg=581.48, stdev=164.22
lat (msec): min=4, max=1228, avg=581.57, stdev=164.23
clat percentiles (msec):
| 1.00th=[ 124], 5.00th=[ 430], 10.00th=[ 464], 20.00th=[ 493],
| 30.00th=[ 514], 40.00th=[ 535], 50.00th=[ 550], 60.00th=[ 575],
| 70.00th=[ 600], 80.00th=[ 634], 90.00th=[ 860], 95.00th=[ 978],
| 99.00th=[ 1070], 99.50th=[ 1133], 99.90th=[ 1200], 99.95th=[ 1200],
| 99.99th=[ 1217]
bw ( KiB/s): min=16253, max=260076, per=100.00%, avg=106890.97, stdev=4256.59, samples=718
iops : min= 63, max= 1013, avg=417.34, stdev=16.60, samples=718
lat (msec) : 4=0.01%, 10=0.06%, 20=0.07%, 50=0.19%, 100=0.30%
lat (msec) : 250=1.78%, 500=14.90%, 750=68.13%, 1000=8.89%, 2000=5.67%
cpu : usr=0.29%, sys=1.55%, ctx=37526, majf=0, minf=236
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=0.3%, 32=0.7%, >=64=98.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=18736,19035,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=4684MiB (4912MB), run=45722-45722msec
WRITE: bw=104MiB/s (109MB/s), 104MiB/s-104MiB/s (109MB/s-109MB/s), io=4759MiB (4990MB), run=45722-45722msec
I’ve never used Optane drives personally, but the sync performance looks normal to me if I compare it to consumer SSD stuff… well, better really - consumer SSDs can go as low as 200 IOPS or worse. Furthermore, less than 20% sounds about right for fsync performance. That’s also what I’ve observed with my enterprise SSDs, though the base numbers are much higher.
For comparison, here are my numbers for Intel DC S-3500:
fsync/s (always) = 4024.87
fsync/s (disabled) = 28248.21
Notice that even for enterprise-level SSDs, sync writes in general incur a really heavy performance cost. BTW, this is “just a lowly” SATA SSD, which is also at 60%+ wearout. I’d imagine NVMe numbers would be more impressive, but I’m too cheap for that. But this is why I always prefer enterprise SATA SSDs for server workloads over even consumer NVMe ones.
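If you want a comparable fsync-rate number out of fio (just a sketch, not necessarily the tool that produced the figures above, and the target directory is a placeholder), sync after every write and read the write IOPS line as roughly fsync/s:
# 4k writes with an fdatasync after each one; reported write IOPS ~ fsync/s
fio --name=fsync-rate --directory=/mnt/pool1/standard --size=1G --rw=write --bs=4k \
  --ioengine=psync --fdatasync=1 --runtime=30 --time_based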
Which is also why one should not follow the last piece of “advice” from that Proxmox thread and “re-enable drive cache” to magically make sync writes faster.
Is the client ESXi?
To be fair, it does indeed make things faster, and some people don’t mind the risk - it’s a workable solution depending on your risk tolerance. Personally, I’m not a risk taker, but everyone’s different.
Just disable sync if you don’t want sync.
Of course. That just makes it obvious what you’ve done, as opposed to an invisible per-drive setting.
Nope. Just Ubuntu 22.04.4. Will be xcp-ng, eventually. Just trying to get the hardware wrung out.
And you’ve tested the network perf with iperf3 between the client and server?
Run iperf3 -s in the shell to start the server on TrueNAS.
I think the test just shreds the system more than the disks. This is sync=never on a pool of 2 mirrors of 960 GB Optane 905P drives with a Micron NVDIMM SLOG.
This system is in production doing other things tho.
Sync Never
root@prod[/mnt/optane_vm/fio]# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 8 processes
Jobs: 8 (f=8): [m(8)][6.5%][r=1499MiB/s,w=1498MiB/s][r=5994,w=5990 IOPS][eta 00m:43s]
Jobs: 8 (f=8): [m(8)][8.7%][r=1508MiB/s,w=1529MiB/s][r=6032,w=6115 IOPS][eta 00mJobs: 8 (f=8): [m(8)][10.9%][r=1106MiB/s,w=1115MiB/s][r=4425,w=4461 IOPS][eta 00m:41s]
Jobs: 8 (f=8): [m(8)][13.0%][r=1055MiB/s,w=1043MiB/s][r=4221,w=4170 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][15.2%][r=1028MiB/s,w=1049MiB/s][r=4112,w=4196 IOPS][eta 00Jobs: 8 (f=8): [m(8)][17.4%][r=1124MiB/s,w=1092MiB/s][r=4496,w=4366 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=699MiB/s,w=682MiB/s][r=2796,w=2728 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][21.7%][r=467MiB/s,w=460MiB/s][r=1867,w=1838 IOPS][eta 00m:36s]
Jobs: 8 (f=8): [m(8)][23.9%][r=368MiB/s,w=381MiB/s][r=1471,w=1525 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][26.1%][r=352MiB/s,w=353MiB/s][r=1409,w=1412 IOPS][eta 00m:34s]
Jobs: 8 (f=8): [m(8)][28.3%][r=344MiB/s,w=332MiB/s][r=1377,w=1329 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][30.4%][r=342MiB/s,w=326MiB/s][r=1369,w=1302 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][32.6%][r=339MiB/s,w=341MiB/s][r=1355,w=1362 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][34.8%][r=335MiB/s,w=334MiB/s][r=1341,w=1335 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][37.0%][r=335MiB/s,w=334MiB/s][r=1339,w=1335 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][39.1%][r=313MiB/s,w=330MiB/s][r=1250,w=1320 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][41.3%][r=327MiB/s,w=306MiB/s][r=1309,w=1225 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][43.5%][r=337MiB/s,w=333MiB/s][r=1347,w=1331 IOPS][eta 00m:26s]
Jobs: 8 (f=8): [m(8)][45.7%][r=328MiB/s,w=339MiB/s][r=1310,w=1356 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=359MiB/s,w=328MiB/s][r=1435,w=1310 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][50.0%][r=327MiB/s,w=327MiB/s][r=1308,w=1309 IOPS][eta 00m:23s]
Jobs: 8 (f=8): [m(8)][52.2%][r=320MiB/s,w=327MiB/s][r=1278,w=1307 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][54.3%][r=330MiB/s,w=346MiB/s][r=1321,w=1385 IOPS][eta 00m:21s]
Jobs: 8 (f=8): [m(8)][56.5%][r=352MiB/s,w=322MiB/s][r=1406,w=1289 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][58.7%][r=344MiB/s,w=339MiB/s][r=1374,w=1356 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][60.9%][r=345MiB/s,w=335MiB/s][r=1380,w=1339 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][63.0%][r=328MiB/s,w=323MiB/s][r=1313,w=1291 IOPS][eta 00m:17s]
Jobs: 8 (f=8): [m(8)][65.2%][r=326MiB/s,w=352MiB/s][r=1303,w=1409 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][67.4%][r=336MiB/s,w=334MiB/s][r=1344,w=1335 IOPS][eta 00m:15s]
Jobs: 8 (f=8): [m(8)][69.6%][r=346MiB/s,w=337MiB/s][r=1385,w=1346 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][71.7%][r=326MiB/s,w=336MiB/s][r=1303,w=1342 IOPS][eta 00m:13s]
Jobs: 8 (f=8): [m(8)][73.9%][r=337MiB/s,w=336MiB/s][r=1348,w=1344 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][76.1%][r=343MiB/s,w=327MiB/s][r=1370,w=1306 IOPS][eta 00m:11s]
Jobs: 8 (f=8): [m(8)][78.3%][r=348MiB/s,w=342MiB/s][r=1393,w=1369 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][80.4%][r=367MiB/s,w=378MiB/s][r=1469,w=1510 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.6%][r=335MiB/s,w=327MiB/s][r=1338,w=1307 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][84.8%][r=305MiB/s,w=292MiB/s][r=1221,w=1169 IOPS][eta 00m:07s]
Jobs: 8 (f=8): [m(8)][87.0%][r=297MiB/s,w=287MiB/s][r=1186,w=1147 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][89.1%][r=312MiB/s,w=322MiB/s][r=1247,w=1289 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][91.3%][r=296MiB/s,w=307MiB/s][r=1182,w=1229 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.5%][r=325MiB/s,w=313MiB/s][r=1300,w=1252 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][97.8%][r=299MiB/s,w=317MiB/s][r=1195,w=1266 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=302MiB/s,w=326MiB/s][r=1209,w=1303 IOPS][eta 00mJobs: 8 (f=0): [f(8)][100.0%][r=378MiB/s,w=377MiB/s][r=1511,w=1509 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1352671: Sat May 4 01:13:13 2024
read: IOPS=2085, BW=521MiB/s (547MB/s)(22.9GiB/45003msec)
slat (usec): min=155, max=412127, avg=1666.49, stdev=3125.01
clat (usec): min=7, max=1065.9k, avg=120696.94, stdev=84191.89
lat (msec): min=3, max=1070, avg=122.36, stdev=85.21
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 19],
| 30.00th=[ 51], 40.00th=[ 81], 50.00th=[ 150], 60.00th=[ 174],
| 70.00th=[ 186], 80.00th=[ 194], 90.00th=[ 207], 95.00th=[ 218],
| 99.00th=[ 241], 99.50th=[ 284], 99.90th=[ 978], 99.95th=[ 1011],
| 99.99th=[ 1053]
bw ( KiB/s): min=215790, max=4558510, per=98.98%, avg=528328.35, stdev=74239.19, samples=710
iops : min= 839, max=17802, avg=2062.44, stdev=289.97, samples=710
write: IOPS=2085, BW=521MiB/s (547MB/s)(22.9GiB/45003msec); 0 zone resets
slat (usec): min=115, max=288020, avg=2150.51, stdev=2780.52
clat (usec): min=7, max=1069.5k, avg=120935.42, stdev=84379.07
lat (msec): min=2, max=1075, avg=123.09, stdev=85.77
clat percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 13], 10.00th=[ 14], 20.00th=[ 19],
| 30.00th=[ 50], 40.00th=[ 80], 50.00th=[ 153], 60.00th=[ 176],
| 70.00th=[ 186], 80.00th=[ 197], 90.00th=[ 209], 95.00th=[ 218],
| 99.00th=[ 243], 99.50th=[ 271], 99.90th=[ 634], 99.95th=[ 1011],
| 99.99th=[ 1053]
bw ( KiB/s): min=155833, max=4676081, per=98.67%, avg=526936.13, stdev=75387.78, samples=712
iops : min= 606, max=18262, avg=2056.96, stdev=294.47, samples=712
lat (usec) : 10=0.01%, 20=0.01%, 250=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=20.80%, 50=9.13%, 100=13.64%
lat (msec) : 250=55.65%, 500=0.63%, 750=0.04%, 1000=0.03%, 2000=0.07%
cpu : usr=0.86%, sys=10.46%, ctx=186809, majf=0, minf=79934
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=93835,93876,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=521MiB/s (547MB/s), 521MiB/s-521MiB/s (547MB/s-547MB/s), io=22.9GiB (24.6GB), run=45003-45003msec
WRITE: bw=521MiB/s (547MB/s), 521MiB/s-521MiB/s (547MB/s-547MB/s), io=22.9GiB (24.6GB), run=45003-45003msec
Sync Always:
root@prod[/mnt/optane_vm/fio]# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 8 processes
Jobs: 8 (f=8): [m(8)][8.7%][r=620MiB/s,w=642MiB/s][r=2480,w=2566 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][10.9%][r=618MiB/s,w=611MiB/s][r=2473,w=2444 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][13.0%][r=658MiB/s,w=659MiB/s][r=2632,w=2635 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][15.2%][r=618MiB/s,w=628MiB/s][r=2473,w=2510 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][17.4%][r=478MiB/s,w=497MiB/s][r=1911,w=1986 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=437MiB/s,w=419MiB/s][r=1747,w=1674 IOPS][eta 00m:37s]
Jobs: 8 (f=8): [m(8)][21.7%][r=1813MiB/s,w=1836MiB/s][r=7251,w=7343 IOPS][eta 00Jobs: 8 (f=8): [m(8)][23.9%][r=603MiB/s,w=634MiB/s][r=2413,w=2537 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][26.1%][r=453MiB/s,w=451MiB/s][r=1812,w=1804 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][28.3%][r=535MiB/s,w=501MiB/s][r=2141,w=2002 IOPS][eta 00m:33s]
Jobs: 8 (f=8): [m(8)][30.4%][r=520MiB/s,w=515MiB/s][r=2078,w=2058 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][32.6%][r=500MiB/s,w=505MiB/s][r=2000,w=2021 IOPS][eta 00m:31s]
Jobs: 8 (f=8): [m(8)][34.8%][r=464MiB/s,w=440MiB/s][r=1854,w=1759 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][37.0%][r=513MiB/s,w=517MiB/s][r=2051,w=2066 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][39.1%][r=531MiB/s,w=549MiB/s][r=2124,w=2197 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][41.3%][r=459MiB/s,w=439MiB/s][r=1834,w=1754 IOPS][eta 00m:27s]
Jobs: 8 (f=8): [m(8)][43.5%][r=427MiB/s,w=409MiB/s][r=1706,w=1637 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][45.7%][r=453MiB/s,w=454MiB/s][r=1813,w=1817 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=513MiB/s,w=481MiB/s][r=2050,w=1925 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][50.0%][r=469MiB/s,w=478MiB/s][r=1874,w=1910 IOPS][eta 00m:23s]
Jobs: 8 (f=8): [m(8)][52.2%][r=431MiB/s,w=437MiB/s][r=1722,w=1748 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][54.3%][r=423MiB/s,w=406MiB/s][r=1693,w=1623 IOPS][eta 00m:21s]
Jobs: 8 (f=8): [m(8)][56.5%][r=445MiB/s,w=456MiB/s][r=1780,w=1825 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][58.7%][r=360MiB/s,w=346MiB/s][r=1440,w=1383 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][62.2%][r=328MiB/s,w=331MiB/s][r=1311,w=1324 IOPS][eta 00m:17s]
Jobs: 8 (f=8): [m(8)][64.4%][r=689MiB/s,w=667MiB/s][r=2756,w=2666 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][66.7%][r=382MiB/s,w=364MiB/s][r=1529,w=1456 IOPS][eta 00m:15s]
Jobs: 8 (f=8): [m(8)][68.9%][r=433MiB/s,w=428MiB/s][r=1731,w=1711 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][71.1%][r=387MiB/s,w=398MiB/s][r=1549,w=1593 IOPS][eta 00m:13s]
Jobs: 8 (f=8): [m(8)][73.3%][r=354MiB/s,w=355MiB/s][r=1417,w=1421 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][75.6%][r=328MiB/s,w=321MiB/s][r=1313,w=1283 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][77.8%][r=336MiB/s,w=339MiB/s][r=1342,w=1357 IOPS][eta 00m:10s]
Jobs: 8 (f=8): [m(8)][80.0%][r=318MiB/s,w=313MiB/s][r=1271,w=1252 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.2%][r=364MiB/s,w=355MiB/s][r=1454,w=1421 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][84.4%][r=304MiB/s,w=333MiB/s][r=1217,w=1333 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][86.7%][r=309MiB/s,w=313MiB/s][r=1235,w=1251 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][88.9%][r=344MiB/s,w=337MiB/s][r=1377,w=1349 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][91.1%][r=302MiB/s,w=316MiB/s][r=1208,w=1265 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.3%][r=332MiB/s,w=340MiB/s][r=1327,w=1359 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][95.6%][r=314MiB/s,w=313MiB/s][r=1256,w=1252 IOPS][eta 00m:02s]
Jobs: 8 (f=8): [m(8)][97.8%][r=334MiB/s,w=332MiB/s][r=1336,w=1326 IOPS][eta 00m:Jobs: 8 (f=8): [m(8)][100.0%][r=319MiB/s,w=326MiB/s][r=1274,w=1303 IOPS][eta 00m:00s]
Jobs: 2 (f=0): [f(2),E(2),_(1),E(3)][100.0%][r=348MiB/s,w=347MiB/s][r=1390,w=1388 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1362291: Sat May 4 01:16:39 2024
read: IOPS=2033, BW=508MiB/s (533MB/s)(22.3GiB/45006msec)
slat (usec): min=162, max=192464, avg=1554.48, stdev=1912.72
clat (usec): min=6, max=449191, avg=123960.02, stdev=62939.75
lat (msec): min=4, max=452, avg=125.51, stdev=63.58
clat percentiles (msec):
| 1.00th=[ 26], 5.00th=[ 27], 10.00th=[ 29], 20.00th=[ 69],
| 30.00th=[ 97], 40.00th=[ 111], 50.00th=[ 126], 60.00th=[ 142],
| 70.00th=[ 161], 80.00th=[ 178], 90.00th=[ 197], 95.00th=[ 215],
| 99.00th=[ 305], 99.50th=[ 334], 99.90th=[ 368], 99.95th=[ 384],
| 99.99th=[ 418]
bw ( KiB/s): min=197743, max=2254298, per=100.00%, avg=521300.33, stdev=45736.58, samples=712
iops : min= 766, max= 8805, avg=2035.13, stdev=178.67, samples=712
write: IOPS=2035, BW=509MiB/s (534MB/s)(22.4GiB/45006msec); 0 zone resets
slat (usec): min=279, max=203222, avg=2350.38, stdev=3088.49
clat (usec): min=6, max=440506, avg=123735.29, stdev=63756.01
lat (msec): min=2, max=448, avg=126.09, stdev=64.89
clat percentiles (msec):
| 1.00th=[ 26], 5.00th=[ 27], 10.00th=[ 29], 20.00th=[ 65],
| 30.00th=[ 96], 40.00th=[ 111], 50.00th=[ 125], 60.00th=[ 142],
| 70.00th=[ 161], 80.00th=[ 180], 90.00th=[ 199], 95.00th=[ 218],
| 99.00th=[ 305], 99.50th=[ 338], 99.90th=[ 372], 99.95th=[ 384],
| 99.99th=[ 409]
bw ( KiB/s): min=218234, max=2323916, per=100.00%, avg=521989.43, stdev=47153.31, samples=712
iops : min= 847, max= 9077, avg=2037.83, stdev=184.21, samples=712
lat (usec) : 10=0.01%
lat (msec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=18.96%, 100=13.38%
lat (msec) : 250=65.64%, 500=1.99%
cpu : usr=1.02%, sys=15.01%, ctx=587686, majf=0, minf=111559
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=91500,91597,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=508MiB/s (533MB/s), 508MiB/s-508MiB/s (533MB/s-533MB/s), io=22.3GiB (24.0GB), run=45006-45006msec
WRITE: bw=509MiB/s (534MB/s), 509MiB/s-509MiB/s (534MB/s-534MB/s), io=22.4GiB (24.0GB), run=45006-45006msec
root@prod[/mnt/optane_vm/fio]#
That seems fairly low, especially on writes - are those NVDIMM-P or -N?
Edit - Ah, that’s what you meant when you said it taxes the system more than the drives.
In general the test seems a bit weird as an NFS storage test:
- Are you sure XCP-ng uses a 256K blocksize?
- A QD of 64 seems a bit excessive unless each VM is really going to thrash the pool; 16 should be more than enough. Better to increase workers/numjobs to the rough number of VMs you’re planning to run instead.
In general it is expected that remote tests are significantly worse than local tests; many gray hairs have been caused by this.
My personal hope was always that RDMA would help, but that’s not coming soon, and might not be as beneficial as I hope.
In the meantime, don’t fret about the differences - just try to get it fast enough. I’ll run some tests for more comparison points too.
Edit2
All tests on Core, with posixaio
TNC 13U6, Xeon 1245v6, 64G, 5 x pm863a in Z1, P1600X slog, sync always, local test
READ: bw=712MiB/s (746MB/s), WRITE: bw=712MiB/s (746MB/s)
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=posixaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=posixaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][8.7%][r=587MiB/s,w=630MiB/s][r=2346,w=2519 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][13.0%][r=791MiB/s,w=787MiB/s][r=3165,w=3148 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][17.4%][r=545MiB/s,w=552MiB/s][r=2181,w=2209 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][19.6%][r=730MiB/s,w=748MiB/s][r=2919,w=2990 IOPS][eta 00m:37s]
Jobs: 8 (f=8): [m(8)][23.9%][r=673MiB/s,w=691MiB/s][r=2691,w=2763 IOPS][eta 00m:35s]
Jobs: 8 (f=8): [m(8)][28.3%][r=881MiB/s,w=906MiB/s][r=3525,w=3623 IOPS][eta 00m:33s]
Jobs: 8 (f=8): [m(8)][32.6%][r=551MiB/s,w=500MiB/s][r=2204,w=1999 IOPS][eta 00m:31s]
Jobs: 8 (f=8): [m(8)][37.0%][r=701MiB/s,w=692MiB/s][r=2804,w=2768 IOPS][eta 00m:29s]
Jobs: 8 (f=8): [m(8)][41.3%][r=744MiB/s,w=739MiB/s][r=2977,w=2956 IOPS][eta 00m:27s]
Jobs: 8 (f=8): [m(8)][45.7%][r=674MiB/s,w=652MiB/s][r=2696,w=2608 IOPS][eta 00m:25s]
Jobs: 8 (f=8): [m(8)][47.8%][r=722MiB/s,w=722MiB/s][r=2886,w=2888 IOPS][eta 00m:24s]
Jobs: 8 (f=8): [m(8)][52.2%][r=852MiB/s,w=842MiB/s][r=3407,w=3368 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][56.5%][r=731MiB/s,w=684MiB/s][r=2923,w=2734 IOPS][eta 00m:20s]
Jobs: 8 (f=8): [m(8)][60.9%][r=728MiB/s,w=730MiB/s][r=2912,w=2918 IOPS][eta 00m:18s]
Jobs: 8 (f=8): [m(8)][65.2%][r=794MiB/s,w=816MiB/s][r=3175,w=3264 IOPS][eta 00m:16s]
Jobs: 8 (f=8): [m(8)][69.6%][r=671MiB/s,w=678MiB/s][r=2682,w=2711 IOPS][eta 00m:14s]
Jobs: 8 (f=8): [m(8)][73.9%][r=720MiB/s,w=708MiB/s][r=2880,w=2833 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][78.3%][r=645MiB/s,w=705MiB/s][r=2579,w=2819 IOPS][eta 00m:10s]
Jobs: 8 (f=8): [m(8)][82.6%][r=674MiB/s,w=642MiB/s][r=2695,w=2568 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][87.0%][r=793MiB/s,w=806MiB/s][r=3172,w=3223 IOPS][eta 00m:06s]
Jobs: 8 (f=8): [m(8)][91.3%][r=641MiB/s,w=615MiB/s][r=2565,w=2458 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][93.5%][r=859MiB/s,w=849MiB/s][r=3435,w=3397 IOPS][eta 00m:03s]
Jobs: 8 (f=8): [m(8)][97.8%][r=621MiB/s,w=602MiB/s][r=2483,w=2406 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][79.3%][r=670MiB/s,w=675MiB/s][r=2680,w=2698 IOPS][eta 00m:12s]
random-read: (groupid=0, jobs=8): err= 0: pid=4485: Sat May 4 09:58:30 2024
read: IOPS=2846, BW=712MiB/s (746MB/s)(31.3GiB/45060msec)
slat (nsec): min=434, max=5142.7k, avg=1386.89, stdev=17402.87
clat (msec): min=8, max=548, avg=97.45, stdev=85.20
lat (msec): min=8, max=548, avg=97.45, stdev=85.20
clat percentiles (msec):
| 1.00th=[ 51], 5.00th=[ 57], 10.00th=[ 61], 20.00th=[ 65],
| 30.00th=[ 67], 40.00th=[ 70], 50.00th=[ 72], 60.00th=[ 75],
| 70.00th=[ 81], 80.00th=[ 87], 90.00th=[ 109], 95.00th=[ 376],
| 99.00th=[ 447], 99.50th=[ 489], 99.90th=[ 514], 99.95th=[ 518],
| 99.99th=[ 531]
bw ( KiB/s): min=149713, max=1110304, per=100.00%, avg=731992.00, stdev=31485.74, samples=712
iops : min= 578, max= 4330, avg=2856.36, stdev=123.04, samples=712
write: IOPS=2847, BW=712MiB/s (746MB/s)(31.3GiB/45060msec); 0 zone resets
slat (usec): min=2, max=5304, avg=14.95, stdev=39.01
clat (msec): min=3, max=533, avg=82.13, stdev=86.23
lat (msec): min=3, max=533, avg=82.15, stdev=86.23
clat percentiles (msec):
| 1.00th=[ 44], 5.00th=[ 47], 10.00th=[ 49], 20.00th=[ 52],
| 30.00th=[ 54], 40.00th=[ 56], 50.00th=[ 58], 60.00th=[ 61],
| 70.00th=[ 64], 80.00th=[ 68], 90.00th=[ 79], 95.00th=[ 368],
| 99.00th=[ 443], 99.50th=[ 485], 99.90th=[ 510], 99.95th=[ 514],
| 99.99th=[ 518]
bw ( KiB/s): min=152780, max=1148383, per=100.00%, avg=732064.47, stdev=32959.23, samples=712
iops : min= 590, max= 4479, avg=2856.45, stdev=128.79, samples=712
lat (msec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=8.08%, 100=82.15%
lat (msec) : 250=2.88%, 500=6.66%, 750=0.22%
cpu : usr=0.84%, sys=1.06%, ctx=235503, majf=1, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=1.2%, 32=84.5%, >=64=14.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=93.6%, 8=2.6%, 16=2.7%, 32=1.0%, 64=0.1%, >=64=0.0%
issued rwts: total=128247,128298,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=31.3GiB (33.6GB), run=45060-45060msec
WRITE: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=31.3GiB (33.6GB), run=45060-45060msec
TNC 13U6, Xeon Gold 5317 Virtualized 4Cores, 256G, 6 x pm863a in 3mirror, P5800X slog, sync always, local test
READ: bw=909MiB/s WRITE: bw=910MiB/s (954MB/s)
fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=posixaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=posixaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][6.7%][r=927MiB/s,w=963MiB/s][r=3709,w=3851 IOPS][eta 00m:4 2s]
Jobs: 8 (f=8): [m(8)][8.9%][r=839MiB/s,w=826MiB/s][r=3355,w=3304 IOPS][eta 00m:4 1s]
Jobs: 8 (f=8): [m(8)][11.1%][r=964MiB/s,w=958MiB/s][r=3857,w=3831 IOPS][eta 00m: 40s]
Jobs: 8 (f=8): [m(8)][13.3%][r=907MiB/s,w=913MiB/s][r=3627,w=3653 IOPS][eta 00m: 39s]
Jobs: 8 (f=8): [m(8)][15.6%][r=917MiB/s,w=948MiB/s][r=3669,w=3791 IOPS][eta 00m: 38s]
Jobs: 8 (f=8): [m(8)][17.8%][r=1002MiB/s,w=1001MiB/s][r=4009,w=4004 IOPS][eta 00 m:37s]
Jobs: 8 (f=8): [m(8)][20.0%][r=834MiB/s,w=861MiB/s][r=3335,w=3444 IOPS][eta 00m: 36s]
Jobs: 8 (f=8): [m(8)][22.2%][r=991MiB/s,w=1007MiB/s][r=3964,w=4029 IOPS][eta 00m :35s]
Jobs: 8 (f=8): [m(8)][24.4%][r=931MiB/s,w=894MiB/s][r=3724,w=3577 IOPS][eta 00m: 34s]
Jobs: 8 (f=8): [m(8)][26.7%][r=1002MiB/s,w=982MiB/s][r=4009,w=3926 IOPS][eta 00m :33s]
Jobs: 8 (f=8): [m(8)][28.9%][r=919MiB/s,w=934MiB/s][r=3676,w=3736 IOPS][eta 00m: 32s]
Jobs: 8 (f=8): [m(8)][31.1%][r=983MiB/s,w=960MiB/s][r=3932,w=3841 IOPS][eta 00m: 31s]
Jobs: 8 (f=8): [m(8)][33.3%][r=1049MiB/s,w=1016MiB/s][r=4196,w=4063 IOPS][eta 00 m:30s]
Jobs: 8 (f=8): [m(8)][35.6%][r=824MiB/s,w=834MiB/s][r=3297,w=3335 IOPS][eta 00m: 29s]
Jobs: 8 (f=8): [m(8)][37.8%][r=950MiB/s,w=930MiB/s][r=3800,w=3721 IOPS][eta 00m: 28s]
Jobs: 8 (f=8): [m(8)][40.0%][r=855MiB/s,w=859MiB/s][r=3418,w=3437 IOPS][eta 00m: 27s]
Jobs: 8 (f=8): [m(8)][42.2%][r=931MiB/s,w=892MiB/s][r=3723,w=3567 IOPS][eta 00m: 26s]
Jobs: 8 (f=8): [m(8)][44.4%][r=959MiB/s,w=963MiB/s][r=3836,w=3850 IOPS][eta 00m: 25s]
Jobs: 8 (f=8): [m(8)][46.7%][r=829MiB/s,w=821MiB/s][r=3315,w=3285 IOPS][eta 00m: 24s]
Jobs: 8 (f=8): [m(8)][48.9%][r=853MiB/s,w=837MiB/s][r=3412,w=3346 IOPS][eta 00m: 23s]
Jobs: 8 (f=8): [m(8)][51.1%][r=553MiB/s,w=576MiB/s][r=2210,w=2305 IOPS][eta 00m: 22s]
Jobs: 8 (f=8): [m(8)][53.3%][r=625MiB/s,w=621MiB/s][r=2498,w=2485 IOPS][eta 00m: 21s]
Jobs: 8 (f=8): [m(8)][55.6%][r=905MiB/s,w=935MiB/s][r=3620,w=3739 IOPS][eta 00m: 20s]
Jobs: 8 (f=8): [m(8)][57.8%][r=978MiB/s,w=970MiB/s][r=3911,w=3881 IOPS][eta 00m: 19s]
Jobs: 8 (f=8): [m(8)][60.0%][r=880MiB/s,w=863MiB/s][r=3520,w=3450 IOPS][eta 00m: 18s]
Jobs: 8 (f=8): [m(8)][62.2%][r=954MiB/s,w=976MiB/s][r=3816,w=3903 IOPS][eta 00m: 17s]
Jobs: 8 (f=8): [m(8)][64.4%][r=890MiB/s,w=900MiB/s][r=3561,w=3601 IOPS][eta 00m: 16s]
Jobs: 8 (f=8): [m(8)][66.7%][r=971MiB/s,w=950MiB/s][r=3885,w=3800 IOPS][eta 00m: 15s]
Jobs: 8 (f=8): [m(8)][68.9%][r=967MiB/s,w=957MiB/s][r=3868,w=3827 IOPS][eta 00m: 14s]
Jobs: 8 (f=8): [m(8)][71.1%][r=850MiB/s,w=853MiB/s][r=3401,w=3411 IOPS][eta 00m: 13s]
Jobs: 8 (f=8): [m(8)][73.3%][r=921MiB/s,w=922MiB/s][r=3682,w=3688 IOPS][eta 00m: 12s]
Jobs: 8 (f=8): [m(8)][75.6%][r=889MiB/s,w=883MiB/s][r=3555,w=3532 IOPS][eta 00m: 11s]
Jobs: 8 (f=8): [m(8)][77.8%][r=941MiB/s,w=949MiB/s][r=3764,w=3797 IOPS][eta 00m: 10s]
Jobs: 8 (f=8): [m(8)][80.0%][r=964MiB/s,w=989MiB/s][r=3854,w=3954 IOPS][eta 00m: 09s]
Jobs: 8 (f=8): [m(8)][82.2%][r=877MiB/s,w=880MiB/s][r=3508,w=3520 IOPS][eta 00m: 08s]
Jobs: 8 (f=8): [m(8)][84.4%][r=956MiB/s,w=972MiB/s][r=3824,w=3886 IOPS][eta 00m: 07s]
Jobs: 8 (f=8): [m(8)][86.7%][r=879MiB/s,w=877MiB/s][r=3517,w=3506 IOPS][eta 00m: 06s]
Jobs: 8 (f=8): [m(8)][88.9%][r=950MiB/s,w=975MiB/s][r=3798,w=3901 IOPS][eta 00m: 05s]
Jobs: 8 (f=8): [m(8)][91.1%][r=1015MiB/s,w=984MiB/s][r=4058,w=3935 IOPS][eta 00m :04s]
Jobs: 8 (f=8): [m(8)][93.3%][r=853MiB/s,w=838MiB/s][r=3410,w=3353 IOPS][eta 00m: 03s]
Jobs: 8 (f=8): [m(8)][95.6%][r=923MiB/s,w=918MiB/s][r=3690,w=3670 IOPS][eta 00m: 02s]
Jobs: 8 (f=8): [m(8)][97.8%][r=853MiB/s,w=866MiB/s][r=3410,w=3463 IOPS][eta 00m: 01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=955MiB/s,w=940MiB/s][r=3819,w=3760 IOPS][eta 00m :00s]
random-read: (groupid=0, jobs=8): err= 0: pid=53561: Sat May 4 10:00:09 2024
read: IOPS=3637, BW=909MiB/s (954MB/s)(40.0GiB/45073msec)
slat (nsec): min=455, max=12061k, avg=3598.22, stdev=70556.96
clat (usec): min=107, max=3383.8k, avg=68959.25, stdev=43918.22
lat (usec): min=379, max=3383.8k, avg=68962.85, stdev=43915.91
clat percentiles (msec):
| 1.00th=[ 30], 5.00th=[ 43], 10.00th=[ 47], 20.00th=[ 54],
| 30.00th=[ 57], 40.00th=[ 59], 50.00th=[ 61], 60.00th=[ 65],
| 70.00th=[ 70], 80.00th=[ 74], 90.00th=[ 86], 95.00th=[ 123],
| 99.00th=[ 213], 99.50th=[ 253], 99.90th=[ 498], 99.95th=[ 735],
| 99.99th=[ 1318]
bw ( KiB/s): min=525144, max=1204546, per=100.00%, avg=931932.30, stdev=16073.91, samples=704
iops : min= 2048, max= 4703, avg=3636.75, stdev=62.80, samples=704
write: IOPS=3638, BW=910MiB/s (954MB/s)(40.0GiB/45073msec); 0 zone resets
slat (nsec): min=1533, max=31224k, avg=27789.67, stdev=206370.74
clat (usec): min=105, max=3305.3k, avg=71049.60, stdev=44811.65
lat (usec): min=537, max=3305.3k, avg=71077.39, stdev=44807.35
clat percentiles (msec):
| 1.00th=[ 31], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 55],
| 30.00th=[ 57], 40.00th=[ 60], 50.00th=[ 63], 60.00th=[ 68],
| 70.00th=[ 71], 80.00th=[ 75], 90.00th=[ 90], 95.00th=[ 131],
| 99.00th=[ 218], 99.50th=[ 268], 99.90th=[ 550], 99.95th=[ 802],
| 99.99th=[ 1250]
bw ( KiB/s): min=557144, max=1192748, per=100.00%, avg=932302.81, stdev=15661.49, samples=704
iops : min= 2172, max= 4656, avg=3638.10, stdev=61.19, samples=704
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.02%, 4=0.05%, 10=0.14%, 20=0.22%, 50=13.12%
lat (msec) : 100=79.06%, 250=6.81%, 500=0.46%, 750=0.06%, 1000=0.02%
lat (msec) : 2000=0.02%, >=2000=0.01%
cpu : usr=0.90%, sys=0.41%, ctx=129219, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=1.2%, 16=8.7%, 32=81.3%, >=64=8.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.6%, 8=1.2%, 16=1.3%, 32=1.1%, 64=0.8%, >=64=0.0%
issued rwts: total=163958,164006,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=909MiB/s (954MB/s), 909MiB/s-909MiB/s (954MB/s-954MB/s), io=40.0GiB (43.0GB), run=45073-45073msec
WRITE: bw=910MiB/s (954MB/s), 910MiB/s-910MiB/s (954MB/s-954MB/s), io=40.0GiB (43.0GB), run=45073-45073msec
A VM on the 5317 system (ESXi 8 AIO build, TNC as a VM, so basically VM-to-VM networking only). Note this heavily benefits from the TNC box’s memory due to the limited test size (80G total only).
READ: bw=981MiB/s (1028MB/s) WRITE: bw=981MiB/s (1029MB/s)
fiobuntu:/mnt/sdb# fio --name=random-read --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=45 --numjobs=8 --time_based --group_reporting --eta-newline=1 --end_fsync=1 --size=10G
random-read: (g=0): rw=randrw, bs=(R) 256KiB-256KiB, (W) 256KiB-256KiB, (T) 256KiB-256KiB, ioengine=libaio, iodepth=64
…
fio-3.28
Starting 8 processes
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
random-read: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [m(8)][8.7%][r=1194MiB/s,w=1188MiB/s][r=4776,w=4752 IOPS][eta 00m:42s]
Jobs: 8 (f=8): [m(8)][13.0%][r=1194MiB/s,w=1195MiB/s][r=4776,w=4780 IOPS][eta 00m:40s]
Jobs: 8 (f=8): [m(8)][17.4%][r=989MiB/s,w=1012MiB/s][r=3957,w=4046 IOPS][eta 00m:38s]
Jobs: 8 (f=8): [m(8)][21.7%][r=1059MiB/s,w=1038MiB/s][r=4237,w=4151 IOPS][eta 00m:36s]
Jobs: 8 (f=8): [m(8)][26.1%][r=929MiB/s,w=913MiB/s][r=3715,w=3650 IOPS][eta 00m:34s]
Jobs: 8 (f=8): [m(8)][30.4%][r=966MiB/s,w=960MiB/s][r=3863,w=3841 IOPS][eta 00m:32s]
Jobs: 8 (f=8): [m(8)][34.8%][r=1020MiB/s,w=1008MiB/s][r=4078,w=4031 IOPS][eta 00m:30s]
Jobs: 8 (f=8): [m(8)][39.1%][r=1055MiB/s,w=1016MiB/s][r=4221,w=4064 IOPS][eta 00m:28s]
Jobs: 8 (f=8): [m(8)][43.5%][r=1013MiB/s,w=1031MiB/s][r=4052,w=4122 IOPS][eta 00m:26s]
Jobs: 8 (f=8): [m(8)][47.8%][r=1029MiB/s,w=1058MiB/s][r=4114,w=4230 IOPS][eta 00m:24s]
Jobs: 8 (f=8): [m(8)][52.2%][r=948MiB/s,w=907MiB/s][r=3793,w=3628 IOPS][eta 00m:22s]
Jobs: 8 (f=8): [m(8)][57.8%][r=874MiB/s,w=899MiB/s][r=3497,w=3595 IOPS][eta 00m:19s]
Jobs: 8 (f=8): [m(8)][60.9%][r=993MiB/s,w=989MiB/s][r=3973,w=3955 IOPS][eta 00m:18s]
Jobs: 8 (f=8): [m(8)][65.2%][r=1038MiB/s,w=1026MiB/s][r=4151,w=4105 IOPS][eta 00m:16s]
Jobs: 8 (f=8): [m(8)][69.6%][r=971MiB/s,w=967MiB/s][r=3883,w=3869 IOPS][eta 00m:14s]
Jobs: 8 (f=8): [m(8)][73.9%][r=829MiB/s,w=825MiB/s][r=3315,w=3301 IOPS][eta 00m:12s]
Jobs: 8 (f=8): [m(8)][80.0%][r=911MiB/s,w=905MiB/s][r=3642,w=3621 IOPS][eta 00m:09s]
Jobs: 8 (f=8): [m(8)][82.6%][r=901MiB/s,w=895MiB/s][r=3602,w=3581 IOPS][eta 00m:08s]
Jobs: 8 (f=8): [m(8)][88.9%][r=1058MiB/s,w=1032MiB/s][r=4231,w=4126 IOPS][eta 00m:05s]
Jobs: 8 (f=8): [m(8)][91.3%][r=1067MiB/s,w=1039MiB/s][r=4267,w=4157 IOPS][eta 00m:04s]
Jobs: 8 (f=8): [m(8)][97.8%][r=969MiB/s,w=991MiB/s][r=3875,w=3963 IOPS][eta 00m:01s]
Jobs: 8 (f=8): [m(8)][100.0%][r=938MiB/s,w=920MiB/s][r=3753,w=3679 IOPS][eta 00m:00s]
random-read: (groupid=0, jobs=8): err= 0: pid=1312: Sat May 4 08:19:56 2024
read: IOPS=3922, BW=981MiB/s (1028MB/s)(43.2GiB/45092msec)
slat (usec): min=5, max=184094, avg=905.55, stdev=3506.97
clat (usec): min=848, max=230061, avg=42303.05, stdev=23675.01
lat (usec): min=1085, max=230126, avg=43208.98, stdev=23960.42
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 15], 10.00th=[ 20], 20.00th=[ 26],
| 30.00th=[ 31], 40.00th=[ 36], 50.00th=[ 40], 60.00th=[ 44],
| 70.00th=[ 49], 80.00th=[ 55], 90.00th=[ 64], 95.00th=[ 73],
| 99.00th=[ 163], 99.50th=[ 178], 99.90th=[ 207], 99.95th=[ 213],
| 99.99th=[ 222]
bw ( KiB/s): min=577536, max=1397183, per=100.00%, avg=1006226.64, stdev=18481.59, samples=712
iops : min= 2256, max= 5456, avg=3930.31, stdev=72.17, samples=712
write: IOPS=3923, BW=981MiB/s (1029MB/s)(43.2GiB/45092msec); 0 zone resets
slat (usec): min=7, max=168675, avg=1093.04, stdev=3921.09
clat (usec): min=1440, max=696119, avg=85996.22, stdev=41493.09
lat (msec): min=2, max=696, avg=87.09, stdev=41.66
clat percentiles (msec):
| 1.00th=[ 21], 5.00th=[ 34], 10.00th=[ 43], 20.00th=[ 54],
| 30.00th=[ 64], 40.00th=[ 72], 50.00th=[ 82], 60.00th=[ 90],
| 70.00th=[ 100], 80.00th=[ 111], 90.00th=[ 129], 95.00th=[ 155],
| 99.00th=[ 236], 99.50th=[ 255], 99.90th=[ 347], 99.95th=[ 388],
| 99.99th=[ 485]
bw ( KiB/s): min=632320, max=1429113, per=100.00%, avg=1005733.66, stdev=18055.90, samples=712
iops : min= 2470, max= 5582, avg=3928.31, stdev=70.51, samples=712
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.99%, 20=5.16%, 50=38.29%
lat (msec) : 100=39.86%, 250=15.39%, 500=0.29%, 750=0.01%
cpu : usr=0.86%, sys=1.57%, ctx=88331, majf=0, minf=122
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=176894,176928,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=981MiB/s (1028MB/s), 981MiB/s-981MiB/s (1028MB/s-1028MB/s), io=43.2GiB (46.4GB), run=45092-45092msec
WRITE: bw=981MiB/s (1029MB/s), 981MiB/s-981MiB/s (1029MB/s-1029MB/s), io=43.2GiB (46.4GB), run=45092-45092msec
So, I think I figured out the iperf issue. Thankfully, it makes total sense.
BLUF: When iperf is just sending data and not transferring files, NO DISKS ARE INVOLVED!!! DUH!
So, the two hosts are now both running TrueNAS. I figured that plain vanilla Ubuntu Server might not be as finely tuned for this stuff as TrueNAS would be. Dunno. Using TrueNAS on both sides DID seem to make things a bit better.
One server is named nas-1 and the other vm-1. Both are Dell R720s with 256G RAM. No ZFS pools involved except their mirrored boot drives (SSD on nas-1, HDD on vm-1).
So here is a plain iperf from vm-1 to nas-1:
root@vm-1[~]# iperf3 -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 58576 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.16 GBytes 9.92 Gbits/sec 44 1.55 MBytes
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.55 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.57 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.57 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.91 Gbits/sec 1 1.57 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.59 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.60 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.66 MBytes
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.66 MBytes
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 10.00-11.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 11.00-12.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.91 MBytes
[ 5] 12.00-13.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.91 MBytes
[ 5] 13.00-14.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 14.00-15.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 15.00-16.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 16.00-17.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.91 MBytes
[ 5] 17.00-18.00 sec 1.15 GBytes 9.91 Gbits/sec 0 1.91 MBytes
[ 5] 18.00-19.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
[ 5] 19.00-20.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.91 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 23.1 GBytes 9.90 Gbits/sec 49 sender
[ 5] 0.00-20.00 sec 23.1 GBytes 9.90 Gbits/sec receiver
Basically wirespeed.
So here it is when I transfer a 10 GB file AND write it to disk on the other side.
On nas-1:
root@nas-1[~]# iperf3 -F /root/10gfile.iperf2 -s
On vm-1:
root@vm-1[~]# iperf3 -F /root/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 47482 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.08 GBytes 9.27 Gbits/sec 22 1.57 MBytes
[ 5] 1.00-2.00 sec 962 MBytes 8.07 Gbits/sec 1 1.57 MBytes
[ 5] 2.00-3.00 sec 995 MBytes 8.35 Gbits/sec 0 1.57 MBytes
[ 5] 3.00-4.00 sec 680 MBytes 5.70 Gbits/sec 2 1.57 MBytes
[ 5] 4.00-5.00 sec 405 MBytes 3.39 Gbits/sec 0 1.57 MBytes
[ 5] 5.00-6.00 sec 389 MBytes 3.27 Gbits/sec 0 1.57 MBytes
[ 5] 6.00-7.00 sec 388 MBytes 3.25 Gbits/sec 0 1.57 MBytes
[ 5] 7.00-8.00 sec 380 MBytes 3.19 Gbits/sec 0 1.57 MBytes
[ 5] 8.00-9.00 sec 372 MBytes 3.12 Gbits/sec 0 1.57 MBytes
[ 5] 9.00-10.00 sec 415 MBytes 3.48 Gbits/sec 0 1.57 MBytes
[ 5] 10.00-11.00 sec 401 MBytes 3.37 Gbits/sec 0 1.57 MBytes
[ 5] 11.00-12.00 sec 349 MBytes 2.93 Gbits/sec 0 1.57 MBytes
[ 5] 12.00-13.00 sec 354 MBytes 2.97 Gbits/sec 0 1.57 MBytes
[ 5] 13.00-14.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 14.00-15.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 15.00-16.00 sec 365 MBytes 3.06 Gbits/sec 0 1.57 MBytes
[ 5] 16.00-17.00 sec 382 MBytes 3.21 Gbits/sec 0 1.57 MBytes
[ 5] 17.00-18.00 sec 379 MBytes 3.18 Gbits/sec 0 1.57 MBytes
[ 5] 18.00-19.00 sec 319 MBytes 2.68 Gbits/sec 0 1.57 MBytes
[ 5] 19.00-20.00 sec 299 MBytes 2.50 Gbits/sec 0 1.57 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 9.44 GBytes 4.06 Gbits/sec 25 sender
Sent 9.44 GByte / 10.0 GByte (94%) of /root/10gfile.iperf
[ 5] 0.00-20.00 sec 9.44 GBytes 4.05 Gbits/sec receiver
THAT’s the slowdown I was seeing, although not as bad as with Ubuntu in the mix.
But it changes if I don’t write it to the disk on the iperf server side:
On nas-1:
root@nas-1[~]# iperf3 -s
On vm-1:
root@vm-1[~]# iperf3 -F /root/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 46922 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 975 MBytes 8.17 Gbits/sec 45 1.56 MBytes
[ 5] 1.00-2.00 sec 1.07 GBytes 9.22 Gbits/sec 2 1.56 MBytes
[ 5] 2.00-3.00 sec 975 MBytes 8.18 Gbits/sec 0 1.56 MBytes
[ 5] 3.00-4.00 sec 1.03 GBytes 8.85 Gbits/sec 0 1.61 MBytes
[ 5] 4.00-5.00 sec 1.07 GBytes 9.21 Gbits/sec 0 1.61 MBytes
[ 5] 5.00-6.00 sec 974 MBytes 8.16 Gbits/sec 1 1.61 MBytes
[ 5] 6.00-7.00 sec 972 MBytes 8.16 Gbits/sec 2 1.61 MBytes
[ 5] 7.00-8.00 sec 976 MBytes 8.18 Gbits/sec 0 1.61 MBytes
[ 5] 8.00-9.00 sec 1.03 GBytes 8.85 Gbits/sec 0 1.61 MBytes
[ 5] 9.00-9.95 sec 1.04 GBytes 9.34 Gbits/sec 1 1.61 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-9.95 sec 10.0 GBytes 8.63 Gbits/sec 51 sender
Sent 10.0 GByte / 10.0 GByte (100%) of /root/10gfile.iperf
[ 5] 0.00-9.95 sec 10.0 GBytes 8.63 Gbits/sec receiver
Almost wirespeed, but not quite - I assume because it has to read from the disk on vm-1 (vm-1 has a 4TB SAS HDD mirror boot drive).
So, then I wanted to rule out the disk reading and writing, so I created RAM disks on both machines and used those to read from and write to.
mkdir /tmp/ramdisk
chmod 777 /tmp/ramdisk
mount -t tmpfs -o size=20G myramdisk /tmp/ramdisk
On nas-1:
iperf3 -F /tmp/ramdisk/10gfile.iperf2 -s
On vm-1:
root@vm-1[~]# iperf3 -F /tmp/ramdisk/10gfile.iperf -t 20 -c nas-1
Connecting to host nas-1, port 5201
[ 5] local 10.8.8.11 port 41210 connected to 10.8.8.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.09 GBytes 9.33 Gbits/sec 20 1.54 MBytes
[ 5] 1.00-2.00 sec 1.12 GBytes 9.62 Gbits/sec 1 1.54 MBytes
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.56 MBytes
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 5 1.56 MBytes
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.56 MBytes
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.60 MBytes
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 5 1.60 MBytes
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.60 MBytes
[ 5] 8.00-8.95 sec 899 MBytes 7.97 Gbits/sec 0 1.60 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-8.95 sec 10.0 GBytes 9.60 Gbits/sec 32 sender
Sent 10.0 GByte / 10.0 GByte (100%) of /tmp/ramdisk/10gfile.iperf
[ 5] 0.00-8.95 sec 10.0 GBytes 9.60 Gbits/sec receiver
So, basically a wirespeed transfer.
I think I’m happy with the networking now. At least it’s making sense.
Now I can move back to NFS performance testing.
Yeah. I’m a newb when it comes to fio and this storage stuff in general. I just grabbed an example fio command line from an Oracle site. I think they were testing database storage, which is probably a totally different set of requirements than VMs.
Definitely eager for suggestions of a test to simulate XCP-ng VM storage over NFS.
No idea how XCP-ng does things, but VMware uses 64K blocks…
I’d use that with a qd of maybe 4 and 16 jobs to simulate 16 vms.
Check with a read write ratio of maybe 70:30 (or vice versa depending on how much activity there really is on the VMs).
You can also try to see the difference between random and streaming activity to get a feel for it.
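Something along these lines as a starting point (the path, size, and exact mix are placeholders to adjust):
# 16 "VMs" doing 64k random I/O at a modest queue depth, 70% reads / 30% writes
fio --name=vm-sim --directory=/mnt/pool1/standard --size=10G --direct=1 --rw=randrw \
  --rwmixread=70 --bs=64k --ioengine=libaio --iodepth=4 --numjobs=16 \
  --runtime=45 --time_based --group_reporting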
But the best test is to simply move your VMs on it and just see how it works. At least a few test VMs. Just make sure you can go back;)
Thanks. Definitely doing more tests. Waiting for some large files to copy on the servers being tested, but I’ll post more results. Looks like I get the most IOPS and throughput (bandwidth) from bs=3k. I started at 128k and kept halving the block size until IOPS and bandwidth suffered. The number of jobs (tried 8 and 16) changed the bandwidth and IOPS but didn’t seem to move the 3k sweet spot. Need to be more scientific about it tho.
I don’t understand iodepth enough to specify a value, so I just took it off. Maybe that makes my results useless, I dunno. Something about having a bunch of I/O transactions out there so the OS can pick the most convenient one. I’ve never seen an iodepth setting outside of benchmarking software, so I don’t know if it’s something I can control.
iodepth is the number of operations that are waiting to be processed by a disk.
You basically hand the disk a stack of papers to work through instead of one at a time, which reduces overhead and keeps the disk from idling while it waits for the scheduler to hand it new work.
For maxing out disks it’s a good thing, but not necessarily for an accurate representation of your workload.
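If you want to see the effect in isolation, run the same job at two depths and compare - just a sketch, and the target path is a placeholder:
# identical 64k random-read jobs, queue depth 1 vs 16
fio --name=qd1 --directory=/mnt/pool1/standard --size=4G --direct=1 --rw=randread \
  --bs=64k --ioengine=libaio --iodepth=1 --runtime=30 --time_based
fio --name=qd16 --directory=/mnt/pool1/standard --size=4G --direct=1 --rw=randread \
  --bs=64k --ioengine=libaio --iodepth=16 --runtime=30 --time_based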
3k or 4k? 3k = 3072 bytes would be a very odd peak.
No idea if ashift is still a thing nowadays - having the proper value was all the rage a couple of years ago, but I haven’t followed things. Maybe some of the other guys have more recent info on that. Just mentioning it because of this weird peak-performance blocksize…