Dual 10Gbs iscsi caps at 1.6GB/s

Antonis_Marcou · July 7, 2024, 4:52pm

DELL PowerEdge T440

2 x Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz
192 GB DDR4 2133 MT/s

Dell HBA330 Adp Storage Controller in Slot 4 FW: 16.17.01.00 (CPU1)

4 x 1.8TB 7.2k SAS 12Gbps
3 x 500GB CT500MX SSDs

3 x 500GB CT500P5PSSD8 NVMe on PCIe gen3 x 16 card (Slot 3 - CPU2)

NC552SFP 2-port 10Gb Server Adapter (CPU1)

Test Pool
3 x nvme stipe

simple dd write zeros local test on pool: 3.3GB/s - 3.7GB/s
sync; dd if=/dev/zero of=testfile bs=1M count=8000; sync
8000+0 records in
8000+0 records out
8388608000 bytes transferred in 2.265634 secs (3702544059 bytes/sec)

iperf test from remote host

[root@ol1 ~]# iperf3 -c 10.10.10.4
Connecting to host 10.10.10.4, port 5201
[  5] local 10.10.10.2 port 36652 connected to 10.10.10.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.54 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   4.00-5.00   sec  1.15 GBytes  9.89 Gbits/sec    0   1.54 MBytes

iperf3 -c 10.10.20.4
Connecting to host 10.10.20.4, port 5201
[  5] local 10.10.20.2 port 59606 connected to 10.10.20.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.16 GBytes  9.92 Gbits/sec    0   1.54 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes

With all the above info.
My KVM hosts iscsi Datastores connected via 2-port 10Gb + multipath (both paths active with I/O) caps at 1.6GB/s no matter what.
Same dd command from KVM host.
Test with 2 x KVM hosts (same config, same dd test, concurrent) and again total aggregate throughput from both hosts cap at 1.6GB/s.
With two hosts, i was able to saturate the TNAS network interfaces but for short spikes. The average was around 770MB/s

Not bad but still cannot break the 2GB/s psychological barrier.
Any ideas much appreciated

Thank you

etorix · July 7, 2024, 5:32pm

The ‘dd’ test is useless: Zeroes compress excessively well.

1.6 GB/s could well be the pool limit serving block storage. Having to traverse from CPU2 (storage) to CPU1 (network) does not help.

Antonis_Marcou · July 7, 2024, 6:50pm

Hi and thanks for your comments.

I use dd with zeros to get the absolute best i can. I use block size to help throughput.

I know my disk setup is far from optimal. But even with the above limitations i expected to reach the 20Gbs limit of my network setup.

3xnvme stripe pool is more than enough to saturate 20Gbs. Even traversing from one CPU to another.

I believe this is some sort of iscsi overhead. I am sure i can saturate 20Gbs on NFS but not really interested in NFS

etorix · July 7, 2024, 6:55pm

You do not have a 20 Gbps network; you have two 10 Gbps links. Any single client is limited to 10 Gbps, and you’ll need multiple clients to reach a total throughput of 20 Gbps.

neofusion · July 7, 2024, 9:44pm

Did you disable compression for that test? Zeroes compress really well and will lead to a misleading result.

Antonis_Marcou · July 8, 2024, 1:58pm

Yes i have 2 KVM hosts so i should be able to test it but not really interested in NFS.
I am testing for POC purposes Oracle Virtualization for Oracle RAC and need iscsi shared storage. I love TNAS so i want to try.
I believe dd zeros is my best chance to saturate the 2x10Gbs nics.

Antonis_Marcou · July 8, 2024, 2:00pm

What test will you suggest to try and saturate the links? Get max throughput ?
As said dd zeros locally give me 3.7GB/s

Thanks guys

Antonis_Marcou · July 13, 2024, 9:57am

No input?
Looks like TrueNas is just for simple homelabs with nfs shares and smb. What a pitty.

Better stick with real enterprise storage when it comes to real use cases

Thanks anyway

neofusion · July 13, 2024, 10:19am

Try fio with a testfile size that is larger than your arc can handle. Set it up to mimic the workload you’re doing. But be careful, targeting fio at a /dev/ instead of a file is destructive.

This is a forum for free community support. Enterprise buys enterprise level support.
Your attitude is likely only going to worsen your chances of getting help.

Antonis_Marcou · July 13, 2024, 11:35am

Thanks friend

Not my intention to insult or downgrade the value of TrueNas.

As I said I love the product for my homelab.

But like u said enterprise, that’s a whole different ballgame

joeschmuck · July 13, 2024, 1:49pm

You can use dd to obtain a better “in device only” throughput of your pool however you must do a few very specific things.

First the input file (if) must be random, second the dataset you are using must be uncompressed, lastly you must be creating a file so large it cannot fit into the amount of RAM you have. You want to test how fast the pool is, not your RAM.

The other option has already been suggested, use fio. Do a search for something like “truenas fio” for information on how others have used this tool.

If you want to use dd then I highly recommend you remove most of your RAM as 192GB while using dd will be tough to test easily.

pmh · July 13, 2024, 2:03pm

I disagree. The system cannot produce random numbers at maximum speed of the memory subsystem like it can produce zeros. So you might be measuring your random number generator instead of your storage.

You only have to disable compression for the destination (of=...) dataset and you can continue to use /dev/zero.

joeschmuck · July 13, 2024, 3:04pm

I did not know just disabling compression was enough. Thanks for correcting me.

I did a little research and it does appear random is fairly slow, urandom is faster but would not be faster than zero. I still have some questions, not really for this thread, I’m trying to remain on topic.

neofusion · July 13, 2024, 3:22pm

Another approach would be to do it in two steps, create a file based on random data letting it take however long it needs, and then use that finished file as input for the time sensitive benchmark.

Mind you, that means you will read while you write. Which could be an important caveat to take into account as you benchmark.

Farout · July 13, 2024, 3:32pm

Sure…

Davvo · July 13, 2024, 7:28pm

You know, there are plenty of threads like yours in the old forum… you just need to search

Besides, I see no correlation between gettig a reply on a Community Forum and a product’s capabilities in the enterprise world; would you please share your reasoning?

Antonis_Marcou · July 14, 2024, 7:07am

Dear Friends,

Thank you all for your replies.
I feel that my comments did not help and i apologize!!

Maybe is the language barrier that made my comment sound stupid.

I believe the below comment i found on another forum explains my reasoning in a better way:

I’ve been a TrueNAS user for many years now, but am at the point where I am ready to try something new.
The major catalyst for this has been going to MPIO iSCSI and not seeing the utilization I expected… even with multiple operations.

No matter what I do, XenServer only seems to want to read/write from one channel… even when migrating multiple VMs concurrently.
I have seen XenServer blaze over iSCSI, but not with my implementations (lol). It obviously could be something I’m doing wrong, but I don’t think so.

When I try and ask about it on the forums, ****** there (very smart dude) is usually the first to reply with: “You need more RAM!” or
“CoW doesn’t play well with iSCSI”.

Over the years, the amount of RAM I’ve had available has grown from 4GB all the way up to 64GB.

The results have been the same, regardless… despite ***** insistence that RAM is the culprit.
Maybe for a much larger environment… but we’re talking about two Xen hosts and 1-2 TrueNAS boxes.

Maybe there is a guide out there for TrueNAS to “tune” ZFS/iSCSI to enable MPIO to work the way it’s designed, but I haven’t found one.
In fact, most TrueNAS forum requests for such assistance end up with the same thing.

The end result of my troubleshooting/research is that:

It’s possible, but not worth the hassle. ***** has often indicated that you need to throw more hardware at the problem,
as opposed to tweaking to get things working better.

MPIO w/iSCSI is inherently limited by TrueNAS and/or ZFS, and there really is nothing you CAN do.

So at this point, I am considering other alternatives. I will say, I LOVE ZFS. In my opinion, it is the best file system I’ve worked with.
ZFS snaps have saved my *** many times

Johnny_Fartpants · July 14, 2024, 8:58am

As others have said I don’t find dd very helpful when trying to benchmark in-fact I might go as far to say it can be misleading. FIO is a much better tool however even that can be misleading if you don’t understand the parameters and are unable to match them with the parameters of your environment.

I would suggest as a direct comparison to your dd test you try the below command on the pool and compare the results.

fio --randrepeat=1 --ioengine=posixaio --direct=1 --name=test --bs=1M --size=20G --rw=randwrite --ramp_time=4 --numjobs=2

The more you increase the number of job (numjobs) the faster it will perform till it hits its peak.

Davvo · July 14, 2024, 10:50am

The post you are reffering to is from 2016 (and refers to cyberjock for anyone too lazy to go check the link): much has changed since 8 years ago… I would not use that as baseline.

I have added the link of your quoted post and fixed the formatting to better differentiate it from your writing.

Do note that none is attacking you here, we are just pointing out a few things with the spirit of having you make better posts that can actually make your (and ours) experience better.

Antonis_Marcou · July 14, 2024, 2:15pm

Hi again,

Thank you all for your input much appreciated.
Helped me confirm that the problem lies with iscsi mpio.

So like many suggested i used fio which gives much more info.
Tested locally:

WRITE: bw=3617MiB/s (3793MB/s), 1822MiB/s-1830MiB/s (1910MB/s-1919MB/s), io=21.9GiB (23.5GB), run=6069-6187msec

From single KVM host best i can do:

Run status group 0 (all jobs):
WRITE: bw=998MiB/s (1047MB/s), 499MiB/s-499MiB/s (523MB/s-524MB/s), io=36.2GiB (38.9GB), run=37103-37124msec

Tried fio sequential read:

Run status group 0 (all jobs):
READ: bw=1155MiB/s (1211MB/s), 1155MiB/s-1155MiB/s (1211MB/s-1211MB/s), io=136GiB (145GB), run=120128-120128msec

It is obvious that total bandwidth used is for single 10Gbs nic even though i set mpio on Linux to load balance. I see 50/50 traffic on both NICs but my total bandwidth is at 10Gbs.

Tested with both KVM hosts concurrently and finally got full speed.

KVM1:
Run status group 0 (all jobs):
READ: bw=1155MiB/s (1211MB/s), 1155MiB/s-1155MiB/s (1211MB/s-1211MB/s), io=136GiB (145GB), run=120128-120128msec
 KVM2:
Run status group 0 (all jobs):
READ: bw=1170MiB/s (1227MB/s), 1170MiB/s-1170MiB/s (1227MB/s-1227MB/s), io=138GiB (148GB), run=120568-120568msec

My TrueNAS nic ports:

So @Davvo even though the post is from 2016, i have yet not found a solid document or forum on how to setup mpio (multipath.conf) with TrueNAS in order to get full active/active iscsi performance.

Having said that if anyone has successfully manage to setup Linux MPIO with TrueNAS (not active/standby) pls share multipath.conf file.

I am currently testing with path_grouping_policy set to multibus.

Thank you all for your input and again i apologize if i said anything wrong.