SOLVED - Dragonfish 10GB/s limit on 100GB/s Interfaces

I am running 2 x Dragonfish-24.04.0 installs connected via 100GB through Cisco Nexus 9K switches. I am seeing zero errors on either interface and both systems are connected at 100GB/s, yet when I am replicating snapshots from one to the other I am seeing a cap at just over 8GB/s.

I know standard Linux needs some tuning to really get the full bandwidth, but I didn’t want to go messing with tunables in Dragonfish until I knew for sure this was the issue.

Are there tunables specific to Dragonfish and 100GB Mellanox cards so I can get this resolved?

Many Thanks

100gbps links (because I doubt you have a tbps link) and you’re seeing 8GB/s?

Which seems like a pretty good result to me.

Or do you mean 8 gigabits per second?

Meanwhile, even getting to 10gbps takes some doing. A pair of HDs is not going to do it.

So what is your storage?

Or are you just showing iperf results?


I was hoping for much closer to 12GB/s, which would be near the max of the 100GB (100,000,000 Kbit) interfaces. Am I asking too much? 8GB/s is only about 2/3rds of line rate unless I miscalculated.
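
A quick check of that math, assuming the dashboard figure really is gigabytes (protocol overhead is ignored here, and would shave a few percent off the usable rate):

```python
# Back-of-envelope line-rate check (assumes 1 GB = 8 Gb; protocol overhead ignored).
link_gbit = 100                 # 100 GbE link speed, gigabits per second
link_gbyte = link_gbit / 8      # theoretical maximum in gigabytes per second = 12.5

print(f"Line rate: {link_gbyte} GB/s")
print(f"8 GB/s would be {8 / link_gbyte:.0%} of line rate")   # ~64%, i.e. about 2/3
print(f"8 Gb/s would be {8 / link_gbit:.0%} of line rate")    # ~8%, a very different story
```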

No, this is what Dragonfish is showing during the actual transfer. I’m not sure how to upload an image or I would post the image from the dashboard.

How are you measuring this?

I will admit, I have zero experience with 100gbit networking.

But I do believe it’s non-trivial to saturate it.

This is just what the SCALE dashboard shows, which is also what my Cisco N9Ks show.

Actually, Dragonfish is showing 8Gb/s, not 8GB/s, so I am WAY off, but maybe it’s a drive saturation issue?

Probably.

What drives do you have? What is your pool layout?

Both systems:
18 x 18TB WD HC550 7200RPM 12Gb/s SAS hard drives
2 x 9 Drive RAIDZ2 VDEVs

In order to hit 1GB/s on my 10gbit system I needed to switch my 18 hard drives to 9 mirrors.

Interesting. I think the next time I have a replication running I will run nmon and watch the drive utilization; maybe that is the bottleneck!

Thanks for the insight.
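
For anyone who wants to watch the same thing without nmon: Linux tracks per-disk busy time in /proc/diskstats (the io_ticks field), and sampling it twice gives an approximate %util, which is essentially what nmon and iostat report. A minimal sketch, assuming the pool disks show up as whole sd* devices:

```python
#!/usr/bin/env python3
# Rough per-disk %util sampler based on /proc/diskstats.
# Illustrative only; run on the sending system while a replication is active.
import time

def io_ticks():
    """Return {device: milliseconds spent doing I/O} for whole sd* disks."""
    ticks = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            name = fields[2]
            if not name.startswith("sd") or name[-1].isdigit():
                continue  # skip partitions (sda1) and non-sd devices
            ticks[name] = int(fields[12])  # 13th column: io_ticks, ms spent busy
    return ticks

INTERVAL = 5  # seconds between samples
before = io_ticks()
time.sleep(INTERVAL)
after = io_ticks()

for dev in sorted(after):
    busy_ms = after[dev] - before.get(dev, after[dev])
    print(f"{dev:>6}: {100.0 * busy_ms / (INTERVAL * 1000):5.1f}% busy")
```

If the disks sit near 100% busy while the interface is well under line rate, the pool is the bottleneck, not the network.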

SAS3 is up to 12gbps per port.

But HDs typically perform at about 100-280MB/s best case, depending on whether they’re reading from the inner or outer edge.
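
Putting rough numbers on that for two 9-wide RAIDZ2 vdevs. This is only a best-case sequential ceiling; the 7-data-disks-per-vdev rule of thumb and the per-drive rates are assumptions, and real replication reads on a fragmented pool land well below it:

```python
# Best-case sequential read ceiling for 2 x 9-wide RAIDZ2 of 7200 RPM HDs.
# Assumptions: ~7 data disks per vdev contribute to streaming bandwidth
# (9 drives minus 2 parity), per-drive rates of 100-280 MB/s as above.
vdevs = 2
data_disks = 9 - 2

for per_drive_mb in (100, 280):
    pool_mb = per_drive_mb * data_disks * vdevs
    print(f"{per_drive_mb} MB/s per drive -> ~{pool_mb} MB/s "
          f"(~{pool_mb * 8 / 1000:.1f} Gb/s) pool ceiling")
```

Even the low end of that range assumes perfectly sequential reads, which snapshot replication rarely is, so an observed 8 Gb/s (about 1 GB/s) is well within what these spindles can plausibly deliver.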

Meanwhile, how is your HBA connected to your system?

Mine is PCIe Gen 2 8x, which is good for a MAX of 4GB/s.

Maybe yours is Gen 3 8x and good for 8GB/s.

8GB/s is not going to saturate 100gbit. You’d need PCIe Gen 4 or 16x Gen 3 to do that.

And more or faster storage than you have.
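
For reference, the approximate per-direction PCIe numbers behind those figures (usable throughput after link encoding overhead; treat them as ballpark values):

```python
# Approximate usable PCIe bandwidth per lane, per direction (GB/s),
# after 8b/10b (Gen2) and 128b/130b (Gen3/Gen4) encoding overhead.
per_lane_gb = {"Gen2": 0.50, "Gen3": 0.985, "Gen4": 1.969}
target_gb = 100 / 8  # 100 GbE line rate = 12.5 GB/s

for gen, lane_gb in per_lane_gb.items():
    for width in (8, 16):
        total = lane_gb * width
        verdict = "enough" if total >= target_gb else "not enough"
        print(f"{gen} x{width}: ~{total:4.1f} GB/s -> {verdict} for 100 GbE")
```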


So I am running the LSI 3008 in a Gen3 8x slot, which should give me the full 12Gb/s SAS speed according to the Broadcom website.

I did a smaller replication this morning and was monitoring my drive utilization with nmon, and all of the drives across the array hit over 80% utilization, so I am guessing you were spot on: the limitation is the drives, not the network.
I am going to check with a much larger replication later today.

Thanks for getting the units right: Bytes with a capital ‘B’ vs. bits with a small ‘b’.

I have a flash-only NAS with two 8-wide raidz2 vdevs (a mix of enterprise 3.84 TB drives), all attached to a 9305-16i. While scrubbing, TrueCommand reports a read speed of 8-9 GByte/s. You are NOT going to even approach this speed with two raidz2 vdevs of spinning drives; 8 Gbit/s looks like a reasonable mark.
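
As a rough cross-check of that scrub figure (the per-drive rate is an assumption, roughly 500-600 MB/s for SATA/SAS enterprise SSDs, and a scrub reads every drive, parity included):

```python
# Rough cross-check of an all-flash scrub rate: 16 SSDs read in parallel.
drives = 16
for per_drive_mb in (500, 600):   # assumed sequential read rate per SSD, MB/s
    print(f"{per_drive_mb} MB/s x {drives} drives ≈ "
          f"{drives * per_drive_mb / 1000:.1f} GB/s aggregate read rate")
```

That lands right around the 8-9 GByte/s that TrueCommand reports.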

I think you can mark the thread as “solved”, with a negative answer: There’s no hard limit (or not this one) in software, but your pool is very far from being capable of saturating a 100 GbE link.


Thank you and @Stux both for helping me figure out that this is an array issue and not a network issue. As I expand these units with more vdevs, I suspect the speed will increase.