MPIO and LAGG

I know this has been discussed before, but I couldn’t find a topic that matches my exact situation. I apologize if I missed it.

I’m looking for advice on setting up iSCSI interfaces in TrueNAS. I understand MPIO is recommended, and that is the method I want to use. However, the storage and the hosts connecting to it have different numbers of network interfaces, so I can’t create a matching set of isolated subnets/interfaces on each host server.

The storage device has four 10 Gig interfaces (two 2-port cards) that can be used for iSCSI.
All hosts only have two 10 Gig interfaces for iSCSI.

My current thought is:
Use MPIO and two interfaces (separate subnets) - but that leaves two unused 10 Gig interfaces, which seems like a waste.

From what I’ve read, using a LAGG for iSCSI is not ideal and MPIO should be used instead. But is it worth it to LAGG pairs of the interfaces (failover mode seems best) and use one LAGG as each MPIO path? I would still have two paths (separate subnets), but it would add fault tolerance for the physical network cards.

Would creating the LAGG cause issues with the MPIO? Or does it have no impact, since each LAGG is only ‘one path’ of the MPIO and actually provides some fault tolerance? The LAGG would not span the MPIO links themselves, right? At least that is my uneducated thinking.

In essence, this would allow me to put two interfaces on the same subnet on the TrueNAS since I can’t add two additional subnets on each of the hosts.

Looking for any feedback, thoughts or suggestions.

Thanks in advance!

You wouldn’t want to use iSCSI over LACP at all. The problem is that with LACP you’re going to have packets received out of order and others that have to be retransmitted, which increases latency and actually hurts performance.

MPIO provides the fault tolerance at Layer 3, and your hypervisor will intelligently load balance based on the criteria configured there.
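For illustration, here is a minimal Python sketch of the round-robin idea behind an MPIO path selection policy: I/O alternates across independent paths, and a failed path is simply skipped. The path names and the policy itself are simplified stand-ins for this thread, not VMware’s actual PSP implementation.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Path:
    name: str          # hypothetical label, e.g. one iSCSI portal per subnet
    active: bool = True

# Two independent paths, one per subnet -- this is what the MPIO layer sees.
paths = [Path("path via subnet A"), Path("path via subnet B")]
rotation = cycle(paths)

def next_path():
    """Return the next usable path, skipping any that have failed."""
    for _ in range(len(paths)):
        p = next(rotation)
        if p.active:
            return p
    raise RuntimeError("all paths down")

# Normal operation: I/O alternates across both subnets.
print([next_path().name for _ in range(4)])

# Fail one NIC/path: traffic keeps flowing on the survivor (the fault tolerance).
paths[0].active = False
print([next_path().name for _ in range(4)])
```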

What are the hosts running as an OS?

Networking Recommendations | TrueNAS Documentation Hub

Well, in theory you could use 4 x sub-nets, and set up the client hosts with VLANs, 2 per interface. Thus, if MPIO & iSCSI work right, they balance the traffic across all 4 sub-nets. A single host won’t get more than 2 x 10 Gbit/s of speed. But perhaps there will be less network port contention on the TrueNAS server side.

If you did use this setup, I’d make sure that each client host used redundant pathing. Something like this:

TrueNAS card 1 - port 1 - sub-net 1
TrueNAS card 1 - port 2 - sub-net 2
TrueNAS card 2 - port 1 - sub-net 3
TrueNAS card 2 - port 2 - sub-net 4

Host - port 1 - VLAN sub-net 1
Host - port 1 - VLAN sub-net 3
Host - port 2 - VLAN sub-net 2
Host - port 2 - VLAN sub-net 4

Thus, a failure of a TrueNAS card, taking both of its ports out, does not cut off either host port; each host port still reaches the other TrueNAS card. Similarly, if a client host loses a port, the other port still has full access to 2 TrueNAS ports. Both of these also help avoid network contention on the TrueNAS side during failures. Of course, nothing is perfect.
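To make that layout concrete, here is a small Python sketch (port and card names are made up) that encodes the mapping above and checks the redundancy claim: losing a whole TrueNAS card, or a whole host port, still leaves paths to both sides.

```python
# Hypothetical path map taken from the layout above:
# (host_port, sub_net, truenas_card)
paths = [
    ("host-port-1", 1, "card-1"),
    ("host-port-1", 3, "card-2"),
    ("host-port-2", 2, "card-1"),
    ("host-port-2", 4, "card-2"),
]

def surviving(failed_card=None, failed_host_port=None):
    """Paths left after a TrueNAS card or a host port goes down."""
    return [p for p in paths
            if p[2] != failed_card and p[0] != failed_host_port]

# A whole TrueNAS card failing still leaves both host ports with a path.
left = surviving(failed_card="card-1")
assert {p[0] for p in left} == {"host-port-1", "host-port-2"}

# A host port failing still leaves paths to both TrueNAS cards.
left = surviving(failed_host_port="host-port-1")
assert {p[2] for p in left} == {"card-1", "card-2"}

print("every single-card or single-host-port failure leaves redundant paths")
```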

Now the disclaimer: I have no clue if this is “better”, or even as data-safe, as simply using 2 x sub-nets and 2 x 10 Gbit/s ports on the TrueNAS server side. From a network point of view, this is rational. But from an iSCSI point of view, I have no clue.


The hosts are currently VMware, but might change in the future depending on ‘new’ licensing costs.

Thanks, Arwens. Yeah, I thought of that as well, but I was not sure of the impact of trunking the storage network ports on the host side.

What I wrote has nothing to do with “trunking”. VLANing is a simple network abstraction layer that allows a single network port to have 2 IPs in different sub-nets.

“Trunking” is generally defined as using 2 or more network ports for the same IP (or group of IPs in the same sub-net). It is possible to VLAN on top of a “trunk” (aka LAGG, LACP, etc…). But iSCSI will generally do its own path load sharing and redundancy.

There may be some terminology mismatch between networking and storage. From a networking standpoint (the switch standpoint), the ports that the storage and server connect to are access ports on a single VLAN. This is by design and recommended to keep the storage network isolated.

Those ports on the switch talk to other devices on the same VLAN and only that VLAN. In order for a switch port to allow communication with multiple VLANs, it needs to be configured as a trunk. So, in order to allow the server to communicate to two different VLANs on a single physical port on the server, I would need to trunk that port on the switch. This would then allow for both VLANs to communicate to that single physical adapter on the server.
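As a toy illustration of that access-vs-trunk distinction (the VLAN IDs and port names below are invented, and this is only a model, not real switch configuration), an access port carries exactly one VLAN while a trunk carries a list of tagged VLANs:

```python
# Toy model of switch port VLAN membership.
switch_ports = {
    "storage-facing-port": {"mode": "access", "vlans": {10}},      # one storage VLAN only
    "server-facing-port":  {"mode": "trunk",  "vlans": {10, 20}},  # carries both storage VLANs
}

def can_carry(port: str, vlan: int) -> bool:
    """An access port carries its single VLAN; a trunk carries its allowed list."""
    return vlan in switch_ports[port]["vlans"]

print(can_carry("storage-facing-port", 10))  # True:  its one access VLAN
print(can_carry("storage-facing-port", 20))  # False: access port, other VLANs blocked
print(can_carry("server-facing-port", 20))   # True:  the trunk delivers both VLANs
                                             #        to the single physical adapter
```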

Hey @abjr

It’s worth noting that VMware specifically recognizes link aggregation + iSCSI as an anti-pattern in their official KBs: Host requirements for link aggregation (etherchannel, port channel, or LACP) in ESXi

Mixing the two technologies can cause unexpected results when the LACP load-balancing doesn’t align with MPIO’s - for example, in your scenario, you might have two hosts that can each use an independent link in “side A” of the bundle, but the “side B” algorithm causes them to be sent down a single link - effectively giving them a 10Gbps/5Gbps asymmetrical MPIO setup under combined load.
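As a toy illustration of that mismatch (the addresses and hash function below are invented, and real LACP hash policies vary by switch and OS), two flows that happen to spread across both links on one side of the bundle can collapse onto a single link on the other side, because each side hashes independently:

```python
def pick_link(src: str, dst: str, links: int, seed: int) -> int:
    """Pick an egress link for a flow; 'seed' stands in for each side's own hash policy."""
    key = sum(int(o) for o in src.split(".")) + sum(int(o) for o in dst.split("."))
    return (key * seed) % links

flow_a = ("10.0.0.11", "10.0.0.100")   # host A -> storage portal (hypothetical IPs)
flow_b = ("10.0.0.12", "10.0.0.100")   # host B -> storage portal

# "Side A" happens to spread the two flows across both links...
print([pick_link(s, d, links=2, seed=1) for s, d in (flow_a, flow_b)])  # [1, 0]

# ..."side B" maps both flows to the same link, so one 10G link carries
# both hosts under combined load while the other sits idle.
print([pick_link(s, d, links=2, seed=2) for s, d in (flow_a, flow_b)])  # [0, 0]
```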

With that said, a “failover-only LAGG” on the two interface pairs might be an additional measure of redundancy.

Are you actually being bottlenecked by the 2x10G network setup?