Help with TrueNAS Scale - Link Aggregation Not Working

Hello everyone,

I am setting up TrueNAS Scale for the first time. I have a network card with two 10G SFP+ ports, which I have connected to my TP-Link switch, which also has 10G SFP+ ports.

Currently TrueNAS Scale shows two active links (two IP addresses), one for each 10G port. I am now trying to set up link aggregation.

On the TP-Link switch, I have added the two TrueNAS ports to LAG1.

But on TrueNAS, I am not able to get the bond created.

Here are the steps I am following.

After saving this, I see bond0. But when I click “test changes”, the GUI keeps loading for a minute or two and then the bond disappears.

I don’t know what I am doing wrong. I also tried the CLI (network interface create...), but even after committing the changes, the bond isn’t created.

What am I missing?

I also checked the kernel logs and found the following:

Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.0: registered PHC device on enp65s0f0
Sep  1 16:07:53 truenas kernel: bond10: (slave enp65s0f0): Enslaving as a backup interface with a down link
Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.0 enp65s0f0: detected SFP+: 3
Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.1: registered PHC device on enp65s0f1
Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.0 enp65s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Sep  1 16:07:53 truenas kernel: bond10: (slave enp65s0f1): Enslaving as a backup interface with a down link
Sep  1 16:07:53 truenas kernel: bond10: Warning: No 802.3ad response from the link partner for any adapters in the bond
Sep  1 16:07:53 truenas kernel: bond10: (slave enp65s0f0): link status definitely up, 10000 Mbps full duplex
Sep  1 16:07:53 truenas kernel: bond10: active interface up!
Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.1 enp65s0f1: detected SFP+: 4
Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.1 enp65s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Sep  1 16:07:53 truenas kernel: bond10: (slave enp65s0f1): link status definitely up, 10000 Mbps full duplex
Sep  1 16:08:54 truenas kernel: ixgbe 0000:41:00.0: removed PHC on enp65s0f0
Sep  1 16:08:55 truenas kernel: bond10: (slave enp65s0f0): link status definitely down, disabling slave
Sep  1 16:08:55 truenas kernel: bond10: active interface up!
Sep  1 16:08:55 truenas kernel: ixgbe 0000:41:00.0: registered PHC device on enp65s0f0
Sep  1 16:08:55 truenas kernel: bond10: (slave enp65s0f0): link status definitely down, disabling slave
Sep  1 16:08:55 truenas kernel: ixgbe 0000:41:00.1: removed PHC on enp65s0f1
Sep  1 16:08:55 truenas kernel: ixgbe 0000:41:00.0 enp65s0f0: detected SFP+: 3
Sep  1 16:08:55 truenas kernel: bond10: (slave enp65s0f1): link status definitely down, disabling slave
Sep  1 16:08:55 truenas kernel: bond10: now running without any active interface!
Sep  1 16:08:55 truenas kernel: ixgbe 0000:41:00.0 enp65s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Sep  1 16:08:55 truenas kernel: ixgbe 0000:41:00.1: registered PHC device on enp65s0f1
Sep  1 16:08:55 truenas kernel: bond10: (slave enp65s0f0): link status definitely up, 10000 Mbps full duplex
Sep  1 16:08:55 truenas kernel: bond10: (slave enp65s0f1): link status definitely down, disabling slave
Sep  1 16:08:55 truenas kernel: bond10: active interface up!
Sep  1 16:08:56 truenas kernel: ixgbe 0000:41:00.1 enp65s0f1: detected SFP+: 4
Sep  1 16:08:56 truenas kernel: bond10 (unregistering): (slave enp65s0f0): Removing an active aggregator
Sep  1 16:08:56 truenas kernel: bond10 (unregistering): (slave enp65s0f0): Releasing backup interface
Sep  1 16:08:56 truenas kernel: ixgbe 0000:41:00.0: removed PHC on enp65s0f0
Sep  1 16:08:56 truenas kernel: ixgbe 0000:41:00.1 enp65s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Sep  1 16:08:56 truenas kernel: bond10 (unregistering): (slave enp65s0f1): Releasing backup interface
Sep  1 16:08:56 truenas kernel: ixgbe 0000:41:00.1: removed PHC on enp65s0f1
Sep  1 16:08:56 truenas kernel: bond10 (unregistering): Released all slaves
Sep  1 16:08:58 truenas kernel: tg3 0000:46:00.1 eno2: Link is up at 1000 Mbps, full duplex
Sep  1 16:08:58 truenas kernel: tg3 0000:46:00.1 eno2: Flow control is off for TX and off for RX
Sep  1 16:08:58 truenas kernel: tg3 0000:46:00.1 eno2: EEE is disabled
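
For anyone reading a log like the one above: the line that stands out is the 802.3ad warning, which says the bond had not heard an LACP reply from the switch at that moment. Below is a minimal Python sketch that pulls only the bonding-driver lines out of such a log and checks for that warning. The function names are made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Matches bonding-driver lines such as
# "... kernel: bond10: (slave enp65s0f0): link status definitely up, ..."
# and "... kernel: bond10 (unregistering): Released all slaves".
BOND_LINE = re.compile(
    r"kernel: (?P<bond>bond\d+)(?: \(unregistering\))?: (?P<msg>.*)$"
)


def bond_events(lines):
    """Return (bond_name, message) pairs for every bonding-driver line."""
    events = []
    for line in lines:
        match = BOND_LINE.search(line)
        if match:
            events.append((match.group("bond"), match.group("msg")))
    return events


def lacp_partner_missing(lines):
    """True if the kernel logged that no LACP partner answered."""
    return any("No 802.3ad response" in msg for _, msg in bond_events(lines))


# Three lines taken from the log posted above.
sample = [
    "Sep  1 16:07:53 truenas kernel: ixgbe 0000:41:00.0: registered PHC device on enp65s0f0",
    "Sep  1 16:07:53 truenas kernel: bond10: Warning: No 802.3ad response from the link partner for any adapters in the bond",
    "Sep  1 16:08:56 truenas kernel: bond10 (unregistering): Released all slaves",
]

for bond, message in bond_events(sample):
    print(f"{bond}: {message}")
print("LACP partner missing:", lacp_partner_missing(sample))
```

The warning on its own does not prove the switch is misconfigured, since it is logged in the same second the links come up, but it is a reasonable place to start looking.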

It’s best to have no IP addresses on the interfaces initially. Follow these instructions.

It’s much easier if the web UI goes through a separate port… not the bonded interface.

I tried with and without connecting the interfaces to the switch. My GUI is on eno2, and the bonded interfaces are unused for the entire setup. The link aggregation still doesn’t work. I followed the post you shared, but I am not sure what I am missing. Can you point me to the specific section I am getting wrong?

Isn’t eno2 one of the interfaces for the bond?

Nope, the bond is on enp65s0f0 and enp65s0f1.

I only enabled eno2 for logging into the GUI outside of the bond interfaces.

The GUI should not be affected by the bond, unless it’s getting an IP address on the same subnet as the bond.

I suggest not using DHCP and documenting your IP addressing.

The IP addresses are in one of the screenshots and they don’t change. Here they are:
eno2: 10.0.0.53
enp65s0f0: 10.0.0.48
enp65s0f1: 10.0.0.49

I tried static IPs, but that didn’t work.

I’m confused about what you said about the IPs being in the same subnet. Can you please explain?

You can’t have multiple interfaces on the same IP subnet… TrueNAS gets confused about which interface to use outbound.
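
To make the same-subnet point concrete, here is a minimal Python sketch that groups the three addresses posted above by network. The /24 prefix length is an assumption, since the netmask was never posted, and the function name is made up for illustration.

```python
import ipaddress
from collections import defaultdict


def interfaces_by_subnet(addresses, prefix_len=24):
    """Group interface names by the network their address falls in.

    prefix_len=24 is an assumption; use the real netmask if it differs.
    """
    groups = defaultdict(list)
    for name, ip in addresses.items():
        network = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
        groups[str(network)].append(name)
    return dict(groups)


# Addresses as posted earlier in the thread.
addresses = {
    "eno2": "10.0.0.53",
    "enp65s0f0": "10.0.0.48",
    "enp65s0f1": "10.0.0.49",
}

for subnet, names in interfaces_by_subnet(addresses).items():
    flag = "  <-- more than one interface" if len(names) > 1 else ""
    print(f"{subnet}: {', '.join(names)}{flag}")
```

Under that assumption, all three interfaces land in the same network, which is the situation described above.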

Are these DHCP assigned IPs on your interfaces or static ones?

Also, is your switch really capable of layer 2 and 3 hashing? My rather cheap HP switches at home only support layer 2, and when I had layer 2 and 3 enabled in TrueNAS, my bond behaved weirdly during setup until I limited TrueNAS to Layer 2 only.
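
As a rough illustration of what the “Layer 2” transmit hash policy means on the Linux side: the kernel bonding documentation describes it as the XOR of the last byte of the source MAC, the last byte of the destination MAC, and the packet type ID, taken modulo the number of bond members. A minimal Python sketch of that simplified formula follows. The MAC addresses are made up, the function name is hypothetical, and the real kernel code may differ in detail.

```python
def layer2_member_index(src_mac, dst_mac, packet_type, member_count):
    """Pick a bond member using the simplified layer2 formula:
    (src MAC[5] XOR dst MAC[5] XOR packet type ID) modulo member count.
    """
    src_last = int(src_mac.split(":")[5], 16)
    dst_last = int(dst_mac.split(":")[5], 16)
    return (src_last ^ dst_last ^ packet_type) % member_count


ETHERTYPE_IPV4 = 0x0800

# Made-up MAC addresses: one NAS talking to two different clients.
nas = "aa:bb:cc:dd:ee:01"
client_a = "aa:bb:cc:dd:ee:02"
client_b = "aa:bb:cc:dd:ee:03"

print(layer2_member_index(nas, client_a, ETHERTYPE_IPV4, 2))
print(layer2_member_index(nas, client_b, ETHERTYPE_IPV4, 2))
```

With this policy, all traffic between the same pair of MAC addresses uses the same member, so a single client will not see more than one link’s worth of bandwidth.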

You can’t have multiple interfaces on the same IP subnet… TrueNAS gets confused about which interface to use outbound.

I am not sure how to fix this without creating VLANs, and I didn’t read about it in the documentation you posted in the first message either. My questions: creating an LACP bond will create one IP for both interfaces combined, right? How would that work when the interfaces are on separate subnets? Why is creating the LAG1 group on my switch not sufficient? And why does this restriction exist when Synology, just as an example, is able to create LACP bonds in the same subnet?

Are these DHCP assigned IPs on your interfaces or static ones?

Right now they are assigned via DHCP. But I tried using static IPs as well, and that didn’t work.

Also is your switch really capable of layer 2 and 3 hashing?

Pretty sure it is. I am using a TP-Link TL-SX3008F (JetStream 8-Port 10GE SFP+ L2+ Managed Switch). It has L2 and L3 features.
I haven’t tried the TrueNAS LACP setup restricted to L2. I’ll give it a try.

Your TCP UI session will be diverted as you create the bond.

So you need to separate out your management and data VLANs.


Now we’re getting closer to the problem. An Ethernet port that you want to include in a bond group shouldn’t have any settings of its own at all.
The fact that your intended bond members get an IP shows that there’s something completely wrong with your config.

So in order to get a working bond:

  1. Go into the settings of enp65s0f0 and clean them out completely. They should look like the settings for one of my bond members.

    When they have been cleaned up hit the save button.
  2. Do the same for enp65s0f1.
  3. Now that you have clean interfaces you can join them for the bond. Hit the add button next to interfaces.

Type: Link Aggregation
Name: Something like bond0
Check DHCP and Autoconfigure IPv6 if you want automatically configured IPs.
Link Aggregation Protocol: LACP
Transmit Hash Policy: Layer2 only since the data sheet of your switch mentions this only under layer 2.
LACPDU Rate: SLOW
Link Aggregation Interfaces: Now select enp65s0f0 and enp65s0f1 as members there.
Aliases: If you haven’t checked automatic configuration for IPv4 or IPv6 earlier, you can set fixed IP addresses here.

  4. Hit the save button.
  5. Now also empty the settings for eno2 completely and hit save.
  6. If you have DHCP configured, it’s now time to open your DHCP server’s client table.
  7. Click “test settings” in TrueNAS and you should see another IPv4 address getting assigned in your DHCP server’s client table.
  8. Try to open this IP (or the fixed one if you set one) in your browser, and after logging in you should see a working bond.

It’s been quite a while since I’ve set up mine so it’s possible that there’s also a final apply settings button. If yes click it.
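
Besides the GUI, the Linux bonding driver reports the bond state in /proc/net/bonding/<bond name>. Below is a minimal Python sketch that summarizes such a file: the mode, the transmit hash policy, and the MII status of each member. The function name is made up, and the sample text is a hand-written excerpt in the driver’s format, not output from the original poster’s machine.

```python
def summarize_bond(text):
    """Extract mode, hash policy and per-member MII status from the
    text of a /proc/net/bonding/<bond> file."""
    summary = {"mode": None, "hash_policy": None, "members": {}}
    current = None  # member whose section we are currently inside
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            summary["mode"] = value
        elif key == "Transmit Hash Policy":
            summary["hash_policy"] = value
        elif key == "Slave Interface":
            current = value
            summary["members"][current] = None
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" comes before any member
            # section and is skipped because current is still None.
            summary["members"][current] = value
    return summary


# Hand-written illustrative excerpt (assumption, not real output).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up

Slave Interface: enp65s0f0
MII Status: up

Slave Interface: enp65s0f1
MII Status: down
"""

print(summarize_bond(SAMPLE))
```

On the NAS itself, the same function could be fed the real file, for example the contents of /proc/net/bonding/bond0 if the bond is named bond0.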


I’m having similar difficulties to the OP. I only have three interfaces on the motherboard (Supermicro X10SRH-CF): one for the IPMI and two that are assigned to the UI. I don’t really see how I can clear out the network settings in the UI as you’ve described, because when I clear out the settings for eno1 and eno2 and then test, I can no longer access the UI. This leaves KVM via the IPMI interface. There I pressed 1 to configure network interfaces, cleared out eno1 and eno2, and persisted the changes, and of course the IP addresses are no longer assigned. As soon as I press n to create a new interface, arrow down to Link_aggregation for the type, and press space or enter, the screen goes blank, followed by a quick burst of Python errors, and returns to the CLI menu, so that doesn’t work either.

In my case, I followed the GUI instructions (posted above) and got as far as testing the settings and enabling the Link Aggregation Group on the switch (Zyxel GS1900-48HPv2), but I get no IP address for the GUI.

Well, I’m a little closer (maybe). By configuring the UI with a static IP address for the Link Aggregation I then see in the KVM that it’s assigning an IPv4, but that IP address isn’t responding to pings or loading in the browser.

If I run ifconfig from the linux shell it gives me

Ha

When I open my network settings in the GUI I have a trashcan next to my bond and a reload button next to my interfaces.

Right, I see that, but your instructions were “Go into the settings of enp65s0f0 and clean them completely out. They should look like for one of my bond members.” At this point in the instructions, there is no bond and there is no trash can for eno1 or eno2. I can either reset them to defaults or edit them to make them look like your screenshot. Are you suggesting I clear them out like your screenshot and then create the bond before clicking test?

I have the same issue, and the UI isn’t helpful in explaining what’s missing or how to fix it.

OK, I’m not sure how helpful this is going to be, but this is what I do, and it works every time.

Setting Up Link Aggregation and VLAN in TrueNAS SCALE

These instructions assume that everything is correctly configured on the network switch side, as that is outside their scope.

I like to set this up in the console either stood beside the TrueNAS or via IPMI.

Step 1: Create Link Aggregation

  1. Open ‘configure network interfaces’.
     • Press (n) to add a new network interface.
  2. Choose Link Aggregation as the type.
  3. Set the following:
     • Name: bond1
     • LAG Protocol: LACP (or whatever option you choose)
     • Ports: ens7f4 and ens7f4d1 (yours may be different)
  4. Save the settings.
     • Press (A) to apply changes.
     • Press (P) to make them persistent.

Step 2: Create a VLAN (you may not need this; if not, create your alias on the bond1 interface instead)

  1. Open the network configuration menu again.
     • Press (n) to add another network interface.
  2. Choose VLAN as the type.
  3. Set the following:
     • Name: vlan1001 (your VLAN name)
     • Aliases: 10.150.1.25/24 (your IP details)
     • Parent Interface: bond1
     • VLAN Tag: 1001 (your VLAN tag)
  4. Save the settings.
     • Press (A) to apply changes.
     • Press (P) to persist them.
     • Press (Q) to exit.

Step 3: Configure Network Settings

  1. Open ‘configure network settings’.
  2. Set the following:
     • Hostname
     • Domain
     • IPv4 Gateway (this is important; forget it and things don’t work)
     • Nameservers
  3. Save the settings.
  4. Press (Q) to exit.
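
Since the gateway is the setting that is easiest to get wrong, here is a minimal Python sketch that checks whether a gateway address sits inside the network of the alias configured in Step 2. The alias 10.150.1.25/24 is the example from the steps above; both gateway addresses and the function name are made up for illustration.

```python
import ipaddress


def gateway_in_alias_network(alias, gateway):
    """True if the gateway address lies inside the alias's network.

    alias is an address with prefix length, e.g. "10.150.1.25/24".
    """
    network = ipaddress.ip_interface(alias).network
    return ipaddress.ip_address(gateway) in network


alias = "10.150.1.25/24"  # example alias from Step 2

# Hypothetical gateways: one inside the alias network, one outside it.
print(gateway_in_alias_network(alias, "10.150.1.1"))
print(gateway_in_alias_network(alias, "192.168.1.1"))
```

A gateway outside the alias network is not directly reachable from that interface, which would match the “things don’t work” symptom mentioned above.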

OK Mr. Fartpants. I’ll give that a try.


OK, I logged into the KVM using IPMIView and walked through your instructions. I did not create a VLAN because I haven’t been using VLANs. Instead I just configured the alias.

From the screenshot below, everything looks accurate to me.

The Network configuration also looks accurate to me.

On my managed switch, the LAGG setup is very simple in the UI. My two server ports are 47 and 48. Next, Finish, Save, Reboot, Confirm. Unless I’m missing something, there doesn’t seem to be anything to mess up. The ports function when not in a LAGG, so I don’t think it’s a hardware issue.

The CLI says the TrueNAS UI is available at 192.168.1.13 but the browser times out trying to access the UI. The SMB shares are also not available. Ideas?