Bridge setup help, please

I’m having problems setting up a bridge on SCALE (Fangtooth 25.04.0)…

I’ve followed the steps here: Setting Up a Network Bridge | TrueNAS Documentation Hub. I’ve ensured no apps are running (I’ve even deleted them, just to be sure) and I have NO VMs created. I’ve stopped ALL services, including S.M.A.R.T. just to be safe.

Bare metal specs are: Supermicro 6028R-E1CR24N, 2x Xeon E5-2600, 128 GB RAM, 2x 10Gb SFP+ integrated NICs (but only ens5f0 in use, currently).

I follow the steps defined in the doc in this order:

  1. Stop all apps and all services.
  2. Remove the static IP address alias from ens5f0 and save.
  3. Create a new br0 interface with no Alias and save.
  4. Edit the new br0 interface and add the original static IP address from ens5f0 and save.
  5. Click test, which comes back successfully in 2-3 seconds.
  6. I see the “traffic” icon next to br0 go from grayed out to light blue and see traffic statistics if I hover over it.
  7. I open an incognito browser and navigate to the static IP address, which times out.
  8. The original browser window I opened to make the network configuration changes blanks and goes to the “Make sure the server is running” splash screen (Note: This is well under the 60 second timeout).
  9. My entire network becomes unresponsive for a minute while Truenas reverts back to the original settings.
  10. I end up back to original network settings.

I’ve even gone so far as to disconnect the SFP+ port from the hardware and use the CLI via IPMI to create the bridge, and as soon as I plug the SFP+ module back in (to bring the truenas host back online to the network), the entire network goes dark again until I unplug the SFP+ module and manually revert the network settings.

My network is managed by a Netgate SG-4200 physical appliance running the latest version of pfSense and routes through a TP-Link Omada 28-port switch (24 1GbE ports + 4 10GbE ports). I have NO DHCP reservations for the TrueNAS host, so I’m relying solely on the static IP TrueNAS provides for the network. So, I’m a bit curious about how TrueNAS causes all network devices across the entire LAN (ALL of them, including some IoT devices on a separate VLAN defined in pfSense) to stop all traffic momentarily. I’m not sure if I have a TrueNAS, Netgate, or pfSense problem.

Anyone experience anything like this that might be able to direct me as to where to look?

And for reference, I’m trying to figure out how to remove a similar cross-post in the Apps & Virtualization category to get a wider view. So, please no flaming for cross-posting… PLEASE?

Community member stux made a video on how to set up a bridge; maybe it’s clearer than the written documentation.

I’m not sure that my issue is setting up the bridge itself. It’s that it seems to completely bring down my entire network (including external VLANs NOT using TrueNAS) after the “test” phase starts, and the network only goes back to normal after TrueNAS resets the configuration back to the original settings.

TLDR: Once I click the “test” button, my entire network goes down.

I know it’s not a duplicate IP address; it’s been used for months. I’ve even explicitly set the same original MTU, just to be sane.

Not sure if it helps any, but here is a dump of the truenas /var/log/messages file during the last test:

May  8 10:19:15 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered blocking state
May  8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered listening state
May  8 10:19:30 truenas kernel: br0: port 1(ens5f0) entered learning state
May  8 10:19:45 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May  8 10:19:45 truenas kernel: br0: topology change detected, propagating
May  8 10:19:46 truenas kernel: NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000000)
May  8 10:21:03 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Down
May  8 10:21:03 truenas kernel: br0: port 1(ens5f0) entered disabled state
May  8 10:22:58 truenas kernel: ixgbe 0000:02:00.1 ens5f1: left allmulticast mode
May  8 10:22:58 truenas kernel: ixgbe 0000:02:00.1 ens5f1: left promiscuous mode
May  8 10:22:58 truenas kernel: br0: port 2(ens5f1) entered disabled state
May  8 10:22:58 truenas kernel: ixgbe 0000:02:00.1: removed PHC on ens5f1
May  8 10:24:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  8 10:24:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May  8 10:24:16 truenas kernel: br0: port 1(ens5f0) entered listening state
May  8 10:24:32 truenas kernel: br0: port 1(ens5f0) entered learning state
May  8 10:24:47 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May  8 10:24:47 truenas kernel: br0: topology change detected, propagating
May  8 10:25:10 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Down
May  8 10:25:10 truenas kernel: br0: port 1(ens5f0) entered disabled state
May  8 10:26:40 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left allmulticast mode
May  8 10:26:40 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left promiscuous mode
May  8 10:26:40 truenas kernel: br0: port 1(ens5f0) entered disabled state
May  8 10:27:11 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May  8 11:04:14 truenas kernel: evm: overlay not supported
May  8 11:04:15 truenas kernel: Initializing XFRM netlink socket
May  8 11:20:38 truenas systemd-journald[1253]: Data hash table of /var/log/journal/dc18725ef5a14f01ad3484d8a1346d52/system.journal has a fill level at 75.0 (8534 of 11377 items, 6553600 file size, 767 bytes per hash table item), suggesting rotation.
May  8 11:20:38 truenas systemd-journald[1253]: /var/log/journal/dc18725ef5a14f01ad3484d8a1346d52/system.journal: Journal header limits reached or header out-of-date, rotating.
May  8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May  8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered disabled state
May  8 12:34:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: entered allmulticast mode
May  8 12:34:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: entered promiscuous mode
May  8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May  8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May  8 12:35:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left allmulticast mode
May  8 12:35:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left promiscuous mode
May  8 12:35:16 truenas kernel: br0: port 1(ens5f0) entered disabled state
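
One thing the dump shows is how long br0 takes to start forwarding. Below is a minimal Python sketch, not something the poster ran, that parses the first test run from the log above and prints the delay between port states. The helper names (`port_transitions`, `stage_delays`) are made up for illustration. The listening and learning stages only appear in the 10:19 and 10:24 runs, each 15 seconds long, which matches the usual Linux bridge forward delay and suggests STP was enabled on br0 for those runs; the 12:34 run goes straight to forwarding. That reading is an inference from the timestamps, not something confirmed in the thread.

```python
import re
from datetime import datetime

# First test run, copied from the /var/log/messages dump above.
LOG = """\
May  8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered blocking state
May  8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered listening state
May  8 10:19:30 truenas kernel: br0: port 1(ens5f0) entered learning state
May  8 10:19:45 truenas kernel: br0: port 1(ens5f0) entered forwarding state
"""

LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(\S+): port \d+\((\S+)\) entered (\w+) state$"
)


def port_transitions(text):
    """Return (timestamp, bridge, port, state) tuples for bridge port events."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # not a bridge port state line
        # Collapse the double space syslog uses for single-digit days.
        stamp = " ".join(m.group(1).split())
        # Syslog stamps carry no year; 2000 is an arbitrary placeholder.
        # %b assumes English month names (C/English locale).
        when = datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")
        events.append((when, m.group(2), m.group(3), m.group(4)))
    return events


def stage_delays(events):
    """Return (previous_state, state, seconds_elapsed) for consecutive events."""
    return [
        (prev[3], cur[3], int((cur[0] - prev[0]).total_seconds()))
        for prev, cur in zip(events, events[1:])
    ]


for before, after, secs in stage_delays(port_transitions(LOG)):
    print(f"{before} -> {after}: {secs}s")
```

If that reading is right, roughly half of the 60-second test window passes before the port forwards any traffic, which would fit the web UI staying unreachable during the test.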

Maybe you should try connecting the client directly to the TrueNAS box while testing, just to make sure there is no network hardware misconfiguration.

And I once saw the claim that the Enable Learning flag on the interface can ruin the connection somehow.

If I understand what you are suggesting, I think I’ve done that. I open a terminal on my desktop and ssh into the TrueNAS host, then su and tail the /var/log/messages file while making all the changes. The desktop terminal loses connection to TrueNAS just as the

br0: port 1(ens5f0) entered forwarding state

message gets output. After Truenas resets its configuration and I can re-ssh into the box, I have to cat (rather than tail) the /var/log/messages file to get the remaining /var/log/messages content to paste here.

I meant a direct physical connection from the PC/laptop’s NIC to the TrueNAS box’s NIC, to rule out all the switches and firewalls.

You should try unchecking the Enable Learning flag first, though.

I can try that, but I don’t have a laptop with a 10Gbe nic in it. I’ll try later this evening when I can set up a laptop with a static IP address in the same range and see if they talk to each other.

I have disconnected the 10G SFP+ from the network and done the same network changes via the IPMI console using the CLI menu which seems to work, but as soon as I plug the SFP+ transceiver back into the host, the network goes down again and won’t come back up until I unplug the SFP+ transceiver and use the CLI to revert back to original settings again.

I’ve never used IPMI myself, but can it be that your IPMI and SFP ports are on the same subnet?

Yes, the IPMI is on the same subnet as the Truenas host, but in essence different computers internally with different IP addresses. I can try moving the IPMI host to a different subnet and see if that makes any difference.

Well, I’ve read that using 2 ports on the same subnet can cause issues. And quick googling showed that IPMI can (sometimes) be presented as a NIC on the host. So I’ve just put 2 and 2 together.

Odd, I’ve never seen that. I have another Supermicro host as well as a Dell R630 with their BMCs on the same subnet with no issues. I’ll set up a new VLAN and move all 3 of those BMCs to it and see if that makes any difference.

Mind posting the output of ip link and if config? The only weird thing I notice in the logs is that ens5f0 AND ens5f1 seem to be going up/down…

As far as IPMI goes, mine is on the same subnet but is detected as a completely different device and has nothing to do with the two main NIC ports (though there are options to configure it otherwise in the BIOS); I don’t think IPMI is the issue, yet.

Perhaps I just didn’t get it correctly.

I think that was at the “testing network changes” periods.

Do you happen to have more than one interface as a member in that bridge and did you connect both (or more) to your network infrastructure?

Here’s ip link:

root@truenas[~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:0f:3d:a4 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
3: ens5f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:0f:3d:a5 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
5: incusbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:bc:57:f3 brd ff:ff:ff:ff:ff:ff
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:69:1e:47:00 brd ff:ff:ff:ff:ff:ff

and if config (assuming that means ifconfig):

root@truenas[~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.16.0.1  netmask 255.255.255.0  broadcast 172.16.0.255
        inet6 fdd0::1  prefixlen 64  scopeid 0x0<global>
        ether 02:42:69:1e:47:00  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 79 overruns 0  carrier 0  collisions 0

ens5f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.X.X.X  netmask 255.255.255.0  broadcast 10.X.X.X
        inet6 fe80::ae1f:6bff:fe0f:3da4  prefixlen 64  scopeid 0x20<link>
        ether ac:1f:6b:0f:3d:a4  txqueuelen 1000  (Ethernet)
        RX packets 11281230  bytes 15774285165 (14.6 GiB)
        RX errors 6659  dropped 6472  overruns 0  frame 6659
        TX packets 2549093  bytes 2666584448 (2.4 GiB)
        TX errors 0  dropped 96 overruns 0  carrier 0  collisions 0

incusbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.14.117.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fd42:32b:4e23:64cf::1  prefixlen 64  scopeid 0x0<global>
        ether 00:16:3e:bc:57:f3  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 92 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 59035  bytes 42831890 (40.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 59035  bytes 42831890 (40.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
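
For anyone reading the `ip link` output above: an interface that belongs to a bridge shows `master <bridge>` on its flags line. The following Python sketch (the `parse_links` helper is hypothetical, and it assumes the text after the flags is a flat run of key/value pairs, which holds for the lines pasted here) checks that against the posted output. It finds no br0 and no interface with a master, which is consistent with the capture being taken after the configuration was reverted.

```python
import re

# Flags lines copied from the `ip link` output above
# (link/ether and altname lines omitted).
IP_LINK = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
3: ens5f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
5: incusbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
"""

HEADER = re.compile(r"^\d+: ([^:@]+)(?:@\S+)?: <[^>]*> (.*)$")


def parse_links(text):
    """Map interface name -> {'state': ..., 'master': bridge name or None}."""
    links = {}
    for line in text.splitlines():
        m = HEADER.match(line)
        if not m:
            continue  # skip anything that is not an interface header line
        tokens = m.group(2).split()
        # Assumption: the remainder is "key value" pairs (mtu 1500 qdisc mq ...).
        attrs = dict(zip(tokens[0::2], tokens[1::2]))
        links[m.group(1)] = {
            "state": attrs.get("state"),
            "master": attrs.get("master"),
        }
    return links


links = parse_links(IP_LINK)
members = sorted(name for name, info in links.items() if info["master"])
print("br0 present:", "br0" in links)
print("bridge members:", members)
```

Running the same check on `ip link` output captured while the bridge is configured would show which interfaces are actually enslaved to br0.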

No, just the single ens5f0 was added as a member of the bridge when I created it. I don’t even have an SFP+ transceiver plugged into the 2nd port, so it shouldn’t even be active.

The only other connection is the BMC/IPMI ethernet connection on the same subnet with a different DHCP reserved IP address. But, is on a completely separate switch port as well.
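
The kernel log posted earlier can be checked against that. This is a small Python sketch (the `bridge_ports` helper is made up for illustration) that collects every port the kernel reported for each bridge, using a few lines copied from the dump. It shows ens5f1 appearing as port 2 of br0 at 10:22:58, so at some point during the earlier attempts the second interface was a bridge member too, even if it was not one in the final configuration.

```python
import re

# A few lines copied from the /var/log/messages dump earlier in the thread.
LOG = """\
May  8 10:19:45 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May  8 10:21:03 truenas kernel: br0: port 1(ens5f0) entered disabled state
May  8 10:22:58 truenas kernel: br0: port 2(ens5f1) entered disabled state
May  8 10:24:47 truenas kernel: br0: port 1(ens5f0) entered forwarding state
"""

PORT = re.compile(r"kernel: (\S+): port (\d+)\((\S+)\) entered \w+ state")


def bridge_ports(text):
    """Return {bridge: {port_number: interface}} for every port seen in the log."""
    ports = {}
    for line in text.splitlines():
        m = PORT.search(line)
        if m:
            bridge, number, iface = m.group(1), int(m.group(2)), m.group(3)
            ports.setdefault(bridge, {})[number] = iface
    return ports


print(bridge_ports(LOG))
```

Since no transceiver is plugged into ens5f1, this alone would not create a loop, but it may be worth confirming the member list before the next test.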

I could be wrong since I’m not on the latest update and don’t use Incus, but I’m guessing the issue is that incusbr0 is already a bridge on the same subnet as ens5f0, and then br0 wants to be a bridge of ens5f0 as well. Shouldn’t ens5f0 instead be set up under br0, and then incusbr0 set up as a bridge of br0? Or am I way off the mark?

I swear I put a space in there every time I type it into the console as well and have to retype it :frowning: