I’m having problems setting up a bridge on SCALE (Fangtooth 25.04.0)…
I’ve followed the steps here: Setting Up a Network Bridge | TrueNAS Documentation Hub. I’ve ensured no apps are running (I’ve even deleted them, just to be sure) and I have NO VMs created. I’ve stopped ALL services, including S.M.A.R.T. just to be safe.
Bare metal specs are: Supermicro 6028R-E1CR24N, 2x Xeon E5-2600, 128 GB RAM, 2x 10 Gb SFP+ integrated NICs (but only ens5f0 in use, currently).
I follow the steps defined in the doc in this order:
1. Stop all apps and all services.
2. Remove the static IP address alias from ens5f0 and save.
3. Create a new br0 interface with no Alias and save.
4. Edit the new br0 interface and add the original static IP address from ens5f0 and save.
5. Click test, which comes back successfully in 2-3 seconds.
6. I see the “traffic” icon next to br0 go from grayed out to light blue and see traffic statistics if I hover over it.
7. I open an incognito browser and navigate to the static IP address, which times out.
8. The original browser window I opened to make the network configuration changes blanks and goes to the “Make sure the server is running” splash screen (Note: This is well under the 60 second timeout).
9. My entire network becomes unresponsive for a minute while Truenas reverts back to the original settings.
10. I end up back to original network settings.
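To put timestamps on exactly when the host (or any other device on the LAN) stops answering during the test window, a small probe script run from another machine can help. This is only a sketch: `probe` and `watch` are hypothetical helper names, and the address and port in the usage comment are placeholders rather than anything from my setup.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def watch(host: str, port: int, duration: float = 90.0, interval: float = 1.0):
    """Probe once per interval and return a list of (elapsed_seconds, reachable)."""
    start = time.monotonic()
    samples = []
    while (elapsed := time.monotonic() - start) < duration:
        samples.append((round(elapsed, 1), probe(host, port)))
        time.sleep(interval)
    return samples


# Example use (placeholder address from the documentation range):
#   for elapsed, ok in watch("192.0.2.10", 443):
#       print(f"{elapsed:6.1f}s  {'up' if ok else 'DOWN'}")
```

Running one copy against the TrueNAS address and another against the router at the same time would show whether both drop at the same moment or one after the other.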
I’ve even gone so far as to disconnect the SFP+ port from the hardware and use the CLI via IPMI to create the bridge, and as soon as I plug the SFP+ module back in (to bring the truenas host back online to the network), the entire network goes dark again until I unplug the SFP+ module and manually revert the network settings.
My network is managed by a Netgate SG-4200 physical appliance running the latest version of pfSense and routes through a TPLink Omada 28 port switch (24 1GbE ports + 4 10GbE ports). I have NO DHCP reservations for the TrueNAS host, so I’m relying solely on the static IP TrueNAS provides for the network. So, I’m a bit curious about how TrueNAS causes all (ALL, including some IoT devices on a separate VLAN defined in pfSense) network devices across the entire LAN to stop all traffic momentarily. So, I’m not sure if I have a TrueNAS, Netgate, or pfSense problem.
Anyone experience anything like this that might be able to direct me as to where to look?
And for reference, I’m trying to figure out how to remove a similar cross-post in the Apps & Virtualization category to get a wider view. So, please no flaming for cross-posting… PLEASE?
I’m not sure that my issue is setting up the bridge itself; it’s that doing so seems to completely bring down my entire network (including external VLANs NOT using TrueNAS) after the “test” phase starts, and the network only goes back to normal after TrueNAS resets the configuration back to the original settings.
TLDR: Once I click the “test” button, my entire network goes down.
I know it’s not a duplicate IP address; it’s been used for months. I’ve even explicitly set the same original MTU, just to be sane.
Not sure if it helps any, but here is a dump of the truenas /var/log/messages file during the last test:
May 8 10:19:15 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May 8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered listening state
May 8 10:19:30 truenas kernel: br0: port 1(ens5f0) entered learning state
May 8 10:19:45 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May 8 10:19:45 truenas kernel: br0: topology change detected, propagating
May 8 10:19:46 truenas kernel: NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000000)
May 8 10:21:03 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Down
May 8 10:21:03 truenas kernel: br0: port 1(ens5f0) entered disabled state
May 8 10:22:58 truenas kernel: ixgbe 0000:02:00.1 ens5f1: left allmulticast mode
May 8 10:22:58 truenas kernel: ixgbe 0000:02:00.1 ens5f1: left promiscuous mode
May 8 10:22:58 truenas kernel: br0: port 2(ens5f1) entered disabled state
May 8 10:22:58 truenas kernel: ixgbe 0000:02:00.1: removed PHC on ens5f1
May 8 10:24:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May 8 10:24:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 10:24:16 truenas kernel: br0: port 1(ens5f0) entered listening state
May 8 10:24:32 truenas kernel: br0: port 1(ens5f0) entered learning state
May 8 10:24:47 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May 8 10:24:47 truenas kernel: br0: topology change detected, propagating
May 8 10:25:10 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Down
May 8 10:25:10 truenas kernel: br0: port 1(ens5f0) entered disabled state
May 8 10:26:40 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left allmulticast mode
May 8 10:26:40 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left promiscuous mode
May 8 10:26:40 truenas kernel: br0: port 1(ens5f0) entered disabled state
May 8 10:27:11 truenas kernel: ixgbe 0000:02:00.0 ens5f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
May 8 11:04:14 truenas kernel: evm: overlay not supported
May 8 11:04:15 truenas kernel: Initializing XFRM netlink socket
May 8 11:20:38 truenas systemd-journald[1253]: Data hash table of /var/log/journal/dc18725ef5a14f01ad3484d8a1346d52/system.journal has a fill level at 75.0 (8534 of 11377 items, 6553600 file size, 767 bytes per hash table item), suggesting rotation.
May 8 11:20:38 truenas systemd-journald[1253]: /var/log/journal/dc18725ef5a14f01ad3484d8a1346d52/system.journal: Journal header limits reached or header out-of-date, rotating.
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered disabled state
May 8 12:34:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: entered allmulticast mode
May 8 12:34:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: entered promiscuous mode
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May 8 12:35:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left allmulticast mode
May 8 12:35:16 truenas kernel: ixgbe 0000:02:00.0 ens5f0: left promiscuous mode
May 8 12:35:16 truenas kernel: br0: port 1(ens5f0) entered disabled state
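For anyone comparing the attempts in that dump, the time each attempt took to go from blocking to forwarding can be pulled out with a short script. This is a rough sketch: the sample lines are copied from the log above, `transitions` and `forwarding_delays` are hypothetical helper names, and it assumes all lines fall on the same day.

```python
import re

# A subset of the kernel lines from the dump above (first and last attempts).
LOG = """\
May 8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 10:19:15 truenas kernel: br0: port 1(ens5f0) entered listening state
May 8 10:19:30 truenas kernel: br0: port 1(ens5f0) entered learning state
May 8 10:19:45 truenas kernel: br0: port 1(ens5f0) entered forwarding state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered disabled state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered blocking state
May 8 12:34:16 truenas kernel: br0: port 1(ens5f0) entered forwarding state
"""

PATTERN = re.compile(
    r"^\w+ +\d+ (\d\d):(\d\d):(\d\d) \S+ kernel: "
    r"([^:\s]+): port \d+\(([^)]+)\) entered (\w+) state$"
)


def transitions(text):
    """Yield (seconds_of_day, bridge, port, state) for each bridge state line."""
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            h, mi, s, bridge, port, state = m.groups()
            yield int(h) * 3600 + int(mi) * 60 + int(s), bridge, port, state


def forwarding_delays(text):
    """Seconds from a port's most recent 'blocking' to its next 'forwarding'."""
    started = {}
    delays = []
    for secs, bridge, port, state in transitions(text):
        key = (bridge, port)
        if state == "blocking":
            started[key] = secs
        elif state == "forwarding" and key in started:
            delays.append((bridge, port, secs - started.pop(key)))
    return delays


print(forwarding_delays(LOG))
# [('br0', 'ens5f0', 30), ('br0', 'ens5f0', 0)]
```

The 30 seconds in the 10:19 attempt is 15 seconds of listening plus 15 seconds of learning, which would be consistent with spanning tree being active on br0 with the default 15 second forward delay, while the 12:34 attempt went straight to forwarding. That is an inference from the timings, not something the log states outright.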
If I understand what you are suggesting, I think I’ve done that. I open a terminal on my desktop and ssh into the TrueNAS host, then su and tail the /var/log/messages file while making all the changes. The desktop terminal loses connection to TrueNAS just as the
br0: port 1(ens5f0) entered forwarding state
message gets output. After Truenas resets its configuration and I can re-ssh into the box, I have to cat (rather than tail) the /var/log/messages file to get the remaining /var/log/messages content to paste here.
I can try that, but I don’t have a laptop with a 10GbE NIC in it. I’ll try later this evening when I can set up a laptop with a static IP address in the same range and see if they talk to each other.
I have disconnected the 10G SFP+ from the network and done the same network changes via the IPMI console using the CLI menu which seems to work, but as soon as I plug the SFP+ transceiver back into the host, the network goes down again and won’t come back up until I unplug the SFP+ transceiver and use the CLI to revert back to original settings again.
Yes, the IPMI is on the same subnet as the TrueNAS host, but the two are in essence different computers internally, with different IP addresses. I can try moving the IPMI host to a different subnet and see if that makes any difference.
Well, I’ve read that using 2 ports on the same subnet can cause issues. And quick googling showed that IPMI can (sometimes) be presented as a NIC on the host. So I’ve just put 2 and 2 together.
Odd, I’ve never seen that. I have another Supermicro host as well as a Dell R630 with their BMCs on the same subnet with no issues. I’ll set up a new VLAN and move all 3 of those BMCs to it and see if that makes any difference.
Mind posting the output of ip link and if config - the only weird thing I notice in the logs is that ens5f0 AND ens5f1 seem to be going up/down…
As far as IPMI - my IPMI is on the same subnet, but is detected as a completely different device & has nothing to do with the two main NIC ports (though there are options to configure it otherwise in bios); I don’t think IPMI is the issue, yet.
root@truenas[~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ac:1f:6b:0f:3d:a4 brd ff:ff:ff:ff:ff:ff
altname enp2s0f0
3: ens5f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether ac:1f:6b:0f:3d:a5 brd ff:ff:ff:ff:ff:ff
altname enp2s0f1
5: incusbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
link/ether 00:16:3e:bc:57:f3 brd ff:ff:ff:ff:ff:ff
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:69:1e:47:00 brd ff:ff:ff:ff:ff:ff
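When a bridge is active, `ip link` normally shows `master <bridge>` on each member interface, so the same output can be checked for membership mechanically. A minimal sketch follows, assuming the plain text format shown above; `parse_ip_link` and `bridge_members` are hypothetical helper names, and the second sample is an invented illustration of a bridged ens5f0, not output from this host.

```python
import re

# Header lines copied from the `ip link` output above (no bridge configured).
CURRENT_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ac:1f:6b:0f:3d:a4 brd ff:ff:ff:ff:ff:ff
3: ens5f1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
5: incusbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
"""

# Hypothetical: what the ens5f0 line would look like while enslaved to br0.
HYPOTHETICAL_BRIDGED = (
    "2: ens5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq "
    "master br0 state UP mode DEFAULT group default qlen 1000\n"
)

HEADER = re.compile(r"^\d+: ([^:@]+)(?:@\S+)?: <([^>]*)> (.*)$")


def parse_ip_link(text):
    """Map interface name -> flags, state, and master from `ip link` text."""
    links = {}
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if not m:
            continue  # skip link/ether and altname continuation lines
        name, flags, rest = m.groups()
        words = rest.split()
        attrs = dict(zip(words[::2], words[1::2]))  # "mtu 1500 qdisc mq ..."
        links[name] = {
            "flags": flags.split(","),
            "state": attrs.get("state"),
            "master": attrs.get("master"),
        }
    return links


def bridge_members(links, bridge):
    """Return the sorted names of interfaces whose master is the given bridge."""
    return sorted(n for n, info in links.items() if info["master"] == bridge)


print(bridge_members(parse_ip_link(CURRENT_OUTPUT), "br0"))
print(bridge_members(parse_ip_link(HYPOTHETICAL_BRIDGED), "br0"))
```

Capturing `ip link` from the IPMI console while br0 is configured and running it through something like this would show whether ens5f1 ever ends up as a member alongside ens5f0.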
No, just the single ens5f0 was added as a member of the bridge when I created it. I don’t even have an SFP+ transceiver plugged into the 2nd port, so it shouldn’t even be active.
The only other connection is the BMC/IPMI ethernet connection on the same subnet with a different DHCP reserved IP address. But, is on a completely separate switch port as well.
I could be wrong since I’m not on the latest update & don’t use incus - but I’m guessing that the issue is due to incusbr0 already being a bridge on the same subnet as ens5f0 - but then br0 wants to be a bridge of ens5f0. Should it not instead be that ens5f0 is set up under br0, and then incusbr0 is set up as a bridge of br0? Or am I way off the mark?
I swear I put a space in there every time I put it into the console as well and have to retype it