[BUG] 25.10.2.1 — Recurring "Failed to check for alert BondStatus: Netlink socket busy"

Since upgrading to 25.10.2.1 (Goldeye), I’m getting this alert in a recurring loop — appears, clears itself, repeats — on a system with no bond interface whatsoever:

Failed to check for alert BondStatus: Netlink socket busy

Confirmed no bond present: ip link show type bond returns nothing. Single NIC, static IP, no LAG/VLAN on bond.

Root cause: BondStatus.check() in bond.py calls interface.query, which opens a Netlink socket. On systems with many virtual interfaces (Docker bridges, veth pairs), that socket is occasionally busy, so the exception is raised before any bond check even happens. There's no early-exit guard for systems without LINK_AGGREGATION interfaces.
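To illustrate the ordering problem, here's a tiny self-contained sketch (mock stand-ins for the middleware call, not the real middlewared API; only the call order and the LINK_AGGREGATION filter come from the actual checker):

```python
import asyncio

# Sketch of the failing order of operations described above (mock stand-ins,
# not the real middlewared source; only the call order matters here).
class NetlinkBusy(Exception):
    pass

async def mock_interface_query():
    # Simulate the intermittent "Netlink socket busy" failure
    raise NetlinkBusy("Netlink socket busy")

async def check():
    # Pre-fix flow: the query runs first, so the exception propagates
    # even on systems with no bond interfaces at all.
    ifaces = {i["id"]: i for i in await mock_interface_query()}
    return [i for i in ifaces.values() if i["type"] == "LINK_AGGREGATION"]

try:
    asyncio.run(check())
except NetlinkBusy as e:
    print(f"Failed to check for alert BondStatus: {e}")
```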

Proposed fix — add two lines to BondStatus.check():

```
     ifaces = {i["id"]: i for i in await self.middleware.call("interface.query")}
+    if not any(i["type"] == "LINK_AGGREGATION" for i in ifaces.values()):
+        return []
```

Skips all Netlink activity on bond-free systems. No impact on systems that actually use bonding.
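To make the intent concrete, here's a minimal standalone sketch of the guard (mock data, not the real middlewared plumbing; the only parts taken from the actual checker are the id/type fields from interface.query and the LINK_AGGREGATION filter):

```python
import asyncio

# Standalone sketch of the proposed early-exit (mock data only; the real check()
# gets its interface list from self.middleware.call("interface.query")).
async def mock_interface_query():
    # Hypothetical sample of a bond-free system; key names match the diff above
    return [
        {"id": "eno1", "type": "PHYSICAL"},
        {"id": "docker0", "type": "BRIDGE"},
    ]

async def check():
    ifaces = {i["id"]: i for i in await mock_interface_query()}
    # Proposed guard: return no alerts before any bond-specific work is done
    if not any(i["type"] == "LINK_AGGREGATION" for i in ifaces.values()):
        return []
    # ... the existing bond status checks would run here ...
    return []

print(asyncio.run(check()))  # prints [] on a system with no LINK_AGGREGATION interfaces
```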

Note: The BondStatus checker has no entry in the Alert Settings UI, so it cannot be suppressed via the web interface. The workaround is patching bond.py directly and restarting middlewared.

Known issue; it should get fixed in the next minor update, which was supposed to be released this week but was postponed because they found more bugs to fix.


I think this is the correct JIRA ticket: NAS-140041

Yep, noticed that.

Which makes me wonder, as this fix release was supposed to cover only a few bugs, but it looks like there was quite a bit more behind them…

Any update on a release date for this fix? This bug is producing 2 to 10 of these alerts per minute and increasingly frequent network disconnections. Yes, a reboot gets rid of the issue for a few hours, but that's inconvenient (at best) and then it returns anyway.


You can try booting into a previous version and see if that gets you by until an update for TrueNAS Scale is released.


You could also run service middlewared restart on a cron schedule. There really shouldn't be any harm, as middlewared only handles management functions (Docker, VMs, SMB, etc. will all continue to run).
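For example, as root via crontab -e (the 6-hour interval is just a guess; pick whatever keeps the alert noise tolerable):

```
# Restart middlewared every 6 hours; adjust the schedule to taste
0 */6 * * * service middlewared restart
```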