26.0.0-BETA.1: Apps fail to start at boot with "Unable to determine default interface", recoverable via midclt

## Summary

After upgrading to 26.0.0-BETA.1, Apps refuse to start on every boot. `docker.status` returns:

```json
{"description": "Application(s) have failed to start:\n[EFAULT] Unable to determine default interface", "status": "FAILED"}
```

`docker.service` stays inactive, and any new app install (e.g. Tailscale) fails with "docker not installed". 25.10.3 was unaffected; the issue started immediately after the BETA upgrade.

By the time the system is reachable, the default route is up and the interface is healthy. The error is stale: it is set during early boot, and middlewared never retries.

## Reproduction

1. Run 26.0.0-BETA.1 with an Apps pool selected and at least one app configured (Tailscale, SearXNG, etc.).
2. Reboot.
3. Once SSH is up:

```bash
sudo midclt call docker.status
# -> {"description": "...Unable to determine default interface", "status": "FAILED"}

sudo systemctl is-active docker
# -> inactive

ip route show default
# -> default via 192.168.0.1 dev enp0s31f6 proto dhcp src 192.168.0.48 metric 1002

cat /sys/class/net/enp0s31f6/operstate
# -> up
```

So the default route and the interface are both up, yet `docker.status` is still `FAILED`.

## Root cause (middlewared source dive)

In `/usr/lib/python3/dist-packages/middlewared/utils/interface.py`:

```python
def wait_for_default_interface_link_state_up() -> tuple[str | None, bool]:
    default_interface = get_default_interface()
    if default_interface is None:
        return default_interface, False
    return default_interface, wait_on_interface_link_state_up(default_interface)
```

`get_default_interface()` is called exactly once. If `/proc/net/route` does not yet contain a default route at that instant (a race against NetworkManager/DHCP), it returns `None` and the helper gives up immediately. The 60 s `IFACE_LINK_STATE_MAX_WAIT` budget is only consumed while polling `operstate` after a default interface has already been resolved, so it does not cover discovery of the default interface itself.

When this happens at boot, `docker.state.start_service` fails, sets the state to `FAILED`, and never retries. The state stays `FAILED` until the operator intervenes manually.
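To make the race concrete, here is a minimal, hypothetical re-implementation of default-interface discovery from `/proc/net/route`. It is not middlewared's actual `get_default_interface()`; it only assumes the standard Linux `/proc/net/route` layout, where a default route has `Destination` and `Mask` both `00000000`. If DHCP hasn't installed the route yet, no such row exists and the function returns `None`, which is the case the single-shot helper can't recover from. The names `find_default_interface`, `EARLY_BOOT_TABLE` and `AFTER_DHCP_TABLE` are mine.

```python
from typing import Optional

RTF_UP = 0x0001  # "route usable" flag, as defined in <linux/route.h>

# /proc/net/route columns: Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
HEADER = "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
# Early boot (hypothetical sample): no route installed yet, only the header exists.
EARLY_BOOT_TABLE = HEADER
# After DHCP (hypothetical sample): default route via 192.168.0.1, stored as little-endian hex.
AFTER_DHCP_TABLE = HEADER + "enp0s31f6\t00000000\t0100A8C0\t0003\t0\t0\t1002\t00000000\t0\t0\t0\n"


def find_default_interface(route_table: str) -> Optional[str]:
    """Return the interface of the first usable default route, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination, flags, mask = fields[0], fields[1], fields[3], fields[7]
        if destination == "00000000" and mask == "00000000" and int(flags, 16) & RTF_UP:
            return iface
    return None


print(find_default_interface(EARLY_BOOT_TABLE))  # None: discovery loses the race
print(find_default_interface(AFTER_DHCP_TABLE))  # enp0s31f6
```

A single call at the wrong moment sees the first table; a call a few seconds later sees the second.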

Verifying that the helper itself works after boot:

```
$ sudo python3 -c "from middlewared.utils.interface import \
    get_default_interface, wait_for_default_interface_link_state_up; \
    print('default:', get_default_interface()); \
    print('wait:', wait_for_default_interface_link_state_up())"
default: enp0s31f6
wait: ('enp0s31f6', True)
```

The helper is correct; only the single-shot discovery is racy.

## Workaround (every reboot)

```bash
sudo midclt call -j docker.fs_manage.mount
sudo midclt call docker.state.start_service
sudo midclt call docker.status
# -> {"description": "Application(s) are currently running", "status": "RUNNING"}
```

After this, all installed apps come up and new app installs work.
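Until a fix ships, the workaround could be scripted so it isn't typed by hand after each reboot. This is a minimal sketch that just wraps the three `midclt` calls above with `subprocess.run` and checks that `docker.status` reports `RUNNING`. It assumes it runs as root (so no `sudo`) and that `midclt call docker.status` prints a single JSON object as shown; `run_workaround` and its injectable `runner` parameter are hypothetical names, added so the logic can be tested without a TrueNAS box.

```python
import json
import subprocess
from typing import Callable, Sequence

# Each step mirrors one command from the manual workaround above.
WORKAROUND_STEPS: list[list[str]] = [
    ["midclt", "call", "-j", "docker.fs_manage.mount"],
    ["midclt", "call", "docker.state.start_service"],
]
STATUS_CMD = ["midclt", "call", "docker.status"]


def default_runner(cmd: Sequence[str]) -> str:
    """Run a command, raise CalledProcessError on non-zero exit, return stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def run_workaround(runner: Callable[[Sequence[str]], str] = default_runner) -> dict:
    """Run the mount job, start Docker, and return the parsed docker.status."""
    for cmd in WORKAROUND_STEPS:
        runner(cmd)  # output of these steps is ignored; failures raise
    status = json.loads(runner(STATUS_CMD))
    if status.get("status") != "RUNNING":
        raise RuntimeError(f"Apps still not running: {status.get('description')}")
    return status
```

Calling `run_workaround()` with no arguments uses the real `midclt`; raising on anything other than `RUNNING` makes a failed recovery visible instead of silently leaving the state at `FAILED`.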

## Suggested fix

Poll `get_default_interface()` within the existing `IFACE_LINK_STATE_MAX_WAIT` budget, then use the remaining budget for the `operstate` check. I have a patch ready; the total worst-case wait stays bounded at the same 60 s.
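A minimal sketch of that approach (not the actual patch): discovery and the link-state wait share a single deadline. `get_default_interface` and the link-state waiter are passed in as callables so the sketch runs standalone; I'm assuming the waiter can accept a timeout for the remaining budget, which may not match the current `wait_on_interface_link_state_up` signature. `wait_for_default_interface_polling` is a hypothetical name, the 60 s default mirrors `IFACE_LINK_STATE_MAX_WAIT` as described above, and the 1 s poll interval is an arbitrary choice.

```python
import time
from typing import Callable, Optional

IFACE_LINK_STATE_MAX_WAIT = 60  # seconds, the existing overall budget
POLL_INTERVAL = 1.0  # hypothetical: how often to re-check for a default route


def wait_for_default_interface_polling(
    get_default_interface: Callable[[], Optional[str]],
    wait_link_up: Callable[[str, float], bool],
    max_wait: float = IFACE_LINK_STATE_MAX_WAIT,
    poll_interval: float = POLL_INTERVAL,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Optional[str], bool]:
    """Poll for a default interface, then wait for link-up, within one budget."""
    deadline = clock() + max_wait
    default_interface = get_default_interface()
    while default_interface is None:
        remaining = deadline - clock()
        if remaining <= 0:
            return None, False  # no default route appeared within the budget
        sleep(min(poll_interval, remaining))
        default_interface = get_default_interface()
    # Spend only what is left of the budget on the operstate check.
    remaining = max(0.0, deadline - clock())
    return default_interface, wait_link_up(default_interface, remaining)
```

Injecting `clock` and `sleep` keeps the worst case easy to reason about: discovery can consume part of the 60 s, but the link-state wait only ever receives what is left, so the total never exceeds the original bound.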

## Why I’m posting here instead of Jira

I created a jira.ixsystems.com account, but the Create Issue page returns “You are not authorized to perform this operation.” I’m posting here so iX staff can either re-file this in Jira or unblock my account. I’m happy to attach a GitHub PR against `truenas/middleware:master` once a Jira ticket exists.

## Environment

- TrueNAS 26.0.0-BETA.1 (multi-boot with 25.10.3; the 25.10.3 boot environment is unaffected)
- Hardware: <fill in: motherboard / NIC>
- Default route interface: `enp0s31f6` (Intel I219-V, 1G), DHCP from the upstream router
- Apps pool: data
- Apps installed: tailscale, searxng (both come up after a manual `start_service`)

I can attach `journalctl -u middlewared -b 0` output and `/proc/net/route` snapshots from boot if helpful.
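For the boot snapshots, something like the following could record when the default route actually appears. It's a hypothetical helper (the names, the 0.5 s interval and the 120-sample default are mine) that only assumes the standard `/proc/net/route` format. It would need to be started early in boot to capture the race. On Linux, `time.monotonic()` uses `CLOCK_MONOTONIC`, so its timestamps should line up with `journalctl -o short-monotonic` for the same boot.

```python
import time
from pathlib import Path
from typing import Callable, Iterator


def has_default_route(route_table: str) -> bool:
    """True if a /proc/net/route row has Destination and Mask both 00000000."""
    for line in route_table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 8 and fields[1] == "00000000" and fields[7] == "00000000":
            return True
    return False


def snapshot_routes(
    path: str = "/proc/net/route",
    samples: int = 120,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[float, bool, str]]:
    """Yield (monotonic timestamp, default route present?, raw table) per sample."""
    for _ in range(samples):
        table = Path(path).read_text()
        yield time.monotonic(), has_default_route(table), table
        sleep(interval)
```

Iterating over `snapshot_routes()` and printing the first two fields of each sample gives a compact timeline to compare against middlewared's "Unable to determine default interface" entry in the journal.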

The way to access Jira is the “Report a Bug” button at the top of the page. If you are logged into the forum, logging into Jira that way should work again.

I created a bug report for this a couple of days ago; it was closed with a note that it had already been fixed in a newer version.

It’s probably already fixed in the nightlies and will probably land in BETA2.

You’re probably not blocked, but the Jira interface can be frustratingly finicky for new users. Could you say exactly what you did before seeing this error message? Was it directly after clicking the Create button or when you tried to submit the ticket? If the latter, what issue type did you select from the dropdown?