Middleware service not starting

Has anyone been able to resolve this issue on know what is actually happening? I’ve tried reloading old configs with no success.

I have also tried installing ubuntu on my boot drive and the system works flawlessly, so it doesn’t appear to be a hardware issue.

My system still takes many minutes to boot up as it stays a long time at ‘setting up plugins (audit)’ but it does boot in the end.
I did look at some logs but I couldn’t find anything useful in syslog and middlewared.log.
This is probably rather due to me not knowing how middlewared and it’s plugins work and therefore not knowing that to actually look for than due to there being no useful information.

Is there anyone here (preferrably a TrueNAS dev) who could explain what exactly happens at this stage in the boot process and what to look for?

I’ve taken a look at middleware/src/middlewared/middlewared/plugins/audit at master · truenas/middleware · GitHub which led me to the path /audit on my NAS (quiet obvious, yes).

I think my system might take so long to start because I have a multi-gigabyte SUDO.db there.
Nearly all entries are caused by my monitoring agent which is not part of TrueNAS.
At least for me the issue may be selfmade.

I sadly can’t find an option to exclude a certain user from sudo audit logging, this option only seems to exist for SMB.
I would like to empty or at least largely reduce the databse.
When reducing the retention days the database does not seem to be cleared.
It is also not possible to set the quota to a level which would trigger the warning state with the current size.

Although the issue may be selfmade I still think a large auditing database should not make TrueNAS take that long to boot (especially on mirrored SSDs) or even outright fail (if this is actually the cause for other people’s issues as well).
I would also like to see some more options for audit logging (ignoring users, freeing space…).

I got stuck on middlewared: setting up plugins (audit) after upgrading from 24.10.0.2 → 24.10.1. I was blocked from fully booting the console as a result.

I “fixed it” by removing all of the existing audit databases:

cd /audit
for f in *; do mv "$f" "$f.save"; done

and then rebooting. So, I lose out on the previous log databases, but my system works again.

I spent a tiny amount of time investigating some of the databases in a postmortem, and the .db files were able to be read and dumped by sqlite3, so I doubt there was any corruption there. Maybe it could’ve been related to the size of the databases (the biggest one, IIRC, was around 10GB with the WAL), but without diving into the audit plugin a little, that feels unlikely.

I didn’t spend too much time in postmortem, since I don’t really care about those databases in their previous state, and a reboot causes the audit plugin to make new databases.

2 Likes

Oh, also potentially relevant for someone wanting to do more digging on this. There weren’t any logs in /var/log/middlewared.log from before the upgrade that were particularly concerning. After the upgrade during startup, I had this:

[2025/01/12 08:23:37] (ERROR) middlewared.reader():708 - Error in ShellWorkerThread.reader
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 706, in reader
    ).result()
      ^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3/dist-packages/aiohttp/web_ws.py", line 315, in send_bytes
    await self._writer.send(data, binary=True, compress=compress)
  File "/usr/lib/python3/dist-packages/aiohttp/http_websocket.py", line 688, in send
    await self._send_frame(message, WSMsgType.BINARY, compress)
  File "/usr/lib/python3/dist-packages/aiohttp/http_websocket.py", line 601, in _send_frame
    raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport

The only relevant log from journalctl -u middlewared was about the event loop exiting too soon.

Jan 12 08:27:14 localhost systemd[1]: Starting middlewared.service - TrueNAS Middleware...
Jan 12 08:27:17 localhost middlewared[1680]: /usr/lib/python3/dist-packages/pydantic/json_schema.py:2158: PydanticJsonSchemaWarning: Default value <object object at 0x7ff145f08d40> is not JSON serializable; excluding default>
Jan 12 08:27:17 localhost middlewared[1680]:   warnings.warn(message, PydanticJsonSchemaWarning)
Jan 12 08:31:19 localhost systemd[1]: middlewared.service: start operation timed out. Terminating.
Jan 12 08:31:19 localhost middlewared[1680]: Traceback (most recent call last):
Jan 12 08:31:19 localhost middlewared[1680]:   File "/usr/bin/middlewared", line 33, in <module>
Jan 12 08:31:19 localhost middlewared[1680]:     sys.exit(load_entry_point('middlewared==0.0.0', 'console_scripts', 'middlewared')())
Jan 12 08:31:19 localhost middlewared[1680]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 12 08:31:19 localhost middlewared[1680]:   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 2220, in main
Jan 12 08:31:19 localhost middlewared[1680]:     ).run()
Jan 12 08:31:19 localhost middlewared[1680]:       ^^^^^
Jan 12 08:31:19 localhost middlewared[1680]:   File "/usr/lib/python3/dist-packages/middlewared/main.py", line 2034, in run
Jan 12 08:31:19 localhost middlewared[1680]:     self.loop.run_until_complete(self.__initialize())
Jan 12 08:31:19 localhost middlewared[1680]:   File "/usr/lib/python3.11/asyncio/base_events.py", line 652, in run_until_complete
Jan 12 08:31:19 localhost middlewared[1680]:     raise RuntimeError('Event loop stopped before Future completed.')
Jan 12 08:31:19 localhost middlewared[1680]: RuntimeError: Event loop stopped before Future completed.
Jan 12 08:32:50 localhost systemd[1]: middlewared.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Jan 12 08:34:20 localhost systemd[1]: middlewared.service: State 'final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Jan 12 08:34:20 localhost systemd[1]: middlewared.service: Failed with result 'timeout'.
Jan 12 08:34:20 localhost systemd[1]: middlewared.service: Unit process 1680 (asyncio_loop) remains running after unit stopped.
Jan 12 08:34:20 localhost systemd[1]: middlewared.service: Unit process 1692 (python3) remains running after unit stopped.
Jan 12 08:34:20 localhost systemd[1]: Failed to start middlewared.service - TrueNAS Middleware.
Jan 12 08:34:20 localhost systemd[1]: middlewared.service: Consumed 4min 40.497s CPU time.

This fixed it it for me. Appears to have been caused by large db’s in the audit/ directory.

I have turned off SMB audit in my SMB shares. as that was the largest files

Thanks, this fixed my issue. Look like my SMB audit were huge (15GB) after syncing a lot of files. I just disabled SMB audit ! Booted pretty fast after renaming the log to .save :grin:

1 Like

Thank you for this! Ran into the same issue when upgrading from 24.10.1 to 24.10.2 and your solution worked.

I just ran into this to going from 10.0.2 → 10.2

middleware never started and I had to drop in and delete /audit/* - it was 10gb total and totally preventing the middleware from starting even if I dropped back to the previous version (server had been up for 100+ days)

that said even when deleting audio in the new version the system was broken, I had to get into 10.0.2 and re-apply the upgrade for 10.2 to find my pool and boot successfully