First signs of bitrot in TN CORE? Or only for me?

No idea when this was introduced because I am so used to using iocage on the command line, but yesterday:

So apparently you cannot stop and start jails in the UI anymore, whatever the cause.

Well, time to plan moving on, I guess. We come to it at last … the great battle of our time.

Take care,
Patrick


I assume there was no software change… 13.0-U6.???

13.3-U1.2. 13.0 cannot run up-to-date jails.

I just noticed on Saturday, no idea when that problem might have appeared. Jails start ok at boot time, but stopping and starting through the UI does not work.

Creating a jail in the UI and then using iocage start does.

??? Not understanding.

I just logged into the UI and stopped a Jellyfin jail, a Plex jail, and a nginx jail (my reverse proxy). They all stopped without error messages. VNet network interfaces went down and their RAM was released from use. I then started them back up and they came back online with no issue.

When I click on STOP for any jail I get the error message in my first post instead. I thought that much was obvious.

Are you running 13.0 or 13.3?

13.3u1.2

(Isn’t this why we all have our system specs in our sigs?)

Same reason as you, for apps running in jails. Although I have moved a few to Docker in a VM now. I don’t feel like that setup is as stable as the apps still in BSD jails.


Ah … yes. :wink: I forgot about that when moving to the new forum.

Oh, well. So something other than just the version is causing a problem with file descriptors. Time to bring out the big guns - truss/ktrace - later tonight.

Here’s the full trace from middlewared.log:

[2025/11/17 13:49:05] (DEBUG) iocage.__start_jail__():253 - Grabbing IPv6 default route
[2025/11/17 13:49:05] (WARNING) iocage.callback():69 - No default gateway found for ipv6.
[2025/11/17 13:49:05] (DEBUG) iocage.__start_jail__():255 - Default IPv6 Gateway: none
[2025/11/17 13:49:05] (INFO) iocage.callback():71 - * Starting cloud
[2025/11/17 13:49:07] (ERROR) middlewared.job.run():367 - Job <bound method accepts.<locals>.wrap.<locals>.nf of <middlewared.plugins.jail_freebsd.JailService object at 0x9a373c670>> failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/jail_freebsd.py", line 1297, in start
    iocage.start(used_ports=[6000] + list(range(1025)))
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/iocage.py", line 1811, in start
    ioc_start.IOCStart(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_start.py", line 87, in __init__
    raise e
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_start.py", line 84, in __init__
    self.__start_jail__()
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_start.py", line 643, in __start_jail__
    prestart_success, prestart_error = iocage_lib.ioc_common.runscript(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_common.py", line 849, in runscript
    output = iocage_lib.ioc_exec.SilentExec(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_exec.py", line 268, in __init__
    self.output = list(silent)
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_exec.py", line 220, in exec_jail
    r = select.select([
ValueError: filedescriptor out of range in select()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 393, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1161, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1158, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/jail_freebsd.py", line 1299, in start
    raise CallError(str(e))
middlewared.service_exception.CallError: [EFAULT] filedescriptor out of range in select()
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 172, in query
    return result[0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/libvirt.py", line 5822, in _dispatchDomainEventCallbacks
    cb(self, virDomain(self, _obj=dom), event, detail, opaque)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm_/events.py", line 25, in callback
    vms = {f'{d["id"]}_{d["name"]}': d for d in self.middleware.call_sync('vm.query')}
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1301, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1341, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/service.py", line 484, in query
    result = await self.middleware.call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1285, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1242, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 164, in query
    result = await self._queryset_serialize(
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 214, in _queryset_serialize
    result.append(await self._serialize(
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 232, in _serialize
    data = await self.middleware.call(extend, data)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1285, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1242, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1037, in extend_vm
    vm['status'] = await self.middleware.call('vm.status', vm['id'])
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1285, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1253, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1158, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/vm.py", line 1684, in status
    vm = self.middleware.call_sync('datastore.query', 'vm.vm', [['id', '=', id]], {'get': True})
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1301, in call_sync
    return self.run_coroutine(methodobj(*prepared_call.args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1341, in run_coroutine
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/datastore/read.py", line 174, in query
    raise MatchNotFound()
middlewared.service_exception.MatchNotFound
[2025/11/17 13:51:33] (INFO) iocage.callback():71 - * Stopping acme
[2025/11/17 13:51:33] (ERROR) middlewared.job.run():367 - Job <bound method accepts.<locals>.wrap.<locals>.nf of <middlewared.plugins.jail_freebsd.JailService object at 0x9a373c670>> failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/jail_freebsd.py", line 1316, in stop
    iocage.stop(force=force)
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/iocage.py", line 1843, in stop
    ioc_stop.IOCStop(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_stop.py", line 62, in __init__
    raise e
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_stop.py", line 59, in __init__
    self.__stop_jail__()
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_stop.py", line 114, in __stop_jail__
    prestop_success, prestop_error = iocage_lib.ioc_common.runscript(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_common.py", line 849, in runscript
    output = iocage_lib.ioc_exec.SilentExec(
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_exec.py", line 268, in __init__
    self.output = list(silent)
  File "/usr/local/lib/python3.9/site-packages/iocage_lib/ioc_exec.py", line 220, in exec_jail
    r = select.select([
ValueError: filedescriptor out of range in select()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 393, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1161, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1158, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/jail_freebsd.py", line 1318, in stop
    raise CallError(str(e))
middlewared.service_exception.CallError: [EFAULT] filedescriptor out of range in select()

That looks more like an iocage bug (using select.select() rather than kqueue or poll()). There are known issues with FD_SET / select when the FD numbers get large. Do you see the issue if you, say, reboot the server?
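
For anyone following along, here is a minimal Python sketch of the limit being discussed. It assumes a Unix-like system where FD_SETSIZE is 1024 (the usual default on FreeBSD and Linux). The helper names select_accepts and wait_readable are made up for illustration and are not part of iocage.

```python
import os
import select


def select_accepts(fd: int) -> bool:
    """Return False if select.select() rejects fd as out of range."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython raises this for descriptor numbers >= FD_SETSIZE,
        # before the descriptor is even handed to the kernel.
        return False
    except OSError:
        # Number is in range, but the descriptor is not open (EBADF).
        return True
    return True


def wait_readable(fd: int, timeout_ms: int) -> bool:
    """Same readiness check via poll(), which has no FD_SETSIZE cap."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(timeout_ms))


read_end, write_end = os.pipe()
os.write(write_end, b"x")
print(select_accepts(read_end))    # small descriptor number: accepted
print(select_accepts(4096))        # far above 1024: rejected
print(wait_readable(read_end, 0))  # poll() sees the pending byte
```

The second call fails on the descriptor number alone, which matches the "filedescriptor out of range in select()" message in the log. What matters is how high the number is, not how busy the descriptor is.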

^^^ check that is failing.

Yes. Even after a reboot. I could try to disable autostart for all jails and reboot. Main services that have the potential to use up file descriptors are all in jails.

Then again, iocage start|stop on the command line just works. So do all other iocage operations and config changes through the UI.

Probably your iocage start / stop are using libiocage in a slightly different way. If you look at the Python traceback, it’s failing due to the select.select() call in libiocage (which I think in the modern world would be considered a library bug) when trying to run some sort of pre-exec script. This isn’t new code, so you’re probably tickling the library in a new way now, and my guess is that it’s related to open fds on the server.

I definitely have more than 1024 open files if that is the problem:

freenas# sysctl kern.openfiles
kern.openfiles: 2890

But the main contributor to that count is middlewared itself:

freenas# fstat | grep python3.9 | wc -l
    1306

Restarting middlewared brings that number way down:

freenas# service middlewared restart
Stopping middlewared.
Waiting for PIDS: 59193.
freenas# fstat | grep python3.9 | wc -l
     225

And guess what - starting and stopping jails from the UI works again.

So I guess I’ll have to monitor open FDs by middlewared. If the “leak” takes some time to materialise, I could just restart middlewared every night or so.
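
A rough sketch of how that monitoring could look in Python, mirroring the fstat | grep | wc -l pipeline above. The function names, the 1024 limit, and the margin of 200 are assumptions for illustration. Like the shell pipeline, the count covers every python3.9 process and every fstat line (including entries such as text, wd and root), so it is an approximation of middlewared's real descriptor count.

```python
import subprocess

FD_SETSIZE = 1024  # assumed select() limit; the usual default on FreeBSD


def count_matching_lines(text: str, needle: str) -> int:
    """Count lines containing needle, like `grep needle | wc -l`."""
    return sum(1 for line in text.splitlines() if needle in line)


def near_select_limit(count: int, margin: int = 200) -> bool:
    """True once count is within margin of the assumed select() limit."""
    return count >= FD_SETSIZE - margin


def python_fstat_count() -> int:
    """Run fstat (FreeBSD) and count the python3.9 lines in its output."""
    result = subprocess.run(
        ["fstat"], capture_output=True, text=True, check=True
    )
    return count_matching_lines(result.stdout, "python3.9")
```

A cron job could call near_select_limit(python_fstat_count()) and only restart middlewared when it returns True, instead of restarting blindly every night.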

If you use the FreeBSD port for iocage generally (I think someone took over maintainership), then it may be worth a bug report against the library if they haven’t fixed it yet. As you’re probably aware, there are quite a few syscalls / subroutines that are known bad (strcpy vs strlcpy, etc.). select / FD_SET are among these because the behavior is so annoying (bugs due to their usage don’t show up until the server is in production).


I had also looked into the code already. We do not experience these problems on our production machines with “FreeBSD iocage”.

I just noticed it the first time with TrueNAS CORE.

I suspect that if there is a file descriptor leak in middlewared, iocage inherits all of these descriptors after fork() and exec() and then hits the limit in select().
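
A note on the mechanism, as a sketch rather than a diagnosis. POSIX hands out the lowest unused descriptor number for each new descriptor. A process that already holds more than 1024 descriptors therefore gets numbers above 1024 for any new pipe, and select() cannot watch those. The traceback above shows iocage_lib being called from a middlewared worker thread, so the pipes are presumably created inside the middlewared process itself. The count of 50 below is arbitrary and only stands in for the leaked descriptors.

```python
import os

# Stand-in for a process that already holds many descriptors open.
anchor_r, anchor_w = os.pipe()
held = [os.dup(anchor_r) for _ in range(50)]

# A pipe created now (as happens when a child process is spawned with
# captured output) lands above everything that is already held.
new_r, new_w = os.pipe()
print(max(held), new_r, new_w)

for fd in held + [anchor_r, anchor_w, new_r, new_w]:
    os.close(fd)
```

Scale the 50 up to the 1306 seen earlier and the new pipe ends up past the select() limit. That would also explain why restarting middlewared makes the UI buttons work again.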

I moved all virtual machines off of my TN CORE, because while bhyve is alive and kicking (we had a great bhyvecon in Zagreb this September), the state in TN CORE leaves a bit to be desired.

But I am not going to give up my jails any time soon.

The number I see with that simple fstat | grep is slowly increasing, by the way. 225 > 250 > 252 > …

Interesting. After having my Core server running for a while, it’s still only at 228. The difference is I only use the command-line with iocage to start, stop, exec, and enter jails.

It’s settled for now:

freenas# fstat | grep python3.9 | wc -l
     251

So I guess I will shrug it off for the moment and, should the problem in the UI resurface, immediately check the open files.

I will have to further investigate alternatives to CORE, anyway, though. Yes, apps and VMs are all fun on CE and I actually use it as an application server.

But I want e.g. my Nextcloud in a jail that I can ssh into and look at the moving parts. And add or modify things. And update at my pace and not depend on an app repository.

Thanks for helping, @awalkerix

The issue is in upstream master. That’s generally the problem with select. It’s basically a ticking time bomb. It works until it doesn’t, and when it fails it can fail badly.

That said, if I were deploying a vanilla FreeBSD server, iocage would not be my first choice for jail management. I think there are much better libs / tools available these days.