SCALE ElectricEel-24.10.0.2 - Critical Failed to start docker for Applications: Docker service could not be started

jerr · January 28, 2025, 10:37pm

I tried Jira, but they ignored me, saying it's not their problem and that I should try the community. After many restarts I came to the conclusion that my problem is too many containers (too many? I don't know what that means), and it seems to me that TrueNAS can't wait long enough for the application manager to start the containers. At the moment I have a little fewer than 60 containers.

DjP-iX · January 29, 2025, 3:08pm

Can you send your Jira ticket number so I can take a look at it?

For what it's worth, we haven't seen this issue come up in any of our internal testing, but we will try to reproduce it in the next round of manual tests. It does seem like a common thread may be emerging, pointing toward some kind of timeout on systems with 50+ containers running, but we'll have to investigate further.

jerr · January 29, 2025, 9:15pm

Thank you very much in advance for your support. Here is the ticket: https://ixsystems.atlassian.net/browse/NAS-132921. Today I had another round of "fun" starting Docker after updating SCALE to ElectricEel-24.10.2. Unfortunately, neither restarting Docker nor unsetting/choosing the pool again helped. Only after I turned off all containers, turned off Docker and restarted TrueNAS did the manager start by itself. Last time it was similar: I uninstalled some containers to get below 40, and everything started up.

red · January 30, 2025, 4:15am

NAS-133304

48 containers

sloniupl · January 30, 2025, 5:37pm

Found a solution from this thread:

Just go to Apps > Configuration > Settings and change the base IP address from 172.17.0.0 to 172.16.0.0.
That's it, now it works.

Edit: still does not work after restart

SavanFlou · January 31, 2025, 8:01am

Hello,
I just had this problem, I just unset / set the pool and they are all back.
Thanks for the tip

TKOxff · February 8, 2025, 4:33am

In my case, I followed ChatGPT’s advice and made the following changes, which resolved the issue.

Original Error message from journalctl -xeu docker.service
Feb 08 12:51:31 truenas dockerd[6902]: failed to start daemon: Error initializing network controller: error creating default "bridge" network: all predefined address pools have been fully subnetted

ChatGPT’s advice
Cause:
Your home router and Docker network are using the same subnet (e.g., 192.168.3.0/24), causing network collisions and routing issues.

Solution:
Choose a subnet that does not overlap with your router.

Edit /etc/docker/daemon.json:
{
  "default-address-pools": [{"base": "192.168.200.0/24", "size": 24}]
}
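To make that advice concrete, a pool like this has to satisfy two things: it must not overlap the LAN, and it must be able to hand out as many networks as Docker needs. The snippet below checks both with Python's standard `ipaddress` module. It is a minimal sketch, assuming the example LAN 192.168.3.0/24 from the post; `overlaps` and `subnet_capacity` are hypothetical helper names. Note that a base of /24 with size 24 yields exactly one network. If Docker needs more than one network from the pool (for example one per app, which is an assumption here), the pool could again end up "fully subnetted".

```python
import ipaddress

def overlaps(pool_base: str, lan: str) -> bool:
    """True if the Docker pool base and the LAN subnet share any addresses."""
    return ipaddress.ip_network(pool_base).overlaps(ipaddress.ip_network(lan))

def subnet_capacity(pool_base: str, size: int) -> int:
    """How many networks of prefix length `size` fit inside `pool_base`."""
    net = ipaddress.ip_network(pool_base)
    if size < net.prefixlen:
        raise ValueError("size must be >= the base prefix length")
    return 2 ** (size - net.prefixlen)

if __name__ == "__main__":
    lan = "192.168.3.0/24"                        # example router LAN from the post
    print(overlaps("192.168.200.0/24", lan))      # False: no collision with the LAN
    print(subnet_capacity("192.168.200.0/24", 24))  # 1: room for a single network
```

A wider base such as 192.168.200.0/21 with size 24 would leave room for 8 networks and still stay clear of 192.168.3.0/24.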

sloniupl · February 8, 2025, 9:27pm

Looks like the issue is related to Docker apps running from an HDD-based pool, and they will fix it in 25.04.

Source: Jira

yorick · February 10, 2025, 10:36am

I was like “huh HDD?” – 120s hard-coded timeout. That makes sense.
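To picture why a slow pool trips this, here is a minimal, hypothetical sketch of a fixed-deadline wait loop. It is not the TrueNAS middleware's actual code. `wait_for` and `FakeClock` are illustrative names, and the 120 s default only mirrors the timeout mentioned in this thread. The injectable clock lets you simulate a slow start without actually waiting.

```python
import time
from typing import Callable

def wait_for(ready: Callable[[], bool], timeout: float = 120.0,
             interval: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll ready() until it returns True or `timeout` seconds pass.

    Returns True if ready() succeeded before the deadline, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if ready():
            return True
        if clock() >= deadline:
            return False  # deadline passed; a caller would report a failure here
        sleep(interval)

class FakeClock:
    """Simulated time so the example runs instantly."""
    def __init__(self) -> None:
        self.t = 0.0
    def now(self) -> float:
        return self.t
    def sleep(self, seconds: float) -> None:
        self.t += seconds

if __name__ == "__main__":
    c = FakeClock()
    # Pretend the apps need 150 s to come up on a slow HDD pool.
    print(wait_for(lambda: c.now() >= 150, timeout=120, clock=c.now, sleep=c.sleep))  # False
```

Raising the timeout lets the same 150 s start succeed, but it doesn't make the start any faster. That's the argument for putting the apps on SSD.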

And explains why I’ve had zero issues - three apps, SSD mirror pool for them. Overkill? For sure. They don’t need the SSD. But - what is TrueNAS if not a fun hobby where we get to stick hardware into a server ;).

I cannot fathom 40+ apps. More power to you! I have a media app (Plex), a gaming app (Foundry VTT), and a DNS updater app. I can see Nextcloud plus an office solution, but then again no, that's not worth it for my use case. If I wanted to run more than one Foundry (no thanks), or went for something like Nextcloud after all, then an nginx app would make sense.

I can see the value in a WiFi controller app for those that use Ubiquiti.

All that rambling aside: It’s good this case was found and will be fixed in 25.04. TrueNAS CE should work with 40+ apps on HDD.

Also, with 40+ apps, maybe do the nerdy thing and have some fun with a 9200/9300 Broadcom HBA in IT mode with a couple of Intel SATA SSDs from eBay as a mirror pool. The data bits can be on HDD (Plex media, for me, somewhat obviously is), but the apps will be a little happier on SSD.

adriancristian · February 20, 2025, 6:04am

Nobody could debug this, but I found a workaround. After the system boots, go to the CLI and restart middlewared:

sudo service middlewared restart

And all goes to normal.

airmarcus · February 21, 2025, 11:23pm

Thanks for that - this worked for me!

BrandonMFong · February 23, 2025, 1:48am

I’m getting this issue after rebooting for the first time in about a month. None of the above suggestions are working for me.

Version: ElectricEel-24.10.1

yorick · February 23, 2025, 1:38pm

"None of the above suggestions": how does it behave when your apps are on SSD? Caution: there isn't a UI way to just move them over. You'll have to move the contents of ix-apps manually from the CLI and make sure all permissions come along.

If you want to stick to HDD, the suggestion was to try 25.04, which removes the hard-coded timer. How does it behave there?

Can you confirm that the timer is the issue? Are there so many apps that they take longer than 120s to start, and then you get the error?

BrandonMFong · February 24, 2025, 3:55am

I am running my apps on HDDs. I saw the jira item regarding the hard coded timer issue, I’ll consider running the 25.04 beta to see if it solves the issue.

Can you confirm that the timer is the issue? Are there so many apps that they take longer than 120s to start, and then you get the error?

How would I confirm this?
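One rough way to check is to compare two journal timestamps against 120 s: the line where the Docker start was attempted and the line where the failure was logged. Which lines mark those moments is an assumption, not something established in this thread. The sketch parses the short timestamp prefix seen in the journalctl output quoted earlier in the thread. `seconds_between`, `exceeded` and the 12:49:00 start time are hypothetical. Because that format carries no year, the sketch supplies one explicitly and assumes both lines fall in the same year.

```python
from datetime import datetime

# journalctl's short format has no year (e.g. "Feb 08 12:51:31"),
# so one is prepended explicitly to avoid leap-day ambiguity.
JOURNAL_FMT = "%Y %b %d %H:%M:%S"

def seconds_between(start: str, end: str, year: int = 2025) -> float:
    """Seconds elapsed from `start` to `end` (both like "Feb 08 12:51:31").

    Assumes both timestamps fall in the same calendar year.
    """
    t0 = datetime.strptime(f"{year} {start}", JOURNAL_FMT)
    t1 = datetime.strptime(f"{year} {end}", JOURNAL_FMT)
    return (t1 - t0).total_seconds()

def exceeded(start: str, end: str, limit: float = 120.0, year: int = 2025) -> bool:
    """True if more than `limit` seconds passed between the two timestamps."""
    return seconds_between(start, end, year) > limit

if __name__ == "__main__":
    # Hypothetical start time; the end time is the failure line quoted earlier.
    print(seconds_between("Feb 08 12:49:00", "Feb 08 12:51:31"))  # 151.0
```

If the gap is consistently just over 120 s when the error appears, that points at the timer rather than at something like the address-pool problem.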

yorick · February 24, 2025, 12:19pm

The easy button is to change that. Run apps on an SSD mirror pool (those are truly cheap) and hand in app data via a host mount from HDD.

Now with 40+ apps, granted, that “easy button” may need some manual work, depending on where the data the apps access is now. If it’s already on host mounts (separate datasets from ix-apps) then it’s truly easy.

Sure, 25.04 will solve this for you, and RC1 is out shortly. But it solves it only in the sense that it won't fail: it'll still take longer than 120s to start them all because of the HDD.

Apps on SSD sounds so much nicer all around.

jerr · February 26, 2025, 1:58pm

Works better than docker restart ! Thank You!

Whattteva · February 26, 2025, 2:17pm

I don’t think enterprises use the Apps feature to be honest. I’m actually interested in seeing the metrics for Apps usage in enterprise deployment if anyone knows where to look.

Nemean · February 26, 2025, 5:19pm

Hello. I found this thread while trying to debug my truenas scale apps issues.

The gist is that I have a pool that is slowly dying. I created a new pool and replicated data from my backup onto it. I tried to change the selected pool for apps to my new pool, but Docker fails to start. I have tried restarting middlewared and unsetting/re-choosing the pool multiple times. The journal doesn't help much, as it only says Docker exited with error code 1.

Any help identifying the issue would be super helpful. I am on the latest release version of Scale EE.

Nemean · February 26, 2025, 5:23pm

Oh also forgot to mention. When I choose my original pool with my existing apps, it starts without error.

Nemean · February 26, 2025, 11:02pm

I found the issue. For some reason the entire pool was set to read-only.
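For anyone else hitting this after moving apps to a new pool, here is a quick way to rule out a read-only mount. It is a minimal sketch, assuming the pool's datasets are mounted and you pass their mount point yourself (for example /mnt/<pool>). `is_read_only` is a hypothetical helper built on `os.statvfs`, which is Unix-only. On ZFS you can also inspect the dataset property directly with `zfs get readonly <pool>`.

```python
import os
import sys

def is_read_only(path: str) -> bool:
    """True if the filesystem containing `path` is mounted read-only.

    Checks the ST_RDONLY bit in the statvfs mount flags.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)

if __name__ == "__main__":
    # Pass one or more mount points on the command line.
    for p in sys.argv[1:]:
        print(p, "read-only" if is_read_only(p) else "writable")
```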