"initializing apps service ..." problem

E_B · August 7, 2024, 2:22pm

Hello all

Short version, 8th Aug: rebooted/unset/rebooted/reset … still stuck.
Long version below, showing everything I tried yesterday.

A couple of hours ago one out of my six running apps needed to be reconfigured (TN app “Frigate”). Having done so, I couldn’t restart it beyond “deploying” so I tried (many times) to stop it from the UI.

Then I followed some forum advice (I can’t remember the post, but it seemed sensible and I think I’d used this approach before): I tried unsetting and then resetting the pool. That didn’t work, instead giving me this:

[edit - I did not reboot between unsetting and resetting the pool - I have since found other forum posts which suggest rebooting is a requirement].

I waited and got no progress so I decided to reboot TN. After doing so I got this instead:

and an error message in the log telling me

CRITICAL
Failed to start kubernetes cluster for Applications: Client connection error raised from ‘/api/v1/pods’ endpoint

I searched here for a fix and saw this post so I decided to try the steps mentioned:

1. systemctl stop k3s
2. cd /mnt/data/ix-applications/k3s/server/db/
3. cp state.db state.db.save
4. sqlite3 state.db.save ".dump" > recovered.sql
5. sqlite3 state.db.recovered < recovered.sql
6. cp state.db.recovered state.db
7. systemctl start k3s
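Before overwriting state.db as in step 6, it may be worth keeping a timestamped copy and checking whether the database is actually damaged. The following is a minimal sketch, assuming a POSIX shell with sqlite3 installed. `backup_file` and `check_sqlite` are hypothetical helper names, not part of TrueNAS or k3s, and the path in the usage comment is simply the one from the steps above.

```shell
# Hypothetical helper: copy a file to "<file>.<stamp>.bak" and print the
# backup path. Refuses to overwrite an existing backup.
backup_file() {
    src=$1
    stamp=${2:-$(date +%Y%m%d-%H%M%S)}
    dest="$src.$stamp.bak"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Hypothetical helper: succeed only if SQLite reports the database as intact.
# "PRAGMA integrity_check" prints the single line "ok" for a healthy database.
check_sqlite() {
    [ "$(sqlite3 "$1" 'PRAGMA integrity_check;')" = "ok" ]
}

# Usage (with k3s stopped first):
#   backup_file /mnt/data/ix-applications/k3s/server/db/state.db
#   check_sqlite /mnt/data/ix-applications/k3s/server/db/state.db && echo "database looks intact"
```

If the integrity check passes, that may suggest the database file is not the real fault, in which case the dump-and-reload in steps 4 to 6 is unlikely to help.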

That didn’t work because after step (7) I saw in the ssh console:

Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xeu k3s.service" for details.

I tried systemctl status k3s:

k3s.service - Lightweight Kubernetes
Loaded: loaded (/lib/systemd/system/k3s.service; disabled; preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2024-08-07 15:03:17 BST; 3s ago
Docs: https://k3s.io
Process: 46695 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 46696 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 46697 ExecStart=/usr/local/bin/k3s server --flannel-backend=none --disable=traefik,local-storage --disable-kube-proxy ->
Process: 46721 ExecStopPost=/usr/local/bin/k3s-kill.sh (code=exited, status=0/SUCCESS)
Main PID: 46697 (code=exited, status=1/FAILURE)
CPU: 160ms

Then I started to sweat because I realised I don’t know what to do.

Fifteen minutes after rebooting, I’m still seeing the first picture “initialising apps …” combined with the second “applications are not running” and the big red exclamation mark where the “apps” should be.

Please can anyone help? I hope not to lose the six apps, because one of them is a custom app which I must have set up a couple of years ago and I can’t remember how I did it. The remaining apps are all standard TN catalogue items and not hard to reconfigure, but it is a bit laborious.

edit - I forgot to show you what my “apps” kubernetes settings look like. I don’t know what they were set to before I rebooted but this is what they say now:

(I don’t know what to set the node IP or route v4 interface to, for example).

Thanks for any assistance which can calm me down and stop me weeping.

EB

LarsR · August 7, 2024, 2:30pm

Have you tried a force refresh of the apps page with Shift+F5? The GUI does some caching and doesn’t automatically refresh to the “real” page…

E_B · August 7, 2024, 2:37pm

No - I went and did something very complicated and beyond my abilities instead of something sensible like your suggestion. When will I ever learn …

I have tried shift-F5 now but it didn’t work, perhaps because I tried it too late and my experiments have already messed things up.

I will certainly remember to try it next time (I remember it from when you upgrade and you have to make sure the browser doesn’t cache previous UI pages and settings).

LarsR · August 7, 2024, 2:45pm

The only thing you could try is to unset and reset the apps pool, that should also restart the k3s process…

E_B · August 7, 2024, 2:53pm

I tried it again but it gives me the same “initializing apps service” and “apps not running!”

I also tried systemctl start k3s again and it tells me

Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xeu k3s.service" for details.

So this time I did what it says:

systemctl status k3s.service
which results in

k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; preset: disabled)
     Active: activating (auto-restart) (Result: exit-code) since Wed 2024-08-07 15:49:24 BST; 1s ago
       Docs: https://k3s.io
    Process: 64055 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 64056 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 64057 ExecStart=/usr/local/bin/k3s server --flannel-backend=none --disable=traefik,local-storage --disable-kube-proxy ->
    Process: 64079 ExecStopPost=/usr/local/bin/k3s-kill.sh (code=exited, status=0/SUCCESS)
   Main PID: 64057 (code=exited, status=1/FAILURE)

and also journalctl -xeu k3s.service
which results in a lot of lines, the most salient of which I estimate to be:

truenas systemd[1]: k3s.service: Scheduled restart job, restart counter is at 370.
truenas systemd[1]: Stopped k3s.service - Lightweight Kubernetes.
truenas systemd[1]: Starting k3s.service - Lightweight Kubernetes.
truenas k3s[64445]: time="2024-08-07T15:50:05+01:00" level=info msg="Starting k3s v1.26.6+k3s-6a894050-dirty (6a8940>
truenas k3s[64445]: time="2024-08-07T15:50:05+01:00" level=info msg="Configuring sqlite3 database connection pooling>
truenas k3s[64445]: time="2024-08-07T15:50:05+01:00" level=info msg="Configuring database table schema and indexes, >
truenas k3s[64445]: time="2024-08-07T15:50:05+01:00" level=info msg="Database tables and indexes are up to date"
truenas k3s[64445]: time="2024-08-07T15:50:05+01:00" level=fatal msg="starting kubernetes: preparing server: creatin>
truenas systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
truenas systemd[1]: k3s.service: Failed with result 'exit-code'.

which then repeats.
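The fatal line above is cut off at the `>` because the pager chops long lines. When journalctl output is piped or run with `--no-pager`, the full message should be printed. Here is a small sketch for pulling only the fatal or error entries out of the journal; `k3s_failures` is a hypothetical helper name, and it assumes the `level=...` format visible in the log lines above.

```shell
# Hypothetical helper: read k3s log lines on stdin and print only the
# level=fatal / level=error entries, with the syslog prefix stripped.
k3s_failures() {
    grep -E 'level=(fatal|error)' |
        sed -E 's/^.*(level=(fatal|error) .*)$/\1/'
}

# Usage:
#   journalctl -u k3s.service --no-pager -n 200 | k3s_failures
```

The complete text after "starting kubernetes: preparing server:" is probably the most useful thing to post when asking for help, since it names the step that fails.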

I tried unsetting|rebooting|setting the pool to see if it made any difference: it said something about kubernetes in a window

which then vanished leaving me with this again

I’ll leave it to see if it changes.

I also looked at # systemctl status k3s again

k3s.service - Lightweight Kubernetes
     Loaded: loaded (/lib/systemd/system/k3s.service; disabled; preset: disabled)
     Active: activating (start) since Wed 2024-08-07 16:12:54 BST; 3s ago
       Docs: https://k3s.io
    Process: 6883 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 6884 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 6885 (k3s-server)
      Tasks: 12
     Memory: 20.7M
        CPU: 128ms
     CGroup: /system.slice/k3s.service
             └─6885 "/usr/local/bin/k3s server"

Aug 07 16:12:54 truenas systemd[1]: Starting k3s.service - Lightweight Kubernetes...
Aug 07 16:12:54 truenas k3s[6885]: time="2024-08-07T16:12:54+01:00" level=info msg="Starting k3s v1.26.6+k3s-6a894050-dirty (6a89405>

Running that command multiple times shows me that it’s likely stuck in a loop repeating every 4 s or so. I can’t kill it because of the errors it presents, as previously mentioned.
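To confirm a restart loop without rerunning the status command by hand, the restart counter that systemd logs can be read from the journal. This is a sketch; `last_restart_count` is a hypothetical helper name, and it relies only on the "restart counter is at N." wording seen in the journal output earlier.

```shell
# Hypothetical helper: read journal lines on stdin and print the most recent
# value of systemd's "restart counter is at N." message.
last_restart_count() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}

# Usage:
#   journalctl -u k3s.service --no-pager | last_restart_count
# systemd also exposes the count as a unit property:
#   systemctl show k3s.service -p NRestarts
```

A number that keeps climbing between two runs a minute apart would confirm that the service is being restarted rather than hanging.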

[edit: an hour or so later and I still see the same “initializing app” window so I have rebooted again, only to find the same problem persisting]

Still “initializing” and further searches have led me to conclude this problem isn’t soluble, so instead, can anyone help me with a different approach please?

(1) Perhaps via SSH and cd to /mnt/pool/ix-applications/k3s, or by other approaches, please can you show me which dirs and files I ought to back up elsewhere and then reinstall in order to re-establish my six apps?

(2) Also it only occurs to me now (of course) that I could have done some snapshots of ix-applications … is that something which users sometimes do?
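On question (2), a recursive ZFS snapshot of the apps dataset is one way to get a rollback point before experimenting. The sketch below only builds and prints the command so that it can be reviewed first. `snapshot_cmd` is a hypothetical helper name, the `manual-` snapshot prefix is an assumption, and `pool/ix-applications` is a placeholder for the real dataset name.

```shell
# Hypothetical helper: print (not run) a recursive snapshot command for a
# dataset, using a timestamp as part of the snapshot name.
snapshot_cmd() {
    dataset=$1
    stamp=${2:-$(date +%Y-%m-%d_%H%M)}
    printf 'zfs snapshot -r %s@manual-%s\n' "$dataset" "$stamp"
}

# Usage: review the printed command, then run it as root.
#   snapshot_cmd pool/ix-applications
# Existing snapshots can be listed with:
#   zfs list -t snapshot -r pool/ix-applications
```

The `-r` flag snapshots the child datasets as well, which matters here because each app keeps its data in datasets underneath ix-applications.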

edit - I have found a dir at

root@truenas[/mnt/mainraid/ix-applications/backups]#

which contains

drwxr-xr-x 4 root  4 Apr 15  2023 HeavyScript_2023_04_15_17_02_35/
drwxr-xr-x 4 root  4 May  6  2023 HeavyScript_2023_05_06_18_11_28/
drwxr-xr-x 4 root  4 May 23  2022 TrueTool_2022_05_23_14_19_28/
drwxr-xr-x 6 root  6 May 17 23:36 system-update--2024-05-17_22:35:58/
drwxr-xr-x 6 root  6 Jun 13 08:27 system-update--2024-06-13_07:27:42/
drwxr-xr-x 7 root  7 Jul  9 18:00 system-update--2024-07-09_17:00:02/

The last one is the most recent - a month ago - could it be useful, and if so, what do I do with it (if anything)?

edit: I did this, just in case …

ssh root@192.168.1.194 "tar czf - -C /mnt/mainraid/ix-applications/backups/ system-update--2024-07-09_17:00:02" > ~/TN_app_backup/system-update-backup.tar.gz
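A copy like this is only useful if the archive is readable, so it may be worth listing it once after the transfer. This is a sketch; `verify_archive` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the gzip-compressed tar archive can be
# listed end to end, and print how many entries it contains.
verify_archive() {
    listing=$(tar tzf "$1") || return 1
    if [ -z "$listing" ]; then
        echo 0
    else
        printf '%s\n' "$listing" | wc -l | tr -d ' '
    fi
}

# Usage:
#   verify_archive ~/TN_app_backup/system-update-backup.tar.gz
```

A non-zero exit status, or a count far smaller than expected, would suggest the transfer was cut short and should be repeated.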

ABain · August 7, 2024, 6:05pm

Which release are you on? I know your “My NAS” signature says general latest; can you confirm it is up to date with 24.04.2?

E_B · August 7, 2024, 6:09pm

Hello

Yes - Dragonfish-24.04.2 - the latest (perhaps I ought to edit that signature part to make it explicit - I did wonder about that … edit - I have edited it!).

E_B · August 7, 2024, 10:36pm

contents of the deleted posts above were copied into the main post to improve readability

I tried unsetting and resetting; I tried moving from a static IP to DHCP (it’s the only thing I have changed that I can think of) and I tried to install a new version of an app, but I can’t install because the service isn’t running.

The only remaining option I can think of is to reinstall Scale but I am keen to try something else first

Protopia · August 8, 2024, 11:02am

I know I am probably preaching to the converted here, but one lesson from this seems to be to ask here for advice BEFORE trying out what someone else suggests in a prior post as a fix for their problem which may or may not be the same as yours.

I know that people often say that “asking for forgiveness afterwards can be easier than asking for permission before” but when it comes to technical problems where you can easily make a problem far worse by trying the wrong fix this is most often not the case.

E_B · August 8, 2024, 11:06am

I know, I agree - you’re right. I never learn. (I thought I’d be able to fix it because the symptoms seemed similar …).

Luckily for me this “apps” stuff is only a hobby aspect (and the NAS data storage is robust and safe) and I’m not looking after a hospital or company!

E_B · August 8, 2024, 6:36pm

I tried everything I could think of, and then reinstalled Scale 24.04.2 (and uploaded a recent config).

Even that hasn’t fixed it:

Doomed …doomed

E_B · August 9, 2024, 10:33am

This time my experiments worked, along the lines of

unset pool
delete the ix-applications dataset
reboot
set pool

I then did # watch -d k3s kubectl get pod -A

until I saw

and also

at which point I was satisfied that the problem was solved, inasmuch as I can now reinstall the apps (four out of the five will take a few tens of minutes; the last one I can’t remember how I first set up a couple of years ago, and I’ve not needed to touch it since!).
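Instead of watching the pod list by eye, the same check can be scripted. This is a sketch, assuming the default `kubectl get pod -A` column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); `pods_not_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: read "kubectl get pod -A" output on stdin and print how
# many pods have a STATUS other than Running or Completed.
# It looks only at the STATUS column, not at the READY counts.
pods_not_ready() {
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { n++ } END { print n + 0 }'
}

# Usage:
#   k3s kubectl get pod -A | pods_not_ready
```

A result of 0 corresponds to the state described above, where every system pod has settled and apps can be installed again.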

It’s true that I may perhaps have made things worse along the way, but I did want to make progress now, whilst I had the opportunity, so I did my best by following various posts here and elsewhere.