After upgrading my machine from 24.10’s first release candidate to its second, all my apps seemingly disappeared from my system.
The message “Applications are not running” shows up in the app list, and “Apps Service Pending” stays in the top-right corner of the app screen indefinitely.
I’ve figured out that un-setting the app pool and re-setting it to the same pool as before brings my apps back until the next restart, but they come back in an odd “purgatory state” where their current status is not reported in the UI.
Starting them doesn’t work unless I first explicitly stop them (listing the containers with docker shows that they’re all in an “Exited” state).
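For reference, the “Exited” check described above could be sketched like this from a shell. It assumes the apps run as ordinary Docker containers visible to `docker ps`, and that the command is run with enough privileges to talk to Docker (probably root or `sudo`). `list_exited` is a made-up helper name, not something TrueNAS ships.

```shell
#!/usr/bin/env bash
# list_exited (hypothetical helper): reads lines of "<name> <status...>"
# and prints the names of containers whose status starts with "Exited".
list_exited() {
    awk '$2 == "Exited" { print $1 }'
}

# Only query Docker when the CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
    # -a includes stopped containers; the format prints name, then status text.
    docker ps -a --format '{{.Names}} {{.Status}}' 2>/dev/null | list_exited
fi
```

If every app container shows up in that list, it matches the state described above.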
Worse yet, upon restarting the server, the apps once again disappear, and I need to once more re-assign the pool for it to work.
Any idea what might’ve gone wrong here? Where I might even find logs on why the app system doesn’t seem to start?
I believe I might’ve found the culprit: having checked “Install NDIVIA Drivers” [SIC] on RC1 and forwarded my Nvidia GPU to my Jellyfin container appears to have potentially stopped the app system from initializing properly.
It would seem Nvidia drivers may be installed per boot environment, and therefore may not be present on the new boot environment after an upgrade, despite the checkbox still being checked.
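If that theory is right, one quick thing to look at after booting into the upgraded environment is whether the Nvidia kernel module is loaded at all. Below is a minimal sketch, assuming a Linux system where `/proc/modules` lists the loaded kernel modules. `nvidia_module_loaded` is a hypothetical helper name, and a missing module only hints at missing drivers, it does not prove the cause.

```shell
#!/usr/bin/env bash
# nvidia_module_loaded (hypothetical helper): succeeds if a module named
# exactly "nvidia" appears in the given module list.
# The list defaults to /proc/modules; another file can be passed for testing.
nvidia_module_loaded() {
    local modules_file="${1:-/proc/modules}"
    # Each line starts with the module name followed by a space.
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded on the running system"
fi
```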
In theory there should have been error handling in place to prevent this already, so potentially a bug. As Stux suggested, please submit a bug report and make sure you attach a debug file from the system on RC.2 so our devs can investigate.
I’ll chime in as I ran into a similar issue upgrading from RC2 to 24.10.
I had the same issue with apps disappearing that seemed to be fixed after unsetting and resetting the pool. However, my Jellyfin app now cannot have the GPU passed through to it and I believe the only way to fix it is creating a new container (which would be quite annoying as I now have everything set up again…)
I haven’t restarted my system since ‘fixing’ it by re-choosing the pool, so I’m unsure if the Nvidia drivers will persist after a reboot.
What might be happening here is that TrueNAS is automatically downloading and reinstalling Nvidia drivers after you upgrade, as intended, which prevents the apps service from starting until it is completed (and which could take a while). Since there isn’t any UI feedback while this install is happening, it could look like the service has failed to start when you just need to wait a little longer.
Edit: Or it looks like there may have been a network error on boot which prevented the install/apps service from starting, and it does not retry when that happens.
Edit 2: A known issue for this has been added to the release notes and @wyrmling’s issue is being investigated
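On the "wait a little longer" theory above: since the UI gives no feedback, polling from a shell is one way to tell "still starting" apart from "never going to start". This is only a sketch. `wait_for` is a made-up helper, and using a responding Docker daemon as a stand-in for "the apps service is up" is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 once COMMAND succeeds, 1 on timeout.
wait_for() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    until "$@" >/dev/null 2>&1; do
        # Give up once the deadline has passed.
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example, running `wait_for 900 15 docker info` as root would poll every 15 seconds for up to 15 minutes, assuming `docker info` keeps failing while the daemon is down.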
That would indicate that TrueNAS is not currently detecting an installed Nvidia GPU. I suppose that could mean that it’s in the process of installing drivers, but I’m not sure without more information.
I just tested with a newly created Jellyfin container and the passthrough worked just fine (nvidia-smi output correctly in the container shell). I’m unsure why the checkbox disappeared, but if it is going to work with the new container I will try not to mess with it any more.
Just for more info in case you’re curious, I have an NVIDIA GTX 745 and an NVIDIA GTX 1650 installed in the system.
admin@truenas[~]$ nvidia-smi
Tue Oct 29 12:42:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 745         Off |   00000000:65:00.0 Off |                  N/A |
| 20%   48C    P0              N/A /  N/A |       0MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce GTX 1650        Off |   00000000:B3:00.0 Off |                  N/A |
| 39%   30C    P0              N/A /  75W |       1MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
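For anyone attaching this kind of information to a bug report, the same details can be pulled in a more compact, script-friendly form. This is a rough sketch using nvidia-smi's CSV query mode. `format_gpus` is a hypothetical helper name, and the sketch assumes `nvidia-smi` is on the PATH.

```shell
#!/usr/bin/env bash
# format_gpus (hypothetical helper): turns "index, name, driver_version"
# CSV lines into one short summary line per GPU.
format_gpus() {
    awk -F', ' '{ printf "GPU %s: %s (driver %s)\n", $1, $2, $3 }'
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | format_gpus
fi
```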