Cannot assign GPU to App - nvidia-device-plugin CrashLoopBackOff

After moving my TrueNAS SCALE 24.04 installation to new hardware (I had to reinstall the boot drives and import the config), I cannot assign my Nvidia T400 GPU to Plex anymore. I moved from a Supermicro X11SPL-F to an X11SDV-4C-TP8F.

I tried opening a support ticket ([NAS-128964] - iXsystems TrueNAS Jira) with iXsystems, but it got closed after they found someone on the nvidia-device-plugin GitHub with the same issue, stating that everything is OK on their side. (The GitHub issue does not have a solution, but points to some DNS issues. DNS works fine with all other apps in TrueNAS.)

The nvidia-device-plugin pod on k3s keeps crashing with the status CrashLoopBackOff.
The only logs I can see from the pod are:

2024/05/24 05:39:56 Starting FS watcher.
2024/05/24 05:39:56 Starting OS watcher.
2024/05/24 05:39:56 Starting Plugins.
2024/05/24 05:39:56 Loading configuration.
2024/05/24 05:39:56 Updating config with default resource matching patterns.
2024/05/24 05:39:56
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "nvidiaDriverRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": "envvar",
      "deviceIDStrategy": "uuid"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {
      "resources": [
        {
          "name": "nvidia.com/gpu",
          "devices": "all",
          "replicas": 5
        }
      ]
    }
  }
}
2024/05/24 05:39:56 Retreiving plugins.
2024/05/24 05:39:56 Detected NVML platform: found NVML library
2024/05/24 05:39:56 Detected non-Tegra platform: /sys/devices/soc0/family file not found
2024/05/24 05:39:57 Starting GRPC server for 'nvidia.com/gpu'
2024/05/24 05:39:57 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2024/05/24 05:39:57 Registered device plugin for 'nvidia.com/gpu' with Kubelet
2024/05/24 05:41:22 Received signal "terminated", shutting down.
2024/05/24 05:41:22 Stopping plugins.
2024/05/24 05:41:22 Stopping to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
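
One thing the log itself does show: the plugin registers with the kubelet and only later receives a "terminated" signal, so it appears to be stopped from outside rather than failing during startup. What sends that signal is not visible in this log, so that part is only a guess. A minimal Python sketch to measure how long the plugin stayed up between those two lines (the helper names parse_ts and seconds_between are made up for illustration, and the two log lines are copied from above):

```python
from datetime import datetime

# Two lines copied from the pod log above (only the events of interest).
LOG = """\
2024/05/24 05:39:57 Registered device plugin for 'nvidia.com/gpu' with Kubelet
2024/05/24 05:41:22 Received signal "terminated", shutting down.
"""


def parse_ts(line):
    # The timestamp is the first 19 characters: YYYY/MM/DD HH:MM:SS
    return datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")


def seconds_between(log, start_marker, end_marker):
    # Return the seconds elapsed between the lines containing the two markers.
    start = end = None
    for line in log.splitlines():
        if start_marker in line:
            start = parse_ts(line)
        elif end_marker in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("marker not found in log")
    return (end - start).total_seconds()


uptime = seconds_between(LOG, "Registered device plugin", "Received signal")
print(uptime)  # 85.0
```

So the pod ran for 85 seconds after registering before it was told to shut down. Comparing that interval across several restarts may help tell a fixed timeout apart from a random failure.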

So I'm still left without hardware transcoding and wondering what might have caused this.

Has anyone else experienced something similar, or have any clues on how to fix this?

Updating to 24.04.1 did not solve it.

Hi, I am having the same problem. Did you manage to solve it?

I have not fixed this. I am awaiting 24.10 and Docker to see if that will solve it. As the problem is with Kubernetes, I assume it will not be fixed, given the move to Docker quite soon.
