CUDA initialization failure with error: 100. Please check your CUDA installation:

Hello everybody, my NVIDIA driver installation has some sort of problem. After I tried to install some custom apps (which failed), the system can no longer use the graphics card.
The error seems to be in CUDA, as my Frigate deployment log shows:

Generating yolov7x-640.trt. This may take a few minutes.
2025-02-24 07:39:19.359145+00:002025-02-24T07:39:19.359145118Z
2025-02-24 07:39:19.672104+00:00Traceback (most recent call last):
2025-02-24 07:39:19.672144+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 244, in <module>
2025-02-24 07:39:19.672263+00:00main()
2025-02-24 07:39:19.672309+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 229, in main
2025-02-24 07:39:19.672335+00:00engine = build_engine(
2025-02-24 07:39:19.672353+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 114, in build_engine
2025-02-24 07:39:19.672395+00:00with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
2025-02-24 07:39:19.672421+00:00TypeError: pybind11::init(): factory function returned nullptr
2025-02-24 07:39:19.701560+00:00[02/24/2025-08:39:19] [TRT] [W] Unable to determine GPU memory usage
2025-02-24 07:39:19.701611+00:00[02/24/2025-08:39:19] [TRT] [W] Unable to determine GPU memory usage
2025-02-24 07:39:19.701621+00:00[02/24/2025-08:39:19] [TRT] [W] CUDA initialization failure with error: 100. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
2025-02-24 07:39:19.701633+00:00Loading the ONNX file…
2025-02-24 07:39:19.701701+00:00Available tensorrt models:
2025-02-24 07:39:19.703197+00:00ls: cannot access '*.trt': No such file or directory
2025-02-24 07:39:19.703576+00:00s6-rc: warning: unable to start service trt-model-prepare: command exited 2

Do you know a way to remove the current CUDA toolkit so that I can reinstall the proper one?

I already tried removing the NVIDIA drivers via Apps > Configuration > Settings > Install NVIDIA Drivers and restarting the system, but no luck.

Thanks in advance for your precious help.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:08:00.0 Off |                  N/A |
|  0%   44C    P0             22W /  120W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
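
For anyone reading the log above: as far as I know, code 100 in the CUDA driver API is CUDA_ERROR_NO_DEVICE, meaning the process found no CUDA-capable device. Since nvidia-smi on the host does list the GTX 1060, that would suggest the container is not seeing the GPU, rather than the host driver being broken. Below is a small sketch for pulling the code out of a pasted log. The function name and the short lookup table are my own for illustration and are not part of Frigate or TensorRT.

```python
import re

# A few CUDA driver-API result codes; this table is deliberately incomplete.
CUDA_ERRORS = {
    100: "CUDA_ERROR_NO_DEVICE",
    101: "CUDA_ERROR_INVALID_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}

# Matches the TensorRT warning text seen in the Frigate log.
PATTERN = re.compile(r"CUDA initialization failure with error: (\d+)")


def explain_cuda_failures(log_text):
    """Return (code, name) pairs for each CUDA init failure found in the log."""
    results = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group(1))
        results.append((code, CUDA_ERRORS.get(code, "unlisted code")))
    return results


sample = (
    "[02/24/2025-08:39:19] [TRT] [W] CUDA initialization failure "
    "with error: 100. Please check your CUDA installation"
)
print(explain_cuda_failures(sample))  # [(100, 'CUDA_ERROR_NO_DEVICE')]
```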

Are you using the Frigate App from the TrueNAS catalog, or your own YAML?

The TrueNAS catalog, but the same error also appears in the Steam-Headless app, which uses the NVIDIA GPU.

Is it specifically CUDA-related on Steam-Headless as well?

Actually no, it just kicks me out when I try to log in. I installed Steam-Headless because it gave me the nvidia-settings GUI, which let me control my 1060's fan (it doesn't spin without it). Here's the log from Steam, by the way:

2025-02-25 15:34:00.023261+00:00PULSEAUDIO: Starting pulseaudio service
2025-02-25 15:34:00.037421+00:002025-02-25 16:34:00,037 INFO reaped unknown pid 269 (exit status 0)
2025-02-25 15:34:01.028353+00:002025-02-25 16:34:01,028 INFO success: udev entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028418+00:002025-02-25 16:34:01,028 INFO success: xorg entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028469+00:002025-02-25 16:34:01,028 INFO success: audiostream entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028506+00:002025-02-25 16:34:01,028 INFO success: frontend entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028576+00:002025-02-25 16:34:01,028 INFO success: pulseaudio entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028676+00:002025-02-25 16:34:01,028 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028727+00:002025-02-25 16:34:01,028 INFO success: desktop entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.028804+00:002025-02-25 16:34:01,028 INFO success: sunshine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:01.228502+00:002025-02-25 16:34:01,228 INFO reaped unknown pid 321 (exit status 1)
2025-02-25 15:34:08.054260+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 367 (exit status 0)
2025-02-25 15:34:08.054353+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 369 (exit status 0)
2025-02-25 15:34:08.054423+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 371 (exit status 0)
2025-02-25 15:34:08.054448+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 373 (exit status 0)
2025-02-25 15:34:08.054492+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 375 (exit status 0)
2025-02-25 15:34:08.054552+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 377 (exit status 0)
2025-02-25 15:34:08.054575+00:002025-02-25 16:34:08,054 INFO reaped unknown pid 379 (exit status 0)
2025-02-25 15:34:21.852068+00:002025-02-25 16:34:21,851 INFO reaped unknown pid 416 (exit status 0)
2025-02-25 15:34:21.852131+00:002025-02-25 16:34:21,851 INFO reaped unknown pid 418 (exit status 0)
2025-02-25 15:34:22.147981+00:002025-02-25 16:34:22,147 WARN exited: desktop (exit status 84; not expected)
2025-02-25 15:34:22.148160+00:002025-02-25 16:34:22,148 INFO reaped unknown pid 311 (terminated by SIGHUP)
2025-02-25 15:34:22.149891+00:002025-02-25 16:34:22,149 INFO spawned: 'desktop' with pid 428
2025-02-25 15:34:22.150311+00:002025-02-25 16:34:22,150 WARN exited: xorg (exit status 134; not expected)
2025-02-25 15:34:22.150403+00:002025-02-25 16:34:22,150 WARN exited: x11vnc (exit status 3; not expected)
2025-02-25 15:34:22.152092+00:002025-02-25 16:34:22,151 INFO spawned: 'xorg' with pid 430
2025-02-25 15:34:22.153597+00:002025-02-25 16:34:22,153 INFO spawned: 'x11vnc' with pid 432
2025-02-25 15:34:22.153835+00:002025-02-25 16:34:22,153 INFO reaped unknown pid 353 (exit status 1)
2025-02-25 15:34:22.164748+00:002025-02-25 16:34:22,164 INFO reaped unknown pid 443 (exit status 0)
2025-02-25 15:34:22.339716+00:002025-02-25 16:34:22,339 INFO reaped unknown pid 466 (exit status 1)
2025-02-25 15:34:23.427370+00:002025-02-25 16:34:23,427 INFO success: xorg entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:23.427436+00:002025-02-25 16:34:23,427 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:23.427454+00:002025-02-25 16:34:23,427 INFO success: desktop entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:31.157844+00:002025-02-25 16:34:31,157 WARN exited: sunshine (exit status 11; not expected)
2025-02-25 15:34:32.160741+00:002025-02-25 16:34:32,160 INFO spawned: 'sunshine' with pid 517
2025-02-25 15:34:33.161565+00:002025-02-25 16:34:33,161 INFO success: sunshine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:41.439167+00:002025-02-25 16:34:41,439 WARN exited: desktop (exit status 84; not expected)
2025-02-25 15:34:41.439244+00:002025-02-25 16:34:41,439 INFO reaped unknown pid 455 (terminated by SIGHUP)
2025-02-25 15:34:41.441418+00:002025-02-25 16:34:41,441 INFO spawned: 'desktop' with pid 546
2025-02-25 15:34:41.441847+00:002025-02-25 16:34:41,441 WARN exited: xorg (exit status 134; not expected)
2025-02-25 15:34:41.441979+00:002025-02-25 16:34:41,441 WARN exited: x11vnc (exit status 3; not expected)
2025-02-25 15:34:41.444164+00:002025-02-25 16:34:41,443 INFO spawned: 'xorg' with pid 547
2025-02-25 15:34:41.445787+00:002025-02-25 16:34:41,445 INFO spawned: 'x11vnc' with pid 548
2025-02-25 15:34:41.454503+00:002025-02-25 16:34:41,454 INFO reaped unknown pid 495 (exit status 1)
2025-02-25 15:34:41.456807+00:002025-02-25 16:34:41,456 INFO reaped unknown pid 558 (exit status 0)
2025-02-25 15:34:41.637660+00:002025-02-25 16:34:41,637 INFO reaped unknown pid 581 (exit status 1)
2025-02-25 15:34:42.716283+00:002025-02-25 16:34:42,716 INFO success: xorg entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2025-02-25 15:34:42.716334+00:002025-02-25 16:34:42,716 INFO success: x11vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Which custom apps were you trying to install? It sounds like something in the NVIDIA Docker container might be having an issue, and it might need to go as far as unsetting the pool (which would remove any non-persistent data like iX-Volumes) in order to flush that out.

It was completely my fault, HB. I was trying to install a standalone GUI for nvidia-settings in order to avoid Steam-Headless and all its processes… but failed epically!

Now I'll unset the Apps pool again, and then what?
Should I also remove the NVIDIA drivers before that?

If the custom container or YAML potentially tried to put its own plumbing into Docker it may have messed with the ability to assign GPUs that way.

I would say unset the NVIDIA driver checkbox, ensure that you’ve backed up all of your data in the Apps area because unsetting the pool will likely remove your iX-Volumes, and then unset the pool from the Apps menu.

After you’ve done those steps, reboot, set the Apps pool up again, and then install the NVIDIA drivers one more time.
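
Once the drivers are reinstalled, a quick probe like the one below might tell you whether CUDA initializes at all, without waiting for Frigate to rebuild its model. This is only a sketch: `probe_cuda` is a made-up helper name, and it assumes Python 3 is available and that the driver library is exposed as `libcuda.so.1` (the usual name on Linux). It would have to run inside the container to reflect what the app itself sees.

```python
import ctypes


def probe_cuda(libname="libcuda.so.1"):
    """Try to load the CUDA driver library and count the visible devices."""
    result = {"loaded": False, "init_code": None, "device_count": None}
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # The driver library is not visible in this environment.
        return result
    result["loaded"] = True

    # cuInit(0) returns 0 on success; 100 would match the error in the Frigate log.
    lib.cuInit.argtypes = [ctypes.c_uint]
    result["init_code"] = lib.cuInit(0)

    if result["init_code"] == 0:
        count = ctypes.c_int(0)
        if lib.cuDeviceGetCount(ctypes.byref(count)) == 0:
            result["device_count"] = count.value
    return result


if __name__ == "__main__":
    print(probe_cuda())
```

If `loaded` comes back `False`, the library is not being mounted into the container at all. A non-zero `init_code` with `loaded` set to `True` would point at the device itself not being passed through.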

No luck, it keeps reporting the CUDA initialization failure:

2025-02-25 22:22:14.535237+00:00Loading the ONNX file...
2025-02-25 22:22:24.849819+00:002025-02-25T22:22:24.849819523Z
2025-02-25 22:22:24.849915+00:00Generating yolov7x-640.trt. This may take a few minutes.
2025-02-25 22:22:24.849926+00:002025-02-25T22:22:24.849926553Z
2025-02-25 22:22:25.121546+00:00Traceback (most recent call last):
2025-02-25 22:22:25.121581+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 244, in <module>
2025-02-25 22:22:25.121666+00:00main()
2025-02-25 22:22:25.121688+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 229, in main
2025-02-25 22:22:25.121738+00:00engine = build_engine(
2025-02-25 22:22:25.121756+00:00File "/usr/local/src/tensorrt_demos/yolo/onnx_to_tensorrt.py", line 114, in build_engine
2025-02-25 22:22:25.121801+00:00with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
2025-02-25 22:22:25.121826+00:00TypeError: pybind11::init(): factory function returned nullptr
2025-02-25 22:22:25.154183+00:00[02/25/2025-23:22:25] [TRT] [W] Unable to determine GPU memory usage
2025-02-25 22:22:25.154263+00:00[02/25/2025-23:22:25] [TRT] [W] Unable to determine GPU memory usage
2025-02-25 22:22:25.154284+00:00[02/25/2025-23:22:25] [TRT] [W] CUDA initialization failure with error: 100. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
2025-02-25 22:22:25.154305+00:00Loading the ONNX file...
2025-02-25 22:22:25.154339+00:00Available tensorrt models:
2025-02-25 22:22:25.156942+00:00ls: cannot access '*.trt': No such file or directory

And now Immich has stopped working as well:

[EFAULT] Failed ‘up’ action for ‘immich’ app. Please check /var/log/app_lifecycle.log for more details

Traceback (most recent call last):
File “/usr/lib/python3/dist-packages/middlewared/job.py”, line 509, in run
await self.future
File “/usr/lib/python3/dist-packages/middlewared/job.py”, line 556, in __run_body
rv = await self.middleware.run_in_thread(self.method, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1367, in run_in_thread
return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1364, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/concurrent/futures/thread.py”, line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/schema/processor.py”, line 183, in nf
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/schema/processor.py”, line 55, in nf
res = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/apps/app_scale.py”, line 51, in start
compose_action(app_name, app_config[‘version’], ‘up’, force_recreate=True, remove_orphans=True)
File “/usr/lib/python3/dist-packages/middlewared/plugins/apps/compose_utils.py”, line 61, in compose_action
raise CallError(err_msg)
middlewared.service_exception.CallError: [EFAULT] Failed ‘up’ action for ‘immich’ app. Please check /var/log/app_lifecycle.log for more details

A brief excerpt from /var/log/app_lifecycle.log:

Container ix-immich-machine-learning-1 Creating
Container ix-immich-pgvecto-1 Created
Container ix-immich-machine-learning-1 Created
Container ix-immich-redis-1 Created
Container ix-immich-server-1 Creating
Container ix-immich-server-1 Created
Container ix-immich-permissions-1 Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/322b0d7df6596036fa6175e1dbd6eef3>

[2025/02/26 07:09:42] (ERROR) app_lifecycle.compose_action():56 - Failed ‘up’ action for ‘immich’ app: Network ix-immich_default Creating
Network ix-immich_default Created
Container ix-immich-permissions-1 Creating
Container ix-immich-permissions-1 Created
Container ix-immich-redis-1 Creating
Container ix-immich-pgvecto-1 Creating
Container ix-immich-machine-learning-1 Creating
Container ix-immich-redis-1 Created
Container ix-immich-pgvecto-1 Created
Container ix-immich-machine-learning-1 Created
Container ix-immich-server-1 Creating
Container ix-immich-server-1 Created
Container ix-immich-permissions-1 Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/2cb14db486dbb2b27a1a02f293425acd>

[2025/02/26 07:12:13] (ERROR) app_lifecycle.compose_action():56 - Failed ‘up’ action for ‘frigate’ app: Network ix-frigate_default Creating
Network ix-frigate_default Created
Container ix-frigate-frigate-1 Creating
Container ix-frigate-frigate-1 Created
Container ix-frigate-frigate-1 Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/66dbb5b7ca8f56a9aec4c9404387855b>

[2025/02/26 07:12:22] (ERROR) app_lifecycle.compose_action():56 - Failed ‘up’ action for ‘frigate’ app: Network ix-frigate_default Creating
Network ix-frigate_default Created
Container ix-frigate-frigate-1 Creating
Container ix-frigate-frigate-1 Created
Container ix-frigate-frigate-1 Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/7597973d1db142cb1d2f55a1f17a930f>

[2025/02/26 07:12:44] (ERROR) app_lifecycle.compose_action():56 - Failed ‘up’ action for ‘frigate’ app: Network ix-frigate_default Creating
Network ix-frigate_default Created
Container ix-frigate-frigate-1 Creating
Container ix-frigate-frigate-1 Created
Container ix-frigate-frigate-1 Starting
Error response from daemon: could not select device driver “nvidia” with capabilities: [[gpu]]
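
Note that the last daemon error differs from the earlier ones: the first attempts fail with "OCI runtime create failed", while the final one says Docker could not select the "nvidia" device driver. As far as I know, that second message usually means Docker cannot find its NVIDIA container integration at all. To see at a glance which app fails with which kind of error, a rough sketch like the following could be run over the log text. The function name, the category labels, and the shortened sample lines are my own; the regexes are modeled only on the lines pasted above.

```python
import re
from collections import Counter

# Accept straight or typographic quotes around 'up' and the app name.
QUOTE = "['\"\u2018\u2019]"
FAILED = re.compile(
    rf"\(ERROR\).*Failed {QUOTE}up{QUOTE} action for {QUOTE}(?P<app>[\w-]+){QUOTE} app"
)
DAEMON = re.compile(r"^Error response from daemon: (?P<msg>.+)$")


def summarize(log_text):
    """Count daemon errors per (app, kind), using the preceding ERROR header."""
    counts = Counter()
    app = "unknown"  # used when an error appears before any header line
    for line in log_text.splitlines():
        header = FAILED.search(line)
        if header:
            app = header.group("app")
            continue
        daemon = DAEMON.match(line.strip())
        if daemon:
            msg = daemon.group("msg")
            if "could not select device driver" in msg:
                kind = "device driver not selectable"
            elif "OCI runtime create failed" in msg:
                kind = "OCI runtime create failed"
            else:
                kind = "other"
            counts[(app, kind)] += 1
    return counts


# Shortened sample modeled on the excerpt above (paths omitted).
sample = """\
[2025/02/26 07:12:22] (ERROR) app_lifecycle.compose_action():56 - Failed 'up' action for 'frigate' app: Network ix-frigate_default Creating
Container ix-frigate-frigate-1 Starting
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error
[2025/02/26 07:12:44] (ERROR) app_lifecycle.compose_action():56 - Failed 'up' action for 'frigate' app: Network ix-frigate_default Creating
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
"""

for (app, kind), n in summarize(sample).items():
    print(app, kind, n)
```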