/dev/nvidia-modeset missing

I’m trying to use Docker Compose, and for a while everything was working. I was able to load the nvidia_drm module and set up nvidia-modeset, but all of a sudden, with no reboot or anything, it’s gone: nvidia_drm can’t be found to load and /dev/nvidia-modeset is missing. nvidia-smi still sees the video card. When I attempt to use it in Plex for transcoding, I get a transcode error and no processes show up. It’s as if it attempts to start a process but fails. Any help would be appreciated!

Additional details:
This is what I see when I try to load nvidia_drm:
root@pyra:/# modprobe -r nvidia_drm ; modprobe nvidia_drm modeset=1
modprobe: FATAL: Module nvidia_drm not found.
modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/6.6.29-production+truenas

When I check that directory, there is no nvidia_drm module.
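
For anyone checking the same thing, one way to confirm whether the module file is really absent is to search the modules tree directly. This is only a minimal sketch for a POSIX shell: find_module_file is a made-up helper name, and the pattern accepts both "-" and "_" on the assumption that the file on disk may use a hyphen where modprobe accepts an underscore.

```shell
# Hypothetical helper: search a kernel modules tree for a module file.
# '-' and '_' are treated as interchangeable in the name.
find_module_file() {
    name="$1"
    # Default to the running kernel's tree; pass a second argument to override.
    root="${2:-/lib/modules/$(uname -r)}"
    # Turn nvidia_drm into the glob nvidia[-_]drm
    pattern=$(printf '%s' "$name" | sed 's/[-_]/[-_]/g')
    # Match plain and compressed module files (.ko, .ko.xz, .ko.zst, ...)
    find "$root" -type f -name "${pattern}.ko*"
}
```

Calling find_module_file nvidia_drm prints the path of any matching file under the running kernel's tree and prints nothing if there is none.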

Missing hardware description and software version…

Was there a reboot involved or just nothing?

Whoops, sorry. At the time, no, there was no reboot. I did reboot afterwards to see if the driver would load, but it did not.

I’m running Dragonfish-

The GPU is an RTX 3050:
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 NVIDIA GeForce RTX 3050 Off | 00000000:07:00.0 Off | N/A |
| 0% 56C P0 34W / 130W | 1MiB / 8192MiB | 2% Default |
| | | N/A |

| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| No running processes found |

nvidia-smi runs, but no process can use the card. When one tries, the “No running processes found” line disappears for a few seconds, then comes back and the application reports an error.
root@pyra:~# ls -la /dev | grep nvidia
drwxr-xr-x 2 root root 80 Jun 23 11:49 nvidia-caps
crw-rw-rw- 1 root root 239, 0 Jun 23 11:49 nvidia-uvm
crw-rw-rw- 1 root root 239, 1 Jun 23 11:49 nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 0 Jun 23 11:49 nvidia0
crw-rw-rw- 1 root root 195, 255 Jun 23 11:49 nvidiactl

root@pyra:~# lsmod | grep nvidia
nvidia_uvm 1757184 0
nvidia_drm 118784 0
nvidia_modeset 1581056 1 nvidia_drm
nvidia 62382080 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 270336 1 nvidia_drm
video 77824 2 asus_wmi,nvidia_modeset
drm 802816 4 drm_kms_helper,nvidia,nvidia_drm

Additionally, here’s the error I get in Dockge when trying to start a FileFlows node with GPU support:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘legacy’
nvidia-container-cli: mount error: stat failed: /dev/nvidia-modeset: no such file or directory: unknown
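
The mount error names /dev/nvidia-modeset, which is also absent from the /dev listing earlier in this post. A quick check for which nodes are missing could look like the sketch below. The helper name missing_nvidia_nodes is made up, and the list of expected nodes is an assumption taken only from the names that appear in this thread (a single GPU at nvidia0).

```shell
# Hypothetical helper: print the expected NVIDIA device nodes that are
# absent from a directory (defaults to /dev).
missing_nvidia_nodes() {
    dir="${1:-/dev}"
    # Node names assumed from this thread; adjust for multi-GPU systems.
    for node in nvidiactl nvidia0 nvidia-uvm nvidia-modeset; do
        [ -e "$dir/$node" ] || echo "$node"
    done
}
```

Running missing_nvidia_nodes with no arguments checks /dev; empty output means all four nodes exist.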

OK, so I got /dev/nvidia-modeset to show up by starting nvidia-persistenced. It looks like this daemon isn’t started at boot. Is that a bug?

It may be… Can you “report a bug” and copy the NAS case number here?

Sure can. Thanks!

NAS-129707 is the Jira case.

I wonder if this is the cause of the “sometimes I can allocate a GPU and sometimes I cannot” problem?

As a workaround, just to note it: I added "nvidia-persistenced; " to the post-init command I use to start jailmaker.
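
A slightly more defensive version of that workaround might start the daemon only when the node is missing and then wait briefly for it to appear. This is a sketch, not a tested fix: ensure_modeset is a made-up name, the five-second wait is an arbitrary assumption, and whether nvidia-persistenced creates the node right away on every system is not something this thread confirms.

```shell
# Hypothetical helper: make sure the modeset device node exists,
# starting nvidia-persistenced only if it does not.
ensure_modeset() {
    dev="${1:-/dev/nvidia-modeset}"
    start_cmd="${2:-nvidia-persistenced}"
    if [ -e "$dev" ]; then
        echo "present: $dev"
        return 0
    fi
    echo "missing: $dev, running: $start_cmd"
    # Intentionally unquoted so the command string can carry arguments.
    $start_cmd || return 1
    # Poll for up to about five seconds for the node to appear.
    tries=0
    while [ ! -e "$dev" ] && [ "$tries" -lt 5 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    [ -e "$dev" ]
}
```

In a post-init script the function would be called once before the existing jailmaker start command; it returns non-zero if the node still does not exist after the wait.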

Can you follow the advice on the NAS ticket and see if the issue goes away?

Thanks for documenting the workaround.

“Too many open files” is an error people seem to get when using qBittorrent and other applications…

See this post for a fix/discussion:

Thanks for the additional info. I wasn’t sure how to deal with the files issue. I don’t run any torrents though. We’ll see how it goes!

The same issue impacts any app that exhausts the default limit on inotify watchers, Syncthing for example.
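
Before changing anything, it can help to see what the current limits are. The kernel exposes them under /proc/sys/fs/inotify. The sketch below just prints two of them; show_inotify_limits is a made-up helper name, and the value recommended in the linked post is not reproduced here.

```shell
# Hypothetical helper: print the current inotify limits.
# The directory is overridable so the function can be tried on sample files.
show_inotify_limits() {
    dir="${1:-/proc/sys/fs/inotify}"
    for key in max_user_watches max_user_instances; do
        printf '%s=%s\n' "$key" "$(cat "$dir/$key")"
    done
}
```

Running show_inotify_limits before and after applying a new setting is a simple way to confirm that the change took effect and survived a reboot.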

Oh, OK. I should learn more about that setting. I changed it to the post’s recommendation and restarted my NAS.