NVIDIA GPU Not Being Used By Apps - ElectricEel

Hi All,

I’m wracking my brains trying to work out how to get my NVIDIA GTX 1660 working with Plex (transcoding) and Immich (machine learning). I recently upgraded from TrueNAS CORE to SCALE (Dragonfish).

I can confirm that Plex is not using my GPU for transcoding: my CPU usage spikes considerably when it’s transcoding, and no processes show up when I run nvidia-smi during a transcode.
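For reference, this is roughly how I’m checking while a transcode is running (the refresh interval is arbitrary):

root@truenas[~]# watch -n 2 nvidia-smi

If Plex were using the card, a “Plex Transcoder” process would appear in the Processes table.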

When I run the machine learning pods in Immich, I continually get ERROR Worker was sent code 139, which corresponds to a SIGSEGV memory access violation (exit code 139 = 128 + signal 11).

I think the issue is that my GPU is being used by something and is not available to the system, as [VGA Controller] is listed after the GPU when I run lspci – if I understand the meaning of that correctly.

TrueNAS Scale Version: ElectricEel-24.10.0
Plex Version: 1.0.24
Immich Version: 1.6.24

I do not have any displays connected.

I have followed this post, which details adding the following code…

resources:
  gpus:
    nvidia_gpu_selection:
      '0000:07:00.0':
        use_gpu: true
        uuid: ''            # <-- the problem
        use_all_gpus: false

… to the user_config.yaml file (an ixVolume, found at /mnt/.ix-apps/user_config.yaml), and setting the IOMMU address and UUID values correctly – which I have done.
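To illustrate, using my GPU’s bus address from lspci (the UUID below is just a placeholder for the value reported by nvidia-smi -L), the block ends up looking something like this:

resources:
  gpus:
    nvidia_gpu_selection:
      '0000:01:00.0':
        use_gpu: true
        uuid: 'GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'   # placeholder UUID
        use_all_gpus: false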

I also came across this post. However, I’m able to run the nvidia-smi command without errors.

Interestingly, I don’t have any of the following files on my system:

/etc/modprobe.d/kvm.conf
/etc/modprobe.d/nvidia.conf
/etc/modprobe.d/vfio.conf

My system also doesn’t present me with any GPUs available for isolation, as shown in the screenshot further below.

Is anyone able to point me in the right direction as to what I should do?


— — — Additional Info — — —

nvidia-smi Output

root@truenas[~]# nvidia-smi
Thu Oct 31 12:24:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    Off |   00000000:01:00.0 Off |                  N/A |
| 28%   43C    P0             N/A /  125W |       1MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+



modprobe Output

root@truenas[~]# modprobe nvidia_current_drm
modprobe: FATAL: Module nvidia_current_drm not found in directory /lib/modules/6.6.44-production+truenas
root@truenas[~]# modprobe nvidia-current
modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/6.6.44-production+truenas



lsmod Output

root@truenas[~]# lsmod | grep nvidia
nvidia_uvm           4911104  0
nvidia_drm            118784  0
nvidia_modeset       1605632  1 nvidia_drm
nvidia              60620800  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        249856  4 ast,nvidia_drm
drm                   757760  6 drm_kms_helper,ast,drm_shmem_helper,nvidia,nvidia_drm
video                  73728  1 nvidia_modeset



lspci Output

root@truenas[~]# lspci -v
...
01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER]
        Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 1
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f7000000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [bb0] Physical Resizable BAR
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia


[Screenshot: No GPUs to Isolate]

[Screenshot: Application Settings]

[Screenshot: Plex Resources]

[Screenshot: Plex Transcoding Settings – no specific GPU available]

You do not want to isolate GPUs – that is for use with VMs.

From where I am looking it seems like everything is in place and should be working. It might be worthwhile creating a ticket for investigation.

Hi @william, thanks for the reply.

Yeah, I know that I don’t want to isolate GPUs for apps; I just thought it was weird that my GPU wasn’t showing there at all.

How/where can I create a ticket for investigation?

It’s probably not showing there because you have other devices in the same IOMMU group.

https://ixsystems.atlassian.net/

These are all the IOMMU groups I currently have.

root@truenas[~]# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/5/devices/0000:00:1c.0
/sys/kernel/iommu_groups/3/devices/0000:00:19.0
/sys/kernel/iommu_groups/11/devices/0000:04:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.2
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.3
/sys/kernel/iommu_groups/1/devices/0000:01:00.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/6/devices/0000:00:1c.1
/sys/kernel/iommu_groups/4/devices/0000:00:1a.0
/sys/kernel/iommu_groups/12/devices/0000:05:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/10/devices/0000:03:00.0
/sys/kernel/iommu_groups/10/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.2
/sys/kernel/iommu_groups/9/devices/0000:00:1f.0
/sys/kernel/iommu_groups/9/devices/0000:00:1f.3
/sys/kernel/iommu_groups/9/devices/0000:00:1f.6

It seems that my GPU at 0000:01:00.0 shares IOMMU group 1 only with its own sub-functions (the audio, USB and UCSI controllers at 0000:01:00.1–.3) and the upstream PCIe port at 0000:00:01.0 – no unrelated devices are in that group.

root@truenas[~]# lspci -Dnn | grep -i NVIDIA
0000:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] [10de:21c4] (rev a1)
0000:01:00.1 Audio device [0403]: NVIDIA Corporation TU116 High Definition Audio Controller [10de:1aeb] (rev a1)
0000:01:00.2 USB controller [0c03]: NVIDIA Corporation TU116 USB 3.1 Host Controller [10de:1aec] (rev a1)
0000:01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 USB Type-C UCSI Controller [10de:1aed] (rev a1)
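For completeness, a small shell loop over sysfs pairs each IOMMU group with the lspci description of its devices (a generic sketch, nothing TrueNAS-specific):

for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=$(basename "$(dirname "$(dirname "$d")")")      # IOMMU group number
  echo "group $g: $(lspci -nn -s "${d##*/}")"       # PCI address and device description
done | sort -V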


I also ran the following command:

root@truenas[~]# dmesg | grep -i 'vga\|display\|nvidia'

[    0.211805] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[    0.211805] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.211805] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.211805] pci 0000:03:00.0: vgaarb: setting as boot VGA device (overriding previous)
[    0.211805] pci 0000:03:00.0: vgaarb: bridge control possible
[    0.211805] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.211805] vgaarb: loaded
[    0.694443] fb0: EFI VGA frame buffer device
[   12.876613] ast 0000:03:00.0: vgaarb: deactivate vga console
[   12.876769] ast 0000:03:00.0: [drm] Using analog VGA
[   12.907490] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[   12.962552] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input6
[   12.963921] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input7
[   12.963960] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input8
[   12.963992] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card0/input9
[   13.529272] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[   13.530190] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[   13.530340] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[   13.576894] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  550.127.05  Tue Oct  8 03:22:07 UTC 2024
[   13.618688] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  550.127.05  Tue Oct  8 02:56:05 UTC 2024
[   13.626835] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   13.626838] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[  112.662664] audit: type=1400 audit(1730371701.484:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=3350 comm="apparmor_parser"
[  112.663725] audit: type=1400 audit(1730371701.484:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=3350 comm="apparmor_parser"
[  164.737708] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[  164.794357] nvidia-uvm: Loaded the UVM driver, major device number 237.

This shows both of my VGA devices.

My other VGA device is my motherboard’s ASPEED AST BMC graphics, identified as ast 0000:03:00.0. That device seems to be designated as the boot VGA device, while my NVIDIA GPU (at 0000:01:00.0) has also been registered with the VGA arbiter – assuming that’s what the vgaarb and framebuffer references mean.
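To double-check which card the kernel treats as the boot VGA device, the boot_vga attribute in sysfs can be read directly (a quick sketch using the bus addresses above):

root@truenas[~]# cat /sys/bus/pci/devices/0000:03:00.0/boot_vga
root@truenas[~]# cat /sys/bus/pci/devices/0000:01:00.0/boot_vga

A value of 1 marks the boot VGA device; given the dmesg output above I’d expect the AST to report 1 and the NVIDIA card 0.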

I ran into the same issue during the beta (still not fixed) – the TrueNAS middleware is apparently (sometimes) unable to determine the UUID of the GPU, and so the container can’t use the GPU.

Now you might be thinking that you can just run nvidia-smi -L in the shell to grab the UUID and then pass it to the app yourself, skipping the middleware auto-detection.
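For reference, the UUID is listed like this (the UUID shown is a placeholder):

root@truenas[~]# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1660 SUPER (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)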

This would work if the Plex app configuration allowed you to set the UUID manually, which it does not – you get an error that this variable has already been defined by the developer.

So the bottom line is that you cannot get this to work until iX fixes the middleware bug.

The way you can get it to work is by using Portainer, where you can manually set the UUID, and transcoding will work just fine. :slight_smile:
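Roughly what that looks like when deploying the container yourself (a minimal sketch: the image tag, media path and UUID are placeholders, and it assumes the NVIDIA container runtime is installed on the host; Portainer’s UI fields map to the same options):

docker run -d --name plex \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -v /mnt/tank/media:/media \
  -p 32400:32400 \
  plexinc/pms-docker:latest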

For now, I would avoid using the TrueNAS apps for GPU-accelerated tasks.

Where exactly do you get this error?
Once I followed the instructions to add the UUID to the user_config.yaml file, both the Plex and Immich containers deploy successfully. Others in the linked post have also said it solved their problem.

I believe my problem here is that my NVIDIA graphics card is being claimed by either my BIOS or TrueNAS and is then not available for use by any apps.

I have a similar issue. I updated user_config.yaml and everything starts up fine, but HW transcoding does not work. I created a ticket, which was immediately closed :(.

Do you have a link to the ticket?
If it was immediately closed, it was probably because it was a duplicate of an already-reported issue.

https://ixsystems.atlassian.net/issues/NAS-132130?filter=-4

I did mention that it’s not the same issue, but it was still closed as a duplicate.

That issue doesn’t seem to exist. Or at least I don’t have permission to view it.

same…
but I also created a ticket. I do not even have the option to select a GPU – only a non-NVIDIA device is shown.

I have logged a bug in the iX Systems Jira.

https://ixsystems.atlassian.net/browse/NAS-132145


Apps → Plex → Edit configuration → add Environment Variables for NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES → try to save → warning that the developer has already defined these variables.
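For context, these are the values one would typically try there (the UUID is a placeholder):

NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
NVIDIA_DRIVER_CAPABILITIES=compute,video,utility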

The solution I chose was to deploy the Plex container myself using Portainer :slight_smile:

I added these and it allowed me to save… but playback just shows “Transcode”, not “Transcode (hw)”, so it saved but hardware transcoding still isn’t working.

Isn’t this down to the known issue of the middleware being unable to detect the GPU’s UUID, and so feeding the container an empty UUID?

https://ixsystems.atlassian.net/browse/NAS-131590?focusedCommentId=278994

I think that we are talking about different things, as you cannot add these as environment variables to the Plex app from TrueNAS. :slight_smile:

Worked for me… I’m deployed with these settings and using Plex as we speak.
I was trying to add a screenshot, but it won’t let me for some reason…

Well, that is extremely interesting…

I decided to say screw it, and just created a brand new (ElectricEel) Plex container, and now this shows up:

And according to nvidia-smi Plex is now using my GPU:

root@truenas[~]# nvidia-smi
Thu Oct 31 18:38:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 ...    Off |   00000000:01:00.0 Off |                  N/A |
| 40%   47C    P2             38W /  125W |     374MiB /   6144MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    126273      C   ...lib/plexmediaserver/Plex Transcoder        370MiB |
+-----------------------------------------------------------------------------------------+

I can’t believe it was that simple. Perhaps ElectricEel did fix my issue after all…

I’ll try with Immich as well, although that will take a lot longer as it will need to search through all my photos again.

I guess I’ll just stick with this new Plex container and re-import all my media.


So I’m not sure that my GPU is being seen by ElectricEel…

When I try to run the watch nvidia-smi command, nothing shows up? Odd… I can see the card under Settings → Advanced → Isolate GPU, though.