Docker Apps and UUID issue with NVIDIA GPU after upgrade to 24.10

Morning all,

We’re tracking an issue with Apps that impacts NVIDIA users, which we’ve now added to the Known Issues page of our release notes.

Some users who have upgraded to 24.10.0 from a previous version, and who have applications with NVIDIA GPU allocations, report the error Expected [uuid] to be set for GPU inslot [<some pci slot>] in [nvidia_gpu_selection]) (see NAS-132086).

Users experiencing this error should follow the steps below for a one-time fix that should not need to be repeated.

Connect to a shell session and retrieve the UUID for each GPU with the command midclt call app.gpu_choices | jq.
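If the raw JSON is hard to read, here is a minimal sketch for pulling out just the PCI slot and UUID of each NVIDIA GPU. It assumes the output is shaped like the sample posted further down in this thread (a "vendor" field and a "uuid" under "vendor_specific_config"); the helper name nvidia_uuids_by_slot is made up for illustration and is not part of TrueNAS.

```python
import json

# Sample shaped like the `midclt call app.gpu_choices` output posted in this thread.
# On a real system, paste your own output here instead.
SAMPLE_GPU_CHOICES = """
{"0000:00:02.0": {"vendor": "INTEL",
                  "description": "Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller",
                  "vendor_specific_config": {},
                  "pci_slot": "0000:00:02.0"},
 "0000:01:00.0": {"vendor": "NVIDIA",
                  "description": "NVIDIA GeForce RTX 3050",
                  "vendor_specific_config": {"uuid": "GPU-5579b8ac-5f36-0a45-e8e8-2a3e995dcd5d"},
                  "pci_slot": "0000:01:00.0"}}
"""


def nvidia_uuids_by_slot(gpu_choices_json: str) -> dict:
    """Return {pci_slot: uuid} for every NVIDIA GPU that reports a UUID."""
    choices = json.loads(gpu_choices_json)
    result = {}
    for slot, info in choices.items():
        # Assumption: non-NVIDIA GPUs carry no UUID, as in the sample above.
        if info.get("vendor") != "NVIDIA":
            continue
        uuid = info.get("vendor_specific_config", {}).get("uuid")
        if uuid:
            result[slot] = uuid
    return result


if __name__ == "__main__":
    for slot, uuid in nvidia_uuids_by_slot(SAMPLE_GPU_CHOICES).items():
        print(slot, uuid)
```

Each line printed is a PCI slot followed by the UUID to use for that slot.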

For each application that experiences the error, run midclt call -job app.update APP_NAME '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU_UUID"}}}}}}'

Where:

  • APP_NAME is the name you entered in the application, for example “plex”.
  • PCI_SLOT is the PCI slot identified in the error, for example “0000:2d:00.0”.
  • GPU_UUID is the UUID matching the PCI slot that you retrieved with the above command.
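Typing the nested JSON by hand is error-prone, so here is a rough sketch of the same substitution done in Python. It only builds and prints the command line; it does not run anything. The function name build_update_command is hypothetical, and the values in the example ("plex", "0000:2d:00.0", and the GPU_UUID placeholder) are the ones from the list above, not real hardware.

```python
import json
import shlex


def build_update_command(app_name: str, pci_slot: str, gpu_uuid: str) -> list:
    """Build the argument list for the app.update call described above."""
    payload = {
        "values": {
            "resources": {
                "gpus": {
                    "use_all_gpus": False,
                    "nvidia_gpu_selection": {
                        pci_slot: {"use_gpu": True, "uuid": gpu_uuid},
                    },
                }
            }
        }
    }
    # json.dumps emits lowercase true/false, matching the command in the post.
    return ["midclt", "call", "-job", "app.update", app_name, json.dumps(payload)]


if __name__ == "__main__":
    # Replace the placeholders with your own app name, PCI slot and UUID.
    command = build_update_command("plex", "0000:2d:00.0", "GPU_UUID")
    # shlex.join quotes the JSON so the line can be pasted into a shell.
    print(shlex.join(command))
```

Copy the printed line into the shell session, or repeat the call once per affected application.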

Engineering is digging into the root cause of this - it may be related to the NVIDIA drivers being installed at first boot, with the apps system not refreshing the UUIDs correctly after the installation.

If you’re having an issue with NVIDIA that isn’t related to the missing UUIDs, please start a separate thread, and ideally include the exact text of the error message. If you have an issue with driver installation, please also include the /var/log/nvidia-installer.log file as an attachment.

Hi there,

Thanks for the quick update.

When I run the command I get the following output:

root@truenas[~]# midclt call -j app.update plex-new-new '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"0000:2b:00.0": {"use_gpu": true, "uuid": "GPU-f6edf301-01ce-f9db-a641-81b03b714f62"}}}}}}'  
[EBADMSG] Invalid method name
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 365, in on_message
    serviceobj, methodobj = self.middleware._method_lookup(message['method'])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/utils/service/call.py", line 20, in _method_lookup
    raise CallError('Invalid method name', errno.EBADMSG)
middlewared.service_exception.CallError: [EBADMSG] Invalid method name

This is using the output from the command in this post. Did I do something wrong?

No, that would be a typo on our side - it should be midclt call -job instead of just -j.

Can you give it another shot?

That did the trick. Thanks!

Thanks! Worked for me too!

@HoneyBadger Thanks for this! Fixed my problems…

Fixed my problem after an update from Dragonfish to Eel.

Thanks!

Your post fixed my issue. Thank you so much!

It’s a lot easier than when I had to edit the file manually through TrueNAS’s shell.

Hello,

I migrated from “TrueNAS Scale Bluefin” to “Scale ElectricEel 24.10.1” and updated “Frigate”. As a result, I lost GPU usage and had to rework the Frigate configuration to use the CPU instead of GPU hardware acceleration.

I then tackled the problem:

I first enabled the Intel CPU's integrated video in the BIOS, then installed the nVidia “RTX 3050 LP” card as a replacement for an old nVidia “Gigabyte GTX1060” card that used to work for Frigate under TrueNAS Scale “Bluefin” before I ran successive TrueNAS updates.

(I had kept Bluefin so long because of the change in RSync support.)

I then applied the advised shell commands:

root@TrueNAS-Asus-i5[~]# midclt call app.gpu_choices

{"0000:00:02.0": {"vendor": "INTEL", "description": "Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller", "vendor_specific_config": {}, "pci_slot": "0000:00:02.0"},

"0000:01:00.0": {"vendor": "NVIDIA", "description": "NVIDIA GeForce RTX 3050", "vendor_specific_config": {"uuid": "GPU-5579b8ac-5f36-0a45-e8e8-2a3e995dcd5d"}, "pci_slot": "0000:01:00.0"}}

root@TrueNAS-Asus-i5[~]# midclt call -job app.update frigate '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"0000:01:00.0": {"use_gpu": true, "uuid": "GPU-5579b8ac-5f36-0a45-e8e8-2a3e995dcd5d"}}}}}}'

It worked for me. Thanks for that piece of information!

Frigate now sees and makes use of the nVidia “RTX 3050 LP”.

However, in TrueNAS Scale ElectricEel 24.10.1, “Advanced Settings” / “Isolated GPU device” / Configure / Isolated GPU PCI Ids / GPUs still shows “No options”.

I hope your teams will find the bug, thanks for the efforts.