Cannot pass through Nvidia Tesla P4 GPU to newly created Ubuntu 22.04 VM

After confirming the installed GPU was detectable with the lspci | grep -i nvidia command and isolating it for use with a VM or container, I saw the GPU listed in the VM config, added it, and went to spin up the VM. It failed, and here is the error message I got:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 189, in start
    if self.domain.create() < 0:
       ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1373, in create
    raise libvirtError('virDomainCreate() failed')
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2024-12-14T04:40:30.869788Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x7"}: vfio 0000:03:00.0: failed to setup container for group 13: Failed to set iommu for container: Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 208, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1526, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1457, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 179, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 49, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 58, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1364, in run_in_thread
    return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 68, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 198, in start
    raise CallError('\n'.join(errors))
middlewared.service_exception.CallError: [EFAULT] internal error: process exited while connecting to monitor: 2024-12-14T04:40:30.869788Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x7"}: vfio 0000:03:00.0: failed to setup container for group 13: Failed to set iommu for container: Operation not permitted

This happened with every GPU mode I selected in the UI.
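(For anyone following along, a minimal way to confirm the IOMMU is actually active on the host, using only standard dmesg/sysfs locations, would be something like the following:)

# Confirm VT-d / AMD-Vi is enabled and the kernel has built IOMMU groups
dmesg | grep -iE 'iommu|dmar|amd-vi' | head
ls /sys/kernel/iommu_groups | wc -l    # should be non-zero when the IOMMU is active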

Is this a Jira case I need to log, or am I doing something wrong?

Thanks!
-Rodney

…sorry, forgot to add: the common thread seems to be "failed to setup container for group 13". Does this perhaps indicate a permissions issue?
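(A minimal way to see what else lives in IOMMU group 13 and which kernel driver currently owns each device, using the group number and PCI address from the error above, would be something like:)

# List every device that shares IOMMU group 13 and the driver bound to it
for dev in /sys/kernel/iommu_groups/13/devices/*; do
    lspci -nnk -s "${dev##*/}"
done
# For passthrough, every endpoint device in the group generally needs to end up on vfio-pci
lspci -nnk -s 03:00.0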

Thanks again!
-Rodney

So, I tried re-installing the Nvidia drivers and got the following error:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 488, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 535, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *args)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1364, in run_in_thread
    return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/nvidia.py", line 65, in install
    self._install_driver(job, td, path)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/nvidia.py", line 133, in _install_driver
    subprocess.run([path, "--tmpdir", td, "-s"], capture_output=True, check=True, text=True)
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/root/tmpca9l3275/NVIDIA-Linux-x86_64-550.135-no-compat32.run', '--tmpdir', '/root/tmpca9l3275', '-s']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 488, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 533, in __run_body
    rv = await self.method(*args)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 49, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 179, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/docker/update.py", line 106, in do_update
    await (
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 436, in wait
    raise self.exc_info[1]
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 492, in run
    raise handled
middlewared.service_exception.CallError: [EFAULT] Command /root/tmpca9l3275/NVIDIA-Linux-x86_64-550.135-no-compat32.run --tmpdir /root/tmpca9l3275 -s failed (code 1):
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.135...

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
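(The installer error above lists the usual suspects; a quick, generic way to check which one applies, using the log file it names, would be something like:)

lspci -nnk -s 03:00.0                      # which driver currently owns the card ("Kernel driver in use:")
lsmod | grep -E 'nouveau|vfio'             # conflicting modules, if any are loaded
tail -n 50 /var/log/nvidia-installer.log   # the 'Kernel module load error' section the installer points at
dmesg | grep -iE 'nvidia|nvrm' | tail -n 20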

Anyone had this issue?

Going to check out the readme file.

Thanks!
-Rodney

This usually happens if the vfio driver is refusing to let go of the NVIDIA card. You’ve never isolated the card at all, I assume?

I know we’ve also changed the method of driver install in the soon-to-arrive 24.10.1 - that may correct the problem.

Hi Chris,

Thanks for the response. Saw the ECC piece. Great discussion. I'm in the ECC camp too. The Dell PowerEdge 410 I'm running TrueNAS on was rescued from the reclamation pile. The 256G of ECC RAM cost me less than $200. Short money as far as I was concerned, and you get added benefits.

Is the new kit still coming out on the 17th? For sure I'll give it a try. Looks like some form of driver did get installed after I rebooted, as I was able to run the Nvidia watch command and see the installed GPU resources.
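(Presumably that was something along the lines of:)

watch -n 1 nvidia-smi    # refresh the GPU status readout every second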

I wanted to use my installed Nvidia Tesla P4 GPU card to speed up iX Ollama, but when I go to select a GPU, it says unknown. And after selecting and saving, I get the error shown below:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 488, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 535, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *args)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1364, in run_in_thread
    return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/service/crud_service.py", line 268, in nf
    rv = func(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 55, in nf
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 183, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/crud.py", line 287, in do_update
    app = self.update_internal(job, app, data, trigger_compose=app['state'] != 'STOPPED')
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/crud.py", line 317, in update_internal
    update_app_config(app_name, app['version'], new_values, custom_app=app['custom_app'])
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/ix_apps/lifecycle.py", line 59, in update_app_config
    render_compose_templates(
  File "/usr/lib/python3/dist-packages/middlewared/plugins/apps/ix_apps/lifecycle.py", line 50, in render_compose_templates
    raise CallError(f'Failed to render compose templates: {cp.stderr}')
middlewared.service_exception.CallError: [EFAULT] Failed to render compose templates: Traceback (most recent call last):
  File "/usr/bin/apps_render_app", line 33, in <module>
    sys.exit(load_entry_point('apps-validation==0.1', 'console_scripts', 'apps_render_app')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/catalog_templating/scripts/render_compose.py", line 47, in main
    render_templates_from_path(args.path, args.values)
  File "/usr/lib/python3/dist-packages/catalog_templating/scripts/render_compose.py", line 19, in render_templates_from_path
    rendered_data = render_templates(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/catalog_templating/render.py", line 36, in render_templates
    ).render({'ix_lib': template_libs, 'values': test_values})
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/docker-compose.yaml", line 3, in top-level template code
    {% set c1 = tpl.add_container(values.consts.ollama_container_name, values.ollama.image_selector) %}
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/library/base_v2_1_0/render.py", line 53, in add_container
    container = Container(self, name, image)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/library/base_v2_1_0/container.py", line 68, in __init__
    self.deploy: Deploy = Deploy(self._render_instance)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/library/base_v2_1_0/deploy.py", line 15, in __init__
    self.resources: Resources = Resources(self._render_instance)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/library/base_v2_1_0/resources.py", line 24, in __init__
    self._auto_add_gpus_from_values()
  File "/mnt/.ix-apps/app_configs/ollama-local/versions/1.0.18/templates/library/base_v2_1_0/resources.py", line 55, in _auto_add_gpus_from_values
    raise RenderError(f"Expected [uuid] to be set for GPU in slot [{pci}] in [nvidia_gpu_selection]")
base_v2_1_0.error.RenderError: Expected [uuid] to be set for GPU in slot [0000:03:00.0] in [nvidia_gpu_selection]

Some time back you noted that the IDs mattered. Wondering if that's what's wrong here.
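(If it is the UUID the app template wants, a minimal way to map the PCI slot from the error to a GPU UUID, using stock nvidia-smi query fields, would be:)

nvidia-smi --query-gpu=pci.bus_id,uuid,name --format=csv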

Anyway, thanks!
-Rodney

…Also, the documentation says that in order to isolate a GPU for VM use, you need to have two such GPUs. Sadly, I only have one free PCI slot for one GPU card. Any way I can get around this restriction?

Thanks!
-Rodney

Hey Rodney,

Check out the fix here and see if it applies to your situation - it looks like it should.

No way around this at the moment; TrueNAS expects a console video output device, and if you’re using GPU-accelerated Apps you’ll need a compatible GPU un-isolated while the second is vfio mapped for the VM.

Thanks Chris!

Does this requirement for two GPUs apply to container deployments as well? Meaning, if I set up Ollama in a VM I'd have to split the two resources up, but if I deploy Ollama in a container, do I still need two GPUs?

At some point, will users be able to use 1 GPU, regardless of their deployments, VMs or container apps?

Thanks again!

-Rodney

Apps can all share one (or more) GPUs - it’s only VMs that cannot share with Apps (or with each other) because they require exclusive access to the hardware.

So if you set up Ollama in a container, you’d be able to share the same GPU with other workloads like a Plex/Jellyfin transcode. Since transcoding is pretty light on the VRAM and uses dedicated hardware blocks on modern GPUs, it hopefully wouldn’t impact your Ollama token speed.

OK, that’s good to hear Chris. Thanks!

So I should be able to get Ollama to leverage the installed GPU resources. Unfortunately, I have not been able to get this working, and lately I have not been able to even get iX Ollama working. Troubleshooting at the container level, I get this weird error when trying to spin it up in the shell:

ollama run llama2

Error: llama runner process has terminated: signal: killed
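(For what it's worth, "signal: killed" is often the kernel's OOM killer stepping in rather than a GPU problem; a minimal check from the host shell would be something like:)

dmesg | grep -iE 'out of memory|oom' | tail    # look for out-of-memory kills
free -h                                        # available RAM at the moment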

Thoughts?

I, and perhaps others, would greatly benefit from a Tech Talk podcast on getting GPUs to work with containers. I've checked YouTube on this subject, but there's not much there.

Thanks again!
-Rodney

Also forgot to ask, Chris: are you using the out-of-the-box iX community version of Ollama, or did you spin it up using YAML? I'm using the iX community version. If you spun up your instance of Ollama using YAML, perhaps that's the big difference. And if you did use YAML, would it be possible for you to share the code that worked for you?

Thanks!
-Rodney

Hi!

I recently tried to spin up an Ollama container, being intentional about which installed GPU I wanted it to use, but I was still not able to see the GPU being used. Here is the YAML I used:

services:
ollama:
container_name: ollama-local
deploy:
resources:
reservations:
devices:
- capabilities:
- gpu
count: 1
driver: nvidia
environment:
NVIDIA_DRIVER_CAPABILITIES: compute,utility
NVIDIA_VISIBLE_DEVICES: all
image: ollama/ollama:latest
ports:
- '11434:11434'
restart: unless-stopped
runtime: nvidia
volumes:
- /mnt/window_share/Apps/Ollama/data:/data
- /mnt/window_share/Apps/Ollama/config:/config
version: '3.9'

Thanks!

-Rodney

This section doesn’t look right to me. More typically I’ve seen something like:

    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
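
Once the container is up with that stanza, a quick way to confirm it can actually see the card (using the container_name of ollama-local from your file) would be something like:

    docker exec -it ollama-local nvidia-smi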

Ps. If you surround your Compose file with three backticks, it will preserve the whitespace and make it easier to troubleshoot:
```
YAML here
```
displays as

YAML here

Thanks DjP,

For the response and tip!

Looks like the compose file installs Ollama OK. Then when I run ollama run llama2 from the shell inside the container, I'm able to see things work, albeit very slowly.

Additional debugging shows my GPU card sitting in the off state while I'm interacting with the Ollama container…

Didn't know one could do this in the container; it clearly shows all the information about the installed GPU, but also that the card is off when it's needed. Not surprisingly, the system CPU resources get maxed out…

Wondering if there is a diagnostic, a light computational load, that could be run from the shell to confirm the base system and the installed drivers are working. If that checked out, then the debugging could be focused elsewhere.
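(A minimal, low-load sanity check from the host shell that needs nothing beyond nvidia-smi might look like the following; if the power state and utilization never move off idle while a prompt is running, the card isn't being touched at all:)

nvidia-smi                                                                  # driver version, detected GPUs, current power state
nvidia-smi --query-gpu=name,pstate,power.draw,utilization.gpu --format=csv
# Re-run the query while Ollama is answering a prompt and watch whether utilization/power change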

Thoughts?

And thanks again!
-Rodney

Also, it looks like the required Nvidia driver for Debian 10-12 is 550.127.08, while TrueNAS installs 550.127.05. Not sure how much that matters, but it's close…
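(To see exactly which driver version the host actually has loaded, something like the following should do it:)

modinfo nvidia | grep -i '^version'                             # version of the nvidia kernel module on disk
nvidia-smi --query-gpu=driver_version --format=csv,noheader     # version the running driver reports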

-Rodney

Found this, which would allow one to run a simple matrix multiplication with the help of the installed GPU card:

cd /usr/local/cuda/samples/0_Simple/matrixMul
make
./matrixMul

However, the CUDA toolkit has to be installed in order to build and run these computational tests, and unfortunately it cannot be installed because apt is not available.

Need another way to test/confirm that the installed GPU works, outside of calls from VMs and containers.
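(Since apt isn't available on the host, one workaround, still container-based but with nothing extra installed on TrueNAS itself, is to run nvidia-smi from a throwaway CUDA container; the image tag below is only an example:)

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi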

-Rodney