PCI passthrough in VM no longer working in Dragonfish 24.04.0

I have a VM with GPU passthrough as well a NVME passthrough and neither is working after the upgrade to 24.04.0. I read about some changes in this regard and thus have already removed and re-added the GPU to the isolated GPU list. I also removed all PCI passthrough devices of the VM and added them back.

When I try to start the VM, I get this error for any PCI device I try to passthrough (if I remove the device in the error, same error shows up for next device etc).

Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 182, in start
    if self.domain.create() < 0:
       ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1373, in create
    raise libvirtError('virDomainCreate() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2024-04-25T09:58:42.781242Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: VFIO_MAP_DMA failed: Bad address
2024-04-25T09:58:42.816518Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: vfio 0000:03:00.0: failed to setup container for group 11: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55d3cfe74f30, 0x100000000, 0x1c0000000, 0x7fb107e00000) = -2 (No such file or directory)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 58, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1324, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 68, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 191, in start
    raise CallError('\n'.join(errors))
middlewared.service_exception.CallError: [EFAULT] internal error: qemu unexpectedly closed the monitor: 2024-04-25T09:58:42.781242Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: VFIO_MAP_DMA failed: Bad address
2024-04-25T09:58:42.816518Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: vfio 0000:03:00.0: failed to setup container for group 11: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55d3cfe74f30, 0x100000000, 0x1c0000000, 0x7fb107e00000) = -2 (No such file or directory)

When I try to add my isolated GPU to the VM in the VM config, I get this error:

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/system_advanced/gpu.py", line 44, in update_gpu_pci_ids
    verrors.check()
  File "/usr/lib/python3/dist-packages/middlewared/service_exception.py", line 70, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] gpu_settings.isolated_gpu_pci_ids: pci_0000_01_00_1, pci_0000_00_01_0, pci_0000_01_00_0 GPU pci slot(s) are not available or a GPU is not configured.

Anybody any idea?

2 Likes

odd, seems to have fixed itself after another reboot of the server.

and it’s broken again after a reboot :frowning:

I am having a similar issue since 24.04.0, with an Nvidia T1000 and a GTX 1650.

Trying to start the vps hangs and any attempt to check its status or restart again gives the error that the vps is currently paused and must be stopped first.

I have been having a similar issue. When I pass my Arc 380 to my VM in the device list and boot the vm the whole server crashes. I haven’t been able to figure out why the isolated gpu device is being used by truenas if it’s clearly been isolated already. The VM attempts to start but then my fans go crazy and server is inoperable. Can’t connect through shell or otherwise.