PCI passthrough in VM no longer working in Dragonfish 24.04.0

I have a VM with GPU passthrough as well a NVME passthrough and neither is working after the upgrade to 24.04.0. I read about some changes in this regard and thus have already removed and re-added the GPU to the isolated GPU list. I also removed all PCI passthrough devices of the VM and added them back.

When I try to start the VM, I get this error for any PCI device I try to passthrough (if I remove the device in the error, same error shows up for next device etc).

Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 182, in start
    if self.domain.create() < 0:
       ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1373, in create
    raise libvirtError('virDomainCreate() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2024-04-25T09:58:42.781242Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: VFIO_MAP_DMA failed: Bad address
2024-04-25T09:58:42.816518Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: vfio 0000:03:00.0: failed to setup container for group 11: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55d3cfe74f30, 0x100000000, 0x1c0000000, 0x7fb107e00000) = -2 (No such file or directory)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_lifecycle.py", line 58, in start
    await self.middleware.run_in_thread(self._start, vm['name'])
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1324, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/vm_supervisor.py", line 68, in _start
    self.vms[vm_name].start(vm_data=self._vm_from_name(vm_name))
  File "/usr/lib/python3/dist-packages/middlewared/plugins/vm/supervisor/supervisor.py", line 191, in start
    raise CallError('\n'.join(errors))
middlewared.service_exception.CallError: [EFAULT] internal error: qemu unexpectedly closed the monitor: 2024-04-25T09:58:42.781242Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: VFIO_MAP_DMA failed: Bad address
2024-04-25T09:58:42.816518Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:03:00.0","id":"hostdev0","bus":"pci.0","addr":"0x6"}: vfio 0000:03:00.0: failed to setup container for group 11: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55d3cfe74f30, 0x100000000, 0x1c0000000, 0x7fb107e00000) = -2 (No such file or directory)

When I try to add my isolated GPU to the VM in the VM config, I get this error:

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/system_advanced/gpu.py", line 44, in update_gpu_pci_ids
    verrors.check()
  File "/usr/lib/python3/dist-packages/middlewared/service_exception.py", line 70, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] gpu_settings.isolated_gpu_pci_ids: pci_0000_01_00_1, pci_0000_00_01_0, pci_0000_01_00_0 GPU pci slot(s) are not available or a GPU is not configured.

Anybody any idea?

odd, seems to have fixed itself after another reboot of the server.

and it’s broken again after a reboot :frowning: