Hi all,
A while back I moved my TrueNAS (now Scale 24.10) from a VM under VMware to a bare-metal install, and I’ve been very happy since, especially after switching to jailmaker for some containers. One of the things I’m trying to do now is pass my Quadro RTX 4000 through to a Windows 11 VM, but I’m having no luck. I’m getting the following error:
[EFAULT] internal error: qemu unexpectedly closed the monitor: 2024-12-07T22:51:44.961121Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:04:00.0","id":"hostdev0","bus":"pci.0","addr":"0x7"}: vfio 0000:04:00.0: failed to setup container for group 55: Failed to set iommu for container: Operation not permitted
/var/log/messages gives me:
Dec 8 00:06:50 truenas kernel: vfio-pci 0000:04:00.1: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.
What I did:
- The GPU is isolated in System->Isolated GPU Devices
- The GPU is then selected under VM->Edit->GPUs
- All four sub-devices (graphics, audio, USB, and serial) show up on the VM’s Devices page (quick sanity check below)
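As a sanity check that the isolation really rebinds the whole card to vfio-pci (this is just the standard lspci check; the UI doesn’t show it):

# all four functions of the card should report vfio-pci once isolated;
# "-s 04:00" matches every function on bus 04, device 00
lspci -nnk -s 04:00
# expect "Kernel driver in use: vfio-pci" under each function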
IOMMU groups seem OK; the GPU is alone in its own group:
...
Group 50: [14e4:168e] [R] 03:00.0 Ethernet controller NetXtreme II BCM57810 10 Gigabit Ethernet
[14e4:168e] [R] 03:00.1 Ethernet controller NetXtreme II BCM57810 10 Gigabit Ethernet
Group 51: [10de:1eb1] [R] 04:00.0 VGA compatible controller TU104GL [Quadro RTX 4000]
[10de:10f8] 04:00.1 Audio device TU104 HD Audio Controller
[10de:1ad8] 04:00.2 USB controller TU104 USB 3.1 Host Controller
[10de:1ad9] 04:00.3 Serial bus controller TU104 USB Type-C UCSI Controller
Group 52: [1cc7:0200] [R] 0a:00.0 Non-Volatile memory controller RMS-200
...
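(For reference, that listing comes from one of the usual IOMMU-group scripts; the [R] reset markers are its addition. A minimal sysfs walk like the following shows the same grouping:)

#!/bin/sh
# print every IOMMU group and its member devices
for g in /sys/kernel/iommu_groups/*; do
    echo "Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo "    $(lspci -nns "${d##*/}")"
    done
done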
VT-d is enabled in the BIOS, as is SR-IOV (not sure the latter is even relevant here; disabling it makes no difference).
Anyone know what to do to get this GPU to pass through to a VM?
Many thanks in advance,
Kai.
… so I did some more research and installed Proxmox 8 on my other server (a DL560, also Gen8, so very similar to the DL380 apart from two extra CPU sockets) to see what makes GPU passthrough work on this hardware under Proxmox but not under Scale. It turns out the only thing these machines need is the RMRR patch included in the Proxmox 8 kernel, which enables the “intel_iommu=on,relax_rmrr” kernel cmdline option.
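For anyone else on this hardware, here is what the workaround looks like on the Proxmox side (assuming a GRUB-booted host; with systemd-boot the option goes into /etc/kernel/cmdline instead):

# /etc/default/grub on the Proxmox 8 host
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on,relax_rmrr"
# then apply and reboot:
update-grub
reboot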
Is there any chance TrueNAS Scale might include this patch at some point? It would let me, and anybody else with Gen8/Gen9 ProLiant servers, use GPU passthrough in Scale (and I could switch off another box). It would be greatly appreciated.
Many thanks for any info,
Kai.
… and after a little more googling I found a permanent solution here that shows how to disable RMRR on selected PCI slots.
No kernel cmdline workaround needed for Proxmox any more; now trying Scale again…
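In case that link ever dies: the method uses HPE’s conrep utility from the Scripting Toolkit to exclude individual PCIe slots from RMRR. Roughly like this; the slot name below is my example for slot 1, so adjust it to wherever the GPU actually sits:

# conrep and conrep_rmrds.xml ship with the HPE Scripting Toolkit;
# exclude.dat marks the GPU's slot as excluded from RMRR (slot 1 here):
cat > exclude.dat <<'EOF'
<Conrep>
  <Section name="RMRDS_Slot1" helptext=".">Endpoints_Excluded</Section>
</Conrep>
EOF
# load the setting into the BIOS, then power-cycle the server:
conrep -l -x conrep_rmrds.xml -f exclude.dat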
Kai.
Sorry to keep this thing alive, but… even though Proxmox now passes the GPU through without any problems, and without the relax_rmrr cmdline option, Scale 24.10 won’t start the Windows 11 VM with the GPU passed through:
[EFAULT] internal error: qemu unexpectedly closed the monitor: 2024-12-10T00:25:47.460763Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:04:00.0","id":"hostdev0","bus":"pci.0","addr":"0x7"}: vfio 0000:04:00.0: failed to setup container for group 51: Failed to set iommu for container: Operation not permitted
IOMMU groups:
...
Group 51: [10de:1eb1] [R] 04:00.0 VGA compatible controller TU104GL [Quadro RTX 4000]
[10de:10f8] 04:00.1 Audio device TU104 HD Audio Controller
[10de:1ad8] 04:00.2 USB controller TU104 USB 3.1 Host Controller
[10de:1ad9] 04:00.3 Serial bus controller TU104 USB Type-C UCSI Controller
Group 52: [1cc7:0200] [R] 0a:00.0 Non-Volatile memory controller RMS-200
root@truenas:~# dmesg | grep -e DMAR -e VFIO -e RMRR
[ 0.007355] ACPI: DMAR 0x00000000BDDAD200 0004A4 (v01 HP ProLiant 00000001 \xd2? 0000162E)
[ 0.007409] ACPI: Reserving DMAR table memory at [mem 0xbddad200-0xbddad6a3]
[ 0.026729] DMAR: IOMMU enabled
[ 0.519536] DMAR: Host address width 46
[ 0.519539] DMAR: DRHD base: 0x000000fbdfe000 flags: 0x0
[ 0.519554] DMAR: dmar0: reg_base_addr fbdfe000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.519557] DMAR: DRHD base: 0x000000f2ffe000 flags: 0x1
[ 0.519563] DMAR: dmar1: reg_base_addr f2ffe000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.519566] DMAR: RMRR base: 0x000000bdffd000 end: 0x000000bdffffff
[ 0.519568] DMAR: RMRR base: 0x000000bdff6000 end: 0x000000bdffcfff
[ 0.519570] DMAR: RMRR base: 0x000000bdf83000 end: 0x000000bdf84fff
[ 0.519571] DMAR: RMRR base: 0x000000bdf7f000 end: 0x000000bdf82fff
[ 0.519572] DMAR: RMRR base: 0x000000bdf6f000 end: 0x000000bdf7efff
[ 0.519574] DMAR: RMRR base: 0x000000bdf6e000 end: 0x000000bdf6efff
[ 0.519575] DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
[ 0.519576] DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
[ 0.519577] DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
[ 0.519671] DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
[ 0.519776] DMAR: RMRR base: 0x000000bddde000 end: 0x000000bdddefff
[ 0.519777] DMAR: ATSR flags: 0x0
[ 0.519876] DMAR: No SATC found
[ 0.519884] DMAR: dmar0: Using Queued invalidation
[ 0.519897] DMAR: dmar1: Using Queued invalidation
[ 0.526967] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 1.668654] VFIO - User Level meta-driver version: 0.3
[ 822.446848] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 1670.627595] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 1734.869939] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
root@truenas:~#
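The three vfio_iommu_type1_attach_group lines above look like the actual culprit: the kernel refuses to attach the group because it sees no interrupt remapping support, and it names the allow_unsafe_interrupts module parameter as the escape hatch. If I read that right, something like the following should let me test it (the midclt/kernel_extra_options route for persisting it is my assumption; the parameter itself comes straight from the log, and “unsafe” means a malicious guest could inject spoofed interrupts):

# one-off test, takes effect immediately:
echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

# persist across reboots on Scale (assuming system.advanced accepts
# module parameters in kernel_extra_options):
midclt call system.advanced.update '{"kernel_extra_options": "vfio_iommu_type1.allow_unsafe_interrupts=1"}'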
Devices:

Device ID  Device                  Order
24         Display                 1000
1          Disk                    1001
29         PCI Passthrough Device  1002
30         PCI Passthrough Device  1002
31         PCI Passthrough Device  1002
32         PCI Passthrough Device  1002
2          NIC                     1003
I also tried with “Ensure display” on and off, as well as deleting the display device (order 1000), but none of that made a difference.
Is there anything I’m missing?
Many thanks for any help,
Kai.