Hey all! I’ve been struggling with this for a while now but couldn’t get very far. I’m trying to pass a pair of TPUs to my VM, and what’s weird is that it works “sometimes”, but other times I get this error:
[EFAULT] internal error: qemu unexpectedly closed the monitor: 2025-02-27T19:37:31.614811Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0c:00.0","id":"hostdev0","bus":"pci.0","addr":"0x8"}: VFIO_MAP_DMA failed: Bad address
2025-02-27T19:37:31.711093Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0c:00.0","id":"hostdev0","bus":"pci.0","addr":"0x8"}: vfio 0000:0c:00.0: failed to setup container for group 27: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55f5188f24c0, 0x100000, 0xbff00000, 0x7f755bf00000) = -2 (No such file or directory)
System Setup:
Host OS: TrueNAS Scale (kernel 6.6.44)
CPU: Intel Xeon (E7 v2/Xeon E5 v2)
Memory: 128GB RAM (with 512 HugePages configured)
TPU Setup:
Two Coral TPUs (PCIe x1 version)
Installed on a PCIe switch card (ASMedia ASM1182e)
Each TPU gets its own PCIe lane from the switch
VM Configuration:
QEMU/KVM-based VM
TPU devices passed using vfio-pci
HugePages allocated
VM has CPU affinity set to NUMA node 0
IOMMU Groups:
/sys/kernel/iommu_groups/26/devices/0000:0a:03.0 (PCIe switch - ASMedia ASM1182e)
/sys/kernel/iommu_groups/26/devices/0000:0b:00.0 (Coral TPU 1)
/sys/kernel/iommu_groups/27/devices/0000:0a:07.0 (PCIe switch - ASMedia ASM1182e)
/sys/kernel/iommu_groups/27/devices/0000:0c:00.0 (Coral TPU 2)
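For anyone who wants to compare against their own box, the listing above can be produced with something like the following. This is a minimal sketch, not a TrueNAS tool: the function name `list_iommu_groups` and its optional sysfs-root argument are my own (hypothetical) choices, and it assumes the standard `/sys/kernel/iommu_groups/<group>/devices/<address>` layout shown in the paths above.

```shell
#!/bin/sh
# list_iommu_groups [SYSFS_ROOT]
# Prints one "group <n>: <pci address>" line per device.
# SYSFS_ROOT defaults to /sys; the parameter is a hypothetical convenience
# so the function can be pointed at a copy of the tree.
list_iommu_groups() (
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # If nothing matches (e.g. IOMMU disabled), the glob stays literal; skip it.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # drop the trailing "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
)

# Example call on the live system:
#   list_iommu_groups
```

The output only contains bare addresses; the "(Coral TPU 1)" style labels in my listing were added by hand.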
Things I’ve tried:
- Tried passing the TPUs individually (0b:00.0 and 0c:00.0) → Fails intermittently
- Tried passing the PCIe switch (0a:03.0) instead of individual TPUs → TrueNAS blocks you from doing this.
- Tried forcing a vfio-pci rescan like this:
echo "0000:0b:00.0" | sudo tee /sys/bus/pci/devices/0000:0b:00.0/driver/unbind
echo "0000:0c:00.0" | sudo tee /sys/bus/pci/devices/0000:0c:00.0/driver/unbind
echo 1 | sudo tee /sys/bus/pci/rescan
- This sometimes lets the VM start correctly.
- Checked HugePages availability: HugePages_Free: 512
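To make the last two checks repeatable before each VM start, here is a rough sketch of how I would script them. The helper names `bound_driver` and `hugepages_free_mib` and their optional path arguments are hypothetical, made up for this post. The sketch assumes the usual sysfs `driver` symlink under each PCI device and the standard `HugePages_Free` / `Hugepagesize` fields in `/proc/meminfo`. I didn't list the hugepage size above, so it is read from the file rather than assumed.

```shell
#!/bin/sh
# bound_driver ADDR [SYSFS_ROOT]
# Prints the name of the driver a PCI device is bound to, or "none" if unbound.
bound_driver() (
    root="${2:-/sys}"
    link="$root/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        target="$(readlink "$link")"
        printf '%s\n' "${target##*/}"   # last path component is the driver name
    else
        printf 'none\n'
    fi
)

# hugepages_free_mib [MEMINFO_FILE]
# Prints free hugepage memory in MiB: HugePages_Free * Hugepagesize(kB) / 1024.
hugepages_free_mib() (
    awk '
        /^HugePages_Free:/ { free = $2 }
        /^Hugepagesize:/   { size_kb = $2 }
        END { printf "%d\n", free * size_kb / 1024 }
    ' "${1:-/proc/meminfo}"
)

# Example calls on the live system:
#   for dev in 0000:0b:00.0 0000:0c:00.0; do
#       echo "$dev -> $(bound_driver "$dev")"
#   done
#   echo "free hugepage memory: $(hugepages_free_mib) MiB"
```

Running this right after the unbind and rescan commands should show whether both TPUs actually ended up on vfio-pci, and how much hugepage-backed memory is free compared with the RAM assigned to the VM.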
I’m sure there’s important information I’m leaving out! I have a similar setup running fine on another machine with a totally different OS, but the same TPU chip and adapter. So I feel like it should absolutely be possible and I’m just missing something! Thank you for reading.