Intermittent VFIO DMA Mapping Failure with Coral TPU Passthrough in TrueNAS Scale VM

Hey all! I’ve been struggling with this for a while now but couldn’t get very far. I’m trying to pass a pair of TPUs to my VM, and what’s weird is that it works “sometimes”, but other times I get this error:

[EFAULT] internal error: qemu unexpectedly closed the monitor:
2025-02-27T19:37:31.614811Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0c:00.0","id":"hostdev0","bus":"pci.0","addr":"0x8"}: VFIO_MAP_DMA failed: Bad address
2025-02-27T19:37:31.711093Z qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:0c:00.0","id":"hostdev0","bus":"pci.0","addr":"0x8"}: vfio 0000:0c:00.0: failed to setup container for group 27: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55f5188f24c0, 0x100000, 0xbff00000, 0x7f755bf00000) = -2 (No such file or directory)

System Setup:

    Host OS: TrueNAS Scale (kernel 6.6.44)
    CPU: Intel Xeon (E7 v2/Xeon E5 v2)
    Memory: 128GB RAM (with 512 HugePages configured)
    TPU Setup:
        Two Coral TPUs (PCIe x1 version)
        Installed on a PCIe switch card (ASMedia ASM1182e)
        Each TPU gets its own PCIe lane from the switch
    VM Configuration:
        QEMU/KVM-based VM
        TPU devices passed using vfio-pci
        HugePages allocated
        VM has CPU affinity set to NUMA node 0
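
Since HugePages come up in both the host and VM config above, here is a minimal sketch for turning the hugepage counters into a size in MiB that can be compared against the VM's configured memory. The function name `hugepage_summary` and its optional file argument are made up for this example; the field names are the ones the kernel exposes in /proc/meminfo. The hugepage size is read from the host, not assumed.

```shell
#!/usr/bin/env bash
# Summarise hugepage counters from a meminfo-style file.
# hugepage_summary is a hypothetical helper name; the optional argument
# exists only so the function can be pointed at a saved copy of /proc/meminfo.
hugepage_summary() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }   # reported in kB
        END {
            # free pages * page size (kB) / 1024 = free hugepage memory in MiB
            printf "total=%d free=%d page_kb=%d free_mib=%d\n",
                   total, free, size_kb, free * size_kb / 1024
        }
    ' "$meminfo"
}

# Only run against the live host if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    hugepage_summary
fi
```

The `free_mib` value is just the free page count multiplied by the page size, so it only says how much hugepage-backed memory is available right now, not whether the VM is actually configured to use it.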

IOMMU Groups

/sys/kernel/iommu_groups/26/devices/0000:0a:03.0  (PCIe switch - ASMedia ASM1182e)
/sys/kernel/iommu_groups/26/devices/0000:0b:00.0  (Coral TPU 1)
/sys/kernel/iommu_groups/27/devices/0000:0a:07.0  (PCIe switch - ASMedia ASM1182e)
/sys/kernel/iommu_groups/27/devices/0000:0c:00.0  (Coral TPU 2)
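
For anyone who wants to produce the same listing on their own box, a minimal sketch that walks the sysfs tree shown above and prints one line per device. The function name `list_iommu_groups` and its optional root argument are made up for this example (the argument is only there so it can be tried against a test directory); the /sys/kernel/iommu_groups layout is the one from the listing.

```shell
#!/usr/bin/env bash
# Print "group N: <PCI address>" for every device in every IOMMU group.
# list_iommu_groups is a hypothetical helper name for this sketch.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$root"/*/devices/*; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<addr>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups "$@"
```

This only lists what the kernel reports; it does not say anything about whether a group can be passed through cleanly.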

Things I’ve tried

  • Tried passing the TPUs individually (0b:00.0 and 0c:00.0) → Fails intermittently
  • Tried passing the PCIe switch (0a:03.0) instead of individual TPUs → TrueNAS blocks you from doing this.
  • Tried forcing a vfio-pci rescan like this:
echo "0000:0b:00.0" | sudo tee /sys/bus/pci/devices/0000:0b:00.0/driver/unbind
echo "0000:0c:00.0" | sudo tee /sys/bus/pci/devices/0000:0c:00.0/driver/unbind
echo 1 | sudo tee /sys/bus/pci/rescan
    → This sometimes lets the VM start correctly.
  • Checked HugePages availability: HugePages_Free: 512
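
To see what the unbind/rescan attempt in the list above actually left behind, a minimal sketch that prints which kernel driver each TPU is bound to before starting the VM. The function name `show_pci_driver` and the `PCI_ROOT` override are made up for this example (the override is only there for trying it against a test directory); the `driver` symlink under /sys/bus/pci/devices/<address>/ is standard sysfs.

```shell
#!/usr/bin/env bash
# Print the kernel driver currently bound to each PCI address given.
# show_pci_driver and PCI_ROOT are hypothetical names for this sketch.
show_pci_driver() {
    local root="${PCI_ROOT:-/sys/bus/pci/devices}"
    local bdf link
    for bdf in "$@"; do
        if [ ! -e "$root/$bdf" ]; then
            printf '%s: not present\n' "$bdf"
        elif link=$(readlink "$root/$bdf/driver"); then
            # The symlink target ends in the driver name, e.g. .../vfio-pci
            printf '%s: %s\n' "$bdf" "${link##*/}"
        else
            printf '%s: no driver bound\n' "$bdf"
        fi
    done
}

show_pci_driver 0000:0b:00.0 0000:0c:00.0
```

Running it on a good start and on a failed start would show whether the binding differs between the two cases, which is the part I haven't been able to pin down.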

I’m sure there’s important information I’m leaving out! I have a similar setup running fine on another machine with a totally different OS, but the same TPU chip and adapter. So I feel like it should absolutely be possible and I’m just missing something! Thank you for reading.