Problem/Justification
Currently, it is possible to build custom DKMS modules on Debian-based systems to support SR-IOV for Intel iGPUs, thanks to the strongtz/i915-sriov-dkms repository on GitHub.
Being able to share the iGPU between the host and multiple VMs is convenient.
But unfortunately, TrueNAS does not support this out of the box, and some users have experimented with building it themselves without success.
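For context, the procedure on a plain Debian system looks roughly like this. This is a hedged sketch based on my reading of that repository's instructions; the exact package list, the module version string, and the max_vfs value are illustrative and vary by setup:

```shell
# Install build prerequisites and the DKMS framework (Debian/Ubuntu)
sudo apt install build-essential dkms linux-headers-"$(uname -r)" git

# Fetch and register the out-of-tree i915 SR-IOV module
git clone https://github.com/strongtz/i915-sriov-dkms
cd i915-sriov-dkms
sudo dkms add .
sudo dkms install i915-sriov-dkms/2025.01.22   # version string is illustrative

# Enable IOMMU + GuC and allow VFs via kernel parameters, then reboot
# (append to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub):
#   intel_iommu=on i915.enable_guc=3 i915.max_vfs=7
sudo update-grub
```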
Impact
Whether TrueNAS is installed on bare metal or as a VM under another hypervisor like PVE, if TrueNAS supported SR-IOV out of the box, it would benefit many users who want to share an iGPU between multiple VMs for light workloads, without the need to purchase an additional GPU.
User Story
I will take myself as an example: I am currently running a PVE host with a Windows VM, a Debian VM, and an Xpenology VM. The iGPU is needed in all three VMs, and I am able to do that thanks to SR-IOV. I have been trying out TrueNAS lately and I really like what it has to offer, but the deal breaker is that it does not support SR-IOV, and thus some of the services I have been running, like Frigate and Jellyfin, would be handicapped. Had TrueNAS supported SR-IOV, I would not hesitate to make the switch.
I'm actually confused by this. I thought the driver was in the 6.12 kernel, and that by using 25.04 the GPU VF would automatically work, but that's apparently not the case.
Tried other distros, same result.
So it turns out we really need the custom DKMS module. Please consider this!
I have resorted to running certain services on another VM on which I can install the DKMS module, and mounting the TrueNAS drives over NFS. It works, but performance takes a hit, and I am worried about data integrity for some of the heavier Docker containers.
The process to install the DKMS module doesn't seem too complicated; maybe driver bloat in the OS is somewhat of a concern? But it would definitely be useful to a significant number of people, as GPU decoding and encoding is useful for a lot of services.
> This package is highly experimental, you should only use it when you know what you are doing.
Maybe not the best idea to include software like that in a production-level OS?
The fact that the driver is not in the kernel already (or maybe the module is disabled by default? This is confusing) is a concern in itself, especially since Intel usually has no problem upstreaming things.
I understand your concern. I have been following the repo, and it seems the main issue people have with this driver is that it doesn't work on all systems; I haven't seen any report that it actually impacted other parts of the system.
Given that TrueNAS is a relatively locked-down system, I'd say the risks are mitigated to a certain extent.
If we really are going down the rabbit hole of flawed software components, the OpenZFS repo on GitHub is very uninspiring... some of the issues people bring up make it hard to believe it can be used in a production-level OS either...
I think leaning into that, first, makes sense. Why without success? The source code for the TrueNAS kernel is available, so modules should be able to be built (on a dev machine, not on TrueNAS, but for TrueNAS).
FYI, SR-IOV is not necessary to share an i915 or other GPU among Docker/containerized workloads like Jellyfin or Frigate - it's only necessary if you want a fully isolated VM solution.
> I think leaning into that, first, makes sense. Why without success? The source code for the TrueNAS kernel is available, so modules should be able to be built (on a dev machine, not on TrueNAS, but for TrueNAS).
Have that working first, and then see.
I took your advice and built it on a different system, then moved the .ko files to TrueNAS. It works now.
The reason the build fails on TrueNAS itself is probably some restrictions TrueNAS puts in place. All the more reason this should be done officially rather than by users.
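A rough sketch of the out-of-tree build, in case others want to try. The repository URLs for the TrueNAS kernel and the module are real, but the paths, the config location, and the exact make targets are illustrative assumptions about my setup:

```shell
# On a separate Debian dev machine - not on TrueNAS itself.
# Fetch the TrueNAS kernel sources and the SR-IOV module sources.
git clone https://github.com/truenas/linux truenas-linux
git clone https://github.com/strongtz/i915-sriov-dkms

# Prepare the tree with the config of the kernel actually running on the NAS
cd truenas-linux
cp /path/to/config-from-truenas .config   # e.g. copied from /boot on the NAS
make olddefconfig
make modules_prepare

# Standard out-of-tree module build against that tree
make -C "$PWD" M="$PWD/../i915-sriov-dkms" modules

# Copy the resulting .ko to the NAS and load it there
scp ../i915-sriov-dkms/i915.ko root@truenas.local:/root/
# then, on the NAS: insmod /root/i915.ko
```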
Enabling developer mode will lift the majority of guardrails, but with the obvious caveats that it allows undefined behavior and that any bug tickets will be asked for reproduction on a clean system.
Furthermore:
> This package is highly experimental, you should only use it when you know what you are doing.
IANTET (I Am Not The Engineering Team) but, personally speaking, this statement in the repo is going to disqualify it pretty quickly.
Specifically that last one is an issue because we're trying to enable the new xe driver to support the newer-generation Intel (i)GPUs, not blacklist them out - and the xe driver is actually getting much more official support from Intel in the upstream repos to enable SR-IOV natively - i.e., without a DKMS module currently marked "highly experimental."
I'm less inclined to see that as the use case vs. wanting TrueNAS to replace Proxmox and having GPU-accelerated VMs with only an Intel iGPU.
> I'm less inclined to see that as the use case vs. wanting TrueNAS to replace Proxmox and having GPU-accelerated VMs with only an Intel iGPU.
I mean, the TrueNAS team wrote this: "yes-you-can-virtualize-freenas" (cannot submit link)
So I think it is very much a thing.
> Specifically that last one is an issue because we're trying to enable the new xe driver to support the newer-generation Intel (i)GPUs, not blacklist them out - and the xe driver is actually getting much more official support from Intel in the upstream repos to enable SR-IOV natively - i.e., without a DKMS module currently marked "highly experimental."
Isn't this the perfect opportunity? If the vision is to push the xe driver for Intel iGPUs, then shipping an i915 driver with "highly experimental" code is perfect, because users can decide for themselves whether they want to stick with the default xe driver, or blacklist it to get i915 with SR-IOV.
Plus, as I said earlier, if we are talking about "experimental": ZFS was built for FreeBSD, and OpenZFS is a port to Linux with A LOT of serious bugs, as you can see in their GitHub issues.
I imagine far fewer people are doing nested virtualization (VM inside TrueNAS inside Proxmox), which is what would necessitate SR-IOV in TrueNAS at that level. Just assign one host VF/mdev to the TrueNAS VM, another to your Windows VM, another to your Debian VM, another to the Xpenology VM.
If TrueNAS is the bare-metal OS, then having SR-IOV in TrueNAS makes more sense - because then you run TrueNAS and your lightweight apps (Frigate/Jellyfin) on the host VF/mdev, and give the subordinate VFs to the other VMs (Windows/Debian/Xpenology).
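For the bare-metal case, the VF handout sketched above would look roughly like this. Hedged: 0000:00:02.0 is the usual iGPU PCI address but not guaranteed, the VF count is arbitrary, and this assumes an SR-IOV-capable driver is already loaded with the IOMMU enabled:

```shell
# Create virtual functions on the iGPU (here, 3 of them)
echo 3 > /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs

# List the Intel physical function and its new VFs
lspci -d 8086: | grep -i -e vga -e display

# Bind one VF (e.g. 0000:00:02.1) to vfio-pci so a VM can take it,
# while the host keeps using the physical function for its own apps
echo vfio-pci > /sys/bus/pci/devices/0000:00:02.1/driver_override
echo 0000:00:02.1 > /sys/bus/pci/drivers_probe
```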
The xe driver won't ever support anything before the Xe generation, and i915 doesn't support anything after the Arc A-series. Running the vanilla modules side-by-side is fine, but it appears that something in the SR-IOV DKMS module conflicts with xe, which necessitates blacklisting it as a required kernel parameter.
Breaking the existing i915 driver for everyone pre-Battlemage just seems like a bad idea to me.
> I imagine far fewer people are doing nested virtualization (VM inside TrueNAS inside Proxmox), which is what would necessitate SR-IOV in TrueNAS at that level. Just assign one host VF/mdev to the TrueNAS VM, another to your Windows VM, another to your Debian VM, another to the Xpenology VM.
There might be a misunderstanding here: I am not trying to do nested virtualization. Without the DKMS module, if you assign a VF from the PVE host to the TrueNAS VM, the iGPU inside TrueNAS simply does not work at all.
> The xe driver won't ever support anything before the Xe generation, and i915 doesn't support anything after the Arc A-series. Running the vanilla modules side-by-side is fine, but it appears that something in the SR-IOV DKMS module conflicts with xe, which necessitates blacklisting it as a required kernel parameter.
I think blacklisting xe is not a problem; users can decide for themselves whether or not they want to do it based on their hardware. By default, xe should definitely not be blacklisted.
I do understand the concern about "experimental" code breaking i915.
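For reference, the opt-in blacklisting discussed above is just standard module configuration; this is an illustrative fragment, and the exact mechanism on TrueNAS may differ (the i915 parameters are assumptions based on the DKMS project's documentation):

```shell
# /etc/modprobe.d/blacklist-xe.conf  (opt-in only; should not ship as the default)
blacklist xe

# or, equivalently, on the kernel command line:
#   modprobe.blacklist=xe i915.enable_guc=3 i915.max_vfs=7
```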