Include Intel i915 SR-IOV driver

Problem/Justification
Currently, it is possible to build custom DKMS modules on a Debian-based system to support SR-IOV for Intel iGPUs, thanks to the strongtz/i915-sriov-dkms repository on GitHub.
Being able to share the iGPU between the host and multiple VMs is convenient.
Unfortunately, TrueNAS does not support this out of the box, and some users have experimented with building it themselves without success.
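For reference, the typical install flow on a stock Debian-based system looks roughly like this (a minimal sketch; the package list and the module version string are illustrative, so check the repo’s README for the current steps):

```
# Minimal sketch of installing the i915-sriov-dkms module on a stock
# Debian-based host (not TrueNAS). The version string is illustrative;
# use whatever `dkms status` reports after `dkms add`.
apt install build-essential dkms linux-headers-"$(uname -r)"
git clone https://github.com/strongtz/i915-sriov-dkms.git
cd i915-sriov-dkms
dkms add .                                # registers the source tree with DKMS
dkms install i915-sriov-dkms/2025.07.22   # illustrative version; see `dkms status`
```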

Impact
Whether TrueNAS is installed on bare metal or as a VM under another hypervisor like PVE, supporting SR-IOV out of the box would benefit the many users who want to share an iGPU between multiple VMs for light workloads, without the need to purchase an additional GPU.

User Story
I will take myself as an example: I am currently running a PVE host with a Windows VM, a Debian VM, and a Xpenology VM. An iGPU is needed in all three VMs, and I am able to do this thanks to SR-IOV. I have been trying out TrueNAS lately and I really like what it has to offer, but the deal-breaker is that it does not support SR-IOV, so some of the services I run, like Frigate and Jellyfin, would be handicapped. Had TrueNAS supported SR-IOV, I would not hesitate to make the switch.

I have the same need

I’m actually confused by this: I thought the driver was in the 6.12 kernel, and that by using 25.04 the GPU VF would automatically work, but that’s apparently not the case.

Tried other distros, same result.

So it turns out we really need the custom DKMS module. Please consider this!

This is a pain point for me as well.

I have resorted to running certain services on another VM where I can install the DKMS module, mounting the TrueNAS drives over NFS. It works, but performance takes a hit, and I am worried about data integrity for some of the heavier Docker containers.
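For context, the workaround is just an NFS export from TrueNAS mounted inside the other VM; the hostname and dataset path below are placeholders:

```
# Illustrative only: mount a TrueNAS NFS export inside the VM that has
# the DKMS module installed. Hostname and paths are placeholders.
mount -t nfs truenas.local:/mnt/tank/apps /mnt/apps
```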

The process to install the DKMS module doesn’t seem too complicated; maybe the driver bloating the OS is somewhat of a concern? But it would definitely be useful for a significant number of people, as GPU decoding and encoding is useful for a lot of services.

This package is highly experimental, you should only use it when you know what you are doing.
Maybe it’s not the best idea to include software like that in a production-level OS?
The fact that the driver is not in the kernel already (or maybe the module is disabled by default? This is confusing) is a concern in itself, especially since Intel usually has no problem upstreaming things.

I understand your concern. I have been following the repo, and it seems the most common issue people have with this driver is that it doesn’t work on all systems; I haven’t seen any report of it actually impacting other parts of the system.

Given that TrueNAS is a relatively locked-down system, I’d say the risks are mitigated to a certain extent.

If we really are going down the rabbit hole of flawed software components, the OpenZFS repo on GitHub is very uninspiring… some of the issues people bring up make it hard to believe it can be used in a production-level OS either…

I think leaning into that first makes sense. Why without success? The source code for the TrueNAS kernel is available, so modules should be buildable (on a dev machine, not on TrueNAS, but for TrueNAS).

Have that working first, and then see.

FYI, SR-IOV is not necessary to share an i915 or other GPU among Docker/containerized workloads like Jellyfin or Frigate; it’s only necessary if you want a fully isolated VM solution.
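To illustrate, for the container case it’s enough to pass the DRI device nodes through to the container; a generic Docker example (not TrueNAS-specific syntax):

```
# Generic Docker example: share the host iGPU with a container by passing
# the DRI device nodes through. No SR-IOV or VFs involved.
docker run -d --device /dev/dri:/dev/dri jellyfin/jellyfin
```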


Yeah, I’m aware of that; I am using TrueNAS in a VM with Proxmox as the host.

I think leaning into that first makes sense. Why without success? The source code for the TrueNAS kernel is available, so modules should be buildable (on a dev machine, not on TrueNAS, but for TrueNAS).

Have that working first, and then see.

I took your advice and built it on a different system, then moved the .ko files over to TrueNAS. It works now.
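For anyone who wants to reproduce it, the flow is roughly the standard out-of-tree kbuild one; a sketch only, where the truenas/linux repo, branch, and paths are assumptions you will need to match to your release:

```
# Rough sketch, not a verified recipe: build against the TrueNAS kernel
# source on a dev machine, then copy the .ko files to TrueNAS.
git clone --depth 1 https://github.com/truenas/linux.git
cd linux
cp /path/to/running-truenas-kernel.config .config   # config of the target kernel
make olddefconfig && make modules_prepare
cd ../i915-sriov-dkms
make -C ../linux M="$PWD" modules                   # standard out-of-tree kbuild invocation
# Copy the resulting .ko files into /lib/modules/<kernel>/updates/ on
# TrueNAS, then run `depmod -a` and reload the module there.
```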

The build probably fails because of some restrictions TrueNAS puts in place. All the more reason this should be done officially rather than by users.


We could use some meat for 25.04.2; this is getting my vote, because running TrueNAS under Proxmox is a thing.

Enabling developer mode will lift the majority of the guardrails, but with the obvious caveats of allowing undefined behavior and of any bug tickets being asked for reproduction on a clean system.
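For anyone unfamiliar: on recent SCALE releases that is done from a root shell, with the caveat that the exact command may vary by version, so verify against the docs:

```
# Lifts the apt/dpkg guardrails on TrueNAS SCALE (developer mode).
# Command name as commonly documented; verify for your version.
install-dev-tools
```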

Furthermore:

This package is highly experimental, you should only use it when you know what you are doing.

IANTET (I Am Not The Engineering Team), but personally speaking, this statement in the repo is going to disqualify it pretty quickly.

In addition, the prereqs include:

intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe
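For reference, those parameters land on the kernel command line, e.g. via GRUB on a plain Debian host; illustrative only, not something the TrueNAS UI exposes:

```
# Illustrative /etc/default/grub entry on a generic Debian host, not a
# TrueNAS setting. Apply with `update-grub` and reboot.
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on i915.enable_guc=3 i915.max_vfs=7 module_blacklist=xe"
```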

Specifically, that last parameter (module_blacklist=xe) is an issue, because we’re trying to enable the new xe driver to support the newer-generation Intel (i)GPUs, not blacklist it out. And the xe driver is actually getting much more official support from Intel in the upstream repos to enable SR-IOV natively, i.e. without a DKMS module currently marked “highly experimental.”

I’m less inclined to see that as the use case vs. wanting TrueNAS to replace Proxmox and having GPU-accelerated VMs with only an Intel iGPU.


I’m less inclined to see that as the use case vs. wanting TrueNAS to replace Proxmox and having GPU-accelerated VMs with only an Intel iGPU.

I mean, the TrueNAS team wrote this: “yes-you-can-virtualize-freenas” (cannot submit link)
So I think it is very much a thing.

Specifically, that last parameter (module_blacklist=xe) is an issue, because we’re trying to enable the new xe driver to support the newer-generation Intel (i)GPUs, not blacklist it out. And the xe driver is actually getting much more official support from Intel in the upstream repos to enable SR-IOV natively, i.e. without a DKMS module currently marked “highly experimental.”

Isn’t this the perfect opportunity? If the vision is to push the xe driver for Intel iGPUs, then shipping an i915 driver with “highly experimental” code is fine, because users can decide for themselves whether to stick with the default xe driver or blacklist it to get i915 for SR-IOV.

Plus, as I said earlier, if we are talking about “experimental”: ZFS was built for FreeBSD, and OpenZFS is a port to Linux with A LOT of serious bugs, as you can see in their GitHub issues.

We don’t have to blacklist xe by default. Users can do it themselves if they want to.

Yes, I was the one who updated it recently. :wink: Yes, You Can (Still) Virtualize TrueNAS

I imagine far fewer people are doing nested virtualization (a VM inside TrueNAS inside Proxmox), which is what would necessitate SR-IOV in TrueNAS at that level. Just assign one host VF/mdev to the TrueNAS VM, another to your Windows VM, another to your Debian VM, and another to the Xpenology VM.
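For reference, once the patched driver is loaded on the Proxmox host, the VFs come from the generic SR-IOV sysfs interface; the PCI address below is the typical iGPU slot, so adjust it to your system:

```
# On the Proxmox host: create 3 virtual functions on the iGPU via the
# standard SR-IOV sysfs attribute. 0000:00:02.0 is the usual iGPU address.
echo 3 > /sys/devices/pci0000:00/0000:00:02.0/sriov_numvfs
lspci | grep -iE 'vga|display'   # the VFs appear as additional 00:02.x devices
```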

If TrueNAS is the bare-metal OS, then having SR-IOV in TrueNAS makes more sense, because then you run TrueNAS and your lightweight apps (Frigate/Jellyfin) on the host VF/mdev and give the subordinate VFs to the other VMs (Windows/Debian/Xpenology).

The xe driver won’t ever support anything before the Xe generation, and i915 doesn’t support anything after the Arc A-series. Running the vanilla modules side by side is fine, but it appears that something in the SR-IOV DKMS module conflicts with xe and necessitates blacklisting it as a required kernel parameter.

Breaking the existing i915 driver for everyone pre-Battlemage just seems like a bad idea to me.

I imagine far fewer people are doing nested virtualization (a VM inside TrueNAS inside Proxmox), which is what would necessitate SR-IOV in TrueNAS at that level. Just assign one host VF/mdev to the TrueNAS VM, another to your Windows VM, another to your Debian VM, and another to the Xpenology VM.

There might be a misunderstanding here: I am not trying to do nested virtualization. Without the DKMS module, if you assign a VF from the PVE host to the TrueNAS VM, the iGPU inside TrueNAS simply does not work at all.
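You can see this from inside the TrueNAS VM: the passed-through VF shows up on the PCI bus, but nothing binds to it:

```
# Inside the TrueNAS VM: the passed-through VF is visible in lspci, but
# without the patched i915 there is no "Kernel driver in use:" line for it.
lspci -nnk | grep -iEA3 'vga|display'
```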

The xe driver won’t ever support anything before the Xe generation, and i915 doesn’t support anything after the Arc A-series. Running the vanilla modules side by side is fine, but it appears that something in the SR-IOV DKMS module conflicts with xe and necessitates blacklisting it as a required kernel parameter.

I think blacklisting xe is not a problem; users can decide for themselves whether or not they want to do it based on their hardware situation. By default, xe should definitely not be blacklisted.

I do understand the concern about “experimental” code breaking i915.