Incus container AMDGPU passthrough /dev/kfd for ROCm AI compute

System:

  • TrueNAS Scale 25.04.2.4
  • RX 7900 XTX
  • Core Ultra 9 285K
  • W880 MB
  • Incus container image: Ubuntu 24.04

Issue:
I cannot deploy any application that relies on ROCm compute in an Incus container because /dev/kfd is not passed through. I can access /dev/dri, but that alone is not enough for ROCm applications.

Goal:
Deploy faster-whisper with proper ROCm support and configuration. Technically the same thing can be done with TrueNAS apps/Docker Compose, but there is no published Docker image for my use case, and after changing the settings one apparently needs to rebuild the image, which makes it cumbersome to publish one myself (I guess; I have no experience there).

Tried:

  • Isolate GPU and passthrough to VM → VM hangs (food for another post)
  • Recreate new container with and without passing GPU in the settings
  • Restarting system and container
  • Checking incus container config in truenas shell (was expecting to find /dev/dri there and just add /dev/kfd, but nope).
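For reference, a minimal sketch of how the container's devices can be listed from the TrueNAS shell. The function name `show_container_devices` is a hypothetical helper, not part of Incus, and the container name is whatever the TrueNAS GUI shows for the instance.

```shell
# Hypothetical helper; "$1" is the container name as shown in the TrueNAS GUI.
show_container_devices() {
  # Full instance config, including settings inherited from profiles.
  incus config show "$1" --expanded
  # Only the devices attached directly to the instance.
  incus config device show "$1"
}
```

Calling `show_container_devices <container_name>` prints both views; a device added through the GUI may appear in only one of them.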

I do believe everything I need is there, I just lack the know-how on how to deploy it properly.

The only related topic I could find is this one on the Incus forum.

Since Incus will be removed again and replaced by libvirt (the same as the VM backend), I don't know if I would invest much time in it…

Oh no! I didn't know it would be removed again. That's a pity; it was really convenient for deploying things that are not straightforward with Docker.

LXC won't go away; only the management plane will switch from Incus to libvirt. Existing Incus LXC containers should migrate automatically to the libvirt backend.

Maybe I am being dense, but in that thread they don't actually solve passing /dev/kfd to the container?

(https://discuss.linuxcontainers.org/t/amd-gpu-passthrough-to-containers-obs-studio/21203)

As far as I understood it, the OP of that post added this to his Incus config file:

devices:
  dri_card0:
    gid: "44"
    source: /dev/dri/card0
    type: unix-char
  dri_renderD128:
    gid: "44"
    source: /dev/dri/renderD128
    type: unix-char
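The same two devices could presumably also be added from the host shell instead of editing the config by hand. This is only a sketch: `add_dri_devices` is a hypothetical helper name, the device names and gid 44 are copied from the config quoted above, and the card/render node numbers may differ on other systems.

```shell
# Hypothetical helper: "$1" is the container name, "$2" an optional gid
# (defaults to 44, the value used in the quoted config).
add_dri_devices() {
  container="$1"
  gid="${2:-44}"
  # One unix-char device per DRI node, mirroring the YAML entries.
  incus config device add "$container" dri_card0 unix-char \
    source=/dev/dri/card0 gid="$gid"
  incus config device add "$container" dri_renderD128 unix-char \
    source=/dev/dri/renderD128 gid="$gid"
}
```

It would be called as `add_dri_devices <container_name>`, or with a second argument to override the gid.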

A later comment in that thread refers to this post, which explains how to pass through /dev/kfd.

You are right, and that seems to have done the trick! In case anyone else ends up here, these are the steps:

  1. Set up the Incus container using the GUI, and make sure to add the GPU during initial setup.
  2. Open a container shell and confirm that ls /dev/dri works but ls /dev/kfd does not.
  3. Give the container access to /dev/kfd for applications that depend on ROCm by opening a TrueNAS shell and running the following command (NOTE: change the gid to your TrueNAS system's render group ID):
     incus config device add <container_name> dev_kfd unix-char source=/dev/kfd path=/dev/kfd gid=110

Permissions were already correct in my case, guess that’s set by the GUI when creating the container with a GPU.
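For anyone unsure which gid to use, the steps above can be sketched as a few small shell functions to run in the TrueNAS shell. The function names are hypothetical, and the sketch follows the note in step 3 by taking the gid from the host's render group. As far as I know, Incus treats gid as the owner inside the container, so if the container's render group has a different gid, that one may be the value that matters; I have not verified this here.

```shell
# Print the numeric gid of a group on the host (third field of the group entry).
gid_of_group() {
  getent group "$1" | cut -d: -f3
}

# Hypothetical wrapper around the command from step 3; "$1" is the container name.
# Assumes a group named "render" exists on the host.
add_kfd() {
  incus config device add "$1" dev_kfd unix-char \
    source=/dev/kfd path=/dev/kfd gid="$(gid_of_group render)"
}

# Check from the host that /dev/kfd is now visible inside the container.
check_kfd() {
  incus exec "$1" -- ls -l /dev/kfd
}
```

Running `add_kfd <container_name>` followed by `check_kfd <container_name>` should list the device node with its owner and group if the passthrough worked.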

Thanks @LarsR