I have been running a TrueNAS SCALE home server since 2022, but the recent 25.04 release has caused me some headaches.
I could just roll back to 24.10, but I can't track down what's going wrong, and since nothing affected is urgently needed, I would like to fix it.
When migrating to 25.04 I followed the migration guide in the release notes. Regardless, the problem also occurs on fresh virtual machines created from custom images and catalogue images alike.
The system can output video to a VNC screen just fine.
My VM runs Linux Mint as the guest OS and has all 12 CPU threads of my machine (AMD Ryzen 5 7600X), 32 GB of my 64 GB of RAM, a bridged NIC and a PowerColor RX 6700 XT Fighter GPU.
Since migrating, I am greeted not by the OS but by GRUB, with my dGPU outputting the picture to my screen. I then exit GRUB and start Mint from the ubuntu boot entry. After I select that boot entry, the screen goes black and never comes back on. The OS boots fine and I can still access it over SSH; there is just no screen output. Restarting lightdm makes the screens briefly wake up, but they never show a picture.
The logs on both TrueNAS and the guest OS are weirdly uneventful. Below I list everything that stood out to me:
- lspci shows a Red Hat, Inc. Virtio 1.0 GPU even though no VNC display device is added. That's odd.
- journalctl shows the following errors immediately after I select the ubuntu boot entry, but not before (the rdmsr errors are not new, and my previous research concluded that they can be ignored, but that's all the system gives me):
May 01 10:05:29 DANTE kernel: kvm_do_msr_access: 34 callbacks suppressed
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x3a data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0xd90 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x122 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x570 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x571 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x572 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x560 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x561 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x580 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x581 data
- The VM's journal is entirely inconclusive to me. The GPU is recognised and the driver is loaded, but xrandr can't find a display. I am happy to provide further detail if needed, but I don't want to clutter the post, so I'll stop here for now.
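To see which registers those rdmsr lines actually refer to, the addresses can be pulled out of the log text and looked up. Below is a minimal Python sketch, assuming the journal lines are passed in as plain text (e.g. copied from journalctl). The name table is an assumption taken from Intel's MSR documentation, not something the log prints; all entries are Intel-specific features (feature control, TSX, Processor Trace, MPX), which would fit the earlier conclusion that these messages are harmless. Incomplete lines, like the truncated last one above, are simply skipped.

```python
import re
from collections import Counter

# Matches KVM's "ignored rdmsr: <msr> data <value>" kernel messages.
RDMSR_RE = re.compile(r"ignored rdmsr: (0x[0-9a-fA-F]+) data (0x[0-9a-fA-F]+)")

# Assumption: names per Intel's SDM MSR index; verify before relying on them.
MSR_NAMES = {
    0x3A: "IA32_FEATURE_CONTROL",
    0x122: "IA32_TSX_CTRL",
    0x560: "IA32_RTIT_OUTPUT_BASE",
    0x561: "IA32_RTIT_OUTPUT_MASK_PTRS",
    0x570: "IA32_RTIT_CTL",
    0x571: "IA32_RTIT_STATUS",
    0x572: "IA32_RTIT_CR3_MATCH",
    0x580: "IA32_RTIT_ADDR0_A",
    0x581: "IA32_RTIT_ADDR0_B",
    0xD90: "IA32_BNDCFGS",
}


def summarize_rdmsr(log_text: str) -> Counter:
    """Count complete 'ignored rdmsr' messages per MSR address."""
    counts = Counter()
    for msr_hex, _data in RDMSR_RE.findall(log_text):
        counts[int(msr_hex, 16)] += 1
    return counts


def describe(counts: Counter) -> list[str]:
    """One line per MSR: address, name (if known) and number of hits."""
    return [
        f"{msr:#x} {MSR_NAMES.get(msr, 'unknown')} x{n}"
        for msr, n in sorted(counts.items())
    ]


# Three lines from the journal excerpt above; the last one is truncated.
sample = """\
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x3a data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x570 data 0x0
May 01 10:05:29 DANTE kernel: kvm: kvm [70637]: ignored rdmsr: 0x581 data
"""
for row in describe(summarize_rdmsr(sample)):
    print(row)
```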
The last update in the following thread by @TheColin21 seems to indicate a similar issue, although the thread itself covers a different topic:
GPU passthrough to VMs caused quite a few people problems even before 25.04, especially on AMD it seems (there's a PCIe reset bug on AM4 at least; I don't know if it's also on AM5).
Virtualization on 25.04 is even more experimental than most users probably thought before upgrading (I had lots of problems with it too, which I described here).
I rolled back to 24.10 for the time being, just like a few others as far as I've read on this forum and/or Reddit.
If you do figure out the issue, please post the solution here, but I’d honestly just recommend rolling back until virtualization becomes more stable in 25.**
I have found a solution, at least for X11 users. In my case the problem was that a weird virtual GPU that nobody invited had been chosen as the boot GPU for the guest OS.
The solution looks like this:
- Log into your guest OS via SSH or a serial console.
- Get the PCIe bus ID of the GPU you want to use with lspci | grep VGA
- Create /etc/X11/xorg.conf and fill in the GPU info like this:
This is, of course, an example specific to my GPU.
I spent far too much time on this; I've been having this problem since Easter. So this had better help some other people.
Other solutions, like trying to choke out the virtual GPU with pci-stub.ids= kernel parameters, didn't work for me.
Edit: clarified that the fix is applied to the guest OS.
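One detail worth knowing for the BusID step: lspci prints the bus address in hexadecimal, while the BusID in xorg.conf expects decimal numbers (so lspci's 0a becomes 10). Below is a minimal Python sketch that takes lspci | grep VGA output, skips the virtio GPU, converts the first remaining slot and prints a matching Device section. The Identifier is a hypothetical name, the amdgpu driver is an assumption for an AMD card (xf86-video-amdgpu), and the sample lines are made up for illustration, not real output from the VM above. Depending on the setup, more than a Device section may be needed; this only covers that part.

```python
import re

# Slot at the start of an lspci line, with an optional PCI domain:
# "0000:03:00.0 ..." (lspci -D) or "03:00.0 ..." (plain lspci). All fields are hex.
SLOT_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\s"
)


def xorg_busid(lspci_line: str) -> str:
    """Convert an lspci slot (hexadecimal) into an xorg.conf BusID (decimal)."""
    m = SLOT_RE.match(lspci_line.strip())
    if m is None:
        raise ValueError(f"no PCI slot found in: {lspci_line!r}")
    domain = int(m["domain"] or "0", 16)
    bus, dev, fn = (int(m[k], 16) for k in ("bus", "dev", "fn"))
    # Only add the domain when it is non-zero: PCI:bus@domain:device:function
    bus_part = f"{bus}@{domain}" if domain else str(bus)
    return f"PCI:{bus_part}:{dev}:{fn}"


def device_section(lspci_vga_output: str, driver: str = "amdgpu") -> str:
    """Build a Device section for the first VGA device that is not the virtio GPU."""
    for line in lspci_vga_output.splitlines():
        if "VGA" in line and "Virtio" not in line:
            return "\n".join([
                'Section "Device"',
                '    Identifier "PassthroughGPU"',  # hypothetical name
                f'    Driver "{driver}"',           # assumption: xf86-video-amdgpu installed
                f'    BusID "{xorg_busid(line)}"',
                "EndSection",
            ])
    raise ValueError("no non-virtio VGA device found")


# Hypothetical `lspci | grep VGA` output, for illustration only.
sample = (
    "00:01.0 VGA compatible controller: Red Hat, Inc. Virtio 1.0 GPU (rev 01)\n"
    "00:0a.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22\n"
)
print(device_section(sample))
```

The virtio check is a plain substring match on the lspci description, which keeps it simple but means it only works as long as lspci names that device "Virtio".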
Now that I think about it, I have a similar issue with my actual physical TrueNAS machine.
I have a dedicated Intel A310 for encoding.
My mainboard also has IPMI with its own basic video chip. Since I installed the A310, TrueNAS outputs its console on the A310 instead of the IPMI.
If I'm not mistaken, TrueNAS shouldn't start a display server like X, though?
@AlvinEnde do you have an idea how I can force TrueNAS to use the correct video output?
Up until recently I didn't force TrueNAS to run on my integrated GPU. I had my RX 6700 XT isolated, and after starting a VM, TrueNAS switched to the iGPU.
The boot GPU is generally selected by the BIOS. On my MSI consumer board I had to select IGD and the Force option in the BIOS for TrueNAS to use the iGPU from the start. I also have an HDMI dummy plug in the iGPU's HDMI port, because the old board for my 13600K (rest in Vmin-shift) needed something plugged into the iGPU.
I would strongly advise against tinkering with TrueNAS itself. If you experience problems and the BIOS isn't helping, you could try setting the kernel parameter pci-stub.ids=XXXX:XXXX in the GRUB config, with the corresponding ID found via sudo lspci -Dnn. Note, however, that this should stop TrueNAS from using the dGPU for anything, so it may not be the result you want. Do this at your own risk!
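For the pci-stub route, the ID is the [vendor:device] pair at the end of each lspci -Dnn line. Below is a minimal Python sketch that collects the IDs of every function in one slot (a GPU usually also exposes an HDMI audio function) and builds the parameter string. The sample lines and IDs are hypothetical placeholders, and the sketch does not touch the GRUB config itself. Keep in mind that pci-stub matches by ID, so it would also claim any other identical card in the system.

```python
import re

# A vendor:device pair such as "[1002:1234]"; class codes like "[0300]"
# and names like "[AMD/ATI]" do not match because they lack the hex:hex form.
ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(lspci_nn_output: str, slot_prefix: str) -> list[str]:
    """Return vendor:device IDs of all functions whose slot starts with slot_prefix."""
    ids = []
    for line in lspci_nn_output.splitlines():
        if line.startswith(slot_prefix):
            matches = ID_RE.findall(line)
            if matches:
                vendor, device = matches[-1]  # the ID is the last [xxxx:xxxx] group
                ids.append(f"{vendor.lower()}:{device.lower()}")
    return ids


def pci_stub_param(ids: list[str]) -> str:
    """Join IDs into one comma-separated pci-stub.ids= value, dropping duplicates."""
    unique = list(dict.fromkeys(ids))  # keeps first-seen order
    return "pci-stub.ids=" + ",".join(unique)


# Hypothetical `sudo lspci -Dnn` output; the IDs are placeholders.
sample = (
    "0000:03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. "
    "[AMD/ATI] Example GPU [1002:1234] (rev c1)\n"
    "0000:03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. "
    "[AMD/ATI] Example HDMI Audio [1002:5678]\n"
    "0000:04:00.0 Non-Volatile memory controller [0108]: Example NVMe [abcd:0001]\n"
)
print(pci_stub_param(pci_ids(sample, "0000:03:00.")))
```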