Hi,
Why is Scale designed so that a single GPU is reserved by the system and cannot be passed through to a VM, while Proxmox has no problem passing the only GPU through to a VM?
What was the thought process behind that kind of decision? Is it going to change?
Can’t speak for the decision process, but since it’s been that way since the release of Scale in 2022, I doubt it will change. Over the years many have requested to run Scale “headless”, but they were never answered (as far as I can remember).
I’m quite sure that in some beta phase I had scale running headless.
I mean not only headless, but with the single GPU forwarded to a VM.
Scale has no problem running headless, but when you add a GPU, the system will reserve it. It is stupid.
It’s actually working now. You can pass a single GPU to a VM. What a surprise!
Is that a new feature, or did you do something special?
No, I did not do anything special. You have to make sure that the GPU is not in the isolated GPU list.
The motherboard, and the quality of its IOMMU grouping, may make a big difference here.
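Whether the device can be cleanly handed to a VM largely comes down to whether it sits in its own IOMMU group. A minimal sketch of how to inspect that (function name and structure are mine, not TrueNAS middleware code; it just reads what the kernel exposes under `/sys/kernel/iommu_groups`):

```python
# Sketch: list IOMMU groups and their PCI devices from sysfs.
# The default path is where Linux exposes the groups; a different
# base path can be passed in (e.g. for testing).
from pathlib import Path


def list_iommu_groups(base="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_addresses]} for every IOMMU group."""
    groups = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return groups  # IOMMU disabled or not supported on this box
    for group_dir in base_path.iterdir():
        devices = sorted(d.name for d in (group_dir / "devices").iterdir())
        groups[group_dir.name] = devices
    return groups
```

A GPU that shares a group with, say, a SATA controller is the classic "badly behaved motherboard" case: the kernel can only hand over the whole group, so passthrough drags unrelated hardware along with it.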
Now I’m confused… I thought the whole point of isolating the GPU was that one can pass the GPU to the VM?
All the write ups I’ve seen, people jump through hoops to get the GPU isolated, such as to be able to pass it to a VM. Are we talking about a different type of pass through?
Isolating the GPU through our middleware binds the vfio-pci driver to the device at boot time, to prevent it from being in-use by another host level process and the system responding rather poorly when it gets yanked.
If you manually add it as a PCI device by its PCI ID (e.g. 08:00.0) and include the subdevices, then the VM back end should happily snatch the device when the VM powers on (assuming you have a well-behaved motherboard, as mentioned by @etorix). Any host-level processes using it, like Docker, might react poorly, but so far the only problem beyond that has been “host console vanishes” - if you have IPMI or a serial console, that may not be an issue for you.
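A quick way to tell the two states apart is to look at which kernel driver is currently bound to the device. A hedged sketch (helper name is mine; it reads the standard `driver` symlink in sysfs):

```python
# Sketch: report the kernel driver bound to a PCI device. An isolated
# device shows "vfio-pci"; a GPU driving the host console typically
# shows "amdgpu", "i915", or "nouveau"; None means nothing is bound.
import os
from pathlib import Path


def pci_driver(addr, base="/sys/bus/pci/devices"):
    """Return the driver bound to a PCI address like '0000:08:00.0', or None."""
    link = Path(base) / addr / "driver"
    if not link.is_symlink():
        return None  # no driver currently bound
    return os.path.basename(os.readlink(link))
```

Run against the GPU's address before and after boot to confirm whether the middleware grabbed it with vfio-pci or left it to the host.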
Cool, if that works, then I don’t care which approach I take…
Bear in mind that manually adding the devices by PCI ID means you should probably:
- disable VM autostart for machines that have these devices added to give yourself a chance to recover if a breaking change occurs
- be very cautious when changing GPUs or PCI devices in case it reassigns them and decides to pass through a crucial component
In my case, the only GPU present is an iGPU (Ryzen 9 AI HX370 Pro), so there won’t be any hardware changes. The only changes that might pose an issue are OS updates.
Speaking of autostart: is there an option to auto-start with a pre-set delay, e.g. 5 minutes after system boot? In daily operation it wouldn’t really matter if a reboot tacked on 5 minutes of downtime for the VM, but in case of trouble there would be ample time to go and stop/fix/disable things.
Not that I know of, but it might be something we could code into the middleware. File a feature request, and maybe even find the autostart code and submit an example PR?
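As a stopgap until such a feature exists, the delay could be scripted as a post-init task: disable autostart for the VM in the UI, then have a script sleep and start it through the middleware client. A sketch under stated assumptions (I'm assuming `midclt call vm.start <id>` is the middleware call on your release, so verify the method name; the injectable `runner`/`sleep` parameters are mine, added purely for testability):

```python
# Sketch of a delayed VM start, meant to run as a post-init script.
# Assumes `midclt call vm.start <id>` is available (TrueNAS SCALE
# middleware client); check the method name against your release.
import subprocess
import time


def delayed_vm_start(vm_id, delay_seconds=300,
                     runner=subprocess.run, sleep=time.sleep):
    """Wait delay_seconds after boot, then ask the middleware to start the VM."""
    sleep(delay_seconds)  # grace period to intervene if something is broken
    runner(["midclt", "call", "vm.start", str(vm_id)], check=True)
```

Called as e.g. `delayed_vm_start(vm_id=1)` (the VM id here is hypothetical; look yours up first), this gives the five-minute window described above while keeping the start itself in the middleware's hands.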