I’m new to this forum, so apologies upfront if the topic has already been addressed.
Currently running Dragonfish-24.04.2 (upgrading soon to ElectricEel and beyond). I have modest 9-year-old hardware that has served me well thanks to iXsystems’ development prowess and vision. Kudos!
I’m curious to learn of any experiences with using an OcuLink-connected eGPU for pass-through to apps/VMs. My use case would be to host a local LLM.
I realize I’ll likely need to upgrade the hardware. But I would be remiss if I went down that rabbit hole without asking for insight from anyone who’s already jumped into the deep end.
I’ve never tried it. I just threw a low-profile Arc GPU in a slot for non-LLM reasons.
I’m not sure if you’re asking whether it’s possible or whether it’ll work well. It should be possible. Whether it works well depends on the card and the OcuLink bandwidth, but I’d expect a noticeable impact compared to a full-bandwidth PCIe slot.
My real interest is in understanding whether the GPU will be picked up in TrueNAS over an M.2-to-OcuLink interface. I realize there will be a performance hit, but I would imagine it will be negligible for an LLM application.
M.2 is usually PCIe, and OcuLink is PCIe, so the card should be seen more or less the same as if it were connected to a PCIe slot. Using an M.2 slot (typically x4 lanes) could further reduce the bandwidth to the GPU compared to a full x16 slot.
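If it helps, a quick sanity check from a TrueNAS shell is just confirming the card enumerates on the PCIe bus and that the driver sees it. Here’s a rough sketch that wraps the usual commands in Python; it assumes an NVIDIA card with the driver loaded, so adjust the second check for an Arc or Radeon card.

```python
# Rough sketch: confirm the eGPU enumerates over the M.2/OcuLink link.
# Assumes an NVIDIA card; swap the second check for AMD/Intel tooling.
import subprocess

def run(cmd):
    """Run a command and return its stdout, or an empty string on failure."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (subprocess.CalledProcessError, FileNotFoundError):
        return ""

# 1) Is the card on the PCIe bus at all? (OcuLink is just cabled PCIe.)
gpus = [line for line in run(["lspci"]).splitlines() if "VGA" in line or "3D" in line]
print("\n".join(gpus) or "No GPU found on the PCIe bus")

# 2) Does the driver see it? (NVIDIA-specific; assumes nvidia-smi is installed.)
print(run(["nvidia-smi", "-L"]) or "nvidia-smi not available / no NVIDIA GPU detected")
```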
I don’t do ML training, but from what I understand, the more VRAM and training data you have, the better. The limited bandwidth could hurt ML workloads more than gaming.
I currently have a similar setup on a mini PC. It’s connected to an eGPU via an M.2-to-OcuLink adapter, using Windows’ WSL, Ollama, OpenWebUI, and a 12 GB NVIDIA RTX 3060-series card. I’ve been able to get pretty good performance on 7B to 14B models. The goal here is to have a self-hosted LLM service on the home network that I can use in conjunction with VS Code and maybe Home Assistant (and its upcoming voice assistant).
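For anyone curious, this is roughly how I poke at it from another box on the LAN to confirm the GPU is pulling its weight. It’s just a sketch against Ollama’s HTTP API; the LAN address and model name are placeholders for my setup, and Ollama has to be configured to listen on the network interface (it binds to localhost by default).

```python
# Sketch: hit the Ollama HTTP API from another machine on the home network
# and estimate tokens/sec. Host address and model name are placeholders.
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434/api/generate"  # placeholder LAN address
payload = {
    "model": "llama3.1:8b",  # any 7B-14B model that fits in 12 GB VRAM
    "prompt": "Summarize what OcuLink is in two sentences.",
    "stream": False,
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["response"])
# eval_count / eval_duration (nanoseconds) give a rough tokens-per-second figure.
tps = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"~{tps:.1f} tokens/sec")
```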
In any event, it’s something I can play with to keep me out of trouble.