Lost networking after isolating GPU

Hi all, I’m a bit stumped with my TrueNAS Community installation and hoping to get some assistance.

Shorter Version:

I’ve completely lost the NIC in my installation after isolating a secondary GPU via the WebUI Advanced Settings screen.

On reboot, the system now shows an error on the console that “the web interface cannot be accessed”.

Logging into the physical console, the Ethernet adapter doesn’t show up in “ip link”, “ip addr”, or “ifconfig”.

I can, however, see the adapter (a Realtek, normally handled by the r8169 driver) with lspci -vvv. It shows that the kernel driver in use is vfio-pci and the kernel module is r8169.
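
For reference, this is roughly the check I ran (the grep invocation is just my way of filtering; the comment summarizes what I see on my system):

lspci -nnk | grep -iA3 ethernet
# shows "Kernel driver in use: vfio-pci" instead of the expected r8169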

Longer Version:
This is a brand-new build, about a week old. So far, multiple reboots, app installations, ZFS replication tasks, etc. have all gone off without a hitch.

Last night, I set up the pool and network configuration in the Containers (LXC) screen (not sure this is relevant, but it is a recent change, to be sure). I didn’t spin up any containers yet, just set the basic configuration. The system worked normally all morning after that change.

This afternoon, I shut down the system and installed an Intel Arc card to start playing around with GPU passthrough in some of the apps I’m running. I moved the system to my bench, enabled Resizable BAR, changed the boot GPU back to the iGPU, and then moved the server back to its home near the switch.

On the first boot after installing the GPU, I was able to log into the WebUI and isolate the Arc card (under Advanced Settings). It was after that reboot that the NIC was lost.

I initially chased this on my bench as a BIOS setting issue, assuming that TrueNAS was trying to display on the Arc card rather than the iGPU. I’ve tried disabling Resizable BAR, but the behavior is the same. Logging into the physical terminal, it looks like a networking issue.

The system seems completely unaware of the NIC. It’s notably absent from the “ip” and “ifconfig” commands, but it does show up in lspci (per the notes in the shorter version above). Grepping through dmesg, I’m not finding anything obvious, but to be fair, I’m not really sure what I should be looking for.
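
In case it helps, this is the sort of dmesg filtering I tried (the search terms are just my guesses at what might be relevant):

dmesg | grep -iE 'r8169|vfio|eth'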

I’m stumped at this point, so I’m happy to hear any suggestions, ideas to try, etc. I’m baffled as to how isolating a GPU could have been the last step before losing the NIC.

Just a reminder: I have no access to the WebUI, so I’m working within the bounds of the CLI.

System details below.

System Details

  • Ryzen 9900X (includes the iGPU)
  • MSI X870E-P Pro WiFi (WiFi is disabled in BIOS, using the onboard NIC)
  • Drives, all connected directly to the motherboard
    • 2 Toshiba drives for the data pool
    • 2 SATA SSDs for the boot pool
    • 2 NVMe SSDs for the data pool’s special metadata vdev
    • 1 NVMe SSD for L2ARC
  • Sparkle Arc A310 ECO (installed in a chipset-controlled PCIe slot)

Thanks,
Brian

For apps, you don’t need to isolate the GPU. That’s for passthrough to VMs.

Also, the fact that the NIC is now bound to vfio-pci is not good. vfio-pci is for passing a device through to a VM.

Is your install bare metal or on Proxmox?

I can only speculate that your GPU and your NIC are in the same IOMMU group and therefore somehow got bound to vfio-pci together.

EDIT: Be careful with the special metadata vdev. First, you probably don’t need it, and second, if you lose it, you lose your data. Also, L2ARC is probably not of any benefit.
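
If you want to sanity-check how the vdevs are laid out, zpool status lists the special and cache devices alongside the data vdevs (pool name below is just a placeholder):

zpool status tank
# look for the "special" and "cache" sections in the config output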


Thanks for the reply!

The installation is the latest 25.10 version on bare metal. Is there any way to check the IOMMU grouping? I assume that’s going to be a BIOS-level setting vs. something in TrueNAS?

Regarding the vdev layout:
The special metadata vdev is probably a bit gratuitous, but it does provide a measurable, albeit minor, uplift. I agree the L2ARC probably isn’t necessary; it was a crime of opportunity. I can’t just let a spare NVMe drive go to waste.

You can use this script:

for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
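
Each line of output should look roughly like this (the group number and device here are just illustrative):

IOMMU Group 14 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125]

If the NIC and the Arc card report the same group number, that would support the speculation above.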

The groups are predefined; you can’t change them. Sometimes moving a device to a different PCIe slot puts it in another group.

Occasionally the groups can change with a BIOS update, for better or worse (due to intended or unintended changes by the motherboard maker).

Thanks for the guidance thus far. I looked in the BIOS and there was a toggle to “Disable IOMMU”. I went ahead and toggled that on, but I get the same error and the NIC is still missing on boot.

@Farout, thanks for the script; when I ran it, it came back with nothing. ls -la in /sys/kernel/iommu_groups also returns nothing.

Interestingly, with IOMMU toggled to “Disabled” in the BIOS, lspci no longer shows any “kernel driver in use” line.

Is there a way to “un-isolate” the GPU from the CLI?

By pasting this thread into an LLM, I received a possible answer, which I am not posting here.
I am not a fan of using random CLI commands, but if you want to take a risk, go ahead. Just don’t remove the French language pack like this poor dude: How to recover from rm -rf /* ? (lost everything)

However, using the CLI directly usually messes things up. You should go through the middleware, and I have no clue what the API calls for that are.

EDIT:

It’s probably easier to just reinstall TrueNAS.


I pulled the card to see what would happen, and TrueNAS now can’t import the boot_pool. When I put the card back in, it boots, but the network issue remains.

At this point, I think @Farout is right; my time is better spent reinstalling TrueNAS. Thanks for the help!

Alright, we’re back up and running, thanks in huge part to @Farout!!!

Between his guidance and the documentation on the TrueNAS REST API (and some help from ChatGPT), we (99% Farout, to be honest) came up with the method below.

Step 1: Boot with IOMMU enabled in BIOS

You need it enabled so TrueNAS can cleanly undo the isolation.

Re-enable:

  • IOMMU → Enabled
  • Reboot into TrueNAS
  • Log into the console

Step 2: Identify the NIC PCI address

Run:

lspci | grep -i ethernet

You’ll see something like:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE

Note the PCI ID (03:00.0 in this example).


Step 3: Check what is bound to vfio

lspci -nnk -s 03:00.0

If you see:

Kernel driver in use: vfio-pci

that confirms the issue.


Step 4: Remove PCI isolation from the TrueNAS config

TrueNAS stores isolation in middleware config, not just kernel state.

Run:

midclt call system.advanced.config

Look for something like:

"isolated_gpu_pci_ids": [...]

Now clear it:

midclt call system.advanced.update_gpu_pci_ids '[]'

If that succeeds, you’ll get no output (normal).

Confirm that the config is updated by re-running:

midclt call system.advanced.config

The “isolated_gpu_pci_ids” entry should now be an empty list, with no PCI IDs between the brackets.
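
If the full JSON dump is hard to read on the console, a plain grep (nothing TrueNAS-specific, just text matching) pulls out the relevant field:

midclt call system.advanced.config | grep -o '"isolated_gpu_pci_ids":[^]]*]'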


Step 5: Reboot

reboot

On next boot:

  • NIC should bind to r8169
  • ip link should show it
  • WebUI will come back
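
To double-check once the system is back up (same example PCI address as in Step 2):

ip link
lspci -nnk -s 03:00.0
# should once again report "Kernel driver in use: r8169"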