I recently set up TrueNAS SCALE and deployed apps, and all was going well. I then tried setting up my first VM under Instances with an Ubuntu image. I think I passed a piece of hardware to the virtual machine that it wasn’t happy with, because it put the box into a boot loop that I couldn’t recover from.
I reinstalled from a USB stick, reimported my pool, and got my apps back up and running.
As soon as I went back to the Instances tab and pointed it at my existing pool, the system immediately went into a boot loop again.
I’m now installing for the third time. I need a way to wipe the Instances data from the pool via the CLI or UI so I can start over cleanly. Simply opening Instances from the UI and selecting my pool seems to put the machine back into a boot loop, so I don’t know a clean way to recover.
Nope, I never resolved it, and no one ever replied to the thread. I’ve decided not to create VMs in TrueNAS at all and to use it only for containers and storage. I set up an external Proxmox box and moved on. I’m frustrated that TrueNAS was so unstable in this regard, but glad I didn’t invest a lot of time into it.
I totally agree; there is no fun in trying to set up VMs in TrueNAS. At least I found a way to wipe the instances, since each time I opened the Instances tab the network connection to my TrueNAS server broke (in your case the machine went into a boot loop, as you described). My solution was to disable the virtualization capabilities in the BIOS settings, so the next time I booted into TrueNAS, Instances couldn’t start any instance at all, but I could go into ‘Manage Volumes’ in the Instances UI and delete all the corrupted stuff.
Maybe this can help you get rid of the instances and free up space too. Good luck!
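If you want to double-check that virtualization really is off after the BIOS change, something like this should work from a TrueNAS shell (a quick sketch using standard Linux commands):

# With VT-x/AMD-V disabled in the BIOS, the CPU virtualization flags
# disappear and the KVM device node should be gone.
grep -cE 'vmx|svm' /proc/cpuinfo   # prints 0 when virtualization is off
ls /dev/kvm                        # expect "No such file or directory"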
I just wanted to follow up here and confirm that the workaround gpolis shared did work. However, just disabling virtualization wasn’t enough; I also had to disable VT-d (Virtualization Technology for Directed I/O) to stop the boot loop. After doing that, the system came back online. At this point, I think I’m going to move to running TrueNAS in Proxmox and pass through the disks for the pool. I’ll run the rest of my VMs from Proxmox and throw out this whole TrueNAS “Instances” stuff, which is in no way ready for prime time.
What did you pass through in TrueNAS to create this issue?
If you were to pass the same device through to a virtual machine running on Proxmox, what would happen then?
Finally, are you aware of the risks of running a virtualised ZFS pool in Proxmox (which in turn is ZFS-aware) and how to mitigate said risks?
I believe I was trying to pass through a Radeon GPU, though so much time has passed that I can’t say for sure. From what I recall of Proxmox passthrough guides, they show how to edit host configs so that the device being passed through isn’t usable on the host at all, preventing conflicts. In hindsight, I’m guessing those steps would be required in TrueNAS as well, but I just didn’t see those instructions.
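To make that concrete, here’s roughly the sort of host-side config those guides have you write, binding the card to the vfio-pci stub driver so the host never loads a real driver for it. The PCI IDs below are placeholders for a Radeon card and its HDMI audio function; the real ones come from lspci -nn:

# /etc/modprobe.d/vfio.conf on the host
# Placeholder vendor:device IDs -- look up your own with: lspci -nn
options vfio-pci ids=1002:67df,1002:aaf0
# Ensure vfio-pci claims the card before the amdgpu driver can grab it
softdep amdgpu pre: vfio-pci

On Proxmox you’d follow that with update-initramfs -u and a reboot.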
As for the complications and gotchas of running TrueNAS inside a VM: yes, I’ve seen those, and I agree there are a lot of tradeoffs either way. In this case I’m passing through a PCIe SATA card, so the disks come through to the TrueNAS VM in their original form and there’s no nested ZFS pool involved. There are still plenty of complicating factors with IOMMU in general, though, so your mileage may vary.
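For anyone following along, passing a whole controller through on Proxmox is a one-liner once you know its PCI address; the address and VM ID below are made up for illustration:

# On the Proxmox host: find the SATA controller's PCI address
lspci -nn | grep -i sata
# Hand the whole controller to the TrueNAS VM (VM ID 100 is hypothetical)
qm set 100 --hostpci0 0000:03:00.0

With the controller passed through, the guest talks to the disks directly, which is what keeps the pool out of the nested-ZFS situation mentioned above.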
The middleware should be preventing the addition of boot-critical devices, like PCI root bridges, to VMs.
The following shell snippet, executed on TrueNAS or Proxmox, will output the list of IOMMU groups:
# List every PCI device along with the IOMMU group it belongs to
for d in $(find /sys/kernel/iommu_groups/ -type l | sort -n -k5 -t/); do
    n=${d#*/iommu_groups/*}; n=${n%%/*}   # extract the group number from the path
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"                 # describe the device at that PCI address
done
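The output looks something like this (made-up example values, not from a real machine):

IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e1f]
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: AMD/ATI Ellesmere [Radeon RX 580] [1002:67df]
IOMMU Group 1 01:00.1 Audio device [0403]: AMD/ATI Ellesmere HDMI Audio [1002:aaf0]

A GPU that sits alone in a group (together with its own audio function) is generally safe to pass through; if a host bridge or storage controller shows up in the same group, passing the GPU through drags those devices along too.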
I’d be curious what the result is from your host Proxmox OS, to see whether the device in question shares a group with a critical system device, and if so, how TrueNAS didn’t catch that.
I assume you were/are running 25.04.1 both on bare metal and as a VM?
Sorry I didn’t respond earlier. I ended up running into issues running TrueNAS inside Proxmox and moved back to bare metal. There were random I/O delays and errors on the disks that I wasn’t able to track down, and I decided it wasn’t worth the trouble to keep digging. I’m holding off on running VMs until the next release drops, since it sounds like things are going back to the old model. I put VMs back on a separate Proxmox box, and am just using TrueNAS for storage and containers for now…