Virtual TrueNAS, Proxmox, and Preventing Double Imports with "zpool multihost"

The Challenge with a Virtual TrueNAS

Virtualizing TrueNAS is a popular way to get an “all-in-one” solution with compute and storage in a single box. We even use TrueNAS VMs ourselves as part of our testing process, and there’s a blog about it.

But as outlined in that blog, certain configurations of TrueNAS-as-a-VM - those using PCIe storage controller passthrough - tend to be stable and reliable, while others - especially those using “raw disks” on ZFS-aware hypervisors such as the popular Proxmox Virtual Environment (Proxmox VE/PVE) - are not.

Let’s break down a little bit about why those problematic configurations are - well, problematic.

So What’s the Problem?

ZFS is awesome - so awesome that many OSes and distributions use it and speak it natively, including Proxmox. So there are a number of helpful services running on Proxmox that will scan for, detect, and occasionally import ZFS pools.
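
You can see the relevant units on a stock PVE install (exact names vary slightly between ZFS releases):

# list the systemd units responsible for importing pools at boot
systemctl list-unit-files 'zfs-import*'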

ZFS is a great filesystem, but one thing it doesn’t natively do is work in a clustered setup where multiple hosts import the same pool.

So when that second machine shows up - the Proxmox host importing the same pool as its guest VM - the two start overwriting each other’s metadata, including the crucial uberblocks that serve as the “point of entry” into the top of the ZFS filesystem.

You might get away with it for a few seconds. Heck, even a few minutes on a slow pool. But eventually you rotate through all of your uberblocks, each one gets overwritten with a corrupted header - and it’s game over.
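
If you want to see those uberblocks for yourself, zdb can dump them straight from a member disk’s label - a sketch, with a placeholder device path, best run against a pool that isn’t imported:

# dump the vdev labels plus their uberblock arrays from one member partition
# (/dev/sdX1 is a placeholder for one of the pool's data partitions)
zdb -lu /dev/sdX1

Here’s what that failure mode looks like from the TrueNAS side once the uberblocks are trashed: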

root@truenas[/home/truenas_admin]# zpool import
  pool: vpool
    id: 2239295033687861730
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        vpool                                   ONLINE
          0e375eb9-7d87-43c2-931e-fb29e81c987c  ONLINE

root@truenas[/home/truenas_admin]# zpool import vpool
cannot import 'vpool': I/O error
        Destroy and re-create the pool from
        a backup source.

Once this happens, you’re generally in big trouble. Sometimes you can get lucky and recover using zpool import parameters that discard recent transactions (losing recent data) - but other times, even that isn’t possible.
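
For reference, the “discard recent transactions” escape hatch is the rewind-style import - a sketch, and very much a last resort:

# dry run: check whether rewinding to an earlier transaction group would make
# the pool importable again, without actually changing anything on disk
zpool import -F -n vpool

# if the dry run succeeds, do the rewind for real (the most recent writes are lost)
zpool import -F vpool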

Preventing Double Import

One way to prevent some of the automatic pool imports on Proxmox is to disable the services that scan for and import ZFS storage pools, with the following commands:

systemctl disable --now zfs-import-scan.service
systemctl disable --now zfs-import-cache.service

Reboot the system, and you should no longer have those services scanning for and importing ZFS pools. However, this won’t prevent a manual import - and obviously, if you’re using a local ZFS pool on Proxmox, you can’t use this method.
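
A quick sanity check after the reboot - both units should report “disabled”:

# verify the import units stayed off after the reboot
systemctl is-enabled zfs-import-scan.service zfs-import-cache.service

That still leaves the manual-import and local-pool gaps, though.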

So what to do?

zpool set multihost=on vpool

I’d like to introduce the use of a pool property that will hopefully mitigate some of these events in the future: multihost.

This setting (also known as multi-modifier protection, or MMP) causes ZFS to periodically make “heartbeat” writes to the pool, marking it as in use. There are drawbacks - potentially very long pool import times while ZFS checks the member disks for those heartbeat writes, and scenarios where a pool that should import on reboot doesn’t.

However, it prevents a number of scenarios where you don’t want that pool to import.
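
One practical note: MMP identifies the current owner by hostid (you’ll see it in the error output below), so each machine that can see the pool needs its own stable hostid. A minimal check, assuming an OpenZFS userland that ships zgenhostid:

# create /etc/hostid with a random value (it refuses to overwrite an existing one),
# then show the hostid currently in effect
zgenhostid
hostid

# confirm the property took effect on the pool
zpool get multihost vpool

With multihost enabled inside the TrueNAS VM, here’s what the Proxmox host sees when it tries to import the same pool: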

root@pve:~# zpool import
  pool: vpool
    id: 386268640321606023
 state: UNAVAIL
status: The pool is currently imported by another system.
action: The pool must be exported from truenas (hostid=3c6fd28e)
        before it can be safely imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        vpool                                   UNAVAIL  currently in use
          389a58b4-7260-4811-a156-58169089473d  ONLINE

Note the explicit “currently imported” error and “currently in use” status on the pool itself.

Even if you attempt to use the -f force flag on the import, ZFS will refuse.

root@pve:~# zpool import vpool -f
cannot import 'vpool': pool is imported on host 'truenas' (hostid=3c6fd28e).
Export the pool on the other system, then run 'zpool import'.

Cool, so I don’t need this second HBA?

This pool setting is not a substitute for PCIe passthrough and storage controller isolation - it’s just putting a restriction on the zpool import command. There are still plenty of ways to obliterate the data via an errant host command or accidentally passing the same disk to a second VM.
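
For example, the “raw disk” mapping that gets people into trouble is a single command on the PVE side, and nothing stops you from pointing a second VM at a device that’s already in use (the VMID and disk ID below are placeholders):

# attach a physical disk to VM 101 by its stable by-id path
# (placeholder VMID and serial - this is exactly how a disk gets double-mapped by accident)
qm set 101 -scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL

And there’s no SCSI-level protection to fall back on, either - the virtual disk doesn’t even support persistent reservations: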

root@truenas[/home/truenas_admin]# sg_persist --read-reservation /dev/sdb 
  QEMU      QEMU HARDDISK     2.5+
  Peripheral device type: disk
PR in (Read reservation): command not supported
sg_persist failed: Illegal request, Invalid opcode

So if you map this drive to another VM, do some random testing, go “huh, why won’t it let me format it?” and dd over top of it, nothing will stop you.

root@pve:~# dd if=/dev/zero of=/dev/sdb bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.565798 s, 1.9 GB/s

“Huh, that’s weird. Where did my TrueNAS shares go? Why is my system hung?”

root@truenas[/home/truenas_admin]# zpool status -v vpool
  pool: vpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 60K in 00:00:05 with 4 errors on Wed Oct 22 08:14:35 2025
config:

        NAME        STATE     READ WRITE CKSUM
        vpool       ONLINE       0     0     0
          sda1      ONLINE       0     0    29

errors: Permanent errors have been detected in the following files:

        vpool:<0x1>
        vpool:<0x23>

Ruh roh.

So In Conclusion?

You should still endeavor to use a dedicated storage controller, isolated from the host with PCIe passthrough.

This will prevent the possibility of disk access outside the VM, as long as the VM is powered on and claiming the disk controller. This configuration does require careful planning and compatible hardware.
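
For reference, once IOMMU is enabled and the HBA sits in its own IOMMU group, handing the whole controller to the VM is a one-liner on the PVE side (the PCI address and VMID are placeholders):

# find the storage controller's PCI address (adjust the grep pattern to your hardware)
lspci -nn | grep -iE 'sas|sata|raid'

# pass the entire controller through to the TrueNAS VM (VMID 100 is a placeholder)
qm set 100 -hostpci0 0000:03:00.0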

But if you cannot pull this off, and you’ve weighed the risks of a virtual-TrueNAS setup - then enabling the multihost setting on your pool may be a valuable safety measure. It’s not a panacea, it’s not guaranteed to prevent all failure modes, and it most likely will result in significant delays during boot - both on the hypervisor, and the TrueNAS VM - but if it keeps your pool from destroying itself, that’s probably worth it.
