Home lab migrating off vSphere

I am currently running five small servers in my home lab.

TrueNAS CORE (general file and media server)
TrueNAS CORE (NFS server for vSphere)
3x vSphere

For various reasons (VMware’s new relevance, vCenter’s gigantic system requirements, etc.) I am planning to move off vSphere.

So far I have considered:

Hyper-V: no-go since for some reason it just does not work with a non-Microsoft file server

XCP-ng: remains an option

TrueNAS SCALE: Apparently Bhyve on CORE is no longer supported so I figured why not migrate to TrueNAS-configured KVM.

My requirements for VMs are these:

I have two Windows domain controllers, a Windows-based Jenkins server and an Azure DevOps server. I keep imagining that I would run more systems but I can never find time or need to add a separate database server. I occasionally create Linux and FreeBSD VMs to try out stuff.

Currently those VMs generally run on 2 of the 3 VM hosts, since vSphere actually does put one of them to sleep when it’s not needed. I don’t know if TrueNAS/KVM supports that sort of thing.

VMs are moved between the three hosts occasionally, and sometimes one of the hosts fails and I can start its VMs on one of the other two. Does TrueNAS/KVM allow that?

What else should I look out for? I have no experience at all with KVM, only with vSphere and Hyper-V, and some with Xen and Bhyve.

Tough call. Been there, done that. I am still running VMs on CORE @work without any major issues. Same @home.

Bhyve being not supported simply means that they won’t put any engineering efforts into bug fixing unless it’s a no-brainer with enough evidence and a quick fix or someone from the community steps in and provides patches.

It does not mean bhyve doesn’t work. Far from it. There’s so much work and momentum that has been put into the project by the Foundation, individuals like Michael Dexter and myself, not to mention all the developers who actually produce improvements.

Also bhyve is not in any way “buggy” or unreliable. Lacking in features - yes. Absolutely. Because the project decided that if we are already late to the hypervisor party, let’s not waste time implementing e.g. parallel port emulation, 32 bit guests, or legacy (BIOS) boot.

So it’s 64 bit, EFI boot only. From my point of view the features implemented are rock solid and deliver great performance. It’s a bit unfortunate iX did not work closer with the community, but the current situation is what it is.

So in TrueNAS land you have two solid options, but given that the days of CORE are numbered I would recommend considering SCALE. Although I personally would rather join any project that forks CORE, as I have frequently stated here.

I also considered Proxmox for @work - because we need a TPM capable supported hypervisor in the mid-term to support Windows 11 once Windows 10 is EOL.

I was pretty underwhelmed to be honest. Installed Proxmox on a boot device, then wanted to configure a zpool and some swap - no support in the UI whatsoever. No partitioning/disk management not even in the rudimentary way TN does it - set the swap size globally. Command line it is.

Next: add an SSH key to the root user. No UI support for that, either. mkdir .ssh; yadda yadda yadda ... Seriously?

Next: enable SNMP. Where is the “services” menu? There isn’t any. If you want SNMP … apt-get install ... command line again.

Of course I perfectly can automate all of this with Ansible. That’s my day job. But for crying out loud - what does this Proxmox product do, actually? Not manage any aspect of the host itself, apparently.

So, @kris here it comes: I admit I am spoiled by the UIs of both CORE and SCALE and the plethora of features supported. I simply expect at least as much from any competitive product.

On to the original topic:

No idea about/experience with either.

Not quite true with respect to bhyve, but to someone who isn’t a FreeBSD die-hard like myself - yes, SCALE is the way.

But anyway, neither CORE nor SCALE supports live migration, or even moving shut-down VMs between hosts via the UI. Bad news if you know vSphere or Proxmox, but that’s the state.

So I share what we do @work. Two hypervisor hosts (CORE), lots of VMs. Half of the VMs are active on host “freenas01” (named for historical reasons) and the other half on host “freenas02”.

All virtual disk drives (zvols) are children of either the “vm01” or the “vm02” dataset, respectively.

Then we snapshot those zvols hourly, with the VMs running, and replicate the entire “vm01” dataset and all its children from “freenas01” to “freenas02” and vice versa.
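TrueNAS can schedule this kind of job through its own periodic snapshot and replication tasks. As a rough illustration of what such a job amounts to at the ZFS level, here is a minimal sketch. The pool name `tank`, the snapshot labels, the root SSH login, and the `run` helper are assumptions made for illustration; only the host and dataset names are taken from the description above. By default the script only prints the commands; it executes them when `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch of an hourly snapshot-and-replicate job, NOT the poster's actual setup.
# Assumed for illustration: pool name "tank", snapshot labels, SSH key login as root.

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

# replicate SRC PREV NEW HOST DST
#   SRC  parent dataset holding the zvols (e.g. tank/vm01)
#   PREV label of the last snapshot that already exists on both hosts
#   NEW  label of the snapshot to create now
#   HOST receiving hypervisor
#   DST  dataset on HOST that receives the copy
replicate() {
    src=$1; prev=$2; new=$3; host=$4; dst=$5
    # Recursive snapshot: covers the parent dataset and every child zvol.
    run "zfs snapshot -r ${src}@${new}" || return 1
    # Incremental replication stream (-R -i): only blocks changed since PREV.
    # Caution: receive -F rolls the destination back to its latest snapshot first.
    run "zfs send -R -i ${src}@${prev} ${src}@${new} | ssh root@${host} zfs receive -F ${dst}"
}

# Example: replicate vm01 from freenas01 to freenas02 (dry run prints two commands).
replicate tank/vm01 hourly-0900 hourly-1000 freenas02 tank/vm01
```

The very first transfer has to be a full stream (the same `zfs send -R` without `-i`), since an incremental stream needs a snapshot that both sides already share.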

So if one of the two hosts should fail I can go to the documentation, reconfigure the essential VMs of the failed host pointing them at the replicated zvols, and fire away.

Downtime: an hour or two - good enough for us.

Therefore I run e.g. “DC01” (domain controller, you guessed it) on “freenas01” and “DC02” on “freenas02” and I will always (for applicable values of “always”) have one DC available and a quick recovery procedure at hand.

Without replication you could of course build a central storage system and have the hypervisors read the virtual disks via NFS or iSCSI. Then the storage becomes your SPoF, but that could be addressed by buying a TN Enterprise with dual redundant controllers and hot failover.

Mind that this applies only to storage - even TN Enterprise will to my knowledge not do any clustering/failover for VMs.

That’s why I am quite happy with the “lukewarm standby” setup I developed for us and I would keep it even if I ever switched from CORE to SCALE.

HTH, good luck,
Patrick

XCP-ng has very easy live migration between hosts. In fact, I was very much surprised at how well that worked. The hosts need CPUs from the same vendor and shared storage between them (NAS). But once that is set up, live migration works like a charm.

🤯

Hey, we have all been there. You can only go so far DIY before you want some of your personal time back 🙂 I used to only use FreeBSD ports and compile all my own apps from scratch as well. But in 2024 I struggle to find good reasons to keep doing that kind of thing anymore.

That said, we’ll do our best to incrementally keep improving TrueNAS into the best experience possible, and with our 6 month cadence now there are a lot of opportunities for good and regular quality of life improvements.

Did you get the FreeBSD release engineering memo by Colin Percival? They are planning to tag a release every 6 months, now. Which is good if you are based on stock FreeBSD, because it will make your upgrades way more predictable.

OTOH that might shorten the usefulness of TN 13.3 by quite some time. Depends on if FreeBSD 13.4, 13.5 jails will continue to work.

But that “way into 2026” estimate is probably moot by now.

And you know me by now - before I switch to Linux I am definitely going to put a noticeable effort into a fork. FreeBSD deserves and needs an enterprise class NAS platform. The foundation thinks the same.

About SCALE: if TPM emulation or passthrough is not already available in the UI I’d say this is the most important feature in the VM area. Because the days of Windows 10 are also numbered.

This is not currently a feature of TrueNAS, and if you want VM migration/evacuation, I’d suggest you look at another hypervisor, possibly XCP-ng or Proxmox.

As @pmh said, you can replicate a VM’s backing store (zvol) and then reconfigure a VM instance using that on the new host.

This is a manual process.

Thanks a lot. That was a lot of helpful information for a start.

Don’t get me wrong, I know Bhyve itself is fully functional and I have great confidence in it. That is why I would want CORE rather than SCALE. But in the end the only real goal is getting away from vSphere (since it’s clear they do not want people like me).

I don’t really need live migration, it’s really more of a gimmick. I run a home lab, not an enterprise infrastructure. @work we use vSphere, but I am responsible for Windows Server running on it (I used to be running OpenVMS on Alpha and vSphere before becoming Windows-Server-only).

An easy method to start a VM on any TrueNAS host would be sufficient for my needs, i.e. the VM should be stored on my NFS server and be able to start on any of the (three) SCALE hosts. Is that easily doable via GUI and/or REST?

Yes, that sounds like the best solution. I considered TrueNAS SCALE in order to have the same GUI for all servers, but perhaps a split into file servers (CORE) and vm hosts (XCP-ng) would be the best solution.

Thanks.

I don’t know if that is possible or if using a local zvol for the virtual disk is mandatory. In CORE it is.

Interesting. Can I sync a local zvol onto an NFS share somehow?

Sure. But also CORE does not support working as an NFS client in the UI or middleware. You sure can script an NFS mount as a post init task.
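A post-init script for such a mount could look roughly like the sketch below. The server name, export path, and mount point are hypothetical placeholders, and the `run` helper is only there so the script prints the commands by default instead of executing them (set `DRY_RUN=0` to run them for real).

```sh
#!/bin/sh
# Sketch of a post-init script that mounts an NFS export on a CORE host.
# All three names below are hypothetical placeholders; adjust to your setup.
NFS_SERVER="${NFS_SERVER:-nas.example.lan}"
NFS_EXPORT="${NFS_EXPORT:-/mnt/tank/vm-backup}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/tank/nfs-backup}"

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

mount_backup_share() {
    # Make sure the mount point exists, then mount the export over it.
    run "mkdir -p ${MOUNT_POINT}" || return 1
    run "mount -t nfs ${NFS_SERVER}:${NFS_EXPORT} ${MOUNT_POINT}"
}

mount_backup_share
```

Since the UI and middleware know nothing about this mount, it has to be re-established by the script after every boot.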

zfs send <pool>/path/to/zvol@<snapshot> | gzip -c >/mnt/path/to/nfs/myzvol.img.gz

But with zfs send/receive you could do incremental backups instead of copying the entire zvol every time. As I wrote we take hourly snapshots of all VMs and sync them to the respective other hypervisor host.
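To make the incremental idea concrete, here is a minimal sketch that writes one full stream once and afterwards only the changes between two snapshots, each into its own file on the NFS mount. The zvol name, snapshot labels, target directory, and file naming are hypothetical; the `run` helper just prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Sketch: full vs. incremental ZFS streams written to files on an NFS mount.
# Zvol name, snapshot labels, target directory and file names are hypothetical.

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

# full_backup ZVOL SNAP DIR: snapshot the zvol and dump the complete stream.
full_backup() {
    zvol=$1; snap=$2; dir=$3
    run "zfs snapshot ${zvol}@${snap}" || return 1
    run "zfs send ${zvol}@${snap} | gzip -c > ${dir}/${snap}.full.zfs.gz"
}

# incr_backup ZVOL PREV NEW DIR: dump only the changes between PREV and NEW.
incr_backup() {
    zvol=$1; prev=$2; new=$3; dir=$4
    run "zfs snapshot ${zvol}@${new}" || return 1
    run "zfs send -i ${zvol}@${prev} ${zvol}@${new} | gzip -c > ${dir}/${prev}_${new}.incr.zfs.gz"
}

# First night: full stream. Following nights: increments only.
full_backup tank/vm01/dc01-disk0 night-01 /mnt/tank/nfs-backup
incr_backup tank/vm01/dc01-disk0 night-01 night-02 /mnt/tank/nfs-backup
```

Restoring from such files means feeding the full stream to `zfs receive` first and then every incremental file in order, so a missing increment breaks the chain. Replicating directly into a pool on the other host avoids that bookkeeping.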

But ZFS replication is not a live synchronization. It can only ever be a replicated snapshot, and unless the VM is shut down, the snapshot will necessarily be “from the past”.

Yes, that is clear. It would likely be sufficient for my needs if the VMs synced every night at 4 AM for example.

That sounds a lot like what I would actually need. (In fact on vSphere it wasn’t easy to start a VM on another host when one host was down either!)

So am I right that I can configure (GUI or CLI) two TrueNAS CORE servers so that each runs its own set of VMs and then, after the VMs are shut down, sync the disk images to each other (incrementally) so that when I start the VMs again I could start them on either host?

I think this would be sufficient, perhaps ideal, for my needs. I could worry about up-to-date data syncing by setting up syncing at VM OS level. Does that make sense?

I sync them from one host to the other one hourly while the VMs are still running. After all, when was the last time a power loss of a physical machine, e.g. a Windows DC, rendered the machine “broken” entirely? You will always lose some data in-flight with any snapshot based backup scheme. As long as the system comes back up again, I am fine with a live snapshot.
