TL;DR:
I want to be able to cluster TrueNAS together, just like Proxmox, to get HA apps/VMs.
PROBLEM:
My homelab is not just a homelab; it also hosts live, permanent services I offer to my family.
That includes a 3-2-1 backup solution, plus Jellyfin and Immich for them, along with a bunch of other services for myself, like Heimdall and Pi-hole.
I’d like to make these sorts of things highly available, but TrueNAS Scale doesn’t offer that flexibility.
Instead I use 2 Proxmox nodes with a Qdevice to handle HA workloads, and here lies the problem.
I’m forced to use another system when TrueNAS already contains a robust solution for running my services, except missing a small feature.
Then after running the services on Proxmox, my big data pool is on TrueNAS so now I have to integrate with another system.
Another thing is that Proxmox isn’t really that stable. It’s hard to work with and easy to brick.
SOLUTION:
If TrueNAS Scale can get 2 features, every homelab will be able to do away with Proxmox altogether:
Storage system for live sync of drives, like CEPH, which is used in Proxmox. You already have a system sort of like this, but it’s locked behind very special conditions, and then you still can’t cluster apps/VMs.
HA-system between TrueNAS servers. You already have a system for making TrueNAS Scale servers work together, so making/tweaking this shouldn’t be hard.
PITCH:
The community already agrees that TrueNAS Scale is S-tier.
If these features are added for the general community, the possibilities of TrueNAS are gonna explode, and most homelabs will likely do away with Proxmox entirely and just use extra servers running smaller TrueNAS installs, running apps/VMs.
That was on the roadmap for SCALE–then Gluster died, and SCALE became, well, not-SCALE.
That’s, um, not my experience.
Neither is that.
Nor that.
Nor that, for that matter. TrueNAS is great as a NAS. It’s OK as an apps platform. It frankly sucks as a virtualization platform. iX promises to fix the latter; I’ll believe it when I see it.
it already has incus cluster and swarm cluster baked in, unclear how well that works if done at command line and not through the truenas orchestrator
one issue i have with the request is that to make this feasible for VMs truenas would have to expose much more of the VM options and hardware pass through like proxmox does, truenas is not a great hypervisor at this time, it is too basic for many common scenarios
i still use docker in VMs because well to be frank LXCs on proxmox are a PITA and too custom
that said i have never found proxmox / VMs / ceph unstable in general use - only when seriously tarting about with networking - and this is something truenas as an opinionated OS won’t let folks do, so it will protect folks like you from yourself
I am interested to see where truenas (there is no ix systems any more) takes the options they have on the table on incus vs swarm, incus vs classic VMs
my prediction, incus VMs will come back when they have figured out how to seamlessly migrate and give the same UI options / configuration flexibility … how long that might be, well i suspect not 25.10 based on what i know about feature development…
I guess I should have clarified how I bricked my proxmox cluster:
THE SETUP:
2x identical MS-01 nodes, mirrored 128GB M.2 SSD drives for boot and a 1TB M.2 NVME for storage.
1x random ASUS Nuc acting as a 3rd node, but not permitted to run any load. It’s basically only a Qdevice, but still has the same storage setup.
1TB NVMEs clustered with CEPH.
Workloads set to autobalance between the 2x MS-01s in HA-mode.
THE BREAKAGE:
I’m not entirely sure of all the details, but what I do know is that a new VM that I had just created and enabled to run in HA-mode was backing up through CEPH. Then one of the MS-01s had a network connection failure (according to a message from my Uptime-Kuma instance running on my TrueNAS server). Something got corrupted and the VM got stuck in limbo somewhere in this process. The node where the VM was running dropped out of the cluster entirely. Right after that, connection to the ASUS Nuc was lost, leaving no quorum. The last node couldn’t start/stop any services as it had no quorum. Something must have happened to the cluster configuration, because when the 2 downed servers were running again, they were still showing as offline in the cluster. A running cluster that has HA enabled can’t take in new nodes without quorum. A restore of the 2 downed nodes from a snapshot did nothing. I was forced to start over because of an at least semi-random glitch. All servers work fine. No apparent corruption anywhere to be found.
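For anyone following along, the quorum part of this can be sketched roughly as below. This is a simplified strict-majority model with one vote per node, which is an assumption for illustration and not how corosync is actually implemented; `has_quorum` and `TOTAL_VOTES` are made-up names.

```python
def has_quorum(votes_online: int, votes_total: int) -> bool:
    """Simplified model: quorum means a strict majority of all expected votes."""
    return votes_online > votes_total // 2


# Assumed layout from the setup above: 2x MS-01 plus the Nuc, one vote each.
TOTAL_VOTES = 3

for online in (3, 2, 1):
    state = "quorate" if has_quorum(online, TOTAL_VOTES) else "NOT quorate"
    print(f"{online}/{TOTAL_VOTES} votes online: {state}")
```

With three votes the cluster survives losing any one node, but once a second one drops, the survivor holds 1 of 3 votes and has to refuse HA actions, which matches the last node not being able to start/stop anything.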
THE FINAL STRAW:
Proxmox is unintuitive, archaic and hard to use compared to the apps system in TrueNAS.
It took me, without a manual, about 30 min to know exactly what to do with a new install of TrueNAS Scale. It took me hours to figure out how to do simple stuff in Proxmox like snapshots, priority, clustering and backup destinations, and I had to dig around in the guts of the nodes because you can’t do everything from the UI. For all of my odd experiments, I’ve never had to open the console in TrueNAS even once. Don’t get me wrong, I’m no slouch on a command line, but if you have a UI, then everything you do on a regular basis should be doable from the UI, not by running commands to install the library and network for a Qdevice, for example. TrueNAS is better in every regard here, except it can’t do HA apps/VMs, yet at least. Proxmox has its uses for easy VMs and such, but TrueNAS is cleaner in every aspect.
But whatever. You’re asking for something that was planned to be part of the product, but has long since been removed from its roadmap due to the death of Gluster, on which it depended. Good luck.
I’m comparing a system that can run services to another system that can run services. Each has their own quirks and special things it does better. You’re being narrow minded my guy, but thanks for the luck.
While it would be nice, the whole containers/VM system isn’t production ready until 25.10, and only 3 apps are supported for Enterprise use, which is where they make the money. Basically the demand would have to come from that side if it’s going to be realized anytime soon, but hey, the more options the better. The issue is always going to be whether it’s enough for everyone.
Personally I wouldn’t mind paying for Enterprise if I could then get HA apps, but that’s not possible because the feature is missing. The only thing I can cluster is storage. The whole system that monitors apps/VMs is missing. You actually don’t need HA storage for it either. You can just create snapshots of the apps/VMs, which is industry recommended anyway, and then the nodes only need to keep track of a configuration file and up/down state between themselves. If an app/VM goes down, then start the latest snapshot on another server. What I’m after is the curated apps library in TrueNAS, which is very well done. I can just run a docker host on all TrueNAS nodes and cluster those if I want HA docker, but I want the curated docker apps, which are already easily configured in TrueNAS.
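A minimal sketch of what I mean, in Python. Everything here (`App`, `fail_over`, the node and snapshot names) is hypothetical and not an existing TrueNAS API, and it assumes the snapshots have already been replicated to the other nodes and are listed oldest first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class App:
    name: str
    node: str  # node currently running the app
    snapshots: list[str] = field(default_factory=list)  # oldest first


def fail_over(app: App, node_up: dict[str, bool]) -> Optional[tuple[str, str]]:
    """Return (target_node, snapshot) if the app was moved, else None.

    None means either the current node is still up, or there is no
    healthy node or no snapshot to restart from.
    """
    if node_up.get(app.node, False):
        return None  # current node is healthy, nothing to do
    candidates = sorted(n for n, up in node_up.items() if up and n != app.node)
    if not candidates or not app.snapshots:
        return None  # nowhere to go, or nothing to start from
    target, snapshot = candidates[0], app.snapshots[-1]
    app.node = target  # record the new placement in the shared config
    return target, snapshot


app = App("jellyfin", "nas1", ["snap-01", "snap-02"])
print(fail_over(app, {"nas1": False, "nas2": True}))
```

The only shared state is the app-to-node mapping and the up/down table, which is why I don’t think distributed storage is strictly required. Anything written after the latest snapshot would be lost on failover, though.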
to be fair the use case is not to be a file server, it is supposed to be for clustering VMs and LXCs and that’s it
in terms of backups and snapshots i am a little confused how you were confused, it’s a few simple things
as for your ceph stuff, ceph needs some care, the very fact you said you were poking around in the guts tells me why ceph likely became unstable, but even if ceph fails utterly the host OS will work, just no VMs will start, i suspect you had ceph poorly configured, it sounds like you had network configuration issues (on the nodes) if they couldn’t reach cluster quorum
as there is no clustering or ceph on truenas scale community edition you are comparing apples and bricks, what you want was a NAS (network attached STORAGE) what you implemented with proxmox was a hypervisor platform not a NAS.
I agree, i would happily pay for a ‘get some enterprise features but use community support’ edition - i am sure we will have a different thought on what that subset of ent features would be lol
I suggest this as i suspect they would want to keep some enterprise features that mean other businesses gravitate to buying truenas hardware… unless truenas (nee ix-systems) is now more interested in selling enterprise standalone and charging businesses for support… (it’s what i would do if i was in their shoes)
I used CEPH as an example. It’s one protocol out of many; it was just the one I thought of first, probably because of my issue. A distributed disk network isn’t even strictly necessary to make TrueNAS work with HA, but it would make it the most useful. The Proxmox nodes were configured according to industry standards: a 3/2 CEPH pool with roughly 1-year-old 1TB NVMEs, tried and tested. I’m confident that CEPH wasn’t what failed, at least not first.
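To spell out what 3/2 means: size 3 (three copies of every object) and min_size 2 (the pool keeps serving I/O as long as two copies are reachable). A rough sketch of that rule, with made-up function names and the assumption of one 1TB OSD per node:

```python
def pool_accepts_io(replicas_available: int, min_size: int = 2) -> bool:
    """A replicated pool keeps serving I/O while at least min_size copies are up."""
    return replicas_available >= min_size


def usable_capacity_tb(raw_tb_per_node: float, nodes: int, size: int = 3) -> float:
    """Every object is stored `size` times, so usable space is raw / size."""
    return raw_tb_per_node * nodes / size


print(pool_accepts_io(2))          # one node down: still serving I/O
print(pool_accepts_io(1))          # two nodes down: I/O blocks
print(usable_capacity_tb(1.0, 3))  # 3x 1TB raw gives about 1TB usable
```

So losing one node shouldn’t have stopped the pool on its own. It’s the second loss that blocks I/O, on top of the quorum problem.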
Actually, I’m more interested in why you care so much. Freak stuff happens all the time. I’ve just made a choice to not trust a piece of software because of my own experiences and testing, and asked for another solution in another software that I’ve found very hard to break and want to keep using.
Why are you in such a hurry to call me a fool at every turn instead of giving constructive input on how this feature could work? You’re clearly experienced and have your own opinions.
i am sorry you think i am calling you a fool, nothing could be further from the case
i already said your request was a good one and at the moment i seem to be the only person who upvoted it
your experiences with proxmox are irrelevant to the request
if you start making broad statements about unreliability of things, expect to get questioned on it, proxmox is incredibly stable and it is interesting to hear when people had an issue and for the many folks who have mixed environments
your text implied to me that your unreliability came from ceph, that’s why i was asking
why might i care? several hundred hours invested in creating this gist on my proxmox cluster, including getting the linux kernel patched so that thunderbolt networking would be stable across ALL distributions of Linux.