(This is gonna be a long one: pretty much just me ranting about various issues I’ve had with TrueNAS over the past 5-7 years, and how using Arch with OpenZFS and Docker is way easier in the end, even though administering ZFS via the CLI with more than a handful of disks is a headache. I wasn’t sure whether to post this here, since it’s directly related to TrueNAS, or in General Discussion, since it’s a rant/opinion post.)
I’ve been a long-time follower/user of TrueNAS, going back to the FreeNAS days. I first started using it during the early 9.x releases, when I discovered ZFS (after years of using software RAID). No matter how much time passes, TrueNAS looks good in theory and each major or minor update draws me back in, but in practice it’s just headache after headache.
I’m a long-time Linux user (started in '05, and I’m currently a Linux System Engineer). I knew about the BSDs but had never really used them, so after running Arch on my server for a few years I figured I would give FreeNAS a shot. I loved the ease of use that came with it being UI-driven, but there were a lot of things I didn’t like. The UI was fugly. Apps were sorely lacking, and even though you could set up a jail and install packages from the ports collection, they were usually many months out of date. If you did manage to find something up to date, you were occasionally left in dependency hell, which meant compiling everything from source for hours on end (I wanted a BSD jail, not Gentoo! hahaha).
When v10 was being beta tested I ran it a lot to help find bugs (and if you used it, you know there were A LOT), but the instability was frustrating. Jails were out of the question due to previous experiences/nightmares. Bhyve eventually became an option, but it was severely lacking in performance and features compared to KVM, and the Arch VM I was running on it would regularly freeze after about 2 days. The devs were never able to figure it out, and I had no issues anywhere else, even running a VM on an Arch host, so it was TrueNAS/FreeBSD specific. I finally got frustrated and went back to Arch w/OpenZFS and everything was great…except for managing 20+ disks via the CLI.
After v10 was scrapped and it was announced that v11 would be taking its place, along with a(nother) UI refresh, I followed that closely, but had largely the same issues and once again went back to Arch. Once SCALE was announced I thought my prayers had finally been answered. I followed the development closely and beta tested it a lot…but still ran into many sources of frustration. KVM was usable but clunky, since they decided to use pure qemu instead of libvirt, so you were stuck using the webUI. Containers were finally supported…but only via Kubernetes, which is definitely overkill for a lot of home users, and once you finally got used to the K8S way of doing things…stuff would constantly break.
I remember the joy of setting up 20 containers over the course of 2-3 hours (the container UI really sucked back then), only to discover that an update (to either the container image itself or TrueNAS) had completely borked the containers (or possibly K8S itself). I then had to reconfigure them by hand, since there was no easy way to simply redeploy them en masse. They weren’t always all FUBAR, but usually a few were, just enough to be a huge annoyance. This happened pretty frequently, at least 2-3x a month. After multiple months of this, with seemingly no end in sight, I jumped ship back to Arch w/OpenZFS. By that point I was heavily invested in docker and docker-compose, since I was jumping back and forth between TrueNAS and Arch pretty often and setting everything up natively in Arch was becoming just as much of a nightmare as TrueNAS was. I think I tried out SCALE again a few months later, but it was still a pain.
I continued to run Arch on my server for about a year or two, with no huge issues. I found a closed-source, paid ($60/license/year) ZFS module for cockpit that a few people are working on, and used that for a while, which made administration easier…but their development progress is about as slow as molasses in winter. It took about 9 months for them to add support for creating pools in the UI; before that it was mostly for information. It was only in February or March of this year that it actually became useful for creating pools. I saw about a week or two ago that SCALE v24 was released with a lot of huge fixes and additions, so I loaded it up in a VM, messed with it for a while and was like “yeah, this looks good, let’s give it a try”, fully intending to keep Arch around because I would most likely be jumping ship again…
I just spent the past 4 days attempting to set everything up…and it’s back to Arch I go, because I have made zero progress. Granted, about 2 of those days were wasted on attempting to get Cosmos Cloud fully working in a VM as a quick backup for when K8S inevitably screws up (it’s a NAS management webUI that runs entirely out of a docker container; I was also planning on setting it up on a Pi for a friend, so I wanted to get experience with it), but I digress…
After scaling my VM from 1 core and 2 GiB of RAM to 40 cores and 32 GiB (I have a Threadripper 2970WX [24 cores, 48 threads] and 128 GB of DDR4 ECC, so no lack of resources), CPU usage went from occasionally sitting around 30-40% to 100% on all cores all the time, and there must’ve been a memory leak somewhere because RAM usage would quickly climb to full. At first I thought it was rclone or docker consuming all the cycles and RAM, but even with those disabled and after a reboot, RAM was still maxed out; CPU usage would stay low until I started docker, and then it would be pegged at 100%. Disk access was also extremely slow (backlogged), even though I had 3 ZVOLs on a Samsung 970 Evo Plus NVMe drive. I had the storage driver set to VIRTIO, since that should have performed better than using NFS for temp data downloaded from Usenet. This happened in both a Debian 12 VM and an Arch VM.
Not wanting to waste any more time on something that was only going to be a fallback, and that would perform worse than containers with direct access to the datasets, I decided to give K8S another try, since a lot of the issues had supposedly been fixed. I will admit that while the UI has improved drastically, it’s still extremely cumbersome to get an app exposed to the internet via a reverse proxy. In fact, I never even got that far! I spent about the past 3 hours trying to get ONE app to work, using the TrueCharts guide, and I can’t even get the initial SSL cert generator working! I changed the UI ports, which is a general annoyance in and of itself: it shouldn’t be necessary just to run TrueNAS and a webserver/RP on the standard ports. I don’t have to do this in Arch if I want to run Cockpit and Caddy from the same NIC/IP, so why do I have to do it in TrueNAS? (Also, why can’t I have two NICs in the same subnet? I can in Arch with no issues…) Then I installed Traefik and set it up as documented on TrueCharts, and attempted to install Clusterissuer…but it complains about /etc/rancher/k3s/k3s.yaml being group readable (I didn’t change anything) and about namespaces not existing…even though I’m following what the guide says!
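For anyone who wants to see what the group-readable complaint is actually checking: as far as I know it is just a file-mode test on the kubeconfig, and tools such as Helm print it as a warning rather than a hard failure (that part is an assumption on my side, so the namespace errors may be the real blocker). A minimal Python sketch of the same check; `is_group_or_world_readable` is a made-up helper name, and the demo runs against a temp file rather than the real k3s.yaml:

```python
import os
import stat
import tempfile


def is_group_or_world_readable(path: str) -> bool:
    """Return True if the group or other read bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


# Demo on a throwaway file; on a real box you would pass
# /etc/rancher/k3s/k3s.yaml instead (path taken from the error above).
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)  # owner rw, group r
    print(is_group_or_world_readable(tmp.name))
    os.chmod(tmp.name, 0o600)  # owner rw only
    print(is_group_or_world_readable(tmp.name))
```

Whether it is safe to change the mode of that file on a TrueNAS-managed install is something I can’t vouch for, so this only inspects and never modifies.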
Also, the one pool which I didn’t create in TrueNAS had minor issues upon being imported: it mounted itself at /mnt/mnt/storage instead of /mnt/storage like I had it in Arch. And once I created my 4th pool, the UI apparently forgot that the storage pool (my first pool) existed. Neither the widget on the homepage nor the Storage section showed it at all, even though zpool status showed that it was imported and mounted. The Disks list did show that it existed and that the drives were part of that pool, though.
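One likely explanation for the doubled path, assuming TrueNAS imports pools with an alternate root (altroot) of /mnt: ZFS prepends the altroot to each dataset’s mountpoint property, and the pool still carried mountpoint=/mnt/storage from Arch. A small Python sketch of that path arithmetic; `effective_mountpoint` is a hypothetical helper for illustration, not part of any ZFS tooling:

```python
import posixpath


def effective_mountpoint(altroot: str, mountpoint: str) -> str:
    """Combine an altroot with a mountpoint property, altroot first."""
    if not altroot:
        return mountpoint
    return posixpath.normpath(altroot.rstrip("/") + "/" + mountpoint.lstrip("/"))


# Mountpoint as it was set in Arch, imported under an assumed altroot of /mnt
print(effective_mountpoint("/mnt", "/mnt/storage"))
# Mountpoint that would land where I actually wanted it
print(effective_mountpoint("/mnt", "/storage"))
```

If that assumption holds, changing the pool’s mountpoint property to /storage would put it at /mnt/storage on TrueNAS, at the cost of it mounting at /storage back on Arch.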
I think it’s time to finally accept that TrueNAS will never be in the state that I want it to be in, and I’ll just have to pay and wait for Poolsman (the cockpit ZFS module) to reach parity with TrueNAS’ features (it’s about 75% of the way there). I’m a sucker for a nice UI, which is why I always keep coming back to TrueNAS, but everything is extremely cumbersome to do when it shouldn’t be. For the sake of a nice UI that I’ll use maybe once a month, I bash my head against the wall for hours a day, and the time it takes just isn’t worth it. I’ve heard many people say that TrueNAS works great as purely a storage OS (so, CORE or SCALE without the apps set up), and I have to agree, but I don’t really have the space and money necessary to run one server for storage and one for everything else.
Once Arch is installed, configured, and has my pools imported, I can set up 20+ containers and secure them with SSL certs in about 5 minutes or less. If I saved the previous config data there’s nothing else I need to do, except maybe change a few mount points. In this case, though, I’ll have to reconfigure everything, because apparently the Disks section didn’t let me know that one of the NVMe drives I was about to add to a pool contained an EXT4 filesystem (I have 8 NVMe drives, 7 of which are usually used for ZFS).
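A check like the following, run before touching any pool, would have caught the EXT4 drive. It is a rough Python sketch that reads the JSON produced by `lsblk --json -o NAME,FSTYPE` and lists every device or partition that already carries a filesystem signature. The function name and the sample data are made up for illustration (the device names are not my real layout):

```python
import json


def devices_with_filesystems(lsblk_json: str) -> dict[str, str]:
    """Map device/partition name -> fstype for entries that have a signature."""
    found: dict[str, str] = {}

    def walk(nodes):
        for node in nodes:
            if node.get("fstype"):
                found[node["name"]] = node["fstype"]
            # Partitions appear under "children" when present
            walk(node.get("children") or [])

    walk(json.loads(lsblk_json).get("blockdevices", []))
    return found


# Hypothetical sample in the shape lsblk emits; real input would come from
# running: lsblk --json -o NAME,FSTYPE
SAMPLE = """{"blockdevices": [
  {"name": "nvme0n1", "fstype": null},
  {"name": "nvme1n1", "fstype": null,
   "children": [{"name": "nvme1n1p1", "fstype": "ext4"}]},
  {"name": "nvme2n1", "fstype": "zfs_member"}
]}"""

print(devices_with_filesystems(SAMPLE))
```

Anything that shows up with a type other than zfs_member is a drive I would want to double check before handing it to a pool.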
After writing a freaking essay, I think I’m finally done ranting.