I never had issues with RAM, but it would appear - and I could be wrong - that this version is more RAM-hungry than the previous one. My system has been rock solid for several years now with 32GB and three VMs provisioned with 6GB, 4GB, and 1GB of RAM. I upgraded with these VMs running and they all booted fine and ran for days.
Today I had to take one of them down, and when bringing it back up it said there was no memory. Services were chewing up about 29GB, so I rebooted, and then it showed 12GB free, then 8, then 4, then 2... and that's where it stayed. So now I cannot boot the VMs in their normal config because there's no RAM available anymore, and that's with one of my small VMs (the one provisioned with only 1GB) disabled. WHERE DID ALL THE RAM GO? The system now shuts down one of my VMs with OOM errors after it's been running for a few minutes.
[ 271.143838] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=emulator,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d2\x2d10HTPC.scope/libvirt,task=qemu-system-x86,pid=7268,uid=986
[ 271.143910] Out of memory: Killed process 7268 (qemu-system-x86) total-vm:10136676kB, anon-rss:6345368kB, file-rss:276kB, shmem-rss:0kB, UID:986 pgtables:14872kB oom_score_adj:0
Nothing has changed on my end aside from the upgrade from 24 to 25.
I rolled back to Dragonfish-24.04.2.5 and the problem is GONE. My VMs are provisioned with 6GB, 6GB, and 1GB and they are running fine. I can stop and restart them without any OOM issues, so there's something amiss with 25. I rarely open bug reports, but this seems like a major issue with how RAM is being handled in 25.
OOM is a feature, not a bug. It's good that your system starts killing processes and requires reboots, because it alerts you to a problem.
Have you ever considered upgrading from 32GB of RAM to 128GB?
I'm still on TrueNAS Core, where I don't have the benefit of OOM, poor memory management, and spontaneous reboots. Every day I have to suffer with the reminder that Core runs on the inferior and boring FreeBSD.
Absolutely true, and yes, I plan to go with 128+ on my next build, but this is more about WHY the system appears to be more memory-efficient in 24 than in 25. It's clearly a THING that shouldn't be happening, or there's a specific reason why it's happening.
We all know that boring isn't as euphoric or fun as upgrading to new features you probably don't need, but it's far better than troubleshooting. The momentary elation isn't worth trading away reliability and familiarity. That's why I never upgraded my wife.
It wasn't "more efficient" in the earlier versions of SCALE. They removed the restriction on the ARC's allowance,[1] so that it behaves similarly to FreeBSD's defaults.[2] The problem is that Linux is less graceful at dealing with memory pressure when ZFS/ARC is involved, and it relies heavily on swapping to disk. Swap was subsequently removed, which means there is no more buffer. This is why you see the OOM killer invoked.
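If you want to see where your install currently stands, the limits and the live ARC size are exposed in the standard OpenZFS-on-Linux locations (a quick sketch; the paths should be the same on SCALE, but treat that as an assumption):

```
# Current ARC limits; 0 means "use the built-in default"
cat /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/zfs_arc_min

# Live ARC size in bytes
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats
```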
The "solution" for the latest version of SCALE/CE is to use a custom parameter to limit the ARC and buy more RAM to lessen the likelihood of nearing OOM.
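For anyone wondering what that custom parameter looks like in practice, here is a minimal sketch of the kind of commands people put in a post-init script (run as root); the 16GiB cap is purely an illustration, not a recommendation:

```
# Cap the ARC at 16GiB (example value only)
ARC_MAX_BYTES=$((16 * 1024 * 1024 * 1024))

# Takes effect immediately but does not persist across reboots,
# which is why it usually goes into a TrueNAS post-init script
echo "$ARC_MAX_BYTES" > /sys/module/zfs/parameters/zfs_arc_max

# Verify the new limit
cat /sys/module/zfs/parameters/zfs_arc_max
```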
EDIT: Apparently the default for Linux has changed with OpenZFS 2.3.0 and now matches the default for FreeBSD.
[1] This raises the question: Why does upstream OpenZFS default to 50% of system memory for the ARC maximum on Linux systems, yet the default value for FreeBSD systems is "Total RAM - 1GB"? What do they know that we don't? It's unlikely they decided on a radically different default value for Linux without good reason.
[2] This was done under the assumption that there's really no difference in memory management between the two OSes when it comes to ZFS and ARC. "Why limit how much ARC can reside in RAM? Linux can handle it just as well as FreeBSD!"
Thanks for the info. I missed anything and everything about ARC. I'm looking into this and seeing some confusing information: "The 'solution' for the latest version of SCALE/CE is to use a custom parameter to limit the ARC."
However, for the latest versions of TrueNAS SCALE (Dragonfish 24.04 and later), this is generally no longer necessary. The system default behavior has changed to automatically manage ARC memory more dynamically, similar to TrueNAS CORE, making manual limits less critical for general use cases like running VMs or applications.
While the parameter still exists, the newer versions are designed to use available memory more efficiently and automatically reduce the ARC size when other applications or VMs need memory. Manually setting auxiliary parameters is generally discouraged before a major upgrade, as configurations can change.
Which seems to be fine on 24, but on 25 it clearly does not "automatically reduce the ARC size when other applications or VMs need memory." Based on this, I shouldn't have to use "zfs_arc_max" in a post-init script to limit this on "newer versions" of TN. Also, it's the "Services" section that's eating up far more memory, not the ZFS cache as displayed on the Dashboard. What gives?!
I get it, 32GB is old-school, and I'll go modern (128GB) on my next upgrade, folks, but I'm still confused about 25. It's the "Services" section that's eating up far more memory, not the ZFS cache as displayed on the Dashboard. When I run "arc_summary"...
It shows 3.4GB, which is exactly what the Dashboard shows for ZFS Cache, so I assume they are indicating the same value... correct me if I'm wrong here. I'm assuming the value for ZFS Cache is an independent value not represented in the Services RAM allocation. I ask because it's the Services that crept up to 29-30GB of RAM usage on 25, not the ZFS Cache, so is this truly an ARC issue or something else?
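One way to sanity-check whether the Dashboard's ZFS Cache figure and arc_summary report the same thing (I'm assuming both ultimately read the same kstat):

```
# Human-readable ARC size as arc_summary reports it
arc_summary | grep -m1 "ARC size"

# The raw kstat value, converted to GiB
awk '$1 == "size" {printf "%.1f GiB\n", $3 / 1024^3}' /proc/spl/kstat/zfs/arcstats
```

As far as I understand, the kernel counts the ARC as "used" memory rather than "buff/cache", and the Dashboard splits it out into its own ZFS Cache bucket rather than lumping it under Services.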
It's not. I've been happy on 16GB of RAM, and more recently 32GB.
I was being facetious about quadrupling your RAM to "solve" this OOM problem.
Which should have rapidly and dynamically pruned and shrunk your ARC before reaching those levels. Having swap (on an SSD/NVMe boot drive) would have also allowed more room to breathe.
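If you want to confirm that on your own box, checking for swap is quick (just a sketch; on recent SCALE releases there may simply be nothing listed):

```
# Any swap devices at all?
swapon --show

# Overall memory/swap totals
free -h
```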
As for those "Services", you would need to use something like htop to see which processes they are, as the GUI doesn't really break it down. You can only guess at this point. Not sure how eager you are to get back on 25.04. Maybe it's solved in 25.10?
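Something along these lines will break "Services" down into actual processes (htop works too; this is just the non-interactive version):

```
# Top memory consumers by resident set size (RSS); for VM issues the
# qemu-system-x86 processes are the ones to watch
ps aux --sort=-rss | head -n 15
```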
Perhaps... I have not tried that, as I usually wait for a few .xx "fix" releases before moving to a new train. I did use iotop and htop at the time, but I think it was the KVM process using the RAM. I'd need to boot back into 25 to verify. I'll check that in my next available block of tinker-time.
Uhmm, I wonder what happens if you start to copy some 10 GB files and the cache starts eating up that 15 GB of free RAM you've got. I mean, I also noticed a difference in the RAM being used under Services between 24.x and 25.x, but it wasn't actually that large - maybe about 3 GB more...
I've monitored it over the past few days back on Dragonfish-24.04.2.5. I've seen the ZFS cache fluctuate, going all the way up to 15GB, but it comes back down to 11-12GB and settles there. It shows I still have 5.6GB free, and this is with my 3 VMs running at 6GB, 6GB, and 1GB. I couldn't even get the VMs off the ground in 25 and was met with constant OOM errors.
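A crude way to keep logging this over time, so the 24.x and 25.x behavior can be compared side by side (the log path and interval are arbitrary):

```
# Append a timestamped snapshot of ARC size and available memory
while true; do
    ts=$(date '+%F %T')
    arc=$(awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats)
    avail_kb=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo)
    echo "$ts arc_bytes=$arc mem_available_kb=$avail_kb" >> /tmp/ram-watch.log
    sleep 300
done
```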
I don't know what is happening with 25, but there's definitely something being handled differently with RAM allocation/usage. Is it some ARC setting? No idea, but I'd love to know the answer, as I would like to move up to 25 Goldeneye in the future, but I can't do it if my VMs are going to be denied resources they have no problem with in 24.
Hmm, I did some testing... I had about 5 GB RAM free, 46 GB ZFS cache, 13 GB Services. I shut down a VM which had 8 GB allocated (2 min / 8 max), changed it to 24 GB allocated (24 min / 24 max), and I could start it. It just grabbed the RAM for the VM from the ZFS cache.
So I really wonder why your VM refused to boot up. You actually have to be a bit cautious about the Minimum Memory Size setting: if you set it too low, the VM might not start at all. 2 GiB is probably a safe value for this.
I set the min size as low as 1GB with the same results. I also thought that if I rebooted, it would allocate the memory to the VMs and wouldn't steal RAM from them, but that wasn't the case, as it would shut down the VM with the OOM error. If it were operating as stated, it would never have hit an OOM error and would have allocated that RAM to the VM when needed. However, it wasn't the ZFS Cache using 29-30GB of RAM, it was Services.
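Next time I boot back into 25, I'll also pull the libvirt view of the VM, since that should show what the GUI's min/max settings actually translate to. My understanding (an assumption, not something I've confirmed) is that the maximum maps to <memory> and the minimum to <currentMemory>. "10HTPC" is the domain name taken from the OOM log above; virsh list --all will show the exact name if it differs on your install:

```
# Find the exact libvirt domain name first
virsh list --all

# Configured vs. balloon-adjusted memory for the guest
# ("10HTPC" is from the OOM log above; adjust to match)
virsh dominfo 10HTPC
virsh dommemstat 10HTPC

# The underlying XML memory elements (max vs. current/ballooned)
virsh dumpxml 10HTPC | grep -iE "<(current)?memory"
```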