Consuming more RAM? 25.04.02

I never had issues with RAM, but it would appear - and I could be wrong - that this version is more RAM-hungry than the previous one. My system has been rock solid for several years now with 32GB and three VMs provisioned with 6GB, 4GB, and 1GB of RAM. I upgraded with these VMs running, and they all booted fine and ran for days.

Today I had to take one of them down, and when bringing it back up it said there was no memory. Services were chewing up about 29GB, so I rebooted, after which the dashboard showed 12GB free, then 8, then 4, then 2.. and that’s where it stayed. So now I cannot boot the VMs in their normal config because there’s no RAM available anymore - and that’s with one of my small VMs, the one provisioned with only 1GB, disabled. WHERE DID ALL THE RAM GO? The system now shuts down one of my VMs with OOM errors after it’s been running for a few minutes.

[  271.143838] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=emulator,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d2\x2d10HTPC.scope/libvirt,task=qemu-system-x86,pid=7268,uid=986
[  271.143910] Out of memory: Killed process 7268 (qemu-system-x86) total-vm:10136676kB, anon-rss:6345368kB, file-rss:276kB, shmem-rss:0kB, UID:986 pgtables:14872kB oom_score_adj:0

Nothing has changed on my end aside from the upgrade from 24 to 25.

Has anyone else run into anything like this?

UPDATE

I rolled back to Dragonfish-24.04.2.5 and the problem is GONE. My VMs are provisioned with 6GB, 6GB, and 1GB and they are running fine. I can stop and restart them without any OOM issues, so there’s something amiss with 25. I rarely open bug reports, but this seems like a major issue with how RAM is being handled in 25.

OOM is a feature, not a bug. It’s good that your system starts killing processes and requires reboots, because it alerts you to a problem.

Have you ever considered upgrading from 32GB of RAM to 128GB? :smiley:


I’m still on TrueNAS Core, where I don’t have the benefit of OOM, poor memory management, and spontaneous reboots. Every day I have to suffer with the reminder that Core runs on the inferior and boring FreeBSD. :face_vomiting:

How lucky you are.

3 Likes

Absolutely true, and yes, I plan to go with 128+ on my next build, but this is more about WHY the system appears to be more memory-efficient in 24 than in 25. It’s clearly a THING that shouldn’t be happening, or at least has a specific reason behind it.

We all know that boring isn’t as euphoric or fun as upgrading to new features you probably don’t need, but it’s far better than troubleshooting. The momentary elation gained isn’t worth the hassle of reliability and familiarity. That’s why I never upgraded my wife.

It wasn’t “more efficient” in the earlier versions of SCALE. They removed the restriction on the ARC’s allowance,[1] so that it behaves similarly to FreeBSD’s defaults.[2] The problem is that Linux is less graceful at dealing with memory pressure when ZFS/ARC is involved, and it relies heavily on swapping to disk. Swap was subsequently removed, which means there is no more buffer. This is why you see the OOM killer invoked.

The “solution” for the latest version of SCALE/CE is to use a custom parameter to limit the ARC and buy more RAM to lessen the likelihood of nearing OOM.
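For anyone wanting to try that, here is a minimal sketch of the workaround (the 8 GiB figure is only an example - size it for your own workload), run as a post-init command or from a root shell:

# Cap the ARC at 8 GiB (8 * 1024^3 = 8589934592 bytes)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

It does not persist across a reboot on its own, and writing 0 back to the same parameter restores the default behaviour.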

EDIT: Apparently the default for Linux has changed with OpenZFS 2.3.0, which matches the default for FreeBSD.


  1. This raises the question: Why does upstream OpenZFS default to 50% of system memory for the ARC maximum on Linux systems, yet the default value for FreeBSD systems is “Total RAM - 1GB”? What do they know that we don’t? It’s unlikely they decided on a radically different default value for Linux without good reason. ↩︎

  2. This was done under the assumption that there’s really no difference in memory management between the two OSes when it comes to ZFS and ARC. “Why limit how much ARC can reside in RAM? Linux can handle it just as well as FreeBSD!” ↩︎

1 Like

Thanks for the info. I’d missed anything and everything about ARC. I’m looking into this and seeing some confusing information. Quoting the advice above:

“The ‘solution’ for the latest version of SCALE/CE is to use a custom parameter to limit the ARC”

However, for the latest versions of TrueNAS SCALE (Dragonfish 24.04 and later), this is generally no longer necessary. The system default behavior has changed to automatically manage ARC memory more dynamically, similar to TrueNAS CORE, making manual limits less critical for general use cases like running VMs or applications.

While the parameter still exists, the newer versions are designed to use available memory more efficiently and automatically reduce the ARC size when other applications or VMs need memory. Manually setting auxiliary parameters is generally discouraged before a major upgrade, as configurations can change.

Which seems to be fine on 24, but on 25 it’s clearly not “automatically reducing the ARC size when other applications or VMs need memory.” Based on this, I shouldn’t have to use “zfs_arc_max” in a post-init script to limit this on “newer versions” of TN. Also, it’s the “Services” section that’s eating up far more memory, not the ZFS Cache as displayed on the Dashboard. What gives?!
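For reference, a rough way to cross-check this from a shell (both commands are present on a stock install) is to compare what OpenZFS itself reports for the ARC with the kernel’s overall view:

# ARC size as reported by OpenZFS
arc_summary | grep "ARC size (current)"

# Overall memory picture from the kernel
free -h

If those numbers line up with the Dashboard’s ZFS Cache figure, whatever is inflating Services is coming from something other than the ARC.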

I don’t see the justification for why they did this other than, “We should change this now because 1/2 of RAM is too limiting for modern systems.”

:sweat_smile:

3 Likes

Right. The underlying issue was never resolved.

First, TrueNAS removed the restriction on their end, prior to OpenZFS 2.3.0, to “match how FreeBSD does it.”

Then later on, OpenZFS 2.3.0 removed this restriction on the default value for Linux systems, “just because”.

Nowhere do I see that memory management of Linux with ZFS+ARC has been addressed. This all feels like they’re just winging it and deciding, “Yeah, we don’t think this is a problem anymore. Modern systems should be able to handle this just fine.”

See my previous post that links to the commit on their GitHub.

1 Like

I get it, 32GB is old-school, and I’ll go modern (128GB) on my next upgrade, folks, but I’m still confused about 25. It’s the “Services” section that’s eating up far more memory, not the ZFS Cache as displayed on the Dashboard. When I run ‘arc_summary’ …

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    11.1 %    3.4 GiB

It shows 3.4 GiB, which is exactly what the Dashboard shows for ZFS Cache, so I assume they are indicating the same value.. correct me if I’m wrong here. I’m assuming the ZFS Cache value is independent and not represented in the Services RAM allocation. I ask because it’s Services that crept up to 29-30GB of RAM usage on 25, not the ZFS Cache, so is this truly an ARC issue or something else?

It’s not. I’ve been happy on 16GB of RAM and now recently 32. :slightly_smiling_face:

I was being facetious about quadrupling your RAM to “solve” this OOM problem.


Which should have rapidly and dynamically pruned and shrunk your ARC before reaching those levels. Having swap (on an SSD/NVMe boot drive) would have also allowed more room to breathe.

As for those “Services”, you would need to use something like htop to see which processes they are, as the GUI doesn’t really break it down. You can only guess at this point. Not sure how eager you are to get back on 25.04. Maybe it’s solved on 25.10?
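If you do boot back into it, a rough way to capture that breakdown for later (plain procps/util-linux tools, nothing TrueNAS-specific) would be something like:

# Top 15 processes by resident memory (RSS is in KiB)
ps -eo pid,user,rss,comm --sort=-rss | head -n 15

# Check whether any swap is configured at all
swapon --show
free -h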

Perhaps.. I have not tried that, as I usually wait for a few .xx “fix” releases before moving to a new train. I did use iotop and htop at the time, but I think it was the KVM process using the RAM. I’d need to boot back into 25 to verify. I’ll check that in my next available block of tinker-time.

Just a heads-up: somewhere in a different post I think I saw a comment that starting/restarting VMs removes a manually set ARC limit.

2 Likes

That is my personal experience, yes. It is fairly easy to replicate.
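If anyone wants to replicate it, a minimal check (assuming the cap was set by writing to the module parameter, as in the sketch earlier in the thread) is to read the value back before and after toggling a VM:

# Current ARC cap in bytes (0 means no explicit cap is set)
cat /sys/module/zfs/parameters/zfs_arc_max
# ...start or stop a VM from the UI, then read it again...
cat /sys/module/zfs/parameters/zfs_arc_max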

Uhmm, I wonder what happens if you start to copy some 10 GB files and the cache starts eating up the 15 GB of free RAM you’ve got. I mean, I also noticed a difference in the RAM being used under Services between 24.x and 25.x, but it wasn’t actually that large - maybe about 3 GB more…

I had never set a limit before, but that is good to know should I employ that in the future.

Never had any OOM issues over several years until I upgraded to 25.

1 Like

I’ve monitored it over the past few days back on Dragonfish-24.04.2.5. I’ve seen the ZFS cache fluctuate all the way up to 15GB, but it comes back down to 11-12GB and settles there. It shows I still have 5.6GB free, and this is with my 3 VMs running at 6GB, 6GB, and 1GB. I couldn’t even get the VMs off the ground in 25 and was met with constant OOM errors.

I don’t know what is happening with 25, but there’s definitely something being handled differently with RAM allocation/usage. Is it some ARC setting? No idea, but I’d love to know the answer, as I would like to move up to 25 Goldeneye in the future, but I can’t do it if my VMs are going to be denied resources they have no problem getting in 24.

Docker VM

Windows VM

Linux VM
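For the next attempt on 25, a rough way to watch this over time instead of eyeballing the Dashboard is arcstat, which ships with OpenZFS (the 5-second interval is arbitrary):

# Print ARC statistics, including the current ARC size, every 5 seconds
arcstat 5

# In a second shell, log the kernel's memory totals at the same interval
free -h -s 5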

Hmm, I did some testing… I had about 5 GB RAM free, 46 GB ZFS cache, 13 GB Services. I shut down a VM which had 8 GB allocated (2 min/8 max), changed it to 24 GB allocated (24 min/24 max), and I could start it. It just grabbed the RAM for the VM from the ZFS cache.

So I really wonder why your VM refused to boot up. You actually have to be a bit cautious about the Minimum Memory Size setting: if you set this too low, the VM might not start at all. 2 GiB is probably a safe value for this.
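If it helps to compare what libvirt thinks a guest was given versus its ceiling, a rough check from the host shell (on SCALE, virsh may need an explicit -c connection URI, since libvirt listens on a non-default socket; <vm-name> below is a placeholder):

# List the defined guests to get their libvirt names
virsh list --all

# Max memory vs. the currently assigned (ballooned) memory for one guest
virsh dominfo <vm-name>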

I set the min size as low as 1GB with the same results. I also thought that if I rebooted, the system would allocate the memory to the VMs and wouldn’t steal RAM from them, but that wasn’t the case, as it would shut down the VM with the OOM error. If it were operating as described, it would never have hit an OOM error and would have allocated that RAM to the VM when needed. However, it wasn’t the ZFS Cache using 29-30GB of RAM, it was Services.

Guess it’s back to htop, then, to see which of the “services” is actually hogging the memory.