OOM killer triggered regardless of ZFS ARC parameters, even when set by a 1-minute cronjob

TrueNAS SCALE version: ElectricEel-24.10.2

Feb 22 18:10:46 disarray kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=vcpu0,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d1\x2d4windowsgameservers.scope/libvirt,task=qemu-system-x86,pid=6495,uid=64055

I’ve been having constant issues with my system getting OOM’d regardless of what I set the ZFS ARC parameters to; my VM is consistently killed within about two days of uptime. I have the following script run on boot:

# Cap the ARC at 40 GiB (42949672960 bytes)
echo 42949672960 > /sys/module/zfs/parameters/zfs_arc_max
# Ask the ARC to keep 16 GiB (17179869184 bytes) free for the rest of the system
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_sys_free
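For completeness, these runtime writes don’t survive a module reload; the standard persistent route on a Debian-based system would be module options, sketched below. I’m aware TrueNAS SCALE manages its own OS files and may overwrite this on upgrade, so treat it as illustration rather than a supported fix:

# Hypothetical persistent setup: apply the ARC limits as zfs module options
cat <<'EOF' > /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=42949672960
options zfs zfs_arc_sys_free=17179869184
EOF
# Rebuild the initramfs so the options apply at early boot
update-initramfs -u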

I also had it running as a cronjob every minute(!!!), yet I still consistently see the OOM killer triggered. I’ve noticed that something is resetting /sys/module/zfs/parameters/zfs_arc_max to a high value (~68 GB, which I think is roughly the amount of free RAM available at the moment it gets reset); ARC then consumes the rest of my free RAM within a short time, and the OOM killer fires.

For reference, I have 96 GB of RAM total (with only 8 GB of that given to the VM) and have attempted to limit ARC to 40 GB (hence the values above), but no matter what I do I can’t get the system to be stable. This has persisted since I set the system up around November of last year. I’ve read through other forum posts and Reddit posts, but none of the suggestions have been a long-term fix, so any additional help would be appreciated. I’m happy to provide any logs if that would help.
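To try to catch whatever is resetting it, I’m considering replacing the every-minute cronjob with a small watcher like this (my own sketch; /var/log/arcwatch.log is just a name I picked) so the reset gets timestamped alongside the ARC’s actual state:

#!/bin/sh
# Log zfs_arc_max together with the live ARC figures.
# In arcstats, c_max is the effective ARC limit and size is current usage (bytes).
{
  printf '%s ' "$(date -Is)"
  printf 'zfs_arc_max=%s ' "$(cat /sys/module/zfs/parameters/zfs_arc_max)"
  awk '$1 == "c_max" || $1 == "size" { printf "%s=%s ", $1, $3 }' /proc/spl/kstat/zfs/arcstats
  echo
} >> /var/log/arcwatch.log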

I’m having exactly the same issue. It goes as far as the TrueNAS VM causing my other VMs to OOM as well, despite the total assigned RAM being 10 GB less than the host’s total RAM.

To be honest, it looks like TrueNAS SCALE, or rather ZFS’s ARC, simply does not work properly in a VM. The ARC does not seem to be treated as ordinary cache by the kernel, so it is not evicted when RAM is running low.
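As far as I understand it, the ARC on Linux lives outside the normal page cache: it isn’t counted in MemAvailable, and the kernel can’t evict it directly; ZFS has to shrink it itself. You can see the mismatch with standard paths (nothing TrueNAS-specific):

# Compare what the kernel reports as available with what the ARC actually holds.
# A small MemAvailable next to a large ARC size means the ARC isn’t being
# counted as reclaimable cache.
grep MemAvailable /proc/meminfo
awk '$1 == "size" { printf "ARC size: %.1f GiB\n", $3 / 1024^3 }' /proc/spl/kstat/zfs/arcstats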

This, coupled with the fact that there is no stable VM feature in TrueNAS itself, is a huge issue for me.

It would help if you posted your detailed hardware, OS, apps, VMs, etc. We can only go off the information posted.

You can also try Report A Bug, top right in the TrueNAS GUI or on this forum, and attach the dump file.

If you Report A Bug, please link the Jira ticket here.
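For the logs, the kernel ring buffer has the full OOM report; either of these standard Linux commands will pull it out for attaching to the ticket:

# OOM-killer events with surrounding context and human-readable timestamps
dmesg -T | grep -i -B2 -A10 'oom-kill'
# Or the same from the journal, limited to the current boot
journalctl -k -b | grep -i oom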

Thank you for offering help, but at the moment I don’t really have the time to tinker with unstable setups. The good thing about TrueNAS and Docker is that you can reinstall and be up and running again fast.

I now have TrueNAS installed on bare metal, with Dockge still managing the containers. The VMs will have to wait. I’ll see whether that runs more stably.

I was running Proxmox with TrueNAS 25.01 in a VM, with the SATA controller passed through. The system has 64 GB of RAM; TrueNAS had 32 GB assigned and the second VM had 20 GB, while Proxmox itself was using about 3 GB tops.

Both VMs reported OOM errors whenever there was sustained high activity on the NFS shares. Last time, I caught the TrueNAS VM’s RAM usage peaking at 97%, and when I went to look at the console, it was restarting repeatedly. After a third try the RAM usage dropped.
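One thing I still want to rule out on the Proxmox side is memory ballooning, since ZFS inside a guest supposedly reacts badly to the host reclaiming guest RAM underneath it. Disabling the balloon device is a single command (substitute your own VM ID for 100):

# Disable ballooning so the guest keeps its full RAM allocation
qm set 100 --balloon 0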