TrueNAS SCALE version: ElectricEel-24.10.2
Feb 22 18:10:46 disarray kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=vcpu0,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d1\x2d4windowsgameservers.scope/libvirt,task=qemu-system-x86,pid=6495,uid=64055
I’ve been having constant issues with my system running out of memory regardless of what I set the ZFS ARC parameters to, and the OOM killer consistently kills my VM within about 2 days of uptime. I have the following script run on boot:
echo 42949672960 > /sys/module/zfs/parameters/zfs_arc_max
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_sys_free
I also had it running as a cron job every minute(!!!), and yet I still consistently see the OOM killer being triggered. I have noticed that something resets the /sys/module/zfs/parameters/zfs_arc_max parameter to a high value (~68 GB? which I think is roughly the amount of free RAM available at the time it gets set). Within a short time the ARC then consumes the rest of my free RAM, which triggers the OOM killer.
For reference, I have 96 GB of RAM total (with only 8 GB of that given to the VM) and have attempted to limit ARC to 40 GB (hence the above), but no matter what I do I can’t get the system to be stable. This has persisted since I set up the system around November of last year. I’ve read through other forum posts and Reddit posts, but none of the suggestions have been a long-term fix, so any additional help would be appreciated. I’m happy to provide any logs if that would help.
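To narrow down when zfs_arc_max gets reset, it may help to log the configured limit next to what the ARC is actually doing. Below is a minimal sketch, not a fix. It assumes the standard OpenZFS-on-Linux locations for the module parameter and the arcstats kstat file. The function names (arcstat, to_gib, arc_snapshot) and the two override variables are made up for illustration.

```shell
#!/bin/sh
# Sketch only: paths default to the usual OpenZFS-on-Linux locations.
# ARC_MAX_FILE and ARCSTATS_FILE are hypothetical override variables.
ARC_MAX_FILE="${ARC_MAX_FILE:-/sys/module/zfs/parameters/zfs_arc_max}"
ARCSTATS_FILE="${ARCSTATS_FILE:-/proc/spl/kstat/zfs/arcstats}"

# Print the data column (third field) of one named arcstats entry.
arcstat() {
    awk -v key="$1" '$1 == key { print $3 }' "$ARCSTATS_FILE"
}

# Convert a byte count to whole GiB (integer division).
to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Emit one timestamped line: the module parameter, the effective
# ARC maximum (c_max), and the current ARC size, all in bytes.
arc_snapshot() {
    printf '%s arc_max_param=%s c_max=%s size=%s\n' \
        "$(date '+%F %T')" \
        "$(cat "$ARC_MAX_FILE")" \
        "$(arcstat c_max)" \
        "$(arcstat size)"
}
```

Calling arc_snapshot from the existing every-minute cron job, before the two echo lines, and appending its output to a log file of your choice would give a timeline of when the parameter changes, which can then be lined up against the oom-kill timestamps in the kernel log. As a sanity check on the values above, to_gib 42949672960 gives 40 and to_gib 17179869184 gives 16.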