ARC Memory Allocation on TrueNAS SCALE 24.04 (Dragonfish) and Other Issues

Hello everyone,

Weird to see this new forum… but anyway, the new version of TrueNAS has changed the way it allocates memory for the ARC, and I would like to start another discussion on the matter to try to figure out a solution to this major problem.

You see, I moved this specific machine away from Proxmox because it had a major problem: the root filesystem filled up from over-provisioning (which shouldn’t even happen to begin with) and I could not repair it. I tried to free up space to restore everything, but it just wouldn’t happen, and since most of what that machine did was act as a storage server, I figured I would move everything over to TrueNAS SCALE, which is just as competent a hypervisor for what I need to do.

The two major problems I’m having are as follows. First, I had to free up one PCIe slot and order a Quadro P620 so I can use that as the system GPU and then pass my Radeon Pro W6600 through to a Windows VM (still waiting on that GPU). With Proxmox I was able to simply pass that single GPU through to the VM and run headless. I do think that should absolutely be an option: there is no reason to ‘waste’ a GPU and a valuable PCIe slot on nothing but a text UI that will never be used unless something goes wrong (in which case you could still override and take the GPU back, ideally with a boot menu entry to minimise the effort).

The second major problem I’m having is this: if I start all the VMs on boot, I get all of my 128GB of memory (minus whatever the system uses) available to me; the VMs start and all the remaining free memory stays available.

But once I start to use the system, the ARC will eat ALL of the available free RAM. If I then want to create and start another VM, as my needs change or while I’m still setting them up one by one, I can’t, unless I tell it that I understand the risks and FORCE the VM to start, which, depending on how much memory is actually available, is likely to lead to a full crash.

I also cannot simply stop a VM to change something and then start it again, because if I do that the ARC will happily eat all of the memory the VM was using before I stopped it.

From online research it seems that people complained in the past that ARC was only using 50% or so of system memory, and now it just eats up as much as it can. This memory does get used, because I also have SMB activity in the background, such as downloads writing things to disk.

I have seen some overrides, but also people mentioning that they no longer apply because the system will adjust automatically when you add more VMs; it doesn’t seem to be working as it should, though.

The behaviour I would like to see: ideally a new menu on the left side of the web UI where you can manually allocate and control the ARC memory. If you need to free up memory it should be just one button: limit to xx GB, then click to free it either immediately or slowly (if it needs to write that data to disk first), with a progress bar. After that, the memory would stay untouched/reserved, so I can start and stop VMs with it, all the VMs can run without being affected by the ARC, and the ARC still gets almost all the rest of the system’s memory, just with some manual control. The automatic behaviour can stay as it is now; I absolutely understand why a machine that is primarily a file server would want to use all the RAM for ARC. But people also have to run their own VMs and containers, and we all have budget limits on how many servers and OSes we can have, so flexibility here is always welcome.

Thanks!

There was a bug related to lru_gen being enabled. Disabling it resolves the issue.
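For reference, a minimal sketch of checking and disabling it on a running system, assuming the stock multi-gen LRU sysfs knob (on TrueNAS the fix may already be handled for you):

cat /sys/kernel/mm/lru_gen/enabled        # a non-zero value means multi-gen LRU is enabled
echo n > /sys/kernel/mm/lru_gen/enabled   # disable it for the running kernel; add as a post-init command to persist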

As you allocate more memory to other processes, including VMs, the ARC is released.

Seems to work okay.

You can limit the ARC with a sysctl setting, but since this shouldn’t need to be done (if everything is working right), there probably isn’t much need for a UI for it.

Had an issue yesterday with a Docker container inside a jail (not your average setup, I know…), where the ARC was not being released and this blocked the container from starting. I had to resort to limiting ARC to 16GB (the server has 32GB), with

echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
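To sanity-check that the new cap took, something like the following should work (paths assume a standard OpenZFS-on-Linux layout):

cat /sys/module/zfs/parameters/zfs_arc_max                    # the cap, in bytes
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats    # current ARC size, in bytes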

Granted, my particular scenario is unusual (it is a Windows 11 container, requiring 4GB to boot, and used for managing a Samba AD controller, itself running inside another jail…), but this works for me - YMMV.

P.S.: the Windows container is started only when needed, to remotely manage the AD controller running on the Samba jail.

@Stux Well, the sysctl setting should be easy enough to set, though it means I have to change it (and possibly reboot) whenever I want to make more RAM available, say when I stop using a VM/container and there’s more RAM free for ARC to use. The GUI option would at least mean no memory is wasted, yet be smart enough to free up/flush when I need it without intervention, though it would probably just be a GUI wrapper around the sysctl setting anyway, so…

@Cellobita I would prefer to avoid doing it manually; ideally it should just figure it out itself and understand that the system can also need memory for VMs/containers. I feel SCALE in particular is a far more flexible system than CORE was. CORE really felt like a NAS, the same way pfSense feels like a router-only machine; SCALE is very flexible, and hopefully this philosophy will carry on through future releases.

I’m not happy with having to limit the ARC usage manually - particularly because the Windows container is started a couple of times a month, at most. But for now I haven’t found a better way.

With lru_gen disabled it should just resize automatically. But if for whatever reason you can’t start a jail without a specific amount of RAM free (I would have thought it would just resize), you can set /sys/module/zfs/parameters/zfs_arc_sys_free as a post-init task to leave x amount free, e.g. 8589934592 for 8GiB.

For example:

echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_sys_free

This will force ARC to keep 8GiB of memory free/unused. In your case, if it’s just this Windows container, you could play around with it and leave 4GiB or 6GiB free.
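To make it survive a reboot, a post-init command should do; a sketch with 6GiB as a purely illustrative value (on SCALE this would go under System Settings → Advanced → Init/Shutdown Scripts as a Post Init command):

echo 6442450944 > /sys/module/zfs/parameters/zfs_arc_sys_free    # 6 * 1024^3 bytes = 6GiB kept free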

I’m assuming by jail you mean jailmaker, which only recently got ‘initial’ support in TrueNAS. If so, I’m not knowledgeable about it, but I would have assumed it would not error out when starting a jail just because there wasn’t a sufficient amount of free memory. Maybe there’s a way to force it, as ARC will shrink as needed to fit it (@Stux, any ideas?).

Many thanks for the zfs_arc_sys_free tip! It seems just what I need.

FWIW, what errored out was not the Docker jail itself, but a specific Docker container inside it, called Dockur (basically a virtualized instance of Windows, self-installing - useful for my particular needs).

It solves a very specific need: being able to remotely manage an AD domain for a customer (the DC itself runs inside a Samba jail), without having to resort to a TeamViewer session to one of their workstations, interrupting their work.

@essinghigh I will try this out; it might also somewhat fix the problem for me. There will always be some RAM wasted, but at least it gives some control back without having to reboot the server whenever too much ARC is used and not flushed.


I should also mention I have been ignoring this warning for as long as I can remember and forcing it, and have never so much as run into a hiccup. I honestly think this is some sort of bug that needs to be addressed at some point, as ARC can and absolutely will dynamically resize to fit the VM’s memory requirements (assuming there’s enough memory free, including ARC).

Edit: Once I’m home I’ll put together a quick PoC for how this works. I’ve TRIED to hit an OOM condition when testing these ARC changes before they were released and never managed to cause one.


I’m still running 1% of RAM in /sys/module/zfs/parameters/zfs_arc_min and 10% of RAM in /sys/module/zfs/parameters/zfs_arc_max on my 128 GByte RAM TrueNAS SCALE VM with the current Dragonfish release; works fine.
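For anyone wanting to do the same, a rough sketch of deriving those percentages from total RAM (values and rounding are just illustrative):

TOTAL_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)                        # total RAM in KiB
echo $(( TOTAL_KB * 1024 / 100 )) > /sys/module/zfs/parameters/zfs_arc_min     # ~1% of RAM, in bytes
echo $(( TOTAL_KB * 1024 / 10 ))  > /sys/module/zfs/parameters/zfs_arc_max     # ~10% of RAM, in bytes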

Here is my memory pie chart, isn’t it pretty?

Now let’s go crazy and fill 60GiB of my memory as quickly as I possibly can (I have no swap). Since /dev/zero never produces a newline, tail has to buffer everything it reads, so this one-liner allocates roughly 60GiB of RAM:
head -c 60000m /dev/zero | tail


Going… going… and…

No problems - ARC shrinks dynamically. As long as you have enough free memory + ARC, you should be able to ignore that VM startup warning.


Nice, I would love to have that data via Telegraf in my InfluxDB :slight_smile:
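If it helps, the numbers behind that chart come from the ZFS kstats, so (assuming a stock OpenZFS layout) Telegraf’s exec input, or any script, could scrape something like:

awk '$1 == "size" || $1 == "c" {print $1, $3}' /proc/spl/kstat/zfs/arcstats    # current ARC size and target, in bytes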

@essinghigh So why have that warning at all if it works perfectly? Not sure why they would bother then. I have been able to force-start VMs before without any issues as well, but they were pretty close to the available RAM. Starting a VM with, say, 32GB of RAM when 0 is available is something I haven’t tested yet, as this is a semi-production machine and I don’t want to play around and crash it. I have another Threadripper, a 3960X with 256GB of RAM, that I will mess with later.


What should probably be shown is free memory including reclaimable ARC, as a sort of additional figure in brackets. There should still be a warning, but it probably shouldn’t be putting the fear of God into you.

@essinghigh The reason I didn’t push any further is that I don’t want this machine to hard-lock. Also, on the old forum I remember reading a post where someone mentioned they ignored the warning and their machine crashed out of memory; then again, software changes over time.