Dragonfish swap usage/high memory/memory leak issues

I think there’s a little bit of “trying to have it both ways” going on here on iX’s part. On the one hand, they don’t want to raise the minimum spec too much, to avoid scaring off prospective users[1]. On the other hand, there seems to be some (understandable) dismissiveness of inadequately resourced systems.


  1. We haven’t seen it much in recent years (this is 2024, and 8 GB just isn’t that much any more), but I remember the hue and cry when the “minimum requirement” for FreeNAS was increased to 8 GB.

A fix is being tested right now for .1, but fundamentally we had two different issues here, which users were hitting in a variety of ways.

First, we had the bad ARC/Kernel LRU behaviors, where memory was being too aggressively pushed out into Swap space and causing poor system performance and stability issues.

Second, swap in general was an issue in its own right. The issue there was twofold.

  1. We don’t test with swap on enterprise hardware; it is disabled there.

  2. Swap, the way it was originally implemented back in the day, gave users the option to put a swap partition on boot devices. The ARC issue only exposed one aspect of this design problem. What happens in general if you have swap usage that is thrashing your boot device? Turns out the answer is all sorts of no-good things, including performance and stability issues and other random, undefined, exotic behaviors that mask the real issue. Never mind that many systems don’t use quality boot devices, and all that constant extra write load can wear them out too quickly.

After consideration of both sets of real-world problems, in .1 we will disable swap by default AND enable the LRU / swappiness fixes. Users who want that safety cushion of on-disk space can re-enable it if they so desire. When swap is re-enabled, the ARC LRU / swappiness settings will be in effect and hopefully prevent swap from being used too aggressively, but remember that once you do start to swap too much, you are opening the door to a whole host of other unpredictable behaviors.

While ARC brought the issue to the forefront, we still have too many other reports of general instability where swap is heavily utilized. Apps / VMs being used more often on SCALE I’m sure helps drive this behavior. Turns out you generally don’t want to deploy 20+ applications on a system with only 16GB of memory. :wink:
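For anyone who wants to see what their own box is doing, here is a minimal sketch for checking whether any swap area is currently active and what swappiness the kernel is using. It only reads the standard Linux files `/proc/swaps` and `/proc/sys/vm/swappiness`. The helper names `active_swap_count` and `current_swappiness` are made up for illustration and are not TrueNAS commands.

```shell
# Count active swap areas listed in a /proc/swaps-style file.
# The first line of /proc/swaps is a column header, so it is skipped.
# active_swap_count is a hypothetical helper name, not a TrueNAS command.
active_swap_count() {
    swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file"
}

# Print the kernel's current vm.swappiness value.
# current_swappiness is likewise a hypothetical helper name.
current_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}
```

Calling `active_swap_count` with no argument reads the live `/proc/swaps`. With swap disabled it should print 0, although I have not verified that against a 24.04.1 install.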

@Kris If a system is currently configured with swap (as it is/was the default during install) and a person no longer wants swap going forward, what would be the least intrusive, data-safe path to disabling swap and removing the swap partitions?

At the moment for .1, I’d say we first give it some soak time. The partitions being present won’t necessarily hurt anything on the system or boot device; they just don’t get utilized unless the user explicitly re-enables them. Longer term, we may either auto-remove unused ones or provide a clear guide on how to do so safely, once we’ve written up and tested proper procedures.

To confirm: an already existing install will have swap disabled after upgrading to 24.04.1? It’s not limited to fresh installs from the .iso?
(the partitions will stick around but that’s a minor issue)

That sounds like a good path forward.
Thanks!

Yes, both upgrades and fresh installs will default to swap disabled until the user explicitly enables it.

We are having it both ways - deliberately

What we sell and support is professional or enterprise quality. RAM costs are not an issue; 32GB is the minimum we use.

However, we understand that in personal/home lab applications, TrueNAS is used on lower cost or 2nd hand hardware. We verify 8GB works for basic storage. However, it is limited for VMs and Apps and won’t perform well with a lot of drives. The guidelines are reasonable.

Perhaps SCALE could pop up an alert when the user opens the Apps or VM page, or otherwise starts using Kubernetes, on a system with <16GB of RAM. It doesn’t have to be anything drastic, just a note that such a limited amount of RAM for everything SCALE is trying to do could result in performance issues. I think it’s pretty reasonable to assume almost any system with less than 16GB of RAM is running with 8GB, except for that group of people running SCALE on triple-channel RAM platforms with 12GB.

The problem is that SCALE, arguably unlike CORE, isn’t focused only on serving basic storage needs. Buried way down in the SCALE Hardware Guide are notes about needing to add more RAM beyond basic storage needs, but your average personal/homelab user isn’t going to find them. Meanwhile, SCALE is presently going through what I’d assume is a large migration of users from CORE (traditionally a basic storage-centric platform) to SCALE (something with a lot more flexibility).

Re the guidelines: I have to go down into the above-linked part of the SCALE docs to find recommendations on how much RAM to throw SCALE’s way. The guidelines are, technically, reasonable, but they’re not easily communicated to someone who can just go to iX’s website, download SCALE and start using it. I’d be much more inclined to think about it if the minimum requirements for RAM instead said “8GB for basic storage functions, 16GB+ when adding apps/VMs”.

Tying back into the above, just in general it might be nice for iX/SCALE to add some additional basic considerations for the audience that isn’t in a professional or enterprise environment. And for those who think they’re a professional just because they homelab, but aren’t. Amending minimum system requirements to briefly account for the various things you can stack on top of basic NAS functions with SCALE, some GUI pointers, etc.

These considerations are written from the perspective of someone who always ends up needing to edit TrueCharts documentation to add a degree of hand-holding far beyond what I’d imagine any user savvy enough to use SCALE and install apps on it would need.
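To make the “8GB for basic storage functions, 16GB+ when adding apps/VMs” wording concrete, here is a rough sketch of the kind of check I mean. It is not how SCALE decides anything: the function name `ram_tier` and the tier labels are made up, and rounding `MemTotal` up to whole GiB is my own assumption, made because the kernel reports a bit less than the installed RAM.

```shell
# Classify a system by installed RAM, using a /proc/meminfo-style file.
# ram_tier and its tier labels are hypothetical; the 8 and 16 GiB cut-offs
# mirror the wording suggested in the post above.
ram_tier() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB and is slightly below the installed size,
    # so round up to the next whole GiB before comparing (an assumption).
    awk '$1 == "MemTotal:" { gib = int(($2 + 1048575) / 1048576) }
         END {
             if (gib < 8)       print "below-minimum"
             else if (gib < 16) print "basic-storage"
             else               print "apps-and-vms"
         }' "$meminfo"
}
```

Run as `ram_tier` on the box itself, or pass a saved copy of `/proc/meminfo` as the first argument.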

When I was trying things out, it certainly did pop up a warning about memory when I looked at installing apps. I quickly backed out…

Ah, fair enough then. It’s been years (since Angelfish) since I’ve had to set up apps.

I am one such user: with 10GB of memory, 5x 4TB drives, a few apps using (say) 1GB of memory, and no VMs, I still get a 99.5% ARC hit ratio!! I guess there is enough memory to cache the ZFS metadata and a lot of the Plex metadata, and the media streams benefit from read-ahead.

So, I would agree that a minimum memory of 8GB will still likely be sufficient for reasonable performance from a relatively small amount of disk space - and I think that @kris and @Captain_Morgan are giving us good guidelines.

I just did a “sudo swapoff -a” and (as expected) swap went from 0.69GB to zero, and cache went from 4.45GB to 3.95GB (almost as expected).

I will report back as to whether this has a noticeable impact on the ARC hit ratio once I have had enough usage to tell.
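If anyone wants to take the same before/after reading without the dashboard, this is a small sketch that works out swap in use from the `SwapTotal` and `SwapFree` fields of `/proc/meminfo`. The helper name `swap_used_kb` is just something I made up.

```shell
# Print swap currently in use, in kB, from a /proc/meminfo-style file.
# swap_used_kb is a hypothetical helper name.
swap_used_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '$1 == "SwapTotal:" { swap_total = $2 }
         $1 == "SwapFree:"  { swap_free = $2 }
         END { print swap_total - swap_free }' "$meminfo"
}
```

Run it once before and once after `sudo swapoff -a`. Once no swap areas are active, both fields read 0 kB, so it prints 0.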

Thanks to @everyone for the feedback so far. I appreciate all the fixes and good work that iXsystems has put into 24.04.1 to correct the biggest issues, but I am still unclear on what issues remain, and on whether it will be stable enough for me to upgrade or whether I should wait for 24.04.2…

Software Status - TrueNAS Roadmap - Open Source NAS Software

You can generally follow the guidance here for “Conservative” if that adjective more closely follows your risk tolerance.

Thanks for posting this. I just migrated to Dragonfish-24.04.1.1 and came upon this energy usage issue.

To be clear, if I run the shell command you posted this resolves this issue 100% for now, correct?

echo n >/sys/kernel/mm/lru_gen/enabled

@Mark_the_Red I am not sure which energy usage you are talking about. But LRU_GEN should already be disabled in 24.04.1.1 at the kernel configuration level.

Sorry, Mav, for the dumb question. I am not on your level of technical understanding. I just changed all my hardware from scratch, and low power usage was a key buying criterion. So when I saw this, it concerned me.

To be clear, if I just installed Dragonfish-24.04.1.1 I do not need to do anything via shell command, correct?

So if I entered “echo n >/sys/kernel/mm/lru_gen/enabled”, then I need to run “echo n >/sys/kernel/mm/lru_gen/disabled” to get back to the stock settings, correct?

Hope I am being clear.

Dragonfish 24.04.1 includes the lru_gen fix. 24.04.1.1 has a fix for apps starting.

You should no longer be doing anything to lru_gen. It resets to the default on reboot, which is now “0”.
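If you want to confirm the state rather than change it, a read-only check along these lines should do. It is only a sketch: `lru_gen_state` is a made-up helper name. As far as I know, the kernel exposes only an `enabled` file for this switch (there is no `disabled` file), writing `y` or `n` to it turns the feature on or off, and reading it returns a hex bitmask such as `0x0007` or `0x0000`.

```shell
# Report whether multi-gen LRU is on, based on the kernel's "enabled" file.
# lru_gen_state is a hypothetical helper name; it only reads, never writes.
lru_gen_state() {
    file="${1:-/sys/kernel/mm/lru_gen/enabled}"
    if [ ! -r "$file" ]; then
        # Kernel built without multi-gen LRU, or the file is not readable.
        echo "unavailable"
        return 0
    fi
    value=$(cat "$file")
    case "$value" in
        0|0x0000) echo "disabled" ;;
        *)        echo "enabled ($value)" ;;
    esac
}
```

Running `lru_gen_state` with no argument reads the live file. Per the post above, it should report "disabled" on 24.04.1 and later after a reboot.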

Ok and thank you.