Dragonfish swap usage/high memory/memory leak issues

Fix is being tested right now for .1, but fundamentally we had two different issues here, which users were hitting in a variety of ways.

First, we had the bad ARC/kernel LRU behavior, where memory was being pushed out to swap too aggressively, causing poor system performance and stability issues.

Second, swap in general was an issue by itself. The issue there was twofold.

  1. We don’t test with swap on enterprise hardware; it is disabled there.

  2. Swap, the way it was originally implemented back in the day, gave users an option to put a swap partition on boot devices. The ARC issue only exposed one aspect of this design problem. What happens in general if you have swap usage that is thrashing your boot device? Turns out the answer is all sorts of no-good things, including performance and stability issues and other random, undefined, exotic behaviors that mask the real issue. Never mind that many systems don’t use quality boot devices, and all that constant extra write load can wear them out too quickly.

After consideration of both sets of real-world problems, in .1 we will disable swap by default AND enable the LRU / swappiness fixes. Users who want that safety cushion of on-disk space can re-enable it if they so desire. When re-enabled, the ARC LRU / swappiness settings will be in effect and hopefully prevent swap from being used too aggressively, but remember: once you do start to swap too much, you are opening the door to a whole host of other unpredictable behaviors.
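If you do re-enable swap and want to keep an eye on it, here is a minimal sketch (not an official tool) that reads the standard Linux `/proc/meminfo` and `/proc/sys/vm/swappiness` files and reports how much swap is in use. The function names and the 25% warning threshold are my own assumptions for illustration, not values taken from the fix.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_fraction(meminfo: dict[str, int]) -> float:
    """Return the fraction of swap in use, or 0.0 when swap is disabled."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - meminfo.get("SwapFree", 0)) / total


if __name__ == "__main__":
    WARN_FRACTION = 0.25  # hypothetical threshold, pick your own

    info = parse_meminfo(Path("/proc/meminfo").read_text())
    swappiness = Path("/proc/sys/vm/swappiness").read_text().strip()
    used = swap_used_fraction(info)

    print(f"vm.swappiness = {swappiness}")
    print(f"swap in use   = {used:.1%} of {info.get('SwapTotal', 0)} kB")
    if used > WARN_FRACTION:
        print("warning: swap usage is high, expect degraded performance")
```

On a system with swap disabled, `SwapTotal` is 0 and the script simply reports 0.0% in use.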

While ARC brought the issue to the forefront, we still have too many other reports of general instability where swap is heavily utilized. Apps / VMs being used more often on SCALE I’m sure helps drive this behavior. Turns out you generally don’t want to deploy 20+ applications on a system with only 16GB of memory. :wink:
