But swap does serve an important purpose: preventing OOM during extreme situations.
Why would it be okay to let the system hit OOM without even the possibility of a little swap being used to prevent it? With swap, the system can keep running normally and eventually hold everything in RAM again once the load is reduced.
I’ll try not to use the “F” word, but this seems to be an issue with how the OS handles swapping, rather than with the presence of available swap in the first place.
This is why I disagree with outright disabling swap. It could cause issues for users with VMs, apps, and large data reads, where RAM alone cannot hold everything at certain peak usages.
Even a tiny bit of swap (such as 1 GiB) can provide a temporary safety cushion against OOM.
EDIT: And are we sure that setting the swappiness to “1” doesn’t yield the same benefits as disabling swap completely? (If swap is still aggressively used even at a value of “1”, then goodness, the Linux kernel developers need to fix that.)
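For anyone who wants to experiment with this, the effective value is exposed under /proc and can be read without root. A minimal sketch (assumes a Linux system; the path is standard):

```shell
#!/bin/sh
# Read the kernel's current swappiness setting (no root needed).
swappiness=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness = $swappiness"
```

Changing it at runtime would be `sysctl vm.swappiness=1` (as root), though on an appliance like SCALE you’d presumably want to set tunables through the supported mechanism rather than by hand.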
We may make the presence of swap configurable; again, that is the debate.
Personally I don’t want that cushion. I’d rather fail early and fail hard, instead of entering a state where the system just starts performing poorly and exhibiting odd behaviors on top of it. It’s much easier to recognize an OOM and figure out why it happened than the alternative, which could be going catatonic in random places.
The legacy reason we had swap in FreeNAS/TrueNAS before was for kernel crash dumps, less about OOM protection. But we’ve not used that in a LONG time, in case anybody was curious. I’ve personally been running systems here for a long time (both BSD and Linux, with ZFS) without swap. ARC tends to just do the right thing and shrink when needed. If it can’t shrink further, then you’ve got major issues elsewhere, i.e. too many apps, or a memory leak in some userspace service that is going to take the system down rudely either way.
Does swap actually prevent OOM, or just add an additional cushion? If a process manages to leak until it exhausts all RAM and spills into swap, it’ll eventually consume that as well and still OOM, won’t it?
I don’t think disabling swap should be the permanent solution, even if I have no desire to use it. I’m curious whether something in the new ARC size handling is the cause of the problem, or whether the root problem has been around for a long time but never reared its head due to the smaller ARC limit. I’m leaning towards the former – I’d expect a number of people increased ARC prior to the auto-sizing, and I don’t recall seeing this problem discussed.
Agreed. But I was referring to “peak” situations for a user with VMs, apps, high loads, etc., where RAM usage temporarily reaches a point where the system might need to swap out rather than OOM.
(Yes, a memory leak would just delay the inevitable, so that requires fixing the culprit rather than relying on swap to save you.)
Not to stray too much off topic, but is any of this being communicated to the Linux kernel devs?
Even if we get the best of all worlds for SCALE[1] (i.e., FreeBSD-like ARC behavior + no more issues with swap or crippling slowdowns), I think Linux needs to do some serious rework of their memory management if simply using ZFS + a very high zfs_arc_max causes the system to needlessly swap to disk.
EDIT: To be more clear on this point, it’s silly that removing the very presence of swap allows the kernel to behave more sanely.
It’s like having a dog that tears up all your furniture and attacks your guests because you have a box of treats visibly set on the table. But then if you remove the box of treats from the house, your dog starts behaving properly. (Even though it always could have this entire time!)
It’s looking to be the case that we might have a long-term solution for SCALE soon, without sacrificing the benefits of the ARC. ↩︎
Once we get a bit further and understand the deeper “why”, then perhaps. What’s odd is that swap is still being used even when plenty of free memory is still available, ARC or no ARC. That’s the behavior I’d like to understand fully. There’s zero reason to swap at all if you still have plenty of RAM to spare.
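One way to dig into that “why” is to see exactly which processes have pages swapped out. A minimal sketch that reads the per-process `VmSwap` field from /proc (no extra tools assumed beyond awk):

```shell
#!/bin/sh
# List processes with non-zero swap usage (VmSwap in /proc/<pid>/status),
# largest first, plus the system-wide totals from /proc/meminfo.
swappers=$(for st in /proc/[0-9]*/status; do
  awk '/^Name:/   {name=$2}
       /^VmSwap:/ {if ($2+0 > 0) printf "%8d kB  %s\n", $2, name}' "$st" 2>/dev/null
done | sort -rn | head -n 15)
totals=$(awk '/^Swap(Total|Free):/ {print $1, $2, $3}' /proc/meminfo)
echo "$totals"
echo "$swappers"
```

If swap is being consumed while `MemAvailable` is still large, this at least shows whether it’s one service being paged out or pressure spread across everything.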
I just set my swappiness to 1 from the default of 60 and re-enabled swap. I’ll monitor over the next few days and let y’all know what happens.
Yesterday, before disabling swap, I had also limited my ARC to 175 GiB out of my 228 GiB available – I wanted to make sure I didn’t have any OOM issues, so I trimmed that way down. Unlike @kris , I didn’t want to fail early or hard. Perhaps I can bump up my ARC a bit more if this proves stable.
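If it helps with the monitoring over the next few days, a quick memory/swap snapshot can be pulled straight from /proc/meminfo (a minimal sketch, no extra tools assumed; suitable for a cron job or a watch loop):

```shell
#!/bin/sh
# One-line timestamped memory/swap snapshot for trend-watching.
snapshot=$(awk '/^(MemAvailable|SwapTotal|SwapFree):/ {printf "%s %s %s  ", $1, $2, $3}' /proc/meminfo)
echo "$(date '+%F %T')  $snapshot"
```

Watching SwapFree drop while MemAvailable stays high would reproduce exactly the odd behavior discussed above.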
Yeah, unfortunately I’m not in a position right now where I want to reboot this box. It’s not a “mission critical” box for me, but rebooting it right now would cause some headache.
Question: what would theoretically happen if I set zfs_arc_max to more than the available RAM? Would I fail hard, or is ZFS smart enough not to exceed the available RAM?
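Not an authoritative answer on the fail-hard question, but you can at least read the current ceiling back from the module parameter without rebooting. A sketch, assuming the standard Linux sysfs path for the loaded zfs module (a value of 0 means ZFS picks its own default):

```shell
#!/bin/sh
# Read the configured ARC maximum, if the ZFS module is loaded.
param=/sys/module/zfs/parameters/zfs_arc_max
if [ -r "$param" ]; then
  arc_max=$(cat "$param")
  echo "zfs_arc_max = $arc_max bytes (0 = auto)"
else
  arc_max=""
  echo "ZFS module not loaded; $param not present"
fi
```

My understanding (not verified on this release) is that the module will accept a value above physical RAM, but the ARC still can’t actually grow past what the kernel will give it – so setting it near or above RAM size seems like asking for exactly the OOM behavior this thread is about.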