Thanks for the info.
Yeah, I noticed a while back that despite having 512 GB of RAM, only 30-40 GB allocated by services, and always 30-40 GB “free”, swap was still being used. Over a long enough time, the entire 10 GB of swap would be in use. It made me scratch my head, but I didn’t expect it to be an issue.
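For anyone who wants to check the same thing on their own box, here is a minimal Python sketch that reads swap usage from /proc/meminfo. The function names (parse_meminfo, swap_used_kib) are just ones I made up for illustration; the SwapTotal and SwapFree fields are the standard Linux ones, reported in kB.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kib(fields):
    """Swap in use = SwapTotal - SwapFree (both in kB in /proc/meminfo)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        print(f"swap total: {info.get('SwapTotal', 0)} kB")
        print(f"swap used:  {swap_used_kib(info)} kB")
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

Running it every so often (or from cron) would show whether swap use creeps up over time even while plenty of RAM is reported free.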
My use case involves a ton of open network connections, a lot of running containers, and a large number of files being accessed simultaneously. Increasing the sysctl values really seemed to have an impact, which makes me wonder whether some of the defaults are perfectly fine for a desktop or single-service server but not for a system where a lot of services and users are accessing a lot of files. The fs.inotify.max_user_instances default of 128 was definitely a problem for the number of containers. The fs.inotify.max_user_watches default seemed to be a problem when a lot of files are accessed (rsync, remote peers). Raising net.core.somaxconn seemed to help with the number of connections…
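In case it helps anyone compare, here is a rough Python sketch for printing the current values of those three settings. It assumes the max_user_instances and max_user_watches I mentioned are the fs.inotify ones, and it relies on the usual Linux convention that a sysctl name maps to a file under /proc/sys with the dots replaced by slashes. The helper names (sysctl_path, read_sysctl) are hypothetical.

```python
from pathlib import Path

# Assumed full names of the settings discussed above.
SYSCTLS = [
    "fs.inotify.max_user_instances",
    "fs.inotify.max_user_watches",
    "net.core.somaxconn",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    try:
        return int(sysctl_path(name, root).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    for name in SYSCTLS:
        print(f"{name} = {read_sysctl(name)}")
```

This only reads the values; it does not change anything, so it is safe to run before and after a reboot to confirm which settings are actually in effect.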
Is it possible these values are only ‘too low’ because swap/memory issues are delaying the release of the above resources and allowing them to become exhausted?
So far, with the above values increased and swap disabled, TrueNAS has been SNAPPY for the past day or two no matter what I throw at it. As a test, tonight I will revert the modified sysctl values to their defaults, reboot, and disable swap. If the UI remains responsive for a few days, I’ll feel confident that swap is the only issue.