Nothing in that log, besides multiple warnings about a deprecated log setting. However, those warnings stop at around 4 AM and return after 10 AM when I restart the SMB service, so at least I know roughly what time the service fails… I just don't know why…
Syslog revealed out-of-memory errors, and that the memory manager had shut down services.
I did not have any problems with v22, although I was already pushing it to the limit with 8 GB. It is running in Proxmox, so I have now upped it to 12 GB (non-ballooning).
Same error after a couple of hours.
Found a bug report that might be related - NAS-128788
It states, however, that it was resolved in 24.04.1 - which is the version I am running…
I did not, however, take a backup of the original file, so I can't revert to the original setting. Will this revert with the next update?
It seems, however, to have solved it: services now have quite a lot of RAM available (and SMB has not crashed yet), and the ZFS cache has a lot less.
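If you want to sanity-check what the ARC is actually using and capped at after a change like this, something along these lines should show it (just a sketch, assuming the standard OpenZFS kstat path on SCALE):

    # current ARC size and maximum, in bytes
    awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats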
I assume this is a performance hit for disk I/O?
But better than crashing services.
SMB crashed again after a couple of hours.
Either you are right that a reboot resets the ARC to default, because I did reboot after changing the setting (although memory management did seem different afterwards, since services had more memory), or there is another major bug.
Yes, a reboot does undo this. You should set a post-init task if you want it to persist on reboot. You don’t need to reboot for it to take effect.
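As a rough sketch (the path is the standard OpenZFS module parameter, and the value here is just an example - adjust it for your system), the post-init command would be something like:

    # cap ARC at 4 GiB; run as a Post Init command so it reapplies on every boot
    echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

In the UI that would be an Init/Shutdown Script of type Command, set to run at Post Init, if I remember the menu correctly.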
I wouldn’t think ARC using the majority of memory would be causing issues as it should resize dynamically as needed. I’ve demonstrated this in previous posts quite a few times, and I have never managed to cause an OOM condition despite attempts.
How much memory do you have?
As opposed to using zfs_arc_max, maybe take a look at zfs_arc_sys_free; this is the number of bytes that ARC should leave as free memory on the system. By default I believe it's 1/64th of total memory capacity (so 128 GiB would be 2 GiB free), but you could nudge this to test whether free memory availability is actually the issue. That way, if services start eating up memory, ARC won't grow beyond where you want it.
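Purely as an illustration (the 2 GiB figure is an assumption - pick whatever fits your 12 GB VM):

    # ask ARC to keep roughly 2 GiB of memory free for everything else
    echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_sys_free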
EDIT:
Also, now that you roughly know when the service stops, have you checked the Netdata stats to see what CPU usage, memory usage, etc. look like during that time?
Assuming smbd is getting killed by the OOM killer (it should be visible in /var/log/messages), it would be a good idea to investigate what is using memory and triggering the OOM condition. You can probably review this with htop / top, sorting by RES. Some variants of this can be misleading because they present separate entries for each thread of a multithreaded app (making middlewared appear to take up a staggeringly large amount of memory, when in reality the memory is shared between the threads).
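For example, something like this avoids the per-thread confusion by listing one line per process, sorted by resident set size (just a sketch; output columns may vary a bit between procps versions):

    # confirm who the OOM killer hit
    grep -i "out of memory" /var/log/messages

    # top memory consumers, one line per process, sorted by RSS
    ps -eo pid,rss,comm --sort=-rss | head -n 15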