TrueNAS SCALE Dragonfish 24.04.RC.1 and Beta: Services memory slowly climbs until memory is gone

I have 2 servers running SCALE and upgraded them both to 24.04 when the beta came out. I noticed that, slowly over time, the Services memory usage would climb from around GiB up to all of the memory available on the machine. I then upgraded both to the RC1 version to see if that helped, but the issue persists. Both of them have 32 GiB, and this last time before rebooting, Free was at 0.7 GiB, the ZFS Cache had dropped to 1.0 GiB, and Services had climbed to 29.6 GiB. There are no VMs running on either machine and the only services running are NFS, SMART, SMB and SSH. I tried turning each one off to see if memory was reclaimed, but it didn’t seem to help, so in the end I just reboot. They can go about 1 to 2 weeks before the ZFS cache is down to around 1 GiB and a reboot is needed. I looked for other recent posts about the issue and haven’t seen any. I didn’t see this issue on any of the previous releases; it only started after moving to Dragonfish.

Thanks,

Jeff

If the system is basically vanilla with no apps / VMs running, I’d try a clean install first and then restore the configuration. If that also fails, I’d suggest a bug report.

This would be normal if one of your services had a memory leak.

Maybe top or something similar can show what’s using all the ram?

That does sound like a memory leak. To troubleshoot it here, the output of top or htop sorted by memory usage would be helpful. Otherwise a bug ticket and debug file.

Thanks!

Yes thanks for the replies. Both systems are vanilla with no apps and no VMs. htop shows the top processes as all middlewared:

This htop output, though, shows the current system state, where memory hasn’t climbed all the way back up. System one, which was rebooted 4 days ago, is currently using 8.2 GiB. System two, which has been up for 26 days, is using 23.2 GiB out of 32. htop on the second machine currently looks like this:

Thanks,
Jeff

If it’s middleware, then that sounds like a legit bug. Can you file a bug ticket (Link up top) and attach a debug file from one of the systems where it is showing signs of that memory leak?


Thanks for your help. I have created a bug ticket and uploaded the debug files.

Thanks,
Jeff

Those screenshots don’t show a memory leak. The tool you’re running seems to show threads as separate processes.
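One way to sidestep the threads-versus-processes question is to read `/proc` directly, since its top-level numeric directories list processes rather than individual threads. The following is only a rough sketch, assuming a Linux host such as SCALE; the function names `parse_status` and `rss_by_name` are made up for illustration. It sums resident memory (`VmRSS`) per process name. Shared pages are counted once per process, so the totals can overstate real usage, and the `Name:` field is truncated by the kernel, so worker processes may show up under a shortened name.

```python
import os
from collections import defaultdict


def parse_status(text):
    """Return (name, rss_kib) from the text of a /proc/<pid>/status file.

    Kernel threads have no VmRSS line, so their RSS is reported as 0.
    """
    name, rss_kib = "", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     123456 kB"
            rss_kib = int(line.split()[1])
    return name, rss_kib


def rss_by_name(proc_root="/proc"):
    """Sum VmRSS (in KiB) per process name across all processes."""
    totals = defaultdict(int)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        totals[name] += rss
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(rss_by_name().items(), key=lambda kv: kv[1], reverse=True)
    for name, kib in ranked[:10]:
        print(f"{kib / 1024 / 1024:8.2f} GiB  {name}")
```

Running this a few hours or days apart and comparing the top entries should show whether one process name is actually growing, which is what a leak would look like.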

See also TrueNas Scale Memory Usage slowly increases to 100% after adding GPU and using it with App

I’m seeing something similar here.
When I reboot my system, a small amount of memory is allocated to ZFS Cache. This would top out to about 50% of my memory on the previous version, but now I’m seeing ZFS Cache consume just about all of it to the point where I’m no longer able to start any VMs.

Reboot: [screenshot: TNS Reboot]

Idle: [screenshot: TNS Idle]

Memory Usage: [screenshot: TNS htop]
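To put numbers on the ZFS Cache behaviour rather than rely on dashboard screenshots, the ARC counters can be read from `/proc/spl/kstat/zfs/arcstats`, which OpenZFS exposes on Linux. This is a minimal sketch, not an official tool; `parse_arcstats` is a name chosen here for illustration. It compares the current ARC size (`size`) with the configured ceiling (`c_max`).

```python
GIB = 1024 ** 3


def parse_arcstats(text):
    """Parse arcstats text into a {name: int} dict.

    Data rows have three whitespace-separated fields: name, type, value.
    Header rows do not match that shape and are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


if __name__ == "__main__":
    with open("/proc/spl/kstat/zfs/arcstats") as fh:
        stats = parse_arcstats(fh.read())
    print(f"ARC size: {stats['size'] / GIB:.1f} GiB")
    print(f"ARC max:  {stats['c_max'] / GIB:.1f} GiB")
```

If `c_max` is close to total RAM, the cache growing to fill memory is the configured limit being reached rather than a leak. Whether the ARC then shrinks quickly enough for a VM to start is a separate question.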

not sure if related, but i’m using the latest dragonfish release, fresh install. the only thing i did was set up jailmaker and a docker jail, then deploy 20 active docker containers.

noticed this error message on ram


other settings i changed: enabled daily snapshots (retain 1 week), scheduled smart tests (1 week/1 month), raid scrubs (1 month).

i didn’t enable any optional vdevs, it’s just a basic setup: 4x4tb raidz1 for the media pool, and 2x m.2 sata ssd mirrors for the vm pool. 1 single boot drive via usb enclosure (m.2 nvme ssd). swap was disabled during installation setup.

We have a known issue for this in 24.04.0

https://ixsystems.atlassian.net/browse/NAS-128544


Hello, can you send me the Jira link? I also encountered this problem.

My original Jira ticket was actually closed last week with a request that I upgrade to the release version of 24.04.0 and test it. If the issue was still happening, they asked that I create a new ticket. I upgraded last week and have let the systems run for about a week now. Both of them are still seeing the same issue, so I have created a new ticket and uploaded the debug files today. Here is the new Jira ticket:

https://ixsystems.atlassian.net/browse/NAS-128871


Actually, I encountered this problem a long time ago. In version 22 of my TrueNAS, the memory would grow without limit until it crashed, but I haven’t found a solution yet. Today I finally saw a user on the forum who has encountered the same problem.