RAM Size Guidance for Dragonfish

Additional data points after the system was left entirely unattended since my last post ~22 hours ago (screenshots are on Imgur):

In the first picture, you can see that swap usage occurred again a couple of hours after my last post here, despite free RAM sitting at the same level at the moment swap was engaged. In the middle of that graph, a bit of the swap usage is released and available RAM goes back up; then, roughly 4 hours later, free RAM drops again and this time swap usage climbs higher than before.

In the second picture, you can see me simply trying to load the Disks page of the web UI just now. After only a bit over 2 days of uptime, the web UI is now so slow that this page takes almost 15 seconds to load instead of the usual 2-3 seconds. The same behaviour shows up on other pages, such as the Reporting/graphs page.

The third photo shows disk I/O activity over the last 24 hours on the Samsung SATA SSD housing my SCALE install. Here we can see that right before the swap usage in the first photo began, a ton of writes hit the SSD for roughly an hour straight.

The fourth photo shows CPU metrics for the last 24 hours. Note the sporadic, raised CPU usage from 06:00 to 07:00 despite the system being unused at that time; a spike at 11:00, with others over the course of an hour; a minor increase from 13:00 to 14:00; and then the system was completely idle until 19:00, when a ramp in CPU usage began that has persisted and is still continuing.

Throughout this time the system was entirely unused by anyone, and I was actually asleep at 19:00 when the CPU usage started up again.

In the fifth/last photo, I ran ‘top’ in the SCALE GUI Shell and, sure enough, the top CPU culprit is middlewared, with 4 processes doing… whatever it is middlewared does. This is constant and continual, has presumably been the case since 19:00 in the photo above, and I’ll now need to go and reboot SCALE once again in order to reclaim the quarter of my CPU that middlewared is using.

This may all be moot or otherwise irrelevant once the next SCALE update comes out with the aforementioned fixes in it; we’ll have to see and hope. I probably won’t post in here again, since it would just be me reposting the same repeating behaviour, but hopefully it’ll encourage others to share data from their systems as well.

1 Like

Curious, what value does this command yield?

cat /sys/module/zfs/parameters/zfs_arc_max

I removed my post-init value from Cobia before updating to Dragonfish, and it is 0 for me on Dragonfish with default settings.

What does this reveal?

arc_summary | grep "Max size"

And how much physical RAM is available to the OS?

Total available Memory is 62.7 GiB

1 Like

Interesting. So this “tweak” isn’t simply changing the parameter’s value upon bootup. They must have modified the ZFS code itself for SCALE?

Because “0” is the “operating system default”, which for upstream OpenZFS for Linux is 50% of RAM. However, even though you’re using “0” for the default… it’s set to exactly 1 GiB less than physical RAM. (AKA: The “FreeBSD way”.)

@bitpushr, do you find any relief from these issues if you apply this “fix” and then reboot?
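(For anyone following along: the “fix” being discussed is a post-init tweak that caps the ARC maximum. The linked post may word it slightly differently, but a minimal sketch, if you simply write your chosen byte value into the ZFS module parameter, would be:

echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max   # example value only: 32 GiB; use 50% of your own RAM in bytes

The value shown here is just an example; work out the right number for your own system as discussed further down.)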

Confirm the change is in effect (after you reboot) with this command:

arc_summary | grep "Max size"

Simply outputs “0”

Take a look.

What about this?

I know it will require a reboot, so whenever it’s convenient for you.

I’ve set it. There’s 64 GB in my system as well, so I just copied the command from the post you linked and set it as a Post Init command. I can’t reboot the system currently, though, and once I do I’ll have to observe system behaviour for at least 24 hrs to see if there are any differences.

1 Like

Don’t do that! The user has 128 GiB of RAM, not 64!

You need to calculate what 50% of your RAM is to use for that value.
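As a worked example for a 64 GiB machine:

50% of 64 GiB = 32 GiB = 32 × 1024³ bytes = 34359738368

which is the kind of number to plug into the post-init command for zfs_arc_max.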

1 Like

Ah, true, nice catch. Done. Will chime back in probably in a couple of days’ time once I’ve found a window to reboot the system and give it a day or two to observe.

1 Like

Thanks for this.
This will help to see if there are any specific patterns for when the issue occurs.

1 Like

Just chiming in to update: I’ve just rebooted the SCALE system after applying the previously recommended changes.

Running "arc_summary | grep “Max size” gives a result of ‘Max size (high water): 16:1 32.0 GiB’

I’ll now leave the system completely unattended, as I normally would, for at least 24hrs before looking back at it and chiming back in.

1 Like

62 TrueCharts apps and 256 GB RAM.

Updated to DF yesterday and I’m having terrible performance issues, pretty much the same as it was on Cobia. I cleaned all my snapshots and my system is stable apart from the apps. Editing and saving an app takes a long time. Often the whole UI crashes and is stuck for 5 minutes on the login loading page.

I have migrated another server with fewer apps and 32 GB RAM, and that server is snappy and responsive.

Please IX, hear me out: fix the rubbish GUI. It doesn’t make any sense, with 256 GB and 72 cores, to have something that feels this broken.

Just seeing this reply to my post now. FYI, it now shows as “34360000000”; I can report back once the post-init workaround is removed in the future, if needed.

After upgrading to Dragonfish, my system would lock up completely and stop responding to network and console when running Cloud Sync jobs (even a “dry run” would cause a crash). I am syncing with Backblaze and have the “--fast-list” option checked (I don’t know if that makes a difference, though).
Limiting ARC to 50% solved this, and the system now seems to be running stably again.
This is a ten-year-old server that has been absolutely stable through all upgrades of FreeNAS/TrueNAS. It has 16 GB RAM and runs a number of Docker containers (mariadb, grafana, nextcloud, influx, plex, etc.).

1 Like

Dropping my own thread in as it looks to be related to swap as well. The issue with the boot drive can be ignored as it’s unrelated and I think that’s my own fault for rebooting the system so readily (though it certainly was an odd issue).

As I mentioned in the most recent post, swap usage is way up compared to Cobia, so I’m playing around with ARC limits to ensure I don’t hit this. I’ve also temporarily outright disabled swap to avoid running into issues while I work :stuck_out_tongue:
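(For reference, in case anyone wants to do the same: swap can be switched off from a shell, at least until it gets re-enabled or the system reboots, and you can confirm nothing is active afterwards:

swapoff -a
swapon --show   # prints nothing once no swap devices are active

I haven’t checked whether SCALE re-enables swap on its own at boot, so treat this as a temporary measure.)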

@kris @Captain_Morgan

Is zram a viable alternative to outright disabling swap?

I’ve had success with it on Arch Linux (non-server, non-NAS), but I’m wondering whether it would serve SCALE users well.

  • No need for a swap partition / swap-on-disk
  • Anything that needs to be “swapped” will remain in RAM in a compressed format (ZSTD)

So under ideal conditions, it never gets used. However, to prevent OOM situations, there’s a non-disk safety net that should theoretically work at the speed of your RAM.

I’m not sure whether there are any caveats to running zram alongside ZFS, however.
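For what it’s worth, here’s a minimal sketch of how a zram swap device is typically set up by hand on a generic Linux box (untested on SCALE; the 8G size and the priority are placeholders, and zstd assumes the kernel exposes that compressor):

modprobe zram num_devices=1                    # create the zram block device
echo zstd > /sys/block/zram0/comp_algorithm    # must be set before disksize
echo 8G > /sys/block/zram0/disksize            # uncompressed capacity of the device
mkswap /dev/zram0
swapon -p 100 /dev/zram0                       # higher priority than any disk-backed swap

Distros that ship zram-generator or a zramswap service wrap essentially the same steps in a config file.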

2 Likes