RAM Size Guidance for Dragonfish

I thought it might be nice to have a possible good news story… which is also an interesting data point…

I have a very low end… very much below minimum specs backup server running TrueNAS Scale.

It’s an Intel Core 2 Quad with 4GB of RAM running off a pair of USB disks. Yes. I’m naughty, and don’t deserve any cookies.

It’s been working well since upgrading to Dragonfish 24.04.0 final.

Previously with Cobia you could see swap was always in use… after boot… but since upgrading to Dragonfish… 0 bytes. Heh.

Maybe it’s too early to say… hard to tell… since TrueNAS Scale only keeps 1 week’s worth of reporting (at least in Cobia).

Will keep an eye on this system… over time. It receives replications every hour.

The curious thing is that I do not have SMB or NFS services active on this system, only SMART and SSH.

It’s a replication target, that is all.

2 Likes

That’s really weird! I presume it’s a backup target system? Maybe you don’t have enough ARC to cause the issue. What does the ARC reporting look like?

I’m putting in Prometheus to capture data so I can keep what I want. Even with iX supposedly expanding the retention, I want more useful info like VM resources, Kubernetes app resources, etc.

Yes, it’s a backup target.

ARC is still growing… we shall see :wink:


The 1-hour CPU chart, to show it does experience some load :wink:

Impressive it survived the backup, assuming that’s what I see. So, the solution to the problem isn’t more memory, it’s less! :clown_face:

I have a 3GB backup ZFS target, but, it’s not Scale, just Debian.

2 Likes

So, I went into the UI of this backup system and started interrogating snapshots, sorting them, deleting a bunch of snapshots for the .system dataset that I’d taken accidentally, etc.

This triggered a bit of swap. Looking at the 1-day view, it peaks at 543 MB and then drops to 130 MB.

What I think is interesting is what the memory usage looked like when this happened. ARC didn’t really recede much, forcing “used” to get paged out as “free” dropped.

And zooming in on the peak and after the peak:


“Cached” appears to be ARC, according to the dashboard numbers

It looks to me like it prefers to swap out rather than lower the cache. Or the swapping occurs faster than the cache drops.

Don’t get me wrong: as it is, I don’t really care, and the system was working fine. But if the issue is that the cache is forcing swap to be used because memory is full and the cache is not making way…
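
If you want to watch that behaviour as it happens, a rough way to do it from a shell on the box (plain Linux tools plus the OpenZFS kstats; nothing TrueNAS-specific, so treat it as a sketch):

# Print overall memory/swap usage and the current ARC size every 10 seconds.
# The ARC size comes from the OpenZFS kstat file; the third column is bytes.
while true; do
    date
    free -m
    awk '/^size / {printf "ARC size: %.0f MiB\n", $3 / 1048576}' /proc/spl/kstat/zfs/arcstats
    sleep 10
done

Running that while poking at snapshots should show whether the ARC actually shrinks under pressure, or whether swap grows first.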

1 Like

At 1-hour zoom, going back to the event…

Cached didn’t really budge… did it :wink:

ARC size (current):                                    34.5 %  999.4 MiB
        Target size (adaptive):                        36.9 %    1.0 GiB
        Min size (hard limit):                          4.2 %  122.6 MiB
        Max size (high water):                           23:1    2.8 GiB
        Anonymous data size:                          < 0.1 %  132.5 KiB
        Anonymous metadata size:                        0.1 %  796.5 KiB
        MFU data target:                               37.7 %  346.6 MiB
        MFU data size:                                 30.8 %  283.3 MiB
        MFU ghost data size:                                    59.0 MiB
        MFU metadata target:                           14.1 %  129.7 MiB
        MFU metadata size:                             13.6 %  125.5 MiB
        MFU ghost metadata size:                               146.8 MiB
        MRU data target:                               36.0 %  331.2 MiB
        MRU data size:                                  9.8 %   90.5 MiB
        MRU ghost data size:                                    54.5 MiB
        MRU metadata target:                           12.2 %  112.2 MiB
        MRU metadata size:                             45.6 %  419.5 MiB
        MRU ghost metadata size:                               199.7 MiB
        Uncached data size:                             0.0 %    0 Bytes
        Uncached metadata size:                         0.0 %    0 Bytes
        Bonus size:                                     0.5 %    4.9 MiB
        Dnode cache target:                            10.0 %  289.9 MiB
        Dnode cache size:                              12.1 %   35.0 MiB
        Dbuf size:                                      0.8 %    7.9 MiB
        Header size:                                    1.8 %   17.9 MiB
        L2 header size:                                 0.0 %    0 Bytes
        ABD chunk waste size:                           1.4 %   14.0 MiB

Where does the “Target size (adaptive): 36.9 % 1.0 GiB” come from? 25% of RAM? A hard floor?

It’s a game of juggling, constantly changing based on total RAM, free memory, non-ARC memory, ARC data/metadata requests, etc., to provide the best balance of efficiency (reading from RAM rather than from the pool’s physical storage) vs. flexibility (enough slack for system services, processes, and other non-ARC memory needs).

That target size can fluctuate anywhere between the min/max allowable values. ZFS is usually pretty good at actually storing in the ARC the amount of data defined by its target size. (As seen in your example.)

Another way to think of the target size: “If this is the general state of my system, then ZFS will aim for the ARC to be this size and keep it that way.” Meaning that you’ll have more data/metadata evictions the smaller the target size is, and fewer evictions the larger it is. You can think of it as a “separate RAM stick with a defined capacity”. Of course, this imaginary “RAM stick” can also dynamically resize based on many variables.
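
If you want to see where those numbers live, arc_summary is just formatting the OpenZFS kstats, so you can read the raw current/target/min/max values directly. A quick sketch for Linux (the third column is bytes):

# size = current ARC size, c = target size (adaptive), c_min/c_max = the hard limits
awk '/^(size|c|c_min|c_max) / {printf "%-6s %9.1f MiB\n", $1, $3 / 1048576}' /proc/spl/kstat/zfs/arcstats

On the system above, that should line up with the ~999 MiB current size, ~1.0 GiB target, 122.6 MiB min, and 2.8 GiB max that arc_summary reported.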

I figured it out, thanks to the power of A.I.!


arc-is-flash


Why does TrueNAS SCALE use flash-based storage to hold the ZFS cache?! Are you kidding me? :face_with_symbols_over_mouth: You’re supposed to keep the ARC in RAM. Now everything makes sense.

A.I. saves the day, once again! :partying_face:

3 Likes

FYI.

The TrueNAS Engineering team is making progress on this issue and is testing some scenarios.

We expect to be able to recommend the best mitigation and plans for a fix on Friday.

5 Likes

If you come back with “We’ve decided it’s best to recommend new installations to have a minimum of 512 GiB of RAM”, then I’m deleting my account.

6 Likes

Sounds like a perfect solution, along with limiting ARC to 50% so everyone has plenty of memory free! :wink:

1 Like

Gives me a good excuse to tell Mrs. Potatohead: I need more RAM to run the next version of TrueNAS. Yes, I need to upgrade to the next version for security updates!

2 Likes

I’m sorry, but I just have to drop this in here. :grin:

more-gooder-better-than-you

I didn’t say it! The A.I. chatbot said it!

I had to chuckle…

Seriously, I’m assuming that for Linux it involves far more than simply setting “total RAM minus 1GB” to get it to work?

1 Like

Ha! No expectations that any hardware changes are required…

3 Likes

Are you trying to get iX to drop SCALE and keep going with CORE?!

No.

Yes. :smiling_imp:

4 Likes

Cross-posting for reference:

1 Like

Cross-posting for visibility.

We can confirm that a significant contributor to the issue reported in this thread and others is a Linux kernel change in 6.6. While the proposed swap adjustments can mitigate the issue to some degree, the multi-gen LRU code in question will be disabled by default in a subsequent Dragonfish patch release, which should resolve the problem. This can be done in advance by following the directions below from @mav. Thanks to all of the community members who helped by reporting the issue, which aided in reaching the root cause. Additional feedback is welcome from those who apply the change below.

8 Likes
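
For anyone who wants to try it ahead of the patch release: the multi-gen LRU has a runtime knob under /sys/kernel/mm/lru_gen/ (see the kernel’s multigen_lru admin guide). A rough sketch of toggling it, which may not match @mav’s exact directions, so defer to those:

# Show whether the multi-gen LRU is currently enabled (non-zero bitmask = enabled)
cat /sys/kernel/mm/lru_gen/enabled

# Disable it for the running kernel (as root); this does not persist across reboots,
# so add it as a post-init command if you want it to stick until the patch release
echo n > /sys/kernel/mm/lru_gen/enabled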