RAM Size Guidance for Dragonfish

There have been some reports of Dragonfish failing on systems with small amounts of RAM. The guidelines for RAM are below:

Memory Sizing

TrueNAS has higher memory requirements than many Network Attached Storage solutions for good reason: it shares dynamic random-access memory (DRAM or simply RAM) among sharing services, add-on plugins, jails, virtual machines, and sophisticated read caching. RAM rarely goes unused on a TrueNAS system, and enough RAM is vital to maintaining peak performance. You should have 8 GB of RAM for basic TrueNAS operations with up to eight drives. Other use cases each have distinct RAM requirements, listed below (a worked example follows the list):

  • Add 1 GB for each drive added after eight to benefit most use cases.
  • Add extra RAM (in general) if more clients connect to the TrueNAS system. A 20 TB pool backing many high-performance VMs over iSCSI might need more RAM than a 200 TB pool storing archival data. If using iSCSI to back up VMs, plan to use at least 16 GB of RAM for good performance and 32 GB or more for optimal performance.
  • Add 2 GB of RAM for directory services for the Winbind internal cache.
  • Add more RAM for plugins and jails, as each has specific application RAM requirements.
  • Add more RAM for virtual machines with a guest operating system and application RAM requirements.
  • Add the suggested 5 GB of RAM per TB of deduplicated storage, since deduplication depends on an in-RAM deduplication table.
  • Add approximately 1 GB of RAM (conservative estimate) for every 50 GB of L2ARC in your pool. Attaching an L2ARC drive to a pool uses some RAM, too. ZFS needs metadata in ARC to know what data is in L2ARC.
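
To make the arithmetic in the list above concrete, here is a minimal Python sketch that simply adds up the published figures. The function name and defaults are my own, not an official calculator, and per-app, per-jail, and per-VM RAM still has to be added on top:

```python
# Rough RAM sizing estimate following the guidelines above.
# The 8 GB base, 1 GB per drive past eight, 5 GB per TB of dedup,
# and ~1 GB per 50 GB of L2ARC come from the list; everything else
# here (function name, defaults, rounding) is illustrative only.

def estimate_ram_gb(drives: int,
                    dedup_tb: float = 0.0,
                    l2arc_gb: float = 0.0,
                    directory_services: bool = False,
                    iscsi_vm_backing: bool = False) -> float:
    ram = 8.0                            # base for up to eight drives
    ram += max(0, drives - 8) * 1.0      # 1 GB per drive past eight
    if directory_services:
        ram += 2.0                       # Winbind internal cache
    if iscsi_vm_backing:
        ram = max(ram, 16.0)             # 16 GB minimum, 32 GB+ preferred
    ram += dedup_tb * 5.0                # in-RAM deduplication table
    ram += l2arc_gb / 50.0               # ARC metadata for L2ARC
    return ram

# Example: 12 drives, no dedup, 500 GB L2ARC, AD-joined, iSCSI VM storage
print(estimate_ram_gb(12, l2arc_gb=500, directory_services=True,
                      iscsi_vm_backing=True))  # -> 26.0 GB before VM/app RAM
```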

We do not recommend using a SWAP drive. ZFS expects the ARC to be in RAM and will not behave well if it is actually on a disk drive.

If you are having problems, please describe your system here and indicate whether you think it is within the guidelines.

After we gather this info, we can see if TrueNAS could fail more gracefully with better warning messages.

4 Likes

No version of SCALE supports plugins or jails in the strict sense, though I guess Jailmaker in Dragonfish gets closer–“apps” would seem to better align with the product than “plugins and jails.”

Then why do you create a swap partition on the boot device (if it’s big enough), and on every data drive (or did that go away in SCALE when I wasn’t looking)?

Dragonfish doesn’t work at all for me–it takes 20+ minutes to import all my pools (Cobia on the same hardware takes 52 seconds), causing apps to completely fail–but I think the post here can be clarified a bit.

1 Like

8GB remains the official minimum then?

At this stage we think it works, but let’s gather data on real problems and see whether these guidelines need to be revised.

Agreed that terminology should be fixed and replaced with Apps and Sandboxes.

2 Likes

That’s a serious degradation (probably not related to low RAM issues). Can we file a bug report on that? Is there a NAS ticket?

The worst that can happen is that it’s used as a “safety mattress” if there is not enough RAM for the current needs. (Better that than the entire system crashing, anyway.) Not once has Core ever used swap for me, not even a kilobyte, even though it always has a 16 GiB swap partition available at all times on the boot disk.

I don’t believe the presence of a swap partition is the problem itself, but rather that the system is resorting to using swap in the first place.

The question is: Why would it? Is this perhaps the reason why OpenZFS for Linux defaults to limiting the maximum ARC size to 50% of physical RAM, in order to prevent issues with non-ARC pressure on Linux?

My comment was about using a dedicated swap device.

The 2GB partition on each pool drive was a compatibility process for CORE.

The 16GB swap partition on the boot device is used only as a safety net AFAIK.

1 Like

Isn’t this what the installer offers if it detects a non-USB boot drive that is 64 GiB or larger?

Let’s gather some real problem cases and help troubleshoot.

1 Like

Already have, and with 128 GB RAM in my system, it seems pretty unlikely it’s due to lack of RAM:

Not to derail, and I believe this is pertinent:

The swap being used on the boot device is not likely the issue in and of itself, and might only be a canary in the coal mine. In fact, if there were no swap device present, then the people facing performance / stability / crashing issues with Dragonfish would suffer from them even more.

So something happened from Cobia → Dragonfish that systems with 8 GiB of RAM seem to unearth. (It’s likely also a problem for systems with 16 GiB of RAM, though it might take longer to manifest, and/or “stressing” the system with more work, services, etc., would reveal the same issues.[1])

I believe that “something” with Dragonfish is the new parameter that allows the ARC size to reach levels beyond 50% of total available RAM. (Wasn’t it set to “unlimited” or something that mimics FreeBSD’s default?)


  1. EDIT: Turns out that systems with 64 and 128 GiB of RAM also experience this. ↩︎

@Captain_Morgan Was there something specific that prompted this thread? This is not the first time I’ve seen very similar recommendations for how much RAM a TrueNAS system requires to run certain applications/processes.

This was the first time I’d seen the recommendation of no SWAP, however you did clarify what you meant. I think we can all agree SWAP should never be used; if it is, you don’t have enough RAM. However, it is valuable to have available should that one time happen where it is needed. Honestly, it would be great to have a TrueNAS notification if the SWAP file gets used (date/time). This would tell us a lot during troubleshooting, and the user would know they ran out of RAM. Guess that is a feature request.
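
For what it’s worth, a notification along those lines could be approximated today with a small script. This is only a hedged Python sketch (not a built-in TrueNAS feature) that reads /proc/meminfo and logs a timestamped warning whenever any swap is in use; it could be run from a cron job or post-init script:

```python
# Sketch of the swap-usage notification suggested above. Not a TrueNAS
# feature; it just compares SwapTotal and SwapFree from /proc/meminfo
# and prints a timestamped warning if any swap is currently in use.
from datetime import datetime

def swap_used_kib() -> int:
    """Return swap currently in use, in KiB, from /proc/meminfo."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            fields[key] = int(value.strip().split()[0])  # values are in kB
    return fields["SwapTotal"] - fields["SwapFree"]

used = swap_used_kib()
if used > 0:
    print(f"{datetime.now().isoformat()} WARNING: {used} KiB of swap in use")
```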

Are there a lot of issues coming to light with Dragonfish? Or is it more of the same old thing where someone grabs a piece-of-crap computer, tries to make it a high-end commercial NAS, and then is upset when it constantly crashes.

2 Likes

My experience with Ubuntu etc. is that a little bit of swap is a good thing. It allows some of the older processes in the system to swap out allocations that are rarely used or will most likely never be used again.

If it never swaps back in, that’s fine. The real problem is when you are actively swapping, i.e. paging out and in.

4 Likes

Maybe something happened, but I was hoping for real cases to be collected with actual data.

1 Like

I would “ping” the usernames that have posted such threads / replies. They might not really “notice” this thread.

EDIT: The topic title might attract more attention with something like “RAM / swap / performance issues with Dragonfish? Please share your experience in here”

I don’t know… hence we want to find real examples if they are happening. In another thread there was one report, but not well documented.

1 Like

Some users of note:

@cmplieger posted this. Slowdowns when running qBittorrent after some time, until system eventually freezes. No issue when reverted back to Cobia. (Update: While in Dragonfish, issues are resolved when the ARC is limited to 50% of RAM.)

@Noks posted this. Slowdowns, swapping, freezing, and sluggish web UI with Dragonfish. They resolved their issue by setting a parameter to limit the ARC to 50%.

@anto294 posted this. Same issue, same solution: Limiting the ARC maximum size to 50% of RAM resolved the problem.

@SnowReborn posted this. Slowdowns and freeze-ups that started with Dragonfish. Temporarily resolved with reboots, until it occurs again. (User has not tried changing the parameter to limit ARC to 50% of RAM followed by a reboot to see if it resolves their issues.)
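
For reference, the 50% cap these reports describe boils down to setting the standard OpenZFS module parameter zfs_arc_max to half of physical RAM. A minimal Python sketch follows (run as root; the runtime write does not survive a reboot, so on TrueNAS it would typically be wrapped in a post-init script instead):

```python
# Sketch of the "limit ARC to 50% of RAM" workaround, assuming the
# standard OpenZFS module parameter zfs_arc_max. The write takes effect
# immediately but is not persistent across reboots.
def mem_total_bytes() -> int:
    """Read total physical RAM from /proc/meminfo (reported in kB)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")

arc_max = mem_total_bytes() // 2  # cap ARC at 50% of physical RAM
with open("/sys/module/zfs/parameters/zfs_arc_max", "w") as f:
    f.write(str(arc_max))
print(f"zfs_arc_max set to {arc_max} bytes")
```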

5 Likes

Official Dragonfish 24.04.0 release, freshly installed; TrueNAS SCALE running under ESXi. I have 1 TB RAM and 32 cores (2690v4) reserved. I have experienced the UI freezing up completely, and significant speed and I/O throttling, 3 times in 4 days while migrating 80 TB worth of data from a Windows client to ZFS. The only correlation I see is that when the UI lockup happens, I will have swap usage around 15~20%, and when I stop the file transfer, swap goes down. The top swap usage is by “asyncio_loop”. Every other resource utilization seems LOW, with average about 5~20% CPU usage, 15 GB RAM in services, and cool temps for everything. Iperf3 checks out at normal bandwidth. A restart solves the issue for 20+ hours. Not sure what triggers it, other than just lots of reads and writes.
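
A per-process swap figure like the “asyncio_loop” one above comes from the VmSwap field in /proc/<pid>/status; here is a purely illustrative Python sketch for listing the top swap consumers on a Linux system:

```python
# Illustrative only: list the processes using the most swap by reading
# the VmSwap field from /proc/<pid>/status.
import os

def top_swap_processes(n: int = 5):
    """Return the n processes using the most swap, largest first."""
    results = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            name, swap_kib = "?", 0
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(maxsplit=1)[1].strip()
                    elif line.startswith("VmSwap:"):
                        swap_kib = int(line.split()[1])  # value is in kB
            if swap_kib:
                results.append((swap_kib, pid, name))
        except (FileNotFoundError, PermissionError):
            continue  # process exited or is not readable
    return sorted(results, reverse=True)[:n]

for swap_kib, pid, name in top_swap_processes():
    print(f"{name} (pid {pid}): {swap_kib} KiB swapped")
```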

1 Like

I recently went from Bluefin to Dragonfish on my primary server and from Cobia to Dragonfish on a secondary server. The only thing these two servers have in common hardware-wise is that they are both Supermicro and both essentially use some of the same brands of drives. Use-wise they are both backup and file servers. The primary also has Tailscale. Neither uses VMs or iSCSI, and both contain a few SMB datasets and shares. There is little file serving, and most of the activity is various computers and laptops sending backups to the primary server. In other words, these servers are way overpowered.

These two systems have been up for 4 days (primary) and 5 days (secondary) since the updates to Dragonfish.

Drive Space:
Data space on the primary: usable capacity 82.63 TiB, with about 24.51 TiB (29.7% of available) used.
Data space on the secondary: usable capacity 87.06 TiB, with about 27.3 TiB (31.4% of available) used.

Memory:
The primary has 128 GB RAM divided between its two processors. The secondary has 64 GB RAM divided between its two processors. Both of the original TrueNAS installs used the defaults of the install program that was current at the time of install. The primary backs up to the secondary each night via an rsync-over-SSH task created within the Data Protection tab, with data pushed to the secondary server.

What I have noticed after updating to Dragonfish is that the primary server, with its 128 GB RAM, actually hits and uses a small bit (779 MiB) of its 15 GiB swap space. The secondary server’s swap utilization is 268 KiB out of 9.99 GiB. These values are as reported from the Reports tab >> Memory tab.

What the primary shows for use is
Usable: 125.8 GiB total available (ECC)
Free: 14.6 GiB
ZFS Cache: 105.5 GiB
Services: 5.7 GiB

Secondary server shows for use is:
Usable: 62.8 GiB total available (ECC)
Free: 17.7 GiB
ZFS Cache: 33.5 GiB
Services: 11.6 GiB

Of course these usages vary some depending on what the servers may be doing at the time. But the overall distribution is pretty consistent.

I do not know if the swap usage was typical of the systems before, as it didn’t seem to be a concern for anyone and I never paid attention. Before the upgrade, the systems were limited to 75% of memory for cache on one and 50% on the other. Any scripts to alter cache memory were removed before the upgrades to Dragonfish.

It is my opinion that on such lightly loaded and lightly used systems no swap should be used. To me, occasionally hitting swap means that the trigger for the ZFS cache in Dragonfish to back off and release memory back to the system is a bit slow to respond. Which is already suspected and being looked into.

I have so far not experienced any unusual slowdowns, freeze-ups, or GUI crashes, nor any failures of tasks, SSH access, or use through Tailscale. I don’t see any memory exhaustion.

I can see where slow release of cache memory back to the system could cause an out-of-memory issue for VMs or running processes, and could force the GUI and other tasks into paging to swap, grinding things to a halt at least until the system could release ZFS cache memory and get back to normal.

1 Like