RAM Size Guidance for Dragonfish

No, you do configure a maximum size, so it does not grow endlessly.
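
For anyone who wants to see what cap is actually in effect on their own box, here is a minimal sketch, assuming OpenZFS on Linux, where the ceiling is exposed as the `zfs_arc_max` module parameter (a value of 0 means the built-in default applies):

```python
# Minimal sketch: report the configured ARC ceiling relative to total RAM.
# Assumes OpenZFS on Linux, where the cap is exposed at
# /sys/module/zfs/parameters/zfs_arc_max (0 means "use the built-in default").
from pathlib import Path

def read_int(path: str) -> int:
    return int(Path(path).read_text().strip())

def mem_total_bytes() -> int:
    # The MemTotal line in /proc/meminfo is reported in kB.
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found")

arc_max = read_int("/sys/module/zfs/parameters/zfs_arc_max")
total = mem_total_bytes()
if arc_max == 0:
    print("zfs_arc_max = 0 (module default in effect)")
else:
    print(f"zfs_arc_max = {arc_max / 2**30:.1f} GiB "
          f"({100 * arc_max / total:.0f}% of {total / 2**30:.1f} GiB RAM)")
```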

1 Like

It’s worth noting that none of the TrueNAS Enterprise systems exhibit these issues under extreme load testing. However, there are notable differences:

  1. All systems have adequate RAM for their workloads
  2. Swap is disabled (because there is adequate RAM)
  3. We don’t run unverified Apps on Enterprise appliances

We are reviewing which of these is most important. Betting is allowed, but it would have to be on another web site.

2 Likes

I would only comment that swap is normal and widely used in Debian and its derivatives. If swap itself is actually causing the issue, then there’s still a bug somewhere in TrueNAS. I seriously doubt there’s a bug in the kernel’s swapping code, though there certainly have been in the past, and SCALE does ship newer versions of things where one could be introduced. But if it were a problem outside of SCALE, I’d expect tons of people around the world who don’t use TrueNAS to be reporting this. More likely there is a memory leak somewhere, maybe even in ZFS, unless iX modified the Linux code; that’s the usual reason for swap usage and thrashing. The only problem with that theory is that disabling swap so far seems to resolve it. I’d love to see some ARC summaries posted while the problem is occurring, along with the memory reporting from the same window. I don’t envy you guys, Captain!

My understanding is that OpenZFS rewrote at least some of the ARC code and how it adapts; I believe I saw that last year. If that version is being used (and it should be), there could obviously be issues there in some rare cases. There are so many possibilities!

So, of the three choices: I believe we saw this happen with SnowReborn, or whoever had the 1 TB of memory (so many posts), so I would rule out #1 (my guess). #2 does seem to resolve it thus far; I haven’t heard anyone say the problem came back. If that is the answer, I believe there’s still another bug somewhere, but it would still qualify as the answer. I don’t think it’s #3, so I vote #2. Though, as noted, I believe that should not happen, and the actual problem, not just the symptom, lies elsewhere. Glad it seems to be rare.

1 Like

I think it’s clear that if someone wants to reproduce this, swap should be enabled during the attempt.

And then look for excessive swap partition disk utilization.
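
One way to watch for that while reproducing: a small sketch, assuming the standard Linux `/proc/vmstat` counters `pswpin`/`pswpout` (pages swapped in and out since boot), that prints swap traffic rates every few seconds. The interval is arbitrary; sustained non-zero deltas indicate active swapping.

```python
# Sketch: watch swap-in / swap-out page rates while trying to reproduce.
# Assumes the standard Linux /proc/vmstat counters pswpin and pswpout.
import time
from pathlib import Path

def swap_counters() -> tuple[int, int]:
    fields = dict(line.split() for line in Path("/proc/vmstat").read_text().splitlines())
    return int(fields["pswpin"]), int(fields["pswpout"])

prev_in, prev_out = swap_counters()
while True:
    time.sleep(5)
    cur_in, cur_out = swap_counters()
    print(f"swap-in {(cur_in - prev_in) / 5:.1f} pages/s, "
          f"swap-out {(cur_out - prev_out) / 5:.1f} pages/s")
    prev_in, prev_out = cur_in, cur_out
```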

1 Like

Evidence seems to be that there are two safe modes:

  1. ARC = 50% … swap can be enabled
  2. ARC = 9x% … swap should be disabled

The third mode, ARC = 9x% with swap enabled, seems to be sensitive to applications’ use of memory.

This makes some sense… the ZFS ARC hogs the RAM and forces apps and middleware into swap space.
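
A rough sketch of how one might check which of these modes a given system resembles, assuming OpenZFS on Linux (ARC stats in `/proc/spl/kstat/zfs/arcstats`, active swap devices in `/proc/swaps`). The 60% threshold below is an arbitrary illustration, not an official cutoff:

```python
# Sketch: classify a box against the two "safe modes" above.
# Assumes OpenZFS on Linux: ARC stats in /proc/spl/kstat/zfs/arcstats,
# active swap devices listed in /proc/swaps.
from pathlib import Path

def arcstats() -> dict:
    stats = {}
    # First two lines of arcstats are headers; the rest are "name type value".
    for line in Path("/proc/spl/kstat/zfs/arcstats").read_text().splitlines()[2:]:
        name, _type, value = line.split()
        stats[name] = int(value)
    return stats

def mem_total() -> int:
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found")

def swap_enabled() -> bool:
    # The first line of /proc/swaps is a header; any further lines are devices.
    return len(Path("/proc/swaps").read_text().splitlines()) > 1

s = arcstats()
pct = 100 * s["c_max"] / mem_total()
print(f"ARC cap: {pct:.0f}% of RAM, swap {'enabled' if swap_enabled() else 'disabled'}")
if pct > 60 and swap_enabled():
    print("-> looks like the third (sensitive) mode described above")
```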

3 Likes

What about this, for which we’re still waiting to hear some experiences (a sketch of the swappiness piece follows below):

  1. ARC Max = “FreeBSD-like” (Dragonfish default, RAM - 1 GiB) + swap enabled + “swappiness” set to “1”

EDIT: Or maybe not. It’s already not looking like a viable option.
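
For anyone who still wants to experiment with that combination, a minimal sketch of the swappiness piece, assuming root and the standard `/proc/sys/vm/swappiness` interface; on TrueNAS a persistent change would normally go through its own sysctl/tunable settings rather than a one-off script:

```python
# Illustration only: lower vm.swappiness to 1 for the current boot.
# Equivalent to `sysctl vm.swappiness=1`; requires root and does not
# persist across reboots.
from pathlib import Path

swappiness = Path("/proc/sys/vm/swappiness")
print("before:", swappiness.read_text().strip())
swappiness.write_text("1\n")
print("after: ", swappiness.read_text().strip())
```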

I have swap enabled and ARC at 70% or so (memory fails me). The “safe modes” would be system dependent, of course, but I get your point. That’s how it’s always been done outside of TrueNAS: the admin picks a number. But you wanted to improve on that and let the system handle it on its own. Without swap, though, I don’t see how you can have zero OOM errors under memory pressure with the ARC filling RAM; you must have a way to avoid that. It was always a problem in the old days.

ZFS does hog the RAM, of course. If ZFS takes it first and something else needs it (which in the past was rare except for things like VMs), ZFS can’t evict fast enough, which means swap. And if your swap space is on an HDD, that’s not fast at all. I know they made changes such as this and others to the ARC:
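
As an aside on the HDD point: a quick sketch, assuming the usual Linux layout (`/proc/swaps` plus `/sys/class/block/<dev>/queue/rotational`, falling back to the parent disk for partitions), to check whether your active swap devices sit on spinning media:

```python
# Sketch: list active swap devices and whether they sit on rotational (HDD) media.
# Assumes the usual Linux layout; anything it cannot resolve is reported as unknown.
from pathlib import Path

def rotational(devname: str) -> str:
    node = Path("/sys/class/block") / devname
    q = node / "queue" / "rotational"
    if not q.exists():
        # Partitions have no queue/ directory; look at the parent disk instead.
        q = node.resolve().parent / "queue" / "rotational"
    return "HDD" if q.exists() and q.read_text().strip() == "1" else "SSD/unknown"

for line in Path("/proc/swaps").read_text().splitlines()[1:]:
    dev, _type, size_kb, used_kb, _prio = line.split()[:5]
    print(f"{dev}: {int(used_kb)//1024} MiB used of {int(size_kb)//1024} MiB "
          f"({rotational(Path(dev).name)})")
```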

There are still lots of OOM reports happening in OpenZFS. A partial list of interesting ones is below (not the ones about corruption, as that is different). But the proof, of course, is your distribution base, and if swapoff resolves all of the issues without OOM, then that’s OK, I guess. You have allowed the ARC to grow to almost the size of memory, so this is much different from the way people have typically run ZFS. Time will tell for sure!
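
Separately from whatever upstream reports exist, a simple way to check whether a given box has actually hit the OOM killer is to scan the kernel log. A sketch, assuming systemd’s journalctl is available and matching the stock kernel “Out of memory” message:

```python
# Sketch: count OOM-killer events in the current boot's kernel log.
# Assumes journalctl (systemd) is available; the stock kernel message
# contains "Out of memory: Killed process ...".
import subprocess

out = subprocess.run(
    ["journalctl", "-k", "-b", "--no-pager"],
    capture_output=True, text=True, check=True,
).stdout

hits = [line for line in out.splitlines() if "out of memory" in line.lower()]
print(f"{len(hits)} OOM event(s) this boot")
for line in hits:
    print(line)
```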

While your enterprise customers have no issues and all have plenty of memory, my concern would be for the armchair Plex crowd running converted ten-year-old home computers with very little memory; they can be the ones with plenty of overuse. I guess guidelines can be changed, etc. If swap is simply now incompatible with ZFS allowed to fill RAM with ARC, then your installer shouldn’t be adding it, or asking to add it, anymore. But other than telling people here, what about the people who don’t use the forums? How will they know to disable swap, or maybe you can make it part of an update?

2 Likes

We haven’t reached a conclusion on the issue yet. When we have, we’ll work out short-term guidance and a technical solution for longer-term safety.

1 Like

Not on FreeBSD though, so… how old?

1 Like

Right, so, we are on the Tag labelled SCALE, lol.

Yeah, but BSD and Linux share common origins… hence my question about exactly how far back your “in the old days” referred.

Say what?

1 Like

Linux being inspired by Unix and BSD being originally based on Unix means they somehow have a common origin.

Anyway, my question was a masked “how far back was this happening?”.

That was more to play into me mocking myself earlier. But to answer: I am speaking of before Linux systems allowed the ARC to fill nearly all of memory instead of 50%. I see these reports pop up in ZFS discussion groups, so not very old. While they have a common origin, as you know, their memory management and ZFS integration are not even close to the same.

To clarify, since that was still ambiguous: I also mean those manually setting the ARC above 50% (but to an obviously excessive value).

2 Likes

I thought it might be nice to have a possible good news story… which is also an interesting data point…

I have a very low-end… very much below-minimum-specs backup server running TrueNAS SCALE.

It’s an Intel Core 2 Quad with 4 GB of RAM running off a pair of USB disks. Yes, I’m naughty and don’t deserve any cookies.

It’s been working well since upgrading to Dragonfish 24.04 final.

Previously, with Cobia, you could see swap was always in use after boot… but since upgrading to Dragonfish… 0 bytes. Heh.

Maybe it’s too early to say… hard to tell… since TrueNAS SCALE only keeps one week’s worth of reporting (at least in Cobia).

Will keep an eye on this system… over time. It receives replications every hour.

The curious thing is that I do not have SMB or NFS services active on this system, only SMART and SSH.

It’s a replication target, that is all.

2 Likes

That’s really weird! I presume it’s a backup target system? Maybe you don’t have enough ARC to cause the issue; what does the ARC reporting look like?

I’m putting in Prometheus to capture data so I can keep what I want. Even with iX supposedly expanding the retention, I want more useful info like VM resources, Kubernetes app resources, etc.
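
For anyone curious, a sketch of that kind of side-car, using the prometheus_client package to expose ARC size and swap usage as gauges. The metric names and port are made up for illustration, and it assumes OpenZFS on Linux (`/proc/spl/kstat/zfs/arcstats`):

```python
# Sketch of a tiny side-car exporter: publishes ARC size and swap usage so
# Prometheus can keep the history as long as you like. Metric names and the
# port (9101) are arbitrary; assumes the prometheus_client package is installed.
import time
from pathlib import Path
from prometheus_client import Gauge, start_http_server

arc_size = Gauge("zfs_arc_size_bytes", "Current ZFS ARC size")
swap_used = Gauge("node_swap_used_bytes", "Swap currently in use")

def poll():
    for line in Path("/proc/spl/kstat/zfs/arcstats").read_text().splitlines()[2:]:
        name, _t, value = line.split()
        if name == "size":
            arc_size.set(int(value))
    total = free = 0
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("SwapTotal:"):
            total = int(line.split()[1]) * 1024
        elif line.startswith("SwapFree:"):
            free = int(line.split()[1]) * 1024
    swap_used.set(total - free)

if __name__ == "__main__":
    start_http_server(9101)   # scrape at http://<host>:9101/metrics
    while True:
        poll()
        time.sleep(15)
```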

Yes, it’s a backup target.

ARC is still growing… we shall see :wink:


The 1-hour CPU chart, to show it does experience some load :wink:

Impressive it survived the backup, assuming that’s what I see. So, the solution to the problem isn’t more memory, it’s less! :clown_face:

I have a 3GB backup ZFS target, but it’s not SCALE, just Debian.

2 Likes

So, I went into the UI of this backup system and started interrogating snapshots, sorting them, deleting a bunch of snapshots of the .system dataset that I’d taken accidentally, etc.

This triggered a bit of swap. Looking at the 1-day view, it peaks at 543 MB, then drops to 130 MB.

What I think is interesting is what the memory usage looked like when this happened: the ARC didn’t really recede much, forcing “used” memory to get paged out as “free” dropped.

And zooming in on the peak and after the peak:


“Cached” appears to be the ARC, according to the dashboard numbers.

It looks to me like it prefers to swap out rather than lower the cache, or the swapping occurs faster than the cache drops.

Don’t get me wrong: as it is, I don’t really care, and the system was working fine. But if the issue is that the cache is forcing swap to be used because memory is full and the cache is not making way…
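
If someone wants to watch that live the next time it happens, a small sketch that polls ARC size and swap usage side by side (same assumptions as the earlier sketches: OpenZFS arcstats plus /proc/meminfo), so you can see whether the ARC gives ground or swap grows instead:

```python
# Sketch: watch ARC size and swap usage together during an event like the one
# above. Assumes OpenZFS on Linux (/proc/spl/kstat/zfs/arcstats, /proc/meminfo).
import time
from pathlib import Path

def arc_size_bytes() -> int:
    for line in Path("/proc/spl/kstat/zfs/arcstats").read_text().splitlines()[2:]:
        name, _t, value = line.split()
        if name == "size":
            return int(value)
    return 0

def swap_used_bytes() -> int:
    vals = {}
    for line in Path("/proc/meminfo").read_text().splitlines():
        key = line.split(":")[0]
        if key in ("SwapTotal", "SwapFree"):
            vals[key] = int(line.split()[1]) * 1024  # values reported in kB
    return vals.get("SwapTotal", 0) - vals.get("SwapFree", 0)

while True:
    print(f"ARC {arc_size_bytes() / 2**30:6.2f} GiB | "
          f"swap used {swap_used_bytes() / 2**20:7.1f} MiB")
    time.sleep(1)
```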

1 Like

At 1-hour zoom, going back to the event…

Cached didn’t really budge… did it? :wink: