Bug reporting + actual bug

Today I encountered a severe issue in TrueNAS Scale Fangtooth 25.04.2.6. The Jira website is not working for me, though, and I have tried from several PCs. I am therefore writing it up here, hoping that it eventually reaches the devs.

OS: TrueNAS Scale 25.04.2.6

Machine: Dell PowerEdge with AMD EPYC 9124 16-Core Processor and 128 GB of RAM

Issue: The ZFS ARC uses the entire RAM. It appears that zfs_arc_max is either not set or set incorrectly by the system. Note that this is on a machine with a lot of RAM. The problem manifests as “out of memory” kernel messages after the system has run smoothly for several months. The kernel starts killing processes and eventually the system stalls. A reboot solves the issue.

If you experience something similar, go to System → Shell and run `htop`. There you can check the ZFS ARC maximum. In my case I set it to roughly 96 GiB by running `echo 102777520032 > /sys/module/zfs/parameters/zfs_arc_max` as a post-init “script”. Before that, htop reported 128 GB, which is the entire RAM.
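
For anyone who wants to work out the byte value for a different limit, here is a minimal sketch. It assumes the usual OpenZFS-on-Linux parameter path from the command above; the helper names `gib_to_bytes` and `show_arc_max` are made up for this example, and a value of 0 in that file means the module falls back to its built-in default.

```shell
#!/bin/sh
# Sketch only: convert GiB to bytes and show the current zfs_arc_max.
# ARC_MAX_FILE can be overridden; the default is the path used in the post.
ARC_MAX_FILE="${ARC_MAX_FILE:-/sys/module/zfs/parameters/zfs_arc_max}"

# Print the number of bytes in the given whole number of GiB.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the current limit in bytes (0 means "use the built-in default").
show_arc_max() {
    if [ -r "$ARC_MAX_FILE" ]; then
        cat "$ARC_MAX_FILE"
    else
        echo "cannot read $ARC_MAX_FILE" >&2
        return 1
    fi
}

gib_to_bytes 96          # prints 103079215104
show_arc_max || true     # prints the current limit if ZFS is loaded

# To apply the limit (as root), something along these lines:
#   echo "$(gib_to_bytes 96)" > "$ARC_MAX_FILE"
```

A value written this way does not survive a reboot, which is why it has to be run again as a post-init command.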

Enjoy,

Rainer

Unless you set a limit yourself, that behavior is normal and not a bug…

Additionally, it is desired behavior: TrueNAS will evict ARC data from memory if the memory is needed elsewhere. The idea is to use every bit of your cool fast memory. I have 96 GB in my server and consistently use 64+ GB for the ARC, and I have never had an issue running memory-hungry software, with no swapping. This system works well, so unless you REALLY need the ARC to use less than total memory for a very specific reason and you KNOW the automatic eviction does not work for you, I would leave it alone.

What is the problem you are having with the Jira ticket submission, and how are you trying to create it? Are you using Report a Bug in the TrueNAS GUI, or Report a Bug at the top right of the forum?

Not much can be done or investigated without a debug dump from the system. Complete details on your system hardware, pool layouts and how the server is used may help with replies. How do you reproduce the issue? Reading the introductory email and doing the TrueNAS Bot tutorial will allow you to post screenshots of htop and explain your situation a bit better. At this point, Scale 25.04.2.6 is considered quite stable, so it may be a different problem and not a bug.

Just a heads-up: I noticed it would randomly get set back to the max after starting/stopping VMs.

I have tried Jira, but although I am logged in and have reloaded the website, when I click “Create” (after first checking that a similar issue is not already open) I get “You are not authorized to perform this operation. Please log in. Close this dialog and press refresh in your browser”.

About my issue: We have another TrueNAS system that has 62.7 GiB of RAM. Its ZFS ARC limit is set to 61.7 GiB, and 27.6 GiB of that is in use, both checked via htop. I find 1 GiB for the OS too tight. Anyway, it works.

The machine mentioned in my first post now uses 5 GiB of ZFS cache (up from 1 GiB within 24 h).
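
The same two numbers that htop shows can also be read from the shell, which makes it easier to log them over time. This is a rough sketch, assuming the standard OpenZFS-on-Linux kstat file `/proc/spl/kstat/zfs/arcstats` with its `size` and `c_max` rows; the function names are just for illustration.

```shell
#!/bin/sh
# Sketch only: report current ARC size and ARC limit from arcstats.
# ARCSTATS can be overridden; the default is the usual kstat path.
ARCSTATS="${ARCSTATS:-/proc/spl/kstat/zfs/arcstats}"

# Print the value (third column) of one arcstats row, e.g. "size" or "c_max".
arc_stat() {
    awk -v key="$1" '$1 == key { print $3 }' "$ARCSTATS"
}

# Print current ARC size and limit in GiB with one decimal place.
arc_summary_line() {
    size=$(arc_stat size)
    cmax=$(arc_stat c_max)
    awk -v s="$size" -v m="$cmax" 'BEGIN {
        gib = 1073741824
        printf "ARC: %.1f GiB used, limit %.1f GiB\n", s / gib, m / gib
    }'
}

# Only report when the kstat file is actually there (ZFS module loaded).
if [ -r "$ARCSTATS" ]; then
    arc_summary_line
fi
```

Running this from cron or after starting and stopping a VM would show whether the limit silently changes back.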

Specs:

Sys: Dell PowerEdge R7615

CPU: AMD Epyc 9124

RAM: 8x 16 GB RDIMM, 6.6 GT/s, single rank

Disks: 12x 24 TB, joined into 1 RAIDZ2 vdev, 200 TB usable capacity, 6 TB in use via 1 Dataset that has 12 child Datasets that are used as SMB and NFS shares.

I have checked the kernel log via

`journalctl -b -1 -p err..emerg`

and this is an excerpt of the output:

Apr 22 01:05:40 truenas2 kernel: Out of memory: Killed process 1803914 (cli) total-vm:130736kB, anon-rss:37344kB, file-rss:7144kB, shmem-rss:0kB, UID:0 pgtables:160kB oom_score_adj:0
Apr 22 01:05:48 truenas2 kernel: Out of memory: Killed process 1803928 (cli) total-vm:127700kB, anon-rss:32992kB, file-rss:4656kB, shmem-rss:0kB, UID:0 pgtables:144kB oom_score_adj:0
Apr 22 01:05:53 truenas2 kernel: Out of memory: Killed process 1803958 (cli) total-vm:44416kB, anon-rss:24320kB, file-rss:7332kB, shmem-rss:0kB, UID:0 pgtables:120kB oom_score_adj:0
Apr 22 01:06:00 truenas2 kernel: Out of memory: Killed process 1803968 (cli) total-vm:127700kB, anon-rss:32916kB, file-rss:5520kB, shmem-rss:0kB, UID:0 pgtables:152kB oom_score_adj:0
Apr 22 01:06:00 truenas2 kernel: Out of memory: Killed process 1178847 (smbd[192.168.18) total-vm:128052kB, anon-rss:3788kB, file-rss:3552kB, shmem-rss:15732kB, UID:0 pgtables:220kB oom_score_adj:0
Apr 22 01:06:06 truenas2 kernel: Out of memory: Killed process 1803989 (cli) total-vm:48596kB, anon-rss:27136kB, file-rss:7444kB, shmem-rss:0kB, UID:0 pgtables:132kB oom_score_adj:0
Apr 22 01:06:11 truenas2 kernel: Out of memory: Killed process 1803999 (cli) total-vm:127700kB, anon-rss:32916kB, file-rss:6956kB, shmem-rss:0kB, UID:0 pgtables:156kB oom_score_adj:0
Apr 22 01:06:17 truenas2 kernel: Out of memory: Killed process 1804008 (cli) total-vm:48596kB, anon-rss:27392kB, file-rss:6796kB, shmem-rss:0kB, UID:0 pgtables:128kB oom_score_adj:0
Apr 22 01:06:22 truenas2 kernel: Out of memory: Killed process 1804017 (cli) total-vm:33512kB, anon-rss:17408kB, file-rss:5160kB, shmem-rss:0kB, UID:0 pgtables:100kB oom_score_adj:0
Apr 22 01:06:28 truenas2 kernel: Out of memory: Killed process 1804022 (cli) total-vm:34536kB, anon-rss:18176kB, file-rss:5396kB, shmem-rss:0kB, UID:0 pgtables:104kB oom_score_adj:0
Apr 22 01:06:33 truenas2 kernel: Out of memory: Killed process 1804035 (cli) total-vm:127700kB, anon-rss:32916kB, file-rss:6472kB, shmem-rss:0kB, UID:0 pgtables:144kB oom_score_adj:0
Apr 22 01:06:39 truenas2 kernel: Out of memory: Killed process 1804049 (cli) total-vm:43696kB, anon-rss:23296kB, file-rss:6288kB, shmem-rss:0kB, UID:0 pgtables:124kB oom_score_adj:0
Apr 22 01:06:44 truenas2 kernel: Out of memory: Killed process 1804063 (cli) total-vm:53952kB, anon-rss:32660kB, file-rss:6292kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0
Apr 22 01:06:50 truenas2 kernel: Out of memory: Killed process 1804073 (cli) total-vm:45876kB, anon-rss:25088kB, file-rss:7208kB, shmem-rss:0kB, UID:0 pgtables:120kB oom_score_adj:0
Apr 22 01:06:55 truenas2 kernel: Out of memory: Killed process 1804086 (cli) total-vm:42160kB, anon-rss:22272kB, file-rss:7552kB, shmem-rss:0kB, UID:0 pgtables:116kB oom_score_adj:0
Apr 22 01:07:01 truenas2 kernel: Out of memory: Killed process 1804095 (cli) total-vm:127684kB, anon-rss:32916kB, file-rss:6896kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:0

Clearly, the kernel kills processes because it runs out of memory. I have lowered the ZFS ARC limit, assuming that this is the issue. Of course, something else could have caused the memory exhaustion. If you tell me how to dig into this, I can try.
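
One possible first step for digging further is to summarize who got killed and how much memory those processes actually held. The sketch below only parses lines in the format shown in the excerpt above; the function names are hypothetical, and the `journalctl -k -b -1` call in the comment is just one way to feed it the previous boot's kernel log.

```shell
#!/bin/sh
# Sketch only: summarize "Out of memory: Killed process ..." kernel lines
# read from stdin.

# Count kills per process name, most frequent first ("<count> <name>").
count_oom_victims() {
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn |
        sed 's/^ *//'
}

# Sum the anon-rss of all killed processes, in kB.
sum_anon_rss_kb() {
    sed -n 's/.*anon-rss:\([0-9]*\)kB.*/\1/p' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example usage on the previous boot's kernel log:
#   journalctl -k -b -1 | count_oom_victims
#   journalctl -k -b -1 | sum_anon_rss_kb
```

In the excerpt, each victim held only a few tens of MB of anonymous memory, so the killed processes are not what was filling 128 GB. The kernel's full OOM report, which starts with an "invoked oom-killer" line and includes a memory summary, is probably more telling, and it may be hidden by the `-p err..emerg` priority filter.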

Try to file the bug report directly from the TrueNAS web UI (small smiley icon in the top right corner).
Logging into Jira directly fails because we don’t have the permissions (I guess because only iX employees have them).

Can you try hard refreshing your browser to clear the cache after logging in (CTRL+F5)?