Memory issue while transferring data between two pools: Purging GPU Memory

I have TrueNAS Scale Dragonfish with these specs:

Motherboard + CPU: AsRock J3455
Memory: 8GB

I know 8GB is the bare minimum, but I have no VMs, just two pools. Whenever I start a transfer between the two pools that involves more than a single file (e.g. a folder with a lot of pictures, one big file, etc.), the transfer hangs. I connected a monitor and it says “Purging GPU Memory”. Then the WebUI becomes inaccessible.

Transferring files or folders between either pool and my client system works just fine. The problem only happens when I copy files from one pool to the other over SMB.

Since the problem is reproducible, I checked the amount of used RAM just before it happens. Free memory is always low, since the ZFS cache always takes most of the space.
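For anyone wanting to quantify that, a small sketch for reading the ARC size; the sample arcstats line below is illustrative so the command is self-contained, and on a live TrueNAS SCALE box you would read /proc/spl/kstat/zfs/arcstats directly:

```shell
# Sketch: the arcstats "size" row holds the current ARC size in bytes
# (fields: name, type, data). Sample line shown for illustration only.
printf 'size                            4    6442450944\n' |
    awk '$1 == "size" {printf "ARC size: %d MiB\n", $3 / 1048576}'

# On a live system, compare against what the kernel considers available:
#   awk '$1 == "size" {printf "ARC size: %d MiB\n", $3 / 1048576}' /proc/spl/kstat/zfs/arcstats
#   awk '/^MemAvailable/ {printf "Available: %d MiB\n", $2 / 1024}' /proc/meminfo
```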

Also, I ran journalctl -b -p err and got this result:

truenas kernel: tpm_crb MSFT0101:00 [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
truenas kernel: tpm_crb MSFT0101:00 [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
truenas kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0Bank 4: a600000000020408
truenas kernel: mce: [Hardware Error]: TSC 0 ADDR fef13bc0
truenas kernel: mce: [Hardware Error]: PROCESSOR0:506ca TIME 1716272856 SOCKET 0 APIC 0 microcode 28
truenas kernel: Error: Driver 'pcspkr' is already registered, aborting...
truenas kernel: EDAC pnd2: Failed to register device with error -22.

The hardware errors don’t actually seem to be hardware errors, though, since I found a post on the Unraid forum that shows the exact same CPU error with the exact same CPU. Unfortunately I am not very Linux-savvy and I don’t know whether these issues are linked.

What can I do? Thank you.

Edit: memtest86+ and stress do not show any issue with memory or CPU.

This is possibly related in some way to the issues seen with lru_gen. If you haven’t already, try running the following: echo n > /sys/kernel/mm/lru_gen/enabled
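For reference, a minimal sequence for checking and toggling it (run as root; the sysfs path and the y/n syntax are from the kernel’s Multi-Gen LRU documentation, and the change does not persist across reboots):

```shell
# Sketch: check and disable Multi-Gen LRU until the next reboot.
# 'enabled' is a bitmask; a nonzero value (e.g. 0x0007) means it is on.
cat /sys/kernel/mm/lru_gen/enabled        # show the current state
echo n > /sys/kernel/mm/lru_gen/enabled   # disable MGLRU ('y' re-enables it)
cat /sys/kernel/mm/lru_gen/enabled        # verify it is now disabled
```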

Alternatively, you may have luck experimenting with /sys/module/zfs/parameters/zfs_arc_sys_free, since you have a low amount of memory and may need to enforce that ARC keeps some free
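As a sketch of what that could look like: the parameter is in bytes, and the 1 GiB reservation below is an assumption for an 8 GiB machine, not an official recommendation.

```shell
# Hypothetical tuning sketch: reserve 1 GiB for the system so the ARC starts
# shrinking before memory runs out entirely.
RESERVE_BYTES=$((1 * 1024 * 1024 * 1024))
echo "zfs_arc_sys_free=${RESERVE_BYTES}"

# To apply as root (takes effect immediately, not persistent across reboots):
#   echo "${RESERVE_BYTES}" > /sys/module/zfs/parameters/zfs_arc_sys_free
```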


Thank you very much for your kind answer. May I ask you what lru_gen does? Thank you in advance.

Edit: I’d also like to know if this

Alternatively, you may have luck experimenting with /sys/module/zfs/parameters/zfs_arc_sys_free, since you have a low amount of memory and may need to enforce that ARC keeps some free

is considered a bug in the ARC algorithm, and hence will hopefully be fixed, or whether it was meant to work this way. You know, because without digging through forums, users wouldn’t have known.

Multi-Gen LRU is an implementation of LRU (least recently used) page reclaim that ‘should’ improve performance under memory pressure; it was enabled with the upgrade to Dragonfish/kernel 6.6. The only problem is that it does not appear to play nicely with the ZFS ARC, leading to excessive and unnecessary swapping.

I can imagine this would be even more prevalent on a machine with less memory as it becomes much easier to fill that memory.
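One way to confirm you are hitting this: compare the kernel’s swap counters before and after reproducing the transfer. The `swap_counters` helper name is introduced here purely for illustration.

```shell
# Sketch: a growing pswpin/pswpout delta across the transfer means pages
# are actively being swapped in and out.
swap_counters() {
    grep -E '^pswp(in|out) ' "${1:-/proc/vmstat}"
}
# Usage on a live system:
#   swap_counters    # before the transfer
#   ...reproduce the pool-to-pool copy...
#   swap_counters    # after; much larger numbers confirm heavy swapping
```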

This will be fixed in the .1 release coming in ~6 days.

No, I’d experiment with this if disabling MGLRU does not help. I can’t say for sure whether it will help, as I have not personally run TrueNAS on the 8GiB minimum.
See here for more recommendations on memory sizing: SCALE Hardware Guide | TrueNAS Documentation Hub


It’s a page-reclaim mechanism that conflicts with the ZFS ARC (Adaptive Replacement Cache), causing chronic paging, and perhaps these failures.
