Going insane, truenas webUI dies, again

Cross-posting for visibility.

We can confirm that a significant contributor to the issue reported in this thread and others is a Linux kernel change in 6.6. While the proposed swap adjustments can mitigate the issue to some degree, the multi-gen LRU code in question will be disabled by default in a subsequent Dragonfish patch release, which should resolve the problem. The same change can be made in advance by following the directions below from @mav. Thanks to all of the community members who reported the issue and helped us reach the root cause. Additional feedback is welcome from those who apply the change below.

Thanks @essinghigh, that little script has got my box back under control. Really appreciated!

Have you tried disabling multi-gen LRU as mentioned in @mav’s post?
The script worked well for me as a breakfix; however, Dragonfish now seems to be working fine after disabling this and letting ARC run at its defaults.

Not yet. I saw your script yesterday and could get that done last night. I didn’t see the multi-gen LRU one until this morning and I don’t trust myself with Linux shell at 6:30am & not enough coffee :wink:

I get permission denied. I also get that if I try to run it with sudo.

Ok, weirdly worked when I changed to my scratch drive. Can’t explain that one. Main thing is I can still log in whilst transferring all my data onto the NAS for the final go-live.

Hello,
Not sure if this is the same problem, but my SCALE system is also slowing down (though not blocked).


This is the output of

top -o VIRT

command

I can confirm that turning off lru_gen solves the excessive-swapping problem on my test system. (My prod system won’t upgrade until later…) That’s with no changes to arc_max or arc_sys_free, or swappiness.
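
For anyone following along, here is a minimal sketch of checking and switching off multi-gen LRU at runtime. It assumes the standard kernel sysfs path /sys/kernel/mm/lru_gen/enabled and may not match @mav’s directions exactly; the function names and the LRU_GEN_FILE variable are made up for illustration.

```shell
# Sysfs control file for multi-gen LRU, as documented by the Linux kernel.
# LRU_GEN_FILE is a hypothetical override so the functions can be tried on a scratch file.
LRU_GEN_FILE="${LRU_GEN_FILE:-/sys/kernel/mm/lru_gen/enabled}"

# Print the current setting (on a real system this is a bitmask such as 0x0007).
lru_gen_status() {
    cat "$LRU_GEN_FILE"
}

# Write "n" to switch all multi-gen LRU components off.
# On a real system this has to run from a root shell.
lru_gen_disable() {
    echo n > "$LRU_GEN_FILE"
}
```

On a real box, run lru_gen_status first, then lru_gen_disable from a root shell. Putting sudo in front of a plain redirect will not work, because the redirect is performed by the unprivileged shell. Reading the file back should then show 0x0000. The setting is lost on reboot, so it has to be reapplied after each boot until the patch release lands.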

Of course, changing (just) lru_gen also raises the free memory under pressure from 1.2GB to 6.2GB (on a 32G system), severely reducing memory pressure at the cost of ~5GB. I note that setting arc_sys_free to 6GB instead reduces the intensity of persistent swapping but does not eliminate it outright, so it looks like lru_gen itself changes the interaction between the kernel’s memory system and ZFS’s ARC.
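
To compare numbers like these before and after the change, a rough sketch along the following lines could be used. It only reads /proc/meminfo; the function names and the MEMINFO_FILE variable are hypothetical, and it is not necessarily how the figures above were measured.

```shell
# /proc/meminfo reports values in kB. MEMINFO_FILE is a hypothetical override for testing.
MEMINFO_FILE="${MEMINFO_FILE:-/proc/meminfo}"

# Print one meminfo field (e.g. MemFree) converted to whole MiB.
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' "$MEMINFO_FILE"
}

# Swap currently in use, in MiB: SwapTotal minus SwapFree.
swap_used_mib() {
    echo $(( $(meminfo_mib SwapTotal) - $(meminfo_mib SwapFree) ))
}
```

Running meminfo_mib MemFree and swap_used_mib while the box is under load, once with lru_gen on and once with it off, should show the same kind of shift described above. The current arc_sys_free value can be read from /sys/module/zfs/parameters/zfs_arc_sys_free, assuming the OpenZFS module parameter is exposed there on your system.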

Thanks to everybody who contributed to this thread; it got me a much better starting point on figuring this out.

Cheers
– perry
