brywhi
May 10, 2024, 4:10pm
Done!
```
root@storage01[~]# swapon -s
Filename        Type         Size        Used       Priority
/dev/dm-0       partition    16777212    1211860    -2
root@storage01[~]# swapoff -a
root@storage01[~]# swapon /dev/dm-0
root@storage01[~]# free -mh
             total        used        free      shared  buff/cache   available
Mem:         228Gi       213Gi        15Gi        99Mi       332Mi        15Gi
Swap:         15Gi          0B        15Gi
```
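`swapon -s` prints the same table as `/proc/swaps`, with Size and Used in KiB. Before the `swapoff -a` / `swapon` reset above, 1211860 KiB of the roughly 16 GiB device was in use, about 7.2 %. A minimal Python sketch for turning that table into a per-device percentage, assuming the standard five-column `/proc/swaps` layout; the function name `swap_usage` is hypothetical and not part of any TrueNAS tooling.

```python
from pathlib import Path


def swap_usage(text: str) -> dict[str, float]:
    """Return {device: percent_used} from /proc/swaps-style text.

    Assumes a header line ('Filename Type Size Used Priority') followed by
    one row per swap device, with Size and Used given in KiB.
    """
    usage: dict[str, float] = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, size_kib, used_kib = fields[0], int(fields[2]), int(fields[3])
        usage[name] = 100.0 * used_kib / size_kib if size_kib else 0.0
    return usage


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        for dev, pct in swap_usage(swaps.read_text()).items():
            print(f"{dev}: {pct:.2f}% of swap in use")
```

Running it periodically (for example from cron) is one way to watch for swap creeping back up after the reset.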
Stux
May 10, 2024, 10:06pm
Beginning to get déjà vu
Which just shows that Core was not always perfect either
brywhi
May 12, 2024, 1:31pm
Update: still no swap usage. Woot woot!
```
root@storage01[~]# free -mh
             total        used        free      shared  buff/cache   available
Mem:         228Gi       216Gi        12Gi        99Mi       372Mi        12Gi
Swap:         15Gi          0B        15Gi
root@storage01[~]# arc_summary | head -13 | tail -3
Target size (adaptive):   89.4 %  203.1 GiB
Min size (hard limit):     3.1 %    7.1 GiB
Max size (high water):      31:1  227.2 GiB
```
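The percentages in those `arc_summary` lines appear to be expressed relative to the ARC maximum: 203.1 / 227.2 ≈ 89.4 % and 7.1 / 227.2 ≈ 3.1 %. A minimal sketch that reproduces them from the ARC kstats, assuming the usual `/proc/spl/kstat/zfs/arcstats` layout (`name type data` rows) and that the target, minimum and maximum correspond to the `c`, `c_min` and `c_max` entries. The helper name `arc_limits` is hypothetical, and this is not how `arc_summary` itself is implemented.

```python
from pathlib import Path

ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def arc_limits(text: str) -> dict[str, float]:
    """Summarise ARC target/min/max from arcstats-style text.

    Data rows look like 'name type value'; rows whose third field is not a
    plain integer (such as the kstat header lines) are skipped.
    """
    stats: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    c, c_min, c_max = stats["c"], stats["c_min"], stats["c_max"]
    return {
        "target_pct": 100.0 * c / c_max,   # "Target size (adaptive)"
        "min_pct": 100.0 * c_min / c_max,  # "Min size (hard limit)"
        "max_to_min": c_max // c_min,      # shown as "N:1" (assumed integer ratio)
    }


if __name__ == "__main__" and ARCSTATS.exists():
    s = arc_limits(ARCSTATS.read_text())
    print(f"target {s['target_pct']:.1f} %, min {s['min_pct']:.1f} %, "
          f"max {s['max_to_min']}:1")
```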
No.
NO.
NOOOOOOOOOOOO!!!
This can’t be happening. I will not stand for this! I cannot allow SCALE to win! CORE WILL NEVER DIE!
@brywhi: Please update your post with a fabricated account of how everything is breaking and aggressively swapping, and how the ARC caused mold to grow in your house.
brywhi
May 13, 2024, 1:19pm
Solid over the weekend, still no swappage.
Cross-posting for visibility.
We can confirm that the significant contributor to the issue reported in this thread and others is related to a Linux kernel change in 6.6. While the proposed swap adjustments can mitigate the issue to some degree, the multi-gen LRU code in question will be disabled by default in a subsequent Dragonfish patch release, which should resolve the problem. This can be done in advance by following the directions below from @mav. Thanks to all of the community members who reported the issue, which helped us reach the root cause. Additional feedback is welcome from those who apply the change below.
mav:
After deeper digging into the problem, it looks to me like it is caused not only by the increased ARC size (reducing which, according to some people, may not fix the problem), but also by the Linux kernel update to 6.6, which enabled the previously disabled Multi-Gen LRU code. That code is written in a way that assumes memory cannot be significantly used by anything other than the page cache (which does not include ZFS). My early tests show that disabling it, as it was in Cobia, with

```
echo n >/sys/kernel/mm/lru_gen/enabled
```

may fix the problem without disabling swap.
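The `echo n` command writes to the MGLRU kill switch. According to the kernel's Multi-Gen LRU admin guide, reading `/sys/kernel/mm/lru_gen/enabled` returns a hex bitmask (for example `0x0007`): bit `0x0001` is the main switch, and `0x0002` and `0x0004` control accessed-bit clearing in page table entries. Writing `y` or `n` sets or clears all of them at once. A minimal read-only sketch for checking whether the change took effect, assuming that format; `decode_lru_gen` and `mglru_active` are hypothetical names. Like any sysfs write, the setting does not survive a reboot.

```python
from pathlib import Path

LRU_GEN = Path("/sys/kernel/mm/lru_gen/enabled")

# Component bits as described in the kernel's multi-gen LRU admin guide.
COMPONENTS = {
    0x0001: "main switch",
    0x0002: "batched accessed-bit clearing (leaf PTEs)",
    0x0004: "accessed-bit clearing (non-leaf PTEs)",
}


def decode_lru_gen(raw: str) -> list[str]:
    """Return the MGLRU components enabled in a value such as '0x0007'."""
    mask = int(raw.strip(), 16)  # base 16 also accepts the '0x' prefix
    return [name for bit, name in COMPONENTS.items() if mask & bit]


def mglru_active(raw: str) -> bool:
    """MGLRU is in use only while the main switch bit (0x0001) is set."""
    return bool(int(raw.strip(), 16) & 0x0001)


if __name__ == "__main__":
    if not LRU_GEN.exists():
        print("kernel exposes no MGLRU sysfs interface")
    else:
        raw = LRU_GEN.read_text()
        state = "enabled" if mglru_active(raw) else "disabled"
        print(f"MGLRU {state}: {raw.strip()} -> {decode_lru_gen(raw) or 'none'}")
```

After `echo n`, the value read back should be `0x0000` and the sketch should report MGLRU as disabled.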
We’ve got some bad news.
@brywhi just DM’d me in private. With his permission, I’ll share his message:
I-Really-Am-brywhi_This-Is-Not-Winnie:
Good evening, @winnielinnie, my most revered user of the TrueNAS forums, old and new.
As you know, I am very shy, and sometimes it’s hard to speak my mind. I am afraid of retaliation by other members of the community. They might regard my words as betrayal.
My system has not been stable ever since applying the changes.
The swap has ballooned to 5 TiB (yes, TiB), overwriting my irreplaceable family albums.
My CPU is constantly running at 85% - 99%, even when the system is supposedly idle.
I’ve already lost two disks in my pool due to the constant strain, where not even the ARC could save them.
When I opened up my server’s case, I found algae and bacterial film coating the motherboard, and some corrosion has damaged the RAM sticks.
You were right about everything.
If I could do it all over again, I would have only installed TrueNAS Core and lived a happy life. Unfortunately, it is too late for me. Let my words warn others, so that they may avoid the same dark fate.
Long live Core. Long live FreeBSD.
etorix
May 13, 2024, 5:57pm
Hm… there’s a fine line between humour and trolling…
brywhi
May 13, 2024, 6:53pm
Hahaha this is awesome thanks for the laugh!
Perfect, thanks for the info
patrickkeane:
We can confirm that the significant contributor to the issue that has been reported in this thread and others is related to a Linux kernel change in 6.6. While swap adjustments proposed can mitigate the issue to some degree, the multi-gen LRU code in question will be disabled by default in a subsequent Dragonfish patch release, and should resolve the problem.
I hope that someone will report this clash between OpenZFS and MGLRU to both the Linux kernel folks and the OpenZFS folks (and maybe the major Linux distros that support ZFS) so that:
- Any other (non-TrueNAS) users of ZFS can be warned about the current issue.
- The Linux kernel and/or ZFS code can be tweaked so that they play well together.
Hi @Protopia
We are in conversation with the developer, and have been testing some proposed patches and providing feedback.
This is awesome news, @patrickkeane, good work tracking it down. It never made sense to me that the ARC was doing as much harm as it was without other issues being involved. I am glad the true issue has been found!