Hello All,
I recently built my new TrueNAS system and noticed that the ZFS cache grows at a certain time, around 2am, and then the system eventually crashes.
I ran memtest+ and it passed. None of the hard drives reported any errors. But when I wake up the next morning, I find that my TrueNAS is powered on but unresponsive.
Is anybody experiencing a similar issue?
I am using
Ryzen 7 9700X
ASRock B650 Steel Legend ver 3.25
NEMIX RAM 32GB (2X16GB) DDR5 5600MHZ PC5-44800 1Rx8 1.1V CL46 288-PIN ECC Unbuffered
More details on your problem are necessary. What version of Scale? Are you running apps or VMs? What processes are going on? Do you have backups or something else running?
I am running the latest version, 25.04.1. I am only running 4 apps at the moment: Dockge, qbittorrent, plex and wg-easy. No backups are running, only scheduled scrubs, and TRIM is enabled on both storage pools.
The reason I suspect the ZFS cache is that whenever the system decides to grow it, I see the ZFS cache fluctuate and then the system freezes.
I am trying to upload a screenshot, but this post keeps giving me an error saying I cannot upload embedded media. So annoying.
This'll put a cap on ARC size (until system reboot; also, starting/stopping a VM seems to reset it back to defaults).
If I had to guess, one of your apps is suddenly taking more memory, and ARC on Linux isn't as graceful as in the TrueNAS CORE days - the slight delay for it to free up memory causes an OOM.
I think I saw a very similar post to yours in the past & it was something to do with qbittorrent... can't remember the details though.
Edit: a quick search seems to imply it had something to do with older versions of TrueNAS scale - what version are you running? More details, more betterer.
Thank you very much for your response.
I do not have any VM and I only run those four apps I posted before.
Now that you mention it, yes, it does often freeze when I download something with qbittorrent. I thought it was the NIC, so I got an Intel I226 card and capped the download speed, but ultimately that did not fix the issue.
I am going to stop qbittorrent while I am not using it and then see how stable my system is.
As I stated, I am using version 25.04.1.
Please let me know what information you would like to see.
I ran Scale virtualized for a while, and made the mistake of using ZFS on Proxmox above it. I cranked on some kernel tuning and was trying to push speed limits on a 10g network, and maximize caching in the ARC. Every few days it would start thrashing and Proxmox would be digging into the swap space, hard. It was resource starved. That starvation went downstairs to Truenas and would make it unusable over time. Finally checked ram usage (64gb in the server, 32 for prox and 32 for Scale) and saw it was all used up. Undid all my changes, choked zfs to the 50% of ram limit it started with, and everything behaved again.
You're not running this under Proxmox by any chance?
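For anyone who wants to try the same "50% of RAM" cap, here is a rough Python sketch. It is not from the posts above: the helper names are made up for illustration, and it assumes OpenZFS on Linux, where the runtime limit is exposed as the module parameter /sys/module/zfs/parameters/zfs_arc_max. Writing to that file needs root and does not survive a reboot, so by default the sketch only prints the value.

```python
from pathlib import Path

MEMINFO = Path("/proc/meminfo")
# OpenZFS-on-Linux runtime tunable; a value written here is lost on reboot.
ARC_MAX = Path("/sys/module/zfs/parameters/zfs_arc_max")


def total_ram_bytes(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo text, converted from kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found in meminfo text")


def arc_cap_bytes(ram_bytes: int, fraction: float = 0.5) -> int:
    """Return the ARC cap in bytes for a given fraction of RAM (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(ram_bytes * fraction)


if __name__ == "__main__":
    if MEMINFO.exists():
        cap = arc_cap_bytes(total_ram_bytes(MEMINFO.read_text()))
        print(f"Suggested zfs_arc_max: {cap} bytes ({cap / 2**30:.1f} GiB)")
        # To actually apply it (as root), uncomment the next line:
        # ARC_MAX.write_text(str(cap))
    else:
        print("/proc/meminfo not found; this sketch only works on Linux.")
```

On a 32 GB box this works out to a cap of roughly 16 GiB. Treat the number as a starting point to experiment with, not a recommendation.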
On my old non-truenas nas I had a similar issue with transmission (running as a docker container). It would eat almost all the RAM eventually, and other docker apps crashed with OOM.
What is more interesting is that portainer's memory usage reported all this eaten memory as the transmission container's cache. So, docker is/was stupid to the point that it would crash the 1.5GB "fair and square" app instead of purging 8+GB of transmission cache. I solved it by setting a memory limit for transmission. [1]
AIUI, TrueNAS's apps use docker under the hood. So, it's possible that you have the same issue.
[1] IMO, this was a half-assed solution because it leaves transmission with no disk cache of its own. Perhaps that was mitigated at the OS level.
Ah sorry, missed that. I think the version issue I was thinking of was from like Spring last year; I don't think it is the version you're using.
I think either playing with the ARC limit command I gave or the settings in the torrenting app should help. I sadly don't have specific steps for the app.
The ARC cache growing is normal and not necessarily related to your crashes. As others have said, the system views free RAM as wasted RAM and will use as much as it thinks it can get away with. It could be an OOM event, but as far as I can see you haven't determined that conclusively.
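To see what the ARC is actually doing rather than going by the dashboard graph, something like this minimal Python sketch could help. It assumes OpenZFS on Linux, which publishes its counters in /proc/spl/kstat/zfs/arcstats; `size` is the current ARC size and `c_max` is the ceiling it may grow to. The function names are just for illustration.

```python
from pathlib import Path

# Standard kstat location for ARC counters on OpenZFS-on-Linux.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text: str) -> dict[str, int]:
    """Parse arcstats text into {name: value}, skipping the two header lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have three columns: name, type, data.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    if ARCSTATS.exists():
        s = parse_arcstats(ARCSTATS.read_text())
        print(f"ARC size: {gib(s['size']):.1f} GiB, max: {gib(s['c_max']):.1f} GiB")
    else:
        print("arcstats not found; is the ZFS module loaded?")
```

Running it every minute or so overnight (from cron, for instance) and saving the output would show whether the ARC is really what balloons at 2am, or whether it shrinks because something else is demanding memory.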
I would expect there to be a log trail if oomkiller is getting activated. Have you seen any mention of it there? There could also be mentions of crash-related events even if they aren't necessarily oomkiller-related, so the logs are highly relevant.
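As a starting point for digging through the logs, here is a small Python sketch that scans the previous boot's kernel log for OOM-killer messages. The assumptions: the system uses systemd's journalctl, the journal is persistent (otherwise the previous boot is not available), and the kernel got a chance to write the messages before the freeze. The two phrases it searches for are the usual Linux kernel OOM messages; the function names are hypothetical.

```python
import re
import shutil
import subprocess

# Phrases the Linux kernel typically logs when the OOM killer fires.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Killed process", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def previous_boot_kernel_log():
    """Return kernel messages from the previous boot, or [] if unavailable."""
    if shutil.which("journalctl") is None:
        return []
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    hits = find_oom_lines(previous_boot_kernel_log())
    for line in hits:
        print(line)
    if not hits:
        print("No OOM-killer messages found in the previous boot's kernel log.")
```

An empty result does not rule out memory exhaustion, since a hard freeze can stop the log before anything is written, but a hit would settle the question.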
With extra industrial grade butter. For me it was an experiment and I learned from it on a secondary box nobody relies on. But watching connectivity to something you're working on gradually decay to where you can't even ping the hypervisor is disturbing.
Thank you for all the responses so far. Sorry for the late response, as I was traveling.
Long story short, stopping qbittorrent did not help.
On TrueNAS, the ZFS cache again grew rapidly around 2am and the system eventually crashed.
Let's take a closer look at the system itself then. I'll pretend like I'm new to the thread. Is this running under a hypervisor or is it bare metal? Have you changed any ZFS ARC settings/tunables? Does this machine ever start eating swap space or does it never touch it? I'd guess that leading up to an OOM state it would start swapping to stay alive. There are limits you can place on ARC if that's the problem.
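To answer the swap question with numbers, a quick Python sketch like the following reads /proc/meminfo, which every Linux system provides, and reports how much swap is in use and how much memory is still available. The function names are illustrative, not part of any TrueNAS tooling.

```python
from pathlib import Path

MEMINFO = Path("/proc/meminfo")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info: dict[str, int]) -> int:
    """Swap in use, in kB; 0 if the system has no swap configured."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__":
    if MEMINFO.exists():
        info = parse_meminfo(MEMINFO.read_text())
        print(f"Swap total:    {info.get('SwapTotal', 0) / 1024:.0f} MiB")
        print(f"Swap used:     {swap_used_kb(info) / 1024:.0f} MiB")
        print(f"Mem available: {info.get('MemAvailable', 0) / 1024:.0f} MiB")
    else:
        print("/proc/meminfo not found; this sketch only works on Linux.")
```

If swap total shows 0, the system has no swap to fall back on, and a sudden memory spike would go straight to the OOM killer or a hang.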
"This TrueNAS setup is as nearly default as it gets."
Posting a screenshot with a handful of apps.
Me with zero apps on TrueNAS: "Seems legit!"
Back on track. Quick googling showed that plex runs (or at least used to run) scheduled tasks at 2AM. You can check whether that's true in your case, and then either disable all the tasks or just stop plex itself.
I am sorry, I forgot to mention it. This is bare metal. This system only runs TrueNAS and I do not have any VM running on this machine either. I have not yet changed the ZFS ARC settings as one person suggested.
I do not know what swap space is, but this is as near to vanilla settings as possible because I am new to TrueNAS...
Swap space is a pretty basic concept that has been around for decades; basically a hard drive file somewhere that ram can lean on when memory is running out. The system can flush some things to the cache/swap and stay alive until memory gets freed up. Windows does this. Your Truenas install does this in a sneaky way without really asking for it.
<<<just saw your new post as I was typing
Yes that's the 2am task. Reschedule it so you can catch it, disable it, or stop the plex app when you're not using it. You can also try out Jellyfin or Emby, the kids all rave about it and I know Jelly can use hardware transcoding with a discrete GPU, like Plex with a Plexpass does.
<<edit Also funny how your zfs load chart a few posts up is EXACTLY from 2am to 5am. You've found the culprit.