Need help with ZFS Cache, it is crashing my TrueNAS

Could be - but depending on how you undervolted, it could be that the memory controller isn’t getting enough juice if you just set a global offset, imo.

Yes, I will revert it when I get home this weekend.
Thank you so much to everyone who gave input on this matter! I appreciate you all.

Google says those are commonly errors with ECC ram. Are you using ECC?

Yeah - he is using ECC:

Common? I don’t know, I think I get 1 ECC error a year… but here he is getting spikes of them around the same time that the system reboots.
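If anyone wants to watch those counts directly rather than grepping logs, here is a rough sketch that reads the corrected/uncorrected error counters the Linux EDAC subsystem exposes under sysfs. It assumes an EDAC driver is actually loaded for the platform (if not, the directory is simply empty), and the function name is just mine for illustration.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from the kernel's EDAC counters."""
    counts = {}
    # Each memory controller shows up as mc0, mc1, ... when an EDAC driver is loaded.
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            counts[mc.name] = (int(ce_file.read_text()), int(ue_file.read_text()))
    return counts
```

Calling `read_edac_counts()` once before the off-hours window and once after, then comparing the two results, should show whether the corrected count really jumps around the reboots.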

The way I’m reading these logs - Jun 6th, normal boot-up stuff: what parts of memory shouldn’t be touched during hibernation, how much memory, NVMe memory, SAS memory, GPU-on-CPU memory

… oh - I think I might have figured it out…

Jun  6 15:40:47 truenas kernel: [drm] amdgpu: 15587M of GTT memory ready

So hear me out - what if these ~16 gigs of memory that the APU has access to are in use while the system has less than 16 gigs available? What could cause these 16 gigs to be in use though, you might ask? Plex doing some maintenance that involves the APU!
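To put numbers on that hunch, here is a quick sketch that pulls the GTT figure out of the amdgpu line above and converts `/proc/meminfo` fields to the same unit for comparison. The helper names are made up for illustration, and this only compares the ceiling the driver reports against system memory; it says nothing about how much GTT is actually in use.

```python
import re

# Matches the amdgpu line quoted above, e.g. "[drm] amdgpu: 15587M of GTT memory ready"
GTT_RE = re.compile(r"amdgpu: (\d+)M of GTT memory ready")


def gtt_megabytes(log_line):
    """Return the GTT size (in M, as the driver prints it) or None if absent."""
    match = GTT_RE.search(log_line)
    return int(match.group(1)) if match else None


def meminfo_megabytes(meminfo_text, key):
    """Return a /proc/meminfo field (reported in kB) converted to M, or None."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0]) // 1024
    return None
```

Feeding it the log line plus the contents of `/proc/meminfo` (keys `MemTotal` and `MemAvailable`) gives three numbers to eyeball side by side.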

Thoughts?

…otherwise, yeah, random RAM errors that spike during the off-hour intervals, which I can then see turn into the system booting back up on Jun 8th at ~7am.

By the way, I am also using an Intel ARC A380 for hw encoding for Plex. I didn’t and don’t think the GPU is the cause of this issue, so I didn’t mention it.
If I knew ECC RAM would give me all these headaches I would have just bought regular non-ECC RAM hehe

That was misread. Google was basically saying those types of errors are commonly associated with ECC, not that ECC does this all the time. So you are running ECC. I didn’t know the B650s could use ECC, but I never tried it on mine. Here, Google is saying ECC support is a bit hit or miss depending on your motherboard and your CPU.

@FailedPanda
On your way home, stop by the shop and grab a 16/32GB pair of regular old DDR5-6000 to swap in. If it fixes it, good. If it doesn’t, take it back. That’s what I would try.

Yes, I am actually thinking about grabbing non-ECC memory. I had a nearly identical system with non-ECC memory before I built this one recently and never had an issue.
Many posts and YouTube videos recommended ECC memory and talked about the chance of corrupted files, which spooked me and pushed me to get ECC memory.

This experience is entirely mine and mine alone, I can’t promise for anyone else. But as a homelabber for the better part of a decade, with all quality, new, name brand components… I have been just fine with DDR, non-ECC. Haven’t corrupted or lost any data, but a Toshiba tried to do that once.

I cross my fingers that the 128GB that live in my TrueNAS box now continue the tradition, but they’re a little rainbow-y for a server.

I just found out that Jellyfin was the reason why TrueNAS SCALE was randomly locking up. I think it was a scheduled task that scrubs the disk and fills up the ZFS cache until the CPU starts melting from swapping or something. Apps crash as well because memory fills up.

Yes apparently it is a good torture test.

I think I am going to stop the Plex server until I head back home and see if the system still freezes. Fingers crossed again.

I’m on ElectricEel 24.10 and Plex never caused ZFS cache “issues” like Jellyfin, but I really want JF to work, if only to spite Plex, even though I have thrown money at them already. I have ECC but no power modifications or anything modified besides XMP iirc. Not sure if there’s a good solution for this besides turning the JF schedules off. I thought TrueNAS would “balance” RAM usage out and not take over the services’ RAM (and crash them).
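For anyone who wants to see how big the cache actually gets while a JF task runs, here is a small sketch that parses the ARC statistics OpenZFS exposes on Linux at `/proc/spl/kstat/zfs/arcstats`. It only reads the `size` and `c_max` rows (current ARC size and the ceiling it may grow to); the function names are mine, and this is a way to watch the numbers, not a fix.

```python
def parse_arcstats(text):
    """Parse the 'name type data' rows of arcstats into {name: int}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns with a numeric value last;
        # this skips the kstat header line and the column-title row.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats):
    """Return (current ARC size, maximum ARC target) in GiB."""
    gib = 1024 ** 3
    return stats["size"] / gib, stats["c_max"] / gib
```

Reading the file and passing its text through `parse_arcstats` then `arc_summary` every minute or so during the scheduled task should show whether the ARC is really sitting at its ceiling when the apps start crashing.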

I swear there is some BIOS setting for how much RAM to share with the iGPU/APU… I’d also consider setting it to the minimum if it is being used for Plex.

Curious what CPU you have; I’m wondering if I’m onto something with memory dedicated/available to the iGPU or I’m just looking at nothing.

Reason is that my system is fairly identical in setup/use, but I haven’t had these issues; the only major difference is my CPU doesn’t have an APU/iGPU.

5600X + 32GB ECC
Running it without a GPU

BTW here is more detail on what I am using in my system:
Ryzen 7 9700X
ASRock B650 Steel Legend ver 3.25
NEMIX RAM 32GB (2x16GB) DDR5 5600MHz PC5-44800 1Rx8 1.1V CL46 288-pin ECC Unbuffered
Intel ARC A380
Intel I226-V NIC
2x Samsung 870 EVO 500GB SSD for AppPool, mirrored
6x 20TB Exos X24 for StoragePool, RAIDZ2
2x NVMe (forgot the model) for boot, mirrored

Alright - I’m way off the mark then. I guess just checking ram/mem controller is likely best next step.

You may be on to something indeed.

The amdgpu driver is saying “I can consume up to ~16GB of main memory if VRAM overflows on a GPU” - and because @FailedPanda is also using an Intel GPU, both the intended A380 and the onboard APU in the 9700X might be getting passed to the Plex container with the “pass through available non-NVIDIA GPUs” checkbox.

And since the Plex scan is what’s triggering it, I wonder if we have a scenario where either Plex is trying to use both GPUs for work, or the amdgpu driver is mistakenly allocating system RAM for when the Intel one is working.

@FailedPanda strange ask - but are you able to disable your AMD iGPU entirely in the BIOS/UEFI and have the Intel A380 be your primary/only GPU in the system? It can still be shared between Host/Apps like Plex this way.
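Before going into the BIOS, it may be worth confirming what the host actually sees. A minimal sketch, assuming the standard Linux sysfs layout under `/sys/class/drm`, that lists each GPU card node and the kernel driver bound to it (the function name is just for illustration):

```python
import os
import re
from pathlib import Path


def list_gpu_drivers(drm_root="/sys/class/drm"):
    """Return {card: driver} for each cardN entry, skipping connector nodes like card0-DP-1."""
    drivers = {}
    for entry in sorted(Path(drm_root).glob("card*")):
        if not re.fullmatch(r"card\d+", entry.name):
            continue
        # device/driver is a symlink to the bound kernel driver's directory.
        driver_link = entry / "device" / "driver"
        if driver_link.exists():
            drivers[entry.name] = os.path.basename(os.path.realpath(driver_link))
    return drivers
```

If both GPUs are active I would expect two entries, one bound to `amdgpu` and one to an Intel driver (`i915` or `xe`, depending on the kernel). After disabling the iGPU, the `amdgpu` entry should be gone.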

Hi hi, another day and another thank you to all who are giving their input on this issue.
So my system survived last night without Plex running.


Do not mind the dip at 7pm, where I manually restarted.

I think I can disable the iGPU in the BIOS, but as I stated I am away atm until the weekend. I will try this when I get home.

I had similar issues with ECC RAM, where sometimes I rebooted the NAS and it just would not POST. Other times it would POST and work fine for a few days and then suddenly freeze.
Sometimes it would take minutes, sometimes days, for the freeze to happen. I also did all the memtests and whatnot. Once I replaced the ECC RAM with non-ECC RAM, the problems went away.

That is the price we pay for having consumer CPU + Mobo with ECC RAM. :grimacing: