ZFS Cache is using all RAM, Dataset locks, ZFS unhealthy

Hmm… quite interesting.
I have DDR5-5200 RAM. The BIOS is set to a manual frequency of 5200, but “Auto” sets it to 4800.

DRAM Profile is set to Auto. I can choose:

  • Auto
  • DDR5-4800
  • XMP1-5200
  • EXP0-5200

Is my RAM capable of 5200 but only in XMP1/EXP0?
Should I set it to 4800?

Do you have plenty of airflow over your HBA? These cards want and need high airflow over their heat sinks.

This is from the 9206-16e document, but I expect about the same for all their cards:

Minimum airflow:
— 100 linear feet per minute at 35 °C (95 °F) bay inlet temperature
— 150 linear feet per minute at 45 °C (113 °F) bay inlet temperature
— 200 linear feet per minute at 55 °C (131 °F) bay inlet temperature
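
As a rough way to read those figures, here is a minimal Python sketch that picks the minimum airflow for a given bay inlet temperature. The three tiers are the 9206-16e values quoted above; the helper name `min_airflow_lfm` and the choice to round up to the next listed tier (instead of interpolating) are my own assumptions, and anything hotter than 55 °C is simply reported as not covered.

```python
# Minimum airflow tiers quoted from the 9206-16e document:
# (max bay inlet temperature in °C, required linear feet per minute)
AIRFLOW_TIERS = [(35, 100), (45, 150), (55, 200)]


def min_airflow_lfm(inlet_temp_c):
    """Return the required LFM for the first tier that covers the temperature.

    Hypothetical helper: rounds up to the next listed tier instead of
    interpolating, and returns None above the highest listed temperature.
    """
    for max_temp_c, lfm in AIRFLOW_TIERS:
        if inlet_temp_c <= max_temp_c:
            return lfm
    return None  # hotter than anything the document lists


if __name__ == "__main__":
    for temp in (30, 40, 55, 60):
        print(temp, "°C ->", min_airflow_lfm(temp))
```

A case that sits at 40 °C inlet would therefore need at least the 150 LFM tier, not the 100 LFM one.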

Set all of your RAM speeds/timings to “Automatic” or “Stock” speeds - this will include any manually adjusted latency/timing values.

Also consider the airflow/temperature question posed by @SmallBarky - try sudo storcli /c0 show all | grep -i temperature to see if it is reporting back, as the 9500 should be new enough to have a temperature sensor and readout.
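
If you would rather poll this from a script than run it by hand, the sketch below does the same thing as the command above: it runs `storcli /c0 show all` and keeps only the lines that mention "temperature", ignoring case like `grep -i`. The function names are made up for illustration, and it assumes `storcli` is on the PATH and that `sudo` works without a prompt. It does not assume anything about the exact wording of the output beyond the word "temperature".

```python
import subprocess


def temperature_lines(text):
    """Keep only lines mentioning 'temperature', case-insensitively (like grep -i)."""
    return [line.strip() for line in text.splitlines() if "temperature" in line.lower()]


def read_controller_temperature(controller="/c0"):
    """Run storcli for one controller and return its temperature lines.

    Hypothetical wrapper: assumes storcli is installed and sudo needs no password.
    Raises CalledProcessError if the command fails.
    """
    result = subprocess.run(
        ["sudo", "storcli", controller, "show", "all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return temperature_lines(result.stdout)


if __name__ == "__main__":
    for line in read_controller_temperature():
        print(line)
```

An empty list back means the controller is not reporting a temperature at all, which is worth knowing in itself.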


See Connectivity and Max Memory Speed.
It shows DDR5-3600 for the 4-DIMM configurations.

best I could get to format here.
Max Memory Speed
  2x1R   DDR5-5200
  2x2R   DDR5-5200
  4x1R   DDR5-3600
  4x2R   DDR5-3600
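
To make the table concrete, here is a small Python sketch that checks a configured speed against the listed maximum for a given DIMM count and rank. The four entries are copied from the table above; the helper name `check_configured_speed` is hypothetical, and the ranks-per-DIMM value for any particular kit is something you would have to look up yourself.

```python
# Max memory speed (MT/s) keyed by (number of DIMMs, ranks per DIMM),
# copied from the "Max Memory Speed" table.
MAX_MEMORY_SPEED = {
    (2, 1): 5200,
    (2, 2): 5200,
    (4, 1): 3600,
    (4, 2): 3600,
}


def check_configured_speed(dimm_count, ranks, configured_mts):
    """Return (supported_max, within_spec) for a configured speed.

    Hypothetical helper; raises KeyError for configurations not in the table.
    """
    supported = MAX_MEMORY_SPEED[(dimm_count, ranks)]
    return supported, configured_mts <= supported


if __name__ == "__main__":
    # Four sticks forced to 5200 is over the listed maximum for either rank.
    print(check_configured_speed(4, 1, 5200))
    print(check_configured_speed(4, 2, 3600))
```

With four sticks, the listed maximum is 3600 whether they are single or dual rank, so a manual 5200 setting is out of spec either way.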


:man_facepalming: stupid me.
Let me fix that (must reset my BIOS).
And that can cause the RAM “overflow” in the Cache?

That’s not the issue. The issue is the system locking up. If your memory is not at 100% health (as @HoneyBadger said, just 1 error is too many) then your system can behave in random and mysterious ways, especially if a lot of memory is being used, since it increases the likelihood of hitting a faulty area in the bad RAM.

If the BIOS settings do not resolve this, I would figure out which of the 4 sticks are bad and replace them.

You should be able to pass multiple memtest runs with 0 errors before booting back into TrueNAS. Having bad RAM can put new data at risk.


no errors during test 8 and 9 in memtest after 4 passes…
System is booting as normal…
Restarting a file transfer > 150 GB…
aaand…
it’s working without a max limit in the settings

So the RAM is healthy and it’s working.
So stupid :zipper_mouth_face:
But thank you guys!


Because you set the RAM to stock settings in the BIOS?

I had to reset the BIOS settings and set everything to “Auto”. The BIOS now detects the RAM as 3600, as it should.
Never checked the settings before because it was “stock”


Another example of why those who know use ECC.


He would still have had issues with 4 sticks of ECC memory with the BIOS manually set to 5200.

When you have 4 sticks in these systems, they need to run at lower frequencies to maintain signal integrity. If you want full-speed memory with ECC, you need to go to big boy EPYC CPUs with registered ECC… and I don’t mean the 4004/4005 series, which are still little boy EPYC CPUs :stuck_out_tongue:

However, I’m still an advocate for ECC memory, even for home server systems you want to run 24/7…

Just a shame it’s sooo hard to source affordable DDR5 UDIMM ECC here in Australia, let alone the latest server-grade hardware.

I have seen bit flips reported in my IPMI logs in the past, which is why I’m an advocate :slight_smile:

Any DDR5 system has to downgrade speed when running at 2DPC.
The benefit of ECC in this case, besides a BIOS that is hopefully NOT designed for overclocking, is that it would spam the logs with warnings and you’d know right away where the problem is.


Never seen ECC ram with eXtreme Memory Profiles either :wink:


eXcc RAM can be installed on overclocked gaming server boards.

Thanks to Threadripper using RDIMM you can now find DDR5 RDIMM with XMP/EXPO. :roll_eyes:
