I use TrueNAS SCALE and have started playing around with Ollama and different models. I have 32 GB of RAM.
As I pulled more models, more and more RAM got cached, and I am now at the point where I can no longer run a model that I was running before. Ollama reports: Error: model requires more system memory (xx GiB) than is available (xx GiB)
According to the Netdata dashboard, RAM currently sits at 20.4 GB cached, 6.6 GB free, 3.9 GB used, and 0.0 buffers, and it just stays at those values.
So how do I free up the cache so I can start a model again? The models would require around 11 GB at most.
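From what I understand, the knob involved is the OpenZFS zfs_arc_max module parameter. Below is a minimal sketch of lowering it at runtime so the ARC shrinks and the RAM becomes usable again; it assumes root shell access on the SCALE host, and the 8 GiB target is only an example value, not a recommendation.

    #!/usr/bin/env python3
    # Minimal sketch: cap the ZFS ARC at runtime so cached RAM is released.
    # Assumes root access on the TrueNAS SCALE host; the 8 GiB target below
    # is just an example, pick a value that leaves room for the model.

    ARC_MAX = "/sys/module/zfs/parameters/zfs_arc_max"
    TARGET_BYTES = 8 * 1024**3  # example: cap the ARC at 8 GiB

    def current_arc_size():
        # arcstats data lines look like: "size  4  21900000000"
        with open("/proc/spl/kstat/zfs/arcstats") as f:
            for line in f:
                fields = line.split()
                if fields and fields[0] == "size":
                    return int(fields[2])
        return None

    print("ARC size before:", current_arc_size(), "bytes")

    # Writing a lower limit makes the ARC shrink toward it over time.
    with open(ARC_MAX, "w") as f:
        f.write(str(TARGET_BYTES))

    print("zfs_arc_max set to", TARGET_BYTES, "bytes")

As far as I know, a value written this way does not survive a reboot, so it would have to be re-applied (for example via a post-init script) if it turns out to be the right fix.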
I have the same issue. I allocated 64 GB of RAM to the Docker container but was unable to run the model. The expected behavior would be for the ZFS ARC to shrink automatically and let the model load, right? Setting an arbitrary upper limit on the ARC size would be a waste of RAM, right?
Also, this apparently comes down to Ollama's pre-load memory check. If there were a way to force the model to load without that check, the problem would be solved. See issue 5700 on the Ollama GitHub.
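For what it's worth, a minimal sketch of why such a check trips on a ZFS box: the ARC is not counted in /proc/meminfo's MemAvailable, even though most of it is reclaimable cache, so a plain free-memory check sees far less RAM than the system could actually hand back. Whether Ollama reads exactly MemAvailable is an assumption on my part; the snippet below is only an illustration of the mismatch, not Ollama code.

    #!/usr/bin/env python3
    # Illustration: compare what a free-memory check sees (MemAvailable)
    # with the size of the ZFS ARC, which is mostly evictable cache.

    def meminfo_kib(key):
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(key + ":"):
                    return int(line.split()[1])  # value is in KiB
        return 0

    def arc_size_bytes():
        with open("/proc/spl/kstat/zfs/arcstats") as f:
            for line in f:
                fields = line.split()
                if fields and fields[0] == "size":
                    return int(fields[2])
        return 0

    available = meminfo_kib("MemAvailable") * 1024
    arc = arc_size_bytes()

    print(f"MemAvailable (what a free-memory check sees): {available / 1024**3:.1f} GiB")
    print(f"ZFS ARC size (mostly reclaimable, not counted): {arc / 1024**3:.1f} GiB")

On the box above, that second number is roughly the 20 GB of "cached" RAM Netdata shows, which is why the model looks like it cannot fit even though the cache could be given back.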