I recently installed Ollama as an iX community container, with Open WebUI support. It all installed OK and works; however, it's very slow. In this setup, is there a way to pass through the NVIDIA Tesla P4 GPU that's installed in the same system? Or is this a defect/feature request that needs to be raised with iX community container support?
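(For reference, this is the kind of passthrough I mean: with a plain Docker install, Ollama's own docs start the container with something like the command below, assuming the NVIDIA Container Toolkit is set up on the host. I'm asking whether the community app exposes an equivalent option.)

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama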
Thanks!
scyto
November 14, 2024, 1:16am
Are you talking about the Docker version on 24.10, or an earlier version?
The Docker one appears to work, based on the logs I am seeing:
2024-11-14 00:31:55.889522+00:00 Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
2024-11-14 00:31:55.890514+00:00 Your new public key is:
2024-11-14 00:31:55.890530+00:00 2024-11-14T00:31:55.890530964Z
2024-11-14 00:31:55.890535+00:00 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILHf3D4EQ9JWBGkIDJRkUJVoJr98lezRmGMmelm1T81P
2024-11-14 00:31:55.890540+00:00 2024-11-14T00:31:55.890540102Z
2024-11-14 00:31:55.891214+00:00 2024/11/14 00:31:55 routes.go:1189: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:31028 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
2024-11-14 00:31:55.891493+00:00 time=2024-11-14T00:31:55.891Z level=INFO source=images.go:755 msg="total blobs: 0"
2024-11-14 00:31:55.891532+00:00 time=2024-11-14T00:31:55.891Z level=INFO source=images.go:762 msg="total unused blobs removed: 0"
2024-11-14 00:31:55.891951+00:00 time=2024-11-14T00:31:55.891Z level=INFO source=routes.go:1240 msg="Listening on [::]:31028 (version 0.4.1)"
2024-11-14 00:31:55.893141+00:00 time=2024-11-14T00:31:55.892Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx2 cuda_v11 cuda_v12 cpu cpu_avx]"
2024-11-14 00:31:55.893175+00:00 time=2024-11-14T00:31:55.893Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
2024-11-14 00:31:56.606890+00:00 time=2024-11-14T00:31:56.606Z level=INFO source=types.go:123 msg="inference compute" id=GPU-5df9050c-7da9-0072-d743-6bc3ca9ddbde library=cuda variant=v12 compute=7.5 driver=12.4 name="NVIDIA GeForce RTX 2080 Ti" total="10.7 GiB" available="10.6 GiB"
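If you want to check whether your own install is actually using the card, a rough sketch (substitute whatever name the app gives the container for ollama; that part is just a placeholder):

# is the Tesla P4 visible from inside the container?
docker exec -it ollama nvidia-smi

# after sending a prompt, the PROCESSOR column should read GPU rather than CPU
docker exec -it ollama ollama ps

If neither of those shows the GPU and your container logs never print an "inference compute" line like the one above, the app is running CPU-only, which would line up with the slowness you're seeing.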