Using either of the following two Compose files, I am able to get better performance from an LLM running in an Ollama container. The first variant maps the NVIDIA device nodes into the container explicitly:
version: '3.7'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-local
    runtime: nvidia
    devices:
      - /dev/nvidia0:/dev/nvidia0
      - /dev/nvidia1:/dev/nvidia1   # Add more lines if more GPUs are present
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-modeset:/dev/nvidia-modeset
      - /dev/nvidia-uvm:/dev/nvidia-uvm
      - /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
    environment:
      - NVIDIA_VISIBLE_DEVICES=none   # Using explicit device mapping
    volumes:
      - /mnt/window_share/Apps/Ollama/data:/data
      - /mnt/window_share/Apps/Ollama/config:/config
    ports:
      - "11434:11434"
    restart: unless-stopped
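If it helps anyone reproduce this: I bring the stack up the usual way, and with the explicit device mapping above the GPU device nodes should be visible inside the container (container name as defined in the file):

docker compose up -d
docker exec -it ollama-local sh -c 'ls /dev/nvidia*'   # should list nvidia0, nvidia1, nvidiactl, etc.

The second variant drops the manual device mapping and instead lets Compose's deploy device reservations hand the GPUs to the container: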
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest   # Replace with the specific Ollama image if needed
    container_name: ollama-local
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - /mnt/window_share/Apps/Ollama/data:/data
      - /mnt/window_share/Apps/Ollama/config:/config
    ports:
      - "11434:11434"   # Default Ollama API port
    restart: unless-stopped
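As I understand the Compose spec, the device reservation in this second file can also pin specific GPUs with device_ids instead of count: all (untested on my side, just noting it for reference):

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]   # explicit GPU indices instead of count: all
              capabilities: [gpu]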
Both of the compose files above work. However, when I watch how the GPUs are utilized with sudo watch -n 0.5 nvidia-smi, one GPU's utilization is pinned at 98%, while the other GPU stays at 0%.
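One thing I still need to dig into (unverified): docker exec -it ollama-local ollama ps shows how the loaded model is placed (CPU vs. GPU), and recent Ollama versions list an OLLAMA_SCHED_SPREAD environment variable in ollama serve --help that is supposed to spread a model across all visible GPUs, e.g. added to the environment section of either file:

    environment:
      - OLLAMA_SCHED_SPREAD=1   # untested here: meant to force a model to be spread across all GPUs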
Has anybody else had this experience?
Thanks!
-Rodney