Redesign Docker build system to single-stage architecture for reliable model loading

Replaced the complex two-stage build (builder → runtime) with a
single-stage Dockerfile that builds and runs Ollama in one image. This
fixes model loading failures caused by missing CUDA libraries and
LD_LIBRARY_PATH mismatches in the previous multi-stage design.
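
For context, the failure mode looked roughly like this. The snippet
below is an illustrative reconstruction, not the actual previous
Dockerfile; the image tags, paths, and copied libraries in it are all
assumptions:

    FROM nvidia/cuda:11.4.3-devel-ubuntu20.04 AS builder
    # ... toolchain install and Ollama build elided ...

    FROM ubuntu:20.04
    COPY --from=builder /src/ollama/ollama /usr/local/bin/ollama
    # Only hand-picked CUDA libraries were copied over; any .so missed
    # here is simply absent from the runtime image.
    COPY --from=builder /usr/local/cuda-11.4/lib64/libcudart.so* /usr/local/lib/
    # This path existed in the builder stage but not in this image, so
    # the dynamic linker could not resolve the GPU backend's
    # dependencies and model loading failed.
    ENV LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64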

Changes:
- Add docker/Dockerfile: Single-stage build with GCC 10, CMake 4, Go 1.25.3, CUDA 11.4 (sketched after this list)
- Clone source from https://github.com/dogkeeper886/ollama37
- Compile Ollama with "CUDA 11" preset for Tesla K80 (compute capability 3.7)
- Keep complete CUDA toolkit and all libraries in final image (~20GB)
- Update docker-compose.yml: Simplified config, use ollama37:latest image
- Update docker/README.md: New build instructions and architecture docs
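
A condensed sketch of the new single-stage layout, using the versions,
clone URL, and preset named above. The base image tag, the install
commands, and the /src/ollama and LD_LIBRARY_PATH paths are
assumptions, not a copy of docker/Dockerfile:

    FROM nvidia/cuda:11.4.3-devel-ubuntu20.04

    # Toolchain: GCC 10, Go 1.25.3 (CMake 4 install elided; details
    # simplified).
    RUN apt-get update && apt-get install -y gcc-10 g++-10 git wget && \
        rm -rf /var/lib/apt/lists/*
    RUN wget -qO- https://go.dev/dl/go1.25.3.linux-amd64.tar.gz | \
        tar -C /usr/local -xz
    ENV PATH=/usr/local/go/bin:$PATH

    # Build Ollama from source with the "CUDA 11" preset, which targets
    # compute capability 3.7 (Tesla K80).
    RUN git clone https://github.com/dogkeeper886/ollama37 /src/ollama
    WORKDIR /src/ollama
    RUN cmake --preset "CUDA 11" && \
        cmake --build --preset "CUDA 11" && \
        go build -o /usr/local/bin/ollama .

    # Build and runtime are the same image: the full CUDA toolkit stays
    # in place, so LD_LIBRARY_PATH points at libraries that exist.
    ENV LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64
    EXPOSE 11434
    ENTRYPOINT ["ollama", "serve"]

Assuming the image is built as docker build -t ollama37:latest docker/,
the compose file below can reference it directly.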

Trade-off: a larger image (~20GB vs ~3GB) in exchange for guaranteed
compatibility and reliable GPU backend operation. All libraries remain
accessible at correct paths, so models load properly on the Tesla K80.

Tested: Successfully runs gemma3:1b on a Tesla K80

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date:   2025-11-10 09:19:22 +08:00
Commit: 6dbd8ed44e (parent 0293c53746)

3 changed files with 232 additions and 161 deletions

docker-compose.yml

@@ -2,10 +2,9 @@ version: "3.8"
 services:
   ollama:
-    image: ollama37-runtime:latest
-    container_name: ollama37-runtime
+    image: ollama37:latest
+    container_name: ollama37
     runtime: nvidia
-    user: "${UID:-1000}:${GID:-1000}"
     deploy:
       resources:
         reservations:
@@ -16,9 +15,8 @@ services:
     ports:
       - "11434:11434"
     volumes:
-      - ${HOME}/.ollama:${HOME}/.ollama
+      - ollama-data:/root/.ollama
     environment:
-      - HOME=${HOME}
       - OLLAMA_HOST=0.0.0.0:11434
       - NVIDIA_VISIBLE_DEVICES=all
       - NVIDIA_DRIVER_CAPABILITIES=compute,utility
@@ -29,6 +27,7 @@ services:
       timeout: 10s
       retries: 3
       start_period: 5s
+volumes:
+  ollama-data:
+    name: ollama-data