Commit 4810471b33: Redesign Docker build system to two-stage architecture with builder/runtime separation
Redesigned the Docker build system from a single-stage monolithic design to a clean
two-stage architecture that separates the build environment (builder image) from the
compilation and packaging steps (runtime image) while maintaining library path compatibility.

## Architecture Changes

### Builder Image (docker/builder/Dockerfile)
- Provides base environment: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
- Built once, cached for subsequent builds (~90 min first time)
- Removed config file copying (cuda-11.4.sh, gcc-10.conf, go.sh)
- Added comprehensive comments explaining each build step
- Added git installation for runtime stage source cloning

### Runtime Image (docker/runtime/Dockerfile)
- Two-stage build using ollama37-builder as base for BOTH stages
- Stage 1 (compile): Clone source from GitHub → CMake configure → Build C/C++/CUDA → Build Go
- Stage 2 (runtime): Copy artifacts from stage 1 → Setup environment → Configure server
- Both stages use identical base image to ensure library path compatibility
- Removed -buildvcs=false flag (VCS info embedded from git clone)
- Comprehensive comments documenting library paths and design rationale

### Makefile (docker/Makefile)
- Simplified from 289 to 145 lines (roughly half the size)
- Removed: run, stop, logs, shell, test targets (use docker-compose instead)
- Removed: build orchestration targets (start-builder, copy-source, run-cmake, etc.)
- Removed: artifact copying (handled internally by multi-stage build)
- Focus: Build images only (build, build-builder, build-runtime, clean, help)
- All runtime operations delegated to docker-compose.yml

### Documentation (docker/README.md)
- Completely rewritten for new two-stage architecture
- Added "Build System Components" section with file structure
- Documented why both runtime stages use builder base (library path compatibility)
- Updated build commands to use Makefile
- Updated runtime commands to use docker-compose
- Added comprehensive troubleshooting section
- Added build time and image size tables
- Reference to archived single-stage design

## Key Design Decision

**Problem**: Compiled binaries have hardcoded library paths
**Solution**: Use ollama37-builder as base for BOTH compile and runtime stages
**Trade-off**: A larger image (~18GB) in exchange for guaranteed library compatibility

## Benefits

- Cleaner separation of concerns (builder env vs compilation vs runtime)
- Builder image cached after first build (90 min → <1 min rebuilds)
- Runtime rebuilds only take ~10 min (pulls latest code from GitHub)
- No library path mismatches (identical base images)
- No complex artifact extraction (multi-stage COPY)
- Simpler Makefile focused on image building
- Runtime management via docker-compose (industry standard)

## Files Changed

Modified:
- docker/builder/Dockerfile - Added comments, removed COPY config files
- docker/runtime/Dockerfile - Converted to two-stage build
- docker/Makefile - Simplified to focus on image building only
- docker/README.md - Comprehensive rewrite for new architecture

Deleted:
- docker/builder/README.md - No longer needed
- docker/builder/cuda-11.4.sh - Generated in Dockerfile
- docker/builder/gcc-10.conf - Generated in Dockerfile
- docker/builder/go.sh - Generated in Dockerfile

Archived:
- docker/Dockerfile → docker/Dockerfile.single-stage.archived


Ollama37 Docker Build System

Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)

Overview

This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support:

  1. Builder Image (builder/Dockerfile) - Base environment with build tools

    • Rocky Linux 8
    • CUDA 11.4 toolkit (required for Tesla K80)
    • GCC 10 (built from source, required by CUDA 11.4)
    • CMake 4.0 (built from source)
    • Go 1.25.3
  2. Runtime Image (runtime/Dockerfile) - Two-stage build process

    • Stage 1 (compile): Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary
    • Stage 2 (runtime): Copy artifacts → Setup runtime environment

The runtime uses the builder image as its base to ensure library path compatibility between build and runtime environments.
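
The shape of that two-stage runtime/Dockerfile is roughly as follows. This is an illustrative sketch based on the description in this README, not a copy of the actual file: OWNER is a placeholder for the real GitHub account, and the clone path, build flags, and entrypoint are assumptions.

# --- Stage 1: compile inside the full builder environment ---
FROM ollama37-builder:latest AS compile
# OWNER is a placeholder for the actual GitHub organization/user
RUN git clone --depth 1 https://github.com/OWNER/ollama37.git /usr/local/src/ollama37
WORKDIR /usr/local/src/ollama37
# The "CUDA 11" preset targets compute capability 3.7 (Tesla K80)
RUN cmake --preset "CUDA 11" && cmake --build build -j"$(nproc)"
RUN go build -o ollama .

# --- Stage 2: runtime, using the SAME base so library paths resolve identically ---
FROM ollama37-builder:latest
COPY --from=compile /usr/local/src/ollama37 /usr/local/src/ollama37
RUN cp /usr/local/src/ollama37/ollama /usr/local/bin/ollama
ENV OLLAMA_HOST=0.0.0.0:11434 \
    LD_LIBRARY_PATH=/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64
EXPOSE 11434
ENTRYPOINT ["/usr/local/bin/ollama"]
CMD ["serve"]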

Prerequisites

  • Docker with NVIDIA Container Runtime
  • Docker Compose
  • NVIDIA GPU drivers (470+ for Tesla K80)
  • Verify GPU access:
    docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.4.3-base-rockylinux8 nvidia-smi
    

Quick Start

1. Build Images

cd /home/jack/Documents/ollama37/docker
make build

This will:

  1. Build the builder image (if not present) - ~90 minutes first time
  2. Build the runtime image - ~10 minutes

First-time build: ~100 minutes total (includes building GCC 10 and CMake 4 from source)

Subsequent builds: ~10 minutes (builder image is cached)

2. Run with Docker Compose

docker-compose up -d

Check logs:

docker-compose logs -f

Stop the server:

docker-compose down

3. Run Manually

docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest

Usage

Using the API

# List models
curl http://localhost:11434/api/tags

# Pull a model
curl http://localhost:11434/api/pull -d '{"name": "gemma3:4b"}'

# Run inference
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Using the CLI

# List models
docker exec ollama37 ollama list

# Pull a model
docker exec ollama37 ollama pull gemma3:4b

# Run a model
docker exec ollama37 ollama run gemma3:4b "Hello!"

Architecture

Build System Components

docker/
├── builder/
│   └── Dockerfile          # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
├── runtime/
│   └── Dockerfile          # Two-stage: compile ollama37, package runtime
├── Makefile                # Build orchestration (images only)
├── docker-compose.yml      # Runtime orchestration
└── README.md               # This file

Two-Stage Build Process

Stage 1: Builder Image (builder/Dockerfile)

Purpose: Provide consistent build environment

Contents:

  • Rocky Linux 8 base
  • CUDA 11.4 toolkit (compilation only, no driver)
  • GCC 10 from source (~60 min build time)
  • CMake 4.0 from source (~8 min build time)
  • Go 1.25.3 binary
  • All build dependencies

Build time: ~90 minutes (first time), cached thereafter

Image size: ~15GB

Stage 2: Runtime Image (runtime/Dockerfile)

Stage 2.1 - Compile (FROM ollama37-builder)

  1. Clone ollama37 source from GitHub
  2. Configure with CMake ("CUDA 11" preset for compute 3.7)
  3. Build C/C++/CUDA libraries
  4. Build Go binary

Stage 2.2 - Runtime (FROM ollama37-builder)

  1. Copy entire source tree (includes compiled artifacts)
  2. Copy binary to /usr/local/bin/ollama
  3. Setup LD_LIBRARY_PATH for runtime libraries
  4. Configure server, expose ports, setup volumes

Build time: ~10 minutes

Image size: ~18GB (includes build environment + compiled Ollama)
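
After make build completes, a quick sanity check confirms that the stage 2 image contains the binary and the compiled libraries at the paths this README documents (the library directory below is taken from the Configuration section; adjust if your build differs):

# Override the entrypoint to get a shell instead of the server
docker run --rm --entrypoint bash ollama37:latest -c '
  ls -lh /usr/local/bin/ollama &&
  ls /usr/local/src/ollama37/build/lib/ollama | head &&
  echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'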

Why Both Stages Use Builder Base?

Problem: Compiled binaries have hardcoded library paths (via rpath/LD_LIBRARY_PATH)

Solution: Use identical base images for compile and runtime stages

Benefits:

  • Library paths match between build and runtime
  • All GCC 10 runtime libraries present
  • All CUDA libraries at expected paths
  • No complex artifact extraction/copying
  • Guaranteed compatibility

Trade-off: A larger runtime image (~18GB) in exchange for avoiding the complexity and reliability issues of extracting artifacts into a slimmer base
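
With the container running, the compatibility claim can be checked directly: every shared library the binary links against should resolve to a path inside the image (the grep pattern below is just the usual suspects; adjust as needed):

# Any "not found" entry would indicate a build/runtime library path mismatch
docker exec ollama37 ldd /usr/local/bin/ollama | grep -E 'cuda|cublas|stdc|not found'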

Alternative: Single-Stage Build

See Dockerfile.single-stage.archived for the original single-stage design that this two-stage build replaces.

Build Commands

Using the Makefile

# Build both builder and runtime images
make build

# Build only builder image
make build-builder

# Build only runtime image (will auto-build builder if needed)
make build-runtime

# Remove all images
make clean

# Show help
make help

Direct Docker Commands

# Build builder image
docker build -f builder/Dockerfile -t ollama37-builder:latest builder/

# Build runtime image
docker build -f runtime/Dockerfile -t ollama37:latest .

Runtime Management

# Start server
docker-compose up -d

# View logs (live tail)
docker-compose logs -f

# Stop server
docker-compose down

# Stop and remove volumes
docker-compose down -v

# Restart server
docker-compose restart

Manual Docker Commands

# Start container
docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest

# View logs
docker logs -f ollama37

# Stop container
docker stop ollama37
docker rm ollama37

# Shell access
docker exec -it ollama37 bash

Configuration

Environment Variables

  • OLLAMA_HOST (default: 0.0.0.0:11434) - Server listen address
  • LD_LIBRARY_PATH (default: /usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64) - Library search path
  • NVIDIA_VISIBLE_DEVICES (default: all) - Which GPUs to use
  • NVIDIA_DRIVER_CAPABILITIES (default: compute,utility) - GPU capabilities
  • OLLAMA_DEBUG (default: unset) - Enable verbose Ollama logging
  • GGML_CUDA_DEBUG (default: unset) - Enable CUDA/CUBLAS debug logging

Volume Mounts

  • /root/.ollama - Model storage (use Docker volume ollama-data)
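
The ollama-data volume referenced above is an ordinary named Docker volume, so the standard volume commands apply:

# Create it explicitly (docker run / docker-compose also create it on first use)
docker volume create ollama-data

# Find where the models live on the host
docker volume inspect ollama-data --format '{{ .Mountpoint }}'

# Approximate disk usage of downloaded models (the mountpoint is root-owned)
sudo du -sh "$(docker volume inspect ollama-data --format '{{ .Mountpoint }}')"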

Customizing docker-compose.yml

# Change port
ports:
  - "11435:11434"  # Host:Container

# Use specific GPU
environment:
  - NVIDIA_VISIBLE_DEVICES=0  # Use GPU 0 only

# Enable debug logging
environment:
  - OLLAMA_DEBUG=1
  - GGML_CUDA_DEBUG=1

GPU Support

Supported Compute Capabilities

  • 3.7 - Tesla K80 (primary target)
  • 5.0-5.2 - Maxwell (GTX 900 series)
  • 6.0-6.1 - Pascal (GTX 10 series)
  • 7.0-7.5 - Volta, Turing (RTX 20 series)
  • 8.0-8.6 - Ampere (RTX 30 series)

Tesla K80 Recommendations

VRAM: 12GB per GPU (24GB for dual-GPU K80)

Model sizes:

  • Small (1-4B): Full precision or Q8 quantization
  • Medium (7-8B): Q4_K_M quantization
  • Large (13B+): Q4_0 quantization or multi-GPU

Tested models:

  • gemma3:4b
  • gpt-oss
  • deepseek-r1
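
To see how a loaded model actually fits into the K80's VRAM, run ollama ps inside the container (assuming the bundled Ollama build includes the ps command); it reports each loaded model's size and its CPU/GPU split:

# PROCESSOR should read "100% GPU" when the model fits entirely in VRAM
docker exec ollama37 ollama ps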

Multi-GPU:

# Use all GPUs
docker run --gpus all ...

# Use specific GPU
docker run --gpus '"device=0"' ...

# Use multiple specific GPUs
docker run --gpus '"device=0,1"' ...

Troubleshooting

GPU not detected

# Check GPU visibility in container
docker exec ollama37 nvidia-smi

# Check CUDA libraries
docker exec ollama37 ldconfig -p | grep cuda

# Check NVIDIA runtime
docker info | grep -i runtime

Model fails to load

# Check logs with CUDA debug
docker run --rm --runtime=nvidia --gpus all \
  -e OLLAMA_DEBUG=1 \
  -e GGML_CUDA_DEBUG=1 \
  -p 11434:11434 \
  ollama37:latest

# Check library paths
docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH'

# Verify CUBLAS functions
docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas'

Build fails with "out of memory"

# Edit runtime/Dockerfile line for cmake build
# Change: cmake --build build -j$(nproc)
# To: cmake --build build -j2

# Or set Docker memory limit
docker build --memory=8g ...

Port already in use

# Find process using port 11434
sudo lsof -i :11434

# Kill the process or change port in docker-compose.yml
ports:
  - "11435:11434"

Build cache issues

# Rebuild runtime image without cache
docker build --no-cache -f runtime/Dockerfile -t ollama37:latest .

# Rebuild builder image without cache
docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/

# Remove all images and rebuild
make clean
make build

Rebuilding

Rebuild with latest code

# Runtime Dockerfile clones from GitHub, so rebuild to get latest
make build-runtime

# Restart container
docker-compose restart

Rebuild everything from scratch

# Stop and remove containers
docker-compose down -v

# Remove images
make clean

# Rebuild all
make build

# Start fresh
docker-compose up -d

Rebuild only builder (rare)

# Only needed if you change CUDA/GCC/CMake/Go versions
make clean
make build-builder
make build-runtime

Development

Modifying the build

  1. Change build tools - Edit builder/Dockerfile
  2. Change Ollama build process - Edit runtime/Dockerfile (see the example after this list)
  3. Change build orchestration - Edit Makefile
  4. Change runtime config - Edit docker-compose.yml
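
For example, to build the runtime image from a specific branch or tag instead of the default branch, the clone step in runtime/Dockerfile could be changed along these lines (a hypothetical snippet; OWNER, the branch name, and the real clone line will differ):

# Hypothetical edit to runtime/Dockerfile: pin the source to a branch or tag
RUN git clone --depth 1 --branch my-k80-branch \
    https://github.com/OWNER/ollama37.git /usr/local/src/ollama37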

Testing changes

# Build with your changes
make build

# Run and test
docker-compose up -d
docker-compose logs -f

# If issues, check inside container
docker exec -it ollama37 bash

Shell access for debugging

# Enter running container
docker exec -it ollama37 bash

# Check GPU
nvidia-smi

# Check libraries
ldd /usr/local/bin/ollama
ldconfig -p | grep -E "cuda|cublas"

# Test binary
/usr/local/bin/ollama --version

Image Sizes

Image                     Size    Contents
ollama37-builder:latest   ~15GB   CUDA, GCC, CMake, Go, build deps
ollama37:latest           ~18GB   Builder + Ollama binary + libraries

Note: Large size ensures all runtime dependencies are present and properly linked.
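
The actual sizes on a given host can be checked with docker images (repository names assume the default tags produced by the Makefile):

docker images ollama37-builder
docker images ollama37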

Build Times

Task            First Build   Cached Build
Builder image   ~90 min       <1 min
Runtime image   ~10 min       ~10 min
Total           ~100 min      ~10 min

Breakdown (first build):

  • GCC 10: ~60 min
  • CMake 4: ~8 min
  • CUDA toolkit: ~10 min
  • Go install: ~1 min
  • Ollama build: ~10 min


License

MIT (same as upstream Ollama)