Commit 4810471b33: Redesign Docker build system to two-stage architecture with builder/runtime separation
Redesigned the Docker build system from a single-stage monolithic design to a clean
two-stage architecture that separates the build environment (builder image) from the
compilation and packaging steps (runtime image) while maintaining library path compatibility.

## Architecture Changes

### Builder Image (docker/builder/Dockerfile)
- Provides base environment: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
- Built once, cached for subsequent builds (~90 min first time)
- Removed config file copying (cuda-11.4.sh, gcc-10.conf, go.sh)
- Added comprehensive comments explaining each build step
- Added git installation for runtime stage source cloning

### Runtime Image (docker/runtime/Dockerfile)
- Two-stage build using ollama37-builder as base for BOTH stages
- Stage 1 (compile): Clone source from GitHub → CMake configure → Build C/C++/CUDA → Build Go
- Stage 2 (runtime): Copy artifacts from stage 1 → Setup environment → Configure server
- Both stages use identical base image to ensure library path compatibility
- Removed -buildvcs=false flag (VCS info embedded from git clone)
- Comprehensive comments documenting library paths and design rationale

### Makefile (docker/Makefile)
- Simplified from 289 to 145 lines (roughly half the size)
- Removed: run, stop, logs, shell, test targets (use docker-compose instead)
- Removed: build orchestration targets (start-builder, copy-source, run-cmake, etc.)
- Removed: artifact copying (handled internally by multi-stage build)
- Focus: Build images only (build, build-builder, build-runtime, clean, help)
- All runtime operations delegated to docker-compose.yml

### Documentation (docker/README.md)
- Completely rewritten for new two-stage architecture
- Added "Build System Components" section with file structure
- Documented why both runtime stages use builder base (library path compatibility)
- Updated build commands to use Makefile
- Updated runtime commands to use docker-compose
- Added comprehensive troubleshooting section
- Added build time and image size tables
- Reference to archived single-stage design

## Key Design Decision

**Problem**: Compiled binaries have hardcoded library paths
**Solution**: Use ollama37-builder as base for BOTH compile and runtime stages
**Trade-off**: A larger image (~18GB) in exchange for guaranteed library compatibility

## Benefits

- Cleaner separation of concerns (builder env vs compilation vs runtime)
- Builder image cached after first build (90 min → <1 min rebuilds)
- Runtime rebuilds only take ~10 min (pulls latest code from GitHub)
- No library path mismatches (identical base images)
- No complex artifact extraction (multi-stage COPY)
- Simpler Makefile focused on image building
- Runtime management via docker-compose (industry standard)

## Files Changed

Modified:
- docker/builder/Dockerfile - Added comments, removed COPY config files
- docker/runtime/Dockerfile - Converted to two-stage build
- docker/Makefile - Simplified to focus on image building only
- docker/README.md - Comprehensive rewrite for new architecture

Deleted:
- docker/builder/README.md - No longer needed
- docker/builder/cuda-11.4.sh - Generated in Dockerfile
- docker/builder/gcc-10.conf - Generated in Dockerfile
- docker/builder/go.sh - Generated in Dockerfile

Archived:
- docker/Dockerfile → docker/Dockerfile.single-stage.archived


Ollama37 Docker Build System

Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)

Overview

This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support:

  1. Builder Image (builder/Dockerfile) - Base environment with build tools

    • Rocky Linux 8
    • CUDA 11.4 toolkit (required for Tesla K80)
    • GCC 10 (built from source, required by CUDA 11.4)
    • CMake 4.0 (built from source)
    • Go 1.25.3
  2. Runtime Image (runtime/Dockerfile) - Two-stage build process

    • Stage 1 (compile): Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary
    • Stage 2 (runtime): Copy artifacts → Setup runtime environment

The runtime uses the builder image as its base to ensure library path compatibility between build and runtime environments.
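
The shape of that two-stage runtime/Dockerfile is roughly as follows. This is an illustrative sketch based on the description in this README, not a copy of the actual file: OWNER is a placeholder for the real GitHub account, and the clone path, build flags, and entrypoint are assumptions.

# --- Stage 1: compile inside the full builder environment ---
FROM ollama37-builder:latest AS compile
# OWNER is a placeholder for the actual GitHub organization/user
RUN git clone --depth 1 https://github.com/OWNER/ollama37.git /usr/local/src/ollama37
WORKDIR /usr/local/src/ollama37
# The "CUDA 11" preset targets compute capability 3.7 (Tesla K80)
RUN cmake --preset "CUDA 11" && cmake --build build -j"$(nproc)"
RUN go build -o ollama .

# --- Stage 2: runtime, using the SAME base so library paths resolve identically ---
FROM ollama37-builder:latest
COPY --from=compile /usr/local/src/ollama37 /usr/local/src/ollama37
RUN cp /usr/local/src/ollama37/ollama /usr/local/bin/ollama
ENV OLLAMA_HOST=0.0.0.0:11434 \
    LD_LIBRARY_PATH=/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64
EXPOSE 11434
ENTRYPOINT ["/usr/local/bin/ollama"]
CMD ["serve"]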

Prerequisites

  • Docker with NVIDIA Container Runtime
  • Docker Compose
  • NVIDIA GPU drivers (470+ for Tesla K80)
  • Verify GPU access:
    docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.4.3-base-rockylinux8 nvidia-smi
    

Quick Start

1. Build Images

cd /home/jack/Documents/ollama37/docker
make build

This will:

  1. Build the builder image (if not present) - ~90 minutes first time
  2. Build the runtime image - ~10 minutes

First-time build: ~100 minutes total (includes building GCC 10 and CMake 4 from source)

Subsequent builds: ~10 minutes (builder image is cached)

2. Run with Docker Compose

docker-compose up -d

Check logs:

docker-compose logs -f

Stop the server:

docker-compose down

3. Run Manually

docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest

Usage

Using the API

# List models
curl http://localhost:11434/api/tags

# Pull a model
curl http://localhost:11434/api/pull -d '{"name": "gemma3:4b"}'

# Run inference
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Using the CLI

# List models
docker exec ollama37 ollama list

# Pull a model
docker exec ollama37 ollama pull gemma3:4b

# Run a model
docker exec ollama37 ollama run gemma3:4b "Hello!"

Architecture

Build System Components

docker/
├── builder/
│   └── Dockerfile          # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
├── runtime/
│   └── Dockerfile          # Two-stage: compile ollama37, package runtime
├── Makefile                # Build orchestration (images only)
├── docker-compose.yml      # Runtime orchestration
└── README.md               # This file

Two-Stage Build Process

Stage 1: Builder Image (builder/Dockerfile)

Purpose: Provide consistent build environment

Contents:

  • Rocky Linux 8 base
  • CUDA 11.4 toolkit (compilation only, no driver)
  • GCC 10 from source (~60 min build time)
  • CMake 4.0 from source (~8 min build time)
  • Go 1.25.3 binary
  • All build dependencies

Build time: ~90 minutes (first time), cached thereafter

Image size: ~15GB

Stage 2: Runtime Image (runtime/Dockerfile)

Stage 2.1 - Compile (FROM ollama37-builder)

  1. Clone ollama37 source from GitHub
  2. Configure with CMake ("CUDA 11" preset for compute 3.7)
  3. Build C/C++/CUDA libraries
  4. Build Go binary

Stage 2.2 - Runtime (FROM ollama37-builder)

  1. Copy entire source tree (includes compiled artifacts)
  2. Copy binary to /usr/local/bin/ollama
  3. Setup LD_LIBRARY_PATH for runtime libraries
  4. Configure server, expose ports, setup volumes

Build time: ~10 minutes

Image size: ~18GB (includes build environment + compiled Ollama)
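
After make build completes, a quick sanity check confirms that the stage 2 image contains the binary and the compiled libraries at the paths this README documents (the library directory below is taken from the Configuration section; adjust if your build differs):

# Override the entrypoint to get a shell instead of the server
docker run --rm --entrypoint bash ollama37:latest -c '
  ls -lh /usr/local/bin/ollama &&
  ls /usr/local/src/ollama37/build/lib/ollama | head &&
  echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'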

Why Both Stages Use Builder Base?

Problem: Compiled binaries have hardcoded library paths (via rpath/LD_LIBRARY_PATH)

Solution: Use identical base images for compile and runtime stages

Benefits:

  • Library paths match between build and runtime
  • All GCC 10 runtime libraries present
  • All CUDA libraries at expected paths
  • No complex artifact extraction/copying
  • Guaranteed compatibility

Trade-off: A larger runtime image (~18GB) in exchange for avoiding the complexity and reliability issues of extracting artifacts into a slimmer base
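
With the container running, the compatibility claim can be checked directly: every shared library the binary links against should resolve to a path inside the image (the grep pattern below is just the usual suspects; adjust as needed):

# Any "not found" entry would indicate a build/runtime library path mismatch
docker exec ollama37 ldd /usr/local/bin/ollama | grep -E 'cuda|cublas|stdc|not found'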

Alternative: Single-Stage Build

See Dockerfile.single-stage.archived for the original single-stage design that this two-stage build replaces.

Build Commands

Using the Makefile

# Build both builder and runtime images
make build

# Build only builder image
make build-builder

# Build only runtime image (will auto-build builder if needed)
make build-runtime

# Remove all images
make clean

# Show help
make help

Direct Docker Commands

# Build builder image
docker build -f builder/Dockerfile -t ollama37-builder:latest builder/

# Build runtime image
docker build -f runtime/Dockerfile -t ollama37:latest .

Runtime Management

# Start server
docker-compose up -d

# View logs (live tail)
docker-compose logs -f

# Stop server
docker-compose down

# Stop and remove volumes
docker-compose down -v

# Restart server
docker-compose restart

Manual Docker Commands

# Start container
docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest

# View logs
docker logs -f ollama37

# Stop container
docker stop ollama37
docker rm ollama37

# Shell access
docker exec -it ollama37 bash

Configuration

Environment Variables

  • OLLAMA_HOST (default: 0.0.0.0:11434) - Server listen address
  • LD_LIBRARY_PATH (default: /usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64) - Library search path
  • NVIDIA_VISIBLE_DEVICES (default: all) - Which GPUs to use
  • NVIDIA_DRIVER_CAPABILITIES (default: compute,utility) - GPU capabilities
  • OLLAMA_DEBUG (default: unset) - Enable verbose Ollama logging
  • GGML_CUDA_DEBUG (default: unset) - Enable CUDA/CUBLAS debug logging

Volume Mounts

  • /root/.ollama - Model storage (use Docker volume ollama-data)
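
The ollama-data volume referenced above is an ordinary named Docker volume, so the standard volume commands apply:

# Create it explicitly (docker run / docker-compose also create it on first use)
docker volume create ollama-data

# Find where the models live on the host
docker volume inspect ollama-data --format '{{ .Mountpoint }}'

# Approximate disk usage of downloaded models (the mountpoint is root-owned)
sudo du -sh "$(docker volume inspect ollama-data --format '{{ .Mountpoint }}')"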

Customizing docker-compose.yml

# Change port
ports:
  - "11435:11434"  # Host:Container

# Use specific GPU
environment:
  - NVIDIA_VISIBLE_DEVICES=0  # Use GPU 0 only

# Enable debug logging
environment:
  - OLLAMA_DEBUG=1
  - GGML_CUDA_DEBUG=1

GPU Support

Supported Compute Capabilities

  • 3.7 - Tesla K80 (primary target)
  • 5.0-5.2 - Maxwell (GTX 900 series)
  • 6.0-6.1 - Pascal (GTX 10 series)
  • 7.0-7.5 - Volta, Turing (RTX 20 series)
  • 8.0-8.6 - Ampere (RTX 30 series)

Tesla K80 Recommendations

VRAM: 12GB per GPU (24GB for dual-GPU K80)

Model sizes:

  • Small (1-4B): Full precision or Q8 quantization
  • Medium (7-8B): Q4_K_M quantization
  • Large (13B+): Q4_0 quantization or multi-GPU

Tested models:

  • gemma3:4b
  • gpt-oss
  • deepseek-r1
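
To see how a loaded model actually fits into the K80's VRAM, run ollama ps inside the container (assuming the bundled Ollama build includes the ps command); it reports each loaded model's size and its CPU/GPU split:

# PROCESSOR should read "100% GPU" when the model fits entirely in VRAM
docker exec ollama37 ollama ps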

Multi-GPU:

# Use all GPUs
docker run --gpus all ...

# Use specific GPU
docker run --gpus '"device=0"' ...

# Use multiple specific GPUs
docker run --gpus '"device=0,1"' ...

Troubleshooting

GPU not detected

# Check GPU visibility in container
docker exec ollama37 nvidia-smi

# Check CUDA libraries
docker exec ollama37 ldconfig -p | grep cuda

# Check NVIDIA runtime
docker info | grep -i runtime

Model fails to load

# Check logs with CUDA debug
docker run --rm --runtime=nvidia --gpus all \
  -e OLLAMA_DEBUG=1 \
  -e GGML_CUDA_DEBUG=1 \
  -p 11434:11434 \
  ollama37:latest

# Check library paths
docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH'

# Verify CUBLAS functions
docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas'

Build fails with "out of memory"

# Edit runtime/Dockerfile line for cmake build
# Change: cmake --build build -j$(nproc)
# To: cmake --build build -j2

# Or set Docker memory limit
docker build --memory=8g ...

Port already in use

# Find process using port 11434
sudo lsof -i :11434

# Kill the process or change port in docker-compose.yml
ports:
  - "11435:11434"

Build cache issues

# Rebuild runtime image without cache
docker build --no-cache -f runtime/Dockerfile -t ollama37:latest .

# Rebuild builder image without cache
docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/

# Remove all images and rebuild
make clean
make build

Rebuilding

Rebuild with latest code

# Runtime Dockerfile clones from GitHub, so rebuild to get latest
make build-runtime

# Restart container
docker-compose restart

Rebuild everything from scratch

# Stop and remove containers
docker-compose down -v

# Remove images
make clean

# Rebuild all
make build

# Start fresh
docker-compose up -d

Rebuild only builder (rare)

# Only needed if you change CUDA/GCC/CMake/Go versions
make clean
make build-builder
make build-runtime

Development

Modifying the build

  1. Change build tools - Edit builder/Dockerfile
  2. Change Ollama build process - Edit runtime/Dockerfile (see the example after this list)
  3. Change build orchestration - Edit Makefile
  4. Change runtime config - Edit docker-compose.yml
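
For example, to build the runtime image from a specific branch or tag instead of the default branch, the clone step in runtime/Dockerfile could be changed along these lines (a hypothetical snippet; OWNER, the branch name, and the real clone line will differ):

# Hypothetical edit to runtime/Dockerfile: pin the source to a branch or tag
RUN git clone --depth 1 --branch my-k80-branch \
    https://github.com/OWNER/ollama37.git /usr/local/src/ollama37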

Testing changes

# Build with your changes
make build

# Run and test
docker-compose up -d
docker-compose logs -f

# If issues, check inside container
docker exec -it ollama37 bash

Shell access for debugging

# Enter running container
docker exec -it ollama37 bash

# Check GPU
nvidia-smi

# Check libraries
ldd /usr/local/bin/ollama
ldconfig -p | grep -E "cuda|cublas"

# Test binary
/usr/local/bin/ollama --version

Image Sizes

Image                     Size    Contents
ollama37-builder:latest   ~15GB   CUDA, GCC, CMake, Go, build deps
ollama37:latest           ~18GB   Builder + Ollama binary + libraries

Note: Large size ensures all runtime dependencies are present and properly linked.
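
The actual sizes on a given host can be checked with docker images (repository names assume the default tags produced by the Makefile):

docker images ollama37-builder
docker images ollama37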

Build Times

Task            First Build   Cached Build
Builder image   ~90 min       <1 min
Runtime image   ~10 min       ~10 min
Total           ~100 min      ~10 min

Breakdown (first build):

  • GCC 10: ~60 min
  • CMake 4: ~8 min
  • CUDA toolkit: ~10 min
  • Go install: ~1 min
  • Ollama build: ~10 min


License

MIT (same as upstream Ollama)