# Ollama37 Docker Build System

**Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)**

## Overview

This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support:

1. **Builder Image** (`builder/Dockerfile`) - Base environment with build tools
   - Rocky Linux 8
   - CUDA 11.4 toolkit (required for Tesla K80)
   - GCC 10 (built from source, required by CUDA 11.4)
   - CMake 4.0 (built from source)
   - Go 1.25.3

2. **Runtime Image** (`runtime/Dockerfile`) - Two-stage build process
   - **Stage 1 (compile)**: Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary
   - **Stage 2 (runtime)**: Copy artifacts → Set up runtime environment

The runtime image uses the builder image as its base to ensure library path compatibility between the build and runtime environments.

## Prerequisites

- Docker with the NVIDIA Container Runtime
- Docker Compose
- NVIDIA GPU drivers (470+ for Tesla K80)
- Verify GPU access:

```bash
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.4.3-base-rockylinux8 nvidia-smi
```
## Quick Start

### 1. Build Images

```bash
cd ollama37/docker
make build
```

This will:

1. Build the builder image (if not present) - **~90 minutes first time**
2. Build the runtime image - **~10 minutes**

**First-time build:** ~100 minutes total (includes building GCC 10 and CMake 4 from source)

**Subsequent builds:** ~10 minutes (the builder image is cached)

### 2. Run with Docker Compose (Recommended)

```bash
docker-compose up -d
```

Check logs:

```bash
docker-compose logs -f
```

Stop the server:

```bash
docker-compose down
```

### 3. Run Manually

```bash
docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest
```
## Usage

### Using the API

```bash
# List models
curl http://localhost:11434/api/tags

# Pull a model
curl http://localhost:11434/api/pull -d '{"name": "gemma3:4b"}'

# Run inference
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

### Using the CLI

```bash
# List models
docker exec ollama37 ollama list

# Pull a model
docker exec ollama37 ollama pull gemma3:4b

# Run a model
docker exec ollama37 ollama run gemma3:4b "Hello!"
```
## Architecture

### Build System Components

```
docker/
├── builder/
│   └── Dockerfile         # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
├── runtime/
│   └── Dockerfile         # Two-stage: compile ollama37, package runtime
├── Makefile               # Build orchestration (images only)
├── docker-compose.yml     # Runtime orchestration
└── README.md              # This file
```
### Two-Stage Build Process

#### Stage 1: Builder Image (`builder/Dockerfile`)

**Purpose**: Provide a consistent build environment

**Contents:**
- Rocky Linux 8 base
- CUDA 11.4 toolkit (compilation only, no driver)
- GCC 10 from source (~60 min build time)
- CMake 4.0 from source (~8 min build time)
- Go 1.25.3 binary
- All build dependencies

**Build time:** ~90 minutes (first time), cached thereafter

**Image size:** ~15GB

#### Stage 2: Runtime Image (`runtime/Dockerfile`)

**Stage 2.1 - Compile** (FROM ollama37-builder)

1. Clone the ollama37 source from GitHub
2. Configure with CMake ("CUDA 11" preset for compute 3.7)
3. Build the C/C++/CUDA libraries
4. Build the Go binary

**Stage 2.2 - Runtime** (FROM ollama37-builder)

1. Copy the entire source tree (includes compiled artifacts)
2. Copy the binary to /usr/local/bin/ollama
3. Set up LD_LIBRARY_PATH for the runtime libraries
4. Configure the server, expose ports, set up volumes

**Build time:** ~10 minutes

**Image size:** ~18GB (includes build environment + compiled Ollama)
### Why Do Both Stages Use the Builder Base?

**Problem**: Compiled binaries have hardcoded library paths (via rpath/LD_LIBRARY_PATH)

**Solution**: Use identical base images for the compile and runtime stages

**Benefits:**
- ✅ Library paths match between build and runtime
- ✅ All GCC 10 runtime libraries present
- ✅ All CUDA libraries at expected paths
- ✅ No complex artifact extraction/copying
- ✅ Guaranteed compatibility

**Trade-off:** A larger runtime image (~18GB), in exchange for avoiding the complexity and reliability issues of extracting artifacts into a slimmer base.
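The pattern can be sketched as a single multi-stage Dockerfile. This is a simplified illustration only, not the actual `runtime/Dockerfile`; the clone location, preset invocation, and binary path are assumptions based on the description above:

```dockerfile
# Stage 1: compile, using the builder image as the base
FROM ollama37-builder:latest AS compile
RUN git clone https://github.com/dogkeeper886/ollama37.git /usr/local/src/ollama37
WORKDIR /usr/local/src/ollama37
# Configure and build the C/C++/CUDA libraries, then the Go binary
RUN cmake --preset "CUDA 11" && cmake --build build -j"$(nproc)"
RUN go build -o ollama .

# Stage 2: runtime, using the SAME base image so every hardcoded
# library path from the compile stage resolves identically here
FROM ollama37-builder:latest
COPY --from=compile /usr/local/src/ollama37 /usr/local/src/ollama37
RUN cp /usr/local/src/ollama37/ollama /usr/local/bin/ollama
ENV OLLAMA_HOST=0.0.0.0:11434
EXPOSE 11434
VOLUME ["/root/.ollama"]
CMD ["ollama", "serve"]
```

Because both `FROM` lines reference the same image, `COPY --from=compile` is the only hand-off needed; no library relocation step is required.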

### Alternative: Single-Stage Build

See `Dockerfile.single-stage.archived` for the original single-stage design that inspired this architecture.
## Build Commands

### Using the Makefile

```bash
# Build both builder and runtime images
make build

# Build only builder image
make build-builder

# Build only runtime image (will auto-build builder if needed)
make build-runtime

# Remove all images
make clean

# Show help
make help
```

### Direct Docker Commands

```bash
# Build builder image
docker build -f builder/Dockerfile -t ollama37-builder:latest builder/

# Build runtime image
docker build -f runtime/Dockerfile -t ollama37:latest .
```
## Runtime Management

### Using Docker Compose (Recommended)

```bash
# Start server
docker-compose up -d

# View logs (live tail)
docker-compose logs -f

# Stop server
docker-compose down

# Stop and remove volumes
docker-compose down -v

# Restart server
docker-compose restart
```

### Manual Docker Commands

```bash
# Start container
docker run -d \
  --name ollama37 \
  --runtime=nvidia \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama37:latest

# View logs
docker logs -f ollama37

# Stop container
docker stop ollama37
docker rm ollama37

# Shell access
docker exec -it ollama37 bash
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_HOST` | `0.0.0.0:11434` | Server listen address |
| `LD_LIBRARY_PATH` | `/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64` | Library search path |
| `NVIDIA_VISIBLE_DEVICES` | `all` | Which GPUs to use |
| `NVIDIA_DRIVER_CAPABILITIES` | `compute,utility` | GPU capabilities |
| `OLLAMA_DEBUG` | (unset) | Enable verbose Ollama logging |
| `GGML_CUDA_DEBUG` | (unset) | Enable CUDA/cuBLAS debug logging |
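The `LD_LIBRARY_PATH` value above is a colon-separated list, and a missing entry is a common cause of library load failures. A small POSIX shell sketch (a hypothetical helper, not shipped with this repo) that checks each entry:

```shell
#!/bin/sh
# Print whether each entry in a colon-separated library path exists.
check_lib_path() {
    old_ifs=$IFS
    IFS=:                      # split the argument on ':'
    for dir in $1; do
        if [ -d "$dir" ]; then
            echo "ok:      $dir"
        else
            echo "MISSING: $dir"
        fi
    done
    IFS=$old_ifs
}

check_lib_path "/usr/lib:/no/such/dir"
```

Inside the container this could be pointed at the real value, e.g. via `docker exec ollama37 sh -c 'echo $LD_LIBRARY_PATH'`.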

### Volume Mounts

- `/root/.ollama` - Model storage (use the Docker volume `ollama-data`)
### Customizing docker-compose.yml

```yaml
# Change port
ports:
  - "11435:11434"  # Host:Container

# Use specific GPU
environment:
  - NVIDIA_VISIBLE_DEVICES=0  # Use GPU 0 only

# Enable debug logging
environment:
  - OLLAMA_DEBUG=1
  - GGML_CUDA_DEBUG=1
```
## GPU Support

### Supported Compute Capabilities

- **3.7** - Tesla K80 (primary target)
- **5.0-5.2** - Maxwell (GTX 900 series)
- **6.0-6.1** - Pascal (GTX 10 series)
- **7.0-7.5** - Volta, Turing (RTX 20 series)
- **8.0-8.6** - Ampere (RTX 30 series)

### Tesla K80 Recommendations

**VRAM:** 12GB per GPU (24GB total for the dual-GPU K80)

**Model sizes:**
- Small (1-4B): Full precision or Q8 quantization
- Medium (7-8B): Q4_K_M quantization
- Large (13B+): Q4_0 quantization or multi-GPU

**Tested models:**
- ✅ gemma3:4b
- ✅ gpt-oss
- ✅ deepseek-r1
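A rough way to sanity-check these recommendations (a back-of-envelope approximation, not a formula Ollama uses): weight storage is roughly `parameters × bits-per-weight ÷ 8` GB when parameters are counted in billions, and real VRAM usage is higher once the KV cache and CUDA buffers are added. The bits-per-weight values below are illustrative:

```shell
#!/bin/sh
# Rough weight-size estimate in GB: params (billions) * bits per weight / 8.
# Actual VRAM usage is higher (KV cache, activations, CUDA overhead).
estimate_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 4 8     # ~4B model at 8-bit quantization
estimate_gb 8 4.5   # ~8B model at roughly 4.5 bits per weight
estimate_gb 13 4.5  # ~13B model; weights alone fit one 12GB K80 GPU
```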

**Multi-GPU:**

```bash
# Use all GPUs
docker run --gpus all ...

# Use specific GPU
docker run --gpus '"device=0"' ...

# Use multiple specific GPUs
docker run --gpus '"device=0,1"' ...
```
## Troubleshooting

### GPU not detected

```bash
# Check GPU visibility in the container
docker exec ollama37 nvidia-smi

# Check CUDA libraries
docker exec ollama37 ldconfig -p | grep cuda

# Check NVIDIA runtime
docker info | grep -i runtime
```

### Model fails to load

```bash
# Check logs with CUDA debug
docker run --rm --runtime=nvidia --gpus all \
  -e OLLAMA_DEBUG=1 \
  -e GGML_CUDA_DEBUG=1 \
  -p 11434:11434 \
  ollama37:latest

# Check library paths
docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH'

# Verify cuBLAS linkage
docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas'
```

### Build fails with "out of memory"

```bash
# Edit the cmake build line in runtime/Dockerfile
# Change: cmake --build build -j$(nproc)
# To:     cmake --build build -j2

# Or set a Docker memory limit
docker build --memory=8g ...
```
### Port already in use

```bash
# Find the process using port 11434
sudo lsof -i :11434

# Kill the process, or change the port mapping in docker-compose.yml:
#   ports:
#     - "11435:11434"
```
### Build cache issues

```bash
# Rebuild runtime image without cache
docker build --no-cache -f runtime/Dockerfile -t ollama37:latest .

# Rebuild builder image without cache
docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/

# Remove all images and rebuild
make clean
make build
```
## Rebuilding

### Rebuild with latest code

```bash
# The runtime Dockerfile clones from GitHub, so rebuilding pulls the latest code
make build-runtime

# Recreate the container from the new image
docker-compose up -d
```

### Rebuild everything from scratch

```bash
# Stop and remove containers and volumes
docker-compose down -v

# Remove images
make clean

# Rebuild all
make build

# Start fresh
docker-compose up -d
```

### Rebuild only the builder (rare)

```bash
# Only needed if you change the CUDA/GCC/CMake/Go versions
make clean
make build-builder
make build-runtime
```
## Development

### Modifying the build

1. **Change build tools** - Edit `builder/Dockerfile`
2. **Change Ollama build process** - Edit `runtime/Dockerfile`
3. **Change build orchestration** - Edit `Makefile`
4. **Change runtime config** - Edit `docker-compose.yml`

### Testing changes

```bash
# Build with your changes
make build

# Run and test
docker-compose up -d
docker-compose logs -f

# If there are issues, check inside the container
docker exec -it ollama37 bash
```

### Shell access for debugging

```bash
# Enter the running container
docker exec -it ollama37 bash

# Check GPU
nvidia-smi

# Check libraries
ldd /usr/local/bin/ollama
ldconfig -p | grep -E "cuda|cublas"

# Test the binary
/usr/local/bin/ollama --version
```
## Image Sizes

| Image | Size | Contents |
|-------|------|----------|
| `ollama37-builder:latest` | ~15GB | CUDA, GCC, CMake, Go, build deps |
| `ollama37:latest` | ~18GB | Builder + Ollama binary + libraries |

**Note**: The large size ensures all runtime dependencies are present and properly linked.

## Build Times

| Task | First Build | Cached Build |
|------|-------------|--------------|
| Builder image | ~90 min | <1 min |
| Runtime image | ~10 min | ~10 min |
| **Total** | **~100 min** | **~10 min** |

**Breakdown (first build):**
- GCC 10: ~60 min
- CMake 4: ~8 min
- CUDA toolkit: ~10 min
- Go install: ~1 min
- Ollama build: ~10 min
## Documentation

- **[../CLAUDE.md](../CLAUDE.md)** - Project goals, implementation details, and technical notes
- **[Upstream Ollama](https://github.com/ollama/ollama)** - Original Ollama project
- **[dogkeeper886/ollama37](https://github.com/dogkeeper886/ollama37)** - This fork with K80 support

## License

MIT (same as upstream Ollama)