mirror of
https://github.com/dogkeeper886/ollama37.git
synced 2025-12-09 23:37:06 +00:00
Redesign Docker build system to two-stage architecture with builder/runtime separation
Redesigned the Docker build system from a single-stage monolithic design to a clean two-stage architecture that separates build environment from compilation process while maintaining library path compatibility.

## Architecture Changes

### Builder Image (docker/builder/Dockerfile)
- Provides base environment: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
- Built once, cached for subsequent builds (~90 min first time)
- Removed config file copying (cuda-11.4.sh, gcc-10.conf, go.sh)
- Added comprehensive comments explaining each build step
- Added git installation for runtime stage source cloning

### Runtime Image (docker/runtime/Dockerfile)
- Two-stage build using ollama37-builder as base for BOTH stages
- Stage 1 (compile): Clone source from GitHub → CMake configure → Build C/C++/CUDA → Build Go
- Stage 2 (runtime): Copy artifacts from stage 1 → Setup environment → Configure server
- Both stages use identical base image to ensure library path compatibility
- Removed -buildvcs=false flag (VCS info embedded from git clone)
- Comprehensive comments documenting library paths and design rationale

### Makefile (docker/Makefile)
- Simplified from 289 to 145 lines (-50% complexity)
- Removed: run, stop, logs, shell, test targets (use docker-compose instead)
- Removed: build orchestration targets (start-builder, copy-source, run-cmake, etc.)
- Removed: artifact copying (handled internally by multi-stage build)
- Focus: Build images only (build, build-builder, build-runtime, clean, help)
- All runtime operations delegated to docker-compose.yml

### Documentation (docker/README.md)
- Completely rewritten for new two-stage architecture
- Added "Build System Components" section with file structure
- Documented why both runtime stages use builder base (library path compatibility)
- Updated build commands to use Makefile
- Updated runtime commands to use docker-compose
- Added comprehensive troubleshooting section
- Added build time and image size tables
- Reference to archived single-stage design

## Key Design Decision

**Problem**: Compiled binaries have hardcoded library paths
**Solution**: Use ollama37-builder as base for BOTH compile and runtime stages
**Trade-off**: Larger image (~18GB) vs guaranteed library compatibility

## Benefits
- ✅ Cleaner separation of concerns (builder env vs compilation vs runtime)
- ✅ Builder image cached after first build (90 min → <1 min rebuilds)
- ✅ Runtime rebuilds only take ~10 min (pulls latest code from GitHub)
- ✅ No library path mismatches (identical base images)
- ✅ No complex artifact extraction (multi-stage COPY)
- ✅ Simpler Makefile focused on image building
- ✅ Runtime management via docker-compose (industry standard)

## Files Changed

Modified:
- docker/builder/Dockerfile - Added comments, removed COPY config files
- docker/runtime/Dockerfile - Converted to two-stage build
- docker/Makefile - Simplified to focus on image building only
- docker/README.md - Comprehensive rewrite for new architecture

Deleted:
- docker/builder/README.md - No longer needed
- docker/builder/cuda-11.4.sh - Generated in Dockerfile
- docker/builder/gcc-10.conf - Generated in Dockerfile
- docker/builder/go.sh - Generated in Dockerfile

Archived:
- docker/Dockerfile → docker/Dockerfile.single-stage.archived

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
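The "both stages FROM the builder image" decision described above can be sketched as a minimal two-stage runtime Dockerfile. This is an illustrative reconstruction, not the committed file: the `ollama37-builder` image name, repository URL, "CUDA 11" preset, `LD_LIBRARY_PATH` value, and port come from this commit, while the exact `WORKDIR` layout and `ENTRYPOINT` are assumptions.

```
# Stage 1: compile Ollama inside the builder environment
FROM ollama37-builder:latest AS compile
WORKDIR /usr/local/src
RUN git clone https://github.com/dogkeeper886/ollama37.git
WORKDIR /usr/local/src/ollama37
# C/C++/CUDA libraries, then the Go binary (VCS info is embedded from the clone)
RUN cmake --preset "CUDA 11" && cmake --build build -j"$(nproc)"
RUN go build -o ollama .

# Stage 2: runtime uses the SAME base image, so hardcoded library paths resolve identically
FROM ollama37-builder:latest AS runtime
COPY --from=compile /usr/local/src/ollama37 /usr/local/src/ollama37
RUN cp /usr/local/src/ollama37/ollama /usr/local/bin/ollama
ENV LD_LIBRARY_PATH=/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64
EXPOSE 11434
ENTRYPOINT ["ollama", "serve"]
```

Because both stages start from `ollama37-builder`, the `COPY --from=compile` step is the only artifact transfer, and every GCC 10 and CUDA library the binary was linked against already exists at the same path in the final image.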
Changed files:

- docker/Makefile (346 lines changed)
@@ -1,288 +1,144 @@
-# Makefile for building Ollama with GPU-enabled builder container
+# Makefile for Ollama37 Docker Build System
 #
-# This Makefile uses a pre-built builder container with CUDA support and GPU access
-# to compile Ollama with compute capability 3.7 support (Tesla K80).
+# This Makefile manages the two-stage Docker build process:
+#   1. Builder image: Base environment with CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
+#   2. Runtime image: Two-stage build that compiles and packages Ollama
 #
+# The runtime Dockerfile handles:
+#   - Cloning source from GitHub
+#   - CMake configuration and C/C++/CUDA compilation
+#   - Go binary compilation
+#   - Packaging runtime environment
 #
 # Usage:
-#   make build          - Build ollama binary and libraries
-#   make clean          - Remove build artifacts from host
-#   make clean-all      - Remove build artifacts and stop/remove containers
-#   make shell          - Open a shell in the builder container
-#   make test           - Test the built binary
+#   make build          - Build builder and runtime images (default)
+#   make build-builder  - Build only the builder image
+#   make build-runtime  - Build only the runtime image
+#   make clean          - Remove all Docker images
+#   make help           - Show help message
+#
+# To run the container, use docker-compose:
+#   docker-compose up -d
+#   docker-compose logs -f
+#   docker-compose down

 # Configuration
 BUILDER_IMAGE := ollama37-builder
 BUILDER_TAG := latest
+BUILDER_DOCKERFILE := $(SOURCE_DIR)/docker/builder/Dockerfile
-CONTAINER_NAME := ollama37-builder
-RUNTIME_IMAGE := ollama37-runtime
+RUNTIME_IMAGE := ollama37
 RUNTIME_TAG := latest
 SOURCE_DIR := $(shell cd .. && pwd)
-BUILD_DIR := $(SOURCE_DIR)/build
-DIST_DIR := $(SOURCE_DIR)/dist
-OUTPUT_DIR := $(SOURCE_DIR)/docker/output
-BUILDER_DOCKERFILE := $(SOURCE_DIR)/docker/builder/Dockerfile
 RUNTIME_DOCKERFILE := $(SOURCE_DIR)/docker/runtime/Dockerfile

-# CMake preset to use
-CMAKE_PRESET := CUDA 11
+# Docker build context directories
+BUILDER_CONTEXT := $(SOURCE_DIR)/docker/builder
+RUNTIME_CONTEXT := $(SOURCE_DIR)

-# Detect number of CPU cores for parallel compilation
-NPROC := $(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

-.PHONY: all build clean clean-all shell test build-builder clean-builder ensure-builder start-builder stop-builder copy-source run-cmake run-build run-go-build copy-artifacts build-runtime run-runtime stop-runtime clean-runtime
+.PHONY: all build build-builder build-runtime ensure-builder clean help

 # Default target
 all: build

-# ===== Builder Image Targets =====
+# Build both builder and runtime images
+build: build-builder build-runtime
+	@echo ""
+	@echo "✓ All images built successfully!"
+	@echo "  Builder: $(BUILDER_IMAGE):$(BUILDER_TAG)"
+	@echo "  Runtime: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
+	@echo ""
+	@echo "To start the Ollama server:"
+	@echo "  docker-compose up -d"
+	@echo ""
+	@echo "View logs:"
+	@echo "  docker-compose logs -f"
+	@echo ""
+	@echo "Stop the server:"
+	@echo "  docker-compose down"

-# Build the builder Docker image from builder/Dockerfile
+# Build the builder base image
 build-builder:
-	@echo "→ Building builder Docker image..."
-	@echo "  Building Docker image $(BUILDER_IMAGE):$(BUILDER_TAG)..."
-	@cd $(SOURCE_DIR)/docker/builder && docker build \
+	@echo "→ Building builder image..."
+	@echo "  Image: $(BUILDER_IMAGE):$(BUILDER_TAG)"
+	@echo "  Dockerfile: $(BUILDER_DOCKERFILE)"
+	@echo ""
+	@docker build \
+		-f $(BUILDER_DOCKERFILE) \
 		-t $(BUILDER_IMAGE):$(BUILDER_TAG) \
-		.
+		$(BUILDER_CONTEXT)
 	@echo ""
 	@echo "✓ Builder image built successfully!"
 	@echo "  Image: $(BUILDER_IMAGE):$(BUILDER_TAG)"
 	@echo ""
-	@echo "To use this custom builder:"
-	@echo "  make build BUILDER_IMAGE=$(BUILDER_IMAGE):$(BUILDER_TAG)"

-# Clean builder image
-clean-builder:
-	@echo "→ Cleaning builder image..."
-	@docker rmi $(BUILDER_IMAGE):$(BUILDER_TAG) 2>/dev/null || echo "  No builder image to remove"
-	@echo "  Builder image cleaned"

-# ===== Build Targets =====

-# Main build target - orchestrates the entire build process
-build: ensure-builder start-builder copy-source run-cmake run-build run-go-build copy-artifacts
+# Build the runtime image (requires builder image)
+build-runtime: ensure-builder
+	@echo "→ Building runtime image..."
+	@echo "  Image: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
+	@echo "  Dockerfile: $(RUNTIME_DOCKERFILE)"
+	@echo ""
-	@echo "✓ Build completed successfully!"
-	@echo "  Binary: $(OUTPUT_DIR)/ollama"
-	@echo "  Libraries: $(OUTPUT_DIR)/lib/"
+	@echo "  This will:"
+	@echo "  - Clone ollama37 source from GitHub"
+	@echo "  - Configure with CMake (CUDA 11 preset)"
+	@echo "  - Compile C/C++/CUDA libraries"
+	@echo "  - Build Go binary"
+	@echo "  - Package runtime environment"
+	@echo ""
-	@echo "To test the binary:"
-	@echo "  cd $(OUTPUT_DIR) && ./ollama --version"
+	@docker build \
+		-f $(RUNTIME_DOCKERFILE) \
+		-t $(RUNTIME_IMAGE):$(RUNTIME_TAG) \
+		$(RUNTIME_CONTEXT)
+	@echo ""
+	@echo "✓ Runtime image built successfully!"
+	@echo ""
+	@echo "To start the Ollama server:"
+	@echo "  docker-compose up -d"

 # Ensure builder image exists (build if not present)
 ensure-builder:
 	@if ! docker images --format '{{.Repository}}:{{.Tag}}' | grep -q "^$(BUILDER_IMAGE):$(BUILDER_TAG)$$"; then \
 		echo "→ Builder image not found. Building $(BUILDER_IMAGE):$(BUILDER_TAG)..."; \
 		echo ""; \
 		$(MAKE) build-builder; \
 	else \
 		echo "→ Builder image $(BUILDER_IMAGE):$(BUILDER_TAG) already exists"; \
 		echo ""; \
 	fi

-# Start the builder container with GPU access
-start-builder:
-	@echo "→ Starting builder container with GPU access..."
-	@if docker ps --format '{{.Names}}' | grep -q "^$(CONTAINER_NAME)$$"; then \
-		echo "  Container $(CONTAINER_NAME) is already running"; \
-	else \
-		echo "  Creating new builder container..."; \
-		docker run --rm -d \
-			--name $(CONTAINER_NAME) \
-			--runtime=nvidia \
-			--gpus all \
-			$(BUILDER_IMAGE):$(BUILDER_TAG) \
-			sleep infinity; \
-		sleep 2; \
-		docker exec $(CONTAINER_NAME) nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader; \
-	fi

-# Stop and remove the builder container
-stop-builder:
-	@echo "→ Stopping builder container..."
-	@if docker ps --format '{{.Names}}' | grep -q "^$(CONTAINER_NAME)$$"; then \
-		docker stop $(CONTAINER_NAME); \
-		echo "  Container stopped and removed (--rm flag)"; \
-	else \
-		echo "  Container not running"; \
-	fi

-# Copy source code to the container
-copy-source: start-builder
-	@echo "→ Copying source code to container..."
-	@docker cp $(SOURCE_DIR)/. $(CONTAINER_NAME):/usr/local/src/ollama37/
-	@echo "→ Cleaning any host build artifacts from container..."
-	@docker exec $(CONTAINER_NAME) rm -rf /usr/local/src/ollama37/build /usr/local/src/ollama37/ollama /usr/local/src/ollama37/dist
-	@echo "  Source code copied (clean build environment)"

-# Run CMake configuration
-run-cmake: copy-source
-	@echo "→ Running CMake configuration (preset: $(CMAKE_PRESET))..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$$LD_LIBRARY_PATH CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --preset "$(CMAKE_PRESET)"'

-# Run CMake build (C/C++/CUDA compilation)
-run-build: run-cmake
-	@echo "→ Building C/C++/CUDA libraries (using $(NPROC) cores)..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$$LD_LIBRARY_PATH CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build -j$(NPROC)'

-# Run Go build
-run-go-build: run-build
-	@echo "→ Building Go binary..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'go build -buildvcs=false -o ollama .'

-# Copy build artifacts from container to host
-copy-artifacts: run-go-build
-	@echo "→ Copying build artifacts to host..."
-	@mkdir -p $(OUTPUT_DIR)/lib
-	@docker cp $(CONTAINER_NAME):/usr/local/src/ollama37/ollama $(OUTPUT_DIR)/
-	@docker cp $(CONTAINER_NAME):/usr/local/src/ollama37/build/lib/ollama/. $(OUTPUT_DIR)/lib/
-	@echo "→ Copying GCC 10 runtime libraries..."
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libstdc++.so.6 $(OUTPUT_DIR)/lib/
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libstdc++.so.6.0.28 $(OUTPUT_DIR)/lib/
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libgcc_s.so.1 $(OUTPUT_DIR)/lib/
-	@echo "  Artifacts copied to $(OUTPUT_DIR)"
-	@echo ""
-	@echo "  Binary: $(OUTPUT_DIR)/ollama"
-	@ls -lh $(OUTPUT_DIR)/ollama
-	@echo ""
-	@echo "  Libraries:"
-	@ls -lh $(OUTPUT_DIR)/lib/

-# Open an interactive shell in the builder container
-shell: start-builder
-	@echo "→ Opening shell in builder container..."
-	@docker exec -it -w /usr/local/src/ollama37 $(CONTAINER_NAME) bash -l

-# Test the built binary
-test: build
-	@echo "→ Testing ollama binary..."
-	@cd $(OUTPUT_DIR) && LD_LIBRARY_PATH=$$PWD/lib:$$LD_LIBRARY_PATH ./ollama --version

-# Clean build artifacts from host
+# Remove all Docker images
 clean:
-	@echo "→ Cleaning build artifacts from host..."
-	@rm -rf $(OUTPUT_DIR)
-	@echo "  Cleaned $(OUTPUT_DIR)"

-# Clean everything including container
-clean-all: clean stop-builder
-	@echo "→ Cleaning build directory in source..."
-	@rm -rf $(BUILD_DIR)
-	@rm -rf $(DIST_DIR)
-	@echo "  All cleaned"

-# ===== Runtime Image Targets =====

-# Build the runtime Docker image from artifacts
-build-runtime:
-	@echo "→ Building runtime Docker image..."
-	@if [ ! -f "$(OUTPUT_DIR)/ollama" ]; then \
-		echo "Error: ollama binary not found in $(OUTPUT_DIR)"; \
-		echo "Run 'make build' first to create the artifacts"; \
-		exit 1; \
-	fi
-	@if [ ! -d "$(OUTPUT_DIR)/lib" ]; then \
-		echo "Error: lib directory not found in $(OUTPUT_DIR)"; \
-		echo "Run 'make build' first to create the artifacts"; \
-		exit 1; \
-	fi
-	@echo "  Building Docker image $(RUNTIME_IMAGE):$(RUNTIME_TAG)..."
-	@docker build \
-		-f $(RUNTIME_DOCKERFILE) \
-		-t $(RUNTIME_IMAGE):$(RUNTIME_TAG) \
-		$(SOURCE_DIR)
-	@echo ""
-	@echo "✓ Runtime image built successfully!"
-	@echo "  Image: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo ""
-	@echo "To run the image:"
-	@echo "  make run-runtime"
-	@echo ""
-	@echo "Or manually:"
-	@echo "  docker run --rm -it --runtime=nvidia --gpus all -p 11434:11434 $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo ""
-	@echo "To stop the builder container:"
-	@echo "  make stop-builder"

-# Run the runtime container
-run-runtime:
-	@echo "→ Starting runtime container..."
-	@if docker ps -a --format '{{.Names}}' | grep -q "^ollama37-runtime$$"; then \
-		echo "  Stopping existing container..."; \
-		docker stop ollama37-runtime 2>/dev/null || true; \
-		docker rm ollama37-runtime 2>/dev/null || true; \
-	fi
-	@echo "  Starting new container..."
-	@docker run -d \
-		--name ollama37-runtime \
-		--runtime=nvidia \
-		--gpus all \
-		-p 11434:11434 \
-		-v ollama-data:/root/.ollama \
-		$(RUNTIME_IMAGE):$(RUNTIME_TAG)
-	@sleep 2
-	@echo ""
-	@echo "✓ Runtime container started!"
-	@echo "  Container: ollama37-runtime"
-	@echo "  API: http://localhost:11434"
-	@echo ""
-	@echo "Check logs:"
-	@echo "  docker logs -f ollama37-runtime"
-	@echo ""
-	@echo "Test the API:"
-	@echo "  curl http://localhost:11434/api/tags"
-	@echo ""
-	@echo "Stop the container:"
-	@echo "  make stop-runtime"

-# Stop the runtime container
-stop-runtime:
-	@echo "→ Stopping runtime container..."
-	@if docker ps --format '{{.Names}}' | grep -q "^ollama37-runtime$$"; then \
-		docker stop ollama37-runtime; \
-		docker rm ollama37-runtime; \
-		echo "  Container stopped and removed"; \
-	else \
-		echo "  Container not running"; \
-	fi

-# Clean runtime image
-clean-runtime:
-	@echo "→ Cleaning runtime image..."
+	@echo "→ Removing Docker images..."
 	@docker rmi $(RUNTIME_IMAGE):$(RUNTIME_TAG) 2>/dev/null || echo "  No runtime image to remove"
-	@docker volume rm ollama-data 2>/dev/null || echo "  No volume to remove"
-	@echo "  Runtime image cleaned"

-# Help target
-help:
-	@echo "Ollama Build System (with GPU-enabled builder)"
+	@docker rmi $(BUILDER_IMAGE):$(BUILDER_TAG) 2>/dev/null || echo "  No builder image to remove"
 	@echo ""
-	@echo "Builder Image Targets:"
-	@echo "  make build-builder  - Build custom builder Docker image"
-	@echo "  make clean-builder  - Remove builder image"
+	@echo "✓ Images removed"
+	@echo ""
+	@echo "Note: To remove containers and volumes, use:"
+	@echo "  docker-compose down -v"

+# Show help message
+help:
+	@echo "Ollama37 Docker Build System"
 	@echo ""
 	@echo "Build Targets:"
-	@echo "  make build          - Build ollama binary and libraries (default)"
-	@echo "  make clean          - Remove build artifacts from host"
-	@echo "  make clean-all      - Remove all build artifacts and stop container"
-	@echo "  make shell          - Open a shell in the builder container"
-	@echo "  make test           - Test the built binary"
-	@echo ""
-	@echo "Runtime Image Targets:"
-	@echo "  make build-runtime  - Build Docker runtime image from artifacts"
-	@echo "  make run-runtime    - Start the runtime container"
-	@echo "  make stop-runtime   - Stop the runtime container"
-	@echo "  make clean-runtime  - Remove runtime image and volumes"
-	@echo ""
+	@echo "  make build          - Build builder and runtime images (default)"
+	@echo "  make build-builder  - Build only the builder base image"
+	@echo "  make build-runtime  - Build only the runtime image"
+	@echo "  make clean          - Remove all Docker images"
+	@echo "  make help           - Show this help message"
 	@echo ""
 	@echo "Configuration:"
 	@echo "  BUILDER_IMAGE: $(BUILDER_IMAGE):$(BUILDER_TAG)"
 	@echo "  RUNTIME_IMAGE: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo "  CONTAINER_NAME: $(CONTAINER_NAME)"
-	@echo "  CMAKE_PRESET: $(CMAKE_PRESET)"
-	@echo "  PARALLEL_JOBS: $(NPROC)"
-	@echo ""
-	@echo "Environment:"
-	@echo "  SOURCE_DIR: $(SOURCE_DIR)"
-	@echo "  OUTPUT_DIR: $(OUTPUT_DIR)"
+	@echo "Dockerfiles:"
+	@echo "  Builder: $(BUILDER_DOCKERFILE)"
+	@echo "  Runtime: $(RUNTIME_DOCKERFILE)"
+	@echo ""
+	@echo "Build Architecture:"
+	@echo "  1. Builder image: Base environment (CUDA 11.4, GCC 10, CMake 4, Go 1.25.3)"
+	@echo "  2. Runtime image: Two-stage build (compile + package)"
+	@echo "     - Stage 1: Clone source, compile C/C++/CUDA/Go"
+	@echo "     - Stage 2: Package runtime with compiled binaries"
+	@echo ""
+	@echo "Container Management (use docker-compose):"
+	@echo "  docker-compose up -d    - Start Ollama server"
+	@echo "  docker-compose logs -f  - View logs"
+	@echo "  docker-compose down     - Stop server"
+	@echo "  docker-compose down -v  - Stop and remove volumes"
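The new Makefile delegates every runtime operation to `docker-compose.yml`, which is referenced throughout this commit but not included in the diff. A minimal compose file consistent with the commands the Makefile prints (image name, port, volume, NVIDIA runtime, and the environment variables documented in the README) might look like this — an illustrative sketch, not the committed file:

```yaml
services:
  ollama37:
    image: ollama37:latest
    container_name: ollama37
    runtime: nvidia            # requires the NVIDIA Container Runtime
    ports:
      - "11434:11434"          # Ollama API
    volumes:
      - ollama-data:/root/.ollama   # model storage
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    restart: unless-stopped

volumes:
  ollama-data:
```

With a file like this in `docker/`, the `docker-compose up -d` / `logs -f` / `down -v` commands echoed by the Makefile behave as described.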
- docker/README.md (396 lines changed)
@@ -1,21 +1,28 @@
|
||||
# Ollama37 Docker Build System
|
||||
|
||||
**Single-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)**
|
||||
**Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)**
|
||||
|
||||
## Overview
|
||||
|
||||
This Docker build system creates a single all-in-one image that includes:
|
||||
- CUDA 11.4 toolkit (required for Tesla K80, compute capability 3.7)
|
||||
- GCC 10 (built from source, required by CUDA 11.4)
|
||||
- CMake 4.0 (built from source)
|
||||
- Go 1.25.3
|
||||
- Ollama37 binary with K80 GPU support
|
||||
This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support:
|
||||
|
||||
The image is built entirely from source by cloning from https://github.com/dogkeeper886/ollama37
|
||||
1. **Builder Image** (`builder/Dockerfile`) - Base environment with build tools
|
||||
- Rocky Linux 8
|
||||
- CUDA 11.4 toolkit (required for Tesla K80)
|
||||
- GCC 10 (built from source, required by CUDA 11.4)
|
||||
- CMake 4.0 (built from source)
|
||||
- Go 1.25.3
|
||||
|
||||
2. **Runtime Image** (`runtime/Dockerfile`) - Two-stage build process
|
||||
- **Stage 1 (compile)**: Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary
|
||||
- **Stage 2 (runtime)**: Copy artifacts → Setup runtime environment
|
||||
|
||||
The runtime uses the builder image as its base to ensure library path compatibility between build and runtime environments.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Docker with NVIDIA Container Runtime
|
||||
- Docker Compose
|
||||
- NVIDIA GPU drivers (470+ for Tesla K80)
|
||||
- Verify GPU access:
|
||||
```bash
|
||||
@@ -24,16 +31,20 @@ The image is built entirely from source by cloning from https://github.com/dogke
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Build the Image
|
||||
### 1. Build Images
|
||||
|
||||
```bash
|
||||
cd /home/jack/Documents/ollama37/docker
|
||||
docker build -t ollama37:latest -f Dockerfile ..
|
||||
make build
|
||||
```
|
||||
|
||||
**Build time:** ~90 minutes (first time, includes building GCC 10 and CMake 4 from source)
|
||||
This will:
|
||||
1. Build the builder image (if not present) - **~90 minutes first time**
|
||||
2. Build the runtime image - **~10 minutes**
|
||||
|
||||
**Image size:** ~20GB (includes full build toolchain + CUDA toolkit + Ollama)
|
||||
**First-time build:** ~100 minutes total (includes building GCC 10 and CMake 4 from source)
|
||||
|
||||
**Subsequent builds:** ~10 minutes (builder image is cached)
|
||||
|
||||
### 2. Run with Docker Compose (Recommended)
|
||||
|
||||
@@ -46,6 +57,11 @@ Check logs:
|
||||
docker-compose logs -f
|
||||
```
|
||||
|
||||
Stop the server:
|
||||
```bash
|
||||
docker-compose down
|
||||
```
|
||||
|
||||
### 3. Run Manually
|
||||
|
||||
```bash
|
||||
@@ -92,46 +108,147 @@ docker exec ollama37 ollama run gemma3:4b "Hello!"
|
||||
|
||||
## Architecture
|
||||
|
||||
### Single-Stage Build Process
|
||||
### Build System Components
|
||||
|
||||
The Dockerfile performs these steps in order:
|
||||
```
|
||||
docker/
|
||||
├── builder/
|
||||
│ └── Dockerfile # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
|
||||
├── runtime/
|
||||
│ └── Dockerfile # Two-stage: compile ollama37, package runtime
|
||||
├── Makefile # Build orchestration (images only)
|
||||
├── docker-compose.yml # Runtime orchestration
|
||||
└── README.md # This file
|
||||
```
|
||||
|
||||
1. **Base Setup** (10 min)
|
||||
- Rocky Linux 8
|
||||
- CUDA 11.4 toolkit installation
|
||||
- Development tools
|
||||
### Two-Stage Build Process
|
||||
|
||||
2. **Build Toolchain** (70 min)
|
||||
- GCC 10 from source (~60 min)
|
||||
- CMake 4 from source (~8 min)
|
||||
- Go 1.25.3 binary (~1 min)
|
||||
#### Stage 1: Builder Image (`builder/Dockerfile`)
|
||||
**Purpose**: Provide consistent build environment
|
||||
|
||||
3. **Ollama Compilation** (10 min)
|
||||
- Git clone from dogkeeper886/ollama37
|
||||
- CMake configure with "CUDA 11" preset
|
||||
- Build C/C++/CUDA libraries
|
||||
- Build Go binary
|
||||
**Contents:**
|
||||
- Rocky Linux 8 base
|
||||
- CUDA 11.4 toolkit (compilation only, no driver)
|
||||
- GCC 10 from source (~60 min build time)
|
||||
- CMake 4.0 from source (~8 min build time)
|
||||
- Go 1.25.3 binary
|
||||
- All build dependencies
|
||||
|
||||
4. **Runtime Setup**
|
||||
- Configure library paths
|
||||
- Set environment variables
|
||||
- Configure entrypoint
|
||||
**Build time:** ~90 minutes (first time), cached thereafter
|
||||
|
||||
### Why Single-Stage?
|
||||
**Image size:** ~15GB
|
||||
|
||||
The previous two-stage design (builder → runtime) had issues:
|
||||
- Complex artifact copying between stages
|
||||
- Missing CUDA runtime libraries
|
||||
- LD_LIBRARY_PATH mismatches
|
||||
- User/permission problems
|
||||
#### Stage 2: Runtime Image (`runtime/Dockerfile`)
|
||||
|
||||
Single-stage ensures:
|
||||
- ✅ All libraries present and properly linked
|
||||
- ✅ Consistent environment from build to runtime
|
||||
- ✅ No artifact copying issues
|
||||
- ✅ Complete CUDA toolkit available at runtime
|
||||
**Stage 2.1 - Compile** (FROM ollama37-builder)
|
||||
1. Clone ollama37 source from GitHub
|
||||
2. Configure with CMake ("CUDA 11" preset for compute 3.7)
|
||||
3. Build C/C++/CUDA libraries
|
||||
4. Build Go binary
|
||||
|
||||
**Trade-off:** Larger image size (~20GB vs ~3GB) for guaranteed reliability
|
||||
**Stage 2.2 - Runtime** (FROM ollama37-builder)
|
||||
1. Copy entire source tree (includes compiled artifacts)
|
||||
2. Copy binary to /usr/local/bin/ollama
|
||||
3. Setup LD_LIBRARY_PATH for runtime libraries
|
||||
4. Configure server, expose ports, setup volumes
|
||||
|
||||
**Build time:** ~10 minutes
|
||||
|
||||
**Image size:** ~18GB (includes build environment + compiled Ollama)
|
||||
|
||||
### Why Both Stages Use Builder Base?
|
||||
|
||||
**Problem**: Compiled binaries have hardcoded library paths (via rpath/LD_LIBRARY_PATH)
|
||||
|
||||
**Solution**: Use identical base images for compile and runtime stages
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Library paths match between build and runtime
|
||||
- ✅ All GCC 10 runtime libraries present
|
||||
- ✅ All CUDA libraries at expected paths
|
||||
- ✅ No complex artifact extraction/copying
|
||||
- ✅ Guaranteed compatibility
|
||||
|
||||
**Trade-off:** Larger runtime image (~18GB) vs complexity and reliability issues
|
||||
|
||||
### Alternative: Single-Stage Build
|
||||
|
||||
See `Dockerfile.single-stage.archived` for the original single-stage design that inspired this architecture.
|
||||
|
||||
## Build Commands
|
||||
|
||||
### Using the Makefile
|
||||
|
||||
```bash
|
||||
# Build both builder and runtime images
|
||||
make build
|
||||
|
||||
# Build only builder image
|
||||
make build-builder
|
||||
|
||||
# Build only runtime image (will auto-build builder if needed)
|
||||
make build-runtime
|
||||
|
||||
# Remove all images
|
||||
make clean
|
||||
|
||||
# Show help
|
||||
make help
|
||||
```
|
||||
|
||||
### Direct Docker Commands
|
||||
|
||||
```bash
|
||||
# Build builder image
|
||||
docker build -f builder/Dockerfile -t ollama37-builder:latest builder/
|
||||
|
||||
# Build runtime image
|
||||
docker build -f runtime/Dockerfile -t ollama37:latest .
|
||||
```
|
||||
|
||||
## Runtime Management
|
||||
|
||||
### Using Docker Compose (Recommended)
|
||||
|
||||
```bash
|
||||
# Start server
|
||||
docker-compose up -d
|
||||
|
||||
# View logs (live tail)
|
||||
docker-compose logs -f
|
||||
|
||||
# Stop server
|
||||
docker-compose down
|
||||
|
||||
# Stop and remove volumes
|
||||
docker-compose down -v
|
||||
|
||||
# Restart server
|
||||
docker-compose restart
|
||||
```
|
||||
|
||||
### Manual Docker Commands
|
||||
|
||||
```bash
|
||||
# Start container
|
||||
docker run -d \
|
||||
--name ollama37 \
|
||||
--runtime=nvidia \
|
||||
--gpus all \
|
||||
-p 11434:11434 \
|
||||
-v ollama-data:/root/.ollama \
|
||||
ollama37:latest
|
||||
|
||||
# View logs
|
||||
docker logs -f ollama37
|
||||
|
||||
# Stop container
|
||||
docker stop ollama37
|
||||
docker rm ollama37
|
||||
|
||||
# Shell access
|
||||
docker exec -it ollama37 bash
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
@@ -143,90 +260,239 @@ Single-stage ensures:
|
||||
| `LD_LIBRARY_PATH` | `/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64` | Library search path |
|
||||
| `NVIDIA_VISIBLE_DEVICES` | `all` | Which GPUs to use |
|
||||
| `NVIDIA_DRIVER_CAPABILITIES` | `compute,utility` | GPU capabilities |
|
||||
| `OLLAMA_DEBUG` | (unset) | Enable verbose Ollama logging |
|
||||
| `GGML_CUDA_DEBUG` | (unset) | Enable CUDA/CUBLAS debug logging |
|
||||
|
||||
### Volume Mounts
|
||||
|
||||
- `/root/.ollama` - Model storage (use Docker volume `ollama-data`)
|
||||
|
||||
### Customizing docker-compose.yml
|
||||
|
||||
```yaml
|
||||
# Change port
|
||||
ports:
|
||||
- "11435:11434" # Host:Container
|
||||
|
||||
# Use specific GPU
|
||||
environment:
|
||||
- NVIDIA_VISIBLE_DEVICES=0 # Use GPU 0 only
|
||||
|
||||
# Enable debug logging
|
||||
environment:
|
||||
- OLLAMA_DEBUG=1
|
||||
- GGML_CUDA_DEBUG=1
|
||||
```
|
||||
|
||||
## GPU Support
|
||||
|
||||
### Supported Compute Capabilities
|
||||
- **3.7** - Tesla K80 (primary target)
|
||||
- **5.0-8.6** - Pascal, Volta, Turing, Ampere
|
||||
- **5.0-5.2** - Maxwell (GTX 900 series)
|
||||
- **6.0-6.1** - Pascal (GTX 10 series)
|
||||
- **7.0-7.5** - Volta, Turing (RTX 20 series)
|
||||
- **8.0-8.6** - Ampere (RTX 30 series)
|
||||
|
||||
### Tesla K80 Recommendations
|
||||
|
||||
**VRAM:** 12GB per GPU (24GB for dual-GPU K80)
|
||||
|
||||
**Model sizes:**
|
||||
- Small (1-4B): Full precision
|
||||
- Small (1-4B): Full precision or Q8 quantization
|
||||
- Medium (7-8B): Q4_K_M quantization
|
||||
- Large (13B+): Q4_0 quantization or multi-GPU
|
||||
|
||||
**Tested models:**
|
||||
- ✅ gemma3:4b
|
||||
- ✅ gpt-oss
|
||||
- ✅ deepseek-r1
|
||||
|
||||
**Multi-GPU:**
|
||||
```bash
|
||||
docker run --gpus all ... # Use all GPUs
|
||||
docker run --gpus '"device=0"' ... # Use specific GPU
|
||||
# Use all GPUs
|
||||
docker run --gpus all ...
|
||||
|
||||
# Use specific GPU
|
||||
docker run --gpus '"device=0"' ...
|
||||
|
||||
# Use multiple specific GPUs
|
||||
docker run --gpus '"device=0,1"' ...
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### GPU not detected
|
||||
|
||||
```bash
|
||||
# Check GPU visibility in container
|
||||
docker exec ollama37 nvidia-smi
|
||||
|
||||
# Check CUDA libraries
|
||||
docker exec ollama37 ldconfig -p | grep cuda
|
||||
|
||||
# Check NVIDIA runtime
|
||||
docker info | grep -i runtime
|
||||
```
|
||||
|
||||
### Model fails to load
|
||||
|
||||
```bash
|
||||
# Check logs with CUDA debug
|
||||
docker run --rm --runtime=nvidia --gpus all \
|
||||
-e OLLAMA_DEBUG=1 \
|
||||
-e GGML_CUDA_DEBUG=1 \
|
||||
ollama37:latest serve
|
||||
-p 11434:11434 \
|
||||
ollama37:latest
|
||||
|
||||
# Check library paths
|
||||
docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH'
|
||||
|
||||
# Verify CUBLAS functions
|
||||
docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas'
|
||||
```

### Build fails with "out of memory"

```bash
# Edit runtime/Dockerfile line for cmake build
# Change: cmake --build build -j$(nproc)
# To: cmake --build build -j2

# Or set Docker memory limit
docker build --memory=8g ...
```
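
Rather than hardcoding `-j2`, you can derive a conservative job count from the machine. This is a sketch (the divisor is an arbitrary choice, not a project convention):

```shell
# Use half the available cores for the build, but never fewer than one.
JOBS=$(( $(nproc) / 2 ))
if [ "$JOBS" -lt 1 ]; then
  JOBS=1
fi
echo "$JOBS"
# then pass it along: cmake --build build -j"$JOBS"
```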

### Port already in use

```bash
# Find process using port 11434
sudo lsof -i :11434

# Kill the process or change port in docker-compose.yml
ports:
  - "11435:11434" # Change host port
```
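
Instead of picking a replacement port by hand, a bash-only scan can find the first free port at or above 11434. This is a sketch that relies on bash's `/dev/tcp` redirection (no extra tools required):

```shell
#!/bin/bash
# Scan upward from 11434 until a port has no listener.
# A successful /dev/tcp connect means something is already bound there.
port=11434
while (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; do
  port=$((port + 1))
done
echo "free host port: $port"
```

Use the printed value as the host side of the mapping, e.g. `"$port:11434"` in docker-compose.yml.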

### Build cache issues

```bash
# Rebuild runtime image without cache
docker build --no-cache -f runtime/Dockerfile -t ollama37:latest .

# Rebuild builder image without cache
docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/

# Remove all images and rebuild
make clean
make build
```

## Rebuilding

### Rebuild with updated code

```bash
# Runtime Dockerfile clones from GitHub, so rebuild to get latest
make build-runtime

# Restart container
docker-compose restart
```

### Rebuild everything from scratch

```bash
# Stop and remove containers
docker-compose down -v

# Remove images
make clean

# Rebuild all
make build

# Start fresh
docker-compose up -d
```

### Rebuild only builder (rare)

```bash
# Only needed if you change CUDA/GCC/CMake/Go versions
make clean
make build-builder
make build-runtime
```

## Development

### Modifying the build

1. **Change build tools** - Edit `builder/Dockerfile`
2. **Change Ollama build process** - Edit `runtime/Dockerfile`
3. **Change build orchestration** - Edit `Makefile`
4. **Change runtime config** - Edit `docker-compose.yml`

### Testing changes

```bash
# Build with your changes
make build

# Run and test
docker-compose up -d
docker-compose logs -f

# If issues, check inside container
docker exec -it ollama37 bash
```

### Shell access for debugging

```bash
# Enter running container
docker exec -it ollama37 bash

# Check GPU
nvidia-smi

# Check libraries
ldd /usr/local/bin/ollama
ldconfig -p | grep -E "cuda|cublas"

# Test binary
/usr/local/bin/ollama --version
```

## Image Sizes

| Image | Size | Contents |
|-------|------|----------|
| `ollama37-builder:latest` | ~15GB | CUDA, GCC, CMake, Go, build deps |
| `ollama37:latest` | ~18GB | Builder + Ollama binary + libraries |

**Note**: Large size ensures all runtime dependencies are present and properly linked.

## Build Times

| Task | First Build | Cached Build |
|------|-------------|--------------|
| Builder image | ~90 min | <1 min |
| Runtime image | ~10 min | ~10 min |
| **Total** | **~100 min** | **~10 min** |

**Breakdown (first build):**
- GCC 10: ~60 min
- CMake 4: ~8 min
- CUDA toolkit: ~10 min
- Go install: ~1 min
- Ollama build: ~10 min

## Documentation

- **[../CLAUDE.md](../CLAUDE.md)** - Project goals, implementation details, and technical notes
- **[Upstream Ollama](https://github.com/ollama/ollama)** - Original Ollama project
- **[dogkeeper886/ollama37](https://github.com/dogkeeper886/ollama37)** - This fork with K80 support

## License

--- a/docker/builder/Dockerfile
+++ b/docker/builder/Dockerfile
@@ -1,3 +1,7 @@
# Ollama37 Builder Image
# This image provides the complete build environment for compiling Ollama with Tesla K80 (compute 3.7) support
# Includes: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3

FROM rockylinux/rockylinux:8

# Install CUDA toolkit 11.4
@@ -9,13 +13,14 @@ RUN dnf -y install dnf-plugins-core\
    && dnf -y config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo\
    && dnf -y install cuda-toolkit-11-4

# Post install, setup path
COPY cuda-11.4.sh /etc/profile.d/cuda-11.4.sh
# Setup CUDA path
RUN echo 'export PATH="${PATH}:/usr/local/cuda-11.4/bin"' > /etc/profile.d/cuda-11.4.sh
ENV PATH="$PATH:/usr/local/cuda-11.4/bin"
#ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64:/usr/local/lib64"

# Install gcc 10
RUN dnf -y install wget unzip bzip2\
# Install GCC 10 from source
# CUDA 11.4 requires GCC 10 maximum (enforced in host_config.h)
# GCC 11+ is incompatible with CUDA 11.4
RUN dnf -y install wget unzip bzip2 git\
    && dnf -y groupinstall "Development Tools"\
    && cd /usr/local/src\
    && wget https://github.com/gcc-mirror/gcc/archive/refs/heads/releases/gcc-10.zip\
@@ -28,17 +33,16 @@ RUN dnf -y install wget unzip bzip2\
    && make -j $(nproc)\
    && make install

# Post install, setup path
#COPY gcc-10.sh /etc/profile.d/gcc-10.sh
#ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64:/usr/local/lib64"
COPY gcc-10.conf /etc/ld.so.conf.d/gcc-10.conf
RUN ldconfig\
# Setup GCC 10 library path and update system compiler
# Configure ldconfig to find GCC 10 runtime libraries
# Replace default cc symlink to use our custom GCC 10
RUN echo '/usr/local/lib64' > /etc/ld.so.conf.d/gcc-10.conf\
    && ldconfig\
    && rm -f /usr/bin/cc\
    && ln -s /usr/local/bin/gcc /usr/bin/cc

# Install cmake
#ENV LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
#ENV PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Install CMake 4 from source
# Required for modern CMake features and CUDA architecture configuration
RUN dnf -y install openssl-devel\
    && cd /usr/local/src\
    && wget https://github.com/Kitware/CMake/releases/download/v4.0.0/cmake-4.0.0.tar.gz\
@@ -49,11 +53,11 @@ RUN dnf -y install openssl-devel\
    && make -j $(nproc)\
    && make install

# Install go
# Install go 1.25.3
RUN cd /usr/local\
    && wget https://go.dev/dl/go1.25.3.linux-amd64.tar.gz\
    && tar xvf go1.25.3.linux-amd64.tar.gz

# Post install, setup path
COPY go.sh /etc/profile.d/go.sh
# Setup Go path
RUN echo 'export PATH="${PATH}:/usr/local/go/bin"' > /etc/profile.d/go.sh
ENV PATH="$PATH:/usr/local/go/bin"

--- a/docker/builder/README.md
+++ /dev/null
@@ -1,58 +0,0 @@
# Ollama37 Builder Image

This directory contains the Dockerfile for building the `ollama37-builder:latest` image.

## What's Inside

The builder image includes:
- **Base**: `nvidia/cuda:11.4.3-devel-rockylinux8`
- **GCC 10**: `gcc-toolset-10` (required by CUDA 11.4)
- **CMake**: System package
- **Go**: System package

## Building the Builder Image

The builder image is **automatically built** by the Makefile when you run `make build` for the first time.

To manually build the builder image:

```bash
cd /home/jack/Documents/ollama37/docker
make build-builder
```

Or using Docker directly:

```bash
cd /home/jack/Documents/ollama37/docker/builder
docker build -t ollama37-builder:latest .
```

## Using the Builder Image

The Makefile handles this automatically, but for reference:

```bash
# Start builder container with GPU access
docker run --rm -d \
  --name ollama37-builder \
  --runtime=nvidia \
  --gpus all \
  ollama37-builder:latest \
  sleep infinity

# Use the container
docker exec -it ollama37-builder bash
```

## Customization

If you need to modify the builder (e.g., change CUDA version, add packages):

1. Edit `Dockerfile` in this directory
2. Rebuild: `make clean-builder build-builder`
3. Build your project: `make build`

## Archived Builder

The `archived/` subdirectory contains an older Dockerfile that built GCC and CMake from source (~80 minutes). The current version uses Rocky Linux system packages for much faster builds (~5 minutes).

--- a/docker/builder/cuda-11.4.sh
+++ /dev/null
@@ -1 +0,0 @@
export PATH="${PATH}:/usr/local/cuda-11.4/bin"

--- a/docker/builder/gcc-10.conf
+++ /dev/null
@@ -1 +0,0 @@
/usr/local/lib64

--- a/docker/builder/go.sh
+++ /dev/null
@@ -1 +0,0 @@
export PATH="${PATH}:/usr/local/go/bin"

--- a/docker/runtime/Dockerfile
+++ b/docker/runtime/Dockerfile
@@ -1,46 +1,73 @@
FROM rockylinux/rockylinux:8
# Ollama37 Runtime Image
# Two-stage build: compile stage builds the binary, runtime stage packages it
# Both stages use ollama37-builder base to maintain identical library paths
# This ensures the compiled binary can find all required runtime libraries

# Install only CUDA runtime libraries (not the full toolkit)
# The host system provides the NVIDIA driver at runtime via --gpus flag
RUN dnf -y install dnf-plugins-core\
    && dnf -y config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo\
    && dnf -y install cuda-cudart-11-4 libcublas-11-4 \
    && dnf clean all
# Stage 1: Compile ollama37 from source
FROM ollama37-builder as builder

# Create directory structure
RUN mkdir -p /usr/local/bin /usr/local/lib/ollama
# Clone ollama37 source code from GitHub
RUN cd /usr/local/src\
    && git clone https://github.com/dogkeeper886/ollama37.git

# Copy the ollama binary from build output
COPY docker/output/ollama /usr/local/bin/ollama
# Set working directory for build
WORKDIR /usr/local/src/ollama37

# Copy all shared libraries from build output (includes ollama libs + GCC 10 runtime libs)
COPY docker/output/lib/ /usr/local/lib/ollama/
# Configure build with CMake
# Use "CUDA 11" preset for Tesla K80 compute capability 3.7 support
# Set LD_LIBRARY_PATH to find GCC 10 and system libraries during build
RUN bash -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH \
    CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ \
    cmake --preset "CUDA 11"'

# Set library path to include our ollama libraries first
# This includes:
# - Ollama CUDA/GGML libraries
# - GCC 10 runtime libraries (libstdc++.so.6, libgcc_s.so.1)
# - System CUDA libraries
ENV LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/cuda-11.4/lib64:/usr/lib64
# Build C/C++/CUDA libraries with CMake
# Compile all GGML CUDA kernels and Ollama native libraries
RUN bash -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH \
    CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ \
    cmake --build build -j$(nproc)'

# Base image already sets these, but we can override if needed:
# NVIDIA_DRIVER_CAPABILITIES=compute,utility
# NVIDIA_VISIBLE_DEVICES=all
# Build Go binary
# VCS info is embedded automatically since we cloned from git
RUN go build -o /usr/local/bin/ollama .

# Ollama server configuration

# Stage 2: Runtime environment
# Use ollama37-builder as base to maintain library path compatibility
# The compiled binary has hardcoded library paths that match this environment
FROM ollama37-builder as runtime

# Copy the entire source directory including compiled libraries
# This preserves the exact directory structure the binary expects
COPY --from=builder /usr/local/src/ollama37 /usr/local/src/ollama37

# Copy the ollama binary to system bin directory
COPY --from=builder /usr/local/bin/ollama /usr/local/bin/ollama

# Setup library paths for runtime
# The binary expects libraries in these exact paths:
#   /usr/local/src/ollama37/build/lib/ollama - Ollama CUDA/GGML libraries
#   /usr/local/lib64 - GCC 10 runtime libraries (libstdc++, libgcc_s)
#   /usr/local/cuda-11.4/lib64 - CUDA 11.4 runtime libraries
#   /usr/lib64 - System libraries
ENV LD_LIBRARY_PATH=/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64

# Configure Ollama server to listen on all interfaces
ENV OLLAMA_HOST=0.0.0.0:11434

# Expose the Ollama API port
# Expose Ollama API port
EXPOSE 11434

# Create a data directory for models
# Create persistent volume for model storage
# Models downloaded by Ollama will be stored here
RUN mkdir -p /root/.ollama
VOLUME ["/root/.ollama"]

# Health check
# Configure health check to verify Ollama is running
# Uses 'ollama list' command to check if the service is responsive
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD /usr/local/bin/ollama list || exit 1

# Set entrypoint and default command
# Container runs 'ollama serve' by default to start the API server
ENTRYPOINT ["/usr/local/bin/ollama"]
CMD ["serve"]