Redesign Docker build system to two-stage architecture with builder/runtime separation

Redesigned the Docker build system from a single-stage monolithic design to a clean two-stage architecture that separates build environment from compilation process while maintaining library path compatibility.

## Architecture Changes

### Builder Image (docker/builder/Dockerfile)
- Provides base environment: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
- Built once, cached for subsequent builds (~90 min first time)
- Removed config file copying (cuda-11.4.sh, gcc-10.conf, go.sh)
- Added comprehensive comments explaining each build step
- Added git installation for runtime stage source cloning

### Runtime Image (docker/runtime/Dockerfile)
- Two-stage build using ollama37-builder as base for BOTH stages
- Stage 1 (compile): Clone source from GitHub → CMake configure → Build C/C++/CUDA → Build Go
- Stage 2 (runtime): Copy artifacts from stage 1 → Setup environment → Configure server
- Both stages use identical base image to ensure library path compatibility
- Removed -buildvcs=false flag (VCS info embedded from git clone)
- Comprehensive comments documenting library paths and design rationale

### Makefile (docker/Makefile)
- Simplified from 289 to 145 lines (-50% complexity)
- Removed: run, stop, logs, shell, test targets (use docker-compose instead)
- Removed: build orchestration targets (start-builder, copy-source, run-cmake, etc.)
- Removed: artifact copying (handled internally by multi-stage build)
- Focus: Build images only (build, build-builder, build-runtime, clean, help)
- All runtime operations delegated to docker-compose.yml

### Documentation (docker/README.md)
- Completely rewritten for new two-stage architecture
- Added "Build System Components" section with file structure
- Documented why both runtime stages use builder base (library path compatibility)
- Updated build commands to use Makefile
- Updated runtime commands to use docker-compose
- Added comprehensive troubleshooting section
- Added build time and image size tables
- Reference to archived single-stage design

## Key Design Decision

**Problem**: Compiled binaries have hardcoded library paths
**Solution**: Use ollama37-builder as base for BOTH compile and runtime stages
**Trade-off**: Larger image (~18GB) vs guaranteed library compatibility

## Benefits

- ✅ Cleaner separation of concerns (builder env vs compilation vs runtime)
- ✅ Builder image cached after first build (90 min → <1 min rebuilds)
- ✅ Runtime rebuilds only take ~10 min (pulls latest code from GitHub)
- ✅ No library path mismatches (identical base images)
- ✅ No complex artifact extraction (multi-stage COPY)
- ✅ Simpler Makefile focused on image building
- ✅ Runtime management via docker-compose (industry standard)

## Files Changed

Modified:
- docker/builder/Dockerfile - Added comments, removed COPY config files
- docker/runtime/Dockerfile - Converted to two-stage build
- docker/Makefile - Simplified to focus on image building only
- docker/README.md - Comprehensive rewrite for new architecture

Deleted:
- docker/builder/README.md - No longer needed
- docker/builder/cuda-11.4.sh - Generated in Dockerfile
- docker/builder/gcc-10.conf - Generated in Dockerfile
- docker/builder/go.sh - Generated in Dockerfile

Archived:
- docker/Dockerfile → docker/Dockerfile.single-stage.archived

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
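
The library path compatibility that motivates this design can be spot-checked from the shell. The following is a minimal sketch and not part of this commit: `missing_libs` is a hypothetical helper name, and it only filters the text that `ldd` prints, reporting any shared library the dynamic loader marks as `not found`.

```shell
# Sketch: list shared libraries that the dynamic loader cannot resolve.
# Reads `ldd` output on stdin. Unresolved entries have the form:
#     libfoo.so.1 => not found
# missing_libs is a hypothetical helper, not something the repository ships.
missing_libs() {
  # Print the library name (first field) of every unresolved entry.
  awk '/=> not found/ { print $1 }'
}
```

Assuming a running container named `ollama37` (the name used in the README), `docker exec ollama37 ldd /usr/local/bin/ollama | missing_libs` should print nothing when every linked dependency resolves. Libraries opened at run time with `dlopen` are not listed by `ldd`, so an empty result is necessary but not sufficient.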
2025-12-09 23:37:06 +00:00 · 2025-11-10 13:14:49 +08:00
parent 6dbd8ed44e
commit 4810471b33
9 changed files with 505 additions and 413 deletions
--- a/docker/Dockerfile.single-stage.archived
+++ b/docker/Dockerfile.single-stage.archived
--- a/docker/Makefile
+++ b/docker/Makefile
@@ -1,288 +1,144 @@
-# Makefile for building Ollama with GPU-enabled builder container
+# Makefile for Ollama37 Docker Build System
 #
-# This Makefile uses a pre-built builder container with CUDA support and GPU access
-# to compile Ollama with compute capability 3.7 support (Tesla K80).
+# This Makefile manages the two-stage Docker build process:
+#   1. Builder image: Base environment with CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
+#   2. Runtime image: Two-stage build that compiles and packages Ollama
+#
+# The runtime Dockerfile handles:
+#   - Cloning source from GitHub
+#   - CMake configuration and C/C++/CUDA compilation
+#   - Go binary compilation
+#   - Packaging runtime environment
 #
 # Usage:
-#   make build          - Build ollama binary and libraries
-#   make clean          - Remove build artifacts from host
-#   make clean-all      - Remove build artifacts and stop/remove containers
-#   make shell          - Open a shell in the builder container
-#   make test           - Test the built binary
+#   make build          - Build builder and runtime images (default)
+#   make build-builder  - Build only the builder image
+#   make build-runtime  - Build only the runtime image
+#   make clean          - Remove all Docker images
+#   make help           - Show help message
+#
+# To run the container, use docker-compose:
+#   docker-compose up -d
+#   docker-compose logs -f
+#   docker-compose down

 # Configuration
 BUILDER_IMAGE := ollama37-builder
 BUILDER_TAG := latest
-BUILDER_DOCKERFILE := $(SOURCE_DIR)/docker/builder/Dockerfile
-CONTAINER_NAME := ollama37-builder
-RUNTIME_IMAGE := ollama37-runtime
+RUNTIME_IMAGE := ollama37
 RUNTIME_TAG := latest
 SOURCE_DIR := $(shell cd .. && pwd)
-BUILD_DIR := $(SOURCE_DIR)/build
-DIST_DIR := $(SOURCE_DIR)/dist
-OUTPUT_DIR := $(SOURCE_DIR)/docker/output
+BUILDER_DOCKERFILE := $(SOURCE_DIR)/docker/builder/Dockerfile
 RUNTIME_DOCKERFILE := $(SOURCE_DIR)/docker/runtime/Dockerfile

-# CMake preset to use
-CMAKE_PRESET := CUDA 11
+# Docker build context directories
+BUILDER_CONTEXT := $(SOURCE_DIR)/docker/builder
+RUNTIME_CONTEXT := $(SOURCE_DIR)

-# Detect number of CPU cores for parallel compilation
-NPROC := $(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
-
-.PHONY: all build clean clean-all shell test build-builder clean-builder ensure-builder start-builder stop-builder copy-source run-cmake run-build run-go-build copy-artifacts build-runtime run-runtime stop-runtime clean-runtime
+.PHONY: all build build-builder build-runtime ensure-builder clean help

 # Default target
 all: build

-# ===== Builder Image Targets =====
+# Build both builder and runtime images
+build: build-builder build-runtime
+	@echo ""
+	@echo "✓ All images built successfully!"
+	@echo "  Builder: $(BUILDER_IMAGE):$(BUILDER_TAG)"
+	@echo "  Runtime: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
+	@echo ""
+	@echo "To start the Ollama server:"
+	@echo "  docker-compose up -d"
+	@echo ""
+	@echo "View logs:"
+	@echo "  docker-compose logs -f"
+	@echo ""
+	@echo "Stop the server:"
+	@echo "  docker-compose down"

-# Build the builder Docker image from builder/Dockerfile
+# Build the builder base image
 build-builder:
-	@echo "→ Building builder Docker image..."
-	@echo "  Building Docker image $(BUILDER_IMAGE):$(BUILDER_TAG)..."
-	@cd $(SOURCE_DIR)/docker/builder && docker build \
+	@echo "→ Building builder image..."
+	@echo "  Image: $(BUILDER_IMAGE):$(BUILDER_TAG)"
+	@echo "  Dockerfile: $(BUILDER_DOCKERFILE)"
+	@echo ""
+	@docker build \
+		-f $(BUILDER_DOCKERFILE) \
 		-t $(BUILDER_IMAGE):$(BUILDER_TAG) \
-		.
+		$(BUILDER_CONTEXT)
 	@echo ""
 	@echo "✓ Builder image built successfully!"
-	@echo "  Image: $(BUILDER_IMAGE):$(BUILDER_TAG)"
-	@echo ""
-	@echo "To use this custom builder:"
-	@echo "  make build BUILDER_IMAGE=$(BUILDER_IMAGE):$(BUILDER_TAG)"

-# Clean builder image
-clean-builder:
-	@echo "→ Cleaning builder image..."
-	@docker rmi $(BUILDER_IMAGE):$(BUILDER_TAG) 2>/dev/null || echo "  No builder image to remove"
-	@echo "  Builder image cleaned"
-
-# ===== Build Targets =====
-
-# Main build target - orchestrates the entire build process
-build: ensure-builder start-builder copy-source run-cmake run-build run-go-build copy-artifacts
+# Build the runtime image (requires builder image)
+build-runtime: ensure-builder
+	@echo "→ Building runtime image..."
+	@echo "  Image: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
+	@echo "  Dockerfile: $(RUNTIME_DOCKERFILE)"
 	@echo ""
-	@echo "✓ Build completed successfully!"
-	@echo "  Binary:    $(OUTPUT_DIR)/ollama"
-	@echo "  Libraries: $(OUTPUT_DIR)/lib/"
+	@echo "  This will:"
+	@echo "    - Clone ollama37 source from GitHub"
+	@echo "    - Configure with CMake (CUDA 11 preset)"
+	@echo "    - Compile C/C++/CUDA libraries"
+	@echo "    - Build Go binary"
+	@echo "    - Package runtime environment"
 	@echo ""
-	@echo "To test the binary:"
-	@echo "  cd $(OUTPUT_DIR) && ./ollama --version"
+	@docker build \
+		-f $(RUNTIME_DOCKERFILE) \
+		-t $(RUNTIME_IMAGE):$(RUNTIME_TAG) \
+		$(RUNTIME_CONTEXT)
+	@echo ""
+	@echo "✓ Runtime image built successfully!"
+	@echo ""
+	@echo "To start the Ollama server:"
+	@echo "  docker-compose up -d"

 # Ensure builder image exists (build if not present)
 ensure-builder:
 	@if ! docker images --format '{{.Repository}}:{{.Tag}}' | grep -q "^$(BUILDER_IMAGE):$(BUILDER_TAG)$$"; then \
 		echo "→ Builder image not found. Building $(BUILDER_IMAGE):$(BUILDER_TAG)..."; \
+		echo ""; \
 		$(MAKE) build-builder; \
-	else \
-		echo "→ Builder image $(BUILDER_IMAGE):$(BUILDER_TAG) already exists"; \
+		echo ""; \
 	fi

-# Start the builder container with GPU access
-start-builder:
-	@echo "→ Starting builder container with GPU access..."
-	@if docker ps --format '{{.Names}}' | grep -q "^$(CONTAINER_NAME)$$"; then \
-		echo "  Container $(CONTAINER_NAME) is already running"; \
-	else \
-		echo "  Creating new builder container..."; \
-		docker run --rm -d \
-			--name $(CONTAINER_NAME) \
-			--runtime=nvidia \
-			--gpus all \
-			$(BUILDER_IMAGE):$(BUILDER_TAG) \
-			sleep infinity; \
-		sleep 2; \
-		docker exec $(CONTAINER_NAME) nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader; \
-	fi
-
-# Stop and remove the builder container
-stop-builder:
-	@echo "→ Stopping builder container..."
-	@if docker ps --format '{{.Names}}' | grep -q "^$(CONTAINER_NAME)$$"; then \
-		docker stop $(CONTAINER_NAME); \
-		echo "  Container stopped and removed (--rm flag)"; \
-	else \
-		echo "  Container not running"; \
-	fi
-
-# Copy source code to the container
-copy-source: start-builder
-	@echo "→ Copying source code to container..."
-	@docker cp $(SOURCE_DIR)/. $(CONTAINER_NAME):/usr/local/src/ollama37/
-	@echo "→ Cleaning any host build artifacts from container..."
-	@docker exec $(CONTAINER_NAME) rm -rf /usr/local/src/ollama37/build /usr/local/src/ollama37/ollama /usr/local/src/ollama37/dist
-	@echo "  Source code copied (clean build environment)"
-
-# Run CMake configuration
-run-cmake: copy-source
-	@echo "→ Running CMake configuration (preset: $(CMAKE_PRESET))..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$$LD_LIBRARY_PATH CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --preset "$(CMAKE_PRESET)"'
-
-# Run CMake build (C/C++/CUDA compilation)
-run-build: run-cmake
-	@echo "→ Building C/C++/CUDA libraries (using $(NPROC) cores)..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$$LD_LIBRARY_PATH CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build -j$(NPROC)'
-
-# Run Go build
-run-go-build: run-build
-	@echo "→ Building Go binary..."
-	@docker exec -w /usr/local/src/ollama37 $(CONTAINER_NAME) \
-		bash -l -c 'go build -buildvcs=false -o ollama .'
-
-# Copy build artifacts from container to host
-copy-artifacts: run-go-build
-	@echo "→ Copying build artifacts to host..."
-	@mkdir -p $(OUTPUT_DIR)/lib
-	@docker cp $(CONTAINER_NAME):/usr/local/src/ollama37/ollama $(OUTPUT_DIR)/
-	@docker cp $(CONTAINER_NAME):/usr/local/src/ollama37/build/lib/ollama/. $(OUTPUT_DIR)/lib/
-	@echo "→ Copying GCC 10 runtime libraries..."
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libstdc++.so.6 $(OUTPUT_DIR)/lib/
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libstdc++.so.6.0.28 $(OUTPUT_DIR)/lib/
-	@docker cp $(CONTAINER_NAME):/usr/local/lib64/libgcc_s.so.1 $(OUTPUT_DIR)/lib/
-	@echo "  Artifacts copied to $(OUTPUT_DIR)"
-	@echo ""
-	@echo "  Binary: $(OUTPUT_DIR)/ollama"
-	@ls -lh $(OUTPUT_DIR)/ollama
-	@echo ""
-	@echo "  Libraries:"
-	@ls -lh $(OUTPUT_DIR)/lib/
-
-# Open an interactive shell in the builder container
-shell: start-builder
-	@echo "→ Opening shell in builder container..."
-	@docker exec -it -w /usr/local/src/ollama37 $(CONTAINER_NAME) bash -l
-
-# Test the built binary
-test: build
-	@echo "→ Testing ollama binary..."
-	@cd $(OUTPUT_DIR) && LD_LIBRARY_PATH=$$PWD/lib:$$LD_LIBRARY_PATH ./ollama --version
-
-# Clean build artifacts from host
+# Remove all Docker images
 clean:
-	@echo "→ Cleaning build artifacts from host..."
-	@rm -rf $(OUTPUT_DIR)
-	@echo "  Cleaned $(OUTPUT_DIR)"
-
-# Clean everything including container
-clean-all: clean stop-builder
-	@echo "→ Cleaning build directory in source..."
-	@rm -rf $(BUILD_DIR)
-	@rm -rf $(DIST_DIR)
-	@echo "  All cleaned"
-
-# ===== Runtime Image Targets =====
-
-# Build the runtime Docker image from artifacts
-build-runtime:
-	@echo "→ Building runtime Docker image..."
-	@if [ ! -f "$(OUTPUT_DIR)/ollama" ]; then \
-		echo "Error: ollama binary not found in $(OUTPUT_DIR)"; \
-		echo "Run 'make build' first to create the artifacts"; \
-		exit 1; \
-	fi
-	@if [ ! -d "$(OUTPUT_DIR)/lib" ]; then \
-		echo "Error: lib directory not found in $(OUTPUT_DIR)"; \
-		echo "Run 'make build' first to create the artifacts"; \
-		exit 1; \
-	fi
-	@echo "  Building Docker image $(RUNTIME_IMAGE):$(RUNTIME_TAG)..."
-	@docker build \
-		-f $(RUNTIME_DOCKERFILE) \
-		-t $(RUNTIME_IMAGE):$(RUNTIME_TAG) \
-		$(SOURCE_DIR)
-	@echo ""
-	@echo "✓ Runtime image built successfully!"
-	@echo "  Image: $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo ""
-	@echo "To run the image:"
-	@echo "  make run-runtime"
-	@echo ""
-	@echo "Or manually:"
-	@echo "  docker run --rm -it --runtime=nvidia --gpus all -p 11434:11434 $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo ""
-	@echo "To stop the builder container:"
-	@echo "  make stop-builder"
-
-# Run the runtime container
-run-runtime:
-	@echo "→ Starting runtime container..."
-	@if docker ps -a --format '{{.Names}}' | grep -q "^ollama37-runtime$$"; then \
-		echo "  Stopping existing container..."; \
-		docker stop ollama37-runtime 2>/dev/null || true; \
-		docker rm ollama37-runtime 2>/dev/null || true; \
-	fi
-	@echo "  Starting new container..."
-	@docker run -d \
-		--name ollama37-runtime \
-		--runtime=nvidia \
-		--gpus all \
-		-p 11434:11434 \
-		-v ollama-data:/root/.ollama \
-		$(RUNTIME_IMAGE):$(RUNTIME_TAG)
-	@sleep 2
-	@echo ""
-	@echo "✓ Runtime container started!"
-	@echo "  Container: ollama37-runtime"
-	@echo "  API: http://localhost:11434"
-	@echo ""
-	@echo "Check logs:"
-	@echo "  docker logs -f ollama37-runtime"
-	@echo ""
-	@echo "Test the API:"
-	@echo "  curl http://localhost:11434/api/tags"
-	@echo ""
-	@echo "Stop the container:"
-	@echo "  make stop-runtime"
-
-# Stop the runtime container
-stop-runtime:
-	@echo "→ Stopping runtime container..."
-	@if docker ps --format '{{.Names}}' | grep -q "^ollama37-runtime$$"; then \
-		docker stop ollama37-runtime; \
-		docker rm ollama37-runtime; \
-		echo "  Container stopped and removed"; \
-	else \
-		echo "  Container not running"; \
-	fi
-
-# Clean runtime image
-clean-runtime:
-	@echo "→ Cleaning runtime image..."
+	@echo "→ Removing Docker images..."
 	@docker rmi $(RUNTIME_IMAGE):$(RUNTIME_TAG) 2>/dev/null || echo "  No runtime image to remove"
-	@docker volume rm ollama-data 2>/dev/null || echo "  No volume to remove"
-	@echo "  Runtime image cleaned"
-
-# Help target
-help:
-	@echo "Ollama Build System (with GPU-enabled builder)"
+	@docker rmi $(BUILDER_IMAGE):$(BUILDER_TAG) 2>/dev/null || echo "  No builder image to remove"
 	@echo ""
-	@echo "Builder Image Targets:"
-	@echo "  make build-builder  - Build custom builder Docker image"
-	@echo "  make clean-builder  - Remove builder image"
+	@echo "✓ Images removed"
+	@echo ""
+	@echo "Note: To remove containers and volumes, use:"
+	@echo "  docker-compose down -v"
+
+# Show help message
+help:
+	@echo "Ollama37 Docker Build System"
 	@echo ""
 	@echo "Build Targets:"
-	@echo "  make build          - Build ollama binary and libraries (default)"
-	@echo "  make clean          - Remove build artifacts from host"
-	@echo "  make clean-all      - Remove all build artifacts and stop container"
-	@echo "  make shell          - Open a shell in the builder container"
-	@echo "  make test           - Test the built binary"
-	@echo ""
-	@echo "Runtime Image Targets:"
-	@echo "  make build-runtime  - Build Docker runtime image from artifacts"
-	@echo "  make run-runtime    - Start the runtime container"
-	@echo "  make stop-runtime   - Stop the runtime container"
-	@echo "  make clean-runtime  - Remove runtime image and volumes"
-	@echo ""
+	@echo "  make build          - Build builder and runtime images (default)"
+	@echo "  make build-builder  - Build only the builder base image"
+	@echo "  make build-runtime  - Build only the runtime image"
+	@echo "  make clean          - Remove all Docker images"
 	@echo "  make help           - Show this help message"
 	@echo ""
 	@echo "Configuration:"
 	@echo "  BUILDER_IMAGE:   $(BUILDER_IMAGE):$(BUILDER_TAG)"
 	@echo "  RUNTIME_IMAGE:   $(RUNTIME_IMAGE):$(RUNTIME_TAG)"
-	@echo "  CONTAINER_NAME:  $(CONTAINER_NAME)"
-	@echo "  CMAKE_PRESET:    $(CMAKE_PRESET)"
-	@echo "  PARALLEL_JOBS:   $(NPROC)"
 	@echo ""
-	@echo "Environment:"
-	@echo "  SOURCE_DIR:      $(SOURCE_DIR)"
-	@echo "  OUTPUT_DIR:      $(OUTPUT_DIR)"
+	@echo "Dockerfiles:"
+	@echo "  Builder:         $(BUILDER_DOCKERFILE)"
+	@echo "  Runtime:         $(RUNTIME_DOCKERFILE)"
+	@echo ""
+	@echo "Build Architecture:"
+	@echo "  1. Builder image: Base environment (CUDA 11.4, GCC 10, CMake 4, Go 1.25.3)"
+	@echo "  2. Runtime image: Two-stage build (compile + package)"
+	@echo "     - Stage 1: Clone source, compile C/C++/CUDA/Go"
+	@echo "     - Stage 2: Package runtime with compiled binaries"
+	@echo ""
+	@echo "Container Management (use docker-compose):"
+	@echo "  docker-compose up -d        - Start Ollama server"
+	@echo "  docker-compose logs -f      - View logs"
+	@echo "  docker-compose down         - Stop server"
+	@echo "  docker-compose down -v      - Stop and remove volumes"
--- a/docker/README.md
+++ b/docker/README.md
@@ -1,21 +1,28 @@
 # Ollama37 Docker Build System

-**Single-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)**
+**Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)**

 ## Overview

-This Docker build system creates a single all-in-one image that includes:
- CUDA 11.4 toolkit (required for Tesla K80, compute capability 3.7)
- GCC 10 (built from source, required by CUDA 11.4)
- CMake 4.0 (built from source)
- Go 1.25.3
- Ollama37 binary with K80 GPU support
+This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support:

-The image is built entirely from source by cloning from https://github.com/dogkeeper886/ollama37
+1. **Builder Image** (`builder/Dockerfile`) - Base environment with build tools
+   - Rocky Linux 8
+   - CUDA 11.4 toolkit (required for Tesla K80)
+   - GCC 10 (built from source, required by CUDA 11.4)
+   - CMake 4.0 (built from source)
+   - Go 1.25.3
+
+2. **Runtime Image** (`runtime/Dockerfile`) - Two-stage build process
+   - **Stage 1 (compile)**: Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary
+   - **Stage 2 (runtime)**: Copy artifacts → Setup runtime environment
+
+The runtime uses the builder image as its base to ensure library path compatibility between build and runtime environments.

 ## Prerequisites

 - Docker with NVIDIA Container Runtime
+- Docker Compose
 - NVIDIA GPU drivers (470+ for Tesla K80)
 - Verify GPU access:
  ```bash
@@ -24,16 +31,20 @@ The image is built entirely from source by cloning from https://github.com/dogke

 ## Quick Start

-### 1. Build the Image
+### 1. Build Images

 ```bash
 cd /home/jack/Documents/ollama37/docker
-docker build -t ollama37:latest -f Dockerfile ..
+make build
 ```

-**Build time:** ~90 minutes (first time, includes building GCC 10 and CMake 4 from source)
+This will:
+1. Build the builder image (if not present) - **~90 minutes first time**
+2. Build the runtime image - **~10 minutes**

-**Image size:** ~20GB (includes full build toolchain + CUDA toolkit + Ollama)
+**First-time build:** ~100 minutes total (includes building GCC 10 and CMake 4 from source)
+
+**Subsequent builds:** ~10 minutes (builder image is cached)

 ### 2. Run with Docker Compose (Recommended)

@@ -46,6 +57,11 @@ Check logs:
 docker-compose logs -f
 ```

+Stop the server:
+```bash
+docker-compose down
+```
+
 ### 3. Run Manually

 ```bash
@@ -92,46 +108,147 @@ docker exec ollama37 ollama run gemma3:4b "Hello!"

 ## Architecture

-### Single-Stage Build Process
+### Build System Components

-The Dockerfile performs these steps in order:
+```
+docker/
+├── builder/
+│   └── Dockerfile          # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
+├── runtime/
+│   └── Dockerfile          # Two-stage: compile ollama37, package runtime
+├── Makefile                # Build orchestration (images only)
+├── docker-compose.yml      # Runtime orchestration
+└── README.md               # This file
+```

-1. **Base Setup** (10 min)
-   - Rocky Linux 8
-   - CUDA 11.4 toolkit installation
-   - Development tools
+### Two-Stage Build Process

-2. **Build Toolchain** (70 min)
-   - GCC 10 from source (~60 min)
-   - CMake 4 from source (~8 min)
-   - Go 1.25.3 binary (~1 min)
+#### Stage 1: Builder Image (`builder/Dockerfile`)
+**Purpose**: Provide consistent build environment

-3. **Ollama Compilation** (10 min)
-   - Git clone from dogkeeper886/ollama37
-   - CMake configure with "CUDA 11" preset
-   - Build C/C++/CUDA libraries
-   - Build Go binary
+**Contents:**
+- Rocky Linux 8 base
+- CUDA 11.4 toolkit (compilation only, no driver)
+- GCC 10 from source (~60 min build time)
+- CMake 4.0 from source (~8 min build time)
+- Go 1.25.3 binary
+- All build dependencies

-4. **Runtime Setup**
-   - Configure library paths
-   - Set environment variables
-   - Configure entrypoint
+**Build time:** ~90 minutes (first time), cached thereafter

-### Why Single-Stage?
+**Image size:** ~15GB

-The previous two-stage design (builder → runtime) had issues:
- Complex artifact copying between stages
- Missing CUDA runtime libraries
- LD_LIBRARY_PATH mismatches
- User/permission problems
+#### Stage 2: Runtime Image (`runtime/Dockerfile`)

-Single-stage ensures:
- ✅ All libraries present and properly linked
- ✅ Consistent environment from build to runtime
- ✅ No artifact copying issues
- ✅ Complete CUDA toolkit available at runtime
+**Stage 2.1 - Compile** (FROM ollama37-builder)
+1. Clone ollama37 source from GitHub
+2. Configure with CMake ("CUDA 11" preset for compute 3.7)
+3. Build C/C++/CUDA libraries
+4. Build Go binary

-**Trade-off:** Larger image size (~20GB vs ~3GB) for guaranteed reliability
+**Stage 2.2 - Runtime** (FROM ollama37-builder)
+1. Copy entire source tree (includes compiled artifacts)
+2. Copy binary to /usr/local/bin/ollama
+3. Setup LD_LIBRARY_PATH for runtime libraries
+4. Configure server, expose ports, setup volumes
+
+**Build time:** ~10 minutes
+
+**Image size:** ~18GB (includes build environment + compiled Ollama)
+
+### Why Do Both Stages Use the Builder Base?
+
+**Problem**: Compiled binaries expect their libraries at fixed paths (embedded rpath entries, or directories supplied through LD_LIBRARY_PATH)
+
+**Solution**: Use identical base images for compile and runtime stages
+
+**Benefits:**
+- ✅ Library paths match between build and runtime
+- ✅ All GCC 10 runtime libraries present
+- ✅ All CUDA libraries at expected paths
+- ✅ No complex artifact extraction/copying
+- ✅ Guaranteed compatibility
+
+**Trade-off:** Larger runtime image (~18GB) in exchange for avoiding the complexity and reliability issues of extracting artifacts into a slimmer base
+
+### Alternative: Single-Stage Build
+
+See `Dockerfile.single-stage.archived` for the original single-stage design that inspired this architecture.
+
+## Build Commands
+
+### Using the Makefile
+
+```bash
+# Build both builder and runtime images
+make build
+
+# Build only builder image
+make build-builder
+
+# Build only runtime image (will auto-build builder if needed)
+make build-runtime
+
+# Remove all images
+make clean
+
+# Show help
+make help
+```
+
+### Direct Docker Commands
+
+```bash
+# Build builder image
+docker build -f builder/Dockerfile -t ollama37-builder:latest builder/
+
+# Build runtime image
+docker build -f runtime/Dockerfile -t ollama37:latest .
+```
+
+## Runtime Management
+
+### Using Docker Compose (Recommended)
+
+```bash
+# Start server
+docker-compose up -d
+
+# View logs (live tail)
+docker-compose logs -f
+
+# Stop server
+docker-compose down
+
+# Stop and remove volumes
+docker-compose down -v
+
+# Restart server
+docker-compose restart
+```
+
+### Manual Docker Commands
+
+```bash
+# Start container
+docker run -d \
+  --name ollama37 \
+  --runtime=nvidia \
+  --gpus all \
+  -p 11434:11434 \
+  -v ollama-data:/root/.ollama \
+  ollama37:latest
+
+# View logs
+docker logs -f ollama37
+
+# Stop container
+docker stop ollama37
+docker rm ollama37
+
+# Shell access
+docker exec -it ollama37 bash
+```

 ## Configuration

@@ -143,90 +260,239 @@ Single-stage ensures:
 | `LD_LIBRARY_PATH` | `/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64` | Library search path |
 | `NVIDIA_VISIBLE_DEVICES` | `all` | Which GPUs to use |
 | `NVIDIA_DRIVER_CAPABILITIES` | `compute,utility` | GPU capabilities |
+| `OLLAMA_DEBUG` | (unset) | Enable verbose Ollama logging |
+| `GGML_CUDA_DEBUG` | (unset) | Enable CUDA/CUBLAS debug logging |

 ### Volume Mounts

 - `/root/.ollama` - Model storage (use Docker volume `ollama-data`)

+### Customizing docker-compose.yml
+
+```yaml
+# Change port
+ports:
+  - "11435:11434"  # Host:Container
+
+# Use specific GPU
+environment:
+  - NVIDIA_VISIBLE_DEVICES=0  # Use GPU 0 only
+
+# Enable debug logging
+environment:
+  - OLLAMA_DEBUG=1
+  - GGML_CUDA_DEBUG=1
+```
+
 ## GPU Support

 ### Supported Compute Capabilities
 - **3.7** - Tesla K80 (primary target)
- **5.0-8.6** - Pascal, Volta, Turing, Ampere
+- **5.0-5.2** - Maxwell (GTX 900 series)
+- **6.0-6.1** - Pascal (GTX 10 series)
+- **7.0-7.5** - Volta, Turing (RTX 20 series)
+- **8.0-8.6** - Ampere (RTX 30 series)

 ### Tesla K80 Recommendations

 **VRAM:** 12GB per GPU (24GB for dual-GPU K80)

 **Model sizes:**
- Small (1-4B): Full precision
+- Small (1-4B): Full precision or Q8 quantization
 - Medium (7-8B): Q4_K_M quantization
 - Large (13B+): Q4_0 quantization or multi-GPU

+**Tested models:**
+- ✅ gemma3:4b
+- ✅ gpt-oss
+- ✅ deepseek-r1
+
 **Multi-GPU:**
 ```bash
-docker run --gpus all ...              # Use all GPUs
-docker run --gpus '"device=0"' ...     # Use specific GPU
+# Use all GPUs
+docker run --gpus all ...
+
+# Use specific GPU
+docker run --gpus '"device=0"' ...
+
+# Use multiple specific GPUs
+docker run --gpus '"device=0,1"' ...
 ```

 ## Troubleshooting

 ### GPU not detected
+
 ```bash
 # Check GPU visibility in container
 docker exec ollama37 nvidia-smi

 # Check CUDA libraries
 docker exec ollama37 ldconfig -p | grep cuda
+
+# Check NVIDIA runtime
+docker info | grep -i runtime
 ```

 ### Model fails to load
+
 ```bash
 # Check logs with CUDA debug
 docker run --rm --runtime=nvidia --gpus all \
  -e OLLAMA_DEBUG=1 \
  -e GGML_CUDA_DEBUG=1 \
-  ollama37:latest serve
+  -p 11434:11434 \
+  ollama37:latest

 # Check library paths
 docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH'
+
+# Verify that the binary links against cuBLAS
+docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas'
 ```

-### Out of memory during build
+### Build fails with "out of memory"
+
 ```bash
-# Reduce parallel jobs in Dockerfile
-# Edit line: cmake --build build -j$(nproc)
-# Change to: cmake --build build -j2
+# Edit runtime/Dockerfile line for cmake build
+# Change: cmake --build build -j$(nproc)
+# To: cmake --build build -j2
+
+# Or set Docker memory limit
+docker build --memory=8g ...
 ```

 ### Port already in use
+
 ```bash
-# Edit docker-compose.yml
+# Find process using port 11434
+sudo lsof -i :11434
+
+# Kill the process or change port in docker-compose.yml
 ports:
-  - "11435:11434"  # Change host port
+  - "11435:11434"
+```
+
+### Build cache issues
+
+```bash
+# Rebuild runtime image without cache
+docker build --no-cache -f runtime/Dockerfile -t ollama37:latest .
+
+# Rebuild builder image without cache
+docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/
+
+# Remove all images and rebuild
+make clean
+make build
 ```

 ## Rebuilding

-### Rebuild from scratch
-```bash
-docker-compose down
-docker rmi ollama37:latest
-docker build --no-cache -t ollama37:latest -f Dockerfile ..
-docker-compose up -d
-```
+### Rebuild with latest code

-### Rebuild with updated code
 ```bash
-# The git clone will pull latest from GitHub
-docker build -t ollama37:latest -f Dockerfile ..
+# Runtime Dockerfile clones from GitHub, so rebuild to get latest
+make build-runtime
+
+# Restart container
 docker-compose restart
 ```

+### Rebuild everything from scratch
+
+```bash
+# Stop and remove containers
+docker-compose down -v
+
+# Remove images
+make clean
+
+# Rebuild all
+make build
+
+# Start fresh
+docker-compose up -d
+```
+
+### Rebuild only builder (rare)
+
+```bash
+# Only needed if you change CUDA/GCC/CMake/Go versions
+make clean
+make build-builder
+make build-runtime
+```
+
+## Development
+
+### Modifying the build
+
+1. **Change build tools** - Edit `builder/Dockerfile`
+2. **Change Ollama build process** - Edit `runtime/Dockerfile`
+3. **Change build orchestration** - Edit `Makefile`
+4. **Change runtime config** - Edit `docker-compose.yml`
+
+### Testing changes
+
+```bash
+# Build with your changes
+make build
+
+# Run and test
+docker-compose up -d
+docker-compose logs -f
+
+# If issues, check inside container
+docker exec -it ollama37 bash
+```
+
+### Shell access for debugging
+
+```bash
+# Enter running container
+docker exec -it ollama37 bash
+
+# Check GPU
+nvidia-smi
+
+# Check libraries
+ldd /usr/local/bin/ollama
+ldconfig -p | grep -E "cuda|cublas"
+
+# Test binary
+/usr/local/bin/ollama --version
+```
+
+## Image Sizes
+
+| Image | Size | Contents |
+|-------|------|----------|
+| `ollama37-builder:latest` | ~15GB | CUDA, GCC, CMake, Go, build deps |
+| `ollama37:latest` | ~18GB | Builder + Ollama binary + libraries |
+
+**Note**: Large size ensures all runtime dependencies are present and properly linked.
+
+## Build Times
+
+| Task | First Build | Cached Build |
+|------|-------------|--------------|
+| Builder image | ~90 min | <1 min |
+| Runtime image | ~10 min | ~10 min |
+| **Total** | **~100 min** | **~10 min** |
+
+**Breakdown (first build):**
+- GCC 10: ~60 min
+- CMake 4: ~8 min
+- CUDA toolkit: ~10 min
+- Go install: ~1 min
+- Ollama build: ~10 min
+
 ## Documentation

- **[../CLAUDE.md](../CLAUDE.md)** - Project goals and implementation notes
+- **[../CLAUDE.md](../CLAUDE.md)** - Project goals, implementation details, and technical notes
 - **[Upstream Ollama](https://github.com/ollama/ollama)** - Original Ollama project
+- **[dogkeeper886/ollama37](https://github.com/dogkeeper886/ollama37)** - This fork with K80 support

 ## License

--- a/docker/builder/Dockerfile
+++ b/docker/builder/Dockerfile
@@ -1,3 +1,7 @@
+# Ollama37 Builder Image
+# This image provides the complete build environment for compiling Ollama with Tesla K80 (compute 3.7) support
+# Includes: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
+
 FROM rockylinux/rockylinux:8

 # Install CUDA toolkit 11.4
@@ -9,13 +13,14 @@ RUN dnf -y install dnf-plugins-core\
    && dnf -y config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo\
    && dnf -y install cuda-toolkit-11-4

-# Post install, setup path
-COPY cuda-11.4.sh /etc/profile.d/cuda-11.4.sh
+# Setup CUDA path
+RUN echo 'export PATH="${PATH}:/usr/local/cuda-11.4/bin"' > /etc/profile.d/cuda-11.4.sh
 ENV PATH="$PATH:/usr/local/cuda-11.4/bin"
-#ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64:/usr/local/lib64"

-# Install gcc 10
-RUN dnf -y install wget unzip bzip2\
+# Install GCC 10 from source
+# CUDA 11.4 requires GCC 10 maximum (enforced in host_config.h)
+# GCC 11+ is incompatible with CUDA 11.4
+RUN dnf -y install wget unzip bzip2 git\
    && dnf -y groupinstall "Development Tools"\
    && cd /usr/local/src\
    && wget https://github.com/gcc-mirror/gcc/archive/refs/heads/releases/gcc-10.zip\
@@ -28,17 +33,16 @@ RUN dnf -y install wget unzip bzip2\
    && make -j $(nproc)\
    && make install

-# Post install, setup path
-#COPY gcc-10.sh /etc/profile.d/gcc-10.sh
-#ENV LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64:/usr/local/lib64"
-COPY gcc-10.conf /etc/ld.so.conf.d/gcc-10.conf
-RUN ldconfig\
+# Setup GCC 10 library path and update system compiler
+# Configure ldconfig to find GCC 10 runtime libraries
+# Replace default cc symlink to use our custom GCC 10
+RUN echo '/usr/local/lib64' > /etc/ld.so.conf.d/gcc-10.conf\
+    && ldconfig\
    && rm -f /usr/bin/cc\
    && ln -s /usr/local/bin/gcc /usr/bin/cc

-# Install cmake
-#ENV LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
-#ENV PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
+# Install CMake 4 from source
+# Required for modern CMake features and CUDA architecture configuration
 RUN dnf -y install openssl-devel\
    && cd /usr/local/src\
    && wget https://github.com/Kitware/CMake/releases/download/v4.0.0/cmake-4.0.0.tar.gz\
@@ -49,11 +53,11 @@ RUN dnf -y install openssl-devel\
    && make -j $(nproc)\
    && make install

-# Install go
+# Install Go 1.25.3
 RUN cd /usr/local\
    && wget https://go.dev/dl/go1.25.3.linux-amd64.tar.gz\
    && tar xvf go1.25.3.linux-amd64.tar.gz

-# Post install, setup path
-COPY go.sh /etc/profile.d/go.sh
+# Setup Go path
+RUN echo 'export PATH="${PATH}:/usr/local/go/bin"' > /etc/profile.d/go.sh
 ENV PATH="$PATH:/usr/local/go/bin"
--- a/docker/builder/README.md
+++ b/docker/builder/README.md
@@ -1,58 +0,0 @@
-# Ollama37 Builder Image
-
-This directory contains the Dockerfile for building the `ollama37-builder:latest` image.
-
-## What's Inside
-
-The builder image includes:
- **Base**: `nvidia/cuda:11.4.3-devel-rockylinux8`
- **GCC 10**: `gcc-toolset-10` (required by CUDA 11.4)
- **CMake**: System package
- **Go**: System package
-
-## Building the Builder Image
-
-The builder image is **automatically built** by the Makefile when you run `make build` for the first time.
-
-To manually build the builder image:
-
-```bash
-cd /home/jack/Documents/ollama37/docker
-make build-builder
-```
-
-Or using Docker directly:
-
-```bash
-cd /home/jack/Documents/ollama37/docker/builder
-docker build -t ollama37-builder:latest .
-```
-
-## Using the Builder Image
-
-The Makefile handles this automatically, but for reference:
-
-```bash
-# Start builder container with GPU access
-docker run --rm -d \
-  --name ollama37-builder \
-  --runtime=nvidia \
-  --gpus all \
-  ollama37-builder:latest \
-  sleep infinity
-
-# Use the container
-docker exec -it ollama37-builder bash
-```
-
-## Customization
-
-If you need to modify the builder (e.g., change CUDA version, add packages):
-
-1. Edit `Dockerfile` in this directory
-2. Rebuild: `make clean-builder build-builder`
-3. Build your project: `make build`
-
-## Archived Builder
-
-The `archived/` subdirectory contains an older Dockerfile that built GCC and CMake from source (~80 minutes). The current version uses Rocky Linux system packages for much faster builds (~5 minutes).
--- a/docker/builder/cuda-11.4.sh
+++ b/docker/builder/cuda-11.4.sh
@@ -1 +0,0 @@
-export PATH="${PATH}:/usr/local/cuda-11.4/bin"
--- a/docker/builder/gcc-10.conf
+++ b/docker/builder/gcc-10.conf
@@ -1 +0,0 @@
-/usr/local/lib64
--- a/docker/builder/go.sh
+++ b/docker/builder/go.sh
@@ -1 +0,0 @@
-export PATH="${PATH}:/usr/local/go/bin"
--- a/docker/runtime/Dockerfile
+++ b/docker/runtime/Dockerfile
@@ -1,46 +1,73 @@
-FROM rockylinux/rockylinux:8
+# Ollama37 Runtime Image
+# Two-stage build: compile stage builds the binary, runtime stage packages it
+# Both stages use ollama37-builder base to maintain identical library paths
+# This ensures the compiled binary can find all required runtime libraries

-# Install only CUDA runtime libraries (not the full toolkit)
-# The host system provides the NVIDIA driver at runtime via --gpus flag
-RUN dnf -y install dnf-plugins-core\
-    && dnf -y config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo\
-    && dnf -y install cuda-cudart-11-4 libcublas-11-4 \
-    && dnf clean all
+# Stage 1: Compile ollama37 from source
+FROM ollama37-builder AS builder

-# Create directory structure
-RUN mkdir -p /usr/local/bin /usr/local/lib/ollama
+# Clone ollama37 source code from GitHub
+RUN cd /usr/local/src\
+    && git clone https://github.com/dogkeeper886/ollama37.git

-# Copy the ollama binary from build output
-COPY docker/output/ollama /usr/local/bin/ollama
+# Set working directory for build
+WORKDIR /usr/local/src/ollama37

-# Copy all shared libraries from build output (includes ollama libs + GCC 10 runtime libs)
-COPY docker/output/lib/ /usr/local/lib/ollama/
+# Configure build with CMake
+# Use "CUDA 11" preset for Tesla K80 compute capability 3.7 support
+# Set LD_LIBRARY_PATH to find GCC 10 and system libraries during build
+RUN bash -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH \
+    CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ \
+    cmake --preset "CUDA 11"'

-# Set library path to include our ollama libraries first
-# This includes:
-#   - Ollama CUDA/GGML libraries
-#   - GCC 10 runtime libraries (libstdc++.so.6, libgcc_s.so.1)
-#   - System CUDA libraries
-ENV LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/cuda-11.4/lib64:/usr/lib64
+# Build C/C++/CUDA libraries with CMake
+# Compile all GGML CUDA kernels and Ollama native libraries
+RUN bash -c 'LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH \
+    CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ \
+    cmake --build build -j$(nproc)'

-# Base image already sets these, but we can override if needed:
-# NVIDIA_DRIVER_CAPABILITIES=compute,utility
-# NVIDIA_VISIBLE_DEVICES=all
+# Build Go binary
+# VCS info is embedded automatically since we cloned from git
+RUN go build -o /usr/local/bin/ollama .

-# Ollama server configuration
+
+# Stage 2: Runtime environment
+# Use ollama37-builder as base to maintain library path compatibility
+# The compiled binary has hardcoded library paths that match this environment
+FROM ollama37-builder AS runtime
+
+# Copy the entire source directory including compiled libraries
+# This preserves the exact directory structure the binary expects
+COPY --from=builder /usr/local/src/ollama37 /usr/local/src/ollama37
+
+# Copy the ollama binary to system bin directory
+COPY --from=builder /usr/local/bin/ollama /usr/local/bin/ollama
+
+# Setup library paths for runtime
+# The binary expects libraries in these exact paths:
+#   /usr/local/src/ollama37/build/lib/ollama - Ollama CUDA/GGML libraries
+#   /usr/local/lib64 - GCC 10 runtime libraries (libstdc++, libgcc_s)
+#   /usr/local/cuda-11.4/lib64 - CUDA 11.4 runtime libraries
+#   /usr/lib64 - System libraries
+ENV LD_LIBRARY_PATH=/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64
+
+# Configure Ollama server to listen on all interfaces
 ENV OLLAMA_HOST=0.0.0.0:11434

-# Expose the Ollama API port
+# Expose Ollama API port
 EXPOSE 11434

-# Create a data directory for models
+# Create persistent volume for model storage
+# Models downloaded by Ollama will be stored here
 RUN mkdir -p /root/.ollama
 VOLUME ["/root/.ollama"]

-# Health check
+# Configure health check to verify Ollama is running
+# Uses 'ollama list' command to check if the service is responsive
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD /usr/local/bin/ollama list || exit 1

 # Set entrypoint and default command
+# Container runs 'ollama serve' by default to start the API server
 ENTRYPOINT ["/usr/local/bin/ollama"]
 CMD ["serve"]