16 Commits

Author SHA1 Message Date
Shang Chieh Tseng
b5dac79d2c Update README.md 2025-11-13 18:14:16 +08:00
Shang Chieh Tseng
4cf745b40a Update README.md 2025-11-12 12:50:13 +08:00
Shang Chieh Tseng
8d376e0f9b Add local development build support to Docker build system
Extends the Docker Makefile with targets for building from local source code without pushing to GitHub, enabling faster iteration during development.

New build targets:
- build-runtime-local: Build from local source with cache
- build-runtime-local-no-cache: Full rebuild from local source
- build-runtime-no-cache: Force fresh GitHub clone without cache

Added docker/runtime/Dockerfile.local for local source builds, mirroring the GitHub-based Dockerfile structure but using COPY instead of git clone.
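The COPY-instead-of-clone idea can be sketched as follows (a hypothetical fragment, assuming the builder image name and source path used elsewhere in this history; the real Dockerfile.local may differ):

```dockerfile
# Hypothetical sketch of Dockerfile.local: same layout as the
# GitHub-based Dockerfile, but the working tree is copied in,
# so uncommitted local changes can be built immediately.
FROM ollama37-builder AS compile
WORKDIR /usr/local/src/ollama37
# COPY the local source instead of `git clone` for fast iteration
COPY . .
RUN cmake --preset "CUDA 11" && cmake --build --preset "CUDA 11" -j
RUN go build -o ollama .
```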

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 06:51:05 +08:00
Shang Chieh Tseng
7d9b59c520 Improve GPU detection and add detailed model loading logs
1. Fix binary path resolution using symlink (docker/runtime/Dockerfile)
   - Build binary to source directory (./ollama)
   - Create symlink from /usr/local/bin/ollama to /usr/local/src/ollama37/ollama
   - Allows ml/path.go to resolve libraries via filepath.EvalSymlinks()
   - Fixes "total vram=0 B" issue without requiring -w flag

2. Add comprehensive logging for model loading phases (llm/server.go)
   - Log runner subprocess startup and readiness
   - Log each memory allocation phase (FIT, ALLOC, COMMIT)
   - Log layer allocation adjustments during convergence
   - Log when model weights are being loaded (slowest phase)
   - Log progress during waitUntilRunnerLaunched (every 1s)
   - Improves visibility during 1-2 minute first-time model loads

3. Fix flash attention compute capability check (ml/device.go)
   - Changed DriverMajor to ComputeMajor for correct capability detection
   - Flash attention requires compute capability >= 7.0, not driver version
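Item 3's one-line fix can be sketched like this (the struct and function are illustrative stand-ins; the real check lives in ml/device.go):

```go
package main

import "fmt"

// deviceInfo is an illustrative stand-in for the device properties
// seen in ml/device.go: driver version and CUDA compute capability
// are independent version pairs.
type deviceInfo struct {
	DriverMajor, DriverMinor   int
	ComputeMajor, ComputeMinor int
}

// flashAttnSupported gates flash attention on compute capability
// >= 7.0. The bug was comparing DriverMajor instead, which passes
// on any modern driver regardless of how old the GPU silicon is.
func flashAttnSupported(d deviceInfo) bool {
	return d.ComputeMajor >= 7
}

func main() {
	k80 := deviceInfo{DriverMajor: 470, ComputeMajor: 3, ComputeMinor: 7}
	v100 := deviceInfo{DriverMajor: 470, ComputeMajor: 7, ComputeMinor: 0}
	fmt.Println(flashAttnSupported(k80), flashAttnSupported(v100)) // false true
}
```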

These changes improve user experience during model loading by providing
clear feedback at each stage, especially during the slow COMMIT phase
where GGUF weights are loaded and CUDA kernels compile.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 23:28:00 +08:00
Shang Chieh Tseng
db00f2d5f4 Create dockerhub-readme.md 2025-11-10 20:35:43 +08:00
Shang Chieh Tseng
738a8ba2da Improve Docker runtime Dockerfile documentation and accuracy
Corrects misleading architecture description and enhances code comments:
- Fix header: change "two-stage build" to accurate "single-stage build"
- Remove obsolete multi-stage build artifacts (builder/runtime aliases)
- Clarify LD_LIBRARY_PATH purpose during CMake configuration
- Document parallel compilation benefit (-j flag)
- Explain health check validation scope (API + model registry)
- Add specific library path location to header comments

This aligns with the CLAUDE.md documentation policy of adding helpful
comments to improve code maintainability and debugging experience.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 14:18:08 +08:00
Shang Chieh Tseng
4810471b33 Redesign Docker build system to two-stage architecture with builder/runtime separation
Redesigned the Docker build system from a single-stage monolithic design to a clean
two-stage architecture that separates the build environment from the compilation
process while maintaining library path compatibility.

## Architecture Changes

### Builder Image (docker/builder/Dockerfile)
- Provides base environment: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3
- Built once, cached for subsequent builds (~90 min first time)
- Removed config file copying (cuda-11.4.sh, gcc-10.conf, go.sh)
- Added comprehensive comments explaining each build step
- Added git installation for runtime stage source cloning

### Runtime Image (docker/runtime/Dockerfile)
- Two-stage build using ollama37-builder as base for BOTH stages
- Stage 1 (compile): Clone source from GitHub → CMake configure → Build C/C++/CUDA → Build Go
- Stage 2 (runtime): Copy artifacts from stage 1 → Setup environment → Configure server
- Both stages use identical base image to ensure library path compatibility
- Removed -buildvcs=false flag (VCS info embedded from git clone)
- Comprehensive comments documenting library paths and design rationale
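The stage layout described above might look roughly like this (a hedged sketch assuming the image name, clone URL, and "CUDA 11" preset mentioned in this history; the real Dockerfile may differ):

```dockerfile
# Hypothetical sketch: both stages share the ollama37-builder base,
# so library paths baked into the binary at compile time also exist
# at runtime.
FROM ollama37-builder AS compile
RUN git clone https://github.com/dogkeeper886/ollama37 /usr/local/src/ollama37
WORKDIR /usr/local/src/ollama37
RUN cmake --preset "CUDA 11" && cmake --build --preset "CUDA 11" -j
RUN go build -o ollama .

FROM ollama37-builder AS runtime
COPY --from=compile /usr/local/src/ollama37 /usr/local/src/ollama37
ENV OLLAMA_HOST=0.0.0.0
CMD ["/usr/local/src/ollama37/ollama", "serve"]
```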

### Makefile (docker/Makefile)
- Simplified from 289 to 145 lines (about 50% smaller)
- Removed: run, stop, logs, shell, test targets (use docker-compose instead)
- Removed: build orchestration targets (start-builder, copy-source, run-cmake, etc.)
- Removed: artifact copying (handled internally by multi-stage build)
- Focus: Build images only (build, build-builder, build-runtime, clean, help)
- All runtime operations delegated to docker-compose.yml

### Documentation (docker/README.md)
- Completely rewritten for new two-stage architecture
- Added "Build System Components" section with file structure
- Documented why both runtime stages use builder base (library path compatibility)
- Updated build commands to use Makefile
- Updated runtime commands to use docker-compose
- Added comprehensive troubleshooting section
- Added build time and image size tables
- Reference to archived single-stage design

## Key Design Decision

**Problem**: Compiled binaries have hardcoded library paths
**Solution**: Use ollama37-builder as base for BOTH compile and runtime stages
**Trade-off**: Larger image (~18GB) vs guaranteed library compatibility

## Benefits

- Cleaner separation of concerns (builder env vs compilation vs runtime)
- Builder image cached after first build (90 min → <1 min rebuilds)
- Runtime rebuilds only take ~10 min (pulls latest code from GitHub)
- No library path mismatches (identical base images)
- No complex artifact extraction (multi-stage COPY)
- Simpler Makefile focused on image building
- Runtime management via docker-compose (industry standard)

## Files Changed

Modified:
- docker/builder/Dockerfile - Added comments, removed COPY config files
- docker/runtime/Dockerfile - Converted to two-stage build
- docker/Makefile - Simplified to focus on image building only
- docker/README.md - Comprehensive rewrite for new architecture

Deleted:
- docker/builder/README.md - No longer needed
- docker/builder/cuda-11.4.sh - Generated in Dockerfile
- docker/builder/gcc-10.conf - Generated in Dockerfile
- docker/builder/go.sh - Generated in Dockerfile

Archived:
- docker/Dockerfile → docker/Dockerfile.single-stage.archived

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 13:14:49 +08:00
Shang Chieh Tseng
6dbd8ed44e Redesign Docker build system to single-stage architecture for reliable model loading
Replaced complex two-stage build (builder → runtime) with single-stage
Dockerfile that builds and runs Ollama in one image. This fixes model
loading issues caused by missing CUDA libraries and LD_LIBRARY_PATH
mismatches in the previous multi-stage design.

Changes:
- Add docker/Dockerfile: Single-stage build with GCC 10, CMake 4, Go 1.25.3, CUDA 11.4
- Clone source from https://github.com/dogkeeper886/ollama37
- Compile Ollama with "CUDA 11" preset for Tesla K80 (compute capability 3.7)
- Keep complete CUDA toolkit and all libraries in final image (~20GB)
- Update docker-compose.yml: Simplified config, use ollama37:latest image
- Update docker/README.md: New build instructions and architecture docs

Trade-off: Larger image size (~20GB vs ~3GB) for guaranteed compatibility
and reliable GPU backend operation. All libraries remain accessible with
correct paths, ensuring models load properly on Tesla K80.

Tested: Successfully runs gemma3:1b on Tesla K80

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 09:19:22 +08:00
Shang Chieh Tseng
0293c53746 Fix Docker container to run as host user and use host .ollama directory
This change prevents permission issues when using Ollama both locally and
in Docker by:
- Running container as host user (UID/GID) instead of root
- Mounting host's $HOME/.ollama directory using environment variables
- Setting HOME environment variable in container

This allows both the local binary and Docker container to share the same
model data without permission conflicts or duplication.
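A compose sketch of the described setup (service name, mount path, and use of `UID`/`GID` variables are assumptions; the actual docker-compose.yml may differ):

```yaml
# Hypothetical sketch: run as the host user and share the host's
# ~/.ollama so the local binary and the container use the same models.
# UID/GID are not exported by default shells; they would need to be
# set in the environment or an .env file.
services:
  ollama:
    image: ollama37:latest
    user: "${UID}:${GID}"          # host user, not root
    environment:
      - HOME=${HOME}               # so Ollama finds $HOME/.ollama
    volumes:
      - ${HOME}/.ollama:${HOME}/.ollama
```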

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 18:00:42 +08:00
Shang Chieh Tseng
8380ca93f8 Fix Docker build system: add library paths, GCC 10 runtime libs, and Go build flags
- Add LD_LIBRARY_PATH to CMake and build steps for GCC 10 libraries
- Copy GCC 10 runtime libraries (libstdc++.so.6, libgcc_s.so.1) to output
- Update runtime Dockerfile to use minimal CUDA runtime packages
- Add -buildvcs=false flag to Go build to avoid Git VCS errors
- Simplify runtime container to only include necessary CUDA libraries
- Fix library path configuration for proper runtime library loading
2025-11-09 00:05:12 +08:00
Shang Chieh Tseng
6237498297 Fix Makefile to use custom-built GCC 10 instead of non-existent gcc-toolset-10
- Replace 'scl enable gcc-toolset-10' with 'bash -l' (login shell)
- Login shell sources /etc/profile.d/cuda-11.4.sh and go.sh for PATH
- Explicitly set CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ (custom-built GCC 10)
- Fix run-cmake, run-build, run-go-build, and shell targets
- Enables CMake to find nvcc and use correct compiler toolchain
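The pattern can be sketched as a Makefile fragment (the container name and exact command are illustrative; only the `bash -l` plus explicit `CC`/`CXX` idiom is taken from the commit):

```makefile
# Hypothetical sketch: bash -l sources /etc/profile.d/*.sh (PATH for
# nvcc and go), while CC/CXX pin the custom-built GCC 10.
run-cmake:
	docker exec ollama37-builder bash -lc \
		'CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --preset "CUDA 11"'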
2025-11-08 21:20:26 +08:00
Shang Chieh Tseng
f2c94bb9af Add Docker builder image with CUDA 11.4, GCC 10, CMake 4, and Go 1.25.3
- Build CUDA 11.4 toolkit from NVIDIA repository (for K80 compute 3.7 support)
- Build GCC 10 from source (required for CUDA 11.4 compatibility)
- Build CMake 4.0.0 from source (latest version)
- Install Go 1.25.3 from official tarball
- Configure library paths via /etc/ld.so.conf.d/gcc-10.conf and ldconfig
- Add /etc/profile.d scripts for interactive shell PATH setup
- Use ENV statements for Docker build-time and runtime PATH configuration
- Switch from nvidia/cuda base image to rockylinux:8 for full control
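The library-path wiring might be sketched as below (the `/usr/local/lib64` location is an assumption about where the custom-built GCC installs its runtime libraries):

```dockerfile
# Hypothetical sketch: register GCC 10's runtime libraries system-wide
# and expose the toolchains on PATH for build-time and runtime use.
RUN echo "/usr/local/lib64" > /etc/ld.so.conf.d/gcc-10.conf && ldconfig
# profile.d scripts cover interactive shells; ENV covers docker build/run
ENV PATH=/usr/local/cuda-11.4/bin:/usr/local/go/bin:$PATH
```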
2025-11-08 21:03:38 +08:00
Shang Chieh Tseng
71fc994a63 Fix Docker build: clean host artifacts after copy to prevent conflicts
- Add cleanup step in copy-source target to remove build/, ollama, and dist/
- Prevents host build artifacts from interfering with container builds
- Ensures clean build environment when switching between host and Docker workflows
- docker cp doesn't respect .dockerignore, so explicit cleanup is needed
2025-11-08 17:16:46 +08:00
Shang Chieh Tseng
94bbfbb2e7 Add Docker-based build system with GPU-enabled builder and runtime containers 2025-11-07 12:48:05 +08:00
Shang Chieh Tseng
ef14fb5b26 Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from the
official ollama/ollama repository and re-applying Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility
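In CMake terms, the change can be sketched as follows (the other architecture entries in the list are illustrative, not copied from the repository):

```cmake
# Hypothetical sketch: add Kepler's compute 3.7 as a virtual (PTX)
# target so the driver JIT-compiles kernels for the K80 at load time.
set(CMAKE_CUDA_ARCHITECTURES "37-virtual;50;61;70;75;80")
# Kepler is deprecated in CUDA 11.4; silence the nvcc warnings.
add_compile_options($<$<COMPILE_LANGUAGE:CUDA>:-Wno-deprecated-gpu-targets>)
```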

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Due to the GCC 10.5 limitation, newer CPU optimizations were sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7) which was dropped in
official Ollama due to legacy driver/CUDA requirements. The toolchain constraint
creates a deadlock:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:03:05 +08:00
Shang Chieh Tseng
8dc4ca7ccc Reorganize Docker build infrastructure for better maintainability
- Restructure from ollama37/ to docker/ with clear separation
- Separate builder and runtime images into dedicated directories
- Group environment scripts in builder/scripts/ subdirectory
- Add comprehensive root-level README.md (257 lines)
- Add .dockerignore files for optimized build contexts
- Enhance shell scripts with shebangs and documentation headers
- Update docker-compose.yml to build locally instead of pulling
- Add environment variables for GPU and host configuration
- Remove duplicate Dockerfile and confusing nested structure

New structure:
  docker/
  ├── README.md (comprehensive documentation)
  ├── docker-compose.yml (local build support)
  ├── builder/ (build environment: CUDA 11.4 + GCC 10 + Go 1.24)
  │   ├── Dockerfile
  │   ├── README.md
  │   ├── .dockerignore
  │   └── scripts/ (organized environment setup)
  └── runtime/ (production image)
      ├── Dockerfile
      ├── README.md
      └── .dockerignore

This reorganization eliminates confusion, removes duplication, and
provides a professional, maintainable structure for Tesla K80 builds.
2025-10-28 14:47:39 +08:00