Commit Graph

4 Commits

Author SHA1 Message Date
Shang Chieh Tseng
0293c53746 Fix Docker container to run as host user and use host .ollama directory
This change prevents permission issues when using Ollama both locally and
in Docker by:
- Running container as host user (UID/GID) instead of root
- Mounting host's $HOME/.ollama directory using environment variables
- Setting HOME environment variable in container

This allows both the local binary and Docker container to share the same
model data without permission conflicts or duplication.
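
A minimal sketch of the three settings listed above, written as `docker run` options rather than as the commit's actual compose configuration. The function name `ollama_user_opts` is hypothetical, and mounting at the same path inside the container is an assumption, not something the commit states.

```shell
#!/bin/sh
# Print docker run options that mirror the three points in this commit:
# run as the host UID/GID, mount the host's ~/.ollama, and set HOME.
# Assumption: the container looks for models under $HOME/.ollama.
ollama_user_opts() {
    uid=$(id -u)
    gid=$(id -g)
    printf '%s\n' \
        "--user=${uid}:${gid}" \
        "--volume=${HOME}/.ollama:${HOME}/.ollama" \
        "--env=HOME=${HOME}"
}

ollama_user_opts
```

The output could be passed to `docker run` ahead of an image name of your choice. Unquoted expansion of these options breaks if `$HOME` contains spaces, so a compose file that reads the same environment variables is likely the more robust form.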

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 18:00:42 +08:00
Shang Chieh Tseng
94bbfbb2e7 Add Docker-based build system with GPU-enabled builder and runtime containers
2025-11-07 12:48:05 +08:00
Shang Chieh Tseng
ef14fb5b26 Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from
the official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (the CUDA 11.4 line pairs with driver 470; compute 3.7 is deprecated throughout CUDA 11 and removed in CUDA 12)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)
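
The GCC constraint above can be checked before starting a build. This is a sketch that takes the commit's stated limit (GCC major version 10 or lower for CUDA 11.4) at face value; the function name `gcc_ok_for_cuda114` is hypothetical.

```shell
#!/bin/sh
# Succeed if the given GCC version satisfies the constraint stated in this
# commit for CUDA 11.4 host compilation (major version <= 10).
gcc_ok_for_cuda114() {
    major=${1%%.*}          # "10.5.0" -> "10"
    [ "$major" -le 10 ]
}

# Check the compiler on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if gcc_ok_for_cuda114 "$ver"; then
        echo "gcc $ver: accepted"
    else
        echo "gcc $ver: too new for this toolchain, use GCC 10"
    fi
fi
```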

### CPU Architecture Trade-offs
Because of the GCC 10.5 limitation, newer CPU optimizations are sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)
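
To see which of the features above a given machine has, a small check like the following may help. It is a sketch, not part of the fork: the function name `report_cpu_features` is hypothetical, and the flag spellings are those used by Linux in `/proc/cpuinfo` (for example `sse4_2` for SSE4.2).

```shell
#!/bin/sh
# Report, for each CPU feature named in this commit, whether it appears
# in a space-separated flags string.
report_cpu_features() {
    flags=" $1 "
    for f in sse4_2 avx f16c avx2 bmi2 fma avx_vnni; do
        case "$flags" in
            *" $f "*) echo "$f: yes" ;;
            *)        echo "$f: no" ;;
        esac
    done
}

# On x86 Linux, use the flags line of the first CPU.
# Where /proc/cpuinfo has no such line, every feature is reported as "no".
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
report_cpu_features "$cpu_flags"
```

A machine that reports `avx_vnni: yes` is one where this build leaves that instruction set unused.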

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI
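
With these changes in place, a build would presumably go through the "CUDA 11" preset mentioned earlier in this commit. The sketch below assumes that a build preset of the same name exists and that the GCC 10 binaries are installed as `gcc-10` and `g++-10`; neither is confirmed by the commit message. The helper `run` is hypothetical and only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Configure with GCC 10 as the host compiler, then build.
# Assumptions: compiler names gcc-10/g++-10, and a "CUDA 11" build preset.
run env CC=gcc-10 CXX=g++-10 cmake --preset "CUDA 11"
run cmake --build --preset "CUDA 11"
```

The printed form drops the quotes around the preset name, so copy the commands from the script rather than from its output.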

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which official Ollama
dropped because of its legacy driver and CUDA requirements. The toolchain
constraints form a dependency chain in which each choice forces the next:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:03:05 +08:00
Shang Chieh Tseng
8dc4ca7ccc Reorganize Docker build infrastructure for better maintainability
- Restructure from ollama37/ to docker/ with clear separation
- Separate builder and runtime images into dedicated directories
- Group environment scripts in builder/scripts/ subdirectory
- Add comprehensive root-level README.md (257 lines)
- Add .dockerignore files for optimized build contexts
- Enhance shell scripts with shebangs and documentation headers
- Update docker-compose.yml to build locally instead of pulling
- Add environment variables for GPU and host configuration
- Remove duplicate Dockerfile and confusing nested structure

New structure:
  docker/
  ├── README.md (comprehensive documentation)
  ├── docker-compose.yml (local build support)
  ├── builder/ (build environment: CUDA 11.4 + GCC 10 + Go 1.24)
  │   ├── Dockerfile
  │   ├── README.md
  │   ├── .dockerignore
  │   └── scripts/ (organized environment setup)
  └── runtime/ (production image)
      ├── Dockerfile
      ├── README.md
      └── .dockerignore

This reorganization removes the duplicate Dockerfile and the nested layout, and
gives the builder and runtime images separate, documented directories for Tesla K80 builds.
2025-10-28 14:47:39 +08:00