# Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from the
official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility; an equivalent configure-time invocation is sketched below
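
For reference, compute 3.7 can also be selected directly at configure time via the CMAKE_CUDA_ARCHITECTURES variable mentioned above (a minimal sketch; the canonical settings live in CMakePresets.json):

    cmake -B build -DCMAKE_CUDA_ARCHITECTURES=37-virtual
    cmake --build build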

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h); a compiler-pinning sketch follows the list
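
On a system where GCC 10 coexists with a newer default compiler, the toolchain can be pinned explicitly at configure time. A minimal sketch; binary names such as gcc-10 vary by distribution, and CMAKE_CUDA_HOST_COMPILER is the standard CMake variable for nvcc's host compiler:

    cmake -B build \
      -DCMAKE_C_COMPILER=gcc-10 \
      -DCMAKE_CXX_COMPILER=g++-10 \
      -DCMAKE_CUDA_HOST_COMPILER=gcc-10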

### CPU Architecture Trade-offs
Due to the GCC 10.5 limitation, some newer CPU optimizations are sacrificed (a quick flag check for your own CPU is sketched after this list):
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)
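
To see which of these instruction sets your own CPU exposes, the kernel's flag list can be queried (a Linux-only sketch; the kernel reports AVX-VNNI as avx_vnni):

    grep -owE 'sse4_2|avx2?|avx_vnni|f16c|bmi2|fma' /proc/cpuinfo | sort -u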

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added the -Wno-deprecated-gpu-targets flag to suppress warnings (it can also be set at configure time, as shown below)
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI
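
The same warning suppression can be applied at configure time without patching any file, via the standard CMAKE_CUDA_FLAGS variable (a sketch):

    cmake -B build -DCMAKE_CUDA_FLAGS=-Wno-deprecated-gpu-targets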

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which was dropped in
official Ollama due to its legacy driver/CUDA requirements. The toolchain
constraints form a rigid dependency chain (each link can be verified as sketched below):
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI
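
Each link in the chain can be checked on the target machine (a sketch; exact output formats vary by version):

    nvidia-smi --query-gpu=name,driver_version --format=csv   # expect Tesla K80, 470.x
    nvcc --version                                            # expect release 11.4
    gcc --version                                             # expect 10.x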

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).


Development

Install prerequisites:

  • Go
  • C/C++ compiler, e.g. Clang on macOS, TDM-GCC (Windows amd64) or llvm-mingw (Windows arm64), or GCC/Clang on Linux

Then build and run Ollama from the root directory of the repository:

go run . serve

Note

Ollama includes native code compiled with CGO. From time to time these data structures can change, and CGO can get out of sync, resulting in unexpected crashes. You can force a full build of the native code by running go clean -cache first.
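
For example, to force a clean rebuild of the native code before starting the server:

go clean -cache
go run . serve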

macOS (Apple Silicon)

macOS on Apple Silicon supports Metal, which is built into the Ollama binary. No additional steps are required.

macOS (Intel)

Install prerequisites:

  • CMake, e.g. brew install cmake

Then, configure and build the project:

cmake -B build
cmake --build build

Lastly, run Ollama:

go run . serve

Windows

Install prerequisites:

  • CMake
  • (Optional) AMD GPU support
  • (Optional) NVIDIA GPU support

Then, configure and build the project:

cmake -B build
cmake --build build --config Release

Important

Building for ROCm requires additional flags:

cmake -B build -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build build --config Release

Lastly, run Ollama:

go run . serve

Windows (ARM)

Windows ARM does not support additional acceleration libraries at this time. Do not use cmake; simply use go run or go build.

Linux

Install prerequisites:

  • CMake, e.g. sudo apt install cmake (Debian/Ubuntu) or sudo dnf install cmake (Fedora)
  • (Optional) AMD GPU support
  • (Optional) NVIDIA GPU support

Important

Ensure prerequisites are in PATH before running CMake.

Then, configure and build the project:

cmake -B build
cmake --build build
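
For a Tesla K80 build, the "CUDA 11" preset described earlier can replace the default configure step (a sketch; preset names are defined in CMakePresets.json, and a matching build preset is assumed):

cmake --preset 'CUDA 11'
cmake --build --preset 'CUDA 11'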

Lastly, run Ollama:

go run . serve

Docker

docker build .
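
Once the image is built, a typical way to run it with NVIDIA GPU access is shown below (a sketch; it assumes the NVIDIA Container Toolkit is installed, and <image> stands for the ID or tag produced by the build). Port 11434 is Ollama's default:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama <image>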

ROCm

docker build --build-arg FLAVOR=rocm .

Running tests

To run tests, use go test:

go test ./...
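
To iterate on a single package or test, the standard go test flags apply (a sketch; TestGenerate is a hypothetical name, so substitute the package and test you are working on):

go test ./server/... -run TestGenerate -v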

NOTE: In rare circumstances, you may need to change a package that uses the new "synctest" package introduced in Go 1.24.

If you do not have the "synctest" experiment enabled, any build or test failures resulting from your changes will not appear locally, but they will break CI.

If you see failures in CI, you can either keep pushing changes to see if the CI build passes, or you can enable the "synctest" package locally to see the failures before pushing.

To enable the "synctest" package for testing, run the following command:

GOEXPERIMENT=synctest go test ./...

If you wish to enable synctest for all go commands, you can set the GOEXPERIMENT environment variable in your shell profile or by using:

go env -w GOEXPERIMENT=synctest

This enables the "synctest" package for all go commands without needing to set the variable in every shell session.

The synctest package is not required for production builds.

Library detection

Ollama looks for acceleration libraries in the following paths relative to the ollama executable:

  • ./lib/ollama (Windows)
  • ../lib/ollama (Linux)
  • . (macOS)
  • build/lib/ollama (for development)

If the libraries are not found, Ollama will not run with any acceleration libraries.
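
To confirm which paths were searched and which libraries were loaded, enable debug logging (OLLAMA_DEBUG turns on verbose logs; the exact messages vary by version):

OLLAMA_DEBUG=1 go run . serve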