This commit represents a complete rework after pulling the latest changes from the official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)

- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in the "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility

- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x release supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs

Due to the GCC 10.5 limitation, newer CPU optimizations are sacrificed:

- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates

- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added the -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync

Merged the latest llama.cpp changes, including:

- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation

- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which was dropped in official Ollama due to its legacy driver/CUDA requirements. The toolchain constraint forms a locked dependency chain:

- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern LLMs on legacy but still capable Tesla K80 hardware (12 GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
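For reference, a configure invocation for this toolchain might look roughly like the following. This is a sketch only: the install paths and compiler names are assumptions, and the authoritative settings live in this repository's CMakeLists.txt and CMakePresets.json.

```shell
# Sketch: build against CUDA 11.4 with GCC 10 as the host compiler,
# targeting compute 3.7 as PTX (37-virtual). Adjust paths to your system.
export PATH=/usr/local/cuda-11.4/bin:$PATH
CC=gcc-10 CXX=g++-10 CUDAHOSTCXX=g++-10 \
  cmake -B build -DCMAKE_CUDA_ARCHITECTURES="37-virtual"
cmake --build build
```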
# Development
Install prerequisites:
- [Go](https://go.dev/doc/install)
- C/C++ Compiler e.g. Clang on macOS, [TDM-GCC](https://github.com/jmeubank/tdm-gcc/releases/latest) (Windows amd64) or [llvm-mingw](https://github.com/mstorsjo/llvm-mingw) (Windows arm64), GCC/Clang on Linux.

Then build and run Ollama from the root directory of the repository:
```shell
go run . serve
```
> [!NOTE]
> Ollama includes native code compiled with CGO. From time to time the data structures shared between Go and the native code can change, and CGO can get out of sync, resulting in unexpected crashes. You can force a full build of the native code by running `go clean -cache` first.
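> For example, to force a clean rebuild of the native code before starting the server:
>
> ```shell
> go clean -cache
> go run . serve
> ```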
## macOS (Apple Silicon)
macOS on Apple Silicon supports Metal, which is built into the Ollama binary. No additional steps are required.
## macOS (Intel)
Install prerequisites:
- [CMake](https://cmake.org/download/) or `brew install cmake`

Then, configure and build the project:
```shell
cmake -B build
cmake --build build
```
Lastly, run Ollama:
```shell
go run . serve
```
## Windows
Install prerequisites:
- [CMake](https://cmake.org/download/)
- [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) including the Native Desktop Workload
- (Optional) AMD GPU support
  - [ROCm](https://rocm.docs.amd.com/en/latest/)
  - [Ninja](https://github.com/ninja-build/ninja/releases)
- (Optional) NVIDIA GPU support
  - [CUDA SDK](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_network)

Then, configure and build the project:
```shell
cmake -B build
cmake --build build --config Release
```
> [!IMPORTANT]
> Building for ROCm requires additional flags:
> ```shell
> cmake -B build -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
> cmake --build build --config Release
> ```
Lastly, run Ollama:
```shell
go run . serve
```
## Windows (ARM)
Windows ARM does not support additional acceleration libraries at this time. Do not use CMake; simply use `go run` or `go build`.
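For example, from the repository root (the output binary name below is only illustrative):

```shell
go run . serve
# or build a standalone binary first
go build -o ollama.exe .
```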
## Linux
Install prerequisites:
- [CMake](https://cmake.org/download/) or `sudo apt install cmake` or `sudo dnf install cmake`
- (Optional) AMD GPU support
  - [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html)
- (Optional) NVIDIA GPU support
  - [CUDA SDK](https://developer.nvidia.com/cuda-downloads)

> [!IMPORTANT]
> Ensure prerequisites are in `PATH` before running CMake.
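>
> For example, on a typical installation the CUDA and ROCm tool directories can be added like this (these are common default locations and may differ on your system):
>
> ```shell
> export PATH=/usr/local/cuda/bin:/opt/rocm/bin:$PATH
> ```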
Then, configure and build the project:
```shell
cmake -B build
cmake --build build
```
Lastly, run Ollama:
```shell
go run . serve
```
## Docker
```shell
docker build .
```
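To try the resulting image, you can tag the build and run it. The tag, volume name, and container name below are illustrative; 11434 is Ollama's default port.

```shell
docker build -t ollama:dev .
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama-dev ollama:dev
```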
### ROCm
```shell
docker build --build-arg FLAVOR=rocm .
```
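Running the ROCm image additionally requires passing the AMD GPU device nodes through to the container. A typical invocation (the image tag is illustrative) looks like:

```shell
docker build --build-arg FLAVOR=rocm -t ollama:rocm .
docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 ollama:rocm
```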
## Running tests
To run tests, use `go test`:
```shell
go test ./...
```
> NOTE: In rare circumstances, you may need to change a package that uses the
> new "synctest" package in go1.24.
>
> If you do not have the "synctest" package enabled, you will not see any build
> or test failures resulting from your changes locally, but CI will break.
>
> If you see failures in CI, you can either keep pushing changes to see if the
> CI build passes, or you can enable the "synctest" package locally to see the
> failures before pushing.
>
> To enable the "synctest" package for testing, run the following command:
>
> ```shell
> GOEXPERIMENT=synctest go test ./...
> ```
>
> If you wish to enable synctest for all go commands, you can set the
> `GOEXPERIMENT` environment variable in your shell profile or by using:
>
> ```shell
> go env -w GOEXPERIMENT=synctest
> ```
>
> This enables the "synctest" package for all go commands without needing to
> set it for every shell session.
>
> The synctest package is not required for production builds.
## Library detection
Ollama looks for acceleration libraries in the following paths relative to the `ollama` executable:
* `./lib/ollama` (Windows)
* `../lib/ollama` (Linux)
* `.` (macOS)
* `build/lib/ollama` (for development)

If the libraries are not found, Ollama will not run with any acceleration libraries.
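If you built with CMake as described above, the development copies should appear under `build/lib/ollama`; you can verify they were produced with, for example:

```shell
ls build/lib/ollama
```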