mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-10 15:57:04 +00:00)
Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

This commit represents a complete rework after pulling the latest changes from the official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in the "CUDA 11" preset
- Used 37-virtual (PTX, JIT-compiled by the driver) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (newest CUDA toolkit whose minimum driver is in the 470 branch)
- **GCC Version**: 10.5.0 (required by CUDA 11.4's host_config.h)

### CPU Architecture Trade-offs
Because of the GCC 10.5 limitation, newer CPU optimizations were sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Estimated performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added the -Wno-deprecated-gpu-targets flag to suppress nvcc's deprecated-architecture warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged the latest llama.cpp changes, including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which official Ollama dropped because of its legacy driver/CUDA requirements.

The toolchain constraints form a chain in which each choice forces the next:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations in order to run modern LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
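The dependency chain above pins the nvcc host compiler to GCC 10. Below is a minimal pre-build guard in Go, offered as a sketch rather than as part of this commit. It assumes the compiler path is passed as the first argument (defaulting to `gcc` on `PATH`; the removed CI workflow used `/usr/local/bin/gcc`). It also assumes that `gcc -dumpfullversion` prints a dotted version such as `10.5.0`. The GCC 10 ceiling is taken from this commit message. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// maxHostGCCMajor is the newest GCC major version this sketch accepts,
// taken from the commit message's "CUDA 11.4 -> GCC 10" constraint.
const maxHostGCCMajor = 10

// gccMajor extracts the major component from a dotted version string
// such as "10.5.0" (hypothetical helper).
func gccMajor(version string) (int, error) {
	major, _, _ := strings.Cut(strings.TrimSpace(version), ".")
	return strconv.Atoi(major)
}

// hostCompilerOK reports whether the given GCC version is usable as the
// nvcc host compiler under the constraint above.
func hostCompilerOK(version string) (bool, error) {
	major, err := gccMajor(version)
	if err != nil {
		return false, fmt.Errorf("parse gcc version %q: %w", version, err)
	}
	return major <= maxHostGCCMajor, nil
}

func main() {
	gcc := "gcc"
	if len(os.Args) > 1 {
		gcc = os.Args[1] // e.g. /usr/local/bin/gcc
	}
	out, err := exec.Command(gcc, "-dumpfullversion").Output()
	if err != nil {
		fmt.Fprintf(os.Stderr, "cannot run %s: %v\n", gcc, err)
		os.Exit(2)
	}
	version := strings.TrimSpace(string(out))
	ok, err := hostCompilerOK(version)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if !ok {
		fmt.Fprintf(os.Stderr, "%s is GCC %s; this build expects GCC %d or older\n", gcc, version, maxHostGCCMajor)
		os.Exit(1)
	}
	fmt.Println("host compiler OK:", version)
}
```

Run it with something like `go run gcccheck.go /usr/local/bin/gcc` before configuring CMake. A non-zero exit stops the build early instead of letting nvcc fail later inside `host_config.h`.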
68 .github/ISSUE_TEMPLATE/10_bug_report.yml (vendored, normal file)
@@ -0,0 +1,68 @@
name: Bug report
labels: [bug]
description: Something isn't working right.
body:
  - type: textarea
    id: description
    attributes:
      label: What is the issue?
      description: What happened? What did you expect to happen?
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: Please copy and paste any relevant log output. See [Troubleshooting Guide](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) for details.
      render: shell
    validations:
      required: false
  - type: dropdown
    id: os
    attributes:
      label: OS
      description: Which operating system are you using?
      multiple: true
      options:
        - Linux
        - macOS
        - Windows
        - Docker
        - WSL2
    validations:
      required: false
  - type: dropdown
    id: gpu
    attributes:
      label: GPU
      description: Which GPU are you using?
      multiple: true
      options:
        - Nvidia
        - AMD
        - Intel
        - Apple
        - Other
    validations:
      required: false
  - type: dropdown
    id: cpu
    attributes:
      label: CPU
      description: Which CPU are you using?
      multiple: true
      options:
        - Intel
        - AMD
        - Apple
        - Other
    validations:
      required: false
  - type: input
    id: version
    attributes:
      label: Ollama version
      description: What version of Ollama are you using? (`ollama --version`)
      placeholder: e.g., 0.1.32
    validations:
      required: false
6 .github/ISSUE_TEMPLATE/20_feature_request.md (vendored, normal file)
@@ -0,0 +1,6 @@
---
name: Feature request
about: Request a new feature
labels: feature request
---
5 .github/ISSUE_TEMPLATE/30_model_request.md (vendored, normal file)
@@ -0,0 +1,5 @@
---
name: Model request
about: Request support for a new model to be added to Ollama
labels: model request
---
8 .github/ISSUE_TEMPLATE/config.yml (vendored, normal file)
@@ -0,0 +1,8 @@
blank_issues_enabled: true
contact_links:
  - name: Help
    url: https://discord.com/invite/ollama
    about: Please join our Discord server for help using Ollama
  - name: Troubleshooting
    url: https://github.com/ollama/ollama/blob/main/docs/faq.md#faq
    about: See the FAQ for common issues and solutions
266 .github/workflows/CLAUDE.md (vendored)
@@ -1,266 +0,0 @@
# GitHub Actions Workflows - Tesla K80 Testing

## Overview

This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.

## Workflows

### 1. tesla-k80-ci.yml - Build Workflow
**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Build the ollama binary with CUDA 3.7 support

**Steps**:
1. Checkout code
2. Clean previous build artifacts
3. Configure CMake with GCC 10 and CUDA 11
4. Build C++/CUDA components
5. Build Go binary
6. Verify binary
7. Upload binary artifact

**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit

### 2. tesla-k80-tests.yml - Test Workflow
**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Run comprehensive tests using the test framework

**Steps**:
1. Checkout code
2. Verify ollama binary exists
3. Run test-runner tool (see below)
4. Upload test results and logs

**Artifacts**: Test reports, logs, analysis results

## Test Framework Architecture

### TODO: Implement Go-based Test Runner

**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.

#### Task Breakdown

1. **Design test configuration format**
   - Create `test/config/models.yaml` - List of models to test with parameters
   - Define model test spec: name, size, expected behavior, test prompts
   - Support test profiles: quick (small models), full (all sizes), stress test

2. **Implement server lifecycle management**
   - Start `./ollama serve` as subprocess
   - Capture stdout/stderr to log file
   - Monitor server readiness (health check API)
   - Graceful shutdown on test completion or failure
   - Timeout handling for hung processes

3. **Implement real-time log monitoring**
   - Goroutine to tail server logs
   - Pattern matching for critical events:
     - GPU initialization messages
     - Model loading progress
     - CUDA errors or warnings
     - Memory allocation failures
     - CPU fallback warnings
   - Store events for later analysis

4. **Implement model testing logic**
   - For each model in config:
     - Pull model via API (if not cached)
     - Wait for model ready
     - Parse logs for GPU loading confirmation
     - Send chat API request with test prompt
     - Validate response (not empty, reasonable length, coherent)
     - Check logs for errors during inference
     - Record timing metrics (load time, first token, completion)

5. **Implement test validation**
   - GPU loading verification:
     - Parse logs for "loaded model" + GPU device ID
     - Check for "offloading N layers to GPU"
     - Verify no "using CPU backend" messages
   - Response quality checks:
     - Response not empty
     - Minimum token count (avoid truncated responses)
     - JSON structure valid (for API responses)
   - Error detection:
     - No CUDA errors in logs
     - No OOM (out of memory) errors
     - No model loading failures

6. **Implement structured reporting**
   - Generate JSON report with:
     - Test summary (pass/fail/skip counts)
     - Per-model results (status, timings, errors)
     - Log excerpts for failures
     - GPU metrics (memory usage, utilization)
   - Generate human-readable summary (markdown/text)
   - Exit code: 0 for all pass, 1 for any failure

7. **Implement CLI interface**
   - Flags:
     - `--config` - Path to test config file
     - `--profile` - Test profile to run (quick/full/stress)
     - `--ollama-bin` - Path to ollama binary (default: ./ollama)
     - `--output` - Report output path
     - `--verbose` - Detailed logging
     - `--keep-models` - Don't delete models after test
   - Subcommands:
     - `run` - Run tests
     - `validate` - Validate config only
     - `list` - List available test profiles/models

8. **Update GitHub Actions workflow**
   - Build test-runner binary in CI workflow
   - Run test-runner in test workflow
   - Parse JSON report for pass/fail
   - Upload structured results as artifacts

#### File Structure

```
cmd/test-runner/
  main.go           # CLI entry point
  config.go         # Config loading and validation
  server.go         # Server lifecycle management
  monitor.go        # Log monitoring and parsing
  test.go           # Model test execution
  validate.go       # Response and log validation
  report.go         # Test report generation

test/config/
  models.yaml       # Default test configuration
  quick.yaml        # Quick test profile (small models)
  full.yaml         # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```

#### Example Test Configuration (models.yaml)

```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s

  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s

      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s

      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```
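The `check_patterns` block above lists strings that read like regular expressions (`offload.*GPU`, `CUDA.*error`). Below is a minimal Go sketch of how a runner might apply them to server log lines. It assumes the patterns are compiled with the standard `regexp` package and that failure patterns take precedence when a line matches both lists. The type and function names are hypothetical, and the sample log lines are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Verdict is the outcome of matching one log line against the patterns.
type Verdict int

const (
	Neutral Verdict = iota // matched nothing
	Success                // matched a success pattern
	Failure                // matched a failure pattern
)

func (v Verdict) String() string {
	switch v {
	case Success:
		return "success"
	case Failure:
		return "failure"
	default:
		return "neutral"
	}
}

// Patterns holds compiled success and failure regular expressions.
type Patterns struct {
	success, failure []*regexp.Regexp
}

func compileAll(exprs []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("bad pattern %q: %w", e, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// NewPatterns compiles the success and failure lists from the config.
func NewPatterns(success, failure []string) (*Patterns, error) {
	s, err := compileAll(success)
	if err != nil {
		return nil, err
	}
	f, err := compileAll(failure)
	if err != nil {
		return nil, err
	}
	return &Patterns{success: s, failure: f}, nil
}

// Classify checks failure patterns first, so a line matching both lists fails.
func (p *Patterns) Classify(line string) Verdict {
	for _, re := range p.failure {
		if re.MatchString(line) {
			return Failure
		}
	}
	for _, re := range p.success {
		if re.MatchString(line) {
			return Success
		}
	}
	return Neutral
}

func main() {
	p, err := NewPatterns(
		[]string{"loaded model", "offload.*GPU"},
		[]string{"CUDA.*error", "out of memory", "CPU backend"},
	)
	if err != nil {
		panic(err)
	}
	// Made-up sample lines, not real Ollama log output.
	sample := "offloading 33 layers to GPU\nCUDA error: out of memory\nlistening on 127.0.0.1:11434\n"
	sc := bufio.NewScanner(strings.NewReader(sample))
	for sc.Scan() {
		fmt.Printf("%-8v %s\n", p.Classify(sc.Text()), sc.Text())
	}
}
```

Checking failures first matters for a line such as `CUDA error: out of memory`. That line should never count as a success, even if a broader success pattern also matched it.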

#### Example Test Runner Usage

```bash
# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list
```
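Item 7 of the task breakdown and the usage examples above describe a subcommand-first CLI. Below is a minimal sketch built on Go's standard `flag` package, assuming one `FlagSet` per invocation. The defaults for `--config` and `--ollama-bin` come from this document; the `quick` default for `--profile` is an assumption. Go's `flag` package treats `-flag` and `--flag` the same, so the double-dash spellings used above parse as written.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the flags listed in item 7 of the task breakdown.
type options struct {
	config     string
	profile    string
	ollamaBin  string
	output     string
	verbose    bool
	keepModels bool
}

// parseArgs splits "<subcommand> [flags]" and parses the flags with a fresh
// FlagSet, so it can be unit-tested without touching os.Args.
func parseArgs(args []string) (string, options, error) {
	var opts options
	if len(args) == 0 {
		return "", opts, fmt.Errorf("usage: test-runner <run|validate|list> [flags]")
	}
	cmd := args[0]
	switch cmd {
	case "run", "validate", "list":
	default:
		return "", opts, fmt.Errorf("unknown subcommand %q", cmd)
	}
	fs := flag.NewFlagSet(cmd, flag.ContinueOnError)
	fs.StringVar(&opts.config, "config", "test/config/models.yaml", "path to test config file")
	fs.StringVar(&opts.profile, "profile", "quick", "test profile to run (quick/full/stress)")
	fs.StringVar(&opts.ollamaBin, "ollama-bin", "./ollama", "path to ollama binary")
	fs.StringVar(&opts.output, "output", "", "report output path")
	fs.BoolVar(&opts.verbose, "verbose", false, "detailed logging")
	fs.BoolVar(&opts.keepModels, "keep-models", false, "don't delete models after test")
	if err := fs.Parse(args[1:]); err != nil {
		return "", opts, err
	}
	return cmd, opts, nil
}

func main() {
	cmd, opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%s: %+v\n", cmd, opts)
}
```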

#### Integration with GitHub Actions

```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```

## Prerequisites

### Self-Hosted Runner Setup

1. **Install GitHub Actions Runner on your Tesla K80 machine**:
   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```

2. **Verify runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models

## Security Considerations

- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep runner software updated
24 .github/workflows/latest.yaml (vendored, normal file)
@@ -0,0 +1,24 @@
name: latest

on:
  release:
    types: [released]

jobs:
  update-latest:
    environment: release
    runs-on: linux
    steps:
      - uses: actions/checkout@v4
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ vars.DOCKER_USER }}
          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
      - name: Tag images as latest
        env:
          PUSH: "1"
        shell: bash
        run: |
          export "VERSION=${GITHUB_REF_NAME#v}"
          ./scripts/tag_latest.sh
431
.github/workflows/release.yaml
vendored
Normal file
431
.github/workflows/release.yaml
vendored
Normal file
@@ -0,0 +1,431 @@
|
||||
name: release
|
||||
|
||||
on:
|
||||
push:
|
||||
tags:
|
||||
- 'v*'
|
||||
|
||||
env:
|
||||
CGO_CFLAGS: '-O3'
|
||||
CGO_CXXFLAGS: '-O3'
|
||||
|
||||
jobs:
|
||||
setup-environment:
|
||||
runs-on: ubuntu-latest
|
||||
environment: release
|
||||
outputs:
|
||||
GOFLAGS: ${{ steps.goflags.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Set environment
|
||||
id: goflags
|
||||
run: |
|
||||
echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${GITHUB_REF_NAME#v}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_OUTPUT
|
||||
|
||||
darwin-build:
|
||||
runs-on: macos-13-xlarge
|
||||
environment: release
|
||||
needs: setup-environment
|
||||
strategy:
|
||||
matrix:
|
||||
os: [darwin]
|
||||
arch: [amd64, arm64]
|
||||
env:
|
||||
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version-file: go.mod
|
||||
- run: |
|
||||
go build -o dist/ .
|
||||
env:
|
||||
GOOS: ${{ matrix.os }}
|
||||
GOARCH: ${{ matrix.arch }}
|
||||
CGO_ENABLED: 1
|
||||
CGO_CPPFLAGS: '-mmacosx-version-min=11.3'
|
||||
- if: matrix.arch == 'amd64'
|
||||
run: |
|
||||
cmake --preset CPU -DCMAKE_OSX_DEPLOYMENT_TARGET=11.3 -DCMAKE_SYSTEM_PROCESSOR=x86_64 -DCMAKE_OSX_ARCHITECTURES=x86_64
|
||||
cmake --build --parallel --preset CPU
|
||||
cmake --install build --component CPU --strip --parallel 8
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: build-${{ matrix.os }}-${{ matrix.arch }}
|
||||
path: dist/*
|
||||
|
||||
windows-depends:
|
||||
strategy:
|
||||
matrix:
|
||||
os: [windows]
|
||||
arch: [amd64]
|
||||
preset: ['CPU']
|
||||
include:
|
||||
- os: windows
|
||||
arch: amd64
|
||||
preset: 'CUDA 12'
|
||||
install: https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_571.96_windows.exe
|
||||
cuda-components:
|
||||
- '"cudart"'
|
||||
- '"nvcc"'
|
||||
- '"cublas"'
|
||||
- '"cublas_dev"'
|
||||
cuda-version: '12.8'
|
||||
flags: ''
|
||||
runner_dir: 'cuda_v12'
|
||||
- os: windows
|
||||
arch: amd64
|
||||
preset: 'CUDA 13'
|
||||
install: https://developer.download.nvidia.com/compute/cuda/13.0.0/local_installers/cuda_13.0.0_windows.exe
|
||||
cuda-components:
|
||||
- '"cudart"'
|
||||
- '"nvcc"'
|
||||
- '"cublas"'
|
||||
- '"cublas_dev"'
|
||||
- '"crt"'
|
||||
- '"nvvm"'
|
||||
- '"nvptxcompiler"'
|
||||
cuda-version: '13.0'
|
||||
flags: ''
|
||||
runner_dir: 'cuda_v13'
|
||||
- os: windows
|
||||
arch: amd64
|
||||
preset: 'ROCm 6'
|
||||
install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
|
||||
rocm-version: '6.2'
|
||||
flags: '-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma" -DCMAKE_CXX_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma"'
|
||||
runner_dir: 'rocm'
|
||||
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
|
||||
environment: release
|
||||
env:
|
||||
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- name: Install system dependencies
|
||||
run: |
|
||||
choco install -y --no-progress ccache ninja
|
||||
ccache -o cache_dir=${{ github.workspace }}\.ccache
|
||||
- if: startsWith(matrix.preset, 'CUDA ') || startsWith(matrix.preset, 'ROCm ')
|
||||
id: cache-install
|
||||
uses: actions/cache/restore@v4
|
||||
with:
|
||||
path: |
|
||||
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
|
||||
C:\Program Files\AMD\ROCm
|
||||
key: ${{ matrix.install }}
|
||||
- if: startsWith(matrix.preset, 'CUDA ')
|
||||
name: Install CUDA ${{ matrix.cuda-version }}
|
||||
run: |
|
||||
$ErrorActionPreference = "Stop"
|
||||
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
|
||||
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
|
||||
$subpackages = @(${{ join(matrix.cuda-components, ', ') }}) | Foreach-Object {"${_}_${{ matrix.cuda-version }}"}
|
||||
Start-Process -FilePath .\install.exe -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
|
||||
}
|
||||
|
||||
$cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
|
||||
echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
|
||||
- if: startsWith(matrix.preset, 'ROCm')
|
||||
name: Install ROCm ${{ matrix.rocm-version }}
|
||||
run: |
|
||||
$ErrorActionPreference = "Stop"
|
||||
if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
|
||||
Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
|
||||
Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
|
||||
}
|
||||
|
||||
$hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
|
||||
echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
|
||||
echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
echo "HIPCXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
echo "HIP_PLATFORM=amd" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
echo "CMAKE_PREFIX_PATH=$hipPath" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
- if: matrix.preset == 'CPU'
|
||||
run: |
|
||||
echo "CC=clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
echo "CXX=clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
- if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
|
||||
uses: actions/cache/save@v4
|
||||
with:
|
||||
path: |
|
||||
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
|
||||
C:\Program Files\AMD\ROCm
|
||||
key: ${{ matrix.install }}
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/cache@v4
|
||||
with:
|
||||
path: ${{ github.workspace }}\.ccache
|
||||
key: ccache-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
|
||||
- name: Build target "${{ matrix.preset }}"
|
||||
run: |
|
||||
Import-Module 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
|
||||
Enter-VsDevShell -VsInstallPath 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise' -SkipAutomaticLocation -DevCmdArguments '-arch=x64 -no_logo'
|
||||
cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }} -DOLLAMA_RUNNER_DIR="${{ matrix.runner_dir }}"
|
||||
cmake --build --parallel --preset "${{ matrix.preset }}"
|
||||
cmake --install build --component "${{ startsWith(matrix.preset, 'CUDA ') && 'CUDA' || startsWith(matrix.preset, 'ROCm ') && 'HIP' || 'CPU' }}" --strip --parallel 8
|
||||
Remove-Item -Path dist\lib\ollama\rocm\rocblas\library\*gfx906* -ErrorAction SilentlyContinue
|
||||
env:
|
||||
CMAKE_GENERATOR: Ninja
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: depends-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
|
||||
path: dist\*
|
||||
|
||||
windows-build:
|
||||
strategy:
|
||||
matrix:
|
||||
os: [windows]
|
||||
arch: [amd64, arm64]
|
||||
include:
|
||||
- os: windows
|
||||
arch: amd64
|
||||
llvmarch: x86_64
|
||||
- os: windows
|
||||
arch: arm64
|
||||
llvmarch: aarch64
|
||||
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
|
||||
environment: release
|
||||
needs: [setup-environment]
|
||||
env:
|
||||
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- name: Install ARM64 system dependencies
|
||||
if: matrix.arch == 'arm64'
|
||||
run: |
|
||||
$ErrorActionPreference = "Stop"
|
||||
Set-ExecutionPolicy Bypass -Scope Process -Force
|
||||
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
|
||||
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
|
||||
echo "C:\ProgramData\chocolatey\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
|
||||
|
||||
choco install -y --no-progress git gzip
|
||||
echo "C:\Program Files\Git\cmd" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
|
||||
- name: Install clang and gcc-compat
|
||||
run: |
|
||||
$ErrorActionPreference = "Stop"
|
||||
Set-ExecutionPolicy Bypass -Scope Process -Force
|
||||
Invoke-WebRequest -Uri "https://github.com/mstorsjo/llvm-mingw/releases/download/20240619/llvm-mingw-20240619-ucrt-${{ matrix.llvmarch }}.zip" -OutFile "${{ runner.temp }}\llvm-mingw-ucrt.zip"
|
||||
Expand-Archive -Path ${{ runner.temp }}\llvm-mingw-ucrt.zip -DestinationPath "C:\Program Files\"
|
||||
$installPath=(Resolve-Path -Path "C:\Program Files\llvm-mingw-*-ucrt*").path
|
||||
echo "$installPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version-file: go.mod
|
||||
- name: Verify gcc is actually clang
|
||||
run: |
|
||||
$ErrorActionPreference='Continue'
|
||||
$version=& gcc -v 2>&1
|
||||
$version=$version -join "`n"
|
||||
echo "gcc is $version"
|
||||
if ($version -notmatch 'clang') {
|
||||
echo "ERROR: GCC must be clang for proper utf16 handling"
|
||||
exit 1
|
||||
}
|
||||
$ErrorActionPreference='Stop'
|
||||
- run: |
|
||||
go build -o dist/${{ matrix.os }}-${{ matrix.arch }}/ .
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: build-${{ matrix.os }}-${{ matrix.arch }}
|
||||
path: |
|
||||
dist\${{ matrix.os }}-${{ matrix.arch }}\*.exe
|
||||
|
||||
linux-build:
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- os: linux
|
||||
arch: amd64
|
||||
target: archive_novulkan
|
||||
- os: linux
|
||||
arch: amd64
|
||||
target: rocm
|
||||
- os: linux
|
||||
arch: arm64
|
||||
target: archive_novulkan
|
||||
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
|
||||
environment: release
|
||||
needs: setup-environment
|
||||
env:
|
||||
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: docker/setup-buildx-action@v3
|
||||
- uses: docker/build-push-action@v6
|
||||
with:
|
||||
context: .
|
||||
platforms: ${{ matrix.os }}/${{ matrix.arch }}
|
||||
target: ${{ matrix.target }}
|
||||
build-args: |
|
||||
GOFLAGS=${{ env.GOFLAGS }}
|
||||
CGO_CFLAGS=${{ env.CGO_CFLAGS }}
|
||||
CGO_CXXFLAGS=${{ env.CGO_CXXFLAGS }}
|
||||
outputs: type=local,dest=dist/${{ matrix.os }}-${{ matrix.arch }}
|
||||
cache-from: type=registry,ref=${{ vars.DOCKER_REPO }}:latest
|
||||
cache-to: type=inline
|
||||
- run: |
|
||||
for COMPONENT in bin/* lib/ollama/*; do
|
||||
case "$COMPONENT" in
|
||||
bin/ollama) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
|
||||
lib/ollama/*.so*) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
|
||||
lib/ollama/cuda_v*) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
|
||||
lib/ollama/cuda_jetpack5) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack5.tar.in ;;
|
||||
lib/ollama/cuda_jetpack6) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack6.tar.in ;;
|
||||
lib/ollama/rocm) echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-rocm.tar.in ;;
|
||||
esac
|
||||
done
|
||||
working-directory: dist/${{ matrix.os }}-${{ matrix.arch }}
|
||||
- run: |
|
||||
echo "Manifests"
|
||||
for ARCHIVE in dist/${{ matrix.os }}-${{ matrix.arch }}/*.tar.in ; do
|
||||
echo $ARCHIVE
|
||||
cat $ARCHIVE
|
||||
done
|
||||
- run: |
|
||||
for ARCHIVE in dist/${{ matrix.os }}-${{ matrix.arch }}/*.tar.in; do
|
||||
tar c -C dist/${{ matrix.os }}-${{ matrix.arch }} -T $ARCHIVE --owner 0 --group 0 | pigz -9vc >$(basename ${ARCHIVE//.*/}.tgz);
|
||||
done
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: dist-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.target }}
|
||||
path: |
|
||||
*.tgz
|
||||
|
||||
# Build each Docker variant (OS, arch, and flavor) separately. Using QEMU is unreliable and slower.
|
||||
docker-build-push:
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- os: linux
|
||||
arch: arm64
|
||||
target: novulkan
|
||||
build-args: |
|
||||
CGO_CFLAGS
|
||||
CGO_CXXFLAGS
|
||||
GOFLAGS
|
||||
- os: linux
|
||||
arch: amd64
|
||||
target: novulkan
|
||||
build-args: |
|
||||
CGO_CFLAGS
|
||||
CGO_CXXFLAGS
|
||||
GOFLAGS
|
||||
- os: linux
|
||||
arch: amd64
|
||||
suffix: '-rocm'
|
||||
build-args: |
|
||||
CGO_CFLAGS
|
||||
CGO_CXXFLAGS
|
||||
GOFLAGS
|
||||
FLAVOR=rocm
|
||||
- os: linux
|
||||
arch: amd64
|
||||
suffix: '-vulkan'
|
||||
target: default
|
||||
build-args: |
|
||||
CGO_CFLAGS
|
||||
CGO_CXXFLAGS
|
||||
GOFLAGS
|
||||
runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
|
||||
environment: release
|
||||
needs: setup-environment
|
||||
env:
|
||||
GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: docker/setup-buildx-action@v3
|
||||
- uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ vars.DOCKER_USER }}
|
||||
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
|
||||
- id: build-push
|
||||
uses: docker/build-push-action@v6
|
||||
with:
|
||||
context: .
|
||||
platforms: ${{ matrix.os }}/${{ matrix.arch }}
|
||||
target: ${{ matrix.target }}
|
||||
build-args: ${{ matrix.build-args }}
|
||||
outputs: type=image,name=${{ vars.DOCKER_REPO }},push-by-digest=true,name-canonical=true,push=true
|
||||
cache-from: type=registry,ref=${{ vars.DOCKER_REPO }}:latest
|
||||
cache-to: type=inline
|
||||
- run: |
|
||||
mkdir -p ${{ matrix.os }}-${{ matrix.arch }}
|
||||
echo "${{ steps.build-push.outputs.digest }}" >${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
|
||||
working-directory: ${{ runner.temp }}
|
||||
- uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: digest-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}
|
||||
path: |
|
||||
${{ runner.temp }}/${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
|
||||
|
||||
# Merge Docker images for the same flavor into a single multi-arch manifest
|
||||
docker-merge-push:
|
||||
strategy:
|
||||
matrix:
|
||||
suffix: ['', '-rocm']
|
||||
runs-on: linux
|
||||
environment: release
|
||||
needs: [docker-build-push]
|
||||
steps:
|
||||
- uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ vars.DOCKER_USER }}
|
||||
password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
|
||||
- id: metadata
|
||||
uses: docker/metadata-action@v4
|
||||
with:
|
||||
flavor: |
|
||||
latest=false
|
||||
suffix=${{ matrix.suffix }}
|
||||
images: |
|
||||
            ${{ vars.DOCKER_REPO }}
          tags: |
            type=ref,enable=true,priority=600,prefix=pr-,event=pr
            type=semver,pattern={{version}}
      - uses: actions/download-artifact@v4
        with:
          pattern: digest-*
          path: ${{ runner.temp }}
          merge-multiple: true
      - run: |
          docker buildx imagetools create $(echo '${{ steps.metadata.outputs.json }}' | jq -cr '.tags | map("-t", .) | join(" ")') $(cat *-${{ matrix.suffix }}.txt | xargs printf '${{ vars.DOCKER_REPO }}@%s ')
          docker buildx imagetools inspect ${{ vars.DOCKER_REPO }}:${{ steps.metadata.outputs.version }}
        working-directory: ${{ runner.temp }}

  # Trigger downstream release process
  trigger:
    runs-on: ubuntu-latest
    environment: release
    needs: [darwin-build, windows-build, windows-depends, linux-build]
    permissions:
      contents: write
    env:
      GH_TOKEN: ${{ github.token }}
    steps:
      - uses: actions/checkout@v4
      - name: Create or update Release for tag
        run: |
          RELEASE_VERSION="$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)"
          echo "Looking for existing release for ${RELEASE_VERSION}"
          OLD_TAG=$(gh release ls --json name,tagName | jq -r ".[] | select(.name == \"${RELEASE_VERSION}\") | .tagName")
          if [ -n "$OLD_TAG" ]; then
            echo "Updating release ${RELEASE_VERSION} to point to new tag ${GITHUB_REF_NAME}"
            gh release edit ${OLD_TAG} --tag ${GITHUB_REF_NAME}
          else
            echo "Creating new release ${RELEASE_VERSION} pointing to tag ${GITHUB_REF_NAME}"
            gh release create ${GITHUB_REF_NAME} \
              --title ${RELEASE_VERSION} \
              --draft \
              --generate-notes \
              --prerelease
          fi
      - name: Trigger downstream release process
        run: |
          curl -L \
            -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer ${{ secrets.RELEASE_TOKEN }}" \
            -H "X-GitHub-Api-Version: 2022-11-28" \
            https://api.github.com/repos/ollama/${{ vars.RELEASE_REPO }}/dispatches \
            -d "{\"event_type\": \"trigger-workflow\", \"client_payload\": {\"run_id\": \"${GITHUB_RUN_ID}\", \"version\": \"${GITHUB_REF_NAME#v}\", \"origin\": \"${GITHUB_REPOSITORY}\", \"publish\": \"1\"}}"
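The final `run:` step of the image-merge job above builds its `docker buildx imagetools create` command from two shell pipelines. The jq filter `.tags | map("-t", .) | join(" ")` turns the metadata-action tag list into repeated `-t TAG` options. The `xargs printf '${{ vars.DOCKER_REPO }}@%s '` pipeline turns each digest read from the `*-<suffix>.txt` files into a `REPO@DIGEST` source reference. The Python sketch below reproduces that assembly as an argument list. It assumes the metadata JSON exposes a `tags` array, as the jq filter implies, and that each digest file holds one digest per line. The helper name and the sample repository, tags, and digests are hypothetical.

```python
import json
import shlex


def imagetools_create_argv(metadata_json, digests, repo):
    """Build the argv for `docker buildx imagetools create` (hypothetical helper).

    metadata_json: JSON text with a "tags" list, like steps.metadata.outputs.json
    digests:       digest strings, one per line of the *-<suffix>.txt files
    repo:          image repository, standing in for vars.DOCKER_REPO
    """
    argv = ["docker", "buildx", "imagetools", "create"]
    # jq: map("-t", .) emits "-t" and then the tag for every element of .tags
    for tag in json.loads(metadata_json)["tags"]:
        argv += ["-t", tag]
    # xargs printf 'REPO@%s ': one REPO@DIGEST source per non-empty digest
    argv += [f"{repo}@{d.strip()}" for d in digests if d.strip()]
    return argv


# Illustrative values only; real tags and digests come from the workflow run.
meta = json.dumps({"tags": ["example/ollama:0.1.0", "example/ollama:pr-42"]})
print(shlex.join(imagetools_create_argv(meta, ["sha256:aaa\n", "sha256:bbb"], "example/ollama")))
```

Building a list avoids relying on the unquoted `$(...)` word splitting that the shell version depends on. That splitting is only safe in the workflow because tags and digests never contain spaces.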
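The `trigger` job derives two strings from the pushed tag. First, `cut -f1 -d-` keeps everything before the first hyphen, so a prerelease tag such as `v0.12.3-rc0` and the final `v0.12.3` share one release name, and the existing release is simply re-pointed to the newer tag. Second, `${GITHUB_REF_NAME#v}` drops a single leading `v` before the version goes into the `repository_dispatch` payload. The following is a minimal Python sketch of those transformations; the function names and sample tags are hypothetical.

```python
import json


def release_version(ref_name):
    """Same result as: echo "$GITHUB_REF_NAME" | cut -f1 -d-"""
    # cut prints the whole line when the delimiter is absent, like split()[0]
    return ref_name.split("-", 1)[0]


def dispatch_version(ref_name):
    """Same result as the shell expansion ${GITHUB_REF_NAME#v}."""
    # '#v' strips one leading 'v' if present and leaves other values unchanged
    return ref_name[1:] if ref_name.startswith("v") else ref_name


def dispatch_payload(ref_name, run_id, repository):
    """JSON body that the trigger step POSTs to .../dispatches."""
    return json.dumps({
        "event_type": "trigger-workflow",
        "client_payload": {
            "run_id": run_id,
            "version": dispatch_version(ref_name),
            "origin": repository,
            "publish": "1",
        },
    })


for tag in ["v0.12.3-rc0", "v0.12.3"]:
    print(tag, "->", release_version(tag), dispatch_version(tag))
```

`json.dumps` handles escaping that the workflow's hand-written `-d "{\"event_type\": ...}"` string does not. That is harmless in practice, because tag names contain no quotes.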
.github/workflows/tesla-k80-ci.yml (vendored, 53 lines deleted)
@@ -1,53 +0,0 @@
name: Tesla K80 Build

on:
  workflow_dispatch: # Manual trigger only

jobs:
  build:
    runs-on: self-hosted

    # Use specific labels if you want to target a particular self-hosted runner
    # runs-on: [self-hosted, linux, cuda, tesla-k80]

    timeout-minutes: 60 # Prevent hung jobs

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for accurate versioning

      - name: Clean previous build
        run: |
          rm -rf build
          rm -f ollama

      - name: Configure CMake
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
        env:
          CMAKE_BUILD_TYPE: Release

      - name: Build C++/CUDA components
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build -j$(nproc)
        timeout-minutes: 30

      - name: Build Go binary
        run: |
          go build -v -o ollama .

      - name: Verify binary was created
        run: |
          ls -lh ollama
          ./ollama --version

      - name: Upload ollama binary and libraries as artifact
        uses: actions/upload-artifact@v4
        with:
          name: ollama-binary
          path: |
            ollama
            build/lib/ollama/
          retention-days: 7
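The deleted `tesla-k80-ci.yml` workflow built in a fixed order. It wiped `build/` and the old binary, configured CMake with the compilers at `/usr/local/bin`, built the C++/CUDA libraries in parallel, built the Go binary, and then smoke-tested it with `--version`. The Python sketch below replays that sequence on a local machine. By default it performs a dry run and only prints the commands. It assumes the same compiler paths as the workflow, and the helper names are hypothetical. One deliberate difference: it passes `-DCMAKE_BUILD_TYPE=Release` explicitly, whereas the workflow relied on the `CMAKE_BUILD_TYPE` environment variable, which CMake honors only from version 3.22 onward.

```python
import os
import subprocess

# Compiler paths copied from the workflow; adjust for your machine.
CC = "/usr/local/bin/gcc"
CXX = "/usr/local/bin/g++"


def k80_build_plan(jobs=None):
    """Ordered commands equivalent to the deleted tesla-k80-ci.yml steps."""
    jobs = jobs or os.cpu_count() or 1  # stands in for $(nproc)
    return [
        ["rm", "-rf", "build"],
        ["rm", "-f", "ollama"],
        ["cmake", "-B", "build", "-DCMAKE_BUILD_TYPE=Release"],
        ["cmake", "--build", "build", f"-j{jobs}"],
        ["go", "build", "-v", "-o", "ollama", "."],
        ["ls", "-lh", "ollama"],
        ["./ollama", "--version"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    # CC/CXX only matter at the first configure: CMake caches the compiler
    # it picks, which is why the clean step removes build/ beforehand.
    env = dict(os.environ, CC=CC, CXX=CXX)
    for argv in plan:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)


run_plan(k80_build_plan())
```

Since the compiler choice is fixed at configure time, setting `CC`/`CXX` again on the workflow's build step has no effect. It is harmless, though.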
.github/workflows/tesla-k80-multi-gpu-tests.yml (vendored, 86 lines deleted)
@@ -1,86 +0,0 @@
name: Tesla K80 Multi-GPU Tests

on:
  workflow_dispatch: # Manual trigger only
  schedule:
    # Run weekly on Sundays at 2 AM UTC (less frequent than single-GPU tests)
    - cron: "0 2 * * 0"

jobs:
  multi-gpu-test:
    runs-on: self-hosted

    timeout-minutes: 90 # Longer timeout for large models

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Download ollama binary from latest build
        uses: dawidd6/action-download-artifact@v6
        with:
          workflow: tesla-k80-ci.yml
          name: ollama-binary
          github_token: ${{ secrets.GITHUB_TOKEN }}
          check_artifacts: true
          search_artifacts: true

      - name: Make ollama binary executable
        run: |
          chmod +x ollama
          ls -lh ollama
          ./ollama --version

      - name: Verify multi-GPU setup
        run: |
          nvidia-smi --list-gpus
          GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
          if [ "$GPU_COUNT" -lt 2 ]; then
            echo "Error: Multi-GPU tests require at least 2 GPUs. Found: $GPU_COUNT"
            exit 1
          fi
          echo "Found $GPU_COUNT GPUs - proceeding with multi-GPU tests"

      - name: Build test-runner
        run: |
          cd cmd/test-runner
          go mod init github.com/ollama/ollama/cmd/test-runner || true
          go mod tidy
          go build -o ../../test-runner .
          cd ../..
          ls -lh test-runner

      - name: Validate multi-GPU test configuration
        run: |
          ./test-runner validate --config test/config/models.yaml

      - name: Run multi-GPU tests
        run: |
          ./test-runner run --profile multi-gpu --config test/config/models.yaml --output test-report-multi-gpu --verbose
        timeout-minutes: 60

      - name: Check multi-GPU test results
        run: |
          if ! jq -e '.summary.failed == 0' test-report-multi-gpu.json; then
            echo "Multi-GPU tests failed!"
            jq '.results[] | select(.status == "FAILED")' test-report-multi-gpu.json
            exit 1
          fi
          echo "All multi-GPU tests passed!"

      - name: Display GPU memory usage
        if: always()
        run: |
          echo "=== Final GPU Memory State ==="
          nvidia-smi

      - name: Upload multi-GPU test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: multi-gpu-test-results
          path: |
            test-report-multi-gpu.json
            test-report-multi-gpu.md
            ollama.log
          retention-days: 30 # Keep longer for analysis
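The multi-GPU workflow's setup check counts the lines printed by `nvidia-smi --list-gpus`, which prints one line per device. A Tesla K80 board carries two GK210 GPUs, so a single card already reports two devices and passes the `-lt 2` test. Below is a small Python equivalent. It assumes the usual `GPU <index>: <name> (UUID: ...)` line format; the function names and the sample output are illustrative.

```python
def count_gpus(list_gpus_output):
    """Count devices in `nvidia-smi --list-gpus` output (one 'GPU N: ...' line each)."""
    # Counting matching lines, rather than newlines as `wc -l` does, is not
    # thrown off by a missing trailing newline or stray blank lines.
    return sum(1 for line in list_gpus_output.splitlines() if line.startswith("GPU "))


def require_gpus(list_gpus_output, minimum=2):
    """Return the GPU count, or exit with the workflow's error message."""
    found = count_gpus(list_gpus_output)
    if found < minimum:
        raise SystemExit(
            f"Error: Multi-GPU tests require at least {minimum} GPUs. Found: {found}"
        )
    return found


# Illustrative output for one K80 card (UUIDs shortened).
sample = "GPU 0: Tesla K80 (UUID: GPU-aaaa)\nGPU 1: Tesla K80 (UUID: GPU-bbbb)\n"
print(f"Found {require_gpus(sample)} GPUs - proceeding with multi-GPU tests")
```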
.github/workflows/tesla-k80-single-gpu-tests.yml (vendored, 94 lines deleted)
@@ -1,94 +0,0 @@
name: Tesla K80 Single-GPU Tests

on:
  workflow_dispatch: # Manual trigger only

jobs:
  test:
    runs-on: self-hosted

    timeout-minutes: 60

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Download ollama binary from latest build
        uses: dawidd6/action-download-artifact@v6
        with:
          workflow: tesla-k80-ci.yml
          name: ollama-binary
          github_token: ${{ secrets.GITHUB_TOKEN }}
          check_artifacts: true
          search_artifacts: true

      - name: Make ollama binary executable
        run: |
          chmod +x ollama
          ls -lh ollama
          ./ollama --version

      - name: Build test-runner
        run: |
          cd cmd/test-runner
          go mod init github.com/ollama/ollama/cmd/test-runner || true
          go mod tidy
          go build -o ../../test-runner .
          cd ../..
          ls -lh test-runner

      - name: Validate test configuration
        run: |
          ./test-runner validate --config test/config/quick.yaml

      - name: Run quick tests
        run: |
          ./test-runner run --profile quick --config test/config/quick.yaml --output test-report-quick --verbose
        timeout-minutes: 10

      - name: Check quick test results
        run: |
          if ! jq -e '.summary.failed == 0' test-report-quick.json; then
            echo "Quick tests failed!"
            jq '.results[] | select(.status == "FAILED")' test-report-quick.json
            exit 1
          fi
          echo "Quick tests passed!"

      - name: Upload quick test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: quick-test-results
          path: |
            test-report-quick.json
            test-report-quick.md
            ollama.log
          retention-days: 7

      - name: Run full tests (if quick tests passed)
        if: success()
        run: |
          ./test-runner run --profile full --config test/config/models.yaml --output test-report-full --verbose
        timeout-minutes: 45

      - name: Check full test results
        if: success()
        run: |
          if ! jq -e '.summary.failed == 0' test-report-full.json; then
            echo "Full tests failed!"
            jq '.results[] | select(.status == "FAILED")' test-report-full.json
            exit 1
          fi
          echo "All tests passed!"

      - name: Upload full test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: full-test-results
          path: |
            test-report-full.json
            test-report-full.md
            ollama.log
          retention-days: 14
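Both K80 test workflows gate on the test-runner's JSON report in the same way. `jq -e '.summary.failed == 0'` exits non-zero unless the expression evaluates to `true`, and the follow-up filter lists the entries of `results` whose `status` is `"FAILED"`. The sketch below is a Python counterpart of those two filters. The report layout is inferred only from the jq paths (`summary.failed`, `results[].status`); any other field, such as the `name` used in the test data, is hypothetical.

```python
import json


def report_passed(report):
    """Counterpart of `jq -e '.summary.failed == 0'`; a missing summary counts as failure."""
    return (report.get("summary") or {}).get("failed") == 0


def failed_results(report):
    """Counterpart of `jq '.results[] | select(.status == "FAILED")'`."""
    return [r for r in report.get("results") or [] if r.get("status") == "FAILED"]


def gate(path, label):
    """Return a shell-style exit code for the report stored at `path`."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    if not report_passed(report):
        print(f"{label} tests failed!")
        for result in failed_results(report):
            print(json.dumps(result, indent=2))
        return 1
    print(f"{label} tests passed!")
    return 0
```

A caller would run something like `sys.exit(gate("test-report-quick.json", "Quick"))`. In the single-GPU workflow, the full-suite steps carry `if: success()`, so a failing quick gate skips them. The upload steps use `always()` and still publish whatever reports exist.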
.github/workflows/test.yaml (vendored, normal file, 287 lines added)
@@ -0,0 +1,287 @@
name: test

concurrency:
  # For PRs, later CI runs preempt previous ones. e.g. a force push on a PR
  # cancels running CI jobs and starts all new ones.
  #
  # For non-PR pushes, concurrency.group needs to be unique for every distinct
  # CI run we want to have happen. Use run_id, which in practice means all
  # non-PR CI runs will be allowed to run without preempting each other.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
  cancel-in-progress: true

on:
  pull_request:
    paths:
      - '**/*'
      - '!docs/**'
      - '!README.md'

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      changed: ${{ steps.changes.outputs.changed }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - id: changes
        run: |
          changed() {
            local BASE=${{ github.event.pull_request.base.sha }}
            local HEAD=${{ github.event.pull_request.head.sha }}
            local MERGE_BASE=$(git merge-base $BASE $HEAD)
            git diff-tree -r --no-commit-id --name-only "$MERGE_BASE" "$HEAD" \
              | xargs python3 -c "import sys; from pathlib import Path; print(any(Path(x).match(glob) for x in sys.argv[1:] for glob in '$*'.split(' ')))"
          }

          echo changed=$(changed 'llama/llama.cpp/**/*' 'ml/backend/ggml/ggml/**/*') | tee -a $GITHUB_OUTPUT

  linux:
    needs: [changes]
    if: needs.changes.outputs.changed == 'True'
    strategy:
      matrix:
        include:
          - preset: CPU
          - preset: CUDA
            container: nvidia/cuda:13.0.0-devel-ubuntu22.04
            flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
          - preset: ROCm
            container: rocm/dev-ubuntu-22.04:6.1.2
            extra-packages: rocm-libs
            flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_PREFIX_PATH=/opt/rocm'
          - preset: Vulkan
            container: ubuntu:22.04
            extra-packages: >
              mesa-vulkan-drivers vulkan-tools
              libvulkan1 libvulkan-dev
              vulkan-sdk cmake ccache g++ make
    runs-on: linux
    container: ${{ matrix.container }}
    steps:
      - uses: actions/checkout@v4
      - run: |
          [ -n "${{ matrix.container }}" ] || sudo=sudo
          $sudo apt-get update
          # Add LunarG Vulkan SDK apt repo for Ubuntu 22.04
          if [ "${{ matrix.preset }}" = "Vulkan" ]; then
            $sudo apt-get install -y --no-install-recommends wget gnupg ca-certificates software-properties-common
            wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | $sudo gpg --dearmor -o /usr/share/keyrings/lunarg-archive-keyring.gpg
            # Use signed-by to bind the repo to the installed keyring to avoid NO_PUBKEY
            echo "deb [signed-by=/usr/share/keyrings/lunarg-archive-keyring.gpg] https://packages.lunarg.com/vulkan/1.4.313 jammy main" | $sudo tee /etc/apt/sources.list.d/lunarg-vulkan-1.4.313-jammy.list > /dev/null
            $sudo apt-get update
          fi
          $sudo apt-get install -y cmake ccache ${{ matrix.extra-packages }}
          # Export VULKAN_SDK if provided by LunarG package (defensive)
          if [ -d "/usr/lib/x86_64-linux-gnu/vulkan" ] && [ "${{ matrix.preset }}" = "Vulkan" ]; then
            echo "VULKAN_SDK=/usr" >> $GITHUB_ENV
          fi
        env:
          DEBIAN_FRONTEND: noninteractive
      - uses: actions/cache@v4
        with:
          path: /github/home/.cache/ccache
          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
      - run: |
          cmake --preset ${{ matrix.preset }} ${{ matrix.flags }}
          cmake --build --preset ${{ matrix.preset }} --parallel

  windows:
    needs: [changes]
    if: needs.changes.outputs.changed == 'True'
    strategy:
      matrix:
        include:
          - preset: CPU
          - preset: CUDA
            install: https://developer.download.nvidia.com/compute/cuda/13.0.0/local_installers/cuda_13.0.0_windows.exe
            flags: '-DCMAKE_CUDA_ARCHITECTURES=80'
            cuda-components:
              - '"cudart"'
              - '"nvcc"'
              - '"cublas"'
              - '"cublas_dev"'
              - '"crt"'
              - '"nvvm"'
              - '"nvptxcompiler"'
            cuda-version: '13.0'
          - preset: ROCm
            install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
            flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma" -DCMAKE_CXX_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma"'
          - preset: Vulkan
            install: https://sdk.lunarg.com/sdk/download/1.4.321.1/windows/vulkansdk-windows-X64-1.4.321.1.exe
    runs-on: windows
    steps:
      - run: |
          choco install -y --no-progress ccache ninja
          ccache -o cache_dir=${{ github.workspace }}\.ccache
      - if: matrix.preset == 'CUDA' || matrix.preset == 'ROCm' || matrix.preset == 'Vulkan'
        id: cache-install
        uses: actions/cache/restore@v4
        with:
          path: |
            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
            C:\Program Files\AMD\ROCm
            C:\VulkanSDK
          key: ${{ matrix.install }}
      - if: matrix.preset == 'CUDA'
        name: Install CUDA ${{ matrix.cuda-version }}
        run: |
          $ErrorActionPreference = "Stop"
          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
            $subpackages = @(${{ join(matrix.cuda-components, ', ') }}) | Foreach-Object {"${_}_${{ matrix.cuda-version }}"}
            Start-Process -FilePath .\install.exe -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
          }

          $cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
          echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
      - if: matrix.preset == 'ROCm'
        name: Install ROCm ${{ matrix.rocm-version }}
        run: |
          $ErrorActionPreference = "Stop"
          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
            Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
          }

          $hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
          echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
          echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
          echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
          echo "HIPCXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
          echo "HIP_PLATFORM=amd" | Out-File -FilePath $env:GITHUB_ENV -Append
          echo "CMAKE_PREFIX_PATH=$hipPath" | Out-File -FilePath $env:GITHUB_ENV -Append
      - if: matrix.preset == 'Vulkan'
        name: Install Vulkan ${{ matrix.rocm-version }}
        run: |
          $ErrorActionPreference = "Stop"
          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
            Start-Process -FilePath .\install.exe -ArgumentList "-c","--am","--al","in" -NoNewWindow -Wait
          }

          $vulkanPath = (Resolve-Path "C:\VulkanSDK\*").path
          echo "$vulkanPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
          echo "VULKAN_SDK=$vulkanPath" >> $env:GITHUB_ENV
      - if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
        uses: actions/cache/save@v4
        with:
          path: |
            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
            C:\Program Files\AMD\ROCm
            C:\VulkanSDK
          key: ${{ matrix.install }}
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ${{ github.workspace }}\.ccache
          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
      - run: |
          Import-Module 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
          Enter-VsDevShell -VsInstallPath 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise' -SkipAutomaticLocation -DevCmdArguments '-arch=x64 -no_logo'
          cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }}
          cmake --build --parallel --preset "${{ matrix.preset }}"
        env:
          CMAKE_GENERATOR: Ninja

  go_mod_tidy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: check that 'go mod tidy' is clean
        run: go mod tidy --diff || (echo "Please run 'go mod tidy'." && exit 1)

  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    env:
      CGO_ENABLED: '1'
      GOEXPERIMENT: 'synctest'
    steps:
      - name: checkout
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2

      - name: cache restore
        uses: actions/cache/restore@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
        with:
          # Note: unlike the other setups, this is only grabbing the mod download
          # cache, rather than the whole mod directory, as the download cache
          # contains zips that can be unpacked in parallel faster than they can be
          # fetched and extracted by tar
          path: |
            ~/.cache/go-build
            ~/go/pkg/mod/cache
            ~\AppData\Local\go-build
          # NOTE: The -3- here should be incremented when the scheme of data to be
          # cached changes (e.g. path above changes).
          key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}
          restore-keys: |
            ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}
            ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-

      - name: Setup Go
        uses: actions/setup-go@v5
        with:
          # The caching strategy of setup-go is less than ideal, and wastes
          # time by not saving artifacts due to small failures like the linter
          # complaining, etc. This means subsequent runs have to rebuild their world
          # again until all checks pass. For instance, if you misspell a word,
          # you're punished until you fix it. This is more hostile than
          # helpful.
          cache: false

          go-version-file: go.mod

      # It is tempting to run this in a platform independent way, but the past
      # shows this codebase will see introductions of platform specific code
      # generation, and so we need to check this per platform to ensure we
      # don't abuse go generate on specific platforms.
      - name: check that 'go generate' is clean
        if: always()
        run: |
          go generate ./...
          git diff --name-only --exit-code || (echo "Please run 'go generate ./...'." && exit 1)

      - name: go test
        if: always()
        run: go test -count=1 -benchtime=1x ./...

      # TODO(bmizerany): replace this heavy tool with just the
      # tools/checks/binaries we want and then make them all run in parallel
      # across jobs, not on a single tiny VM on GitHub Actions.
      - uses: golangci/golangci-lint-action@v6
        with:
          args: --timeout 10m0s -v

      - name: cache save
        # Always save the cache, even if the job fails. The artifacts produced
        # during the building of test binaries are not all for naught. They can
        # be used to speed up subsequent runs.
        if: always()

        uses: actions/cache/save@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
        with:
          # Note: unlike the other setups, this is only grabbing the mod download
          # cache, rather than the whole mod directory, as the download cache
          # contains zips that can be unpacked in parallel faster than they can be
          # fetched and extracted by tar
          path: |
            ~/.cache/go-build
            ~/go/pkg/mod/cache
            ~\AppData\Local\go-build
          # NOTE: The -3- here should be incremented when the scheme of data to be
          # cached changes (e.g. path above changes).
          key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}

  patches:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify patches apply cleanly and do not change files
        run: |
          make -f Makefile.sync clean checkout apply-patches sync
          git diff --compact-summary --exit-code
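The `changed()` helper in the `changes` job lists the files touched between the PR's merge base and its head. It then asks Python whether any of those paths matches one of the globs passed as arguments. The result prints as `True` or `False`, which is the value the `linux` and `windows` jobs compare against. The check can be pulled out into a function like the one below; the function name is hypothetical.

`PurePath.match()` matches a relative pattern from the right-hand end of the path. The Python 3.13 documentation states that `match()` treats `**` like a plain `*`. As a result, a file nested more deeply than the pattern, such as `ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt`, may not be reported. `PurePath.full_match()` (Python 3.13+) is the recursive alternative. It is worth confirming this behavior against the runner's Python version before relying on the filter.

```python
from pathlib import PurePosixPath


def any_changed(paths, globs):
    """Python counterpart of the one-liner in the `changes` job.

    paths: file names as printed by `git diff-tree -r --name-only`
    globs: patterns such as 'llama/llama.cpp/**/*'
    """
    # PurePosixPath keeps the result independent of the host OS; git prints
    # forward-slash paths on every platform.
    return any(PurePosixPath(p).match(g) for p in paths for g in globs)


globs = ["llama/llama.cpp/**/*", "ml/backend/ggml/ggml/**/*"]
print(any_changed(["llama/llama.cpp/src/llama.cpp", "server/routes.go"], globs))
```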
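The Go cache in the `test` job appends `${{ github.run_id }}` to its key, so every run saves a fresh entry. The two `restore-keys` drop that suffix, so the next run can start from the newest earlier cache for the same `go.sum` hash, or, failing that, from any cache of the `-go-3-` scheme. The `matrix.goarch` and `matrix.buildflags` segments are not defined in this job's matrix, so they appear to expand to empty strings. The sketch below is a simplified model of the lookup order documented for `actions/cache`. An exact match on `key` is tried first. After that, each restore key is tried in turn as a prefix, and the most recently created entry among the matches wins. The data model and function name are assumptions, not the action's API.

```python
def select_cache(entries, key, restore_keys):
    """Simplified model of the actions/cache restore lookup.

    entries:      list of (cache_key, created_at) pairs; created_at is sortable
    key:          the primary key, tried as an exact match
    restore_keys: prefixes tried in order when the exact key is missing
    Returns the chosen cache key, or None on a miss.
    """
    for cache_key, _ in entries:
        if cache_key == key:
            return cache_key
    for prefix in restore_keys:
        matches = [(created, k) for k, created in entries if k.startswith(prefix)]
        if matches:
            return max(matches)[1]  # newest entry for the first matching prefix
    return None


# Hypothetical keys shaped like the workflow's (empty goarch/buildflags segments).
entries = [
    ("test-Linux---go-3-abc-100", 1),
    ("test-Linux---go-3-abc-101", 2),
    ("test-Linux---go-3-old-90", 0),
]
print(select_cache(entries, "test-Linux---go-3-abc-102",
                   ["test-Linux---go-3-abc", "test-Linux---go-3-"]))
```

A saved cache entry cannot be overwritten under the same key. The per-run suffix is therefore what lets the `cache save` step, which runs under `always()`, store new build artifacts on every run.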