Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

This commit represents a complete rework after pulling the latest changes from the official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in the "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Because of GCC 10.5 limitations, newer CPU optimizations are sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which was dropped from official Ollama due to legacy driver/CUDA requirements.

The toolchain constraints form a fixed dependency chain:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern LLMs on legacy but still capable Tesla K80 hardware (12 GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
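The chain K80 → Driver 470 → CUDA 11.4 → GCC 10 can be checked on a build machine before configuring. The following is a minimal sketch, not part of this commit: the function names (`first_version`, `version_has_prefix`, `check`, `check_toolchain`) are hypothetical, and it assumes `gcc`, `nvcc`, and `nvidia-smi` are on `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: compare installed tool versions with the ones named in the commit message.
# All function names here are illustrative assumptions, not repository code.

# Print the first dotted version number found in the given text.
first_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed if version $1 equals prefix $2 or continues it after a dot.
version_has_prefix() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

# check NAME EXPECTED_PREFIX VERSION_TEXT
check() {
  local name=$1 want=$2 got
  got=$(first_version "$3")
  if version_has_prefix "$got" "$want"; then
    echo "ok   $name $got"
  else
    echo "FAIL $name: want $want.x, got ${got:-none}"
    return 1
  fi
}

# Query the three tools and report every mismatch before returning.
check_toolchain() {
  local rc=0
  check gcc 10 "$(gcc -dumpfullversion -dumpversion 2>/dev/null)" || rc=1
  check nvcc 11.4 "$(nvcc --version 2>/dev/null | grep release)" || rc=1
  check driver 470 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)" || rc=1
  return $rc
}
```

Calling `check_toolchain` prints one line per tool and returns non-zero if any version falls outside the expected series. Matching on a version prefix rather than an exact string is a deliberate choice so that patch releases within the 10.x, 11.4.x, and 470.x series still pass.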
2025-12-10 15:57:04 +00:00 · 2025-11-05 14:03:05 +08:00
parent fabe2c5cb7
commit ef14fb5b26
817 changed files with 241634 additions and 70888 deletions
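For orientation, a build that uses the "CUDA 11" preset mentioned in the commit message might look like the sketch below. It is an assumption-laden illustration rather than the fork's documented procedure: the GCC 10 location is borrowed from the removed `tesla-k80-ci.yml` workflow, the existence of a matching build preset is assumed from the upstream `cmake --build --parallel --preset ...` usage, and `GCC10_DIR`, `DRY_RUN`, `run`, and `build_k80` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch of a K80 build driven by the "CUDA 11" CMake preset.
# GCC10_DIR, DRY_RUN and the helper names are illustrative assumptions.

# Directory holding the GCC 10 binaries (path as used in the removed CI workflow).
GCC10_DIR=${GCC10_DIR:-/usr/local/bin}

# Run a command, or only print it when DRY_RUN=1.
# The printed form does not preserve quoting; it is for inspection only.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_k80() {
  # Configure with GCC 10 as the host compiler; the preset is expected to carry compute 3.7.
  run env CC="$GCC10_DIR/gcc" CXX="$GCC10_DIR/g++" cmake --preset "CUDA 11" || return 1
  # Build the C++/CUDA backends, then the Go binary.
  run cmake --build --parallel --preset "CUDA 11" || return 1
  run go build -o ollama .
}
```

Running `DRY_RUN=1 build_k80` prints the three commands without executing them, which is a cheap way to confirm the compiler paths before starting a long CUDA build.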
--- a/.github/ISSUE_TEMPLATE/10_bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/10_bug_report.yml
@@ -0,0 +1,68 @@
+name: Bug report
+labels: [bug]
+description: Something isn't working right.
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: What is the issue?
+      description: What happened? What did you expect to happen?
+    validations:
+      required: true
+  - type: textarea
+    id: logs
+    attributes:
+      label: Relevant log output
+      description: Please copy and paste any relevant log output. See [Troubleshooting Guide](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) for details.
+      render: shell
+    validations:
+      required: false
+  - type: dropdown
+    id: os
+    attributes:
+      label: OS
+      description: Which operating system are you using?
+      multiple: true
+      options:
+        - Linux
+        - macOS
+        - Windows
+        - Docker
+        - WSL2
+    validations:
+      required: false
+  - type: dropdown
+    id: gpu
+    attributes:
+      label: GPU
+      description: Which GPU are you using?
+      multiple: true
+      options:
+        - Nvidia
+        - AMD
+        - Intel
+        - Apple
+        - Other
+    validations:
+      required: false
+  - type: dropdown
+    id: cpu
+    attributes:
+      label: CPU
+      description: Which CPU are you using?
+      multiple: true
+      options:
+        - Intel
+        - AMD
+        - Apple
+        - Other
+    validations:
+      required: false
+  - type: input
+    id: version
+    attributes:
+      label: Ollama version
+      description: What version of Ollama are you using? (`ollama --version`)
+      placeholder: e.g., 0.1.32
+    validations:
+      required: false
--- a/.github/ISSUE_TEMPLATE/20_feature_request.md
+++ b/.github/ISSUE_TEMPLATE/20_feature_request.md
@@ -0,0 +1,6 @@
+---
+name: Feature request
+about: Request a new feature
+labels: feature request
+---
+
--- a/.github/ISSUE_TEMPLATE/30_model_request.md
+++ b/.github/ISSUE_TEMPLATE/30_model_request.md
@@ -0,0 +1,5 @@
+---
+name: Model request
+about: Request support for a new model to be added to Ollama
+labels: model request
+---
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,8 @@
+blank_issues_enabled: true
+contact_links:
+  - name: Help
+    url: https://discord.com/invite/ollama
+    about: Please join our Discord server for help using Ollama
+  - name: Troubleshooting
+    url: https://github.com/ollama/ollama/blob/main/docs/faq.md#faq
+    about: See the FAQ for common issues and solutions
--- a/.github/workflows/CLAUDE.md
+++ b/.github/workflows/CLAUDE.md
@@ -1,266 +0,0 @@
-# GitHub Actions Workflows - Tesla K80 Testing
-
-## Overview
-
-This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.
-
-## Workflows
-
-### 1. tesla-k80-ci.yml - Build Workflow
-**Trigger**: Manual only (`workflow_dispatch`)
-
-**Purpose**: Build the ollama binary with CUDA 3.7 support
-
-**Steps**:
-1. Checkout code
-2. Clean previous build artifacts
-3. Configure CMake with GCC 10 and CUDA 11
-4. Build C++/CUDA components
-5. Build Go binary
-6. Verify binary
-7. Upload binary artifact
-
-**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit
-
-### 2. tesla-k80-tests.yml - Test Workflow
-**Trigger**: Manual only (`workflow_dispatch`)
-
-**Purpose**: Run comprehensive tests using the test framework
-
-**Steps**:
-1. Checkout code
-2. Verify ollama binary exists
-3. Run test-runner tool (see below)
-4. Upload test results and logs
-
-**Artifacts**: Test reports, logs, analysis results
-
-## Test Framework Architecture
-
-### TODO: Implement Go-based Test Runner
-
-**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.
-
-#### Task Breakdown
-
-1. **Design test configuration format**
-   - Create `test/config/models.yaml` - List of models to test with parameters
-   - Define model test spec: name, size, expected behavior, test prompts
-   - Support test profiles: quick (small models), full (all sizes), stress test
-
-2. **Implement server lifecycle management**
-   - Start `./ollama serve` as subprocess
-   - Capture stdout/stderr to log file
-   - Monitor server readiness (health check API)
-   - Graceful shutdown on test completion or failure
-   - Timeout handling for hung processes
-
-3. **Implement real-time log monitoring**
-   - Goroutine to tail server logs
-   - Pattern matching for critical events:
-     - GPU initialization messages
-     - Model loading progress
-     - CUDA errors or warnings
-     - Memory allocation failures
-     - CPU fallback warnings
-   - Store events for later analysis
-
-4. **Implement model testing logic**
-   - For each model in config:
-     - Pull model via API (if not cached)
-     - Wait for model ready
-     - Parse logs for GPU loading confirmation
-     - Send chat API request with test prompt
-     - Validate response (not empty, reasonable length, coherent)
-     - Check logs for errors during inference
-     - Record timing metrics (load time, first token, completion)
-
-5. **Implement test validation**
-   - GPU loading verification:
-     - Parse logs for "loaded model" + GPU device ID
-     - Check for "offloading N layers to GPU"
-     - Verify no "using CPU backend" messages
-   - Response quality checks:
-     - Response not empty
-     - Minimum token count (avoid truncated responses)
-     - JSON structure valid (for API responses)
-   - Error detection:
-     - No CUDA errors in logs
-     - No OOM (out of memory) errors
-     - No model loading failures
-
-6. **Implement structured reporting**
-   - Generate JSON report with:
-     - Test summary (pass/fail/skip counts)
-     - Per-model results (status, timings, errors)
-     - Log excerpts for failures
-     - GPU metrics (memory usage, utilization)
-   - Generate human-readable summary (markdown/text)
-   - Exit code: 0 for all pass, 1 for any failure
-
-7. **Implement CLI interface**
-   - Flags:
-     - `--config` - Path to test config file
-     - `--profile` - Test profile to run (quick/full/stress)
-     - `--ollama-bin` - Path to ollama binary (default: ./ollama)
-     - `--output` - Report output path
-     - `--verbose` - Detailed logging
-     - `--keep-models` - Don't delete models after test
-   - Subcommands:
-     - `run` - Run tests
-     - `validate` - Validate config only
-     - `list` - List available test profiles/models
-
-8. **Update GitHub Actions workflow**
-   - Build test-runner binary in CI workflow
-   - Run test-runner in test workflow
-   - Parse JSON report for pass/fail
-   - Upload structured results as artifacts
-
-#### File Structure
-
-```
-cmd/test-runner/
-  main.go              # CLI entry point
-  config.go            # Config loading and validation
-  server.go            # Server lifecycle management
-  monitor.go           # Log monitoring and parsing
-  test.go              # Model test execution
-  validate.go          # Response and log validation
-  report.go            # Test report generation
-
-test/config/
-  models.yaml          # Default test configuration
-  quick.yaml           # Quick test profile (small models)
-  full.yaml            # Full test profile (all sizes)
-
-.github/workflows/
-  tesla-k80-ci.yml     # Build workflow (manual)
-  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
-```
-
-#### Example Test Configuration (models.yaml)
-
-```yaml
-profiles:
-  quick:
-    models:
-      - name: gemma2:2b
-        prompts:
-          - "Hello, respond with a greeting."
-        min_response_tokens: 5
-        timeout: 30s
-      
-  full:
-    models:
-      - name: gemma2:2b
-        prompts:
-          - "Hello, respond with a greeting."
-          - "What is 2+2?"
-        min_response_tokens: 5
-        timeout: 30s
-      
-      - name: gemma3:4b
-        prompts:
-          - "Explain photosynthesis in one sentence."
-        min_response_tokens: 10
-        timeout: 60s
-      
-      - name: gemma3:12b
-        prompts:
-          - "Write a haiku about GPUs."
-        min_response_tokens: 15
-        timeout: 120s
-
-validation:
-  gpu_required: true
-  check_patterns:
-    success:
-      - "loaded model"
-      - "offload.*GPU"
-    failure:
-      - "CUDA.*error"
-      - "out of memory"
-      - "CPU backend"
-```
-
-#### Example Test Runner Usage
-
-```bash
-# Build test runner
-go build -o test-runner ./cmd/test-runner
-
-# Run quick test profile
-./test-runner run --config test/config/models.yaml --profile quick
-
-# Run full test with verbose output
-./test-runner run --profile full --verbose --output test-report.json
-
-# Validate config only
-./test-runner validate --config test/config/models.yaml
-
-# List available profiles
-./test-runner list
-```
-
-#### Integration with GitHub Actions
-
-```yaml
-- name: Build test runner
-  run: go build -o test-runner ./cmd/test-runner
-
-- name: Run tests
-  run: |
-    ./test-runner run --profile full --output test-report.json --verbose
-  timeout-minutes: 45
-
-- name: Check test results
-  run: |
-    if ! jq -e '.summary.failed == 0' test-report.json; then
-      echo "Tests failed!"
-      jq '.failures' test-report.json
-      exit 1
-    fi
-
-- name: Upload test report
-  uses: actions/upload-artifact@v4
-  with:
-    name: test-report
-    path: |
-      test-report.json
-      ollama.log
-```
-
-## Prerequisites
-
-### Self-Hosted Runner Setup
-
-1. **Install GitHub Actions Runner on your Tesla K80 machine**:
-   ```bash
-   mkdir -p ~/actions-runner && cd ~/actions-runner
-   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
-     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
-   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz
-   
-   # Configure (use token from GitHub)
-   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN
-   
-   # Install and start as a service
-   sudo ./svc.sh install
-   sudo ./svc.sh start
-   ```
-
-2. **Verify runner environment has**:
-   - CUDA 11.4+ toolkit installed
-   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
-   - CMake 3.24+
-   - Go 1.24+
-   - NVIDIA driver with Tesla K80 support
-   - Network access to download models
-
-## Security Considerations
-
-- Self-hosted runners should be on a secure, isolated machine
-- Consider using runner groups to restrict repository access
-- Do not use self-hosted runners for public repositories (untrusted PRs)
-- Keep runner software updated
--- a/.github/workflows/latest.yaml
+++ b/.github/workflows/latest.yaml
@@ -0,0 +1,24 @@
+name: latest
+
+on:
+  release:
+    types: [released]
+
+jobs:
+  update-latest:
+    environment: release
+    runs-on: linux
+    steps:
+      - uses: actions/checkout@v4
+      - name: Login to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          username: ${{ vars.DOCKER_USER }}
+          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
+      - name: Tag images as latest
+        env:
+          PUSH: "1"
+        shell: bash
+        run: |
+          export "VERSION=${GITHUB_REF_NAME#v}"
+          ./scripts/tag_latest.sh
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -0,0 +1,431 @@
+name: release
+
+on:
+  push:
+    tags:
+      - 'v*'
+
+env:
+  CGO_CFLAGS: '-O3'
+  CGO_CXXFLAGS: '-O3'
+
+jobs:
+  setup-environment:
+    runs-on: ubuntu-latest
+    environment: release
+    outputs:
+      GOFLAGS: ${{ steps.goflags.outputs.GOFLAGS }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set environment
+        id: goflags
+        run: |
+          echo GOFLAGS="'-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=${GITHUB_REF_NAME#v}\" \"-X=github.com/ollama/ollama/server.mode=release\"'" >>$GITHUB_OUTPUT
+
+  darwin-build:
+    runs-on: macos-13-xlarge
+    environment: release
+    needs: setup-environment
+    strategy:
+      matrix:
+        os: [darwin]
+        arch: [amd64, arm64]
+    env:
+      GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+      - run: |
+          go build -o dist/ .
+        env:
+          GOOS: ${{ matrix.os }}
+          GOARCH: ${{ matrix.arch }}
+          CGO_ENABLED: 1
+          CGO_CPPFLAGS: '-mmacosx-version-min=11.3'
+      - if: matrix.arch == 'amd64'
+        run: |
+          cmake --preset CPU -DCMAKE_OSX_DEPLOYMENT_TARGET=11.3 -DCMAKE_SYSTEM_PROCESSOR=x86_64 -DCMAKE_OSX_ARCHITECTURES=x86_64
+          cmake --build --parallel --preset CPU
+          cmake --install build --component CPU --strip --parallel 8
+      - uses: actions/upload-artifact@v4
+        with:
+          name: build-${{ matrix.os }}-${{ matrix.arch }}
+          path: dist/*
+
+  windows-depends:
+    strategy:
+      matrix:
+        os: [windows]
+        arch: [amd64]
+        preset: ['CPU']
+        include:
+          - os: windows
+            arch: amd64
+            preset: 'CUDA 12'
+            install: https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_571.96_windows.exe
+            cuda-components:
+              - '"cudart"'
+              - '"nvcc"'
+              - '"cublas"'
+              - '"cublas_dev"'
+            cuda-version: '12.8'
+            flags: ''
+            runner_dir: 'cuda_v12'
+          - os: windows
+            arch: amd64
+            preset: 'CUDA 13'
+            install: https://developer.download.nvidia.com/compute/cuda/13.0.0/local_installers/cuda_13.0.0_windows.exe
+            cuda-components:
+              - '"cudart"'
+              - '"nvcc"'
+              - '"cublas"'
+              - '"cublas_dev"'
+              - '"crt"'
+              - '"nvvm"'
+              - '"nvptxcompiler"'
+            cuda-version: '13.0'
+            flags: ''
+            runner_dir: 'cuda_v13'
+          - os: windows
+            arch: amd64
+            preset: 'ROCm 6'
+            install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
+            rocm-version: '6.2'
+            flags: '-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma" -DCMAKE_CXX_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma"'
+            runner_dir: 'rocm'
+    runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
+    environment: release
+    env:
+      GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
+    steps:
+      - name: Install system dependencies
+        run: |
+          choco install -y --no-progress ccache ninja
+          ccache -o cache_dir=${{ github.workspace }}\.ccache
+      - if: startsWith(matrix.preset, 'CUDA ') || startsWith(matrix.preset, 'ROCm ')
+        id: cache-install
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
+            C:\Program Files\AMD\ROCm
+          key: ${{ matrix.install }}
+      - if: startsWith(matrix.preset, 'CUDA ')
+        name: Install CUDA ${{ matrix.cuda-version }}
+        run: |
+          $ErrorActionPreference = "Stop"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
+            $subpackages = @(${{ join(matrix.cuda-components, ', ') }}) | Foreach-Object {"${_}_${{ matrix.cuda-version }}"}
+            Start-Process -FilePath .\install.exe -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
+          }
+
+          $cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
+          echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+      - if: startsWith(matrix.preset, 'ROCm')
+        name: Install ROCm ${{ matrix.rocm-version }}
+        run: |
+          $ErrorActionPreference = "Stop"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
+            Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
+          }
+
+          $hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
+          echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+          echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "HIPCXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "HIP_PLATFORM=amd" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "CMAKE_PREFIX_PATH=$hipPath" | Out-File -FilePath $env:GITHUB_ENV -Append
+      - if: matrix.preset == 'CPU'
+        run: |
+          echo "CC=clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "CXX=clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+      - if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
+        uses: actions/cache/save@v4
+        with:
+          path: |
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
+            C:\Program Files\AMD\ROCm
+          key: ${{ matrix.install }}
+      - uses: actions/checkout@v4
+      - uses: actions/cache@v4
+        with:
+          path: ${{ github.workspace }}\.ccache
+          key: ccache-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
+      - name: Build target "${{ matrix.preset }}"
+        run: |
+          Import-Module 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
+          Enter-VsDevShell -VsInstallPath 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise' -SkipAutomaticLocation  -DevCmdArguments '-arch=x64 -no_logo'
+          cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }} -DOLLAMA_RUNNER_DIR="${{ matrix.runner_dir }}"
+          cmake --build --parallel --preset "${{ matrix.preset }}"
+          cmake --install build --component "${{ startsWith(matrix.preset, 'CUDA ') && 'CUDA' || startsWith(matrix.preset, 'ROCm ') && 'HIP' || 'CPU' }}" --strip --parallel 8
+          Remove-Item -Path dist\lib\ollama\rocm\rocblas\library\*gfx906* -ErrorAction SilentlyContinue
+        env:
+          CMAKE_GENERATOR: Ninja
+      - uses: actions/upload-artifact@v4
+        with:
+          name: depends-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.preset }}
+          path: dist\*
+
+  windows-build:
+    strategy:
+      matrix:
+        os: [windows]
+        arch: [amd64, arm64]
+        include:
+        - os: windows
+          arch: amd64
+          llvmarch: x86_64
+        - os: windows
+          arch: arm64
+          llvmarch: aarch64
+    runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
+    environment: release
+    needs: [setup-environment]
+    env:
+      GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
+    steps:
+      - name: Install ARM64 system dependencies
+        if: matrix.arch == 'arm64'
+        run: |
+          $ErrorActionPreference = "Stop"
+          Set-ExecutionPolicy Bypass -Scope Process -Force
+          [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
+          iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
+          echo "C:\ProgramData\chocolatey\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+
+          choco install -y --no-progress git gzip
+          echo "C:\Program Files\Git\cmd" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+      - name: Install clang and gcc-compat
+        run: |
+          $ErrorActionPreference = "Stop"
+          Set-ExecutionPolicy Bypass -Scope Process -Force
+          Invoke-WebRequest -Uri "https://github.com/mstorsjo/llvm-mingw/releases/download/20240619/llvm-mingw-20240619-ucrt-${{ matrix.llvmarch }}.zip" -OutFile "${{ runner.temp }}\llvm-mingw-ucrt.zip"
+          Expand-Archive -Path ${{ runner.temp }}\llvm-mingw-ucrt.zip -DestinationPath "C:\Program Files\"
+          $installPath=(Resolve-Path -Path "C:\Program Files\llvm-mingw-*-ucrt*").path
+          echo "$installPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+      - name: Verify gcc is actually clang
+        run: |
+          $ErrorActionPreference='Continue'
+          $version=& gcc -v 2>&1
+          $version=$version -join "`n"
+          echo "gcc is $version"
+          if ($version -notmatch 'clang') {
+            echo "ERROR: GCC must be clang for proper utf16 handling"
+            exit 1
+          }
+          $ErrorActionPreference='Stop'
+      - run: |
+          go build -o dist/${{ matrix.os }}-${{ matrix.arch }}/ .
+      - uses: actions/upload-artifact@v4
+        with:
+          name: build-${{ matrix.os }}-${{ matrix.arch }}
+          path: |
+            dist\${{ matrix.os }}-${{ matrix.arch }}\*.exe
+
+  linux-build:
+    strategy:
+      matrix:
+        include:
+          - os: linux
+            arch: amd64
+            target: archive_novulkan
+          - os: linux
+            arch: amd64
+            target: rocm
+          - os: linux
+            arch: arm64
+            target: archive_novulkan
+    runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
+    environment: release
+    needs: setup-environment
+    env:
+      GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: docker/setup-buildx-action@v3
+      - uses: docker/build-push-action@v6
+        with:
+          context: .
+          platforms: ${{ matrix.os }}/${{ matrix.arch }}
+          target: ${{ matrix.target }}
+          build-args: |
+            GOFLAGS=${{ env.GOFLAGS }}
+            CGO_CFLAGS=${{ env.CGO_CFLAGS }}
+            CGO_CXXFLAGS=${{ env.CGO_CXXFLAGS }}
+          outputs: type=local,dest=dist/${{ matrix.os }}-${{ matrix.arch }}
+          cache-from: type=registry,ref=${{ vars.DOCKER_REPO }}:latest
+          cache-to: type=inline
+      - run: |
+          for COMPONENT in bin/* lib/ollama/*; do
+            case "$COMPONENT" in
+              bin/ollama)                echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
+              lib/ollama/*.so*)          echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
+              lib/ollama/cuda_v*)        echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}.tar.in ;;
+              lib/ollama/cuda_jetpack5)  echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack5.tar.in ;;
+              lib/ollama/cuda_jetpack6)  echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-jetpack6.tar.in ;;
+              lib/ollama/rocm)           echo $COMPONENT >>ollama-${{ matrix.os }}-${{ matrix.arch }}-rocm.tar.in ;;
+            esac
+          done
+        working-directory: dist/${{ matrix.os }}-${{ matrix.arch }}
+      - run: |
+          echo "Manifests"
+          for ARCHIVE in dist/${{ matrix.os }}-${{ matrix.arch }}/*.tar.in ; do
+            echo $ARCHIVE
+            cat $ARCHIVE
+          done
+      - run: |
+          for ARCHIVE in dist/${{ matrix.os }}-${{ matrix.arch }}/*.tar.in; do
+            tar c -C dist/${{ matrix.os }}-${{ matrix.arch }} -T $ARCHIVE --owner 0 --group 0 | pigz -9vc >$(basename ${ARCHIVE//.*/}.tgz);
+          done
+      - uses: actions/upload-artifact@v4
+        with:
+          name: dist-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.target }}
+          path: |
+            *.tgz
+
+  # Build each Docker variant (OS, arch, and flavor) separately. Using QEMU is unreliable and slower.
+  docker-build-push:
+    strategy:
+      matrix:
+        include:
+          - os: linux
+            arch: arm64
+            target: novulkan
+            build-args: |
+              CGO_CFLAGS
+              CGO_CXXFLAGS
+              GOFLAGS
+          - os: linux
+            arch: amd64
+            target: novulkan
+            build-args: |
+              CGO_CFLAGS
+              CGO_CXXFLAGS
+              GOFLAGS
+          - os: linux
+            arch: amd64
+            suffix: '-rocm'
+            build-args: |
+              CGO_CFLAGS
+              CGO_CXXFLAGS
+              GOFLAGS
+              FLAVOR=rocm
+          - os: linux
+            arch: amd64
+            suffix: '-vulkan'
+            target: default
+            build-args: |
+              CGO_CFLAGS
+              CGO_CXXFLAGS
+              GOFLAGS
+    runs-on: ${{ matrix.arch == 'arm64' && format('{0}-{1}', matrix.os, matrix.arch) || matrix.os }}
+    environment: release
+    needs: setup-environment
+    env:
+      GOFLAGS: ${{ needs.setup-environment.outputs.GOFLAGS }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: docker/setup-buildx-action@v3
+      - uses: docker/login-action@v3
+        with:
+          username: ${{ vars.DOCKER_USER }}
+          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
+      - id: build-push
+        uses: docker/build-push-action@v6
+        with:
+          context: .
+          platforms: ${{ matrix.os }}/${{ matrix.arch }}
+          target: ${{ matrix.target }}
+          build-args: ${{ matrix.build-args }}
+          outputs: type=image,name=${{ vars.DOCKER_REPO }},push-by-digest=true,name-canonical=true,push=true
+          cache-from: type=registry,ref=${{ vars.DOCKER_REPO }}:latest
+          cache-to: type=inline
+      - run: |
+          mkdir -p ${{ matrix.os }}-${{ matrix.arch }}
+          echo "${{ steps.build-push.outputs.digest }}" >${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
+        working-directory: ${{ runner.temp }}
+      - uses: actions/upload-artifact@v4
+        with:
+          name: digest-${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}
+          path: |
+            ${{ runner.temp }}/${{ matrix.os }}-${{ matrix.arch }}-${{ matrix.suffix }}.txt
+
+  # Merge Docker images for the same flavor into a single multi-arch manifest
+  docker-merge-push:
+    strategy:
+      matrix:
+        suffix: ['', '-rocm']
+    runs-on: linux
+    environment: release
+    needs: [docker-build-push]
+    steps:
+      - uses: docker/login-action@v3
+        with:
+          username: ${{ vars.DOCKER_USER }}
+          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
+      - id: metadata
+        uses: docker/metadata-action@v4
+        with:
+          flavor: |
+            latest=false
+            suffix=${{ matrix.suffix }}
+          images: |
+            ${{ vars.DOCKER_REPO }}
+          tags: |
+            type=ref,enable=true,priority=600,prefix=pr-,event=pr
+            type=semver,pattern={{version}}
+      - uses: actions/download-artifact@v4
+        with:
+          pattern: digest-*
+          path: ${{ runner.temp }}
+          merge-multiple: true
+      - run: |
+          docker buildx imagetools create $(echo '${{ steps.metadata.outputs.json }}' | jq -cr '.tags | map("-t", .) | join(" ")') $(cat *-${{ matrix.suffix }}.txt | xargs printf '${{ vars.DOCKER_REPO }}@%s ')
+          docker buildx imagetools inspect ${{ vars.DOCKER_REPO }}:${{ steps.metadata.outputs.version }}
+        working-directory: ${{ runner.temp }}
+
+  # Trigger downstream release process
+  trigger:
+    runs-on: ubuntu-latest
+    environment: release
+    needs: [darwin-build, windows-build, windows-depends, linux-build]
+    permissions:
+      contents: write
+    env:
+      GH_TOKEN: ${{ github.token }}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Create or update Release for tag
+        run: |
+          RELEASE_VERSION="$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)"
+          echo "Looking for existing release for ${RELEASE_VERSION}"
+          OLD_TAG=$(gh release ls --json name,tagName | jq -r ".[] | select(.name == \"${RELEASE_VERSION}\") | .tagName")
+          if [ -n "$OLD_TAG" ]; then
+            echo "Updating release ${RELEASE_VERSION} to point to new tag ${GITHUB_REF_NAME}"
+            gh release edit ${OLD_TAG} --tag ${GITHUB_REF_NAME}
+          else
+            echo "Creating new release ${RELEASE_VERSION} pointing to tag ${GITHUB_REF_NAME}"
+            gh release create ${GITHUB_REF_NAME} \
+              --title ${RELEASE_VERSION} \
+              --draft \
+              --generate-notes \
+              --prerelease
+          fi
+      - name: Trigger downstream release process
+        run: |
+          curl -L \
+            -X POST \
+            -H "Accept: application/vnd.github+json" \
+            -H "Authorization: Bearer ${{ secrets.RELEASE_TOKEN }}" \
+            -H "X-GitHub-Api-Version: 2022-11-28" \
+            https://api.github.com/repos/ollama/${{ vars.RELEASE_REPO }}/dispatches \
+            -d "{\"event_type\": \"trigger-workflow\", \"client_payload\": {\"run_id\": \"${GITHUB_RUN_ID}\", \"version\": \"${GITHUB_REF_NAME#v}\", \"origin\": \"${GITHUB_REPOSITORY}\", \"publish\": \"1\"}}"
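Note on the `trigger` job above: it derives two different version strings from the same tag. The release name drops everything after the first `-`, while the dispatch payload only strips the leading `v`. A minimal sketch of that behaviour, assuming a hypothetical tag `v0.12.3-rc1` (in the workflow the value comes from `GITHUB_REF_NAME`):

```shell
#!/bin/sh
# Hypothetical tag for illustration; GitHub Actions sets GITHUB_REF_NAME in CI.
GITHUB_REF_NAME="v0.12.3-rc1"

# Same expression as the "Create or update Release for tag" step:
# keep only the text before the first '-'.
RELEASE_VERSION="$(echo "${GITHUB_REF_NAME}" | cut -f1 -d-)"

# Same expansion as the dispatch payload: strip a leading 'v', keep the suffix.
PAYLOAD_VERSION="${GITHUB_REF_NAME#v}"

echo "release=${RELEASE_VERSION}"   # release=v0.12.3
echo "payload=${PAYLOAD_VERSION}"   # payload=0.12.3-rc1
```

Under this assumption, successive `-rcN` tags resolve to the same release name, which is why the step edits an existing release to point at the new tag instead of creating a second one.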
--- a/.github/workflows/tesla-k80-ci.yml
+++ b/.github/workflows/tesla-k80-ci.yml
@@ -1,53 +0,0 @@
-name: Tesla K80 Build
-
-on:
-  workflow_dispatch: # Manual trigger only
-
-jobs:
-  build:
-    runs-on: self-hosted
-
-    # Use specific labels if you want to target a particular self-hosted runner
-    # runs-on: [self-hosted, linux, cuda, tesla-k80]
-
-    timeout-minutes: 60 # Prevent hung jobs
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0 # Full history for accurate versioning
-
-      - name: Clean previous build
-        run: |
-          rm -rf build
-          rm -f ollama
-
-      - name: Configure CMake
-        run: |
-          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
-        env:
-          CMAKE_BUILD_TYPE: Release
-
-      - name: Build C++/CUDA components
-        run: |
-          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build -j$(nproc)
-        timeout-minutes: 30
-
-      - name: Build Go binary
-        run: |
-          go build -v -o ollama .
-
-      - name: Verify binary was created
-        run: |
-          ls -lh ollama
-          ./ollama --version
-
-      - name: Upload ollama binary and libraries as artifact
-        uses: actions/upload-artifact@v4
-        with:
-          name: ollama-binary
-          path: |
-            ollama
-            build/lib/ollama/
-          retention-days: 7
--- a/.github/workflows/tesla-k80-multi-gpu-tests.yml
+++ b/.github/workflows/tesla-k80-multi-gpu-tests.yml
@@ -1,86 +0,0 @@
-name: Tesla K80 Multi-GPU Tests
-
-on:
-  workflow_dispatch: # Manual trigger only
-  schedule:
-    # Run weekly on Sundays at 2 AM UTC (less frequent than single-GPU tests)
-    - cron: "0 2 * * 0"
-
-jobs:
-  multi-gpu-test:
-    runs-on: self-hosted
-
-    timeout-minutes: 90 # Longer timeout for large models
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Download ollama binary from latest build
-        uses: dawidd6/action-download-artifact@v6
-        with:
-          workflow: tesla-k80-ci.yml
-          name: ollama-binary
-          github_token: ${{ secrets.GITHUB_TOKEN }}
-          check_artifacts: true
-          search_artifacts: true
-
-      - name: Make ollama binary executable
-        run: |
-          chmod +x ollama
-          ls -lh ollama
-          ./ollama --version
-
-      - name: Verify multi-GPU setup
-        run: |
-          nvidia-smi --list-gpus
-          GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
-          if [ "$GPU_COUNT" -lt 2 ]; then
-            echo "Error: Multi-GPU tests require at least 2 GPUs. Found: $GPU_COUNT"
-            exit 1
-          fi
-          echo "Found $GPU_COUNT GPUs - proceeding with multi-GPU tests"
-
-      - name: Build test-runner
-        run: |
-          cd cmd/test-runner
-          go mod init github.com/ollama/ollama/cmd/test-runner || true
-          go mod tidy
-          go build -o ../../test-runner .
-          cd ../..
-          ls -lh test-runner
-
-      - name: Validate multi-GPU test configuration
-        run: |
-          ./test-runner validate --config test/config/models.yaml
-
-      - name: Run multi-GPU tests
-        run: |
-          ./test-runner run --profile multi-gpu --config test/config/models.yaml --output test-report-multi-gpu --verbose
-        timeout-minutes: 60
-
-      - name: Check multi-GPU test results
-        run: |
-          if ! jq -e '.summary.failed == 0' test-report-multi-gpu.json; then
-            echo "Multi-GPU tests failed!"
-            jq '.results[] | select(.status == "FAILED")' test-report-multi-gpu.json
-            exit 1
-          fi
-          echo "All multi-GPU tests passed!"
-
-      - name: Display GPU memory usage
-        if: always()
-        run: |
-          echo "=== Final GPU Memory State ==="
-          nvidia-smi
-
-      - name: Upload multi-GPU test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: multi-gpu-test-results
-          path: |
-            test-report-multi-gpu.json
-            test-report-multi-gpu.md
-            ollama.log
-          retention-days: 30 # Keep longer for analysis
--- a/.github/workflows/tesla-k80-single-gpu-tests.yml
+++ b/.github/workflows/tesla-k80-single-gpu-tests.yml
@@ -1,94 +0,0 @@
-name: Tesla K80 Single-GPU Tests
-
-on:
-  workflow_dispatch: # Manual trigger only
-
-jobs:
-  test:
-    runs-on: self-hosted
-
-    timeout-minutes: 60
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Download ollama binary from latest build
-        uses: dawidd6/action-download-artifact@v6
-        with:
-          workflow: tesla-k80-ci.yml
-          name: ollama-binary
-          github_token: ${{ secrets.GITHUB_TOKEN }}
-          check_artifacts: true
-          search_artifacts: true
-
-      - name: Make ollama binary executable
-        run: |
-          chmod +x ollama
-          ls -lh ollama
-          ./ollama --version
-
-      - name: Build test-runner
-        run: |
-          cd cmd/test-runner
-          go mod init github.com/ollama/ollama/cmd/test-runner || true
-          go mod tidy
-          go build -o ../../test-runner .
-          cd ../..
-          ls -lh test-runner
-
-      - name: Validate test configuration
-        run: |
-          ./test-runner validate --config test/config/quick.yaml
-
-      - name: Run quick tests
-        run: |
-          ./test-runner run --profile quick --config test/config/quick.yaml --output test-report-quick --verbose
-        timeout-minutes: 10
-
-      - name: Check quick test results
-        run: |
-          if ! jq -e '.summary.failed == 0' test-report-quick.json; then
-            echo "Quick tests failed!"
-            jq '.results[] | select(.status == "FAILED")' test-report-quick.json
-            exit 1
-          fi
-          echo "Quick tests passed!"
-
-      - name: Upload quick test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: quick-test-results
-          path: |
-            test-report-quick.json
-            test-report-quick.md
-            ollama.log
-          retention-days: 7
-
-      - name: Run full tests (if quick tests passed)
-        if: success()
-        run: |
-          ./test-runner run --profile full --config test/config/models.yaml --output test-report-full --verbose
-        timeout-minutes: 45
-
-      - name: Check full test results
-        if: success()
-        run: |
-          if ! jq -e '.summary.failed == 0' test-report-full.json; then
-            echo "Full tests failed!"
-            jq '.results[] | select(.status == "FAILED")' test-report-full.json
-            exit 1
-          fi
-          echo "All tests passed!"
-
-      - name: Upload full test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: full-test-results
-          path: |
-            test-report-full.json
-            test-report-full.md
-            ollama.log
-          retention-days: 14
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -0,0 +1,287 @@
+name: test
+
+concurrency:
+  # For PRs, later CI runs preempt previous ones. e.g. a force push on a PR
+  # cancels running CI jobs and starts all new ones.
+  #
+  # For non-PR pushes, concurrency.group needs to be unique for every distinct
+  # CI run we want to have happen. Use run_id, which in practice means all
+  # non-PR CI runs will be allowed to run without preempting each other.
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
+  cancel-in-progress: true
+
+on:
+  pull_request:
+    paths:
+      - '**/*'
+      - '!docs/**'
+      - '!README.md'
+
+jobs:
+  changes:
+    runs-on: ubuntu-latest
+    outputs:
+      changed: ${{ steps.changes.outputs.changed }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - id: changes
+        run: |
+          changed() {
+            local BASE=${{ github.event.pull_request.base.sha }}
+            local HEAD=${{ github.event.pull_request.head.sha }}
+            local MERGE_BASE=$(git merge-base $BASE $HEAD)
+            git diff-tree -r --no-commit-id --name-only "$MERGE_BASE" "$HEAD" \
+              | xargs python3 -c "import sys; from pathlib import Path; print(any(Path(x).match(glob) for x in sys.argv[1:] for glob in '$*'.split(' ')))"
+          }
+
+          echo changed=$(changed 'llama/llama.cpp/**/*' 'ml/backend/ggml/ggml/**/*') | tee -a $GITHUB_OUTPUT
+
+  linux:
+    needs: [changes]
+    if: needs.changes.outputs.changed == 'True'
+    strategy:
+      matrix:
+        include:
+          - preset: CPU
+          - preset: CUDA
+            container: nvidia/cuda:13.0.0-devel-ubuntu22.04
+            flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
+          - preset: ROCm
+            container: rocm/dev-ubuntu-22.04:6.1.2
+            extra-packages: rocm-libs
+            flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_PREFIX_PATH=/opt/rocm'
+          - preset: Vulkan
+            container: ubuntu:22.04
+            extra-packages: >
+              mesa-vulkan-drivers vulkan-tools
+              libvulkan1 libvulkan-dev
+              vulkan-sdk cmake ccache g++ make
+    runs-on: linux
+    container: ${{ matrix.container }}
+    steps:
+      - uses: actions/checkout@v4
+      - run: |
+          [ -n "${{ matrix.container }}" ] || sudo=sudo
+          $sudo apt-get update
+          # Add LunarG Vulkan SDK apt repo for Ubuntu 22.04
+          if [ "${{ matrix.preset }}" = "Vulkan" ]; then
+            $sudo apt-get install -y --no-install-recommends wget gnupg ca-certificates software-properties-common
+            wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | $sudo gpg --dearmor -o /usr/share/keyrings/lunarg-archive-keyring.gpg
+            # Use signed-by to bind the repo to the installed keyring to avoid NO_PUBKEY
+            echo "deb [signed-by=/usr/share/keyrings/lunarg-archive-keyring.gpg]  https://packages.lunarg.com/vulkan/1.4.313 jammy main" | $sudo tee /etc/apt/sources.list.d/lunarg-vulkan-1.4.313-jammy.list > /dev/null
+            $sudo apt-get update
+          fi
+          $sudo apt-get install -y cmake ccache ${{ matrix.extra-packages }}
+          # Export VULKAN_SDK if provided by LunarG package (defensive)
+          if [ -d "/usr/lib/x86_64-linux-gnu/vulkan" ] && [ "${{ matrix.preset }}" = "Vulkan" ]; then
+            echo "VULKAN_SDK=/usr" >> $GITHUB_ENV
+          fi
+        env:
+          DEBIAN_FRONTEND: noninteractive
+      - uses: actions/cache@v4
+        with:
+          path: /github/home/.cache/ccache
+          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
+      - run: |
+          cmake --preset ${{ matrix.preset }} ${{ matrix.flags }}
+          cmake --build --preset ${{ matrix.preset }} --parallel
+
+  windows:
+    needs: [changes]
+    if: needs.changes.outputs.changed == 'True'
+    strategy:
+      matrix:
+        include:
+          - preset: CPU
+          - preset: CUDA
+            install: https://developer.download.nvidia.com/compute/cuda/13.0.0/local_installers/cuda_13.0.0_windows.exe
+            flags: '-DCMAKE_CUDA_ARCHITECTURES=80'
+            cuda-components:
+              - '"cudart"'
+              - '"nvcc"'
+              - '"cublas"'
+              - '"cublas_dev"'
+              - '"crt"'
+              - '"nvvm"'
+              - '"nvptxcompiler"'
+            cuda-version: '13.0'
+          - preset: ROCm
+            install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q4-WinSvr2022-For-HIP.exe
+            flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma" -DCMAKE_CXX_FLAGS="-parallel-jobs=4 -Wno-ignored-attributes -Wno-deprecated-pragma"'
+          - preset: Vulkan
+            install: https://sdk.lunarg.com/sdk/download/1.4.321.1/windows/vulkansdk-windows-X64-1.4.321.1.exe
+    runs-on: windows
+    steps:
+      - run: |
+          choco install -y --no-progress ccache ninja
+          ccache -o cache_dir=${{ github.workspace }}\.ccache
+      - if: matrix.preset == 'CUDA' || matrix.preset == 'ROCm' || matrix.preset == 'Vulkan'
+        id: cache-install
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
+            C:\Program Files\AMD\ROCm
+            C:\VulkanSDK
+          key: ${{ matrix.install }}
+      - if: matrix.preset == 'CUDA'
+        name: Install CUDA ${{ matrix.cuda-version }}
+        run: |
+          $ErrorActionPreference = "Stop"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
+            $subpackages = @(${{ join(matrix.cuda-components, ', ') }}) | Foreach-Object {"${_}_${{ matrix.cuda-version }}"}
+            Start-Process -FilePath .\install.exe -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
+          }
+
+          $cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
+          echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+      - if: matrix.preset == 'ROCm'
+        name: Install ROCm ${{ matrix.rocm-version }}
+        run: |
+          $ErrorActionPreference = "Stop"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
+            Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
+          }
+
+          $hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
+          echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+          echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "HIPCXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "HIP_PLATFORM=amd" | Out-File -FilePath $env:GITHUB_ENV -Append
+          echo "CMAKE_PREFIX_PATH=$hipPath" | Out-File -FilePath $env:GITHUB_ENV -Append
+      - if: matrix.preset == 'Vulkan'
+        name: Install Vulkan
+        run: |
+          $ErrorActionPreference = "Stop"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
+            Start-Process -FilePath .\install.exe -ArgumentList "-c","--am","--al","in" -NoNewWindow -Wait
+          }
+
+          $vulkanPath = (Resolve-Path "C:\VulkanSDK\*").path
+          echo "$vulkanPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+          echo "VULKAN_SDK=$vulkanPath" >> $env:GITHUB_ENV
+      - if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
+        uses: actions/cache/save@v4
+        with:
+          path: |
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
+            C:\Program Files\AMD\ROCm
+          key: ${{ matrix.install }}
+      - uses: actions/checkout@v4
+      - uses: actions/cache@v4
+        with:
+          path: ${{ github.workspace }}\.ccache
+          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
+      - run: |
+          Import-Module 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
+          Enter-VsDevShell -VsInstallPath 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise' -SkipAutomaticLocation  -DevCmdArguments '-arch=x64 -no_logo'
+          cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }}
+          cmake --build --parallel --preset "${{ matrix.preset }}"
+        env:
+          CMAKE_GENERATOR: Ninja
+
+  go_mod_tidy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: check that 'go mod tidy' is clean
+        run: go mod tidy --diff || (echo "Please run 'go mod tidy'." && exit 1)
+
+  test:
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+    runs-on: ${{ matrix.os }}
+    env:
+      CGO_ENABLED: '1'
+      GOEXPERIMENT: 'synctest'
+    steps:
+      - name: checkout
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # 4.2.2
+
+      - name: cache restore
+        uses: actions/cache/restore@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
+        with:
+          # Note: unlike the other setups, this is only grabbing the mod download
+          # cache, rather than the whole mod directory, as the download cache
+          # contains zips that can be unpacked in parallel faster than they can be
+          # fetched and extracted by tar
+          path: |
+            ~/.cache/go-build
+            ~/go/pkg/mod/cache
+            ~\AppData\Local\go-build
+          # NOTE: The -3- here should be incremented when the scheme of data to be
+          # cached changes (e.g. path above changes).
+          key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}
+          restore-keys: |
+            ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}
+            ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-
+
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          # The caching strategy of setup-go is less than ideal, and wastes
+          # time by not saving artifacts due to small failures like the linter
+          # complaining, etc. This means subsequent runs have to rebuild their world
+          # again until all checks pass. For instance, if you misspell a word,
+          # you're punished until you fix it. This is more hostile than
+          # helpful.
+          cache: false
+
+          go-version-file: go.mod
+
+      # It is tempting to run this in a platform independent way, but the past
+      # shows this codebase will see introductions of platform specific code
+      # generation, and so we need to check this per platform to ensure we
+      # don't abuse go generate on specific platforms.
+      - name: check that 'go generate' is clean
+        if: always()
+        run: |
+          go generate ./...
+          git diff --name-only --exit-code || (echo "Please run 'go generate ./...'." && exit 1)
+
+      - name: go test
+        if: always()
+        run: go test -count=1 -benchtime=1x ./...
+
+      # TODO(bmizerany): replace this heavy tool with just the
+      # tools/checks/binaries we want and then make them all run in parallel
+      # across jobs, not on a single tiny vm on Github Actions.
+      - uses: golangci/golangci-lint-action@v6
+        with:
+          args: --timeout 10m0s -v
+
+      - name: cache save
+        # Always save the cache, even if the job fails. The artifacts produced
+        # during the building of test binaries are not all for naught. They can
+        # be used to speed up subsequent runs.
+        if: always()
+
+        uses: actions/cache/save@1bd1e32a3bdc45362d1e726936510720a7c30a57 # v4.2.0
+        with:
+          # Note: unlike the other setups, this is only grabbing the mod download
+          # cache, rather than the whole mod directory, as the download cache
+          # contains zips that can be unpacked in parallel faster than they can be
+          # fetched and extracted by tar
+          path: |
+            ~/.cache/go-build
+            ~/go/pkg/mod/cache
+            ~\AppData\Local\go-build
+          # NOTE: The -3- here should be incremented when the scheme of data to be
+          # cached changes (e.g. path above changes).
+          key: ${{ github.job }}-${{ runner.os }}-${{ matrix.goarch }}-${{ matrix.buildflags }}-go-3-${{ hashFiles('**/go.sum') }}-${{ github.run_id }}
+
+  patches:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Verify patches apply cleanly and do not change files
+        run: |
+          make -f Makefile.sync clean checkout apply-patches sync
+          git diff --compact-summary --exit-code
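Note on the `changes` job in test.yaml: the `linux` and `windows` jobs run only when the inline `python3` one-liner prints `True`, which is why their `if:` conditions compare against the capitalised string `'True'`. A minimal sketch of that check, using the two globs from the workflow and hypothetical file lists; `PurePosixPath` stands in for the `Path` the workflow uses on `ubuntu-latest`:

```python
from pathlib import PurePosixPath

# Globs copied from the workflow's call to changed().
GLOBS = ["llama/llama.cpp/**/*", "ml/backend/ggml/ggml/**/*"]


def changed(paths, globs=GLOBS):
    """Return True if any path matches any glob, mirroring the one-liner."""
    return any(PurePosixPath(p).match(g) for p in paths for g in globs)


# Hypothetical changed-file lists for illustration only.
print(changed(["ml/backend/ggml/ggml/src/ggml.c", "README.md"]))  # True
print(changed(["docs/api.md", "server/routes.go"]))               # False
```

Per the Python documentation, `PurePath.match()` treats `**` as a non-recursive `*` and matches relative patterns from the right, so files nested more deeply than one directory below these prefixes may not be detected by this check.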