GitHub Actions Workflows - Tesla K80 Testing

Overview

This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.

Workflows

1. tesla-k80-ci.yml - Build Workflow

Trigger: Manual only (workflow_dispatch)

Purpose: Build the ollama binary with CUDA 3.7 support

Steps:

  1. Checkout code
  2. Clean previous build artifacts
  3. Configure CMake with GCC 10 and CUDA 11
  4. Build C++/CUDA components
  5. Build Go binary
  6. Verify binary
  7. Upload binary artifact

Artifacts: ollama-binary-{sha} - Compiled binary for the commit

2. tesla-k80-tests.yml - Test Workflow

Trigger: Manual only (workflow_dispatch)

Purpose: Run comprehensive tests using the test framework

Steps:

  1. Checkout code
  2. Verify ollama binary exists
  3. Run test-runner tool (see below)
  4. Upload test results and logs

Artifacts: Test reports, logs, analysis results

Test Framework Architecture

TODO: Implement Go-based Test Runner

Goal: Create a dedicated Go test orchestrator at cmd/test-runner/main.go that manages the complete test lifecycle for Tesla K80.

Task Breakdown

  1. Design test configuration format

    • Create test/config/models.yaml - List of models to test with parameters
    • Define model test spec: name, size, expected behavior, test prompts
    • Support test profiles: quick (small models), full (all sizes), and stress
  2. Implement server lifecycle management (a minimal sketch follows this task list)

    • Start ./ollama serve as subprocess
    • Capture stdout/stderr to log file
    • Monitor server readiness (health check API)
    • Graceful shutdown on test completion or failure
    • Timeout handling for hung processes
  3. Implement real-time log monitoring (sketched after this list, together with the pattern checks from step 5)

    • Goroutine to tail server logs
    • Pattern matching for critical events:
      • GPU initialization messages
      • Model loading progress
      • CUDA errors or warnings
      • Memory allocation failures
      • CPU fallback warnings
    • Store events for later analysis
  4. Implement model testing logic (illustrated by the test.go sketch after the File Structure section)

    • For each model in config:
      • Pull model via API (if not cached)
      • Wait for model ready
      • Parse logs for GPU loading confirmation
      • Send chat API request with test prompt
      • Validate response (not empty, reasonable length, coherent)
      • Check logs for errors during inference
      • Record timing metrics (load time, first token, completion)
  5. Implement test validation

    • GPU loading verification:
      • Parse logs for "loaded model" + GPU device ID
      • Check for "offloading N layers to GPU"
      • Verify no "using CPU backend" messages
    • Response quality checks:
      • Response not empty
      • Minimum token count (avoid truncated responses)
      • JSON structure valid (for API responses)
    • Error detection:
      • No CUDA errors in logs
      • No OOM (out of memory) errors
      • No model loading failures
  6. Implement structured reporting

    • Generate JSON report with:
      • Test summary (pass/fail/skip counts)
      • Per-model results (status, timings, errors)
      • Log excerpts for failures
      • GPU metrics (memory usage, utilization)
    • Generate human-readable summary (markdown/text)
    • Exit code: 0 for all pass, 1 for any failure
  7. Implement CLI interface

    • Flags:
      • --config - Path to test config file
      • --profile - Test profile to run (quick/full/stress)
      • --ollama-bin - Path to ollama binary (default: ./ollama)
      • --output - Report output path
      • --verbose - Detailed logging
      • --keep-models - Don't delete models after test
    • Subcommands:
      • run - Run tests
      • validate - Validate config only
      • list - List available test profiles/models
  8. Update GitHub Actions workflow

    • Build test-runner binary in CI workflow
    • Run test-runner in test workflow
    • Parse JSON report for pass/fail
    • Upload structured results as artifacts
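
As a starting point for step 2, here is a minimal sketch of what server.go could look like. It assumes the server listens on the default http://127.0.0.1:11434 and answers a plain GET / once it is ready; the binary path, log path, and timeout values are illustrative rather than decided.

// server.go (sketch): start ./ollama serve as a subprocess, wait for readiness,
// and shut it down gracefully with a timeout.
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/exec"
	"syscall"
	"time"
)

type Server struct {
	cmd     *exec.Cmd
	logFile *os.File
}

// StartServer launches `ollama serve` and redirects stdout/stderr to logPath.
func StartServer(binPath, logPath string) (*Server, error) {
	logFile, err := os.Create(logPath)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(binPath, "serve")
	cmd.Stdout = logFile
	cmd.Stderr = logFile
	if err := cmd.Start(); err != nil {
		logFile.Close()
		return nil, err
	}
	return &Server{cmd: cmd, logFile: logFile}, nil
}

// WaitReady polls the server until it responds or the context expires.
func (s *Server) WaitReady(ctx context.Context, baseURL string) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("server not ready: %w", ctx.Err())
		case <-ticker.C:
			resp, err := http.Get(baseURL + "/")
			if err == nil {
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					return nil
				}
			}
		}
	}
}

// Stop sends SIGTERM for a graceful shutdown and kills the process if it hangs.
func (s *Server) Stop(timeout time.Duration) error {
	defer s.logFile.Close()
	_ = s.cmd.Process.Signal(syscall.SIGTERM)
	done := make(chan error, 1)
	go func() { done <- s.cmd.Wait() }()
	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		_ = s.cmd.Process.Kill()
		return fmt.Errorf("server did not exit within %s and was killed", timeout)
	}
}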

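A companion sketch for steps 3 and 5: a goroutine that follows the server log and records lines matching the success/failure patterns from the test configuration. The Event shape, method names, and polling interval are illustrative choices, not existing code.

// monitor.go (sketch): follow the server log in a goroutine and record lines
// matching the success/failure patterns from the test configuration.
package main

import (
	"bufio"
	"os"
	"regexp"
	"sync"
	"time"
)

// Event is a single matched log line.
type Event struct {
	Kind string // "success" or "failure"
	Line string
}

type Monitor struct {
	mu      sync.Mutex
	events  []Event
	success []*regexp.Regexp
	failure []*regexp.Regexp
	stop    chan struct{}
}

// NewMonitor compiles the patterns; they mirror check_patterns in models.yaml.
func NewMonitor(success, failure []string) *Monitor {
	m := &Monitor{stop: make(chan struct{})}
	for _, p := range success {
		m.success = append(m.success, regexp.MustCompile(p))
	}
	for _, p := range failure {
		m.failure = append(m.failure, regexp.MustCompile(p))
	}
	return m
}

// Follow tails logPath until Stop is called, polling for newly appended output.
func (m *Monitor) Follow(logPath string) error {
	f, err := os.Open(logPath)
	if err != nil {
		return err
	}
	go func() {
		defer f.Close()
		r := bufio.NewReader(f)
		partial := ""
		for {
			select {
			case <-m.stop:
				return
			default:
			}
			chunk, err := r.ReadString('\n')
			if err != nil {
				partial += chunk // incomplete line at EOF; keep it and retry
				time.Sleep(200 * time.Millisecond)
				continue
			}
			m.record(partial + chunk)
			partial = ""
		}
	}()
	return nil
}

// record stores the line under the first pattern group that matches it.
func (m *Monitor) record(line string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, re := range m.success {
		if re.MatchString(line) {
			m.events = append(m.events, Event{Kind: "success", Line: line})
			return
		}
	}
	for _, re := range m.failure {
		if re.MatchString(line) {
			m.events = append(m.events, Event{Kind: "failure", Line: line})
			return
		}
	}
}

// Failures returns matched failure lines; any entry fails the GPU/error checks.
func (m *Monitor) Failures() []Event {
	m.mu.Lock()
	defer m.mu.Unlock()
	var out []Event
	for _, e := range m.events {
		if e.Kind == "failure" {
			out = append(out, e)
		}
	}
	return out
}

func (m *Monitor) Stop() { close(m.stop) }
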
File Structure

cmd/test-runner/
  main.go              # CLI entry point
  config.go            # Config loading and validation
  server.go            # Server lifecycle management
  monitor.go           # Log monitoring and parsing
  test.go              # Model test execution
  validate.go          # Response and log validation
  report.go            # Test report generation

test/config/
  models.yaml          # Default test configuration
  quick.yaml           # Quick test profile (small models)
  full.yaml            # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
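
To make the responsibilities of test.go concrete (step 4 above), here is a rough sketch of running a single prompt against a single model over the HTTP API. It assumes the standard Ollama REST endpoints /api/pull and /api/generate with streaming disabled, and uses a simple word count as a stand-in for token counting; RunModelTest, ModelResult, and the helper names are placeholders rather than existing code.

// test.go (sketch): run one prompt against one model over the HTTP API and
// apply the minimum-length check from the configuration.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

type ModelResult struct {
	Model    string        `json:"model"`
	Prompt   string        `json:"prompt"`
	Passed   bool          `json:"passed"`
	Error    string        `json:"error,omitempty"`
	Duration time.Duration `json:"duration_ns"`
	Response string        `json:"response,omitempty"`
}

// postJSON sends a JSON payload and optionally decodes the JSON reply into out.
func postJSON(url string, payload any, out any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	if out == nil {
		return nil
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// RunModelTest pulls the model if needed, sends one prompt, and validates the reply.
func RunModelTest(baseURL, model, prompt string, minTokens int) ModelResult {
	res := ModelResult{Model: model, Prompt: prompt}

	// Pull the model (a no-op if it is already cached locally).
	if err := postJSON(baseURL+"/api/pull",
		map[string]any{"name": model, "stream": false}, nil); err != nil {
		res.Error = "pull failed: " + err.Error()
		return res
	}

	// Send a single non-streaming generate request and time it.
	start := time.Now()
	var gen struct {
		Response string `json:"response"`
	}
	if err := postJSON(baseURL+"/api/generate",
		map[string]any{"model": model, "prompt": prompt, "stream": false}, &gen); err != nil {
		res.Error = "generate failed: " + err.Error()
		return res
	}
	res.Duration = time.Since(start)
	res.Response = gen.Response

	// Crude token proxy: whitespace-separated words in the reply.
	if len(strings.Fields(gen.Response)) < minTokens {
		res.Error = fmt.Sprintf("response shorter than %d tokens", minTokens)
		return res
	}
	res.Passed = true
	return res
}

Note that a non-streaming call only yields total wall-clock time; the finer-grained metrics from step 4 (load time, first token) would have to come from the streaming API or from timing fields in the response, if available.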

Example Test Configuration (models.yaml)

profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s
      
  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s
      
      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s
      
      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
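
One possible Go mapping of this configuration for config.go, assuming gopkg.in/yaml.v3 as the YAML library; the struct and field names simply mirror the example above and are not an existing API.

// config.go (sketch): Go types mirroring the models.yaml layout above.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

type Config struct {
	Profiles   map[string]Profile `yaml:"profiles"`
	Validation Validation         `yaml:"validation"`
}

type Profile struct {
	Models []ModelSpec `yaml:"models"`
}

type ModelSpec struct {
	Name              string   `yaml:"name"`
	Prompts           []string `yaml:"prompts"`
	MinResponseTokens int      `yaml:"min_response_tokens"`
	Timeout           string   `yaml:"timeout"` // e.g. "30s", parsed with time.ParseDuration
}

type Validation struct {
	GPURequired   bool `yaml:"gpu_required"`
	CheckPatterns struct {
		Success []string `yaml:"success"`
		Failure []string `yaml:"failure"`
	} `yaml:"check_patterns"`
}

// LoadConfig reads the YAML file and applies a basic sanity check.
func LoadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	if len(cfg.Profiles) == 0 {
		return nil, fmt.Errorf("%s: no profiles defined", path)
	}
	return &cfg, nil
}

With this shape, the validate subcommand can simply call LoadConfig and report any error, and list can print the keys of Profiles.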

Example Test Runner Usage

# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list

Integration with GitHub Actions

- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
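
For reference, a sketch of the report.go output that the jq step above assumes, with pass/fail counts under .summary and failed results under .failures. It reuses the ModelResult type from the test.go sketch; the exact field set is a suggestion, not a fixed schema.

// report.go (sketch): write the JSON report consumed by the workflow's jq check.
package main

import (
	"encoding/json"
	"os"
)

type Summary struct {
	Passed  int `json:"passed"`
	Failed  int `json:"failed"`
	Skipped int `json:"skipped"`
}

type Report struct {
	Summary  Summary       `json:"summary"`
	Results  []ModelResult `json:"results"`
	Failures []ModelResult `json:"failures"`
}

// WriteReport aggregates per-model results and writes the JSON report to path.
func WriteReport(path string, results []ModelResult) error {
	var rep Report
	rep.Results = results
	for _, r := range results {
		if r.Passed {
			rep.Summary.Passed++
		} else {
			rep.Summary.Failed++
			rep.Failures = append(rep.Failures, r)
		}
	}
	data, err := json.MarshalIndent(rep, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

With this shape, the exit-code rule from the plan (0 for all pass, 1 for any failure) reduces to checking rep.Summary.Failed in main.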

Prerequisites

Self-Hosted Runner Setup

  1. Install GitHub Actions Runner on your Tesla K80 machine:

    mkdir -p ~/actions-runner && cd ~/actions-runner
    curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
      https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
    tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz
    
    # Configure (use token from GitHub)
    ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN
    
    # Install and start as a service
    sudo ./svc.sh install
    sudo ./svc.sh start
    
  2. Verify runner environment has:

    • CUDA 11.4+ toolkit installed
    • GCC 10 at /usr/local/bin/gcc and /usr/local/bin/g++
    • CMake 3.24+
    • Go 1.24+
    • NVIDIA driver with Tesla K80 support
    • Network access to download models

Security Considerations

  • Self-hosted runners should be on a secure, isolated machine
  • Consider using runner groups to restrict repository access
  • Do not use self-hosted runners for public repositories (untrusted PRs)
  • Keep runner software updated