# GitHub Actions Workflows - Tesla K80 Testing

## Overview
This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.
## Workflows

### 1. `tesla-k80-ci.yml` - Build Workflow

**Trigger:** Manual only (`workflow_dispatch`)

**Purpose:** Build the ollama binary with CUDA 3.7 support

**Steps:**
- Checkout code
- Clean previous build artifacts
- Configure CMake with GCC 10 and CUDA 11
- Build C++/CUDA components
- Build Go binary
- Verify binary
- Upload binary artifact
**Artifacts:** `ollama-binary-{sha}` - compiled binary for the commit

### 2. `tesla-k80-tests.yml` - Test Workflow

**Trigger:** Manual only (`workflow_dispatch`)

**Purpose:** Run comprehensive tests using the test framework

**Steps:**
- Checkout code
- Verify ollama binary exists
- Run test-runner tool (see below)
- Upload test results and logs
**Artifacts:** Test reports, logs, analysis results

## Test Framework Architecture

### TODO: Implement Go-based Test Runner

**Goal:** Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.

### Task Breakdown
- [ ] Design test configuration format (see the config-loading sketch below)
  - Create `test/config/models.yaml` - List of models to test with parameters
  - Define model test spec: name, size, expected behavior, test prompts
  - Support test profiles: quick (small models), full (all sizes), stress test
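  The struct layout below is a minimal sketch of what `config.go` could look like, assuming the YAML format from the example configuration later in this document and the `gopkg.in/yaml.v3` package; all type and function names here are illustrative, not final.

  ```go
  package main

  import (
  	"fmt"
  	"os"

  	"gopkg.in/yaml.v3"
  )

  // ModelSpec describes one model entry, mirroring the example models.yaml.
  type ModelSpec struct {
  	Name              string   `yaml:"name"`
  	Prompts           []string `yaml:"prompts"`
  	MinResponseTokens int      `yaml:"min_response_tokens"`
  	Timeout           string   `yaml:"timeout"` // e.g. "30s", parsed later with time.ParseDuration
  }

  // Profile groups the models that belong to one test profile (quick/full/stress).
  type Profile struct {
  	Models []ModelSpec `yaml:"models"`
  }

  // Config is the top-level structure of test/config/models.yaml.
  type Config struct {
  	Profiles   map[string]Profile `yaml:"profiles"`
  	Validation struct {
  		GPURequired   bool `yaml:"gpu_required"`
  		CheckPatterns struct {
  			Success []string `yaml:"success"`
  			Failure []string `yaml:"failure"`
  		} `yaml:"check_patterns"`
  	} `yaml:"validation"`
  }

  // LoadConfig reads and parses the YAML test configuration.
  func LoadConfig(path string) (*Config, error) {
  	data, err := os.ReadFile(path)
  	if err != nil {
  		return nil, fmt.Errorf("read config: %w", err)
  	}
  	var cfg Config
  	if err := yaml.Unmarshal(data, &cfg); err != nil {
  		return nil, fmt.Errorf("parse config: %w", err)
  	}
  	return &cfg, nil
  }
  ```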
- [ ] Implement server lifecycle management (see the lifecycle sketch below)
  - Start `./ollama serve` as subprocess
  - Capture stdout/stderr to log file
  - Monitor server readiness (health check API)
  - Graceful shutdown on test completion or failure
  - Timeout handling for hung processes
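  A rough sketch of the lifecycle pieces in `server.go`. It assumes `GET /api/version` as the readiness probe (any lightweight endpoint would do) and a SIGTERM-then-kill shutdown; function names are illustrative.

  ```go
  package main

  import (
  	"fmt"
  	"net/http"
  	"os"
  	"os/exec"
  	"syscall"
  	"time"
  )

  // StartServer launches `ollama serve` as a subprocess and redirects its
  // output to a log file that the monitor goroutine can tail.
  func StartServer(binPath, logPath string) (*exec.Cmd, error) {
  	logFile, err := os.Create(logPath)
  	if err != nil {
  		return nil, err
  	}
  	cmd := exec.Command(binPath, "serve")
  	cmd.Stdout = logFile
  	cmd.Stderr = logFile
  	if err := cmd.Start(); err != nil {
  		return nil, err
  	}
  	return cmd, nil
  }

  // WaitReady polls the HTTP API until the server answers or the deadline hits.
  func WaitReady(baseURL string, timeout time.Duration) error {
  	deadline := time.Now().Add(timeout)
  	for time.Now().Before(deadline) {
  		resp, err := http.Get(baseURL + "/api/version") // assumed readiness probe
  		if err == nil {
  			resp.Body.Close()
  			if resp.StatusCode == http.StatusOK {
  				return nil
  			}
  		}
  		time.Sleep(500 * time.Millisecond)
  	}
  	return fmt.Errorf("server not ready after %s", timeout)
  }

  // StopServer asks the server to exit and falls back to a hard kill on timeout.
  func StopServer(cmd *exec.Cmd, timeout time.Duration) error {
  	_ = cmd.Process.Signal(syscall.SIGTERM)
  	done := make(chan error, 1)
  	go func() { done <- cmd.Wait() }()
  	select {
  	case err := <-done:
  		return err
  	case <-time.After(timeout):
  		_ = cmd.Process.Kill()
  		return fmt.Errorf("server did not exit within %s, killed", timeout)
  	}
  }
  ```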
- [ ] Implement real-time log monitoring (see the monitor sketch below)
  - Goroutine to tail server logs
  - Pattern matching for critical events:
    - GPU initialization messages
    - Model loading progress
    - CUDA errors or warnings
    - Memory allocation failures
    - CPU fallback warnings
  - Store events for later analysis
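  One possible shape for `monitor.go`: a polling tail (no inotify dependency) that records lines matching the configured success/failure patterns. The `Monitor` and `LogEvent` types are illustrative.

  ```go
  package main

  import (
  	"bufio"
  	"os"
  	"regexp"
  	"sync"
  	"time"
  )

  // LogEvent records a log line that matched one of the watched patterns.
  type LogEvent struct {
  	Time    time.Time
  	Kind    string // e.g. "success" or "failure"
  	Pattern string
  	Line    string
  }

  // Monitor tails the server log and collects matching events.
  type Monitor struct {
  	mu     sync.Mutex
  	events []LogEvent
  }

  // Watch follows the log file and records lines matching the given patterns.
  // It stops when the stop channel is closed.
  func (m *Monitor) Watch(logPath string, patterns map[string]*regexp.Regexp, stop <-chan struct{}) error {
  	f, err := os.Open(logPath)
  	if err != nil {
  		return err
  	}
  	defer f.Close()
  	reader := bufio.NewReader(f)
  	var pending string
  	for {
  		select {
  		case <-stop:
  			return nil
  		default:
  		}
  		chunk, err := reader.ReadString('\n')
  		pending += chunk
  		if err != nil {
  			// No complete line yet; wait for more server output (simple polling tail).
  			time.Sleep(200 * time.Millisecond)
  			continue
  		}
  		line := pending
  		pending = ""
  		for kind, re := range patterns {
  			if re.MatchString(line) {
  				m.mu.Lock()
  				m.events = append(m.events, LogEvent{Time: time.Now(), Kind: kind, Pattern: re.String(), Line: line})
  				m.mu.Unlock()
  			}
  		}
  	}
  }

  // Events returns a snapshot of collected events for later analysis.
  func (m *Monitor) Events() []LogEvent {
  	m.mu.Lock()
  	defer m.mu.Unlock()
  	return append([]LogEvent(nil), m.events...)
  }
  ```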
- [ ] Implement model testing logic (see the chat request sketch below)
  - For each model in config:
    - Pull model via API (if not cached)
    - Wait for model ready
    - Parse logs for GPU loading confirmation
    - Send chat API request with test prompt
    - Validate response (not empty, reasonable length, coherent)
    - Check logs for errors during inference
    - Record timing metrics (load time, first token, completion)
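  A sketch of the per-prompt request in `test.go`, assuming the `/api/chat` endpoint with `stream: false` so the reply arrives as a single JSON object; the field names follow the public Ollama API but should be checked against the server version in use.

  ```go
  package main

  import (
  	"bytes"
  	"encoding/json"
  	"fmt"
  	"net/http"
  	"time"
  )

  // chatRequest/chatResponse model only the subset of the /api/chat payload
  // the runner needs.
  type chatRequest struct {
  	Model    string        `json:"model"`
  	Messages []chatMessage `json:"messages"`
  	Stream   bool          `json:"stream"`
  }

  type chatMessage struct {
  	Role    string `json:"role"`
  	Content string `json:"content"`
  }

  type chatResponse struct {
  	Message chatMessage `json:"message"`
  }

  // RunPrompt sends one test prompt to the server and returns the response
  // text plus the wall-clock completion time.
  func RunPrompt(baseURL, model, prompt string, timeout time.Duration) (string, time.Duration, error) {
  	body, _ := json.Marshal(chatRequest{
  		Model:    model,
  		Messages: []chatMessage{{Role: "user", Content: prompt}},
  		Stream:   false,
  	})
  	client := &http.Client{Timeout: timeout}
  	start := time.Now()
  	resp, err := client.Post(baseURL+"/api/chat", "application/json", bytes.NewReader(body))
  	if err != nil {
  		return "", 0, err
  	}
  	defer resp.Body.Close()
  	if resp.StatusCode != http.StatusOK {
  		return "", 0, fmt.Errorf("chat request failed: %s", resp.Status)
  	}
  	var out chatResponse
  	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
  		return "", 0, err
  	}
  	return out.Message.Content, time.Since(start), nil
  }
  ```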
- [ ] Implement test validation (see the validation sketch below)
  - GPU loading verification:
    - Parse logs for "loaded model" + GPU device ID
    - Check for "offloading N layers to GPU"
    - Verify no "using CPU backend" messages
  - Response quality checks:
    - Response not empty
    - Minimum token count (avoid truncated responses)
    - JSON structure valid (for API responses)
  - Error detection:
    - No CUDA errors in logs
    - No OOM (out of memory) errors
    - No model loading failures
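  A sketch of the checks in `validate.go`, reusing the hypothetical `LogEvent` type from the monitoring sketch above. Token count is approximated by whitespace splitting, which is only meant as a floor check.

  ```go
  package main

  import (
  	"fmt"
  	"strings"
  )

  // ValidateResponse applies the response quality checks from the config:
  // non-empty and above a minimum token count (approximated by word count).
  func ValidateResponse(response string, minTokens int) error {
  	if strings.TrimSpace(response) == "" {
  		return fmt.Errorf("empty response")
  	}
  	if got := len(strings.Fields(response)); got < minTokens {
  		return fmt.Errorf("response too short: %d tokens, want at least %d", got, minTokens)
  	}
  	return nil
  }

  // ValidateLogs checks the collected log events: at least one success pattern
  // (GPU offload confirmation) and no failure patterns (CUDA errors, OOM,
  // CPU fallback).
  func ValidateLogs(events []LogEvent, gpuRequired bool) error {
  	sawSuccess := false
  	for _, ev := range events {
  		switch ev.Kind {
  		case "failure":
  			return fmt.Errorf("failure pattern %q matched: %s", ev.Pattern, strings.TrimSpace(ev.Line))
  		case "success":
  			sawSuccess = true
  		}
  	}
  	if gpuRequired && !sawSuccess {
  		return fmt.Errorf("no GPU loading confirmation found in logs")
  	}
  	return nil
  }
  ```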
- [ ] Implement structured reporting (see the report sketch below)
  - Generate JSON report with:
    - Test summary (pass/fail/skip counts)
    - Per-model results (status, timings, errors)
    - Log excerpts for failures
    - GPU metrics (memory usage, utilization)
  - Generate human-readable summary (markdown/text)
  - Exit code: 0 for all pass, 1 for any failure
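  One possible report shape for `report.go`; the `summary.failed` and `failures` fields line up with the `jq` check in the workflow snippet further down. Field names are illustrative.

  ```go
  package main

  import (
  	"encoding/json"
  	"os"
  	"time"
  )

  // Report is the JSON document the runner writes.
  type Report struct {
  	Summary struct {
  		Passed  int `json:"passed"`
  		Failed  int `json:"failed"`
  		Skipped int `json:"skipped"`
  	} `json:"summary"`
  	Results  []ModelResult `json:"results"`
  	Failures []ModelResult `json:"failures,omitempty"`
  }

  // ModelResult captures the outcome and timings for one model.
  type ModelResult struct {
  	Model       string        `json:"model"`
  	Status      string        `json:"status"` // "pass", "fail", or "skip"
  	LoadTime    time.Duration `json:"load_time_ns"`
  	Completion  time.Duration `json:"completion_ns"`
  	Error       string        `json:"error,omitempty"`
  	LogExcerpts []string      `json:"log_excerpts,omitempty"`
  }

  // Write serializes the report and returns the exit code the CLI should use:
  // 0 when everything passed, 1 when any model failed.
  func (r *Report) Write(path string) (int, error) {
  	data, err := json.MarshalIndent(r, "", "  ")
  	if err != nil {
  		return 1, err
  	}
  	if err := os.WriteFile(path, data, 0o644); err != nil {
  		return 1, err
  	}
  	if r.Summary.Failed > 0 {
  		return 1, nil
  	}
  	return 0, nil
  }
  ```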
- [ ] Implement CLI interface (see the CLI skeleton below)
  - Flags:
    - `--config` - Path to test config file
    - `--profile` - Test profile to run (quick/full/stress)
    - `--ollama-bin` - Path to ollama binary (default: `./ollama`)
    - `--output` - Report output path
    - `--verbose` - Detailed logging
    - `--keep-models` - Don't delete models after test
  - Subcommands:
    - `run` - Run tests
    - `validate` - Validate config only
    - `list` - List available test profiles/models
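  A skeleton for the CLI in `main.go`, using only the standard `flag` package (which accepts both `-flag` and `--flag`); the subcommand bodies are placeholders to be wired to the other components.

  ```go
  package main

  import (
  	"flag"
  	"fmt"
  	"os"
  )

  func main() {
  	if len(os.Args) < 2 {
  		fmt.Fprintln(os.Stderr, "usage: test-runner <run|validate|list> [flags]")
  		os.Exit(2)
  	}

  	fs := flag.NewFlagSet(os.Args[1], flag.ExitOnError)
  	configPath := fs.String("config", "test/config/models.yaml", "path to test config file")
  	profile := fs.String("profile", "quick", "test profile to run (quick/full/stress)")
  	ollamaBin := fs.String("ollama-bin", "./ollama", "path to ollama binary")
  	output := fs.String("output", "test-report.json", "report output path")
  	verbose := fs.Bool("verbose", false, "detailed logging")
  	keepModels := fs.Bool("keep-models", false, "don't delete models after test")
  	fs.Parse(os.Args[2:])

  	switch os.Args[1] {
  	case "run":
  		// Wire the pieces together: load config, start server, run tests, write report.
  		fmt.Printf("run: config=%s profile=%s bin=%s output=%s verbose=%v keep=%v\n",
  			*configPath, *profile, *ollamaBin, *output, *verbose, *keepModels)
  	case "validate":
  		fmt.Printf("validate: config=%s\n", *configPath)
  	case "list":
  		fmt.Printf("list profiles from %s\n", *configPath)
  	default:
  		fmt.Fprintf(os.Stderr, "unknown subcommand %q\n", os.Args[1])
  		os.Exit(2)
  	}
  }
  ```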
- [ ] Update GitHub Actions workflow
  - Build test-runner binary in CI workflow
  - Run test-runner in test workflow
  - Parse JSON report for pass/fail
  - Upload structured results as artifacts
## File Structure

```
cmd/test-runner/
  main.go       # CLI entry point
  config.go     # Config loading and validation
  server.go     # Server lifecycle management
  monitor.go    # Log monitoring and parsing
  test.go       # Model test execution
  validate.go   # Response and log validation
  report.go     # Test report generation

test/config/
  models.yaml   # Default test configuration
  quick.yaml    # Quick test profile (small models)
  full.yaml     # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```
### Example Test Configuration (`models.yaml`)

```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s

  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s
      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s
      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```
### Example Test Runner Usage

```bash
# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list
```
## Integration with GitHub Actions

```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```
## Prerequisites

### Self-Hosted Runner Setup

1. Install the GitHub Actions runner on your Tesla K80 machine:

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```

2. Verify the runner environment has:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models
## Security Considerations
- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep runner software updated