# GitHub Actions Workflows - Tesla K80 Testing
## Overview
This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.
## Workflows
### 1. tesla-k80-ci.yml - Build Workflow
**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Build the ollama binary with CUDA 3.7 support

**Steps**:
1. Checkout code
2. Clean previous build artifacts
3. Configure CMake with GCC 10 and CUDA 11
4. Build C++/CUDA components
5. Build Go binary
6. Verify binary
7. Upload binary artifact

**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit
### 2. tesla-k80-tests.yml - Test Workflow
**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Run comprehensive tests using the test framework

**Steps**:
1. Checkout code
2. Verify ollama binary exists
3. Run test-runner tool (see below)
4. Upload test results and logs

**Artifacts**: Test reports, logs, analysis results
## Test Framework Architecture
### TODO: Implement Go-based Test Runner
**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.
#### Task Breakdown
1. **Design test configuration format**
   - Create `test/config/models.yaml` - List of models to test with parameters
   - Define model test spec: name, size, expected behavior, test prompts
   - Support test profiles: quick (small models), full (all sizes), stress test
2. **Implement server lifecycle management** (sketch below)
   - Start `./ollama serve` as subprocess
   - Capture stdout/stderr to log file
   - Monitor server readiness (health check API)
   - Graceful shutdown on test completion or failure
   - Timeout handling for hung processes
3. **Implement real-time log monitoring** (sketch below)
   - Goroutine to tail server logs
   - Pattern matching for critical events:
     - GPU initialization messages
     - Model loading progress
     - CUDA errors or warnings
     - Memory allocation failures
     - CPU fallback warnings
   - Store events for later analysis
4. **Implement model testing logic** (sketch below)
   - For each model in config:
     - Pull model via API (if not cached)
     - Wait for model ready
     - Parse logs for GPU loading confirmation
     - Send chat API request with test prompt
     - Validate response (not empty, reasonable length, coherent)
     - Check logs for errors during inference
     - Record timing metrics (load time, first token, completion)
5. **Implement test validation** (sketch below)
   - GPU loading verification:
     - Parse logs for "loaded model" + GPU device ID
     - Check for "offloading N layers to GPU"
     - Verify no "using CPU backend" messages
   - Response quality checks:
     - Response not empty
     - Minimum token count (avoid truncated responses)
     - JSON structure valid (for API responses)
   - Error detection:
     - No CUDA errors in logs
     - No OOM (out of memory) errors
     - No model loading failures
6. **Implement structured reporting** (sketch below)
   - Generate JSON report with:
     - Test summary (pass/fail/skip counts)
     - Per-model results (status, timings, errors)
     - Log excerpts for failures
     - GPU metrics (memory usage, utilization)
   - Generate human-readable summary (markdown/text)
   - Exit code: 0 for all pass, 1 for any failure
7. **Implement CLI interface**
   - Flags:
     - `--config` - Path to test config file
     - `--profile` - Test profile to run (quick/full/stress)
     - `--ollama-bin` - Path to ollama binary (default: `./ollama`)
     - `--output` - Report output path
     - `--verbose` - Detailed logging
     - `--keep-models` - Don't delete models after test
   - Subcommands:
     - `run` - Run tests
     - `validate` - Validate config only
     - `list` - List available test profiles/models
8. **Update GitHub Actions workflows**
   - Build test-runner binary in CI workflow
   - Run test-runner in test workflow
   - Parse JSON report for pass/fail
   - Upload structured results as artifacts
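
The sketches below illustrate tasks 2 through 6 in roughly the shape the file structure section suggests. They are minimal starting points rather than the final implementation; helper names, struct fields, and anything else not stated elsewhere in this document should be read as assumptions. First, server lifecycle management for `server.go`, assuming the server's `GET /api/version` endpoint serves as the readiness probe:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// Server wraps a managed `ollama serve` subprocess.
type Server struct {
	cmd *exec.Cmd
	log *os.File
}

// StartServer launches the binary and redirects stdout/stderr to logPath.
func StartServer(bin, logPath string) (*Server, error) {
	logFile, err := os.Create(logPath)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(bin, "serve")
	cmd.Stdout = logFile
	cmd.Stderr = logFile
	if err := cmd.Start(); err != nil {
		logFile.Close()
		return nil, err
	}
	return &Server{cmd: cmd, log: logFile}, nil
}

// WaitReady polls the HTTP endpoint until the server answers or ctx expires.
func (s *Server) WaitReady(ctx context.Context, baseURL string) error {
	tick := time.NewTicker(500 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("server not ready: %w", ctx.Err())
		case <-tick.C:
			resp, err := http.Get(baseURL + "/api/version")
			if err == nil {
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					return nil
				}
			}
		}
	}
}

// Stop asks the server to exit, killing it if it ignores SIGTERM past grace.
func (s *Server) Stop(grace time.Duration) error {
	defer s.log.Close()
	_ = s.cmd.Process.Signal(syscall.SIGTERM)
	done := make(chan error, 1)
	go func() { done <- s.cmd.Wait() }()
	select {
	case err := <-done:
		return err
	case <-time.After(grace):
		_ = s.cmd.Process.Kill()
		return <-done
	}
}
```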
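
For `monitor.go`, a tailer that can run in its own goroutine, following the server log like `tail -f` and recording lines that match the configured patterns:

```go
package main

import (
	"bufio"
	"context"
	"io"
	"os"
	"regexp"
	"sync"
	"time"
)

// Event is one matched log line kept for later analysis.
type Event struct {
	Pattern string
	Line    string
	Time    time.Time
}

// Monitor tails a log file and records lines matching critical patterns.
type Monitor struct {
	mu     sync.Mutex
	events []Event
}

// Watch follows the file until ctx is cancelled, polling on EOF.
func (m *Monitor) Watch(ctx context.Context, path string, patterns []*regexp.Regexp) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	r := bufio.NewReader(f)
	var pending string // accumulates partial lines across EOF boundaries
	for {
		chunk, err := r.ReadString('\n')
		pending += chunk
		switch {
		case err == nil: // a complete line is available
			for _, p := range patterns {
				if p.MatchString(pending) {
					m.mu.Lock()
					m.events = append(m.events, Event{p.String(), pending, time.Now()})
					m.mu.Unlock()
				}
			}
			pending = ""
		case err == io.EOF: // no new line yet; wait for the server to write more
			select {
			case <-ctx.Done():
				return nil
			case <-time.After(200 * time.Millisecond):
			}
		default:
			return err
		}
	}
}

// Events returns a snapshot of everything matched so far.
func (m *Monitor) Events() []Event {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]Event(nil), m.events...)
}
```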
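
For `test.go`, one prompt round-trip against the non-streaming `/api/generate` endpoint. The `ModelSpec` type is sketched under the example configuration below; `eval_count` is the generated-token count in the Ollama API's final response:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// generateResponse holds the fields we read from /api/generate.
type generateResponse struct {
	Response  string `json:"response"`
	EvalCount int    `json:"eval_count"` // tokens generated
}

// RunPrompt sends one non-streaming generate request and applies the
// response-quality checks from the model spec.
func RunPrompt(ctx context.Context, baseURL string, spec ModelSpec, prompt string) error {
	timeout, _ := time.ParseDuration(spec.Timeout) // validated at config load
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	body, _ := json.Marshal(map[string]any{
		"model":  spec.Name,
		"prompt": prompt,
		"stream": false,
	})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		baseURL+"/api/generate", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: HTTP %d", spec.Name, resp.StatusCode)
	}

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return err
	}
	if out.Response == "" {
		return fmt.Errorf("%s: empty response", spec.Name)
	}
	if out.EvalCount < spec.MinResponseTokens {
		return fmt.Errorf("%s: only %d tokens (want >= %d)",
			spec.Name, out.EvalCount, spec.MinResponseTokens)
	}
	fmt.Printf("%s ok in %s (%d tokens)\n", spec.Name, time.Since(start), out.EvalCount)
	return nil
}
```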
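
For `validate.go`, log validation driven by the config's `check_patterns` section (the `Validation` type is also sketched below):

```go
package main

import (
	"fmt"
	"regexp"
)

// CheckLogs applies the success/failure patterns from the validation
// section of the config to the captured server log. Patterns are assumed
// to have been compile-checked during config validation, so MustCompile
// is safe here.
func CheckLogs(log string, v Validation) error {
	for _, pat := range v.CheckPatterns.Failure {
		if regexp.MustCompile(pat).MatchString(log) {
			return fmt.Errorf("failure pattern %q matched in server log", pat)
		}
	}
	if v.GPURequired {
		for _, pat := range v.CheckPatterns.Success {
			if !regexp.MustCompile(pat).MatchString(log) {
				return fmt.Errorf("required pattern %q not found in server log", pat)
			}
		}
	}
	return nil
}
```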
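
And for `report.go`, a report shape consistent with the `jq` checks in the workflow snippet further down (`.summary.failed`, `.failures`); the remaining field names are illustrative:

```go
package main

import (
	"encoding/json"
	"os"
)

// Summary matches the jq check used in the workflow (.summary.failed).
type Summary struct {
	Passed  int `json:"passed"`
	Failed  int `json:"failed"`
	Skipped int `json:"skipped"`
}

// ModelResult records the outcome for a single model.
type ModelResult struct {
	Model       string  `json:"model"`
	Status      string  `json:"status"` // "pass", "fail", or "skip"
	LoadSeconds float64 `json:"load_seconds,omitempty"`
	Error       string  `json:"error,omitempty"`
	LogExcerpt  string  `json:"log_excerpt,omitempty"`
}

// Report is the top-level JSON document written with --output.
type Report struct {
	Summary  Summary       `json:"summary"`
	Results  []ModelResult `json:"results"`
	Failures []ModelResult `json:"failures"`
}

// Write saves the report and returns the process exit code:
// 0 for all pass, 1 for any failure.
func (r *Report) Write(path string) (int, error) {
	data, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return 1, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return 1, err
	}
	if r.Summary.Failed > 0 {
		return 1, nil
	}
	return 0, nil
}
```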
#### File Structure
```
cmd/test-runner/
  main.go        # CLI entry point
  config.go      # Config loading and validation
  server.go      # Server lifecycle management
  monitor.go     # Log monitoring and parsing
  test.go        # Model test execution
  validate.go    # Response and log validation
  report.go      # Test report generation

test/config/
  models.yaml    # Default test configuration
  quick.yaml     # Quick test profile (small models)
  full.yaml      # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```
#### Example Test Configuration (models.yaml)
```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s
  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s
      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s
      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```
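
A sketch of a matching loader for `config.go`, assuming `gopkg.in/yaml.v3` for parsing; the struct and field names are illustrative, not an existing API:

```go
package main

import (
	"fmt"
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

// ModelSpec mirrors one entry under a profile's "models" list.
type ModelSpec struct {
	Name              string   `yaml:"name"`
	Prompts           []string `yaml:"prompts"`
	MinResponseTokens int      `yaml:"min_response_tokens"`
	Timeout           string   `yaml:"timeout"` // parsed with time.ParseDuration
}

// Profile groups the models exercised by one test profile.
type Profile struct {
	Models []ModelSpec `yaml:"models"`
}

// Validation holds the log patterns checked after each run.
type Validation struct {
	GPURequired   bool `yaml:"gpu_required"`
	CheckPatterns struct {
		Success []string `yaml:"success"`
		Failure []string `yaml:"failure"`
	} `yaml:"check_patterns"`
}

// Config is the top-level models.yaml document.
type Config struct {
	Profiles   map[string]Profile `yaml:"profiles"`
	Validation Validation         `yaml:"validation"`
}

// LoadConfig reads and validates a test configuration file.
func LoadConfig(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse %s: %w", path, err)
	}
	for name, p := range cfg.Profiles {
		for _, m := range p.Models {
			if _, err := time.ParseDuration(m.Timeout); err != nil {
				return nil, fmt.Errorf("profile %s, model %s: bad timeout: %w", name, m.Name, err)
			}
		}
	}
	return &cfg, nil
}
```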
#### Example Test Runner Usage
```bash
# Build test runner
go build -o test-runner ./cmd/test-runner
# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick
# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json
# Validate config only
./test-runner validate --config test/config/models.yaml
# List available profiles
./test-runner list
```
#### Integration with GitHub Actions
```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```
## Prerequisites
### Self-Hosted Runner Setup
1. **Install GitHub Actions Runner on your Tesla K80 machine**:

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use a registration token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```

2. **Verify the runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models
## Security Considerations
- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep runner software updated