mirror of
https://github.com/dogkeeper886/ollama37.git
synced 2025-12-10 15:57:04 +00:00
# GitHub Actions Workflows - Tesla K80 Testing

## Overview

This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.
## Workflows

### 1. tesla-k80-ci.yml - Build Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Build the ollama binary with CUDA Compute Capability 3.7 support

**Steps**:

1. Checkout code
2. Clean previous build artifacts
3. Configure CMake with GCC 10 and CUDA 11
4. Build C++/CUDA components
5. Build Go binary
6. Verify binary
7. Upload binary artifact

**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit
### 2. tesla-k80-tests.yml - Test Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Run model tests against the built binary using the test-runner framework

**Steps**:

1. Checkout code
2. Verify that the ollama binary exists
3. Run the test-runner tool (see below)
4. Upload test results and logs

**Artifacts**: Test reports, logs, and analysis results
## Test Framework Architecture

### TODO: Implement Go-based Test Runner

**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.

#### Task Breakdown
1. **Design test configuration format**
   - Create `test/config/models.yaml` - List of models to test with parameters
   - Define model test spec: name, size, expected behavior, test prompts
   - Support test profiles: quick (small models), full (all sizes), stress test

2. **Implement server lifecycle management**
   - Start `./ollama serve` as a subprocess
   - Capture stdout/stderr to a log file
   - Monitor server readiness (health check API)
   - Shut down gracefully on test completion or failure
   - Handle timeouts for hung processes

3. **Implement real-time log monitoring**
   - Goroutine to tail server logs
   - Pattern matching for critical events:
     - GPU initialization messages
     - Model loading progress
     - CUDA errors or warnings
     - Memory allocation failures
     - CPU fallback warnings
   - Store events for later analysis

4. **Implement model testing logic**
   - For each model in the config:
     - Pull the model via the API (if not cached)
     - Wait for the model to be ready
     - Parse logs for GPU loading confirmation
     - Send a chat API request with the test prompt
     - Validate the response (not empty, reasonable length, coherent)
     - Check logs for errors during inference
     - Record timing metrics (load time, first token, completion)

5. **Implement test validation**
   - GPU loading verification:
     - Parse logs for "loaded model" + GPU device ID
     - Check for "offloading N layers to GPU"
     - Verify there are no "using CPU backend" messages
   - Response quality checks:
     - Response not empty
     - Minimum token count (avoid truncated responses)
     - Valid JSON structure (for API responses)
   - Error detection:
     - No CUDA errors in logs
     - No OOM (out of memory) errors
     - No model loading failures

6. **Implement structured reporting**
   - Generate a JSON report with:
     - Test summary (pass/fail/skip counts)
     - Per-model results (status, timings, errors)
     - Log excerpts for failures
     - GPU metrics (memory usage, utilization)
   - Generate a human-readable summary (markdown/text)
   - Exit code: 0 if all tests pass, 1 if any test fails

7. **Implement CLI interface**
   - Flags:
     - `--config` - Path to test config file
     - `--profile` - Test profile to run (quick/full/stress)
     - `--ollama-bin` - Path to ollama binary (default: `./ollama`)
     - `--output` - Report output path
     - `--verbose` - Detailed logging
     - `--keep-models` - Don't delete models after test
   - Subcommands:
     - `run` - Run tests
     - `validate` - Validate config only
     - `list` - List available test profiles/models

8. **Update GitHub Actions workflow**
   - Build test-runner binary in CI workflow
   - Run test-runner in test workflow
   - Parse JSON report for pass/fail
   - Upload structured results as artifacts
#### File Structure

```
cmd/test-runner/
  main.go        # CLI entry point
  config.go      # Config loading and validation
  server.go      # Server lifecycle management
  monitor.go     # Log monitoring and parsing
  test.go        # Model test execution
  validate.go    # Response and log validation
  report.go      # Test report generation

test/config/
  models.yaml    # Default test configuration
  quick.yaml     # Quick test profile (small models)
  full.yaml      # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```
#### Example Test Configuration (models.yaml)

```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s

  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s

      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s

      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```
#### Example Test Runner Usage

```bash
# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list
```
#### Integration with GitHub Actions

```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  if: always() # still runs when the test step exits non-zero
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  if: always() # keep the report and server log from failed runs
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```
## Prerequisites

### Self-Hosted Runner Setup

1. **Install the GitHub Actions runner on your Tesla K80 machine**:

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use the registration token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```

2. **Verify that the runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models
## Security Considerations

- Run self-hosted runners on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners with public repositories: pull requests from forks can run untrusted code on the runner
- Keep runner software updated