# GitHub Actions Workflows - Tesla K80 Testing

## Overview

This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.

## Workflows

### 1. tesla-k80-ci.yml - Build Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Build the ollama binary with CUDA 3.7 support

**Steps**:
1. Checkout code
2. Clean previous build artifacts
3. Configure CMake with GCC 10 and CUDA 11
4. Build C++/CUDA components
5. Build Go binary
6. Verify binary
7. Upload binary artifact

**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit

### 2. tesla-k80-tests.yml - Test Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Run comprehensive tests using the test framework

**Steps**:
1. Checkout code
2. Verify ollama binary exists
3. Run test-runner tool (see below)
4. Upload test results and logs

**Artifacts**: Test reports, logs, analysis results

## Test Framework Architecture

### TODO: Implement Go-based Test Runner

**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.

#### Task Breakdown

1. **Design test configuration format**
   - Create `test/config/models.yaml` - list of models to test with parameters
   - Define model test spec: name, size, expected behavior, test prompts
   - Support test profiles: quick (small models), full (all sizes), stress

2. **Implement server lifecycle management**
   - Start `./ollama serve` as a subprocess
   - Capture stdout/stderr to a log file
   - Monitor server readiness (health check API)
   - Graceful shutdown on test completion or failure
   - Timeout handling for hung processes

3. **Implement real-time log monitoring**
   - Goroutine to tail server logs
   - Pattern matching for critical events:
     - GPU initialization messages
     - Model loading progress
     - CUDA errors or warnings
     - Memory allocation failures
     - CPU fallback warnings
   - Store events for later analysis
4. **Implement model testing logic**
   - For each model in config:
     - Pull model via API (if not cached)
     - Wait for model ready
     - Parse logs for GPU loading confirmation
     - Send chat API request with test prompt
     - Validate response (not empty, reasonable length, coherent)
     - Check logs for errors during inference
     - Record timing metrics (load time, first token, completion)

5. **Implement test validation**
   - GPU loading verification:
     - Parse logs for "loaded model" + GPU device ID
     - Check for "offloading N layers to GPU"
     - Verify no "using CPU backend" messages
   - Response quality checks:
     - Response not empty
     - Minimum token count (avoid truncated responses)
     - JSON structure valid (for API responses)
   - Error detection:
     - No CUDA errors in logs
     - No OOM (out of memory) errors
     - No model loading failures

6. **Implement structured reporting**
   - Generate JSON report with:
     - Test summary (pass/fail/skip counts)
     - Per-model results (status, timings, errors)
     - Log excerpts for failures
     - GPU metrics (memory usage, utilization)
   - Generate human-readable summary (markdown/text)
   - Exit code: 0 for all pass, 1 for any failure

7. **Implement CLI interface**
   - Flags:
     - `--config` - Path to test config file
     - `--profile` - Test profile to run (quick/full/stress)
     - `--ollama-bin` - Path to ollama binary (default: `./ollama`)
     - `--output` - Report output path
     - `--verbose` - Detailed logging
     - `--keep-models` - Don't delete models after test
   - Subcommands:
     - `run` - Run tests
     - `validate` - Validate config only
     - `list` - List available test profiles/models
8. **Update GitHub Actions workflow**
   - Build test-runner binary in CI workflow
   - Run test-runner in test workflow
   - Parse JSON report for pass/fail
   - Upload structured results as artifacts

#### File Structure

```
cmd/test-runner/
  main.go       # CLI entry point
  config.go     # Config loading and validation
  server.go     # Server lifecycle management
  monitor.go    # Log monitoring and parsing
  test.go       # Model test execution
  validate.go   # Response and log validation
  report.go     # Test report generation

test/config/
  models.yaml   # Default test configuration
  quick.yaml    # Quick test profile (small models)
  full.yaml     # Full test profile (all sizes)

.github/workflows/
  tesla-k80-ci.yml     # Build workflow (manual)
  tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```

#### Example Test Configuration (models.yaml)

```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s
  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s
      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s
      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```

#### Example Test Runner Usage

```bash
# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list
```

#### Integration with GitHub Actions

```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```

## Prerequisites

### Self-Hosted Runner Setup

1. **Install the GitHub Actions runner on your Tesla K80 machine**:

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```
2. **Verify the runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models

## Security Considerations

- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep runner software updated