Split Tesla K80 workflows into build and test; add test framework plan
- Changed tesla-k80-ci.yml to manual trigger only, simplified to build-only workflow
- Created tesla-k80-tests.yml for separate test execution (manual trigger)
- Added .github/workflows/CLAUDE.md with comprehensive test framework design
- Removed binary artifact upload (not needed for single self-hosted runner)
- Replaced README.md with CLAUDE.md for better documentation structure

Test framework plan:
- Go-based test runner at cmd/test-runner/
- YAML configuration for multi-model testing
- Server lifecycle management with log monitoring
- API-based testing with structured reporting
- Support for test profiles (quick/full/stress)
New file: .github/workflows/CLAUDE.md (266 lines)
# GitHub Actions Workflows - Tesla K80 Testing

## Overview

This directory contains workflows for automated testing of ollama37 on Tesla K80 (CUDA Compute Capability 3.7) hardware.

## Workflows

### 1. tesla-k80-ci.yml - Build Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Build the ollama binary with CUDA 3.7 support

**Steps**:
1. Checkout code
2. Clean previous build artifacts
3. Configure CMake with GCC 10 and CUDA 11
4. Build C++/CUDA components
5. Build Go binary
6. Verify binary
7. Upload binary artifact

**Artifacts**: `ollama-binary-{sha}` - Compiled binary for the commit
### 2. tesla-k80-tests.yml - Test Workflow

**Trigger**: Manual only (`workflow_dispatch`)

**Purpose**: Run comprehensive tests using the test framework

**Steps**:
1. Checkout code
2. Verify ollama binary exists
3. Run test-runner tool (see below)
4. Upload test results and logs

**Artifacts**: Test reports, logs, analysis results

## Test Framework Architecture

### TODO: Implement Go-based Test Runner

**Goal**: Create a dedicated Go test orchestrator at `cmd/test-runner/main.go` that manages the complete test lifecycle for Tesla K80.

#### Task Breakdown
1. **Design test configuration format**
   - Create `test/config/models.yaml` - list of models to test with parameters
   - Define model test spec: name, size, expected behavior, test prompts
   - Support test profiles: quick (small models), full (all sizes), stress test
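
   A minimal sketch of what the `config.go` types might look like, assuming `gopkg.in/yaml.v3` for parsing; the struct and field names are illustrative and simply mirror the example `models.yaml` further down.

   ```go
   package main

   import (
       "fmt"
       "os"

       "gopkg.in/yaml.v3"
   )

   // ModelTest describes one model to exercise and how to judge its responses.
   type ModelTest struct {
       Name              string   `yaml:"name"`
       Prompts           []string `yaml:"prompts"`
       MinResponseTokens int      `yaml:"min_response_tokens"`
       Timeout           string   `yaml:"timeout"`
   }

   // Profile groups the models exercised by one test profile (quick/full/stress).
   type Profile struct {
       Models []ModelTest `yaml:"models"`
   }

   // Config is the top-level structure of test/config/models.yaml.
   type Config struct {
       Profiles   map[string]Profile `yaml:"profiles"`
       Validation struct {
           GPURequired   bool `yaml:"gpu_required"`
           CheckPatterns struct {
               Success []string `yaml:"success"`
               Failure []string `yaml:"failure"`
           } `yaml:"check_patterns"`
       } `yaml:"validation"`
   }

   // LoadConfig reads and parses the YAML test configuration.
   func LoadConfig(path string) (*Config, error) {
       data, err := os.ReadFile(path)
       if err != nil {
           return nil, fmt.Errorf("read config: %w", err)
       }
       var cfg Config
       if err := yaml.Unmarshal(data, &cfg); err != nil {
           return nil, fmt.Errorf("parse config: %w", err)
       }
       return &cfg, nil
   }
   ```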
2. **Implement server lifecycle management**
   - Start `./ollama serve` as subprocess
   - Capture stdout/stderr to log file
   - Monitor server readiness (health check API)
   - Graceful shutdown on test completion or failure
   - Timeout handling for hung processes
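
   A rough sketch of the start/readiness part of `server.go`, assuming the server listens on the default `http://127.0.0.1:11434`; `StartServer` is an illustrative name, not existing code. Graceful shutdown could send an interrupt signal and fall back to `Kill` after a grace period.

   ```go
   package main

   import (
       "context"
       "fmt"
       "net/http"
       "os"
       "os/exec"
       "time"
   )

   // StartServer launches `ollama serve`, redirects its output to logPath,
   // and polls the HTTP endpoint until the server answers or the context expires.
   func StartServer(ctx context.Context, ollamaBin, logPath string) (*exec.Cmd, error) {
       logFile, err := os.Create(logPath)
       if err != nil {
           return nil, err
       }
       cmd := exec.CommandContext(ctx, ollamaBin, "serve")
       cmd.Stdout = logFile
       cmd.Stderr = logFile
       if err := cmd.Start(); err != nil {
           return nil, err
       }
       // Readiness check: poll until the root endpoint responds or we time out.
       for {
           select {
           case <-ctx.Done():
               cmd.Process.Kill() // hung or never became ready
               return nil, fmt.Errorf("server not ready: %w", ctx.Err())
           case <-time.After(time.Second):
               resp, err := http.Get("http://127.0.0.1:11434/")
               if err == nil {
                   resp.Body.Close()
                   return cmd, nil
               }
           }
       }
   }
   ```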
3. **Implement real-time log monitoring**
   - Goroutine to tail server logs
   - Pattern matching for critical events:
     - GPU initialization messages
     - Model loading progress
     - CUDA errors or warnings
     - Memory allocation failures
     - CPU fallback warnings
   - Store events for later analysis
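
   A simplified tailer for `monitor.go`; the event kinds and regular expressions are assumptions drawn from the patterns listed above, and a real implementation would also need to handle log rotation and shutdown cleanly.

   ```go
   package main

   import (
       "bufio"
       "context"
       "os"
       "regexp"
       "time"
   )

   // Event is a server log line that matched one of the watched patterns.
   type Event struct {
       When time.Time
       Kind string // e.g. "model_loaded", "cuda_error", "oom", "cpu_fallback"
       Line string
   }

   // watched maps an event kind to the pattern that identifies it.
   var watched = map[string]*regexp.Regexp{
       "model_loaded": regexp.MustCompile(`loaded model|offload.*GPU`),
       "cuda_error":   regexp.MustCompile(`CUDA.*error`),
       "oom":          regexp.MustCompile(`out of memory`),
       "cpu_fallback": regexp.MustCompile(`CPU backend`),
   }

   // TailLog follows the server log and sends matching lines on events
   // until the context is cancelled. Intended to run in its own goroutine.
   func TailLog(ctx context.Context, path string, events chan<- Event) error {
       f, err := os.Open(path)
       if err != nil {
           return err
       }
       defer f.Close()
       r := bufio.NewReader(f)
       var pending string
       for ctx.Err() == nil {
           chunk, err := r.ReadString('\n')
           pending += chunk
           if err != nil {
               time.Sleep(200 * time.Millisecond) // no complete line yet; wait for more output
               continue
           }
           line := pending
           pending = ""
           for kind, re := range watched {
               if re.MatchString(line) {
                   events <- Event{When: time.Now(), Kind: kind, Line: line}
               }
           }
       }
       return nil
   }
   ```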
4. **Implement model testing logic**
   - For each model in config:
     - Pull model via API (if not cached)
     - Wait for model ready
     - Parse logs for GPU loading confirmation
     - Send chat API request with test prompt
     - Validate response (not empty, reasonable length, coherent)
     - Check logs for errors during inference
     - Record timing metrics (load time, first token, completion)
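
   A sketch of the per-prompt call in `test.go`, assuming the standard Ollama REST endpoint `POST /api/chat` with `stream: false`; pulling a model would go through `POST /api/pull` in the same style.

   ```go
   package main

   import (
       "bytes"
       "encoding/json"
       "fmt"
       "net/http"
       "time"
   )

   type chatMessage struct {
       Role    string `json:"role"`
       Content string `json:"content"`
   }

   type chatRequest struct {
       Model    string        `json:"model"`
       Messages []chatMessage `json:"messages"`
       Stream   bool          `json:"stream"`
   }

   type chatResponse struct {
       Message chatMessage `json:"message"`
   }

   // RunPrompt sends one non-streaming chat request and returns the reply text
   // together with the wall-clock time the request took.
   func RunPrompt(baseURL, model, prompt string, timeout time.Duration) (string, time.Duration, error) {
       body, _ := json.Marshal(chatRequest{
           Model:    model,
           Messages: []chatMessage{{Role: "user", Content: prompt}},
           Stream:   false,
       })
       client := &http.Client{Timeout: timeout}
       start := time.Now()
       resp, err := client.Post(baseURL+"/api/chat", "application/json", bytes.NewReader(body))
       if err != nil {
           return "", 0, err
       }
       defer resp.Body.Close()
       if resp.StatusCode != http.StatusOK {
           return "", 0, fmt.Errorf("chat request failed: %s", resp.Status)
       }
       var out chatResponse
       if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
           return "", 0, err
       }
       return out.Message.Content, time.Since(start), nil
   }
   ```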
5. **Implement test validation**
   - GPU loading verification:
     - Parse logs for "loaded model" + GPU device ID
     - Check for "offloading N layers to GPU"
     - Verify no "using CPU backend" messages
   - Response quality checks:
     - Response not empty
     - Minimum token count (avoid truncated responses)
     - JSON structure valid (for API responses)
   - Error detection:
     - No CUDA errors in logs
     - No OOM (out of memory) errors
     - No model loading failures
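
   One way the checks could combine in `validate.go`; this sketch reuses the hypothetical `Event` type from the log-monitoring sketch above and approximates token count by whitespace-separated words.

   ```go
   package main

   import (
       "fmt"
       "strings"
   )

   // ValidateResult applies the response-quality and log checks for one prompt.
   func ValidateResult(response string, minTokens int, events []Event) error {
       if strings.TrimSpace(response) == "" {
           return fmt.Errorf("empty response")
       }
       if words := len(strings.Fields(response)); words < minTokens {
           return fmt.Errorf("response too short: %d words, want >= %d", words, minTokens)
       }
       loadedOnGPU := false
       for _, ev := range events {
           switch ev.Kind {
           case "cuda_error", "oom", "cpu_fallback":
               return fmt.Errorf("log check failed (%s): %s", ev.Kind, strings.TrimSpace(ev.Line))
           case "model_loaded":
               loadedOnGPU = true
           }
       }
       if !loadedOnGPU {
           return fmt.Errorf("no GPU load confirmation found in logs")
       }
       return nil
   }
   ```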
6. **Implement structured reporting**
   - Generate JSON report with:
     - Test summary (pass/fail/skip counts)
     - Per-model results (status, timings, errors)
     - Log excerpts for failures
     - GPU metrics (memory usage, utilization)
   - Generate human-readable summary (markdown/text)
   - Exit code: 0 for all pass, 1 for any failure
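
   A possible report shape for `report.go`; the field names are assumptions, chosen so that `.summary.failed` matches the `jq` check in the workflow snippet further down.

   ```go
   package main

   import (
       "encoding/json"
       "os"
       "time"
   )

   // ModelResult is the per-model entry in the JSON report.
   type ModelResult struct {
       Model     string        `json:"model"`
       Status    string        `json:"status"` // "pass", "fail", or "skip"
       LoadTime  time.Duration `json:"load_time_ns"`
       TotalTime time.Duration `json:"total_time_ns"`
       Error     string        `json:"error,omitempty"`
   }

   // Report is the top-level JSON document written by the test runner.
   type Report struct {
       Summary struct {
           Passed  int `json:"passed"`
           Failed  int `json:"failed"`
           Skipped int `json:"skipped"`
       } `json:"summary"`
       Results []ModelResult `json:"results"`
   }

   // Write stores the report as indented JSON and reports whether any test
   // failed, so main() can turn that into exit code 0 or 1.
   func (r *Report) Write(path string) (failed bool, err error) {
       data, err := json.MarshalIndent(r, "", "  ")
       if err != nil {
           return false, err
       }
       if err := os.WriteFile(path, data, 0o644); err != nil {
           return false, err
       }
       return r.Summary.Failed > 0, nil
   }
   ```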
7. **Implement CLI interface**
   - Flags:
     - `--config` - Path to test config file
     - `--profile` - Test profile to run (quick/full/stress)
     - `--ollama-bin` - Path to ollama binary (default: `./ollama`)
     - `--output` - Report output path
     - `--verbose` - Detailed logging
     - `--keep-models` - Don't delete models after test
   - Subcommands:
     - `run` - Run tests
     - `validate` - Validate config only
     - `list` - List available test profiles/models
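
   A bare-bones `main.go` sketch using only the standard `flag` package; subcommand bodies are stubbed, and a real implementation might prefer a CLI library.

   ```go
   package main

   import (
       "flag"
       "fmt"
       "os"
   )

   func main() {
       if len(os.Args) < 2 {
           fmt.Fprintln(os.Stderr, "usage: test-runner <run|validate|list> [flags]")
           os.Exit(2)
       }
       sub := os.Args[1]

       fs := flag.NewFlagSet(sub, flag.ExitOnError)
       configPath := fs.String("config", "test/config/models.yaml", "path to test config file")
       profile := fs.String("profile", "quick", "test profile to run (quick/full/stress)")
       ollamaBin := fs.String("ollama-bin", "./ollama", "path to ollama binary")
       output := fs.String("output", "test-report.json", "report output path")
       verbose := fs.Bool("verbose", false, "detailed logging")
       keepModels := fs.Bool("keep-models", false, "don't delete models after test")
       fs.Parse(os.Args[2:])

       switch sub {
       case "run":
           // Load config, start the server, run the selected profile, write the report.
           fmt.Printf("run: profile=%s config=%s bin=%s output=%s verbose=%v keep=%v\n",
               *profile, *configPath, *ollamaBin, *output, *verbose, *keepModels)
       case "validate":
           fmt.Printf("validate: config=%s\n", *configPath)
       case "list":
           fmt.Println("profiles: quick, full, stress")
       default:
           fmt.Fprintf(os.Stderr, "unknown subcommand %q\n", sub)
           os.Exit(2)
       }
   }
   ```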
8. **Update GitHub Actions workflows** (see the Integration with GitHub Actions section below)
   - Build test-runner binary in CI workflow
   - Run test-runner in test workflow
   - Parse JSON report for pass/fail
   - Upload structured results as artifacts
#### File Structure

```
cmd/test-runner/
    main.go              # CLI entry point
    config.go            # Config loading and validation
    server.go            # Server lifecycle management
    monitor.go           # Log monitoring and parsing
    test.go              # Model test execution
    validate.go          # Response and log validation
    report.go            # Test report generation

test/config/
    models.yaml          # Default test configuration
    quick.yaml           # Quick test profile (small models)
    full.yaml            # Full test profile (all sizes)

.github/workflows/
    tesla-k80-ci.yml     # Build workflow (manual)
    tesla-k80-tests.yml  # Test workflow (manual, uses test-runner)
```
#### Example Test Configuration (models.yaml)

```yaml
profiles:
  quick:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
        min_response_tokens: 5
        timeout: 30s

  full:
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a greeting."
          - "What is 2+2?"
        min_response_tokens: 5
        timeout: 30s

      - name: gemma3:4b
        prompts:
          - "Explain photosynthesis in one sentence."
        min_response_tokens: 10
        timeout: 60s

      - name: gemma3:12b
        prompts:
          - "Write a haiku about GPUs."
        min_response_tokens: 15
        timeout: 120s

validation:
  gpu_required: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"
```
#### Example Test Runner Usage

```bash
# Build test runner
go build -o test-runner ./cmd/test-runner

# Run quick test profile
./test-runner run --config test/config/models.yaml --profile quick

# Run full test with verbose output
./test-runner run --profile full --verbose --output test-report.json

# Validate config only
./test-runner validate --config test/config/models.yaml

# List available profiles
./test-runner list
```
#### Integration with GitHub Actions

```yaml
- name: Build test runner
  run: go build -o test-runner ./cmd/test-runner

- name: Run tests
  run: |
    ./test-runner run --profile full --output test-report.json --verbose
  timeout-minutes: 45

- name: Check test results
  run: |
    if ! jq -e '.summary.failed == 0' test-report.json; then
      echo "Tests failed!"
      jq '.failures' test-report.json
      exit 1
    fi

- name: Upload test report
  uses: actions/upload-artifact@v4
  with:
    name: test-report
    path: |
      test-report.json
      ollama.log
```
## Prerequisites

### Self-Hosted Runner Setup

1. **Install GitHub Actions Runner on your Tesla K80 machine**:

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```

2. **Verify runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+
   - NVIDIA driver with Tesla K80 support
   - Network access to download models
## Security Considerations

- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict repository access
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep runner software updated