From 92acf0f91ef8ef87550d034b5327107ed7021a45 Mon Sep 17 00:00:00 2001
From: Shang Chieh Tseng
Date: Tue, 28 Oct 2025 18:09:49 +0800
Subject: [PATCH] Add GitHub Actions workflow for Tesla K80 CI/CD

- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in workflows README
---
 .github/workflows/README.md        | 150 +++++++++++++++++++++++
 .github/workflows/tesla-k80-ci.yml | 185 +++++++++++++++++++++++++++++
 2 files changed, 335 insertions(+)
 create mode 100644 .github/workflows/README.md
 create mode 100644 .github/workflows/tesla-k80-ci.yml

diff --git a/.github/workflows/README.md b/.github/workflows/README.md
new file mode 100644
index 00000000..17aa99f8
--- /dev/null
+++ b/.github/workflows/README.md
@@ -0,0 +1,150 @@
+# GitHub Actions Workflows
+
+## Tesla K80 CI Workflow
+
+The `tesla-k80-ci.yml` workflow builds and tests ollama with CUDA Compute Capability 3.7 support using a self-hosted runner.
+
+### Prerequisites
+
+#### Self-Hosted Runner Setup
+
+1. **Install the GitHub Actions runner on your Tesla K80 machine**:
+   ```bash
+   # Navigate to your repository on GitHub:
+   # Settings > Actions > Runners > New self-hosted runner
+
+   # Follow the provided instructions to download and configure the runner
+   mkdir -p ~/actions-runner && cd ~/actions-runner
+   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
+     https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
+   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz
+
+   # Configure (use the token from GitHub)
+   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN
+
+   # Install and start as a service
+   sudo ./svc.sh install
+   sudo ./svc.sh start
+   ```
+
+2. **Verify the runner environment has**:
+   - CUDA 11.4+ toolkit installed
+   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
+   - CMake 3.24+
+   - Go 1.24+ (or let the workflow install it)
+   - NVIDIA driver with Tesla K80 support
+   - Network access to download Go dependencies and models
+   - **Claude CLI** installed and configured (`claude -p` must be available)
+     - Install: follow the instructions at https://docs.claude.com/en/docs/claude-code/installation
+     - The runner needs API access to use Claude for log analysis
+
+3. **Optional: Add runner labels**:
+   - You can add custom labels such as `tesla-k80`, `cuda`, and `gpu` during runner configuration
+   - Then target specific runners by uncommenting the labeled `runs-on` line in the workflow
+
+#### Environment Variables (Optional)
+
+You can set repository secrets or environment variables for:
+- `OLLAMA_DEBUG=1` - Enable debug logging
+- `OLLAMA_MODELS` - Custom model storage path
+- Any other ollama configuration
+
+### Workflow Triggers
+
+The workflow runs on:
+- **Push** to the `main` or `develop` branches
+- **Pull requests** to the `main` branch
+- **Manual dispatch** via the GitHub Actions UI
+
+### Workflow Steps
+
+1. **Environment Setup**: Check out the code, install Go, display system info
+2. **Build**: Clean previous builds, configure CMake with GCC 10, build the C++/CUDA components and the Go binary
+3. **Unit Tests**: Run Go unit tests with the race detector
+4. **Integration Tests**: Start the ollama server, wait for it to be ready, run integration tests
+5. **Model Tests**: Pull gemma2:2b, run inference, verify GPU acceleration
+6. **Log Analysis**: Use Claude headless mode to validate that the model loaded properly on the Tesla K80
+7. **Cleanup**: Stop the server, upload logs/artifacts
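+
+For reference, the build steps can be reproduced by hand on the runner machine. The following is a minimal sketch, assuming GCC 10 is installed at `/usr/local/bin` and the CUDA 11.4 toolkit is on the `PATH`; it mirrors the commands the workflow runs:
+
+```bash
+# Configure and build the C++/CUDA components with GCC 10 (Release build)
+CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
+CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build --config Release
+
+# Build the Go binary and sanity-check it
+go build -v -o ollama .
+./ollama --version
+```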
+
+### Artifacts
+
+- **ollama-logs-and-analysis** (always): Server logs, the Claude analysis prompt, and the analysis result
+- **ollama-binary-{sha}** (on success): The compiled ollama binary for the commit
+
+### Log Analysis with Claude
+
+The workflow uses Claude in headless mode (`claude -p`) to intelligently analyze the ollama server logs and verify proper Tesla K80 GPU initialization. This provides automated validation that:
+
+1. **Model Loading**: gemma2:2b loaded without errors
+2. **GPU Acceleration**: CUDA was properly detected and initialized for Compute Capability 3.7
+3. **No CPU Fallback**: The model is running on the GPU, not falling back to the CPU
+4. **No Compatibility Issues**: There are no CUDA version warnings or errors
+5. **Memory Allocation**: GPU memory allocation succeeded
+6. **Inference Success**: Model inference completed without errors
+
+**Analysis Results**:
+- `PASS`: All checks passed; the model works correctly on the GPU
+- `WARN: <reason>`: The model works but there are warnings worth reviewing
+- `FAIL: <reason>`: Critical issues were detected and the workflow fails
+
+This approach is superior to simple grep/pattern matching because Claude can:
+- Understand context and correlate multiple log entries
+- Distinguish between critical errors and benign warnings
+- Identify subtle issues such as a silent CPU fallback
+- Provide human-readable explanations of problems
+
+**Example**: If the logs show "CUDA initialization successful" but later "using CPU backend", Claude will catch this inconsistency and fail the test, while simple pattern matching might miss it.
+
+### Customization
+
+#### Testing different models
+
+Uncomment and expand the "Test model operations" step:
+
+```yaml
+- name: Test model operations
+  run: |
+    ./ollama pull llama3.2:1b
+    ./ollama run llama3.2:1b "test prompt" --verbose
+    nvidia-smi  # Verify the GPU was used
+```
+
+#### Running on specific branches
+
+Modify the `on` section:
+
+```yaml
+on:
+  push:
+    branches: [ main, develop, feature/* ]
+```
+
+#### Scheduled runs
+
+Add a cron schedule for nightly builds:
+
+```yaml
+on:
+  schedule:
+    - cron: '0 2 * * *'  # 2 AM daily
+```
+
+### Troubleshooting
+
+**Runner offline**: Check the runner service status
+```bash
+sudo systemctl status actions.runner.*
+```
+
+**Build failures**: Check the uploaded logs under Actions > workflow run > Artifacts
+
+**GPU not detected**: Verify that `nvidia-smi` works on the runner machine
+
+**Permissions**: Ensure the runner user has access to the CUDA libraries and can bind to port 11434
+
+### Security Considerations
+
+- Self-hosted runners should be on a secure, isolated machine
+- Consider using runner groups to restrict which repositories can use the runner
+- Do not use self-hosted runners for public repositories (untrusted PRs); if pull request triggers are kept, restrict them as in the sketch below
+- Keep the runner software updated
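+
+One way to express that restriction is a job-level condition. This is a minimal sketch using standard GitHub Actions context fields; adapt the condition to your own policy:
+
+```yaml
+jobs:
+  build-and-test:
+    runs-on: self-hosted
+    # Skip pull requests that come from forked repositories
+    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
+```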
diff --git a/.github/workflows/tesla-k80-ci.yml b/.github/workflows/tesla-k80-ci.yml
new file mode 100644
index 00000000..07400dc1
--- /dev/null
+++ b/.github/workflows/tesla-k80-ci.yml
@@ -0,0 +1,185 @@
+name: Tesla K80 Build and Test
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main]
+  workflow_dispatch: # Allow manual trigger
+
+jobs:
+  build-and-test:
+    runs-on: self-hosted
+
+    # Use specific labels if you want to target a particular self-hosted runner
+    # runs-on: [self-hosted, linux, cuda, tesla-k80]
+
+    timeout-minutes: 60 # Prevent hung jobs
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # Full history for accurate versioning
+
+      - name: Clean previous build
+        run: |
+          rm -rf build
+          rm -f ollama
+
+      - name: Configure CMake
+        run: |
+          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
+        env:
+          CMAKE_BUILD_TYPE: Release
+
+      - name: Build C++/CUDA components
+        run: |
+          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build --config Release
+        timeout-minutes: 30
+
+      - name: Build Go binary
+        run: |
+          go build -v -o ollama .
+
+      - name: Verify binary
+        run: |
+          ls -lh ollama
+          file ollama
+          ./ollama --version
+
+      - name: Run Go unit tests
+        run: |
+          go test -v -race -timeout 10m ./...
+        continue-on-error: false
+
+      - name: Start ollama server (background)
+        run: |
+          ./ollama serve > ollama.log 2>&1 &
+          echo $! > ollama.pid
+          echo "Ollama server started with PID $(cat ollama.pid)"
+
+      - name: Wait for server to be ready
+        run: |
+          for i in {1..30}; do
+            if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
+              echo "Server is ready!"
+              exit 0
+            fi
+            echo "Waiting for server... attempt $i/30"
+            sleep 2
+          done
+          echo "Server failed to start"
+          cat ollama.log
+          exit 1
+
+      - name: Run integration tests
+        run: |
+          go test -v -timeout 20m ./integration/...
+        continue-on-error: false
+
+      - name: Clear server logs for model test
+        run: |
+          # Truncate the log file to start fresh for the model loading test
+          > ollama.log
+
+      - name: Pull gemma2:2b model
+        run: |
+          echo "Pulling gemma2:2b model..."
+          ./ollama pull gemma2:2b
+          echo "Model pull completed"
+        timeout-minutes: 15
+
+      - name: Run inference with gemma2:2b
+        run: |
+          echo "Running inference test..."
+          ./ollama run gemma2:2b "Hello, this is a test. Please respond with a short greeting." --verbose
+          echo "Inference completed"
+        timeout-minutes: 5
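+
+      # Optional: additional model checks. This commented-out step is a sketch that the
+      # README's "Testing different models" section expands on; the llama3.2:1b tag is
+      # illustrative, not a requirement of this workflow.
+      # - name: Test model operations
+      #   run: |
+      #     ./ollama pull llama3.2:1b
+      #     ./ollama run llama3.2:1b "test prompt" --verbose
+      #     nvidia-smi  # Verify the GPU was used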
+
+      - name: Wait for logs to flush
+        run: sleep 3
+
+      - name: Analyze server logs with Claude
+        run: |
+          echo "Analyzing ollama server logs for proper model loading..."
+
+          # Create the analysis prompt
+          cat > log_analysis_prompt.txt << 'EOF'
+          Analyze the following Ollama server logs from a Tesla K80 (CUDA Compute Capability 3.7) system.
+
+          Verify that:
+          1. The model loaded successfully without errors
+          2. CUDA/GPU acceleration was properly detected and initialized
+          3. The model is using the Tesla K80 GPU (not CPU fallback)
+          4. There are no CUDA compatibility warnings or errors
+          5. Memory allocation was successful
+          6. Inference completed without errors
+
+          Respond with:
+          - "PASS" if all checks pass and the model loaded properly with GPU acceleration
+          - "FAIL: <reason>" if there are critical issues
+          - "WARN: <reason>" if there are warnings but the model works
+
+          Be specific about what succeeded or failed. Look for CUDA errors, memory issues, or CPU fallback warnings.
+
+          Server logs:
+          ---
+          EOF
+
+          cat ollama.log >> log_analysis_prompt.txt
+
+          # Run Claude in headless (print) mode, passing the prompt text rather than the filename
+          claude -p "$(cat log_analysis_prompt.txt)" > log_analysis_result.txt
+
+          echo "=== Claude Analysis Result ==="
+          cat log_analysis_result.txt
+
+          # Check whether the analysis passed
+          if grep -q "^PASS" log_analysis_result.txt; then
+            echo "✓ Log analysis PASSED - Model loaded correctly on Tesla K80"
+            exit 0
+          elif grep -q "^WARN" log_analysis_result.txt; then
+            echo "⚠ Log analysis has WARNINGS - Review needed"
+            cat log_analysis_result.txt
+            exit 0 # Don't fail on warnings, but keep them visible
+          else
+            echo "✗ Log analysis FAILED - Model loading issues detected"
+            cat log_analysis_result.txt
+            exit 1
+          fi
+
+      - name: Check GPU memory usage
+        if: always()
+        run: |
+          echo "=== GPU Memory Status ==="
+          nvidia-smi --query-gpu=memory.used,memory.total --format=csv
+
+      - name: Stop ollama server
+        if: always()
+        run: |
+          if [ -f ollama.pid ]; then
+            kill $(cat ollama.pid) || true
+            rm ollama.pid
+          fi
+          pkill -f "ollama serve" || true
+
+      - name: Upload logs and analysis
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: ollama-logs-and-analysis
+          path: |
+            ollama.log
+            log_analysis_prompt.txt
+            log_analysis_result.txt
+            build/**/*.log
+          retention-days: 7
+
+      - name: Upload binary artifact
+        if: success()
+        uses: actions/upload-artifact@v4
+        with:
+          name: ollama-binary-${{ github.sha }}
+          path: ollama
+          retention-days: 14