- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in the workflows README
# GitHub Actions Workflows

## Tesla K80 CI Workflow
The `tesla-k80-ci.yml` workflow builds and tests ollama with CUDA Compute Capability 3.7 support using a self-hosted runner.
### Prerequisites

#### Self-Hosted Runner Setup
- Install GitHub Actions Runner on your Tesla K80 machine:

  ```bash
  # Navigate to your repository on GitHub:
  # Settings > Actions > Runners > New self-hosted runner
  # Follow the provided instructions to download and configure the runner
  mkdir -p ~/actions-runner && cd ~/actions-runner
  curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
    https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
  tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

  # Configure (use token from GitHub)
  ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

  # Install and start as a service
  sudo ./svc.sh install
  sudo ./svc.sh start
  ```
- Verify the runner environment has (a verification sketch follows this list):
  - CUDA 11.4+ toolkit installed
  - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
  - CMake 3.24+
  - Go 1.24+ (or let the workflow install it)
  - NVIDIA driver with Tesla K80 support
  - Network access to download Go dependencies and models
  - Claude CLI installed and configured (`claude -p` must be available)
    - Install: follow the instructions at https://docs.claude.com/en/docs/claude-code/installation
    - The runner needs API access to use Claude for log analysis
- Optional: Add runner labels:
  - You can add custom labels like `tesla-k80`, `cuda`, `gpu` during runner configuration
  - Then target specific runners by uncommenting the labeled `runs-on` line in the workflow
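Before registering the runner, a quick sanity check along these lines can confirm the toolchain prerequisites are in place. This is only a sketch using standard version commands; adjust the paths if your toolchain lives elsewhere:

```bash
# Sketch: verify the toolchain described above is present on the runner host.
/usr/local/bin/gcc --version | head -n1               # expect GCC 10.x
/usr/local/bin/g++ --version | head -n1
nvcc --version | grep release                         # expect CUDA 11.4+
cmake --version | head -n1                            # expect CMake 3.24+
go version                                            # expect Go 1.24+ (or let the workflow install it)
nvidia-smi --query-gpu=name --format=csv,noheader     # expect "Tesla K80"
command -v claude && claude --version                 # Claude CLI must be on PATH
```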
#### Environment Variables (Optional)
You can set repository secrets or environment variables for:
- `OLLAMA_DEBUG=1` - Enable debug logging
- `OLLAMA_MODELS` - Custom model storage path
- Any other ollama configuration
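For example, the same variables can be exported on the runner (or set in the workflow's `env:` block) before the server starts; the model path below is only a placeholder:

```bash
# Example only: enable debug logging and point ollama at a custom model
# directory (placeholder path) before starting the server.
export OLLAMA_DEBUG=1
export OLLAMA_MODELS=/data/ollama-models
./ollama serve > ollama-server.log 2>&1 &
```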
### Workflow Triggers
The workflow runs on:
- Push to `main` or `develop` branches
- Pull requests to the `main` branch
- Manual dispatch via the GitHub Actions UI
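If you prefer the command line to the Actions UI, a manual dispatch can also be triggered with the GitHub CLI (assuming `gh` is installed and authenticated on your machine):

```bash
# Trigger the workflow manually (equivalent to the "Run workflow" button).
gh workflow run tesla-k80-ci.yml --ref main

# List recent runs of this workflow to follow the one just started.
gh run list --workflow=tesla-k80-ci.yml --limit 5
```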
### Workflow Steps
- Environment Setup: Checkout code, install Go, display system info
- Build: Clean previous builds, configure CMake with GCC 10, build C++/CUDA components and Go binary
- Unit Tests: Run Go unit tests with race detector
- Integration Tests: Start ollama server, wait for ready, run integration tests
- Model Tests: Pull gemma2:2b, run inference, verify GPU acceleration
- Log Analysis: Use Claude headless mode to validate model loaded properly with Tesla K80
- Cleanup: Stop server, upload logs/artifacts
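As a rough sketch, the build and test phases correspond to commands along these lines; the exact CMake options, test targets, and prompts used by the workflow may differ:

```bash
# Simplified sketch of the build/test sequence (actual workflow flags may differ).
rm -rf build                               # clean previous builds

# Configure and build the C++/CUDA components with GCC 10 for Compute 3.7.
cmake -B build \
  -DCMAKE_C_COMPILER=/usr/local/bin/gcc \
  -DCMAKE_CXX_COMPILER=/usr/local/bin/g++ \
  -DCMAKE_CUDA_ARCHITECTURES=37
cmake --build build -j"$(nproc)"

# Build the Go binary and run unit tests with the race detector.
go build -o ollama .
go test -race ./...

# Start the server, wait for it to answer, then exercise the model.
./ollama serve > ollama-server.log 2>&1 &
until curl -sf http://localhost:11434/ > /dev/null; do sleep 2; done
./ollama pull gemma2:2b
./ollama run gemma2:2b "Why is the sky blue?"
nvidia-smi                                 # confirm the GPU was used
```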
### Artifacts
- `ollama-logs-and-analysis` (always): Server logs, Claude analysis prompt, and analysis result
- `ollama-binary-{sha}` (on success): Compiled ollama binary for the commit
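Artifacts can also be fetched from a finished run with the GitHub CLI instead of the web UI (the run ID below is a placeholder):

```bash
# Download the log/analysis artifact from a specific run (placeholder run ID).
gh run download 1234567890 --name ollama-logs-and-analysis --dir ./k80-logs
```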
### Log Analysis with Claude
The workflow uses Claude in headless mode (`claude -p`) to intelligently analyze ollama server logs and verify proper Tesla K80 GPU initialization. This provides automated validation that:
- Model Loading: Gemma2:2b loaded without errors
- GPU Acceleration: CUDA properly detected and initialized for Compute 3.7
- No CPU Fallback: Model is running on GPU, not falling back to CPU
- No Compatibility Issues: No CUDA version warnings or errors
- Memory Allocation: Successful GPU memory allocation
- Inference Success: Model inference completed without errors
Analysis Results:
- `PASS`: All checks passed, model working correctly with GPU
- `WARN: <reason>`: Model works but has warnings worth reviewing
- `FAIL: <reason>`: Critical issues detected, workflow fails
This approach is superior to simple grep/pattern matching because Claude can:
- Understand context and correlate multiple log entries
- Distinguish between critical errors and benign warnings
- Identify subtle issues like silent CPU fallback
- Provide human-readable explanations of problems
Example: If logs show "CUDA initialization successful" but later "using CPU backend", Claude will catch this inconsistency and fail the test, while simple pattern matching might miss it.
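A log-analysis step along these lines could be implemented roughly as follows; the prompt wording, file names, and result handling here are illustrative assumptions, not the workflow's exact implementation:

```bash
# Illustrative sketch: ask Claude (headless mode) to judge the server log.
PROMPT="Review this ollama server log from a Tesla K80 (Compute 3.7) run.
Verify that gemma2:2b loaded on the GPU with no CPU fallback, no CUDA
compatibility warnings, and successful inference.
Reply with one line starting with PASS, WARN: <reason>, or FAIL: <reason>."

RESULT=$(cat ollama-server.log | claude -p "$PROMPT")
echo "$RESULT" | tee claude-analysis.txt

# Fail the job on a FAIL verdict.
if echo "$RESULT" | grep -q '^FAIL'; then
  exit 1
fi
```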
### Customization

#### Testing different models
Uncomment and expand the "Test model operations" step:
```yaml
- name: Test model operations
  run: |
    ./ollama pull llama3.2:1b
    ./ollama run llama3.2:1b "test prompt" --verbose
    nvidia-smi  # Verify GPU was used
```
#### Running on specific branches

Modify the `on` section:
```yaml
on:
  push:
    branches: [ main, develop, feature/* ]
```
#### Scheduled runs

Add a cron schedule for nightly builds:
```yaml
on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
```
### Troubleshooting
**Runner offline**: Check the runner service status:

```bash
sudo systemctl status actions.runner.*
```

**Build failures**: Check the uploaded logs under Actions > workflow run > Artifacts

**GPU not detected**: Verify `nvidia-smi` works on the runner machine

**Permissions**: Ensure the runner user has access to the CUDA libraries and can bind to port 11434
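For the GPU and permission cases, a few quick checks on the runner host can narrow things down (generic Linux tooling, not part of the workflow itself):

```bash
# Confirm the driver sees the Tesla K80.
nvidia-smi

# Confirm CUDA runtime libraries are visible to the runner user.
ldconfig -p | grep -E 'libcudart|libcublas'

# Check whether something else is already bound to ollama's default port.
ss -ltn | grep 11434 || echo "port 11434 is free"
```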
### Security Considerations
- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict which repositories can use the runner
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep the runner software updated