Add GitHub Actions workflow for Tesla K80 CI/CD

- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in workflows README
Author: Shang Chieh Tseng
Date: 2025-10-28 18:09:49 +08:00
Commit: 92acf0f91e (parent: fe0fd5b494)
2 changed files with 335 additions and 0 deletions

.github/workflows/README.md (new file, 150 lines)

# GitHub Actions Workflows
## Tesla K80 CI Workflow
The `tesla-k80-ci.yml` workflow builds and tests ollama with CUDA Compute Capability 3.7 support using a self-hosted runner.
### Prerequisites
#### Self-Hosted Runner Setup
1. **Install GitHub Actions Runner on your Tesla K80 machine**:
```bash
# Navigate to your repository on GitHub:
# Settings > Actions > Runners > New self-hosted runner
# Follow the provided instructions to download and configure the runner
mkdir -p ~/actions-runner && cd ~/actions-runner
curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz
# Configure (use token from GitHub)
./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN
# Install and start as a service
sudo ./svc.sh install
sudo ./svc.sh start
```
2. **Verify that the runner environment has** (a quick check script is sketched after this list):
- CUDA 11.4+ toolkit installed
- GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
- CMake 3.24+
- Go 1.24+ (or let the workflow install it)
- NVIDIA driver with Tesla K80 support
- Network access to download Go dependencies and models
- **Claude CLI** installed and configured (`claude -p` must be available)
- Install: Follow instructions at https://docs.claude.com/en/docs/claude-code/installation
- The runner needs API access to use Claude for log analysis
3. **Optional: Add runner labels**:
- You can add custom labels like `tesla-k80`, `cuda`, `gpu` during runner configuration
- Then target specific runners by uncommenting the labeled `runs-on` line in the workflow
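Before wiring the runner into CI, it can help to confirm the prerequisites above directly on the machine. The following is only a sketch and assumes the toolchain paths listed in step 2; adjust it to your setup:
```bash
# Prerequisite check for the self-hosted runner (sketch; adjust paths to your setup)
nvcc --version                       # CUDA toolkit, expect 11.4+
/usr/local/bin/gcc --version         # GCC 10 at the path the workflow expects
/usr/local/bin/g++ --version
cmake --version                      # expect 3.24+
go version || echo "Go not found (the workflow can install it)"
nvidia-smi -L                        # Tesla K80 should be listed
command -v claude >/dev/null || echo "Claude CLI not found; 'claude -p' is required"
```
Custom labels such as `tesla-k80`, `cuda`, and `gpu` can be supplied when configuring the runner, e.g. via `./config.sh ... --labels tesla-k80,cuda,gpu`.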
#### Environment Variables (Optional)
You can set repository secrets or environment variables for:
- `OLLAMA_DEBUG=1` - Enable debug logging
- `OLLAMA_MODELS` - Custom model storage path
- Any other ollama configuration
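When debugging the server by hand on the runner (outside the workflow), the same settings can simply be exported in the shell; the model path below is only an example:
```bash
# Sketch: start the server manually with the same settings the workflow would use
export OLLAMA_DEBUG=1                       # verbose server logging
export OLLAMA_MODELS="$HOME/ollama-models"  # example custom model storage path
./ollama serve > ollama.log 2>&1 &
```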
### Workflow Triggers
The workflow runs on:
- **Push** to `main` or `develop` branches
- **Pull requests** to `main` branch
- **Manual dispatch** via GitHub Actions UI
### Workflow Steps
1. **Environment Setup**: Checkout code, install Go, display system info
2. **Build**: Clean previous builds, configure CMake with GCC 10, build C++/CUDA components and Go binary
3. **Unit Tests**: Run Go unit tests with race detector
4. **Integration Tests**: Start ollama server, wait for ready, run integration tests
5. **Model Tests**: Pull gemma2:2b, run inference, verify GPU acceleration
6. **Log Analysis**: Use Claude headless mode to validate that the model loaded properly on the Tesla K80
7. **Cleanup**: Stop server, upload logs/artifacts
### Artifacts
- **ollama-logs-and-analysis** (always): Server logs, Claude analysis prompt, and analysis result
- **ollama-binary-{sha}** (on success): Compiled ollama binary for the commit
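The artifacts can also be fetched from the command line with the GitHub CLI; the run ID and commit SHA below are placeholders:
```bash
# Sketch: download artifacts for a recent run using the GitHub CLI (gh)
gh run list --workflow tesla-k80-ci.yml --limit 5      # find the run ID
gh run download <run-id> --name ollama-logs-and-analysis
gh run download <run-id> --name ollama-binary-<sha>
```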
### Log Analysis with Claude
The workflow uses Claude in headless mode (`claude -p`) to intelligently analyze ollama server logs and verify proper Tesla K80 GPU initialization. This provides automated validation that:
1. **Model Loading**: Gemma2:2b loaded without errors
2. **GPU Acceleration**: CUDA properly detected and initialized for Compute 3.7
3. **No CPU Fallback**: Model is running on GPU, not falling back to CPU
4. **No Compatibility Issues**: No CUDA version warnings or errors
5. **Memory Allocation**: Successful GPU memory allocation
6. **Inference Success**: Model inference completed without errors
**Analysis Results**:
- `PASS`: All checks passed, model working correctly with GPU
- `WARN: <reason>`: Model works but has warnings worth reviewing
- `FAIL: <reason>`: Critical issues detected, workflow fails
This approach is superior to simple grep/pattern matching because Claude can:
- Understand context and correlate multiple log entries
- Distinguish between critical errors and benign warnings
- Identify subtle issues like silent CPU fallback
- Provide human-readable explanations of problems
**Example**: If logs show "CUDA initialization successful" but later "using CPU backend", Claude will catch this inconsistency and fail the test, while simple pattern matching might miss it.
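The analysis can also be reproduced by hand against the `log_analysis_prompt.txt` saved in the logs artifact (which already has the server logs appended). A rough sketch, assuming the Claude CLI is configured locally:
```bash
# Sketch: re-run the log analysis locally on the prompt file saved by the workflow
claude -p "$(cat log_analysis_prompt.txt)" | tee log_analysis_result.txt
grep -q "^PASS" log_analysis_result.txt && echo "GPU checks passed" || echo "review the analysis above"
```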
### Customization
#### Testing different models
Add a model-test step (or adapt the existing pull and inference steps) along these lines:
```yaml
- name: Test model operations
  run: |
    ./ollama pull llama3.2:1b
    ./ollama run llama3.2:1b "test prompt" --verbose
    nvidia-smi # Verify GPU was used
```
#### Running on specific branches
Modify the `on` section:
```yaml
on:
  push:
    branches: [ main, develop, feature/* ]
```
#### Scheduled runs
Add cron schedule for nightly builds:
```yaml
on:
  schedule:
    - cron: '0 2 * * *' # 2 AM daily
```
### Troubleshooting
**Runner offline**: Check runner service status
```bash
sudo systemctl status actions.runner.*
```
**Build failures**: Check uploaded logs in Actions > workflow run > Artifacts
**GPU not detected**: Verify `nvidia-smi` works on the runner machine
**Permissions**: Ensure runner user has access to CUDA libraries and can bind to port 11434
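A few quick commands on the runner machine usually narrow these down (a sketch; adjust to your environment):
```bash
# Sketch: quick checks for GPU visibility and permissions
nvidia-smi                                            # GPU visible to the runner user?
ldconfig -p | grep -i libcudart                       # CUDA runtime libraries on the loader path?
ss -ltnp | grep 11434 || echo "port 11434 is free"    # anything else bound to ollama's port?
```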
### Security Considerations
- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict which repositories can use the runner
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep the runner software updated

.github/workflows/tesla-k80-ci.yml (new file, 185 lines)

name: Tesla K80 Build and Test

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  workflow_dispatch: # Allow manual trigger

jobs:
  build-and-test:
    runs-on: self-hosted
    # Use specific labels if you want to target a particular self-hosted runner
    # runs-on: [self-hosted, linux, cuda, tesla-k80]
    timeout-minutes: 60 # Prevent hung jobs
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for accurate versioning
      - name: Clean previous build
        run: |
          rm -rf build
          rm -f ollama

      - name: Configure CMake
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
        env:
          CMAKE_BUILD_TYPE: Release

      - name: Build C++/CUDA components
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build --config Release
        timeout-minutes: 30

      - name: Build Go binary
        run: |
          go build -v -o ollama .

      - name: Verify binary
        run: |
          ls -lh ollama
          file ollama
          ./ollama --version

      - name: Run Go unit tests
        run: |
          go test -v -race -timeout 10m ./...
        continue-on-error: false
      - name: Start ollama server (background)
        run: |
          ./ollama serve > ollama.log 2>&1 &
          echo $! > ollama.pid
          echo "Ollama server started with PID $(cat ollama.pid)"

      - name: Wait for server to be ready
        run: |
          for i in {1..30}; do
            if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
              echo "Server is ready!"
              exit 0
            fi
            echo "Waiting for server... attempt $i/30"
            sleep 2
          done
          echo "Server failed to start"
          cat ollama.log
          exit 1

      - name: Run integration tests
        run: |
          go test -v -timeout 20m ./integration/...
        continue-on-error: false

      - name: Clear server logs for model test
        run: |
          # Truncate log file to start fresh for model loading test
          > ollama.log
      - name: Pull gemma2:2b model
        run: |
          echo "Pulling gemma2:2b model..."
          ./ollama pull gemma2:2b
          echo "Model pull completed"
        timeout-minutes: 15

      - name: Run inference with gemma2:2b
        run: |
          echo "Running inference test..."
          ./ollama run gemma2:2b "Hello, this is a test. Please respond with a short greeting." --verbose
          echo "Inference completed"
        timeout-minutes: 5

      - name: Wait for logs to flush
        run: sleep 3
      - name: Analyze server logs with Claude
        run: |
          echo "Analyzing ollama server logs for proper model loading..."

          # Create analysis prompt
          cat > log_analysis_prompt.txt << 'EOF'
          Analyze the following Ollama server logs from a Tesla K80 (CUDA Compute Capability 3.7) system.

          Verify that:
          1. The model loaded successfully without errors
          2. CUDA/GPU acceleration was properly detected and initialized
          3. The model is using the Tesla K80 GPU (not CPU fallback)
          4. There are no CUDA compatibility warnings or errors
          5. Memory allocation was successful
          6. Inference completed without errors

          Respond with:
          - "PASS" if all checks pass and model loaded properly with GPU acceleration
          - "FAIL: <reason>" if there are critical issues
          - "WARN: <reason>" if there are warnings but model works

          Be specific about what succeeded or failed. Look for CUDA errors, memory issues, or CPU fallback warnings.

          Server logs:
          ---
          EOF
          cat ollama.log >> log_analysis_prompt.txt

          # Run Claude in headless mode to analyze (pass the prompt text itself, not the file name)
          claude -p "$(cat log_analysis_prompt.txt)" > log_analysis_result.txt

          echo "=== Claude Analysis Result ==="
          cat log_analysis_result.txt

          # Check if analysis passed
          if grep -q "^PASS" log_analysis_result.txt; then
            echo "✓ Log analysis PASSED - Model loaded correctly on Tesla K80"
            exit 0
          elif grep -q "^WARN" log_analysis_result.txt; then
            echo "⚠ Log analysis has WARNINGS - Review needed"
            cat log_analysis_result.txt
            exit 0 # Don't fail on warnings, but they're visible
          else
            echo "✗ Log analysis FAILED - Model loading issues detected"
            cat log_analysis_result.txt
            exit 1
          fi
      - name: Check GPU memory usage
        if: always()
        run: |
          echo "=== GPU Memory Status ==="
          nvidia-smi --query-gpu=memory.used,memory.total --format=csv

      - name: Stop ollama server
        if: always()
        run: |
          if [ -f ollama.pid ]; then
            kill $(cat ollama.pid) || true
            rm ollama.pid
          fi
          pkill -f "ollama serve" || true

      - name: Upload logs and analysis
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ollama-logs-and-analysis
          path: |
            ollama.log
            log_analysis_prompt.txt
            log_analysis_result.txt
            build/**/*.log
          retention-days: 7

      - name: Upload binary artifact
        if: success()
        uses: actions/upload-artifact@v4
        with:
          name: ollama-binary-${{ github.sha }}
          path: ollama
          retention-days: 14