Add GitHub Actions workflow for Tesla K80 CI/CD
- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in workflows README
.github/workflows/README.md (new file, vendored, 150 lines)
# GitHub Actions Workflows

## Tesla K80 CI Workflow

The `tesla-k80-ci.yml` workflow builds and tests ollama with CUDA Compute Capability 3.7 support using a self-hosted runner.

### Prerequisites

#### Self-Hosted Runner Setup
1. **Install the GitHub Actions runner on your Tesla K80 machine**:

   ```bash
   # Navigate to your repository on GitHub:
   # Settings > Actions > Runners > New self-hosted runner

   # Follow the provided instructions to download and configure the runner
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
     https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
   tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

   # Configure (use the token from GitHub)
   ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

   # Install and start as a service
   sudo ./svc.sh install
   sudo ./svc.sh start
   ```
2. **Verify the runner environment has**:
   - CUDA 11.4+ toolkit installed
   - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
   - CMake 3.24+
   - Go 1.24+ (or let the workflow install it)
   - NVIDIA driver with Tesla K80 support
   - Network access to download Go dependencies and models
   - **Claude CLI** installed and configured (`claude -p` must be available)
     - Install: follow the instructions at https://docs.claude.com/en/docs/claude-code/installation
     - The runner needs API access to use Claude for log analysis
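   A quick way to verify this checklist on the runner machine (a minimal sketch; it assumes `nvcc` and the Claude CLI are on the `PATH`):

   ```bash
   # Sanity-check the runner prerequisites
   nvidia-smi                    # NVIDIA driver sees the Tesla K80
   nvcc --version                # CUDA toolkit 11.4+
   /usr/local/bin/gcc --version  # GCC 10
   cmake --version               # CMake 3.24+
   go version                    # Go 1.24+ (optional if the workflow installs it)
   command -v claude             # Claude CLI available for log analysis
   ```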
3. **Optional: Add runner labels**:
   - You can add custom labels like `tesla-k80`, `cuda`, `gpu` during runner configuration
   - Then target specific runners by uncommenting the labeled `runs-on` line in the workflow
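   With labels configured, the job can target them like this (mirroring the commented-out `runs-on` line in `tesla-k80-ci.yml`):

   ```yaml
   jobs:
     build-and-test:
       runs-on: [self-hosted, linux, cuda, tesla-k80]
   ```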
#### Environment Variables (Optional)

You can set repository secrets or environment variables for:
- `OLLAMA_DEBUG=1` - Enable debug logging
- `OLLAMA_MODELS` - Custom model storage path
- Any other ollama configuration
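For example, these can be wired in as a job-level `env` block (a sketch; the `OLLAMA_MODELS` path is illustrative, not part of the committed workflow):

```yaml
jobs:
  build-and-test:
    env:
      OLLAMA_DEBUG: "1"
      OLLAMA_MODELS: /data/ollama-models # hypothetical storage path
```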
### Workflow Triggers

The workflow runs on:
- **Push** to `main` or `develop` branches
- **Pull requests** to the `main` branch
- **Manual dispatch** via the GitHub Actions UI
### Workflow Steps

1. **Environment Setup**: Check out code, install Go, display system info
2. **Build**: Clean previous builds, configure CMake with GCC 10, build the C++/CUDA components and the Go binary
3. **Unit Tests**: Run Go unit tests with the race detector
4. **Integration Tests**: Start the ollama server, wait for it to be ready, run the integration tests
5. **Model Tests**: Pull gemma2:2b, run inference, verify GPU acceleration
6. **Log Analysis**: Use Claude headless mode to validate that the model loaded properly on the Tesla K80
7. **Cleanup**: Stop the server, upload logs/artifacts
### Artifacts

- **ollama-logs-and-analysis** (always): Server logs, the Claude analysis prompt, and the analysis result
- **ollama-binary-{sha}** (on success): Compiled ollama binary for the commit
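Artifacts can be fetched after a run, for example with the GitHub CLI (assuming `gh` is installed and authenticated; `<run-id>` comes from `gh run list`):

```bash
# Download the logs-and-analysis bundle from a given workflow run
gh run download <run-id> --name ollama-logs-and-analysis
```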
### Log Analysis with Claude

The workflow uses Claude in headless mode (`claude -p`) to intelligently analyze the ollama server logs and verify proper Tesla K80 GPU initialization. This provides automated validation that:

1. **Model Loading**: gemma2:2b loaded without errors
2. **GPU Acceleration**: CUDA was properly detected and initialized for Compute 3.7
3. **No CPU Fallback**: The model is running on the GPU, not falling back to the CPU
4. **No Compatibility Issues**: No CUDA version warnings or errors
5. **Memory Allocation**: GPU memory was allocated successfully
6. **Inference Success**: Model inference completed without errors

**Analysis Results**:
- `PASS`: All checks passed; the model is working correctly on the GPU
- `WARN: <reason>`: The model works but there are warnings worth reviewing
- `FAIL: <reason>`: Critical issues were detected; the workflow fails

This approach is superior to simple grep/pattern matching because Claude can:
- Understand context and correlate multiple log entries
- Distinguish between critical errors and benign warnings
- Identify subtle issues like silent CPU fallback
- Provide human-readable explanations of problems

**Example**: If the logs show "CUDA initialization successful" but later "using CPU backend", Claude will catch the inconsistency and fail the test, while simple pattern matching might miss it.
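Stripped of the prompt construction, the analysis step reduces to a headless call plus a verdict check (a simplified sketch; the actual step in the workflow below also tolerates `WARN` results):

```bash
# Send the assembled prompt (instructions + appended server log) to Claude headless mode
claude -p "$(cat log_analysis_prompt.txt)" > log_analysis_result.txt

# Fail the job unless the verdict line starts with PASS
grep -q '^PASS' log_analysis_result.txt || exit 1
```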
### Customization

#### Testing different models

Uncomment and expand the "Test model operations" step:

```yaml
- name: Test model operations
  run: |
    ./ollama pull llama3.2:1b
    ./ollama run llama3.2:1b "test prompt" --verbose
    nvidia-smi # Verify the GPU was used
```
#### Running on specific branches

Modify the `on` section:

```yaml
on:
  push:
    branches: [ main, develop, feature/* ]
```
#### Scheduled runs

Add a cron schedule for nightly builds:

```yaml
on:
  schedule:
    - cron: '0 2 * * *' # 2 AM daily
```
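Note that an `on:` block replaces the existing triggers rather than extending them, so in practice `schedule` is added alongside the current entries (Actions cron schedules run in UTC):

```yaml
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *' # 2 AM daily (UTC)
```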
### Troubleshooting

**Runner offline**: Check the runner service status:
```bash
sudo systemctl status actions.runner.*
```

**Build failures**: Check the uploaded logs under Actions > workflow run > Artifacts

**GPU not detected**: Verify that `nvidia-smi` works on the runner machine

**Permissions**: Ensure the runner user has access to the CUDA libraries and can bind to port 11434
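One way to check both permission conditions on the runner host (a sketch; it assumes `ss` and `ldconfig` are available):

```bash
# Is anything already bound to ollama's default port?
ss -ltn 'sport = :11434'

# Can the dynamic linker resolve the CUDA runtime?
ldconfig -p | grep libcudart
```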
### Security Considerations

- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict which repositories can use the runner
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep the runner software updated
.github/workflows/tesla-k80-ci.yml (new file, vendored, 185 lines)
name: Tesla K80 Build and Test

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  workflow_dispatch: # Allow manual trigger

jobs:
  build-and-test:
    runs-on: self-hosted

    # Use specific labels if you want to target a particular self-hosted runner
    # runs-on: [self-hosted, linux, cuda, tesla-k80]

    timeout-minutes: 60 # Prevent hung jobs
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for accurate versioning

      - name: Clean previous build
        run: |
          rm -rf build
          rm -f ollama

      - name: Configure CMake
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
        env:
          CMAKE_BUILD_TYPE: Release

      - name: Build C++/CUDA components
        run: |
          CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build --config Release
        timeout-minutes: 30

      - name: Build Go binary
        run: |
          go build -v -o ollama .

      - name: Verify binary
        run: |
          ls -lh ollama
          file ollama
          ./ollama --version

      - name: Run Go unit tests
        run: |
          go test -v -race -timeout 10m ./...
        continue-on-error: false

      - name: Start ollama server (background)
        run: |
          ./ollama serve > ollama.log 2>&1 &
          echo $! > ollama.pid
          echo "Ollama server started with PID $(cat ollama.pid)"

      - name: Wait for server to be ready
        run: |
          for i in {1..30}; do
            if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
              echo "Server is ready!"
              exit 0
            fi
            echo "Waiting for server... attempt $i/30"
            sleep 2
          done
          echo "Server failed to start"
          cat ollama.log
          exit 1

      - name: Run integration tests
        run: |
          go test -v -timeout 20m ./integration/...
        continue-on-error: false

      - name: Clear server logs for model test
        run: |
          # Truncate the log file so the model-loading test starts fresh
          > ollama.log
      - name: Pull gemma2:2b model
        run: |
          echo "Pulling gemma2:2b model..."
          ./ollama pull gemma2:2b
          echo "Model pull completed"
        timeout-minutes: 15

      - name: Run inference with gemma2:2b
        run: |
          echo "Running inference test..."
          ./ollama run gemma2:2b "Hello, this is a test. Please respond with a short greeting." --verbose
          echo "Inference completed"
        timeout-minutes: 5
      - name: Wait for logs to flush
        run: sleep 3
      - name: Analyze server logs with Claude
        run: |
          echo "Analyzing ollama server logs for proper model loading..."

          # Create analysis prompt
          cat > log_analysis_prompt.txt << 'EOF'
          Analyze the following Ollama server logs from a Tesla K80 (CUDA Compute Capability 3.7) system.

          Verify that:
          1. The model loaded successfully without errors
          2. CUDA/GPU acceleration was properly detected and initialized
          3. The model is using the Tesla K80 GPU (not CPU fallback)
          4. There are no CUDA compatibility warnings or errors
          5. Memory allocation was successful
          6. Inference completed without errors

          Respond with:
          - "PASS" if all checks pass and the model loaded properly with GPU acceleration
          - "FAIL: <reason>" if there are critical issues
          - "WARN: <reason>" if there are warnings but the model works

          Be specific about what succeeded or failed. Look for CUDA errors, memory issues, or CPU fallback warnings.

          Server logs:
          ---
          EOF

          cat ollama.log >> log_analysis_prompt.txt

          # Run Claude in headless mode to analyze; -p expects the prompt text
          # itself, so pass the file contents rather than the filename
          claude -p "$(cat log_analysis_prompt.txt)" > log_analysis_result.txt

          echo "=== Claude Analysis Result ==="
          cat log_analysis_result.txt

          # Check if analysis passed
          if grep -q "^PASS" log_analysis_result.txt; then
            echo "✓ Log analysis PASSED - Model loaded correctly on Tesla K80"
            exit 0
          elif grep -q "^WARN" log_analysis_result.txt; then
            echo "⚠ Log analysis has WARNINGS - Review needed"
            cat log_analysis_result.txt
            exit 0 # Don't fail on warnings, but they're visible
          else
            echo "✗ Log analysis FAILED - Model loading issues detected"
            cat log_analysis_result.txt
            exit 1
          fi
      - name: Check GPU memory usage
        if: always()
        run: |
          echo "=== GPU Memory Status ==="
          nvidia-smi --query-gpu=memory.used,memory.total --format=csv

      - name: Stop ollama server
        if: always()
        run: |
          if [ -f ollama.pid ]; then
            kill $(cat ollama.pid) || true
            rm ollama.pid
          fi
          pkill -f "ollama serve" || true

      - name: Upload logs and analysis
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ollama-logs-and-analysis
          path: |
            ollama.log
            log_analysis_prompt.txt
            log_analysis_result.txt
            build/**/*.log
          retention-days: 7

      - name: Upload binary artifact
        if: success()
        uses: actions/upload-artifact@v4
        with:
          name: ollama-binary-${{ github.sha }}
          path: ollama
          retention-days: 14
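With `workflow_dispatch` enabled, a run can also be triggered from the command line (assuming the GitHub CLI is installed and authenticated for this repository):

```bash
# Trigger a manual run of the Tesla K80 workflow, then follow its progress
gh workflow run tesla-k80-ci.yml --ref main
gh run watch
```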