- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in the workflows README
# GitHub Actions Workflows

## Tesla K80 CI Workflow
The `tesla-k80-ci.yml` workflow builds and tests ollama with CUDA Compute Capability 3.7 support using a self-hosted runner.
### Prerequisites

#### Self-Hosted Runner Setup
- Install GitHub Actions Runner on your Tesla K80 machine:

  ```bash
  # Navigate to your repository on GitHub:
  # Settings > Actions > Runners > New self-hosted runner
  # Follow the provided instructions to download and configure the runner
  mkdir -p ~/actions-runner && cd ~/actions-runner
  curl -o actions-runner-linux-x64-2.XXX.X.tar.gz -L \
    https://github.com/actions/runner/releases/download/vX.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
  tar xzf ./actions-runner-linux-x64-2.XXX.X.tar.gz

  # Configure (use token from GitHub)
  ./config.sh --url https://github.com/YOUR_USERNAME/ollama37 --token YOUR_TOKEN

  # Install and start as a service
  sudo ./svc.sh install
  sudo ./svc.sh start
  ```
- Verify the runner environment has (a verification sketch follows this list):
  - CUDA 11.4+ toolkit installed
  - GCC 10 at `/usr/local/bin/gcc` and `/usr/local/bin/g++`
  - CMake 3.24+
  - Go 1.24+ (or let the workflow install it)
  - NVIDIA driver with Tesla K80 support
  - Network access to download Go dependencies and models
  - Claude CLI installed and configured (`claude -p` must be available)
    - Install: follow the instructions at https://docs.claude.com/en/docs/claude-code/installation
    - The runner needs API access to use Claude for log analysis
- Optional: Add runner labels:
  - You can add custom labels like `tesla-k80`, `cuda`, `gpu` during runner configuration
  - Then target specific runners by uncommenting the labeled `runs-on` line in the workflow
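Before registering the runner, a quick sanity check along these lines can confirm the toolchain prerequisites are in place. This is only a sketch using standard version commands; adjust the paths if your toolchain lives elsewhere:

```bash
# Sketch: verify the toolchain described above is present on the runner host.
/usr/local/bin/gcc --version | head -n1               # expect GCC 10.x
/usr/local/bin/g++ --version | head -n1
nvcc --version | grep release                         # expect CUDA 11.4+
cmake --version | head -n1                            # expect CMake 3.24+
go version                                            # expect Go 1.24+ (or let the workflow install it)
nvidia-smi --query-gpu=name --format=csv,noheader     # expect "Tesla K80"
command -v claude && claude --version                 # Claude CLI must be on PATH
```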
#### Environment Variables (Optional)
You can set repository secrets or environment variables for:
- `OLLAMA_DEBUG=1` - Enable debug logging
- `OLLAMA_MODELS` - Custom model storage path
- Any other ollama configuration
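For example, the same variables can be exported on the runner (or set in the workflow's `env:` block) before the server starts; the model path below is only a placeholder:

```bash
# Example only: enable debug logging and point ollama at a custom model
# directory (placeholder path) before starting the server.
export OLLAMA_DEBUG=1
export OLLAMA_MODELS=/data/ollama-models
./ollama serve > ollama-server.log 2>&1 &
```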
### Workflow Triggers
The workflow runs on:
- Push to `main` or `develop` branches
- Pull requests to the `main` branch
- Manual dispatch via the GitHub Actions UI
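If you prefer the command line to the Actions UI, a manual dispatch can also be triggered with the GitHub CLI (assuming `gh` is installed and authenticated on your machine):

```bash
# Trigger the workflow manually (equivalent to the "Run workflow" button).
gh workflow run tesla-k80-ci.yml --ref main

# List recent runs of this workflow to follow the one just started.
gh run list --workflow=tesla-k80-ci.yml --limit 5
```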
### Workflow Steps
- Environment Setup: Checkout code, install Go, display system info
- Build: Clean previous builds, configure CMake with GCC 10, build C++/CUDA components and Go binary
- Unit Tests: Run Go unit tests with race detector
- Integration Tests: Start ollama server, wait for ready, run integration tests
- Model Tests: Pull gemma2:2b, run inference, verify GPU acceleration
- Log Analysis: Use Claude headless mode to validate model loaded properly with Tesla K80
- Cleanup: Stop server, upload logs/artifacts
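As a rough sketch, the build and test phases correspond to commands along these lines; the exact CMake options, test targets, and prompts used by the workflow may differ:

```bash
# Simplified sketch of the build/test sequence (actual workflow flags may differ).
rm -rf build                               # clean previous builds

# Configure and build the C++/CUDA components with GCC 10 for Compute 3.7.
cmake -B build \
  -DCMAKE_C_COMPILER=/usr/local/bin/gcc \
  -DCMAKE_CXX_COMPILER=/usr/local/bin/g++ \
  -DCMAKE_CUDA_ARCHITECTURES=37
cmake --build build -j"$(nproc)"

# Build the Go binary and run unit tests with the race detector.
go build -o ollama .
go test -race ./...

# Start the server, wait for it to answer, then exercise the model.
./ollama serve > ollama-server.log 2>&1 &
until curl -sf http://localhost:11434/ > /dev/null; do sleep 2; done
./ollama pull gemma2:2b
./ollama run gemma2:2b "Why is the sky blue?"
nvidia-smi                                 # confirm the GPU was used
```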
### Artifacts
- `ollama-logs-and-analysis` (always): Server logs, Claude analysis prompt, and analysis result
- `ollama-binary-{sha}` (on success): Compiled ollama binary for the commit
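Artifacts can also be fetched from a finished run with the GitHub CLI instead of the web UI (the run ID below is a placeholder):

```bash
# Download the log/analysis artifact from a specific run (placeholder run ID).
gh run download 1234567890 --name ollama-logs-and-analysis --dir ./k80-logs
```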
### Log Analysis with Claude
The workflow uses Claude in headless mode (`claude -p`) to intelligently analyze ollama server logs and verify proper Tesla K80 GPU initialization. This provides automated validation that:
- Model Loading: Gemma2:2b loaded without errors
- GPU Acceleration: CUDA properly detected and initialized for Compute 3.7
- No CPU Fallback: Model is running on GPU, not falling back to CPU
- No Compatibility Issues: No CUDA version warnings or errors
- Memory Allocation: Successful GPU memory allocation
- Inference Success: Model inference completed without errors
Analysis Results:
- `PASS`: All checks passed, model working correctly with GPU
- `WARN: <reason>`: Model works but has warnings worth reviewing
- `FAIL: <reason>`: Critical issues detected, workflow fails
This approach is superior to simple grep/pattern matching because Claude can:
- Understand context and correlate multiple log entries
- Distinguish between critical errors and benign warnings
- Identify subtle issues like silent CPU fallback
- Provide human-readable explanations of problems
Example: If logs show "CUDA initialization successful" but later "using CPU backend", Claude will catch this inconsistency and fail the test, while simple pattern matching might miss it.
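A log-analysis step along these lines could be implemented roughly as follows; the prompt wording, file names, and result handling here are illustrative assumptions, not the workflow's exact implementation:

```bash
# Illustrative sketch: ask Claude (headless mode) to judge the server log.
PROMPT="Review this ollama server log from a Tesla K80 (Compute 3.7) run.
Verify that gemma2:2b loaded on the GPU with no CPU fallback, no CUDA
compatibility warnings, and successful inference.
Reply with one line starting with PASS, WARN: <reason>, or FAIL: <reason>."

RESULT=$(cat ollama-server.log | claude -p "$PROMPT")
echo "$RESULT" | tee claude-analysis.txt

# Fail the job on a FAIL verdict.
if echo "$RESULT" | grep -q '^FAIL'; then
  exit 1
fi
```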
### Customization

#### Testing different models
Uncomment and expand the "Test model operations" step:
```yaml
- name: Test model operations
  run: |
    ./ollama pull llama3.2:1b
    ./ollama run llama3.2:1b "test prompt" --verbose
    nvidia-smi  # Verify GPU was used
```
#### Running on specific branches

Modify the `on` section:
```yaml
on:
  push:
    branches: [ main, develop, feature/* ]
```
#### Scheduled runs

Add a cron schedule for nightly builds:
```yaml
on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily
```
### Troubleshooting
**Runner offline**: Check the runner service status:

```bash
sudo systemctl status actions.runner.*
```

**Build failures**: Check the uploaded logs under Actions > workflow run > Artifacts

**GPU not detected**: Verify `nvidia-smi` works on the runner machine

**Permissions**: Ensure the runner user has access to the CUDA libraries and can bind to port 11434
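For the GPU and permission cases, a few quick checks on the runner host can narrow things down (generic Linux tooling, not part of the workflow itself):

```bash
# Confirm the driver sees the Tesla K80.
nvidia-smi

# Confirm CUDA runtime libraries are visible to the runner user.
ldconfig -p | grep -E 'libcudart|libcublas'

# Check whether something else is already bound to ollama's default port.
ss -ltn | grep 11434 || echo "port 11434 is free"
```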
### Security Considerations
- Self-hosted runners should be on a secure, isolated machine
- Consider using runner groups to restrict which repositories can use the runner
- Do not use self-hosted runners for public repositories (untrusted PRs)
- Keep the runner software updated