# CI/CD Plan for Ollama37
This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.

## Infrastructure Overview
```
┌─────────────────────────────────────────────────────────────────────────┐
│ GITHUB │
│ dogkeeper886/ollama37 │
│ │
│ Push to main ──────────────────────────────────────────────────────┐ │
└─────────────────────────────────────────────────────────────────────│───┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ CI/CD NODE │
│ │
│ Hardware: │
│ - Tesla K80 GPU (compute capability 3.7) │
│ - NVIDIA Driver 470.x │
│ │
│ Software: │
│ - Rocky Linux 9.7 │
│ - Docker 29.1.3 + Docker Compose 5.0.0 │
│ - NVIDIA Container Toolkit │
│ - GitHub Actions Runner (self-hosted, labels: k80, cuda11) │
│ │
│ Services: │
│ - TestLink (http://localhost:8090) - Test management │
│ - TestLink MCP - Claude Code integration │
│ │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ SERVE NODE │
│ │
│ Services: │
│ - Ollama (production) │
│ - Dify (LLM application platform) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
## Build Strategy: Docker-Based

We use the two-stage Docker build system located in `/docker/`:

### Stage 1: Builder Image (Cached)

**Image:** `ollama37-builder:latest` (~15GB)

**Contents:**
- Rocky Linux 8
- CUDA 11.4 toolkit
- GCC 10 (built from source)
- CMake 4.0 (built from source)
- Go 1.25.3

**Build time:** ~90 minutes (first time only, then cached)

**Build command:**
```bash
cd docker && make build-builder
```
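
To confirm the cached builder image really carries the expected toolchain, a quick spot check along these lines works (a minimal sketch; it assumes the tools are on the image's default `PATH`):

```bash
# Print the toolchain versions baked into the builder image
docker run --rm ollama37-builder:latest bash -c \
  'gcc --version | head -1; cmake --version | head -1; go version; nvcc --version | tail -1'
```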
### Stage 2: Runtime Image (Per Build)

**Image:** `ollama37:latest` (~18GB)

**Process:**
1. Clone source from GitHub
2. Configure with CMake ("CUDA 11" preset)
3. Build C/C++/CUDA libraries
4. Build Go binary
5. Package runtime environment

**Build time:** ~10 minutes

**Build command:**
```bash
cd docker && make build-runtime
```
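
For orientation, the runtime build corresponds roughly to the commands below (a sketch only; the Dockerfile under `docker/` is authoritative, and the "CUDA 11" preset name comes from the project's CMake presets):

```bash
# Rough equivalent of the runtime image build steps
git clone https://github.com/dogkeeper886/ollama37.git && cd ollama37
cmake --preset "CUDA 11"            # configure the C/C++/CUDA libraries
cmake --build --preset "CUDA 11"    # compile the GGML/CUDA backend
go build -o ollama .                # build the Go binary
```

Doing this inside the builder image keeps the CI node itself free of compiler installs; only Docker and the NVIDIA Container Toolkit are required on the host.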
## Pipeline Stages

### Stage 1: Docker Build

**Trigger:** Push to `main` branch

**Steps:**
1. Checkout repository
2. Ensure builder image exists (build if not)
3. Build runtime image: `make build-runtime`
4. Verify image created successfully

**Test Cases:**
- TC-BUILD-001: Builder Image Verification
- TC-BUILD-002: Runtime Image Build
- TC-BUILD-003: Image Size Validation
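
In shell terms the job boils down to something like the sketch below (illustrative only; the real steps live in `.github/workflows/build-test.yml`):

```bash
# Build the builder image only if it is missing, then build and verify the runtime image
docker image inspect ollama37-builder:latest >/dev/null 2>&1 || (cd docker && make build-builder)
(cd docker && make build-runtime)
docker image inspect ollama37:latest --format 'ollama37:latest size: {{.Size}} bytes'
```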
### Stage 2: Container Startup

**Steps:**
1. Start container with GPU: `docker compose up -d`
2. Wait for health check to pass
3. Verify Ollama server is responding

**Test Cases:**
- TC-RUNTIME-001: Container Startup
- TC-RUNTIME-002: GPU Detection
- TC-RUNTIME-003: Health Check
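
Run by hand, this stage looks roughly like the following (it assumes the compose file publishes port 11434 and names the service `ollama37`; adjust to whatever `docker/docker-compose.yml` actually uses):

```bash
# Start the container, wait until the API answers, then confirm the K80 is visible inside it
cd docker && docker compose up -d
until curl -fsS http://localhost:11434/api/version >/dev/null; do sleep 5; done
docker compose exec ollama37 nvidia-smi
```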
### Stage 3: Inference Tests

**Steps:**
1. Pull test model (gemma3:4b)
2. Run inference tests
3. Verify CUBLAS legacy fallback

**Test Cases:**
- TC-INFERENCE-001: Model Pull
- TC-INFERENCE-002: Basic Inference
- TC-INFERENCE-003: API Endpoint Test
- TC-INFERENCE-004: CUBLAS Fallback Verification
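
Executed manually, the core checks look roughly like this (assuming port 11434 is published on the host; the CUBLAS check simply greps the server log, so the exact string depends on the build):

```bash
# Pull the test model, exercise /api/generate, then look for the legacy CUBLAS path in the logs
curl -fsS http://localhost:11434/api/pull -d '{"model": "gemma3:4b", "stream": false}'
curl -fsS http://localhost:11434/api/generate \
  -d '{"model": "gemma3:4b", "prompt": "Why is the sky blue?", "stream": false}'
docker compose logs | grep -i cublas
```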
### Stage 4: Cleanup & Report

**Steps:**
1. Stop container: `docker compose down`
2. Report results to TestLink
3. Clean up resources
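
The teardown itself is small; something like the following is enough (the prune is optional and shown only as one way to reclaim space between runs):

```bash
# Stop the test container and optionally reclaim space left over from the build
cd docker && docker compose down
docker image prune -f
```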
## Test Case Design

### Build Tests (Suite: Build Tests)

| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-BUILD-001 | Builder Image Verification | Automated | Verify builder image exists with correct tools |
| TC-BUILD-002 | Runtime Image Build | Automated | Build runtime image from GitHub source |
| TC-BUILD-003 | Image Size Validation | Automated | Verify image sizes are within expected range |

### Runtime Tests (Suite: Runtime Tests)

| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-RUNTIME-001 | Container Startup | Automated | Start container with GPU passthrough |
| TC-RUNTIME-002 | GPU Detection | Automated | Verify Tesla K80 detected inside container |
| TC-RUNTIME-003 | Health Check | Automated | Verify Ollama health check passes |

### Inference Tests (Suite: Inference Tests)

| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-INFERENCE-001 | Model Pull | Automated | Pull gemma3:4b model |
| TC-INFERENCE-002 | Basic Inference | Automated | Run simple prompt and verify response |
| TC-INFERENCE-003 | API Endpoint Test | Automated | Test /api/generate endpoint |
| TC-INFERENCE-004 | CUBLAS Fallback Verification | Automated | Verify legacy CUBLAS functions used |
## GitHub Actions Workflow

**File:** `.github/workflows/build-test.yml`

**Triggers:**
- Push to `main` branch
- Pull request to `main` branch
- Manual trigger (workflow_dispatch)

**Runner:** Self-hosted with labels `[self-hosted, k80, cuda11]`

**Jobs:**
1. `build` - Build Docker runtime image
2. `test` - Run inference tests in container
3. `report` - Report results to TestLink
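
Because `workflow_dispatch` is enabled, a run can also be started from the CLI (this assumes the GitHub CLI `gh` is installed and authenticated):

```bash
# Trigger the workflow manually against main and follow its progress
gh workflow run build-test.yml --ref main
gh run watch
```

This is handy for re-running the pipeline after fixing something on the CI node without pushing an empty commit.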
## TestLink Integration

**URL:** http://localhost:8090

**Project:** ollama37

**Test Suites:**
- Build Tests
- Runtime Tests
- Inference Tests

**Test Plan:** Created per release/sprint

**Builds:** Created per CI run (commit SHA)

**Execution Recording:**
- Each test case result recorded via TestLink API
- Pass/Fail status with notes
- Linked to specific build/commit
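
TestLink's automation interface is XML-RPC; a quick connectivity check from the CI node can look like this (the endpoint path below is the usual one for TestLink 1.9.x and may differ in other versions):

```bash
# Ping the TestLink XML-RPC API; any well-formed response confirms the URL and service are up
curl -fsS -H 'Content-Type: text/xml' \
  "${TESTLINK_URL:-http://localhost:8090}/lib/api/xmlrpc/v1/xmlrpc.php" \
  -d '<?xml version="1.0"?><methodCall><methodName>tl.sayHello</methodName></methodCall>'
```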
## Makefile Targets for CI

| Target | Description | When to Use |
|--------|-------------|-------------|
| `make build-builder` | Build base image | First-time setup |
| `make build-runtime` | Build from GitHub | Normal CI builds |
| `make build-runtime-no-cache` | Fresh GitHub clone | When cache is stale |
| `make build-runtime-local` | Build from local source | Local testing |

## Environment Variables

### Build Environment

| Variable | Value | Description |
|----------|-------|-------------|
| `BUILDER_IMAGE` | ollama37-builder | Builder image name |
| `RUNTIME_IMAGE` | ollama37 | Runtime image name |

### Runtime Environment

| Variable | Value | Description |
|----------|-------|-------------|
| `OLLAMA_HOST` | 0.0.0.0:11434 | Server listen address |
| `NVIDIA_VISIBLE_DEVICES` | all | GPU visibility |
| `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging |
| `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug logging |
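
As an illustration of how these variables reach the container (the compose file under `docker/` is the authoritative place to set them), a one-off debug run might look like:

```bash
# Manual debug run of the runtime image with the variables from the table above
docker run --rm --gpus all -p 11434:11434 \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  -e OLLAMA_DEBUG=1 \
  -e GGML_CUDA_DEBUG=1 \
  ollama37:latest
```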
### TestLink Environment

| Variable | Value | Description |
|----------|-------|-------------|
| `TESTLINK_URL` | http://localhost:8090 | TestLink server URL |
| `TESTLINK_API_KEY` | (configured) | API key for automation |

## Prerequisites

### One-Time Setup on CI/CD Node

1. **Install GitHub Actions Runner:**

   ```bash
   mkdir -p ~/actions-runner && cd ~/actions-runner
   curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
     https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
   tar xzf ./actions-runner-linux-x64-2.321.0.tar.gz
   ./config.sh --url https://github.com/dogkeeper886/ollama37 --token YOUR_TOKEN --labels k80,cuda11
   sudo ./svc.sh install && sudo ./svc.sh start
   ```

2. **Build Builder Image (one-time, ~90 min):**

   ```bash
   cd /home/jack/src/ollama37/docker
   make build-builder
   ```

3. **Verify GPU Access in Docker:**

   ```bash
   docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
   ```

4. **Start TestLink:**

   ```bash
   cd /home/jack/src/testlink-code
   docker compose up -d
   ```
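
After the four steps above, a quick sanity check that the node is ready can be done in one go (run `svc.sh` from the runner directory; image and URL names as configured above):

```bash
# Confirm the runner service, the GPU, the builder image, and TestLink are all in place
cd ~/actions-runner && sudo ./svc.sh status
nvidia-smi
docker image inspect ollama37-builder:latest >/dev/null && echo "builder image present"
curl -fsS http://localhost:8090 >/dev/null && echo "TestLink reachable"
```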
## Monitoring & Logs

### View CI/CD Logs

```bash
# GitHub Actions Runner logs
journalctl -u actions.runner.* -f

# Docker build logs
docker compose logs -f

# TestLink logs
cd /home/jack/src/testlink-code && docker compose logs -f
```

### Test Results

- **TestLink Dashboard:** http://localhost:8090
- **GitHub Actions:** https://github.com/dogkeeper886/ollama37/actions

## Troubleshooting

### Builder Image Missing

```bash
cd docker && make build-builder
```

### GPU Not Detected in Container

```bash
# Check UVM device files on host
ls -l /dev/nvidia-uvm*

# Create if missing
nvidia-modprobe -u -c=0

# Restart container
docker compose restart
```
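
If the device files look fine but the server still reports no GPU, compare what the container sees with what the host sees (the service name `ollama37` is an assumption; use the one from your compose file):

```bash
# Check GPU and UVM device visibility from inside the running container
docker compose exec ollama37 nvidia-smi
docker compose exec ollama37 sh -c 'ls -l /dev/nvidia*'
```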
### Build Cache Stale

```bash
cd docker && make build-runtime-no-cache
```

### TestLink Connection Failed

```bash
# Check TestLink is running
curl http://localhost:8090

# Restart if needed
cd /home/jack/src/testlink-code && docker compose restart
```