Problem: Tests used `docker compose logs --since=5m`, which caused:
- Log overlap between tests
- Logs from previous tests being included
- Missing logs if a test exceeded 5 minutes

Solution:
- New LogCollector class runs `docker compose logs --follow`
- Marks test start/end boundaries
- Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
- Test steps access logs via the TEST_ID environment variable

Changes:
- tests/src/log-collector.ts: new LogCollector class
- tests/src/executor.ts: integrate LogCollector, set TEST_ID env
- tests/src/cli.ts: start/stop LogCollector for runtime/inference
- All test cases: use the log collector with a fallback to docker compose

Also updated docs/CICD.md with:
- Test runner CLI documentation
- Judge modes (simple, llm, dual)
- Log collector integration
- Updated test case list (12b, 27b models)
- Model unload strategy
- Troubleshooting guide
# CI/CD Pipeline for Ollama37
This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.
## Infrastructure Overview
```
┌──────────────────────────────────────────────────┐
│                      GITHUB                      │
│              dogkeeper886/ollama37               │
│                                                  │
│  Push to main ──────────────────────────┐        │
└─────────────────────────────────────────│────────┘
                                          │
                                          ▼
┌──────────────────────────────────────────────────┐
│                    CI/CD NODE                    │
│                                                  │
│  Hardware:                                       │
│  - Tesla K80 GPU (compute capability 3.7)        │
│  - NVIDIA Driver 470.x                           │
│                                                  │
│  Software:                                       │
│  - Rocky Linux 9.7                               │
│  - Docker 29.1.3 + Docker Compose 5.0.0          │
│  - NVIDIA Container Toolkit                      │
│  - GitHub Actions Runner (self-hosted)           │
│                                                  │
└──────────────────────────────────────────────────┘
```
## Test Framework

### Test Runner CLI

The test runner is located in `tests/src/` and provides a CLI tool:

```shell
cd tests
npm run dev -- run [options]
```
Commands:

- `run` - Execute test cases
- `list` - List all available test cases
Options:

| Option | Default | Description |
|---|---|---|
| `-s, --suite <suite>` | `all` | Filter by suite (build, runtime, inference) |
| `-i, --id <id>` | - | Run specific test by ID |
| `-w, --workers <n>` | `1` | Parallel worker count |
| `-d, --dry-run` | `false` | Preview without executing |
| `-o, --output <format>` | `console` | Output format: console, json, junit |
| `--no-llm` | `false` | Skip LLM, use simple exit code check only |
| `--judge-model <model>` | `gemma3:12b` | Model for LLM judging |
| `--dual-judge` | `true` | Run both simple and LLM judges |
| `--ollama-url <url>` | `localhost:11434` | Test subject server |
| `--judge-url <url>` | `localhost:11435` | Separate judge instance |
### Judge Modes

The test framework supports three judge modes:

| Mode | Flag | Description |
|---|---|---|
| Simple | `--no-llm` | Exit code checking only (exit 0 = pass) |
| LLM | `--judge-model` | Semantic analysis of test logs using an LLM |
| Dual | `--dual-judge` | Both must pass (default) |
**LLM Judge:**
- Analyzes test execution logs semantically
- Detects hidden issues (e.g., CUDA errors despite exit code 0)
- Uses a configurable model (default: gemma3:12b)
- Batches tests for efficient judging

**Simple Judge:**
- Fast and deterministic
- Checks exit codes only
- Serves as the fallback when the LLM is unavailable
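The dual mode's pass condition can be sketched as a small combinator. This is an illustrative sketch, not the actual code in tests/src/judge.ts; the type and function names are hypothetical:

```typescript
// Verdict combinator for dual-judge mode: a test passes only when
// both the simple (exit code) check and the LLM judge agree.
type Verdict = "pass" | "fail";

interface JudgeResult {
  simple: Verdict;   // exit code 0 => "pass"
  llm?: Verdict;     // absent when --no-llm is used
}

// In dual mode both judges must pass. If the LLM verdict is missing
// (LLM unavailable), fall back to the simple verdict alone.
function dualVerdict(r: JudgeResult): Verdict {
  if (r.llm === undefined) return r.simple;
  return r.simple === "pass" && r.llm === "pass" ? "pass" : "fail";
}
```

The fallback branch mirrors the simple judge's role as the safety net when the LLM endpoint is down or times out.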
### Log Collector

The test framework includes a log collector that solves log overlap issues.

Problem: `docker compose logs --since=5m` can include logs from previous tests, or miss logs if a test exceeds 5 minutes.

Solution: a LogCollector class that:

- Runs `docker compose logs --follow` as a background process
- Marks test start/end boundaries
- Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
- Provides precise logs for each test

Test steps access logs via:

```shell
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
```
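The boundary-marking idea can be sketched as follows. This is a simplified illustration, not the real tests/src/log-collector.ts (which follows a live `docker compose logs --follow` stream); the marker format shown here is hypothetical:

```typescript
// Simplified sketch: given the buffered follow-stream lines and the
// marker lines the collector emits at test start/end, return only
// the lines that belong to one test.
function extractTestLogs(buffer: string[], testId: string): string[] {
  const startMarker = `=== TEST START: ${testId} ===`;
  const endMarker = `=== TEST END: ${testId} ===`;
  const start = buffer.indexOf(startMarker);
  if (start === -1) return [];           // test never started: no logs
  const end = buffer.indexOf(endMarker, start + 1);
  // If the end marker is missing (test still running), take the rest.
  return buffer.slice(start + 1, end === -1 ? buffer.length : end);
}
```

Because the slice is bounded by this test's own markers, logs from earlier tests can never leak in, and nothing is lost however long the test runs.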
## GitHub Workflows

Workflows are located in `.github/workflows/`:

| Workflow | Purpose |
|---|---|
| `build.yml` | Docker image build verification |
| `runtime.yml` | Container startup and GPU detection |
| `inference.yml` | Model inference tests (4b, 12b, 27b) |
| `full-pipeline.yml` | Orchestrates all stages sequentially |
### Workflow Inputs

| Parameter | Default | Options | Description |
|---|---|---|---|
| `judge_mode` | `dual` | simple, llm, dual | Judge strategy |
| `judge_model` | `gemma3:12b` | Any model | LLM for evaluation |
| `use_existing_container` | `false` | true, false | Reuse a running container |
| `keep_container` | `false` | true, false | Leave the container running |
### Example: Run Inference Tests

```shell
# Manual trigger via the GitHub Actions UI, or via the gh CLI:
gh workflow run inference.yml \
  -f judge_mode=dual \
  -f judge_model=gemma3:12b
```
## Test Suites

### Build Suite (3 tests)
| ID | Name | Timeout | Description |
|---|---|---|---|
| TC-BUILD-001 | Builder Image Verification | 2m | Verify builder image exists |
| TC-BUILD-002 | Runtime Image Build | 30m | Build runtime image |
| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes |
### Runtime Suite (3 tests)
| ID | Name | Timeout | Description |
|---|---|---|---|
| TC-RUNTIME-001 | Container Startup | 2m | Start container with GPU |
| TC-RUNTIME-002 | GPU Detection | 2m | Verify K80 detected |
| TC-RUNTIME-003 | Health Check | 3m | API health verification |
### Inference Suite (5 tests)
| ID | Name | Model | Timeout | Description |
|---|---|---|---|---|
| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warmup 4b model |
| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test |
| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification |
| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) |
| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) |
## Model Unload Strategy

Each model-size test unloads its model after completion:

```
4b tests (001-003) → unload 4b
12b test (004)     → unload 12b
27b test (005)     → unload 27b
```

Workflow-level cleanup (`if: always()`) provides a safety fallback.
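One standard way to unload a model is the public Ollama API convention of sending a generate request with `keep_alive: 0`. The sketch below only builds that request body; the endpoint and field names follow the public Ollama API, while the helper name and the assumption that this project unloads via the API (rather than, say, `ollama stop`) are ours:

```typescript
// Build the /api/generate body that asks Ollama to unload a model:
// an empty prompt with keep_alive: 0 evicts the model immediately.
function buildUnloadRequest(model: string): { model: string; keep_alive: number } {
  return { model, keep_alive: 0 };
}

// Usage (not executed here):
// await fetch("http://localhost:11434/api/generate", {
//   method: "POST",
//   body: JSON.stringify(buildUnloadRequest("gemma3:4b")),
// });
```

Unloading between sizes matters on a K80: its per-GPU VRAM is small enough that a lingering 12b model can starve the 27b dual-GPU test.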
## Test Case Structure

Test cases are YAML files in `tests/testcases/{suite}/`:

```yaml
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000
dependencies:
  - TC-INFERENCE-001
steps:
  - name: Run simple math question
    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?"
    timeout: 120000
  - name: Check for errors in logs
    command: |
      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
      else
        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
      fi
      # Check for CUDA errors...
criteria: |
  Expected:
  - Model responds with "4" or equivalent
  - NO CUBLAS_STATUS_ errors
  - NO CUDA errors
```
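The fields above map naturally onto a typed shape like the following. This is an illustrative sketch; the real definitions live in tests/src/types.ts and may differ:

```typescript
// Illustrative shape of a parsed test case; field names mirror the
// YAML example above. Timeouts are in milliseconds, as in the YAML.
interface TestStep {
  name: string;
  command: string;
  timeout?: number;        // step-level override, optional
}

interface TestCase {
  id: string;              // e.g. "TC-INFERENCE-002"
  name: string;
  suite: "build" | "runtime" | "inference";
  priority: number;
  timeout: number;         // whole-test budget
  dependencies?: string[]; // IDs that must pass first
  steps: TestStep[];
  criteria: string;        // free-text expectations for the LLM judge
}
```

Note that `criteria` stays free text on purpose: it is prose consumed by the LLM judge, not a machine-checked assertion.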
## Build System

### Docker Images

**Builder Image:** `ollama37-builder:latest` (~15 GB)
- Rocky Linux 8
- CUDA 11.4 toolkit
- GCC 10, CMake 4.0, Go 1.25.3
- Build time: ~90 minutes (cached)

**Runtime Image:** `ollama37:latest` (~18 GB)
- Built from GitHub source
- Build time: ~10 minutes
### Build Commands

```shell
cd docker

# Build the base image (first time only)
make build-builder

# Build the runtime image from GitHub
make build-runtime

# Build without cache
make build-runtime-no-cache

# Build from local source
make build-runtime-local
```
## Running Tests Locally

### Prerequisites

- Docker with the NVIDIA runtime
- Node.js 20+
- Tesla K80 GPU (or compatible)
### Quick Start

```shell
# Start the container
cd docker && docker compose up -d

# Install the test runner
cd tests && npm ci

# Run all tests with the dual judge
npm run dev -- run --dual-judge

# Run a specific suite
npm run dev -- run --suite inference

# Run a single test
npm run dev -- run --id TC-INFERENCE-002

# Simple mode (no LLM)
npm run dev -- run --no-llm

# JSON output
npm run dev -- run -o json > results.json
```
### Test Output

Results are saved to /tmp/:

- `/tmp/build-results.json`
- `/tmp/runtime-results.json`
- `/tmp/inference-results.json`

JSON structure:

```json
{
  "summary": {
    "total": 5,
    "passed": 5,
    "failed": 0,
    "timestamp": "2025-12-17T...",
    "simple": { "passed": 5, "failed": 0 },
    "llm": { "passed": 5, "failed": 0 }
  },
  "results": [...]
}
```
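The `summary` block can be derived by folding over the `results` array. A minimal sketch, assuming a hypothetical per-result shape with one boolean per judge:

```typescript
// Hypothetical per-test result: one verdict per judge.
interface ResultEntry {
  simple: boolean; // simple judge passed
  llm: boolean;    // LLM judge passed
}

// Fold per-test verdicts into the summary shape shown above.
// Under dual judging, overall "passed" requires both judges to agree.
function summarize(results: ResultEntry[]) {
  const passedBy = (k: "simple" | "llm") => results.filter((r) => r[k]).length;
  const passed = results.filter((r) => r.simple && r.llm).length;
  return {
    total: results.length,
    passed,
    failed: results.length - passed,
    simple: { passed: passedBy("simple"), failed: results.length - passedBy("simple") },
    llm: { passed: passedBy("llm"), failed: results.length - passedBy("llm") },
  };
}
```

This also explains why the per-judge counters can disagree with the top-level `passed`: a test the simple judge accepts can still fail overall if the LLM judge spots a hidden CUDA error.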
## Environment Variables

### Build Environment

| Variable | Value | Description |
|---|---|---|
| `BUILDER_IMAGE` | `ollama37-builder` | Builder image name |
| `RUNTIME_IMAGE` | `ollama37` | Runtime image name |
### Runtime Environment

| Variable | Value | Description |
|---|---|---|
| `OLLAMA_HOST` | `0.0.0.0:11434` | Server listen address |
| `NVIDIA_VISIBLE_DEVICES` | `all` | GPU visibility |
| `OLLAMA_DEBUG` | `1` (optional) | Enable debug logging |
| `GGML_CUDA_DEBUG` | `1` (optional) | Enable CUDA debug output |
### Test Environment

| Variable | Description |
|---|---|
| `TEST_ID` | Current test ID (set by the executor) |
| `OLLAMA_HOST` | Test subject URL |
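Any tooling that consumes TEST_ID can derive the per-test log path from it. A tiny illustrative helper (the function name is hypothetical; the path convention is the one documented above):

```typescript
// Map a test ID to its log file, matching the documented
// /tmp/test-{testId}-logs.txt convention written by the LogCollector.
function testLogPath(testId: string): string {
  return `/tmp/test-${testId}-logs.txt`;
}
```

Keeping this mapping in one place avoids the shell steps and the executor drifting apart on the file naming.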
## Troubleshooting

### GPU Not Detected in Container

```shell
# Check the UVM device files
ls -l /dev/nvidia-uvm*

# Create them if missing
nvidia-modprobe -u -c=0

# Restart the container
docker compose restart
```
### LLM Judge Timeout

```shell
# Use simple mode
npm run dev -- run --no-llm

# Or switch to a smaller, faster judge model
npm run dev -- run --judge-model gemma3:4b
```
### Log Collector Issues

If a test step can't find its logs:

```shell
# Check that the log file exists
ls -l /tmp/test-*-logs.txt

# Fall back to direct logs
docker compose logs --since=5m
```
### Build Failures

```shell
# Clean build
cd docker && make build-runtime-no-cache

# Check the builder image
docker images | grep ollama37-builder
```
## Error Patterns

The test framework checks for these critical errors:

| Pattern | Severity | Description |
|---|---|---|
| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) |
| `CUDA error` | Critical | General CUDA failure |
| `cudaMalloc failed` | Critical | GPU memory allocation failure |
| `out of memory` | Critical | VRAM exhausted |
| `level=ERROR` | Warning | Ollama application error |
| `panic`, `fatal` | Critical | Runtime crash |
| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) |
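The table above can be expressed as a small matcher. The patterns are copied from the table; the function name and the first-match-wins ordering are illustrative, not necessarily what tests/src/judge.ts does:

```typescript
// Classify a single log line against the patterns from the table.
// Critical patterns are checked first, so a line matching both
// severities reports the more serious one.
type Severity = "critical" | "warning";

const CRITICAL: RegExp[] = [
  /CUBLAS_STATUS_\w+/,     // CUDA/cuBLAS error (K80-specific)
  /CUDA error/,            // general CUDA failure
  /cudaMalloc failed/,     // GPU memory allocation failure
  /out of memory/,         // VRAM exhausted
  /panic/, /fatal/,        // runtime crash
  /id=cpu library=cpu/,    // CPU-only fallback (GPU not detected)
];
const WARNING: RegExp[] = [/level=ERROR/];

function classifyLine(line: string): Severity | null {
  if (CRITICAL.some((re) => re.test(line))) return "critical";
  if (WARNING.some((re) => re.test(line))) return "warning";
  return null;
}
```

This kind of matcher is what lets the judges flag a run that exited 0 but logged `CUBLAS_STATUS_NOT_INITIALIZED` along the way.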
## File Structure

```
tests/
├── src/
│   ├── cli.ts            # CLI entry point
│   ├── executor.ts       # Test execution engine
│   ├── judge.ts          # LLM/simple judging
│   ├── loader.ts         # YAML test case parser
│   ├── log-collector.ts  # Docker log collector
│   ├── reporter.ts       # Output formatters
│   └── types.ts          # Type definitions
├── testcases/
│   ├── build/            # Build test cases
│   ├── runtime/          # Runtime test cases
│   └── inference/        # Inference test cases
└── package.json

.github/workflows/
├── build.yml             # Build verification
├── runtime.yml           # Container/GPU tests
├── inference.yml         # Model inference tests
└── full-pipeline.yml     # Complete pipeline
```