# CI/CD Pipeline for Ollama37

This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.

## Infrastructure Overview

```
┌─────────────────────────────────────────────────────────────────────────┐
│                              GITHUB                                      │
│                     dogkeeper886/ollama37                                │
│                                                                          │
│  Push to main ───────────────────────────────────────────────────────┐  │
└───────────────────────────────────────────────────────────────────────│──┘
                                                                        │
                                                                        ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         CI/CD NODE                                       │
│                                                                          │
│  Hardware:                                                               │
│    - Tesla K80 GPU (compute capability 3.7)                              │
│    - NVIDIA Driver 470.x                                                 │
│                                                                          │
│  Software:                                                               │
│    - Rocky Linux 9.7                                                     │
│    - Docker 29.1.3 + Docker Compose 5.0.0                                │
│    - NVIDIA Container Toolkit                                            │
│    - GitHub Actions Runner (self-hosted)                                 │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
```

## Test Framework

### Test Runner CLI

The test runner is located in `tests/src/` and provides a CLI tool:

```bash
cd tests
npm run dev -- run [options]
```

Commands:

- `run` - Execute test cases
- `list` - List all available test cases

Options:

| Option | Default | Description |
|--------|---------|-------------|
| `-s, --suite <suite>` | all | Filter by suite (build, runtime, inference) |
| `-i, --id <id>` | - | Run a specific test by ID |
| `-w, --workers <n>` | 1 | Parallel worker count |
| `-d, --dry-run` | false | Preview without executing |
| `-o, --output <format>` | console | Output format: console, json, junit |
| `--no-llm` | false | Skip the LLM judge; use simple exit code checks only |
| `--judge-model <model>` | gemma3:12b | Model for LLM judging |
| `--dual-judge` | true | Run both the simple and LLM judges |
| `--ollama-url <url>` | localhost:11434 | Test subject server |
| `--judge-url <url>` | localhost:11435 | Separate judge instance |

### Judge Modes

The test framework supports three judge modes:

| Mode | Flag | Description |
|------|------|-------------|
| Simple | `--no-llm` | Exit code checking only (exit 0 = pass) |
| LLM | `--judge-model` | Semantic analysis of test logs using an LLM |
| Dual | `--dual-judge` | Both judges must pass (default) |

**LLM Judge:**

- Analyzes test execution logs semantically
- Detects hidden issues (e.g., CUDA errors with exit 0)
- Uses a configurable model (default: gemma3:12b)
- Batches tests for efficient judging

**Simple Judge:**

- Fast, deterministic
- Checks exit codes only
- Fallback when the LLM is unavailable
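
Conceptually, dual mode is a logical AND over the two verdicts. A minimal TypeScript sketch of that combination (names such as `JudgeVerdict`, `simpleJudge`, and `dualJudge` are hypothetical; the real logic lives in `tests/src/judge.ts`):

```typescript
// Hypothetical sketch of dual-judge combination; see tests/src/judge.ts for the real logic.
interface JudgeVerdict {
  passed: boolean;
  reason: string;
}

function simpleJudge(exitCode: number): JudgeVerdict {
  // Simple mode: exit 0 means pass, anything else fails.
  return {
    passed: exitCode === 0,
    reason: exitCode === 0 ? "exit code 0" : `non-zero exit code ${exitCode}`,
  };
}

function dualJudge(simple: JudgeVerdict, llm: JudgeVerdict): JudgeVerdict {
  // Dual mode: both judges must agree the test passed.
  const passed = simple.passed && llm.passed;
  return {
    passed,
    reason: passed ? "both judges passed" : `simple: ${simple.reason}; llm: ${llm.reason}`,
  };
}
```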

### Log Collector

The test framework includes a log collector that solves log overlap issues:

**Problem:** `docker compose logs --since=5m` can include logs from previous tests, or miss logs if a test exceeds 5 minutes.

**Solution:** A `LogCollector` class that:

1. Runs `docker compose logs --follow` as a background process
2. Marks test start/end boundaries
3. Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
4. Provides precise logs for each test

Test steps access logs via:

```bash
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
```
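
As a rough illustration of the mechanism, here is a minimal TypeScript sketch assuming Node's built-in `child_process` and `fs` modules; the actual class in `tests/src/log-collector.ts` may differ in detail:

```typescript
import { spawn, ChildProcess } from "node:child_process";
import { writeFileSync } from "node:fs";

// Illustrative sketch of the log-collector approach; not the exact implementation.
class LogCollector {
  private proc: ChildProcess | null = null;
  private lines: string[] = [];
  private marks = new Map<string, number>(); // testId -> index of first captured line

  start(composeDir: string): void {
    // Follow container logs for the whole run, so no time window is ever missed.
    this.proc = spawn("docker", ["compose", "logs", "--follow", "--no-color"], {
      cwd: composeDir,
    });
    this.proc.stdout?.on("data", (chunk: Buffer) => {
      // Simplified: a chunk may end mid-line; a real implementation would buffer partial lines.
      this.lines.push(...chunk.toString().split("\n"));
    });
  }

  markStart(testId: string): void {
    // Remember where this test's logs begin.
    this.marks.set(testId, this.lines.length);
  }

  markEnd(testId: string): void {
    // Write only the lines captured between this test's start mark and now.
    const from = this.marks.get(testId) ?? 0;
    writeFileSync(`/tmp/test-${testId}-logs.txt`, this.lines.slice(from).join("\n"));
  }

  stop(): void {
    this.proc?.kill();
  }
}
```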

## GitHub Workflows

Located in `.github/workflows/`:

| Workflow | Purpose |
|----------|---------|
| `build.yml` | Docker image build verification |
| `runtime.yml` | Container startup and GPU detection |
| `inference.yml` | Model inference tests (4b, 12b, 27b) |
| `full-pipeline.yml` | Orchestrates all stages sequentially |

### Workflow Inputs

| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `judge_mode` | dual | simple, llm, dual | Judge strategy |
| `judge_model` | gemma3:12b | Any model | LLM used for evaluation |
| `use_existing_container` | false | true, false | Reuse a running container |
| `keep_container` | false | true, false | Leave the container running |

### Example: Run Inference Tests

```bash
# Manual trigger via GitHub Actions UI
# Or via the gh CLI:
gh workflow run inference.yml \
  -f judge_mode=dual \
  -f judge_model=gemma3:12b
```

## Test Suites

### Build Suite (3 tests)

| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-BUILD-001 | Builder Image Verification | 2m | Verify the builder image exists |
| TC-BUILD-002 | Runtime Image Build | 30m | Build the runtime image |
| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes |

### Runtime Suite (3 tests)

| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-RUNTIME-001 | Container Startup | 2m | Start the container with GPU access |
| TC-RUNTIME-002 | GPU Detection | 2m | Verify the K80 is detected |
| TC-RUNTIME-003 | Health Check | 3m | API health verification |

### Inference Suite (5 tests)

| ID | Name | Model | Timeout | Description |
|----|------|-------|---------|-------------|
| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warm up the 4b model |
| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test |
| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification |
| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) |
| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) |

### Model Unload Strategy

Each model size test unloads its model after completion:

```
4b tests (001-003) → unload 4b
12b test (004)     → unload 12b
27b test (005)     → unload 27b
```

Workflow-level cleanup (`if: always()`) provides a safety fallback.
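
One plausible way to implement per-test unloading is Ollama's documented `keep_alive` parameter: a generate request with no prompt and `keep_alive: 0` evicts the model from memory immediately. A hedged TypeScript sketch (the test steps themselves may use a different mechanism, e.g. `ollama stop`):

```typescript
// Unload a model by issuing an empty generate request with keep_alive: 0.
// Ollama then evicts the model from VRAM instead of waiting for the idle timeout.
async function unloadModel(baseUrl: string, model: string): Promise<void> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, keep_alive: 0 }),
  });
  if (!res.ok) {
    throw new Error(`failed to unload ${model}: HTTP ${res.status}`);
  }
}

// Example: free VRAM after the 27b dual-GPU test.
// await unloadModel("http://localhost:11434", "gemma3:27b");
```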

## Test Case Structure

Test cases are YAML files in `tests/testcases/{suite}/`:

```yaml
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000

dependencies:
  - TC-INFERENCE-001

steps:
  - name: Run simple math question
    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?"
    timeout: 120000

  - name: Check for errors in logs
    command: |
      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
      else
        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
      fi
      # Check for CUDA errors...

criteria: |
  Expected:
  - Model responds with "4" or equivalent
  - NO CUBLAS_STATUS_ errors
  - NO CUDA errors
```
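
The YAML fields above map naturally onto typed structures. A hypothetical sketch inferred from this example (the canonical definitions live in `tests/src/types.ts` and may differ):

```typescript
// Hypothetical types inferred from the YAML example; see tests/src/types.ts.
interface TestStep {
  name: string;
  command: string;
  timeout?: number; // milliseconds
}

interface TestCase {
  id: string;
  name: string;
  suite: "build" | "runtime" | "inference";
  priority: number;
  timeout: number; // milliseconds, for the whole test
  dependencies?: string[]; // test IDs that must pass first
  steps: TestStep[];
  criteria: string; // natural-language expectations for the LLM judge
}
```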

## Build System

### Docker Images

**Builder image:** `ollama37-builder:latest` (~15GB)

- Rocky Linux 8
- CUDA 11.4 toolkit
- GCC 10, CMake 4.0, Go 1.25.3
- Build time: ~90 minutes (cached)

**Runtime image:** `ollama37:latest` (~18GB)

- Built from GitHub source
- Build time: ~10 minutes

### Build Commands

```bash
cd docker

# Build base image (first time only)
make build-builder

# Build runtime from GitHub
make build-runtime

# Build without cache
make build-runtime-no-cache

# Build from local source
make build-runtime-local
```

## Running Tests Locally

### Prerequisites

1. Docker with the NVIDIA runtime
2. Node.js 20+
3. Tesla K80 GPU (or compatible)

### Quick Start

```bash
# Start the container
cd docker && docker compose up -d

# Install test runner
cd tests && npm ci

# Run all tests with dual judge
npm run dev -- run --dual-judge

# Run specific suite
npm run dev -- run --suite inference

# Run single test
npm run dev -- run --id TC-INFERENCE-002

# Simple mode (no LLM)
npm run dev -- run --no-llm

# JSON output
npm run dev -- run -o json > results.json
```

### Test Output

Results are saved to `/tmp/`:

- `/tmp/build-results.json`
- `/tmp/runtime-results.json`
- `/tmp/inference-results.json`

JSON structure:

```json
{
  "summary": {
    "total": 5,
    "passed": 5,
    "failed": 0,
    "timestamp": "2025-12-17T...",
    "simple": { "passed": 5, "failed": 0 },
    "llm": { "passed": 5, "failed": 0 }
  },
  "results": [...]
}
```
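
A small consumer of these files, for example a CI gate, only needs the `summary` fields shown above. A sketch assuming Node.js (file name and field names taken from this document):

```typescript
import { readFileSync } from "node:fs";

// Read a results file and exit non-zero if anything failed.
// Field names follow the JSON structure shown above.
const report = JSON.parse(readFileSync("/tmp/inference-results.json", "utf8"));
const { total, passed, failed } = report.summary;

console.log(`inference: ${passed}/${total} passed, ${failed} failed`);
if (failed > 0) process.exit(1);
```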

## Environment Variables

### Build Environment

| Variable | Value | Description |
|----------|-------|-------------|
| `BUILDER_IMAGE` | ollama37-builder | Builder image name |
| `RUNTIME_IMAGE` | ollama37 | Runtime image name |

### Runtime Environment

| Variable | Value | Description |
|----------|-------|-------------|
| `OLLAMA_HOST` | 0.0.0.0:11434 | Server listen address |
| `NVIDIA_VISIBLE_DEVICES` | all | GPU visibility |
| `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging |
| `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug output |

### Test Environment

| Variable | Description |
|----------|-------------|
| `TEST_ID` | Current test ID (set by the executor) |
| `OLLAMA_HOST` | Test subject URL |
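
To show how `TEST_ID` reaches a step's shell, here is a hedged sketch of the injection (illustrative only; the real wiring is in `tests/src/executor.ts`):

```typescript
import { execSync } from "node:child_process";

// Illustrative only: run one step's command with TEST_ID in its environment,
// so the step can read /tmp/test-${TEST_ID}-logs.txt.
function runStep(testId: string, command: string, timeoutMs: number): void {
  execSync(command, {
    env: { ...process.env, TEST_ID: testId },
    timeout: timeoutMs,
    stdio: "inherit",
    shell: "/bin/bash", // steps use bash constructs like $(...) and [ -f ... ]
  });
}
```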

## Troubleshooting

### GPU Not Detected in Container

```bash
# Check UVM device files
ls -l /dev/nvidia-uvm*

# Create if missing
nvidia-modprobe -u -c=0

# Restart container
docker compose restart
```

### LLM Judge Timeout

```bash
# Use simple mode
npm run dev -- run --no-llm

# Or switch to a smaller, faster judge model
npm run dev -- run --judge-model gemma3:4b
```

### Log Collector Issues

If a test step can't find its logs:

```bash
# Check log file exists
ls -l /tmp/test-*-logs.txt

# Fallback to direct logs
docker compose logs --since=5m
```

### Build Failures

```bash
# Clean build
cd docker && make build-runtime-no-cache

# Check builder image
docker images | grep ollama37-builder
```

## Error Patterns

The test framework checks for these critical errors:

| Pattern | Severity | Description |
|---------|----------|-------------|
| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) |
| `CUDA error` | Critical | General CUDA failure |
| `cudaMalloc failed` | Critical | GPU memory allocation failure |
| `out of memory` | Critical | VRAM exhausted |
| `level=ERROR` | Warning | Ollama application error |
| `panic`, `fatal` | Critical | Runtime crash |
| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) |
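
A straightforward way to apply this table is a line-by-line scan of the collected logs. The sketch below mirrors the table; the exact regexes and severities used by the framework may differ:

```typescript
// Illustrative log scan mirroring the error-pattern table above.
type Severity = "critical" | "warning";

const patterns: Array<{ re: RegExp; severity: Severity }> = [
  { re: /CUBLAS_STATUS_\w+/, severity: "critical" },
  { re: /CUDA error/, severity: "critical" },
  { re: /cudaMalloc failed/, severity: "critical" },
  { re: /out of memory/, severity: "critical" },
  { re: /level=ERROR/, severity: "warning" },
  { re: /\b(panic|fatal)\b/, severity: "critical" },
  { re: /id=cpu library=cpu/, severity: "critical" },
];

// Return every log line that matches a known error pattern, with its severity.
function scanLogs(logs: string): { line: string; severity: Severity }[] {
  const hits: { line: string; severity: Severity }[] = [];
  for (const line of logs.split("\n")) {
    for (const { re, severity } of patterns) {
      if (re.test(line)) hits.push({ line, severity });
    }
  }
  return hits;
}
```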

## File Structure

```
tests/
├── src/
│   ├── cli.ts           # CLI entry point
│   ├── executor.ts      # Test execution engine
│   ├── judge.ts         # LLM/simple judging
│   ├── loader.ts        # YAML test case parser
│   ├── log-collector.ts # Docker log collector
│   ├── reporter.ts      # Output formatters
│   └── types.ts         # Type definitions
├── testcases/
│   ├── build/           # Build test cases
│   ├── runtime/         # Runtime test cases
│   └── inference/       # Inference test cases
└── package.json

.github/workflows/
├── build.yml            # Build verification
├── runtime.yml          # Container/GPU tests
├── inference.yml        # Model inference tests
└── full-pipeline.yml    # Complete pipeline
```