Add LogCollector for precise test log boundaries

Problem: Tests used `docker compose logs --since=5m` which caused:
- Log overlap between tests
- Logs from previous tests included
- Missing logs if test exceeded 5 minutes

Solution:
- New LogCollector class runs `docker compose logs --follow`
- Marks test start/end boundaries
- Writes test-specific logs to /tmp/test-{testId}-logs.txt
- Test steps access via TEST_ID environment variable

Changes:
- tests/src/log-collector.ts: New LogCollector class
- tests/src/executor.ts: Integrate LogCollector, set TEST_ID env
- tests/src/cli.ts: Start/stop LogCollector for runtime/inference
- All test cases: Use log collector with fallback to docker compose

Also updated docs/CICD.md with:
- Test runner CLI documentation
- Judge modes (simple, llm, dual)
- Log collector integration
- Updated test case list (12b, 27b models)
- Model unload strategy
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date: 2025-12-17 17:46:49 +08:00
Parent: 82ab6cc96e
Commit: 2c5094db92
12 changed files with 702 additions and 272 deletions


@@ -1,4 +1,4 @@
# CI/CD Plan for Ollama37
# CI/CD Pipeline for Ollama37
This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.
@@ -24,185 +24,269 @@ This document describes the CI/CD pipeline for building and testing Ollama37 wit
│ - Rocky Linux 9.7 │
│ - Docker 29.1.3 + Docker Compose 5.0.0 │
│ - NVIDIA Container Toolkit │
│ - GitHub Actions Runner (self-hosted, labels: k80, cuda11)
│ │
│ Services: │
│ - TestLink (http://localhost:8090) - Test management │
│ - TestLink MCP - Claude Code integration │
│ │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ SERVE NODE │
│ │
│ Services: │
│ - Ollama (production) │
│ - Dify (LLM application platform) │
│ - GitHub Actions Runner (self-hosted)
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
## Build Strategy: Docker-Based
## Test Framework
We use the two-stage Docker build system located in `/docker/`:
### Test Runner CLI
### Stage 1: Builder Image (Cached)
The test runner is located in `tests/src/` and provides a CLI tool:
**Image:** `ollama37-builder:latest` (~15GB)
```bash
cd tests
npm run dev -- run [options]
```
**Contents:**
**Commands:**
- `run` - Execute test cases
- `list` - List all available test cases
**Options:**
| Option | Default | Description |
|--------|---------|-------------|
| `-s, --suite <suite>` | all | Filter by suite (build, runtime, inference) |
| `-i, --id <id>` | - | Run specific test by ID |
| `-w, --workers <n>` | 1 | Parallel worker count |
| `-d, --dry-run` | false | Preview without executing |
| `-o, --output <format>` | console | Output format: console, json, junit |
| `--no-llm` | false | Skip LLM, use simple exit code check only |
| `--judge-model <model>` | gemma3:12b | Model for LLM judging |
| `--dual-judge` | true | Run both simple and LLM judge |
| `--ollama-url <url>` | localhost:11434 | Test subject server |
| `--judge-url <url>` | localhost:11435 | Separate judge instance |
### Judge Modes
The test framework supports three judge modes:
| Mode | Flag | Description |
|------|------|-------------|
| **Simple** | `--no-llm` | Exit code checking only (exit 0 = pass) |
| **LLM** | `--judge-model` | Semantic analysis of test logs using LLM |
| **Dual** | `--dual-judge` | Both must pass (default) |
**LLM Judge:**
- Analyzes test execution logs semantically
- Detects hidden issues (e.g., CUDA errors with exit 0)
- Uses configurable model (default: gemma3:12b)
- Batches tests for efficient judging
**Simple Judge:**
- Fast, deterministic
- Checks exit codes only
- Fallback when LLM unavailable
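In dual mode, a test only passes when both judges agree. As an illustration, here is a minimal TypeScript sketch of how the two verdicts could be combined (the names below are hypothetical and do not mirror `judge.ts`):

```typescript
// Hypothetical sketch of dual-judge aggregation; not the actual judge.ts API.
interface Verdict {
  passed: boolean;
  reason: string;
}

function simpleJudge(exitCode: number): Verdict {
  // Simple mode: exit 0 means pass, anything else fails.
  return { passed: exitCode === 0, reason: `exit code ${exitCode}` };
}

async function llmJudge(logs: string, criteria: string, judgeUrl: string, model: string): Promise<Verdict> {
  // LLM mode: ask the judge model whether the logs satisfy the criteria.
  const res = await fetch(`${judgeUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: `Criteria:\n${criteria}\n\nLogs:\n${logs}\n\nAnswer PASS or FAIL with a short reason.`,
      stream: false,
    }),
  });
  const { response } = (await res.json()) as { response: string };
  return { passed: /\bPASS\b/i.test(response), reason: response.trim() };
}

function dualJudge(simple: Verdict, llm: Verdict): Verdict {
  // Dual mode: both judges must agree for the test to pass.
  return {
    passed: simple.passed && llm.passed,
    reason: `simple: ${simple.reason}; llm: ${llm.reason}`,
  };
}
```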
### Log Collector
The test framework includes a log collector that solves log overlap issues:
**Problem:** `docker compose logs --since=5m` can include logs from previous tests or miss logs if a test exceeds 5 minutes.
**Solution:** LogCollector class that:
1. Runs `docker compose logs --follow` as background process
2. Marks test start/end boundaries
3. Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
4. Provides precise logs for each test
Test steps access logs via:
```bash
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
```
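A minimal sketch of the idea behind the collector (the class shape and method names here are illustrative, not necessarily those of `log-collector.ts`):

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { appendFileSync } from "node:fs";

// Illustrative sketch: follow docker compose logs once and split them per test.
class LogCollector {
  private proc?: ChildProcess;
  private currentTestId?: string;

  start(composeDir: string): void {
    // One long-running follower for the whole run, instead of --since=5m snapshots.
    this.proc = spawn("docker", ["compose", "logs", "--follow", "--no-color"], {
      cwd: composeDir,
    });
    this.proc.stdout?.on("data", (chunk: Buffer) => {
      if (this.currentTestId) {
        appendFileSync(`/tmp/test-${this.currentTestId}-logs.txt`, chunk);
      }
    });
  }

  markTestStart(testId: string): void {
    this.currentTestId = testId; // lines from now on belong to this test
  }

  markTestEnd(): void {
    this.currentTestId = undefined;
  }

  stop(): void {
    this.proc?.kill();
  }
}
```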
## GitHub Workflows
Located in `.github/workflows/`:
| Workflow | Purpose |
|----------|---------|
| `build.yml` | Docker image build verification |
| `runtime.yml` | Container startup and GPU detection |
| `inference.yml` | Model inference tests (4b, 12b, 27b) |
| `full-pipeline.yml` | Orchestrates all stages sequentially |
### Workflow Inputs
| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `judge_mode` | dual | simple, llm, dual | Judge strategy |
| `judge_model` | gemma3:12b | Any model | LLM for evaluation |
| `use_existing_container` | false | true, false | Reuse running container |
| `keep_container` | false | true, false | Leave container running |
### Example: Run Inference Tests
```bash
# Manual trigger via GitHub Actions UI
# Or via gh CLI:
gh workflow run inference.yml \
-f judge_mode=dual \
-f judge_model=gemma3:12b
```
## Test Suites
### Build Suite (3 tests)
| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-BUILD-001 | Builder Image Verification | 2m | Verify builder image exists |
| TC-BUILD-002 | Runtime Image Build | 30m | Build runtime image |
| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes |
### Runtime Suite (3 tests)
| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-RUNTIME-001 | Container Startup | 2m | Start container with GPU |
| TC-RUNTIME-002 | GPU Detection | 2m | Verify K80 detected |
| TC-RUNTIME-003 | Health Check | 3m | API health verification |
### Inference Suite (5 tests)
| ID | Name | Model | Timeout | Description |
|----|------|-------|---------|-------------|
| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warmup 4b model |
| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test |
| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification |
| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) |
| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) |
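For reference, TC-INFERENCE-003 exercises Ollama's standard REST API; the request below is a TypeScript sketch of the kind of call being verified (it assumes the default port and is not the exact test command):

```typescript
// Sketch of a /api/generate request against the test subject.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma3:4b",
    prompt: "What is 2+2?",
    stream: false,
  }),
});
const body = (await res.json()) as { response: string };
console.log(body.response); // expect an answer containing "4"
```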
### Model Unload Strategy
Each model size test unloads its model after completion:
```
4b tests (001-003) → unload 4b
12b test (004) → unload 12b
27b test (005) → unload 27b
```
Workflow-level cleanup (`if: always()`) provides safety fallback.
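One way to unload a model through the Ollama API is a request with `keep_alive: 0`; a sketch is shown below (the actual test steps may use a different mechanism, such as `ollama stop`):

```typescript
// Sketch: ask the server to drop a model from VRAM via keep_alive: 0.
async function unloadModel(ollamaUrl: string, model: string): Promise<void> {
  await fetch(`${ollamaUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, keep_alive: 0 }),
  });
}

// e.g. after the 27b test:
await unloadModel("http://localhost:11434", "gemma3:27b");
```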
## Test Case Structure
Test cases are YAML files in `tests/testcases/{suite}/`:
```yaml
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000
dependencies:
- TC-INFERENCE-001
steps:
- name: Run simple math question
command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?"
timeout: 120000
- name: Check for errors in logs
command: |
if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
# Check for CUDA errors...
criteria: |
Expected:
- Model responds with "4" or equivalent
- NO CUBLAS_STATUS_ errors
- NO CUDA errors
```
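A minimal sketch of how such a YAML file could be loaded into a typed object (the `TestCase` shape is inferred from the example above, and `js-yaml` is an assumption; `loader.ts` may differ):

```typescript
import { readFileSync } from "node:fs";
import { load } from "js-yaml";

// Shape inferred from the YAML example above; the real types.ts may differ.
interface TestStep {
  name: string;
  command: string;
  timeout?: number;
}

interface TestCase {
  id: string;
  name: string;
  suite: "build" | "runtime" | "inference";
  priority: number;
  timeout: number;
  dependencies?: string[];
  steps: TestStep[];
  criteria: string;
}

function loadTestCase(path: string): TestCase {
  return load(readFileSync(path, "utf8")) as TestCase;
}
```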
## Build System
### Docker Images
**Builder Image:** `ollama37-builder:latest` (~15GB)
- Rocky Linux 8
- CUDA 11.4 toolkit
- GCC 10 (built from source)
- CMake 4.0 (built from source)
- Go 1.25.3
- GCC 10, CMake 4.0, Go 1.25.3
- Build time: ~90 minutes (cached)
**Build time:** ~90 minutes (first time only, then cached)
**Runtime Image:** `ollama37:latest` (~18GB)
- Built from GitHub source
- Build time: ~10 minutes
### Build Commands
**Build command:**
```bash
cd docker && make build-builder
cd docker
# Build base image (first time only)
make build-builder
# Build runtime from GitHub
make build-runtime
# Build without cache
make build-runtime-no-cache
# Build from local source
make build-runtime-local
```
### Stage 2: Runtime Image (Per Build)
## Running Tests Locally
**Image:** `ollama37:latest` (~18GB)
### Prerequisites
**Process:**
1. Clone source from GitHub
2. Configure with CMake ("CUDA 11" preset)
3. Build C/C++/CUDA libraries
4. Build Go binary
5. Package runtime environment
1. Docker with NVIDIA runtime
2. Node.js 20+
3. Tesla K80 GPU (or compatible)
**Build time:** ~10 minutes
### Quick Start
**Build command:**
```bash
cd docker && make build-runtime
# Start the container
cd docker && docker compose up -d
# Install test runner
cd tests && npm ci
# Run all tests with dual judge
npm run dev -- run --dual-judge
# Run specific suite
npm run dev -- run --suite inference
# Run single test
npm run dev -- run --id TC-INFERENCE-002
# Simple mode (no LLM)
npm run dev -- run --no-llm
# JSON output
npm run dev -- run -o json > results.json
```
## Pipeline Stages
### Test Output
### Stage 1: Docker Build
Results are saved to `/tmp/`:
- `/tmp/build-results.json`
- `/tmp/runtime-results.json`
- `/tmp/inference-results.json`
**Trigger:** Push to `main` branch
**Steps:**
1. Checkout repository
2. Ensure builder image exists (build if not)
3. Build runtime image: `make build-runtime`
4. Verify image created successfully
**Test Cases:**
- TC-BUILD-001: Builder Image Verification
- TC-BUILD-002: Runtime Image Build
- TC-BUILD-003: Image Size Validation
### Stage 2: Container Startup
**Steps:**
1. Start container with GPU: `docker compose up -d`
2. Wait for health check to pass
3. Verify Ollama server is responding
**Test Cases:**
- TC-RUNTIME-001: Container Startup
- TC-RUNTIME-002: GPU Detection
- TC-RUNTIME-003: Health Check
### Stage 3: Inference Tests
**Steps:**
1. Pull test model (gemma3:4b)
2. Run inference tests
3. Verify CUBLAS legacy fallback
**Test Cases:**
- TC-INFERENCE-001: Model Pull
- TC-INFERENCE-002: Basic Inference
- TC-INFERENCE-003: API Endpoint Test
- TC-INFERENCE-004: CUBLAS Fallback Verification
### Stage 4: Cleanup & Report
**Steps:**
1. Stop container: `docker compose down`
2. Report results to TestLink
3. Clean up resources
## Test Case Design
### Build Tests (Suite: Build Tests)
| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-BUILD-001 | Builder Image Verification | Automated | Verify builder image exists with correct tools |
| TC-BUILD-002 | Runtime Image Build | Automated | Build runtime image from GitHub source |
| TC-BUILD-003 | Image Size Validation | Automated | Verify image sizes are within expected range |
### Runtime Tests (Suite: Runtime Tests)
| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-RUNTIME-001 | Container Startup | Automated | Start container with GPU passthrough |
| TC-RUNTIME-002 | GPU Detection | Automated | Verify Tesla K80 detected inside container |
| TC-RUNTIME-003 | Health Check | Automated | Verify Ollama health check passes |
### Inference Tests (Suite: Inference Tests)
| ID | Name | Type | Description |
|----|------|------|-------------|
| TC-INFERENCE-001 | Model Pull | Automated | Pull gemma3:4b model |
| TC-INFERENCE-002 | Basic Inference | Automated | Run simple prompt and verify response |
| TC-INFERENCE-003 | API Endpoint Test | Automated | Test /api/generate endpoint |
| TC-INFERENCE-004 | CUBLAS Fallback Verification | Automated | Verify legacy CUBLAS functions used |
## GitHub Actions Workflow
**File:** `.github/workflows/build-test.yml`
**Triggers:**
- Push to `main` branch
- Pull request to `main` branch
- Manual trigger (workflow_dispatch)
**Runner:** Self-hosted with labels `[self-hosted, k80, cuda11]`
**Jobs:**
1. `build` - Build Docker runtime image
2. `test` - Run inference tests in container
3. `report` - Report results to TestLink
## TestLink Integration
**URL:** http://localhost:8090
**Project:** ollama37
**Test Suites:**
- Build Tests
- Runtime Tests
- Inference Tests
**Test Plan:** Created per release/sprint
**Builds:** Created per CI run (commit SHA)
**Execution Recording:**
- Each test case result recorded via TestLink API
- Pass/Fail status with notes
- Linked to specific build/commit
## Makefile Targets for CI
| Target | Description | When to Use |
|--------|-------------|-------------|
| `make build-builder` | Build base image | First time setup |
| `make build-runtime` | Build from GitHub | Normal CI builds |
| `make build-runtime-no-cache` | Fresh GitHub clone | When cache is stale |
| `make build-runtime-local` | Build from local | Local testing |
JSON structure:
```json
{
"summary": {
"total": 5,
"passed": 5,
"failed": 0,
"timestamp": "2025-12-17T...",
"simple": { "passed": 5, "failed": 0 },
"llm": { "passed": 5, "failed": 0 }
},
"results": [...]
}
```
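For consumers of this file, a rough TypeScript shape inferred from the sample above (only the fields shown there are typed):

```typescript
// Inferred from the sample JSON; per-test entries are elided there, so they stay untyped here.
interface JudgeCounts {
  passed: number;
  failed: number;
}

interface RunSummary {
  total: number;
  passed: number;
  failed: number;
  timestamp: string;
  simple: JudgeCounts;
  llm: JudgeCounts;
}

interface RunReport {
  summary: RunSummary;
  results: unknown[];
}
```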
## Environment Variables
@@ -222,76 +306,19 @@ cd docker && make build-runtime
| `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging |
| `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug |
### TestLink Environment
### Test Environment
| Variable | Value | Description |
|----------|-------|-------------|
| `TESTLINK_URL` | http://localhost:8090 | TestLink server URL |
| `TESTLINK_API_KEY` | (configured) | API key for automation |
## Prerequisites
### One-Time Setup on CI/CD Node
1. **Install GitHub Actions Runner:**
```bash
mkdir -p ~/actions-runner && cd ~/actions-runner
curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.321.0.tar.gz
./config.sh --url https://github.com/dogkeeper886/ollama37 --token YOUR_TOKEN --labels k80,cuda11
sudo ./svc.sh install && sudo ./svc.sh start
```
2. **Build Builder Image (one-time, ~90 min):**
```bash
cd /home/jack/src/ollama37/docker
make build-builder
```
3. **Verify GPU Access in Docker:**
```bash
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
4. **Start TestLink:**
```bash
cd /home/jack/src/testlink-code
docker compose up -d
```
## Monitoring & Logs
### View CI/CD Logs
```bash
# GitHub Actions Runner logs
journalctl -u actions.runner.* -f
# Docker build logs
docker compose logs -f
# TestLink logs
cd /home/jack/src/testlink-code && docker compose logs -f
```
### Test Results
- **TestLink Dashboard:** http://localhost:8090
- **GitHub Actions:** https://github.com/dogkeeper886/ollama37/actions
| Variable | Description |
|----------|-------------|
| `TEST_ID` | Current test ID (set by executor) |
| `OLLAMA_HOST` | Test subject URL |
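The executor exposes these variables to each step by setting them on the child process environment; roughly as follows (a sketch, not the actual `executor.ts` code):

```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

// Sketch: run one test step with TEST_ID and OLLAMA_HOST exported to the shell.
async function runStep(command: string, testId: string, ollamaHost: string, timeoutMs: number) {
  return execAsync(command, {
    env: { ...process.env, TEST_ID: testId, OLLAMA_HOST: ollamaHost },
    timeout: timeoutMs,
    shell: "/bin/bash",
  });
}
```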
## Troubleshooting
### Builder Image Missing
```bash
cd docker && make build-builder
```
### GPU Not Detected in Container
```bash
# Check UVM device files on host
# Check UVM device files
ls -l /dev/nvidia-uvm*
# Create if missing
@@ -301,18 +328,72 @@ nvidia-modprobe -u -c=0
docker compose restart
```
### Build Cache Stale
### LLM Judge Timeout
```bash
# Use simple mode
npm run dev -- run --no-llm
# Or use a smaller judge model
npm run dev -- run --judge-model gemma3:4b
```
### Log Collector Issues
If a test step cannot find its log file:
```bash
# Check log file exists
ls -l /tmp/test-*-logs.txt
# Fallback to direct logs
docker compose logs --since=5m
```
### Build Failures
```bash
# Clean build
cd docker && make build-runtime-no-cache
# Check builder image
docker images | grep ollama37-builder
```
### TestLink Connection Failed
```bash
# Check TestLink is running
curl http://localhost:8090
# Restart if needed
cd /home/jack/src/testlink-code && docker compose restart
```
## Error Patterns
The test framework checks for these critical errors:
| Pattern | Severity | Description |
|---------|----------|-------------|
| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) |
| `CUDA error` | Critical | General CUDA failure |
| `cudaMalloc failed` | Critical | GPU memory allocation failure |
| `out of memory` | Critical | VRAM exhausted |
| `level=ERROR` | Warning | Ollama application error |
| `panic`, `fatal` | Critical | Runtime crash |
| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) |
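A sketch of how collected logs might be screened against the critical patterns above (the regexes follow the table; the function itself is illustrative):

```typescript
// Illustrative screening of collected logs against the critical patterns above.
const criticalPatterns: RegExp[] = [
  /CUBLAS_STATUS_\w+/,
  /CUDA error/,
  /cudaMalloc failed/,
  /out of memory/i,
  /panic|fatal/i,
  /id=cpu library=cpu/,
];

function findCriticalErrors(logs: string): string[] {
  return criticalPatterns.filter((p) => p.test(logs)).map((p) => p.source);
}
```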
## File Structure
```
tests/
├── src/
│ ├── cli.ts # CLI entry point
│ ├── executor.ts # Test execution engine
│ ├── judge.ts # LLM/simple judging
│ ├── loader.ts # YAML test case parser
│ ├── log-collector.ts # Docker log collector
│ ├── reporter.ts # Output formatters
│ └── types.ts # Type definitions
├── testcases/
│ ├── build/ # Build test cases
│ ├── runtime/ # Runtime test cases
│ └── inference/ # Inference test cases
└── package.json
.github/workflows/
├── build.yml # Build verification
├── runtime.yml # Container/GPU tests
├── inference.yml # Model inference tests
└── full-pipeline.yml # Complete pipeline
```