Add LogCollector for precise test log boundaries

Problem: Tests used `docker compose logs --since=5m` which caused:
- Log overlap between tests
- Logs from previous tests included
- Missing logs if test exceeded 5 minutes

Solution:
- New LogCollector class runs `docker compose logs --follow`
- Marks test start/end boundaries
- Writes test-specific logs to /tmp/test-{testId}-logs.txt
- Test steps access via TEST_ID environment variable

Changes:
- tests/src/log-collector.ts: New LogCollector class
- tests/src/executor.ts: Integrate LogCollector, set TEST_ID env
- tests/src/cli.ts: Start/stop LogCollector for runtime/inference
- All test cases: Use log collector with fallback to docker compose

Also updated docs/CICD.md with:
- Test runner CLI documentation
- Judge modes (simple, llm, dual)
- Log collector integration
- Updated test case list (12b, 27b models)
- Model unload strategy
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 20:57:01 +00:00 · 2025-12-17 17:46:49 +08:00
parent 82ab6cc96e
commit 2c5094db92
12 changed files with 702 additions and 272 deletions
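
The boundary-marking idea described in the commit message can be sketched as follows. This is a minimal illustration, not the contents of `tests/src/log-collector.ts`: the `BoundaryBuffer` class and its method names are hypothetical, and it assumes the collector keeps the streamed lines in memory and slices them by the index recorded when a test starts. Only the `/tmp/test-{testId}-logs.txt` path convention comes from the commit itself.

```typescript
// Hypothetical sketch: names are illustrative and NOT taken from log-collector.ts.
class BoundaryBuffer {
  private lines: string[] = [];
  private starts = new Map<string, number>();

  // Called for every line arriving from the `docker compose logs --follow` stream.
  append(line: string): void {
    this.lines.push(line);
  }

  // Record where in the buffer this test's logs begin.
  markStart(testId: string): void {
    this.starts.set(testId, this.lines.length);
  }

  // Return only the lines that arrived since markStart(testId).
  markEnd(testId: string): string[] {
    const start = this.starts.get(testId);
    if (start === undefined) {
      throw new Error(`markStart was never called for ${testId}`);
    }
    this.starts.delete(testId);
    return this.lines.slice(start);
  }
}

// Path convention from the commit message: /tmp/test-{testId}-logs.txt
function logPathFor(testId: string): string {
  return `/tmp/test-${testId}-logs.txt`;
}

// Usage: lines appended before markStart are excluded from the test's slice.
const collector = new BoundaryBuffer();
collector.append("line from previous test");
collector.markStart("TC-INFERENCE-002");
collector.append("loading model");
collector.append("inference done");
const testLogs = collector.markEnd("TC-INFERENCE-002");
```

Slicing by position rather than by a time window is what removes the dependence on `--since=5m`: a test's logs are whatever arrived between its two marks, however long it ran.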
--- a/docs/CICD.md
+++ b/docs/CICD.md
@@ -1,4 +1,4 @@
-# CI/CD Plan for Ollama37
+# CI/CD Pipeline for Ollama37

 This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.

@@ -24,185 +24,269 @@ This document describes the CI/CD pipeline for building and testing Ollama37 wit
 │    - Rocky Linux 9.7                                                   │
 │    - Docker 29.1.3 + Docker Compose 5.0.0                              │
 │    - NVIDIA Container Toolkit                                          │
-│    - GitHub Actions Runner (self-hosted, labels: k80, cuda11)          │
-│                                                                         │
-│  Services:                                                              │
-│    - TestLink (http://localhost:8090) - Test management                │
-│    - TestLink MCP - Claude Code integration                            │
-│                                                                         │
-└─────────────────────────────────────────────────────────────────────────┘
-                                                                      │
-                                                                      ▼
-┌─────────────────────────────────────────────────────────────────────────┐
-│                         SERVE NODE                                       │
-│                                                                         │
-│  Services:                                                              │
-│    - Ollama (production)                                               │
-│    - Dify (LLM application platform)                                   │
+│    - GitHub Actions Runner (self-hosted)                               │
 │                                                                         │
 └─────────────────────────────────────────────────────────────────────────┘
 ```

-## Build Strategy: Docker-Based
+## Test Framework

-We use the two-stage Docker build system located in `/docker/`:
+### Test Runner CLI

-### Stage 1: Builder Image (Cached)
+The test runner is located in `tests/src/` and provides a CLI tool:

-**Image:** `ollama37-builder:latest` (~15GB)
+```bash
+cd tests
+npm run dev -- run [options]
+```

-**Contents:**
+**Commands:**
+- `run` - Execute test cases
+- `list` - List all available test cases
+
+**Options:**
+| Option | Default | Description |
+|--------|---------|-------------|
+| `-s, --suite <suite>` | all | Filter by suite (build, runtime, inference) |
+| `-i, --id <id>` | - | Run specific test by ID |
+| `-w, --workers <n>` | 1 | Parallel worker count |
+| `-d, --dry-run` | false | Preview without executing |
+| `-o, --output <format>` | console | Output format: console, json, junit |
+| `--no-llm` | false | Skip LLM, use simple exit code check only |
+| `--judge-model <model>` | gemma3:12b | Model for LLM judging |
+| `--dual-judge` | true | Run both simple and LLM judge |
+| `--ollama-url <url>` | localhost:11434 | Test subject server |
+| `--judge-url <url>` | localhost:11435 | Separate judge instance |
+
+### Judge Modes
+
+The test framework supports three judge modes:
+
+| Mode | Flag | Description |
+|------|------|-------------|
+| **Simple** | `--no-llm` | Exit code checking only (exit 0 = pass) |
+| **LLM** | `--judge-model` | Semantic analysis of test logs using LLM |
+| **Dual** | `--dual-judge` | Both must pass (default) |
+
+**LLM Judge:**
+- Analyzes test execution logs semantically
+- Detects hidden issues (e.g., CUDA errors with exit 0)
+- Uses configurable model (default: gemma3:12b)
+- Batches tests for efficient judging
+
+**Simple Judge:**
+- Fast, deterministic
+- Checks exit codes only
+- Fallback when LLM unavailable
+
+### Log Collector
+
+The test framework includes a log collector that solves log overlap issues:
+
+**Problem:** `docker compose logs --since=5m` can include logs from previous tests or miss logs if a test exceeds 5 minutes.
+
+**Solution:** LogCollector class that:
+1. Runs `docker compose logs --follow` as a background process
+2. Marks test start/end boundaries
+3. Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
+4. Provides precise logs for each test
+
+Test steps access logs via:
+```bash
+LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+```
+
+## GitHub Workflows
+
+Located in `.github/workflows/`:
+
+| Workflow | Purpose |
+|----------|---------|
+| `build.yml` | Docker image build verification |
+| `runtime.yml` | Container startup and GPU detection |
+| `inference.yml` | Model inference tests (4b, 12b, 27b) |
+| `full-pipeline.yml` | Orchestrates all stages sequentially |
+
+### Workflow Inputs
+
+| Parameter | Default | Options | Description |
+|-----------|---------|---------|-------------|
+| `judge_mode` | dual | simple, llm, dual | Judge strategy |
+| `judge_model` | gemma3:12b | Any model | LLM for evaluation |
+| `use_existing_container` | false | true, false | Reuse running container |
+| `keep_container` | false | true, false | Leave container running |
+
+### Example: Run Inference Tests
+
+```bash
+# Manual trigger via GitHub Actions UI
+# Or via gh CLI:
+gh workflow run inference.yml \
+  -f judge_mode=dual \
+  -f judge_model=gemma3:12b
+```
+
+## Test Suites
+
+### Build Suite (3 tests)
+
+| ID | Name | Timeout | Description |
+|----|------|---------|-------------|
+| TC-BUILD-001 | Builder Image Verification | 2m | Verify builder image exists |
+| TC-BUILD-002 | Runtime Image Build | 30m | Build runtime image |
+| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes |
+
+### Runtime Suite (3 tests)
+
+| ID | Name | Timeout | Description |
+|----|------|---------|-------------|
+| TC-RUNTIME-001 | Container Startup | 2m | Start container with GPU |
+| TC-RUNTIME-002 | GPU Detection | 2m | Verify K80 detected |
+| TC-RUNTIME-003 | Health Check | 3m | API health verification |
+
+### Inference Suite (5 tests)
+
+| ID | Name | Model | Timeout | Description |
+|----|------|-------|---------|-------------|
+| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warmup 4b model |
+| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test |
+| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification |
+| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) |
+| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) |
+
+### Model Unload Strategy
+
+Each model size group unloads its model once its tests complete:
+
+```
+4b tests (001-003) → unload 4b
+12b test (004) → unload 12b
+27b test (005) → unload 27b
+```
+
+Workflow-level cleanup (`if: always()`) provides a safety fallback.
+
+## Test Case Structure
+
+Test cases are YAML files in `tests/testcases/{suite}/`:
+
+```yaml
+id: TC-INFERENCE-002
+name: Basic Inference
+suite: inference
+priority: 2
+timeout: 180000
+
+dependencies:
+  - TC-INFERENCE-001
+
+steps:
+  - name: Run simple math question
+    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?"
+    timeout: 120000
+
+  - name: Check for errors in logs
+    command: |
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi
+      # Check for CUDA errors...
+
+criteria: |
+  Expected:
+  - Model responds with "4" or equivalent
+  - NO CUBLAS_STATUS_ errors
+  - NO CUDA errors
+```
+
+## Build System
+
+### Docker Images
+
+**Builder Image:** `ollama37-builder:latest` (~15GB)
 - Rocky Linux 8
 - CUDA 11.4 toolkit
-- GCC 10 (built from source)
-- CMake 4.0 (built from source)
-- Go 1.25.3
+- GCC 10, CMake 4.0, Go 1.25.3
+- Build time: ~90 minutes (first build only, then cached)

-**Build time:** ~90 minutes (first time only, then cached)
+**Runtime Image:** `ollama37:latest` (~18GB)
+- Built from GitHub source
+- Build time: ~10 minutes
+
+### Build Commands

-**Build command:**
 ```bash
-cd docker && make build-builder
+cd docker
+
+# Build base image (first time only)
+make build-builder
+
+# Build runtime from GitHub
+make build-runtime
+
+# Build without cache
+make build-runtime-no-cache
+
+# Build from local source
+make build-runtime-local
 ```

-### Stage 2: Runtime Image (Per Build)
+## Running Tests Locally

-**Image:** `ollama37:latest` (~18GB)
+### Prerequisites

-**Process:**
-1. Clone source from GitHub
-2. Configure with CMake ("CUDA 11" preset)
-3. Build C/C++/CUDA libraries
-4. Build Go binary
-5. Package runtime environment
+1. Docker with NVIDIA runtime
+2. Node.js 20+
+3. Tesla K80 GPU (or compatible)

-**Build time:** ~10 minutes
+### Quick Start

-**Build command:**
 ```bash
-cd docker && make build-runtime
+# Start the container
+cd docker && docker compose up -d
+
+# Install test runner
+cd tests && npm ci
+
+# Run all tests with dual judge
+npm run dev -- run --dual-judge
+
+# Run specific suite
+npm run dev -- run --suite inference
+
+# Run single test
+npm run dev -- run --id TC-INFERENCE-002
+
+# Simple mode (no LLM)
+npm run dev -- run --no-llm
+
+# JSON output
+npm run dev -- run -o json > results.json
 ```

-## Pipeline Stages
+### Test Output

-### Stage 1: Docker Build
+Results are saved to `/tmp/`:
+- `/tmp/build-results.json`
+- `/tmp/runtime-results.json`
+- `/tmp/inference-results.json`

-**Trigger:** Push to `main` branch
-
-**Steps:**
-1. Checkout repository
-2. Ensure builder image exists (build if not)
-3. Build runtime image: `make build-runtime`
-4. Verify image created successfully
-
-**Test Cases:**
-- TC-BUILD-001: Builder Image Verification
-- TC-BUILD-002: Runtime Image Build
-- TC-BUILD-003: Image Size Validation
-
-### Stage 2: Container Startup
-
-**Steps:**
-1. Start container with GPU: `docker compose up -d`
-2. Wait for health check to pass
-3. Verify Ollama server is responding
-
-**Test Cases:**
-- TC-RUNTIME-001: Container Startup
-- TC-RUNTIME-002: GPU Detection
-- TC-RUNTIME-003: Health Check
-
-### Stage 3: Inference Tests
-
-**Steps:**
-1. Pull test model (gemma3:4b)
-2. Run inference tests
-3. Verify CUBLAS legacy fallback
-
-**Test Cases:**
-- TC-INFERENCE-001: Model Pull
-- TC-INFERENCE-002: Basic Inference
-- TC-INFERENCE-003: API Endpoint Test
-- TC-INFERENCE-004: CUBLAS Fallback Verification
-
-### Stage 4: Cleanup & Report
-
-**Steps:**
-1. Stop container: `docker compose down`
-2. Report results to TestLink
-3. Clean up resources
-
-## Test Case Design
-
-### Build Tests (Suite: Build Tests)
-
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-BUILD-001 | Builder Image Verification | Automated | Verify builder image exists with correct tools |
-| TC-BUILD-002 | Runtime Image Build | Automated | Build runtime image from GitHub source |
-| TC-BUILD-003 | Image Size Validation | Automated | Verify image sizes are within expected range |
-
-### Runtime Tests (Suite: Runtime Tests)
-
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-RUNTIME-001 | Container Startup | Automated | Start container with GPU passthrough |
-| TC-RUNTIME-002 | GPU Detection | Automated | Verify Tesla K80 detected inside container |
-| TC-RUNTIME-003 | Health Check | Automated | Verify Ollama health check passes |
-
-### Inference Tests (Suite: Inference Tests)
-
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-INFERENCE-001 | Model Pull | Automated | Pull gemma3:4b model |
-| TC-INFERENCE-002 | Basic Inference | Automated | Run simple prompt and verify response |
-| TC-INFERENCE-003 | API Endpoint Test | Automated | Test /api/generate endpoint |
-| TC-INFERENCE-004 | CUBLAS Fallback Verification | Automated | Verify legacy CUBLAS functions used |
-
-## GitHub Actions Workflow
-
-**File:** `.github/workflows/build-test.yml`
-
-**Triggers:**
-- Push to `main` branch
-- Pull request to `main` branch
-- Manual trigger (workflow_dispatch)
-
-**Runner:** Self-hosted with labels `[self-hosted, k80, cuda11]`
-
-**Jobs:**
-1. `build` - Build Docker runtime image
-2. `test` - Run inference tests in container
-3. `report` - Report results to TestLink
-
-## TestLink Integration
-
-**URL:** http://localhost:8090
-
-**Project:** ollama37
-
-**Test Suites:**
-- Build Tests
-- Runtime Tests
-- Inference Tests
-
-**Test Plan:** Created per release/sprint
-
-**Builds:** Created per CI run (commit SHA)
-
-**Execution Recording:**
-- Each test case result recorded via TestLink API
-- Pass/Fail status with notes
-- Linked to specific build/commit
-
-## Makefile Targets for CI
-
-| Target | Description | When to Use |
-|--------|-------------|-------------|
-| `make build-builder` | Build base image | First time setup |
-| `make build-runtime` | Build from GitHub | Normal CI builds |
-| `make build-runtime-no-cache` | Fresh GitHub clone | When cache is stale |
-| `make build-runtime-local` | Build from local | Local testing |
+JSON structure:
+```json
+{
+  "summary": {
+    "total": 5,
+    "passed": 5,
+    "failed": 0,
+    "timestamp": "2025-12-17T...",
+    "simple": { "passed": 5, "failed": 0 },
+    "llm": { "passed": 5, "failed": 0 }
+  },
+  "results": [...]
+}
+```

 ## Environment Variables

@@ -222,76 +306,19 @@ cd docker && make build-runtime
 | `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging |
 | `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug |

-### TestLink Environment
+### Test Environment

-| Variable | Value | Description |
-|----------|-------|-------------|
-| `TESTLINK_URL` | http://localhost:8090 | TestLink server URL |
-| `TESTLINK_API_KEY` | (configured) | API key for automation |
-
-## Prerequisites
-
-### One-Time Setup on CI/CD Node
-
-1. **Install GitHub Actions Runner:**
-   ```bash
-   mkdir -p ~/actions-runner && cd ~/actions-runner
-   curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
-     https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
-   tar xzf ./actions-runner-linux-x64-2.321.0.tar.gz
-   ./config.sh --url https://github.com/dogkeeper886/ollama37 --token YOUR_TOKEN --labels k80,cuda11
-   sudo ./svc.sh install && sudo ./svc.sh start
-   ```
-
-2. **Build Builder Image (one-time, ~90 min):**
-   ```bash
-   cd /home/jack/src/ollama37/docker
-   make build-builder
-   ```
-
-3. **Verify GPU Access in Docker:**
-   ```bash
-   docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-   ```
-
-4. **Start TestLink:**
-   ```bash
-   cd /home/jack/src/testlink-code
-   docker compose up -d
-   ```
-
-## Monitoring & Logs
-
-### View CI/CD Logs
-
-```bash
-# GitHub Actions Runner logs
-journalctl -u actions.runner.* -f
-
-# Docker build logs
-docker compose logs -f
-
-# TestLink logs
-cd /home/jack/src/testlink-code && docker compose logs -f
-```
-
-### Test Results
-
-- **TestLink Dashboard:** http://localhost:8090
-- **GitHub Actions:** https://github.com/dogkeeper886/ollama37/actions
+| Variable | Description |
+|----------|-------------|
+| `TEST_ID` | Current test ID (set by executor) |
+| `OLLAMA_HOST` | Test subject URL |

 ## Troubleshooting

-### Builder Image Missing
-
-```bash
-cd docker && make build-builder
-```
-
 ### GPU Not Detected in Container

 ```bash
-# Check UVM device files on host
+# Check UVM device files
 ls -l /dev/nvidia-uvm*

 # Create if missing
@@ -301,18 +328,72 @@ nvidia-modprobe -u -c=0
 docker compose restart
 ```

-### Build Cache Stale
+### LLM Judge Timeout

 ```bash
+# Use simple mode
+npm run dev -- run --no-llm
+
+# Or use a smaller judge model
+npm run dev -- run --judge-model gemma3:4b
+```
+
+### Log Collector Issues
+
+If a test step can't find its logs:
+```bash
+# Check log file exists
+ls -l /tmp/test-*-logs.txt
+
+# Fallback to direct logs
+docker compose logs --since=5m
+```
+
+### Build Failures
+
+```bash
+# Clean build
 cd docker && make build-runtime-no-cache
+
+# Check builder image
+docker images | grep ollama37-builder
 ```

-### TestLink Connection Failed
+## Error Patterns

-```bash
-# Check TestLink is running
-curl http://localhost:8090
+The test framework checks for these critical errors:
+
+| Pattern | Severity | Description |
+|---------|----------|-------------|
+| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) |
+| `CUDA error` | Critical | General CUDA failure |
+| `cudaMalloc failed` | Critical | GPU memory allocation failure |
+| `out of memory` | Critical | VRAM exhausted |
+| `level=ERROR` | Warning | Ollama application error |
+| `panic`, `fatal` | Critical | Runtime crash |
+| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) |
+
+## File Structure

-# Restart if needed
-cd /home/jack/src/testlink-code && docker compose restart
+```
+tests/
+├── src/
+│   ├── cli.ts           # CLI entry point
+│   ├── executor.ts      # Test execution engine
+│   ├── judge.ts         # LLM/simple judging
+│   ├── loader.ts        # YAML test case parser
+│   ├── log-collector.ts # Docker log collector
+│   ├── reporter.ts      # Output formatters
+│   └── types.ts         # Type definitions
+├── testcases/
+│   ├── build/           # Build test cases
+│   ├── runtime/         # Runtime test cases
+│   └── inference/       # Inference test cases
+└── package.json
+
+.github/workflows/
+├── build.yml            # Build verification
+├── runtime.yml          # Container/GPU tests
+├── inference.yml        # Model inference tests
+└── full-pipeline.yml    # Complete pipeline
 ```
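
The "Error Patterns" table added to docs/CICD.md can be read as a simple line-by-line scan over a test's log text. The sketch below is one possible reading, not the logic in `tests/src/judge.ts`: the patterns and severities are copied from the table, while `scanLogs`, `hasCritical`, the regex forms (including treating `CUBLAS_STATUS_*` as a prefix match), case-sensitive matching, and the first-match-wins ordering are all assumptions made for illustration.

```typescript
// Hypothetical sketch: only the patterns and severities come from the table.
type Severity = "critical" | "warning";

interface ErrorPattern {
  regex: RegExp;
  severity: Severity;
  description: string;
}

interface Finding {
  lineNumber: number; // 1-based position in the log text
  severity: Severity;
  description: string;
  text: string;
}

// Order matters: the first matching pattern decides a line's severity.
const ERROR_PATTERNS: ErrorPattern[] = [
  { regex: /CUBLAS_STATUS_\w+/, severity: "critical", description: "CUDA/cuBLAS error (K80-specific)" },
  { regex: /CUDA error/, severity: "critical", description: "General CUDA failure" },
  { regex: /cudaMalloc failed/, severity: "critical", description: "GPU memory allocation failure" },
  { regex: /out of memory/, severity: "critical", description: "VRAM exhausted" },
  { regex: /panic|fatal/, severity: "critical", description: "Runtime crash" },
  { regex: /id=cpu library=cpu/, severity: "critical", description: "CPU-only fallback (GPU not detected)" },
  { regex: /level=ERROR/, severity: "warning", description: "Ollama application error" },
];

// Scan each log line and report the first pattern it matches, if any.
function scanLogs(logs: string): Finding[] {
  const findings: Finding[] = [];
  logs.split("\n").forEach((text, index) => {
    const hit = ERROR_PATTERNS.find((p) => p.regex.test(text));
    if (hit) {
      findings.push({
        lineNumber: index + 1,
        severity: hit.severity,
        description: hit.description,
        text,
      });
    }
  });
  return findings;
}

// True when at least one finding should fail the test outright.
function hasCritical(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === "critical");
}
```

Placing `level=ERROR` last means a line such as `level=ERROR msg="CUDA error"` is reported as critical rather than as a warning, which matches the table's intent that CUDA failures are never downgraded.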