From 2c5094db924f2f73d3832fd2ba80043526129440 Mon Sep 17 00:00:00 2001 From: Shang Chieh Tseng Date: Wed, 17 Dec 2025 17:46:49 +0800 Subject: [PATCH] Add LogCollector for precise test log boundaries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Problem: Tests used `docker compose logs --since=5m` which caused: - Log overlap between tests - Logs from previous tests included - Missing logs if test exceeded 5 minutes Solution: - New LogCollector class runs `docker compose logs --follow` - Marks test start/end boundaries - Writes test-specific logs to /tmp/test-{testId}-logs.txt - Test steps access via TEST_ID environment variable Changes: - tests/src/log-collector.ts: New LogCollector class - tests/src/executor.ts: Integrate LogCollector, set TEST_ID env - tests/src/cli.ts: Start/stop LogCollector for runtime/inference - All test cases: Use log collector with fallback to docker compose Also updated docs/CICD.md with: - Test runner CLI documentation - Judge modes (simple, llm, dual) - Log collector integration - Updated test case list (12b, 27b models) - Model unload strategy - Troubleshooting guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- docs/CICD.md | 533 ++++++++++-------- tests/src/cli.ts | 34 +- tests/src/executor.ts | 32 +- tests/src/log-collector.ts | 200 +++++++ .../testcases/inference/TC-INFERENCE-001.yml | 32 +- .../testcases/inference/TC-INFERENCE-002.yml | 24 +- .../testcases/inference/TC-INFERENCE-003.yml | 24 +- .../testcases/inference/TC-INFERENCE-004.yml | 16 +- .../testcases/inference/TC-INFERENCE-005.yml | 16 +- tests/testcases/runtime/TC-RUNTIME-001.yml | 15 +- tests/testcases/runtime/TC-RUNTIME-002.yml | 24 +- tests/testcases/runtime/TC-RUNTIME-003.yml | 24 +- 12 files changed, 702 insertions(+), 272 deletions(-) create mode 100644 tests/src/log-collector.ts diff --git a/docs/CICD.md b/docs/CICD.md index 0c1068c7..07244686 100644 --- a/docs/CICD.md 
+++ b/docs/CICD.md @@ -1,4 +1,4 @@ -# CI/CD Plan for Ollama37 +# CI/CD Pipeline for Ollama37 This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support. @@ -24,185 +24,269 @@ This document describes the CI/CD pipeline for building and testing Ollama37 wit │ - Rocky Linux 9.7 │ │ - Docker 29.1.3 + Docker Compose 5.0.0 │ │ - NVIDIA Container Toolkit │ -│ - GitHub Actions Runner (self-hosted, labels: k80, cuda11) │ -│ │ -│ Services: │ -│ - TestLink (http://localhost:8090) - Test management │ -│ - TestLink MCP - Claude Code integration │ -│ │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ SERVE NODE │ -│ │ -│ Services: │ -│ - Ollama (production) │ -│ - Dify (LLM application platform) │ +│ - GitHub Actions Runner (self-hosted) │ │ │ └─────────────────────────────────────────────────────────────────────────┘ ``` -## Build Strategy: Docker-Based +## Test Framework -We use the two-stage Docker build system located in `/docker/`: +### Test Runner CLI -### Stage 1: Builder Image (Cached) +The test runner is located in `tests/src/` and provides a CLI tool: -**Image:** `ollama37-builder:latest` (~15GB) +```bash +cd tests +npm run dev -- run [options] +``` -**Contents:** +**Commands:** +- `run` - Execute test cases +- `list` - List all available test cases + +**Options:** +| Option | Default | Description | +|--------|---------|-------------| +| `-s, --suite ` | all | Filter by suite (build, runtime, inference) | +| `-i, --id ` | - | Run specific test by ID | +| `-w, --workers ` | 1 | Parallel worker count | +| `-d, --dry-run` | false | Preview without executing | +| `-o, --output ` | console | Output format: console, json, junit | +| `--no-llm` | false | Skip LLM, use simple exit code check only | +| `--judge-model ` | gemma3:12b | Model for LLM judging | +| `--dual-judge` | true | Run both 
simple and LLM judge | +| `--ollama-url ` | localhost:11434 | Test subject server | +| `--judge-url ` | localhost:11435 | Separate judge instance | + +### Judge Modes + +The test framework supports three judge modes: + +| Mode | Flag | Description | +|------|------|-------------| +| **Simple** | `--no-llm` | Exit code checking only (exit 0 = pass) | +| **LLM** | `--judge-model` | Semantic analysis of test logs using LLM | +| **Dual** | `--dual-judge` | Both must pass (default) | + +**LLM Judge:** +- Analyzes test execution logs semantically +- Detects hidden issues (e.g., CUDA errors with exit 0) +- Uses configurable model (default: gemma3:12b) +- Batches tests for efficient judging + +**Simple Judge:** +- Fast, deterministic +- Checks exit codes only +- Fallback when LLM unavailable + +### Log Collector + +The test framework includes a log collector that solves log overlap issues: + +**Problem:** `docker compose logs --since=5m` can include logs from previous tests or miss logs if a test exceeds 5 minutes. + +**Solution:** LogCollector class that: +1. Runs `docker compose logs --follow` as background process +2. Marks test start/end boundaries +3. Writes test-specific logs to `/tmp/test-{testId}-logs.txt` +4. 
Provides precise logs for each test + +Test steps access logs via: +```bash +LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt) +``` + +## GitHub Workflows + +Located in `.github/workflows/`: + +| Workflow | Purpose | +|----------|---------| +| `build.yml` | Docker image build verification | +| `runtime.yml` | Container startup and GPU detection | +| `inference.yml` | Model inference tests (4b, 12b, 27b) | +| `full-pipeline.yml` | Orchestrates all stages sequentially | + +### Workflow Inputs + +| Parameter | Default | Options | Description | +|-----------|---------|---------|-------------| +| `judge_mode` | dual | simple, llm, dual | Judge strategy | +| `judge_model` | gemma3:12b | Any model | LLM for evaluation | +| `use_existing_container` | false | true, false | Reuse running container | +| `keep_container` | false | true, false | Leave container running | + +### Example: Run Inference Tests + +```bash +# Manual trigger via GitHub Actions UI +# Or via gh CLI: +gh workflow run inference.yml \ + -f judge_mode=dual \ + -f judge_model=gemma3:12b +``` + +## Test Suites + +### Build Suite (3 tests) + +| ID | Name | Timeout | Description | +|----|------|---------|-------------| +| TC-BUILD-001 | Builder Image Verification | 2m | Verify builder image exists | +| TC-BUILD-002 | Runtime Image Build | 30m | Build runtime image | +| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes | + +### Runtime Suite (3 tests) + +| ID | Name | Timeout | Description | +|----|------|---------|-------------| +| TC-RUNTIME-001 | Container Startup | 2m | Start container with GPU | +| TC-RUNTIME-002 | GPU Detection | 2m | Verify K80 detected | +| TC-RUNTIME-003 | Health Check | 3m | API health verification | + +### Inference Suite (5 tests) + +| ID | Name | Model | Timeout | Description | +|----|------|-------|---------|-------------| +| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warmup 4b model | +| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test 
| +| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification | +| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) | +| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) | + +### Model Unload Strategy + +Each model size test unloads its model after completion: + +``` +4b tests (001-003) → unload 4b +12b test (004) → unload 12b +27b test (005) → unload 27b +``` + +Workflow-level cleanup (`if: always()`) provides safety fallback. + +## Test Case Structure + +Test cases are YAML files in `tests/testcases/{suite}/`: + +```yaml +id: TC-INFERENCE-002 +name: Basic Inference +suite: inference +priority: 2 +timeout: 180000 + +dependencies: + - TC-INFERENCE-001 + +steps: + - name: Run simple math question + command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?" + timeout: 120000 + + - name: Check for errors in logs + command: | + if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then + LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt) + else + LOGS=$(cd docker && docker compose logs --since=5m 2>&1) + fi + # Check for CUDA errors... 
+ +criteria: | + Expected: + - Model responds with "4" or equivalent + - NO CUBLAS_STATUS_ errors + - NO CUDA errors +``` + +## Build System + +### Docker Images + +**Builder Image:** `ollama37-builder:latest` (~15GB) - Rocky Linux 8 - CUDA 11.4 toolkit -- GCC 10 (built from source) -- CMake 4.0 (built from source) -- Go 1.25.3 +- GCC 10, CMake 4.0, Go 1.25.3 +- Build time: ~90 minutes (cached) -**Build time:** ~90 minutes (first time only, then cached) +**Runtime Image:** `ollama37:latest` (~18GB) +- Built from GitHub source +- Build time: ~10 minutes + +### Build Commands -**Build command:** ```bash -cd docker && make build-builder +cd docker + +# Build base image (first time only) +make build-builder + +# Build runtime from GitHub +make build-runtime + +# Build without cache +make build-runtime-no-cache + +# Build from local source +make build-runtime-local ``` -### Stage 2: Runtime Image (Per Build) +## Running Tests Locally -**Image:** `ollama37:latest` (~18GB) +### Prerequisites -**Process:** -1. Clone source from GitHub -2. Configure with CMake ("CUDA 11" preset) -3. Build C/C++/CUDA libraries -4. Build Go binary -5. Package runtime environment +1. Docker with NVIDIA runtime +2. Node.js 20+ +3. 
Tesla K80 GPU (or compatible) -**Build time:** ~10 minutes +### Quick Start -**Build command:** ```bash -cd docker && make build-runtime +# Start the container +cd docker && docker compose up -d + +# Install test runner +cd tests && npm ci + +# Run all tests with dual judge +npm run dev -- run --dual-judge + +# Run specific suite +npm run dev -- run --suite inference + +# Run single test +npm run dev -- run --id TC-INFERENCE-002 + +# Simple mode (no LLM) +npm run dev -- run --no-llm + +# JSON output +npm run dev -- run -o json > results.json ``` -## Pipeline Stages +### Test Output -### Stage 1: Docker Build +Results are saved to `/tmp/`: +- `/tmp/build-results.json` +- `/tmp/runtime-results.json` +- `/tmp/inference-results.json` -**Trigger:** Push to `main` branch - -**Steps:** -1. Checkout repository -2. Ensure builder image exists (build if not) -3. Build runtime image: `make build-runtime` -4. Verify image created successfully - -**Test Cases:** -- TC-BUILD-001: Builder Image Verification -- TC-BUILD-002: Runtime Image Build -- TC-BUILD-003: Image Size Validation - -### Stage 2: Container Startup - -**Steps:** -1. Start container with GPU: `docker compose up -d` -2. Wait for health check to pass -3. Verify Ollama server is responding - -**Test Cases:** -- TC-RUNTIME-001: Container Startup -- TC-RUNTIME-002: GPU Detection -- TC-RUNTIME-003: Health Check - -### Stage 3: Inference Tests - -**Steps:** -1. Pull test model (gemma3:4b) -2. Run inference tests -3. Verify CUBLAS legacy fallback - -**Test Cases:** -- TC-INFERENCE-001: Model Pull -- TC-INFERENCE-002: Basic Inference -- TC-INFERENCE-003: API Endpoint Test -- TC-INFERENCE-004: CUBLAS Fallback Verification - -### Stage 4: Cleanup & Report - -**Steps:** -1. Stop container: `docker compose down` -2. Report results to TestLink -3. 
Clean up resources - -## Test Case Design - -### Build Tests (Suite: Build Tests) - -| ID | Name | Type | Description | -|----|------|------|-------------| -| TC-BUILD-001 | Builder Image Verification | Automated | Verify builder image exists with correct tools | -| TC-BUILD-002 | Runtime Image Build | Automated | Build runtime image from GitHub source | -| TC-BUILD-003 | Image Size Validation | Automated | Verify image sizes are within expected range | - -### Runtime Tests (Suite: Runtime Tests) - -| ID | Name | Type | Description | -|----|------|------|-------------| -| TC-RUNTIME-001 | Container Startup | Automated | Start container with GPU passthrough | -| TC-RUNTIME-002 | GPU Detection | Automated | Verify Tesla K80 detected inside container | -| TC-RUNTIME-003 | Health Check | Automated | Verify Ollama health check passes | - -### Inference Tests (Suite: Inference Tests) - -| ID | Name | Type | Description | -|----|------|------|-------------| -| TC-INFERENCE-001 | Model Pull | Automated | Pull gemma3:4b model | -| TC-INFERENCE-002 | Basic Inference | Automated | Run simple prompt and verify response | -| TC-INFERENCE-003 | API Endpoint Test | Automated | Test /api/generate endpoint | -| TC-INFERENCE-004 | CUBLAS Fallback Verification | Automated | Verify legacy CUBLAS functions used | - -## GitHub Actions Workflow - -**File:** `.github/workflows/build-test.yml` - -**Triggers:** -- Push to `main` branch -- Pull request to `main` branch -- Manual trigger (workflow_dispatch) - -**Runner:** Self-hosted with labels `[self-hosted, k80, cuda11]` - -**Jobs:** -1. `build` - Build Docker runtime image -2. `test` - Run inference tests in container -3. 
`report` - Report results to TestLink - -## TestLink Integration - -**URL:** http://localhost:8090 - -**Project:** ollama37 - -**Test Suites:** -- Build Tests -- Runtime Tests -- Inference Tests - -**Test Plan:** Created per release/sprint - -**Builds:** Created per CI run (commit SHA) - -**Execution Recording:** -- Each test case result recorded via TestLink API -- Pass/Fail status with notes -- Linked to specific build/commit - -## Makefile Targets for CI - -| Target | Description | When to Use | -|--------|-------------|-------------| -| `make build-builder` | Build base image | First time setup | -| `make build-runtime` | Build from GitHub | Normal CI builds | -| `make build-runtime-no-cache` | Fresh GitHub clone | When cache is stale | -| `make build-runtime-local` | Build from local | Local testing | +JSON structure: +```json +{ + "summary": { + "total": 5, + "passed": 5, + "failed": 0, + "timestamp": "2025-12-17T...", + "simple": { "passed": 5, "failed": 0 }, + "llm": { "passed": 5, "failed": 0 } + }, + "results": [...] +} +``` ## Environment Variables @@ -222,76 +306,19 @@ cd docker && make build-runtime | `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging | | `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug | -### TestLink Environment +### Test Environment -| Variable | Value | Description | -|----------|-------|-------------| -| `TESTLINK_URL` | http://localhost:8090 | TestLink server URL | -| `TESTLINK_API_KEY` | (configured) | API key for automation | - -## Prerequisites - -### One-Time Setup on CI/CD Node - -1. 
**Install GitHub Actions Runner:**
-   ```bash
-   mkdir -p ~/actions-runner && cd ~/actions-runner
-   curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
-     https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
-   tar xzf ./actions-runner-linux-x64-2.321.0.tar.gz
-   ./config.sh --url https://github.com/dogkeeper886/ollama37 --token YOUR_TOKEN --labels k80,cuda11
-   sudo ./svc.sh install && sudo ./svc.sh start
-   ```
-
-2. **Build Builder Image (one-time, ~90 min):**
-   ```bash
-   cd /home/jack/src/ollama37/docker
-   make build-builder
-   ```
-
-3. **Verify GPU Access in Docker:**
-   ```bash
-   docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-   ```
-
-4. **Start TestLink:**
-   ```bash
-   cd /home/jack/src/testlink-code
-   docker compose up -d
-   ```
-
-## Monitoring & Logs
-
-### View CI/CD Logs
-
-```bash
-# GitHub Actions Runner logs
-journalctl -u actions.runner.* -f
-
-# Docker build logs
-docker compose logs -f
-
-# TestLink logs
-cd /home/jack/src/testlink-code && docker compose logs -f
-```
-
-### Test Results
-
-- **TestLink Dashboard:** http://localhost:8090
-- **GitHub Actions:** https://github.com/dogkeeper886/ollama37/actions
+| Variable | Description |
+|----------|-------------|
+| `TEST_ID` | Current test ID (set by executor) |
+| `OLLAMA_HOST` | Test subject URL |
 
 ## Troubleshooting
 
-### Builder Image Missing
-
-```bash
-cd docker && make build-builder
-```
-
 ### GPU Not Detected in Container
 
 ```bash
-# Check UVM device files on host
+# Check UVM device files
 ls -l /dev/nvidia-uvm*
 
 # Create if missing
@@ -301,18 +328,72 @@ nvidia-modprobe -u -c=0
 docker compose restart
 ```
 
-### Build Cache Stale
+### LLM Judge Timeout
 
 ```bash
+# Use simple mode
+npm run dev -- run --no-llm
+
+# Or use a smaller, faster judge model
+npm run dev -- run --judge-model gemma3:4b
+```
+
+### Log Collector Issues
+
+If a test step can't find logs:
+```bash
+# Check log file exists
+ls -l /tmp/test-*-logs.txt
+
+# Fallback to direct logs
+docker compose logs --since=5m +``` + +### Build Failures + +```bash +# Clean build cd docker && make build-runtime-no-cache + +# Check builder image +docker images | grep ollama37-builder ``` -### TestLink Connection Failed +## Error Patterns -```bash -# Check TestLink is running -curl http://localhost:8090 +The test framework checks for these critical errors: + +| Pattern | Severity | Description | +|---------|----------|-------------| +| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) | +| `CUDA error` | Critical | General CUDA failure | +| `cudaMalloc failed` | Critical | GPU memory allocation failure | +| `out of memory` | Critical | VRAM exhausted | +| `level=ERROR` | Warning | Ollama application error | +| `panic`, `fatal` | Critical | Runtime crash | +| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) | + +## File Structure -# Restart if needed -cd /home/jack/src/testlink-code && docker compose restart +``` +tests/ +├── src/ +│ ├── cli.ts # CLI entry point +│ ├── executor.ts # Test execution engine +│ ├── judge.ts # LLM/simple judging +│ ├── loader.ts # YAML test case parser +│ ├── log-collector.ts # Docker log collector +│ ├── reporter.ts # Output formatters +│ └── types.ts # Type definitions +├── testcases/ +│ ├── build/ # Build test cases +│ ├── runtime/ # Runtime test cases +│ └── inference/ # Inference test cases +└── package.json + +.github/workflows/ +├── build.yml # Build verification +├── runtime.yml # Container/GPU tests +├── inference.yml # Model inference tests +└── full-pipeline.yml # Complete pipeline ``` diff --git a/tests/src/cli.ts b/tests/src/cli.ts index beb79a40..b7c2b370 100644 --- a/tests/src/cli.ts +++ b/tests/src/cli.ts @@ -9,6 +9,7 @@ import { TestExecutor } from "./executor.js"; import { LLMJudge } from "./judge.js"; import { Reporter, TestLinkReporter } from "./reporter.js"; import { RunnerOptions, Judgment } from "./types.js"; +import { LogCollector } from "./log-collector.js"; const 
__dirname = path.dirname(fileURLToPath(import.meta.url)); const defaultTestcasesDir = path.join(__dirname, "..", "testcases"); @@ -68,9 +69,12 @@ program log("=".repeat(60)); const loader = new TestLoader(options.testcasesDir); - const executor = new TestExecutor(path.join(__dirname, "..", "..")); + const projectRoot = path.join(__dirname, "..", ".."); const judge = new LLMJudge(options.judgeUrl, options.judgeModel); + // LogCollector will be created if running runtime or inference tests + let logCollector: LogCollector | undefined; + // Load test cases log("\nLoading test cases..."); let testCases = await loader.loadAll(); @@ -107,10 +111,38 @@ program process.exit(0); } + // Check if we need log collector (runtime or inference tests) + const needsLogCollector = testCases.some( + (tc) => tc.suite === "runtime" || tc.suite === "inference" + ); + + if (needsLogCollector) { + log("\nStarting log collector..."); + logCollector = new LogCollector(projectRoot); + try { + await logCollector.start(); + log(" Log collector started"); + } catch (error) { + log(` Warning: Log collector failed to start: ${error}`); + log(" Tests will run without precise log collection"); + logCollector = undefined; + } + } + + // Create executor with optional log collector + const executor = new TestExecutor(projectRoot, logCollector); + // Execute tests (progress goes to stderr via executor) const workers = parseInt(options.workers); const results = await executor.executeAll(testCases, workers); + // Stop log collector if running + if (logCollector) { + log("\nStopping log collector..."); + await logCollector.stop(); + log(" Log collector stopped"); + } + // Judge results log("\nJudging results..."); let judgments: Judgment[]; diff --git a/tests/src/executor.ts b/tests/src/executor.ts index d25e6429..73597c28 100644 --- a/tests/src/executor.ts +++ b/tests/src/executor.ts @@ -1,6 +1,7 @@ import { exec } from 'child_process' import { promisify } from 'util' import { TestCase, TestResult, 
StepResult } from './types.js' +import { LogCollector } from './log-collector.js' const execAsync = promisify(exec) @@ -17,9 +18,12 @@ export class TestExecutor { private workingDir: string private totalTests: number = 0 private currentTest: number = 0 + private logCollector: LogCollector | null = null + private currentTestId: string | null = null - constructor(workingDir: string = process.cwd()) { + constructor(workingDir: string = process.cwd(), logCollector?: LogCollector) { this.workingDir = workingDir + this.logCollector = logCollector || null } // Progress output goes to stderr (visible in console) @@ -34,12 +38,24 @@ export class TestExecutor { let exitCode = 0 let timedOut = false + // Build environment with TEST_ID for log access + const env = { ...process.env } + if (this.currentTestId) { + env.TEST_ID = this.currentTestId + } + + // Update log file before each step so test can access current logs + if (this.logCollector && this.currentTestId) { + this.logCollector.writeCurrentLogs(this.currentTestId) + } + try { const result = await execAsync(command, { cwd: this.workingDir, timeout, maxBuffer: 10 * 1024 * 1024, // 10MB buffer - shell: '/bin/bash' + shell: '/bin/bash', + env }) stdout = result.stdout stderr = result.stderr @@ -77,8 +93,14 @@ export class TestExecutor { const timestamp = new Date().toISOString().substring(11, 19) this.currentTest++ + this.currentTestId = testCase.id this.progress(`[${timestamp}] [${this.currentTest}/${this.totalTests}] ${testCase.id}: ${testCase.name}`) + // Mark test start for log collection + if (this.logCollector) { + this.logCollector.markTestStart(testCase.id) + } + for (let i = 0; i < testCase.steps.length; i++) { const step = testCase.steps[i] const stepTimestamp = new Date().toISOString().substring(11, 19) @@ -106,6 +128,12 @@ export class TestExecutor { const totalDuration = Date.now() - startTime + // Mark test end for log collection + if (this.logCollector) { + this.logCollector.markTestEnd(testCase.id) + } + 
this.currentTestId = null
+
   // Combine all logs
   const logs = stepResults.map(r => {
     return `=== Step: ${r.name} ===
diff --git a/tests/src/log-collector.ts b/tests/src/log-collector.ts
new file mode 100644
index 00000000..b7eb8ca9
--- /dev/null
+++ b/tests/src/log-collector.ts
@@ -0,0 +1,200 @@
+import { spawn, ChildProcess } from "child_process";
+import { writeFileSync } from "fs";
+import path from "path";
+
+interface LogEntry {
+  timestamp: Date;
+  line: string;
+}
+
+interface TestMarker {
+  start: number;
+  end?: number;
+}
+
+/**
+ * LogCollector runs `docker compose logs --follow` as a background process
+ * and captures logs with precise test boundaries.
+ *
+ * This solves the log overlap problem where `docker compose logs --since=5m`
+ * could include logs from previous tests or miss logs if a test exceeds 5 minutes.
+ */
+export class LogCollector {
+  private process: ChildProcess | null = null;
+  private logs: LogEntry[] = [];
+  private testMarkers: Map<string, TestMarker> = new Map();
+  private workingDir: string;
+  private isRunning: boolean = false;
+
+  constructor(workingDir: string) {
+    // workingDir should be the project root, docker-compose.yml is in docker/
+    this.workingDir = path.join(workingDir, "docker");
+  }
+
+  /**
+   * Start the log collector background process
+   */
+  async start(): Promise<void> {
+    if (this.isRunning) {
+      return;
+    }
+
+    return new Promise<void>((resolve, reject) => {
+      // Spawn docker compose logs --follow
+      this.process = spawn("docker", ["compose", "logs", "--follow", "--timestamps"], {
+        cwd: this.workingDir,
+        stdio: ["ignore", "pipe", "pipe"],
+      });
+
+      this.isRunning = true;
+
+      // Handle stdout (main log output)
+      this.process.stdout?.on("data", (data: Buffer) => {
+        const lines = data.toString().split("\n").filter((l) => l.trim());
+        for (const line of lines) {
+          this.logs.push({
+            timestamp: new Date(),
+            line: line,
+          });
+        }
+      });
+
+      // Handle stderr (docker compose messages)
+      this.process.stderr?.on("data", (data: Buffer) => {
+        const lines = data.toString().split("\n").filter((l) => l.trim());
+        for (const line of lines) {
+          this.logs.push({
+            timestamp: new Date(),
+            line: `[stderr] ${line}`,
+          });
+        }
+      });
+
+      this.process.on("error", (err) => {
+        this.isRunning = false;
+        reject(err);
+      });
+
+      this.process.on("close", (code) => {
+        this.isRunning = false;
+      });
+
+      // Give it a moment to start
+      setTimeout(() => {
+        if (this.isRunning) {
+          resolve();
+        } else {
+          reject(new Error("Log collector failed to start"));
+        }
+      }, 500);
+    });
+  }
+
+  /**
+   * Stop the log collector background process
+   */
+  async stop(): Promise<void> {
+    if (!this.process || !this.isRunning) {
+      return;
+    }
+
+    return new Promise<void>((resolve) => {
+      this.process!.on("close", () => {
+        this.isRunning = false;
+        resolve();
+      });
+
+      this.process!.kill("SIGTERM");
+
+      // Force kill after 5 seconds
+      setTimeout(() => {
+        if (this.isRunning && this.process) {
+          this.process.kill("SIGKILL");
+        }
+        resolve();
+      }, 5000);
+    });
+  }
+
+  /**
+   * Mark the start of a test - records current log index
+   */
+  markTestStart(testId: string): void {
+    this.testMarkers.set(testId, {
+      start: this.logs.length,
+    });
+  }
+
+  /**
+   * Mark the end of a test - records current log index and writes logs to file
+   */
+  markTestEnd(testId: string): void {
+    const marker = this.testMarkers.get(testId);
+    if (marker) {
+      marker.end = this.logs.length;
+
+      // Write logs to temp file for test steps to access
+      const logs = this.getLogsForTest(testId);
+      const filePath = `/tmp/test-${testId}-logs.txt`;
+      writeFileSync(filePath, logs);
+    }
+  }
+
+  /**
+   * Get logs for a specific test (between start and end markers)
+   */
+  getLogsForTest(testId: string): string {
+    const marker = this.testMarkers.get(testId);
+    if (!marker) {
+      return "";
+    }
+
+    const endIndex = marker.end ??
this.logs.length; + const testLogs = this.logs.slice(marker.start, endIndex); + + return testLogs.map((entry) => entry.line).join("\n"); + } + + /** + * Get all logs collected so far + */ + getAllLogs(): string { + return this.logs.map((entry) => entry.line).join("\n"); + } + + /** + * Get logs since a specific index + */ + getLogsSince(startIndex: number): string { + return this.logs.slice(startIndex).map((entry) => entry.line).join("\n"); + } + + /** + * Get current log count (useful for tracking) + */ + getLogCount(): number { + return this.logs.length; + } + + /** + * Check if the collector is running + */ + isActive(): boolean { + return this.isRunning; + } + + /** + * Write current test logs to file (call during test execution) + * This allows test steps to access logs accumulated so far + */ + writeCurrentLogs(testId: string): void { + const marker = this.testMarkers.get(testId); + if (!marker) { + return; + } + + const testLogs = this.logs.slice(marker.start); + const filePath = `/tmp/test-${testId}-logs.txt`; + writeFileSync(filePath, testLogs.map((entry) => entry.line).join("\n")); + } +} diff --git a/tests/testcases/inference/TC-INFERENCE-001.yml b/tests/testcases/inference/TC-INFERENCE-001.yml index 51f212c3..e23efbe6 100644 --- a/tests/testcases/inference/TC-INFERENCE-001.yml +++ b/tests/testcases/inference/TC-INFERENCE-001.yml @@ -27,8 +27,12 @@ steps: - name: Verify model loading in logs command: | - cd docker - LOGS=$(docker compose logs 2>&1) + # Use log collector file if available, fallback to docker compose logs + if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then + LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt) + else + LOGS=$(cd docker && docker compose logs 2>&1) + fi echo "=== Model Loading Check ===" @@ -58,8 +62,12 @@ steps: - name: Verify llama runner started command: | - cd docker - LOGS=$(docker compose logs 2>&1) + # Use log collector file if available, fallback to docker compose logs + if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then + 
        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== Llama Runner Check ==="
@@ -74,8 +82,12 @@ steps:

   - name: Check for model loading errors
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== Model Loading Error Check ==="
@@ -97,8 +109,12 @@ steps:

   - name: Display model memory allocation from logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== Model Memory Allocation ==="
       echo "$LOGS" | grep -E '(model weights|kv cache|compute graph|total memory).*device=' | tail -8
diff --git a/tests/testcases/inference/TC-INFERENCE-002.yml b/tests/testcases/inference/TC-INFERENCE-002.yml
index 52d587a7..efad80cf 100644
--- a/tests/testcases/inference/TC-INFERENCE-002.yml
+++ b/tests/testcases/inference/TC-INFERENCE-002.yml
@@ -17,8 +17,12 @@ steps:

   - name: Check for inference errors in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== Inference Error Check ==="
@@ -47,8 +51,12 @@ steps:

   - name: Verify inference request in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== Inference Request Verification ==="
@@ -69,8 +77,12 @@ steps:

   - name: Display recent CUDA activity from logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== Recent CUDA Activity ==="
       echo "$LOGS" | grep -iE "(CUDA|cuda|device=CUDA)" | tail -5 || echo "No recent CUDA activity logged"
diff --git a/tests/testcases/inference/TC-INFERENCE-003.yml b/tests/testcases/inference/TC-INFERENCE-003.yml
index 81616e01..9fe5533d 100644
--- a/tests/testcases/inference/TC-INFERENCE-003.yml
+++ b/tests/testcases/inference/TC-INFERENCE-003.yml
@@ -22,8 +22,12 @@ steps:

   - name: Verify API requests logged successfully
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== API Request Log Verification ==="
@@ -40,8 +44,12 @@ steps:

   - name: Check for API errors in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== API Error Check ==="
@@ -64,8 +72,12 @@ steps:

   - name: Display API response times from logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== API Response Times ==="
diff --git a/tests/testcases/inference/TC-INFERENCE-004.yml b/tests/testcases/inference/TC-INFERENCE-004.yml
index 5898895d..347c9225 100644
--- a/tests/testcases/inference/TC-INFERENCE-004.yml
+++ b/tests/testcases/inference/TC-INFERENCE-004.yml
@@ -27,8 +27,12 @@ steps:

   - name: Verify model loaded to GPU
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== Model Loading Check for gemma3:12b ==="
@@ -63,8 +67,12 @@ steps:

   - name: Check for inference errors
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=5m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
+      fi

       echo "=== Inference Error Check ==="
diff --git a/tests/testcases/inference/TC-INFERENCE-005.yml b/tests/testcases/inference/TC-INFERENCE-005.yml
index 43ddbb07..e0233cec 100644
--- a/tests/testcases/inference/TC-INFERENCE-005.yml
+++ b/tests/testcases/inference/TC-INFERENCE-005.yml
@@ -39,8 +39,12 @@ steps:

   - name: Verify model loaded across GPUs
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=10m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=10m 2>&1)
+      fi

       echo "=== Model Loading Check for gemma3:27b ==="
@@ -96,8 +100,12 @@ steps:

   - name: Check for inference errors
     command: |
-      cd docker
-      LOGS=$(docker compose logs --since=10m 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs --since=10m 2>&1)
+      fi

       echo "=== Inference Error Check ==="
diff --git a/tests/testcases/runtime/TC-RUNTIME-001.yml b/tests/testcases/runtime/TC-RUNTIME-001.yml
index 3d0b389e..30a08e92 100644
--- a/tests/testcases/runtime/TC-RUNTIME-001.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-001.yml
@@ -22,12 +22,21 @@ steps:

   - name: Capture startup logs
     command: |
-      cd docker && docker compose logs 2>&1 | head -100
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        head -100 /tmp/test-${TEST_ID}-logs.txt
+      else
+        cd docker && docker compose logs 2>&1 | head -100
+      fi

   - name: Check for startup errors in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       # Check for critical errors
       if echo "$LOGS" | grep -qE "(level=ERROR|CUBLAS_STATUS_|CUDA error|cudaMalloc failed)"; then
diff --git a/tests/testcases/runtime/TC-RUNTIME-002.yml b/tests/testcases/runtime/TC-RUNTIME-002.yml
index 46e16eff..cfe232a3 100644
--- a/tests/testcases/runtime/TC-RUNTIME-002.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-002.yml
@@ -30,8 +30,12 @@ steps:

   - name: Verify GPU detection in Ollama logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== GPU Detection Check ==="
@@ -60,8 +64,12 @@ steps:

   - name: Check for GPU-related errors in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== GPU Error Check ==="
@@ -82,8 +90,12 @@ steps:

   - name: Display GPU memory status from logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== GPU Memory Status ==="
       echo "$LOGS" | grep -E "gpu memory.*library=CUDA" | tail -4
diff --git a/tests/testcases/runtime/TC-RUNTIME-003.yml b/tests/testcases/runtime/TC-RUNTIME-003.yml
index 9b03a968..b462bee3 100644
--- a/tests/testcases/runtime/TC-RUNTIME-003.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-003.yml
@@ -30,8 +30,12 @@ steps:

   - name: Verify server listening in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== Server Status Check ==="
@@ -46,8 +50,12 @@ steps:

   - name: Check for runtime errors in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== Runtime Error Check ==="
@@ -71,8 +79,12 @@ steps:

   - name: Verify API request handling in logs
     command: |
-      cd docker
-      LOGS=$(docker compose logs 2>&1)
+      # Use log collector file if available, fallback to docker compose logs
+      if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
+        LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
+      else
+        LOGS=$(cd docker && docker compose logs 2>&1)
+      fi

       echo "=== API Request Logs ==="