Add LogCollector for precise test log boundaries

Problem: Tests used `docker compose logs --since=5m` which caused:
- Log overlap: logs from previous tests were included
- Missing logs when a test exceeded 5 minutes

Solution:
- New LogCollector class runs `docker compose logs --follow`
- Marks test start/end boundaries
- Writes test-specific logs to /tmp/test-{testId}-logs.txt
- Test steps access via TEST_ID environment variable

Changes:
- tests/src/log-collector.ts: New LogCollector class
- tests/src/executor.ts: Integrate LogCollector, set TEST_ID env
- tests/src/cli.ts: Start/stop LogCollector for runtime/inference
- All test cases: Use log collector with fallback to docker compose

Also updated docs/CICD.md with:
- Test runner CLI documentation
- Judge modes (simple, llm, dual)
- Log collector integration
- Updated test case list (12b, 27b models)
- Model unload strategy
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date: 2025-12-17 17:46:49 +08:00
Parent: 82ab6cc96e
Commit: 2c5094db92
12 changed files with 702 additions and 272 deletions

docs/CICD.md

@@ -1,4 +1,4 @@
-# CI/CD Plan for Ollama37
+# CI/CD Pipeline for Ollama37
This document describes the CI/CD pipeline for building and testing Ollama37 with Tesla K80 (CUDA compute capability 3.7) support.
@@ -24,185 +24,269 @@ This document describes the CI/CD pipeline for building and testing Ollama37 wit
│ - Rocky Linux 9.7 │
│ - Docker 29.1.3 + Docker Compose 5.0.0 │
│ - NVIDIA Container Toolkit │
-│ - GitHub Actions Runner (self-hosted, labels: k80, cuda11) │
+│ - GitHub Actions Runner (self-hosted) │
│ │
│ Services: │
│ - TestLink (http://localhost:8090) - Test management │
│ - TestLink MCP - Claude Code integration │
│ │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│ SERVE NODE │
│ │
│ Services: │
│ - Ollama (production) │
│ - Dify (LLM application platform) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
-## Build Strategy: Docker-Based
+## Test Framework
-We use the two-stage Docker build system located in `/docker/`:
+### Test Runner CLI
-### Stage 1: Builder Image (Cached)
+The test runner is located in `tests/src/` and provides a CLI tool:
-**Image:** `ollama37-builder:latest` (~15GB)
+```bash
cd tests
npm run dev -- run [options]
```
-**Contents:**
+**Commands:**
- `run` - Execute test cases
- `list` - List all available test cases
**Options:**
| Option | Default | Description |
|--------|---------|-------------|
| `-s, --suite <suite>` | all | Filter by suite (build, runtime, inference) |
| `-i, --id <id>` | - | Run specific test by ID |
| `-w, --workers <n>` | 1 | Parallel worker count |
| `-d, --dry-run` | false | Preview without executing |
| `-o, --output <format>` | console | Output format: console, json, junit |
| `--no-llm` | false | Skip LLM, use simple exit code check only |
| `--judge-model <model>` | gemma3:12b | Model for LLM judging |
| `--dual-judge` | true | Run both simple and LLM judge |
| `--ollama-url <url>` | localhost:11434 | Test subject server |
| `--judge-url <url>` | localhost:11435 | Separate judge instance |
### Judge Modes
The test framework supports three judge modes:
| Mode | Flag | Description |
|------|------|-------------|
| **Simple** | `--no-llm` | Exit code checking only (exit 0 = pass) |
| **LLM** | `--judge-model` | Semantic analysis of test logs using LLM |
| **Dual** | `--dual-judge` | Both must pass (default) |
**LLM Judge:**
- Analyzes test execution logs semantically
- Detects hidden issues (e.g., CUDA errors with exit 0)
- Uses configurable model (default: gemma3:12b)
- Batches tests for efficient judging
**Simple Judge:**
- Fast, deterministic
- Checks exit codes only
- Fallback when LLM unavailable
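For illustration, each mode maps to a CLI invocation roughly as follows — a sketch using only the flags documented above, run from `tests/`:
```bash
# Simple judge only: fast, deterministic exit-code checking
npm run dev -- run --suite inference --no-llm

# LLM judge with an explicit model and a separate judge instance
npm run dev -- run --suite inference --judge-model gemma3:12b --judge-url localhost:11435

# Dual judging is the default; both judges must pass
npm run dev -- run --suite inference --dual-judge
```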
### Log Collector
The test framework includes a log collector that solves log overlap issues:
**Problem:** `docker compose logs --since=5m` can include logs from previous tests or miss logs if a test exceeds 5 minutes.
**Solution:** LogCollector class that:
1. Runs `docker compose logs --follow` as background process
2. Marks test start/end boundaries
3. Writes test-specific logs to `/tmp/test-{testId}-logs.txt`
4. Provides precise logs for each test
Test steps access logs via:
```bash
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
```
## GitHub Workflows
Located in `.github/workflows/`:
| Workflow | Purpose |
|----------|---------|
| `build.yml` | Docker image build verification |
| `runtime.yml` | Container startup and GPU detection |
| `inference.yml` | Model inference tests (4b, 12b, 27b) |
| `full-pipeline.yml` | Orchestrates all stages sequentially |
### Workflow Inputs
| Parameter | Default | Options | Description |
|-----------|---------|---------|-------------|
| `judge_mode` | dual | simple, llm, dual | Judge strategy |
| `judge_model` | gemma3:12b | Any model | LLM for evaluation |
| `use_existing_container` | false | true, false | Reuse running container |
| `keep_container` | false | true, false | Leave container running |
### Example: Run Inference Tests
```bash
# Manual trigger via GitHub Actions UI
# Or via gh CLI:
gh workflow run inference.yml \
-f judge_mode=dual \
-f judge_model=gemma3:12b
```
## Test Suites
### Build Suite (3 tests)
| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-BUILD-001 | Builder Image Verification | 2m | Verify builder image exists |
| TC-BUILD-002 | Runtime Image Build | 30m | Build runtime image |
| TC-BUILD-003 | Image Size Validation | 30s | Check image sizes |
### Runtime Suite (3 tests)
| ID | Name | Timeout | Description |
|----|------|---------|-------------|
| TC-RUNTIME-001 | Container Startup | 2m | Start container with GPU |
| TC-RUNTIME-002 | GPU Detection | 2m | Verify K80 detected |
| TC-RUNTIME-003 | Health Check | 3m | API health verification |
### Inference Suite (5 tests)
| ID | Name | Model | Timeout | Description |
|----|------|-------|---------|-------------|
| TC-INFERENCE-001 | Model Pull | gemma3:4b | 10m | Pull and warmup 4b model |
| TC-INFERENCE-002 | Basic Inference | gemma3:4b | 3m | Simple prompt test |
| TC-INFERENCE-003 | API Endpoint Test | gemma3:4b | 2m | REST API verification |
| TC-INFERENCE-004 | Medium Model | gemma3:12b | 10m | 12b inference (single GPU) |
| TC-INFERENCE-005 | Large Model Dual-GPU | gemma3:27b | 15m | 27b inference (dual GPU) |
### Model Unload Strategy
Each model size test unloads its model after completion:
```
4b tests (001-003) → unload 4b
12b test (004) → unload 12b
27b test (005) → unload 27b
```
Workflow-level cleanup (`if: always()`) provides safety fallback.
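The unload mechanism itself isn't pinned down here; assuming this fork keeps upstream Ollama's behavior, a cleanup step could unload explicitly with `ollama stop` or by sending a request with `keep_alive: 0` — a sketch, not the committed implementation:
```bash
# Unload via the CLI (assumes upstream's `ollama stop` exists in this fork)
docker exec ollama37 ollama stop gemma3:4b

# Or via the API: keep_alive=0 asks the server to evict the model immediately
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gemma3:4b", "keep_alive": 0}'
```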
## Test Case Structure
Test cases are YAML files in `tests/testcases/{suite}/`:
```yaml
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000
dependencies:
- TC-INFERENCE-001
steps:
- name: Run simple math question
command: docker exec ollama37 ollama run gemma3:4b "What is 2+2?"
timeout: 120000
- name: Check for errors in logs
command: |
if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
# Check for CUDA errors...
criteria: |
Expected:
- Model responds with "4" or equivalent
- NO CUBLAS_STATUS_ errors
- NO CUDA errors
```
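The loader presumably parses this YAML into a typed object; a plausible TypeScript sketch, with field names inferred from the example above (the real `tests/src/types.ts` may differ):
```typescript
// Hypothetical shapes inferred from the YAML example; see tests/src/types.ts for the real ones.
interface TestStep {
  name: string;
  command: string;   // bash snippet executed by the test runner
  timeout?: number;  // per-step timeout in milliseconds
}

interface TestCase {
  id: string;        // e.g. "TC-INFERENCE-002"
  name: string;
  suite: "build" | "runtime" | "inference";
  priority: number;
  timeout: number;   // whole-test timeout in milliseconds
  dependencies?: string[];
  steps: TestStep[];
  criteria: string;  // natural-language expectations for the LLM judge
}
```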
## Build System
### Docker Images
**Builder Image:** `ollama37-builder:latest` (~15GB)
- Rocky Linux 8
- CUDA 11.4 toolkit
-- GCC 10 (built from source)
+- GCC 10, CMake 4.0, Go 1.25.3
-- CMake 4.0 (built from source)
+- Build time: ~90 minutes (cached)
-- Go 1.25.3
-**Build time:** ~90 minutes (first time only, then cached)
+**Runtime Image:** `ollama37:latest` (~18GB)
- Built from GitHub source
- Build time: ~10 minutes
### Build Commands
-**Build command:**
```bash
-cd docker && make build-builder
+cd docker
# Build base image (first time only)
make build-builder
# Build runtime from GitHub
make build-runtime
# Build without cache
make build-runtime-no-cache
# Build from local source
make build-runtime-local
```
-### Stage 2: Runtime Image (Per Build)
+## Running Tests Locally
-**Image:** `ollama37:latest` (~18GB)
+### Prerequisites
-**Process:**
+1. Docker with NVIDIA runtime
-1. Clone source from GitHub
+2. Node.js 20+
-2. Configure with CMake ("CUDA 11" preset)
+3. Tesla K80 GPU (or compatible)
-3. Build C/C++/CUDA libraries
-4. Build Go binary
-5. Package runtime environment
-**Build time:** ~10 minutes
+### Quick Start
-**Build command:**
```bash
-cd docker && make build-runtime
+# Start the container
cd docker && docker compose up -d
# Install test runner
cd tests && npm ci
# Run all tests with dual judge
npm run dev -- run --dual-judge
# Run specific suite
npm run dev -- run --suite inference
# Run single test
npm run dev -- run --id TC-INFERENCE-002
# Simple mode (no LLM)
npm run dev -- run --no-llm
# JSON output
npm run dev -- run -o json > results.json
```
-## Pipeline Stages
+### Test Output
-### Stage 1: Docker Build
+Results are saved to `/tmp/`:
- `/tmp/build-results.json`
- `/tmp/runtime-results.json`
- `/tmp/inference-results.json`
-**Trigger:** Push to `main` branch
+JSON structure:
```json
-**Steps:**
+{
-1. Checkout repository
+"summary": {
-2. Ensure builder image exists (build if not)
+"total": 5,
-3. Build runtime image: `make build-runtime`
+"passed": 5,
-4. Verify image created successfully
+"failed": 0,
"timestamp": "2025-12-17T...",
-**Test Cases:**
+"simple": { "passed": 5, "failed": 0 },
-- TC-BUILD-001: Builder Image Verification
+"llm": { "passed": 5, "failed": 0 }
-- TC-BUILD-002: Runtime Image Build
+},
-- TC-BUILD-003: Image Size Validation
+"results": [...]
}
-### Stage 2: Container Startup
+```
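To pull the summary out of a results file in a follow-up step, something like this would work (assuming `jq` is available on the runner):
```bash
# Print the pass/fail summary and fail the step if anything failed (jq assumed present)
jq '.summary' /tmp/inference-results.json
test "$(jq '.summary.failed' /tmp/inference-results.json)" -eq 0
```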
-**Steps:**
-1. Start container with GPU: `docker compose up -d`
-2. Wait for health check to pass
-3. Verify Ollama server is responding
-**Test Cases:**
-- TC-RUNTIME-001: Container Startup
-- TC-RUNTIME-002: GPU Detection
-- TC-RUNTIME-003: Health Check
-### Stage 3: Inference Tests
-**Steps:**
-1. Pull test model (gemma3:4b)
-2. Run inference tests
-3. Verify CUBLAS legacy fallback
-**Test Cases:**
-- TC-INFERENCE-001: Model Pull
-- TC-INFERENCE-002: Basic Inference
-- TC-INFERENCE-003: API Endpoint Test
-- TC-INFERENCE-004: CUBLAS Fallback Verification
-### Stage 4: Cleanup & Report
-**Steps:**
-1. Stop container: `docker compose down`
-2. Report results to TestLink
-3. Clean up resources
-## Test Case Design
-### Build Tests (Suite: Build Tests)
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-BUILD-001 | Builder Image Verification | Automated | Verify builder image exists with correct tools |
-| TC-BUILD-002 | Runtime Image Build | Automated | Build runtime image from GitHub source |
-| TC-BUILD-003 | Image Size Validation | Automated | Verify image sizes are within expected range |
-### Runtime Tests (Suite: Runtime Tests)
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-RUNTIME-001 | Container Startup | Automated | Start container with GPU passthrough |
-| TC-RUNTIME-002 | GPU Detection | Automated | Verify Tesla K80 detected inside container |
-| TC-RUNTIME-003 | Health Check | Automated | Verify Ollama health check passes |
-### Inference Tests (Suite: Inference Tests)
-| ID | Name | Type | Description |
-|----|------|------|-------------|
-| TC-INFERENCE-001 | Model Pull | Automated | Pull gemma3:4b model |
-| TC-INFERENCE-002 | Basic Inference | Automated | Run simple prompt and verify response |
-| TC-INFERENCE-003 | API Endpoint Test | Automated | Test /api/generate endpoint |
-| TC-INFERENCE-004 | CUBLAS Fallback Verification | Automated | Verify legacy CUBLAS functions used |
-## GitHub Actions Workflow
-**File:** `.github/workflows/build-test.yml`
-**Triggers:**
-- Push to `main` branch
-- Pull request to `main` branch
-- Manual trigger (workflow_dispatch)
-**Runner:** Self-hosted with labels `[self-hosted, k80, cuda11]`
-**Jobs:**
-1. `build` - Build Docker runtime image
-2. `test` - Run inference tests in container
-3. `report` - Report results to TestLink
-## TestLink Integration
-**URL:** http://localhost:8090
-**Project:** ollama37
-**Test Suites:**
-- Build Tests
-- Runtime Tests
-- Inference Tests
-**Test Plan:** Created per release/sprint
-**Builds:** Created per CI run (commit SHA)
-**Execution Recording:**
-- Each test case result recorded via TestLink API
-- Pass/Fail status with notes
-- Linked to specific build/commit
-## Makefile Targets for CI
-| Target | Description | When to Use |
-|--------|-------------|-------------|
-| `make build-builder` | Build base image | First time setup |
-| `make build-runtime` | Build from GitHub | Normal CI builds |
-| `make build-runtime-no-cache` | Fresh GitHub clone | When cache is stale |
-| `make build-runtime-local` | Build from local | Local testing |
## Environment Variables
@@ -222,76 +306,19 @@ cd docker && make build-runtime
| `OLLAMA_DEBUG` | 1 (optional) | Enable debug logging |
| `GGML_CUDA_DEBUG` | 1 (optional) | Enable CUDA debug |
-### TestLink Environment
+### Test Environment
-| Variable | Value | Description |
+| Variable | Description |
-|----------|-------|-------------|
+|----------|-------------|
-| `TESTLINK_URL` | http://localhost:8090 | TestLink server URL |
+| `TEST_ID` | Current test ID (set by executor) |
-| `TESTLINK_API_KEY` | (configured) | API key for automation |
+| `OLLAMA_HOST` | Test subject URL |
-## Prerequisites
-### One-Time Setup on CI/CD Node
-1. **Install GitHub Actions Runner:**
-```bash
-mkdir -p ~/actions-runner && cd ~/actions-runner
-curl -o actions-runner-linux-x64-2.321.0.tar.gz -L \
-https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
-tar xzf ./actions-runner-linux-x64-2.321.0.tar.gz
-./config.sh --url https://github.com/dogkeeper886/ollama37 --token YOUR_TOKEN --labels k80,cuda11
-sudo ./svc.sh install && sudo ./svc.sh start
-```
-2. **Build Builder Image (one-time, ~90 min):**
-```bash
-cd /home/jack/src/ollama37/docker
-make build-builder
-```
-3. **Verify GPU Access in Docker:**
-```bash
-docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-```
-4. **Start TestLink:**
-```bash
-cd /home/jack/src/testlink-code
-docker compose up -d
-```
-## Monitoring & Logs
-### View CI/CD Logs
-```bash
-# GitHub Actions Runner logs
-journalctl -u actions.runner.* -f
-# Docker build logs
-docker compose logs -f
-# TestLink logs
-cd /home/jack/src/testlink-code && docker compose logs -f
-```
-### Test Results
-- **TestLink Dashboard:** http://localhost:8090
-- **GitHub Actions:** https://github.com/dogkeeper886/ollama37/actions
## Troubleshooting
-### Builder Image Missing
-```bash
-cd docker && make build-builder
-```
### GPU Not Detected in Container
```bash
-# Check UVM device files on host
+# Check UVM device files
ls -l /dev/nvidia-uvm*
# Create if missing
@@ -301,18 +328,72 @@ nvidia-modprobe -u -c=0
docker compose restart
```
-### Build Cache Stale
+### LLM Judge Timeout
```bash
# Use simple mode
npm run dev -- run --no-llm
# Or use a smaller, faster judge model
npm run dev -- run --judge-model gemma3:4b
```
### Log Collector Issues
If a test step cannot find its log file:
```bash
# Check log file exists
ls -l /tmp/test-*-logs.txt
# Fallback to direct logs
docker compose logs --since=5m
```
### Build Failures
```bash
# Clean build
cd docker && make build-runtime-no-cache
# Check builder image
docker images | grep ollama37-builder
```
-### TestLink Connection Failed
+## Error Patterns
-```bash
+The test framework checks for these critical errors:
-# Check TestLink is running
-curl http://localhost:8090
+| Pattern | Severity | Description |
|---------|----------|-------------|
| `CUBLAS_STATUS_*` | Critical | CUDA/cuBLAS error (K80-specific) |
| `CUDA error` | Critical | General CUDA failure |
| `cudaMalloc failed` | Critical | GPU memory allocation failure |
| `out of memory` | Critical | VRAM exhausted |
| `level=ERROR` | Warning | Ollama application error |
| `panic`, `fatal` | Critical | Runtime crash |
| `id=cpu library=cpu` | Critical | CPU-only fallback (GPU not detected) |
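A test step can scan the collected log for these patterns with a single grep; a sketch using the collector's per-test file (the critical patterns are taken from the table above, `TEST_ID` is set by the executor):
```bash
# Fail fast if any critical pattern appears in this test's collected logs
LOG_FILE="/tmp/test-${TEST_ID}-logs.txt"
PATTERNS='CUBLAS_STATUS_|CUDA error|cudaMalloc failed|out of memory|panic|fatal|id=cpu library=cpu'
if grep -qE "$PATTERNS" "$LOG_FILE"; then
  echo "Critical error pattern found:"
  grep -E "$PATTERNS" "$LOG_FILE"
  exit 1
fi
```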
## File Structure
-# Restart if needed
+```
-cd /home/jack/src/testlink-code && docker compose restart
+tests/
├── src/
│ ├── cli.ts # CLI entry point
│ ├── executor.ts # Test execution engine
│ ├── judge.ts # LLM/simple judging
│ ├── loader.ts # YAML test case parser
│ ├── log-collector.ts # Docker log collector
│ ├── reporter.ts # Output formatters
│ └── types.ts # Type definitions
├── testcases/
│ ├── build/ # Build test cases
│ ├── runtime/ # Runtime test cases
│ └── inference/ # Inference test cases
└── package.json
.github/workflows/
├── build.yml # Build verification
├── runtime.yml # Container/GPU tests
├── inference.yml # Model inference tests
└── full-pipeline.yml # Complete pipeline
```

tests/src/cli.ts

@@ -9,6 +9,7 @@ import { TestExecutor } from "./executor.js";
import { LLMJudge } from "./judge.js";
import { Reporter, TestLinkReporter } from "./reporter.js";
import { RunnerOptions, Judgment } from "./types.js";
import { LogCollector } from "./log-collector.js";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const defaultTestcasesDir = path.join(__dirname, "..", "testcases");
@@ -68,9 +69,12 @@ program
log("=".repeat(60)); log("=".repeat(60));
const loader = new TestLoader(options.testcasesDir); const loader = new TestLoader(options.testcasesDir);
const executor = new TestExecutor(path.join(__dirname, "..", "..")); const projectRoot = path.join(__dirname, "..", "..");
const judge = new LLMJudge(options.judgeUrl, options.judgeModel); const judge = new LLMJudge(options.judgeUrl, options.judgeModel);
// LogCollector will be created if running runtime or inference tests
let logCollector: LogCollector | undefined;
// Load test cases
log("\nLoading test cases...");
let testCases = await loader.loadAll();
@@ -107,10 +111,38 @@ program
process.exit(0);
}
// Check if we need log collector (runtime or inference tests)
const needsLogCollector = testCases.some(
(tc) => tc.suite === "runtime" || tc.suite === "inference"
);
if (needsLogCollector) {
log("\nStarting log collector...");
logCollector = new LogCollector(projectRoot);
try {
await logCollector.start();
log(" Log collector started");
} catch (error) {
log(` Warning: Log collector failed to start: ${error}`);
log(" Tests will run without precise log collection");
logCollector = undefined;
}
}
// Create executor with optional log collector
const executor = new TestExecutor(projectRoot, logCollector);
// Execute tests (progress goes to stderr via executor)
const workers = parseInt(options.workers);
const results = await executor.executeAll(testCases, workers);
// Stop log collector if running
if (logCollector) {
log("\nStopping log collector...");
await logCollector.stop();
log(" Log collector stopped");
}
// Judge results
log("\nJudging results...");
let judgments: Judgment[];

tests/src/executor.ts

@@ -1,6 +1,7 @@
import { exec } from 'child_process'
import { promisify } from 'util'
import { TestCase, TestResult, StepResult } from './types.js'
import { LogCollector } from './log-collector.js'
const execAsync = promisify(exec)
@@ -17,9 +18,12 @@ export class TestExecutor {
private workingDir: string
private totalTests: number = 0
private currentTest: number = 0
private logCollector: LogCollector | null = null
private currentTestId: string | null = null
-constructor(workingDir: string = process.cwd()) {
+constructor(workingDir: string = process.cwd(), logCollector?: LogCollector) {
this.workingDir = workingDir
this.logCollector = logCollector || null
}
// Progress output goes to stderr (visible in console)
@@ -34,12 +38,24 @@ export class TestExecutor {
let exitCode = 0
let timedOut = false
// Build environment with TEST_ID for log access
const env = { ...process.env }
if (this.currentTestId) {
env.TEST_ID = this.currentTestId
}
// Update log file before each step so test can access current logs
if (this.logCollector && this.currentTestId) {
this.logCollector.writeCurrentLogs(this.currentTestId)
}
try {
const result = await execAsync(command, {
cwd: this.workingDir,
timeout,
maxBuffer: 10 * 1024 * 1024, // 10MB buffer
-shell: '/bin/bash'
+shell: '/bin/bash',
env
})
stdout = result.stdout
stderr = result.stderr
@@ -77,8 +93,14 @@ export class TestExecutor {
const timestamp = new Date().toISOString().substring(11, 19)
this.currentTest++
this.currentTestId = testCase.id
this.progress(`[${timestamp}] [${this.currentTest}/${this.totalTests}] ${testCase.id}: ${testCase.name}`)
// Mark test start for log collection
if (this.logCollector) {
this.logCollector.markTestStart(testCase.id)
}
for (let i = 0; i < testCase.steps.length; i++) {
const step = testCase.steps[i]
const stepTimestamp = new Date().toISOString().substring(11, 19)
@@ -106,6 +128,12 @@ export class TestExecutor {
const totalDuration = Date.now() - startTime
// Mark test end for log collection
if (this.logCollector) {
this.logCollector.markTestEnd(testCase.id)
}
this.currentTestId = null
// Combine all logs
const logs = stepResults.map(r => {
return `=== Step: ${r.name} ===

tests/src/log-collector.ts (new file, 200 lines)

@@ -0,0 +1,200 @@
import { spawn, ChildProcess } from "child_process";
import { writeFileSync } from "fs";
import path from "path";
interface LogEntry {
timestamp: Date;
line: string;
}
interface TestMarker {
start: number;
end?: number;
}
/**
* LogCollector runs `docker compose logs --follow` as a background process
* and captures logs with precise test boundaries.
*
* This solves the log overlap problem where `docker compose logs --since=5m`
* could include logs from previous tests or miss logs if a test exceeds 5 minutes.
*/
export class LogCollector {
private process: ChildProcess | null = null;
private logs: LogEntry[] = [];
private testMarkers: Map<string, TestMarker> = new Map();
private workingDir: string;
private isRunning: boolean = false;
constructor(workingDir: string) {
// workingDir should be the project root, docker-compose.yml is in docker/
this.workingDir = path.join(workingDir, "docker");
}
/**
* Start the log collector background process
*/
async start(): Promise<void> {
if (this.isRunning) {
return;
}
return new Promise((resolve, reject) => {
// Spawn docker compose logs --follow
this.process = spawn("docker", ["compose", "logs", "--follow", "--timestamps"], {
cwd: this.workingDir,
stdio: ["ignore", "pipe", "pipe"],
});
this.isRunning = true;
// Handle stdout (main log output)
this.process.stdout?.on("data", (data: Buffer) => {
const lines = data.toString().split("\n").filter((l) => l.trim());
for (const line of lines) {
this.logs.push({
timestamp: new Date(),
line: line,
});
}
});
// Handle stderr (docker compose messages)
this.process.stderr?.on("data", (data: Buffer) => {
const lines = data.toString().split("\n").filter((l) => l.trim());
for (const line of lines) {
this.logs.push({
timestamp: new Date(),
line: `[stderr] ${line}`,
});
}
});
this.process.on("error", (err) => {
this.isRunning = false;
reject(err);
});
this.process.on("close", (code) => {
this.isRunning = false;
});
// Give it a moment to start
setTimeout(() => {
if (this.isRunning) {
resolve();
} else {
reject(new Error("Log collector failed to start"));
}
}, 500);
});
}
/**
* Stop the log collector background process
*/
async stop(): Promise<void> {
if (!this.process || !this.isRunning) {
return;
}
return new Promise((resolve) => {
this.process!.on("close", () => {
this.isRunning = false;
resolve();
});
this.process!.kill("SIGTERM");
// Force kill after 5 seconds
setTimeout(() => {
if (this.isRunning && this.process) {
this.process.kill("SIGKILL");
}
resolve();
}, 5000);
});
}
/**
* Mark the start of a test - records current log index
*/
markTestStart(testId: string): void {
this.testMarkers.set(testId, {
start: this.logs.length,
});
}
/**
* Mark the end of a test - records current log index and writes logs to file
*/
markTestEnd(testId: string): void {
const marker = this.testMarkers.get(testId);
if (marker) {
marker.end = this.logs.length;
// Write logs to temp file for test steps to access
const logs = this.getLogsForTest(testId);
const filePath = `/tmp/test-${testId}-logs.txt`;
writeFileSync(filePath, logs);
}
}
/**
* Get logs for a specific test (between start and end markers)
*/
getLogsForTest(testId: string): string {
const marker = this.testMarkers.get(testId);
if (!marker) {
return "";
}
const endIndex = marker.end ?? this.logs.length;
const testLogs = this.logs.slice(marker.start, endIndex);
return testLogs.map((entry) => entry.line).join("\n");
}
/**
* Get all logs collected so far
*/
getAllLogs(): string {
return this.logs.map((entry) => entry.line).join("\n");
}
/**
* Get logs since a specific index
*/
getLogsSince(startIndex: number): string {
return this.logs.slice(startIndex).map((entry) => entry.line).join("\n");
}
/**
* Get current log count (useful for tracking)
*/
getLogCount(): number {
return this.logs.length;
}
/**
* Check if the collector is running
*/
isActive(): boolean {
return this.isRunning;
}
/**
* Write current test logs to file (call during test execution)
* This allows test steps to access logs accumulated so far
*/
writeCurrentLogs(testId: string): void {
const marker = this.testMarkers.get(testId);
if (!marker) {
return;
}
const testLogs = this.logs.slice(marker.start);
const filePath = `/tmp/test-${testId}-logs.txt`;
writeFileSync(filePath, testLogs.map((entry) => entry.line).join("\n"));
}
}
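For orientation, a minimal standalone usage sketch of the class above — the project-root path is a placeholder, and in the real runner the executor drives these calls:
```typescript
import { LogCollector } from "./log-collector.js";

// Placeholder project root; the constructor appends "docker/" to find docker-compose.yml.
const collector = new LogCollector("/path/to/ollama37");

await collector.start();                  // spawns `docker compose logs --follow --timestamps`
collector.markTestStart("TC-DEMO-001");   // remembers the current log index
// ... execute test steps here; steps read /tmp/test-TC-DEMO-001-logs.txt ...
collector.markTestEnd("TC-DEMO-001");     // slices the buffer and writes the per-test file
console.log(collector.getLogsForTest("TC-DEMO-001"));
await collector.stop();                   // SIGTERM, then SIGKILL after 5 seconds
```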

View File

@@ -27,8 +27,12 @@ steps:
- name: Verify model loading in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Model Loading Check ===" echo "=== Model Loading Check ==="
@@ -58,8 +62,12 @@ steps:
- name: Verify llama runner started
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Llama Runner Check ===" echo "=== Llama Runner Check ==="
@@ -74,8 +82,12 @@ steps:
- name: Check for model loading errors
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Model Loading Error Check ===" echo "=== Model Loading Error Check ==="
@@ -97,8 +109,12 @@ steps:
- name: Display model memory allocation from logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Model Memory Allocation ===" echo "=== Model Memory Allocation ==="
echo "$LOGS" | grep -E '(model weights|kv cache|compute graph|total memory).*device=' | tail -8 echo "$LOGS" | grep -E '(model weights|kv cache|compute graph|total memory).*device=' | tail -8

View File

@@ -17,8 +17,12 @@ steps:
- name: Check for inference errors in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== Inference Error Check ===" echo "=== Inference Error Check ==="
@@ -47,8 +51,12 @@ steps:
- name: Verify inference request in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== Inference Request Verification ===" echo "=== Inference Request Verification ==="
@@ -69,8 +77,12 @@ steps:
- name: Display recent CUDA activity from logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== Recent CUDA Activity ===" echo "=== Recent CUDA Activity ==="
echo "$LOGS" | grep -iE "(CUDA|cuda|device=CUDA)" | tail -5 || echo "No recent CUDA activity logged" echo "$LOGS" | grep -iE "(CUDA|cuda|device=CUDA)" | tail -5 || echo "No recent CUDA activity logged"

View File

@@ -22,8 +22,12 @@ steps:
- name: Verify API requests logged successfully
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== API Request Log Verification ===" echo "=== API Request Log Verification ==="
@@ -40,8 +44,12 @@ steps:
- name: Check for API errors in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== API Error Check ===" echo "=== API Error Check ==="
@@ -64,8 +72,12 @@ steps:
- name: Display API response times from logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== API Response Times ===" echo "=== API Response Times ==="

View File

@@ -27,8 +27,12 @@ steps:
- name: Verify model loaded to GPU
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== Model Loading Check for gemma3:12b ===" echo "=== Model Loading Check for gemma3:12b ==="
@@ -63,8 +67,12 @@ steps:
- name: Check for inference errors
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=5m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=5m 2>&1)
fi
echo "=== Inference Error Check ===" echo "=== Inference Error Check ==="

View File

@@ -39,8 +39,12 @@ steps:
- name: Verify model loaded across GPUs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=10m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=10m 2>&1)
fi
echo "=== Model Loading Check for gemma3:27b ===" echo "=== Model Loading Check for gemma3:27b ==="
@@ -96,8 +100,12 @@ steps:
- name: Check for inference errors
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs --since=10m 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs --since=10m 2>&1)
fi
echo "=== Inference Error Check ===" echo "=== Inference Error Check ==="

View File

@@ -22,12 +22,21 @@ steps:
- name: Capture startup logs
command: |
-cd docker && docker compose logs 2>&1 | head -100
+# Use log collector file if available, fallback to docker compose logs
if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
head -100 /tmp/test-${TEST_ID}-logs.txt
else
cd docker && docker compose logs 2>&1 | head -100
fi
- name: Check for startup errors in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
# Check for critical errors
if echo "$LOGS" | grep -qE "(level=ERROR|CUBLAS_STATUS_|CUDA error|cudaMalloc failed)"; then

View File

@@ -30,8 +30,12 @@ steps:
- name: Verify GPU detection in Ollama logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== GPU Detection Check ===" echo "=== GPU Detection Check ==="
@@ -60,8 +64,12 @@ steps:
- name: Check for GPU-related errors in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== GPU Error Check ===" echo "=== GPU Error Check ==="
@@ -82,8 +90,12 @@ steps:
- name: Display GPU memory status from logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== GPU Memory Status ===" echo "=== GPU Memory Status ==="
echo "$LOGS" | grep -E "gpu memory.*library=CUDA" | tail -4 echo "$LOGS" | grep -E "gpu memory.*library=CUDA" | tail -4

View File

@@ -30,8 +30,12 @@ steps:
- name: Verify server listening in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Server Status Check ===" echo "=== Server Status Check ==="
@@ -46,8 +50,12 @@ steps:
- name: Check for runtime errors in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== Runtime Error Check ===" echo "=== Runtime Error Check ==="
@@ -71,8 +79,12 @@ steps:
- name: Verify API request handling in logs
command: |
-cd docker
+# Use log collector file if available, fallback to docker compose logs
-LOGS=$(docker compose logs 2>&1)
+if [ -f "/tmp/test-${TEST_ID}-logs.txt" ]; then
LOGS=$(cat /tmp/test-${TEST_ID}-logs.txt)
else
LOGS=$(cd docker && docker compose logs 2>&1)
fi
echo "=== API Request Logs ===" echo "=== API Request Logs ==="