13 Commits

Author SHA1 Message Date
Shang Chieh Tseng
ce2882b757 Fix runtime test log checks that require model loading
- Remove CUDA initialization checks from TC-RUNTIME-002 (the ggml_cuda_init and
  load_backend log lines appear only when a model is loaded, not at startup)
- Fix bash integer comparison error in TC-RUNTIME-003

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 00:00:24 +08:00
Shang Chieh Tseng
1a185f7926 Add comprehensive Ollama log checking and configurable LLM judge mode
Test case enhancements:
- TC-RUNTIME-001: Add startup log error checking (CUDA, CUBLAS, CPU fallback)
- TC-RUNTIME-002: Add GPU detection verification, CUDA init checks, error detection
- TC-RUNTIME-003: Add server listening verification, runtime error checks
- TC-INFERENCE-001: Add model loading logs, layer offload verification
- TC-INFERENCE-002: Add inference error checking (CUBLAS/CUDA errors)
- TC-INFERENCE-003: Add API request log verification, response time display

Workflow enhancements:
- Add judge_mode input (simple/llm/dual) to all workflows
- Add judge_model input to specify LLM model for judging
- Configurable via GitHub Actions UI without code changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 23:27:57 +08:00
Shang Chieh Tseng
143e6fa8e4 Improve UVM device check messaging in TC-RUNTIME-002
- Rename step to "Verify UVM device files" for clarity
- Add "WARNING:" prefix when UVM device is missing
- Add "SUCCESS:" prefix when device is present
- Add confirmation message after UVM fix is applied
- Separate ls command for cleaner output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 22:57:37 +08:00
Shang Chieh Tseng
c2f4f378cc Add dual-judge mode to test runner
New options:
- --dual-judge: Run both simple and LLM judge, fail if either fails
- --judge-url: Separate LLM Judge server URL (default: localhost:11435)
- --judge-model: Model for LLM judging (default: gemma3:4b)

Dual judge logic:
- Simple judge checks exit codes
- LLM judge analyzes logs semantically
- Final result: FAIL if either judge says FAIL
- Combines reasons from both judges on failure
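
The dual judge logic above could be sketched roughly as follows. The `Verdict` type and both function names are hypothetical, since the commit message does not show the runner's actual code, and joining only the failing judges' reasons is one reading of "combines reasons".

```typescript
// Hypothetical verdict shape; the real test runner's types are not shown.
type Verdict = { pass: boolean; reason: string };

// Simple judge: a test passes only if every step exited with code 0.
function simpleJudge(exitCodes: number[]): Verdict {
  const failed = exitCodes.filter((code) => code !== 0);
  return failed.length === 0
    ? { pass: true, reason: "all steps exited with code 0" }
    : { pass: false, reason: `${failed.length} step(s) exited non-zero` };
}

// Dual judge: FAIL if either judge says FAIL. On failure, the reasons of
// the judges that failed are joined into one message (an assumption).
function combineVerdicts(simple: Verdict, llm: Verdict): Verdict {
  if (simple.pass && llm.pass) {
    return { pass: true, reason: "both judges passed" };
  }
  const reasons: string[] = [];
  if (!simple.pass) reasons.push(`simple: ${simple.reason}`);
  if (!llm.pass) reasons.push(`llm: ${llm.reason}`);
  return { pass: false, reason: reasons.join("; ") };
}
```

The LLM verdict is passed in as a plain value here, so the combining rule can be checked without a judge server running.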

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 22:58:28 +08:00
Shang Chieh Tseng
f59834c531 Improve test runner logging
- Strip ANSI escape codes from stdout/stderr to reduce log size
  (spinner animations were ~95% of inference log size)
- Add [TIMEOUT] indicator when commands are killed due to timeout
  for clearer failure diagnosis
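
As an illustration of the two changes above, a minimal sketch, assuming the runner post-processes captured output as strings. `stripAnsi` and `formatOutput` are hypothetical names, and where the real runner places the `[TIMEOUT]` marker is not stated in the commit.

```typescript
// Matches CSI escape sequences: ESC [, optional parameter bytes (0x30-0x3F),
// optional intermediate bytes (0x20-0x2F), and one final byte (0x40-0x7E).
// This covers colors, cursor movement and show/hide-cursor codes that
// spinners typically emit; it does not cover every escape sequence.
const ANSI_CSI = /\x1b\[[0-?]*[ -/]*[@-~]/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_CSI, "");
}

// Hypothetical helper: clean the captured output and, if the command was
// killed because it exceeded its timeout, prefix it with a marker.
function formatOutput(raw: string, timedOut: boolean): string {
  const clean = stripAnsi(raw);
  return timedOut ? `[TIMEOUT] ${clean}` : clean;
}
```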

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 21:45:33 +08:00
Shang Chieh Tseng
ebcca9f483 Add model warmup step to TC-INFERENCE-001
Tesla K80 needs ~60-180s to load model into VRAM on first inference.
Add warmup step with 5-minute timeout to preload model before
subsequent inference tests run.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 21:38:09 +08:00
Shang Chieh Tseng
3f3f68f08d Remove TC-INFERENCE-004: CUBLAS Fallback Verification
Redundant test - if TC-INFERENCE-002 (Basic Inference) passes,
CUBLAS fallback is already working. Any errors would cause
inference to fail, making a separate error-check test unnecessary.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:38:35 +08:00
Shang Chieh Tseng
8d65fd4211 Update TC-RUNTIME-002 to handle UVM device workaround
- Add step to check/create /dev/nvidia-uvm device files
- Use nvidia-modprobe -u -c=0 if UVM devices missing
- Restart container after creating UVM devices
- Update criteria to clarify GPU detection requirements
- Increase timeout to 120s for container restart

Fixes issue where nvidia-smi works but Ollama only detects CPU.
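
The check-then-fix flow above might look like this in outline. `planUvmFix` and `UvmPlan` are hypothetical names; the existence check is injected so the decision logic can run without a GPU host, and the sketch only returns what should be done rather than executing anything.

```typescript
// Function that reports whether a path exists, e.g. fs.existsSync in Node.
type Exists = (path: string) => boolean;

const UVM_DEVICE = "/dev/nvidia-uvm";

// Hypothetical plan shape: what to run and whether a restart is needed.
interface UvmPlan {
  present: boolean;
  commands: string[];
  restartContainer: boolean;
}

function planUvmFix(exists: Exists): UvmPlan {
  if (exists(UVM_DEVICE)) {
    // Device file is there: nothing to create, no restart required.
    return { present: true, commands: [], restartContainer: false };
  }
  // Device file is missing: create it with the command named in the
  // commit message, then restart the container so it sees the device.
  return {
    present: false,
    commands: ["nvidia-modprobe -u -c=0"],
    restartContainer: true,
  };
}
```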

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:58:16 +08:00
Shang Chieh Tseng
23c92954d7 Fix Unicode encoding for CI compatibility
Replace Unicode characters with ASCII equivalents:
- Line separators: '─' -> '-'
- Pass indicator: '✓' -> '[PASS]'
- Fail indicator: '✗' -> '[FAIL]'

GitHub Actions terminal has encoding issues with UTF-8 chars.
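
The commit replaced these characters in the source itself. For illustration only, the same mapping expressed as a runtime function (`toAscii` is a hypothetical name, not part of the runner):

```typescript
// Mapping taken from the commit message above.
const ASCII_EQUIVALENTS: Record<string, string> = {
  "─": "-",
  "✓": "[PASS]",
  "✗": "[FAIL]",
};

// Replace each mapped character with its ASCII equivalent; any other
// character is left untouched.
function toAscii(text: string): string {
  return text.replace(/[─✓✗]/g, (ch) => ASCII_EQUIVALENTS[ch] ?? ch);
}
```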

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:06:07 +08:00
Shang Chieh Tseng
52ccb96a01 Reduce TC-BUILD-002 timeout to 30 minutes
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:02:27 +08:00
Shang Chieh Tseng
03da57629e Increase TC-BUILD-002 timeout to 60 minutes and improve logging
- Timeout: 900s -> 3600s (60 min) for runtime image build
- Add tee to capture full build log to /tmp/build-runtime.log
- Add step to show last 200 lines of build log for debugging
- Helps diagnose build failures with proper log capture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:31:28 +08:00
Shang Chieh Tseng
54248f42b0 Improve CI test transparency with dual-stream output
- Separate progress output (stderr) from JSON results (stdout)
- Add timestamps, test counters, and step progress to executor
- Update CLI to use stderr for progress messages
- Update workflow to capture JSON to file while showing progress
- Add --silent flag to suppress npm banner noise

This allows real-time visibility into test execution during CI runs
while preserving clean JSON output for artifact collection.
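
A minimal sketch of the dual-stream idea, assuming a Node.js runner where `console.error` writes to stderr and `console.log` writes to stdout. `TestResult`, `progressLine` and `report` are hypothetical names, and the progress line format is illustrative rather than the executor's actual format.

```typescript
// Hypothetical result shape; the real runner's schema is not shown.
interface TestResult {
  id: string;
  pass: boolean;
}

// Build one progress line with a timestamp and a test counter.
function progressLine(index: number, total: number, id: string, now: Date = new Date()): string {
  return `[${now.toISOString()}] [${index}/${total}] ${id}`;
}

function report(results: TestResult[]): void {
  results.forEach((result, i) => {
    // Progress goes to stderr so it stays visible in the CI log.
    console.error(progressLine(i + 1, results.length, result.id));
  });
  // Only the JSON document goes to stdout, so it can be redirected
  // to a file without progress lines mixed in.
  console.log(JSON.stringify(results, null, 2));
}
```

With this split, a shell redirect of stdout (for example `> results.json`) captures only the JSON while progress still appears in the terminal.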

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 17:50:32 +08:00
Shang Chieh Tseng
d11140c016 Add GitHub Actions CI/CD pipeline and test framework
- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 14:06:44 +08:00