Commit Graph

12 Commits

Author SHA1 Message Date
Shang Chieh Tseng
2c5094db92 Add LogCollector for precise test log boundaries
Problem: Tests used `docker compose logs --since=5m`, which caused:
- Log overlap between tests
- Logs from previous tests included
- Missing logs if a test exceeded 5 minutes

Solution:
- New LogCollector class runs `docker compose logs --follow`
- Marks test start/end boundaries
- Writes test-specific logs to /tmp/test-{testId}-logs.txt
- Test steps access via TEST_ID environment variable
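
The boundary-marking idea can be sketched roughly as follows. This is a minimal illustration, not the repository's actual `tests/src/log-collector.ts`: the class name `LogCollectorSketch`, its methods, and the helpers `logPathFor`, `writeTestLogs`, and `followComposeLogs` are all hypothetical. Only the `docker compose logs --follow` command and the `/tmp/test-{testId}-logs.txt` path come from the commit message.

```typescript
import { spawn } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical sketch: buffers followed log lines and slices them per test.
export class LogCollectorSketch {
  private lines: string[] = [];
  private startIndex = new Map<string, number>();

  // Append one log line as it arrives from the follow stream.
  push(line: string): void {
    this.lines.push(line);
  }

  // Remember how many lines had been seen when the test began.
  markStart(testId: string): void {
    this.startIndex.set(testId, this.lines.length);
  }

  // Return only the lines that arrived between markStart and this call.
  markEnd(testId: string): string[] {
    const start = this.startIndex.get(testId);
    if (start === undefined) {
      throw new Error(`no start mark for ${testId}`);
    }
    this.startIndex.delete(testId);
    return this.lines.slice(start);
  }
}

// Path format taken from the commit message.
export function logPathFor(testId: string): string {
  return `/tmp/test-${testId}-logs.txt`;
}

// Persist one test's slice so test steps can read it via TEST_ID.
export function writeTestLogs(testId: string, lines: string[]): void {
  writeFileSync(logPathFor(testId), lines.join("\n") + "\n", "utf8");
}

// Feed the collector from `docker compose logs --follow`.
// Not invoked here; the caller decides when to start and kill the child.
export function followComposeLogs(collector: LogCollectorSketch) {
  const child = spawn("docker", ["compose", "logs", "--follow"]);
  let partial = "";
  child.stdout.on("data", (chunk: Buffer) => {
    const parts = (partial + chunk.toString("utf8")).split("\n");
    partial = parts.pop() ?? ""; // keep an incomplete trailing line for later
    for (const part of parts) collector.push(part);
  });
  return child;
}
```

Because each slice is indexed by line position and not by wall-clock time, a test that runs longer than five minutes still gets all of its own lines and none from the test before it.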

Changes:
- tests/src/log-collector.ts: New LogCollector class
- tests/src/executor.ts: Integrate LogCollector, set TEST_ID env
- tests/src/cli.ts: Start/stop LogCollector for runtime/inference
- All test cases: Use log collector with fallback to docker compose

Also updated docs/CICD.md with:
- Test runner CLI documentation
- Judge modes (simple, llm, dual)
- Log collector integration
- Updated test case list (12b, 27b models)
- Model unload strategy
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 17:46:49 +08:00
Shang Chieh Tseng
82ab6cc96e Refactor model unload: each test cleans up its own model
- TC-INFERENCE-003: Add unload step for gemma3:4b at end
- TC-INFERENCE-004: Remove redundant 4b unload at start
- TC-INFERENCE-005: Remove redundant 12b unload at start

Each model size test now handles its own VRAM cleanup.
Workflow-level unload remains as safety fallback for failures.
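
One way a per-test unload step could be expressed is sketched below. The commit does not show how the tests unload a model, so this assumes the Ollama HTTP API, where a `POST /api/generate` request with `keep_alive` set to `0` and no prompt asks the server to unload the model. The helper name `buildUnloadRequest` is hypothetical, and the base URL is Ollama's default port, which may differ in this setup.

```typescript
// Hypothetical helper: describes the HTTP request that unloads one model.
interface UnloadRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

export function buildUnloadRequest(
  model: string,
  baseUrl = "http://localhost:11434", // assumption: Ollama's default address
): UnloadRequest {
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // keep_alive: 0 with no prompt requests an immediate unload.
      body: JSON.stringify({ model, keep_alive: 0 }),
    },
  };
}
```

A test step could pass the result to `fetch(url, init)` as its last action, for example with `gemma3:4b` at the end of TC-INFERENCE-003.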

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 17:20:44 +08:00
Shang Chieh Tseng
806232d95f Add multi-model inference tests for gemma3 12b and 27b
- TC-INFERENCE-004: gemma3:12b single GPU test
- TC-INFERENCE-005: gemma3:27b dual-GPU test (K80 layer split)
- Each test unloads previous model before loading next
- Workflows unload all 3 model sizes after inference suite
- 27b test verifies both GPUs have memory allocated
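
The dual-GPU check could look something like the sketch below. It assumes the test reads per-GPU memory from `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`, which prints one MiB value per line; the commit does not say how the check is actually implemented. The function names and the 100 MiB threshold are illustrative assumptions, chosen because an idle GPU can still report a few MiB in use.

```typescript
// Parse the one-value-per-line output of
// `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`.
export function parseMemoryUsedMiB(csv: string): number[] {
  return csv
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const value = Number.parseInt(line, 10);
      if (Number.isNaN(value)) {
        throw new Error(`unexpected nvidia-smi line: ${line}`);
      }
      return value;
    });
}

// True when at least `expectedGpus` devices hold `minMiB` or more.
// Both defaults are assumptions for illustration.
export function enoughGpusAllocated(
  usedMiB: number[],
  expectedGpus = 2,
  minMiB = 100,
): boolean {
  return usedMiB.filter((mib) => mib >= minMiB).length >= expectedGpus;
}
```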
2025-12-17 17:01:25 +08:00
Shang Chieh Tseng
ce2882b757 Fix runtime test log checks that require model loading
- Remove CUDA initialization checks from TC-RUNTIME-002 (ggml_cuda_init and
  load_backend appear only when a model is loaded, not at startup)
- Fix bash integer comparison error in TC-RUNTIME-003

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 00:00:24 +08:00
Shang Chieh Tseng
1a185f7926 Add comprehensive Ollama log checking and configurable LLM judge mode
Test case enhancements:
- TC-RUNTIME-001: Add startup log error checking (CUDA, CUBLAS, CPU fallback)
- TC-RUNTIME-002: Add GPU detection verification, CUDA init checks, error detection
- TC-RUNTIME-003: Add server listening verification, runtime error checks
- TC-INFERENCE-001: Add model loading logs, layer offload verification
- TC-INFERENCE-002: Add inference error checking (CUBLAS/CUDA errors)
- TC-INFERENCE-003: Add API request log verification, response time display

Workflow enhancements:
- Add judge_mode input (simple/llm/dual) to all workflows
- Add judge_model input to specify LLM model for judging
- Configurable via GitHub Actions UI without code changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 23:27:57 +08:00
Shang Chieh Tseng
143e6fa8e4 Improve UVM device check messaging in TC-RUNTIME-002
- Rename step to "Verify UVM device files" for clarity
- Add "WARNING:" prefix when UVM device is missing
- Add "SUCCESS:" prefix when device is present
- Add confirmation message after UVM fix is applied
- Separate ls command for cleaner output

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 22:57:37 +08:00
Shang Chieh Tseng
ebcca9f483 Add model warmup step to TC-INFERENCE-001
Tesla K80 needs ~60-180s to load model into VRAM on first inference.
Add warmup step with 5-minute timeout to preload model before
subsequent inference tests run.
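
A warmup request of this kind might be built as in the sketch below. It assumes the Ollama HTTP API, where a `POST /api/generate` request that names a model but carries no prompt loads the model into memory. The helper name, the default base URL, and the shape of the returned object are hypothetical; only the 5-minute budget comes from the commit message.

```typescript
// 5-minute budget from the commit message, in milliseconds.
export const WARMUP_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: describes a request that preloads one model.
export function buildWarmupRequest(
  model: string,
  baseUrl = "http://localhost:11434", // assumption: Ollama's default address
) {
  return {
    url: `${baseUrl}/api/generate`,
    timeoutMs: WARMUP_TIMEOUT_MS,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // No prompt: the server only loads the model into memory.
      body: JSON.stringify({ model, stream: false }),
    },
  };
}
```

The budget leaves headroom over the ~60-180s load time quoted above, so a slow first load does not fail the step.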

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 21:38:09 +08:00
Shang Chieh Tseng
3f3f68f08d Remove TC-INFERENCE-004: CUBLAS Fallback Verification
Redundant test: if TC-INFERENCE-002 (Basic Inference) passes,
CUBLAS fallback is already working. Any errors would cause
inference to fail, making a separate error-check test unnecessary.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 20:38:35 +08:00
Shang Chieh Tseng
8d65fd4211 Update TC-RUNTIME-002 to handle UVM device workaround
- Add step to check/create /dev/nvidia-uvm device files
- Use nvidia-modprobe -u -c=0 if UVM devices missing
- Restart container after creating UVM devices
- Update criteria to clarify GPU detection requirements
- Increase timeout to 120s for container restart

Fixes issue where nvidia-smi works but Ollama only detects CPU.
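
The check-then-fix flow can be sketched as a small planning function. This is an illustration, not the test case's actual step: the function name `planUvmFix` is hypothetical, and restarting with a bare `docker compose restart` is an assumption, since the commit does not name the service or the exact restart command. The device path and the `nvidia-modprobe -u -c=0` invocation come from the commit message.

```typescript
import { existsSync } from "node:fs";

// Device file named in the commit message.
const UVM_DEVICE = "/dev/nvidia-uvm";

// Return the commands to run, in order; empty when the device already exists.
// `exists` is injectable so the logic can be exercised without real hardware.
export function planUvmFix(
  exists: (path: string) => boolean = existsSync,
): string[][] {
  if (exists(UVM_DEVICE)) {
    return [];
  }
  return [
    ["nvidia-modprobe", "-u", "-c=0"], // create the UVM device files
    ["docker", "compose", "restart"], // assumption: restart so the container sees them
  ];
}
```

Each inner array can be handed to a process runner as a command followed by its arguments.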

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:58:16 +08:00
Shang Chieh Tseng
52ccb96a01 Reduce TC-BUILD-002 timeout to 30 minutes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 19:02:27 +08:00
Shang Chieh Tseng
03da57629e Increase TC-BUILD-002 timeout to 60 minutes and improve logging
- Timeout: 900s -> 3600s (60 min) for runtime image build
- Add tee to capture full build log to /tmp/build-runtime.log
- Add step to show last 200 lines of build log for debugging
- Helps diagnose build failures with proper log capture
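
The "show last 200 lines" step amounts to a tail over the captured log. A minimal sketch of that logic is shown below; the function name `lastLines` is hypothetical, and the actual test step may simply call a shell tool on `/tmp/build-runtime.log`.

```typescript
// Return the last `count` lines of a log, ignoring one trailing newline.
export function lastLines(text: string, count: number): string[] {
  if (count <= 0) {
    return []; // slice(-0) would return everything, so handle it explicitly
  }
  const lines = text.split("\n");
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop(); // drop the empty entry produced by a final newline
  }
  return lines.slice(-count);
}
```

With the full log still on disk from the `tee` capture, the tail only bounds what is printed in the CI output; nothing is lost.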

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:31:28 +08:00
Shang Chieh Tseng
d11140c016 Add GitHub Actions CI/CD pipeline and test framework
- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 14:06:44 +08:00