Commit Graph

4 Commits

Author SHA1 Message Date
Shang Chieh Tseng
7bb050f146 Change workflow defaults: judge_mode=dual, judge_model=gemma3:12b 2025-12-17 16:43:38 +08:00
Shang Chieh Tseng
1a185f7926 Add comprehensive Ollama log checking and configurable LLM judge mode
Test case enhancements:
- TC-RUNTIME-001: Add startup log error checking (CUDA, CUBLAS, CPU fallback)
- TC-RUNTIME-002: Add GPU detection verification, CUDA init checks, error detection
- TC-RUNTIME-003: Add server listening verification, runtime error checks
- TC-INFERENCE-001: Add model loading logs, layer offload verification
- TC-INFERENCE-002: Add inference error checking (CUBLAS/CUDA errors)
- TC-INFERENCE-003: Add API request log verification, response time display
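
A minimal sketch of the kind of log check these test cases describe. The pattern set, the `scan_log` helper, and the sample strings are all hypothetical; the commit does not show which strings the test cases actually match.

```python
import re

# Hypothetical patterns: the real test cases may look for different strings.
ERROR_PATTERNS = {
    "cuda_error": re.compile(r"CUDA error", re.IGNORECASE),
    # Any cuBLAS status code other than CUBLAS_STATUS_SUCCESS.
    "cublas_error": re.compile(r"CUBLAS_STATUS_(?!SUCCESS)\w+", re.IGNORECASE),
    # Assumed wording for a CPU fallback message.
    "cpu_fallback": re.compile(r"falling back to CPU", re.IGNORECASE),
}


def scan_log(text: str) -> dict[str, list[str]]:
    """Return the log lines that match each pattern, keyed by pattern name."""
    hits: dict[str, list[str]] = {name: [] for name in ERROR_PATTERNS}
    for line in text.splitlines():
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits


def has_errors(text: str) -> bool:
    """True if any pattern matched at least one line."""
    return any(lines for lines in scan_log(text).values())
```

A test case could call `has_errors` on the captured container log and fail when it returns `True`, printing the matching lines from `scan_log` for diagnosis.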

Workflow enhancements:
- Add judge_mode input (simple/llm/dual) to all workflows
- Add judge_model input to specify LLM model for judging
- Make both inputs configurable via the GitHub Actions UI without code changes
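
One way the three `judge_mode` values might be dispatched, as a sketch only. The `judge` function and its callables are hypothetical, and the meaning of `dual` is an assumption (both judges run and both must pass); the commit does not define it.

```python
from typing import Callable

JUDGE_MODES = ("simple", "llm", "dual")


def judge(
    mode: str,
    simple_judge: Callable[[], bool],
    llm_judge: Callable[[], bool],
) -> bool:
    """Return the overall verdict for the selected judge mode."""
    if mode not in JUDGE_MODES:
        raise ValueError(f"unknown judge_mode: {mode!r}")
    if mode == "simple":
        return simple_judge()
    if mode == "llm":
        return llm_judge()
    # Assumption: "dual" runs both judges and requires both to pass.
    simple_ok = simple_judge()
    llm_ok = llm_judge()
    return simple_ok and llm_ok
```

Running both judges before combining them in `dual` mode keeps both verdicts available for reporting even when the first one fails.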

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 23:27:57 +08:00
Shang Chieh Tseng
0e66cc6f93 Fix workflows to fail on test failures
The '|| true' was swallowing test runner exit codes, causing workflows
to pass even when tests failed. Added a separate 'Check test results'
step that reads the JSON summary and fails the workflow if any tests failed.

Affected workflows:
- build.yml
- runtime.yml
- inference.yml
- full-pipeline.yml
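
A rough sketch of what a 'Check test results' step could run. The summary schema (a top-level `failed` count) and the `exit_code_for` helper are assumptions; the commit does not show the actual JSON layout.

```python
import json


def exit_code_for(summary_text: str) -> int:
    """Map a JSON test summary to a process exit code.

    Assumes a hypothetical schema with a top-level integer "failed" field.
    A missing field is treated as a failure so that a malformed summary
    cannot silently pass.
    """
    summary = json.loads(summary_text)
    failed = summary.get("failed")
    if not isinstance(failed, int) or failed > 0:
        return 1
    return 0
```

The step would read the summary file, pass its contents to `exit_code_for`, and hand the result to `sys.exit`, so the workflow fails independently of the `|| true` on the test runner step.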

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 21:48:40 +08:00
Shang Chieh Tseng
fb01b8b1ca Split monolithic workflow into modular components
Separate workflows for flexibility:
- build.yml: Build verification (standalone + reusable)
- runtime.yml: Container & runtime tests with container lifecycle
- inference.yml: Inference tests with optional container management
- full-pipeline.yml: Orchestrates all stages with LLM judge

Each workflow can be triggered independently for targeted testing,
or the full pipeline can be run for complete validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 17:57:11 +08:00