- TC-INFERENCE-004: gemma3:12b single-GPU test
- TC-INFERENCE-005: gemma3:27b dual-GPU test (K80 layer split)
- Each test unloads the previous model before loading the next
- Workflows unload all 3 model sizes after the inference suite
- The 27b test verifies that both GPUs have memory allocated (see the sketch below)
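A minimal sketch of what the dual-GPU assertion could look like; the nvidia-smi query flags are real, but the 1 GiB threshold and the surrounding test shape are illustrative assumptions, not the suite's actual values:

```typescript
import { execSync } from "node:child_process";

// Returns the indices of GPUs whose used memory exceeds the threshold.
// Assumed threshold: 1 GiB indicates model layers are resident.
function gpusWithModelLoaded(minUsedMiB = 1024): number[] {
  // CSV output gives one "index, memory.used" row per GPU, no units/header
  const out = execSync(
    "nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits",
    { encoding: "utf8" },
  );
  return out
    .trim()
    .split("\n")
    .map((row) => row.split(",").map((v) => Number(v.trim())))
    .filter(([, usedMiB]) => usedMiB >= minUsedMiB)
    .map(([index]) => index);
}

// TC-INFERENCE-005: with gemma3:27b split across the K80's two GPUs,
// both should report significant memory usage.
if (gpusWithModelLoaded().length < 2) {
  throw new Error("expected model layers on both GPUs");
}
```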
- Add unloadModel() method to LLMJudge class
- CLI calls unloadModel() after judging completes
- Workflows unload gemma3:4b after inference tests
- Uses the Ollama API with keep_alive: 0 to trigger an immediate unload (sketched below)
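A hedged sketch of the unload call; the LLMJudge field names (baseUrl, model) are assumptions about the class internals, but sending keep_alive: 0 to Ollama's /api/generate endpoint is the documented way to unload a model immediately:

```typescript
class LLMJudge {
  constructor(
    private baseUrl = "http://localhost:11434", // assumed default
    private model = "gemma3:4b",
  ) {}

  // Ollama unloads the model right away when a request arrives with
  // keep_alive: 0 and no prompt to evaluate.
  async unloadModel(): Promise<void> {
    const res = await fetch(`${this.baseUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: this.model, keep_alive: 0 }),
    });
    if (!res.ok) {
      throw new Error(`unload failed: ${res.status} ${await res.text()}`);
    }
  }
}
```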
The '|| true' was swallowing the test runner's exit codes, causing
workflows to pass even when tests failed. Added a separate 'Check test
results' step that reads the JSON summary and fails the workflow if any
tests failed (see the sketch after the list below).
Affected workflows:
- build.yml
- runtime.yml
- inference.yml
- full-pipeline.yml
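A sketch of the logic the 'Check test results' step could run; the summary path and JSON shape (total/failed counts) are assumptions about the test runner's output, not taken from the actual suite:

```typescript
import { readFileSync } from "node:fs";

// Assumed location and shape of the runner's JSON summary.
const summaryPath = process.argv[2] ?? "test-results/summary.json";
const summary = JSON.parse(readFileSync(summaryPath, "utf8")) as {
  total: number;
  failed: number;
};

console.log(`${summary.failed}/${summary.total} tests failed`);

// Exiting non-zero is what actually fails the workflow step -- unlike
// '|| true', nothing downstream can swallow this exit code.
if (summary.failed > 0) {
  process.exit(1);
}
```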
Separate workflows for flexibility:
- build.yml: Build verification (standalone + reusable)
- runtime.yml: Container & runtime tests with container lifecycle
- inference.yml: Inference tests with optional container management
- full-pipeline.yml: Orchestrates all stages with LLM judge
Each workflow can be triggered independently for targeted testing, or
the full pipeline can be run for complete validation.