New options:
- --dual-judge: Run both the simple judge and the LLM judge; fail if either fails
- --judge-url: URL of a separate LLM judge server (default: localhost:11435)
- --judge-model: Model for LLM judging (default: gemma3:4b)
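A minimal sketch of how these options and their defaults might be parsed. The `JudgeOptions` type and `parseJudgeOptions` function are hypothetical names, and the `--flag value` argument form and the `false` default for `--dual-judge` are assumptions rather than confirmed behavior of the CLI:

```typescript
// Hypothetical option shape; field names are illustrative only.
interface JudgeOptions {
  dualJudge: boolean;
  judgeUrl: string;
  judgeModel: string;
}

// Parses the three judge flags, falling back to the defaults listed above.
// Assumes values follow their flag as a separate argument ("--judge-url host:port").
function parseJudgeOptions(argv: string[]): JudgeOptions {
  const opts: JudgeOptions = {
    dualJudge: false, // assumed default: dual judging is opt-in
    judgeUrl: "localhost:11435",
    judgeModel: "gemma3:4b",
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    const next = argv[i + 1];
    if (arg === "--dual-judge") {
      opts.dualJudge = true;
    } else if (arg === "--judge-url" && next !== undefined) {
      opts.judgeUrl = next;
      i++; // skip the consumed value
    } else if (arg === "--judge-model" && next !== undefined) {
      opts.judgeModel = next;
      i++; // skip the consumed value
    }
  }
  return opts;
}
```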
Dual judge logic:
- Simple judge checks exit codes
- LLM judge analyzes logs semantically
- Final result: FAIL if either judge says FAIL
- Combines reasons from both judges on failure
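The combination rule above can be sketched as follows. All names (`JudgeResult`, `simpleJudge`, `combineJudges`) are hypothetical. The sketch assumes the simple judge passes only when every exit code is 0, and that only the failing judges' reasons are included in the combined message:

```typescript
type Verdict = "PASS" | "FAIL";

// Hypothetical result shape shared by both judges.
interface JudgeResult {
  verdict: Verdict;
  reason: string;
}

// Simple judge: assumed to pass only if every command exited with code 0.
function simpleJudge(exitCodes: number[]): JudgeResult {
  const failed = exitCodes.filter((code) => code !== 0);
  return failed.length === 0
    ? { verdict: "PASS", reason: "all commands exited 0" }
    : { verdict: "FAIL", reason: `non-zero exit codes: ${failed.join(", ")}` };
}

// Dual judge: FAIL if either judge says FAIL, joining the failing judges' reasons.
function combineJudges(simple: JudgeResult, llm: JudgeResult): JudgeResult {
  const reasons: string[] = [];
  if (simple.verdict === "FAIL") reasons.push(`simple: ${simple.reason}`);
  if (llm.verdict === "FAIL") reasons.push(`llm: ${llm.reason}`);
  return reasons.length === 0
    ? { verdict: "PASS", reason: "both judges passed" }
    : { verdict: "FAIL", reason: reasons.join("; ") };
}
```

The LLM judge's result is passed in as a plain value here, so the sketch does not depend on how the judge server is called.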
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Strip ANSI escape codes from stdout/stderr to reduce log size
  (spinner animations were ~95% of inference log size)
- Add [TIMEOUT] indicator when commands are killed due to timeout
  for clearer failure diagnosis
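A minimal sketch of the two log changes above, not the actual implementation. `stripAnsi` and `formatLog` are hypothetical names, the regular expression covers only CSI sequences (colors, cursor movement, line clearing), and placing `[TIMEOUT]` as a prefix is an assumption:

```typescript
// Matches CSI escape sequences: ESC [ + parameter bytes + intermediate bytes + final byte.
// Examples: "\x1b[31m" (color), "\x1b[2K" (erase line), "\x1b[?25l" (hide cursor).
const ANSI_CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

// Removes CSI escape sequences so spinner redraws do not bloat the log.
function stripAnsi(text: string): string {
  return text.replace(ANSI_CSI, "");
}

// Cleans captured output and marks it when the command was killed by a timeout.
// The prefix position of the marker is an assumption.
function formatLog(output: string, timedOut: boolean): string {
  const clean = stripAnsi(output);
  return timedOut ? `[TIMEOUT] ${clean}` : clean;
}
```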
- Separate progress output (stderr) from JSON results (stdout)
- Add timestamps, test counters, and step progress to executor
- Update CLI to use stderr for progress messages
- Update workflow to capture JSON to file while showing progress
- Add --silent flag to suppress npm banner noise
This allows real-time visibility into test execution during CI runs
while preserving clean JSON output for artifact collection.
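The stream separation can be sketched as below. The helper names (`progressLine`, `logProgress`, `emitResults`) and the exact line format are hypothetical. In Node.js, `console.error` writes to stderr and `console.log` writes to stdout:

```typescript
// Builds a progress line with a timestamp, a test counter, and the current step.
// The format is illustrative, not the executor's actual output.
function progressLine(index: number, total: number, step: string, now: Date = new Date()): string {
  return `[${now.toISOString()}] [${index}/${total}] ${step}`;
}

// Progress goes to stderr so it stays visible in CI without polluting the results.
function logProgress(index: number, total: number, step: string): void {
  console.error(progressLine(index, total, step));
}

// Results go to stdout as JSON, so redirecting stdout to a file captures only JSON.
function emitResults(results: unknown): void {
  console.log(JSON.stringify(results, null, 2));
}
```

With this split, a workflow step can redirect stdout to a file for artifact collection while stderr continues to stream to the job log.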
- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts