The Claude AI validator was receiving detailed explanations with markdown
formatting (e.g., '**PASS**') instead of the expected simple format.
Updated the validation prompt to explicitly require responses to start
with either 'PASS' or 'FAIL: <reason>' without any additional formatting,
explanations, or markdown before the verdict.
This fixes the 'Warning: Unexpected Claude response format' error that
was causing valid test results to be incorrectly marked as unclear.
- Change temp directory from /tmp/test-runner-claude to .test-runner-temp
- Keeps temporary files within project bounds for Claude Code access
- Add .test-runner-temp to .gitignore to exclude from version control
- Fixes Claude AI validation permission issue
- Function in main.go renamed from validateConfig to validateConfigFile
- Resolves redeclaration error with validateConfig in config.go
- config.go has validateConfig(*Config) for internal validation
- main.go has validateConfigFile(string) for CLI command
- Rename validateConfig flag variable to validateConfigPath
- Resolves compilation error: validateConfig was both a *string variable and function name
- Function call now uses correct variable name
Add comprehensive test orchestration framework:
Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands
Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b
Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if passing
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts
Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)
Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- Reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing