Changes to test/config/models.yaml:
Quick profile:
- Use gemma3:4b (was gemma2:2b)
- Single prompt: 'Hello, respond with a brief greeting.'
- Timeout: 60s
- Purpose: Fast smoke test (~5 min)
Full profile:
- REMOVED: gemma2:2b, gemma3:4b (redundant with quick test)
- ONLY gemma3:12b (largest model that fits on a single K80)
- Single prompt: 'Hello, respond with a brief greeting.' (same as quick)
- Timeout: 120s (sufficient; the model loads in ~24s)
- Purpose: Validate Phase 2 memory optimization for large models
Rationale:
- Quick test validates basic functionality with gemma3:4b
- Full test validates single-GPU capability with gemma3:12b
- No need to test intermediate sizes if both the 4B and 12B models pass
- Consistent prompts make comparison easier
- Tests the critical optimization: 12B model on a single K80
Add comprehensive test orchestration framework:
Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands
Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b
Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if passing
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts
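
The pass/fail check on the JSON report could look roughly like the sketch below. The `Report` and `Result` types and their JSON field names are hypothetical, as is the sample payload; the actual schema written by report.go is not shown in this message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical per-model entry in the JSON report.
type Result struct {
	Model   string  `json:"model"`
	Passed  bool    `json:"passed"`
	Seconds float64 `json:"duration_seconds"`
}

// Report is a hypothetical top-level report for one profile run.
type Report struct {
	Profile string   `json:"profile"`
	Results []Result `json:"results"`
}

// parseReport decodes a JSON report.
func parseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

// AllPassed reports whether every result passed. An empty report counts
// as a failure so that a run which tested nothing cannot pass silently.
func (r Report) AllPassed() bool {
	if len(r.Results) == 0 {
		return false
	}
	for _, res := range r.Results {
		if !res.Passed {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative payload only, not output from a real run.
	raw := []byte(`{"profile":"quick","results":[{"model":"gemma3:4b","passed":true,"duration_seconds":12.5}]}`)

	report, err := parseReport(raw)
	if err != nil {
		fmt.Println("bad report:", err)
		return
	}
	fmt.Printf("profile %s passed: %t\n", report.Profile, report.AllPassed())
}
```

In a workflow step, the result of `AllPassed` would typically decide the process exit code, which in turn gates the full tests behind the quick tests.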
Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)
Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- Reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing