ollama37/test/config/quick.yaml
Shang Chieh Tseng d59284d30a Implement Go-based test runner framework for Tesla K80 testing
Add comprehensive test orchestration framework:

Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands
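The actual code of config.go is not shown here, but based on the shape of quick.yaml below, a minimal sketch of the struct types and a validation pass might look like this (all type, field, and function names are assumptions, not the real implementation):

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical struct types mirroring quick.yaml; the actual
// definitions in config.go may differ.
type ModelTest struct {
	Name              string   `yaml:"name"`
	Prompts           []string `yaml:"prompts"`
	MinResponseTokens int      `yaml:"min_response_tokens"`
	MaxResponseTokens int      `yaml:"max_response_tokens"`
	Timeout           string   `yaml:"timeout"`
}

type Profile struct {
	Timeout string      `yaml:"timeout"`
	Models  []ModelTest `yaml:"models"`
}

// validateProfile covers the kind of checks config.go is described
// as performing: parseable durations and a sane token range.
func validateProfile(p Profile) error {
	if _, err := time.ParseDuration(p.Timeout); err != nil {
		return fmt.Errorf("profile timeout: %w", err)
	}
	for _, m := range p.Models {
		if m.Name == "" {
			return fmt.Errorf("model entry is missing a name")
		}
		if m.MinResponseTokens > m.MaxResponseTokens {
			return fmt.Errorf("%s: min_response_tokens exceeds max_response_tokens", m.Name)
		}
		if _, err := time.ParseDuration(m.Timeout); err != nil {
			return fmt.Errorf("%s: timeout: %w", m.Name, err)
		}
	}
	return nil
}
```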

Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b

Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if passing
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts

Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)

Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- Reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing
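A structured JSON report with clear pass/fail criteria can be as simple as marshal-friendly structs plus an aggregate pass flag. A hedged sketch (report.go's actual schema is not shown and may well differ):

```go
package main

import "encoding/json"

// Hypothetical report shape; the real fields in report.go may differ.
type TestResult struct {
	Model           string  `json:"model"`
	Passed          bool    `json:"passed"`
	DurationSeconds float64 `json:"duration_seconds"`
	Error           string  `json:"error,omitempty"`
}

type Report struct {
	Profile string       `json:"profile"`
	Results []TestResult `json:"results"`
	Passed  bool         `json:"passed"`
}

// buildReport aggregates per-model results: the run passes only
// if every individual test passed.
func buildReport(profile string, results []TestResult) Report {
	r := Report{Profile: profile, Results: results, Passed: true}
	for _, t := range results {
		if !t.Passed {
			r.Passed = false
		}
	}
	return r
}

// JSON renders the report for pass/fail checking in CI.
func (r Report) JSON() ([]byte, error) {
	return json.MarshalIndent(r, "", "  ")
}
```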
2025-10-30 11:04:48 +08:00


# Quick test profile - fast smoke test with small model
# Run time: ~1-2 minutes
profiles:
  quick:
    timeout: 5m
    models:
      - name: gemma2:2b
        prompts:
          - "Hello, respond with a brief greeting."
        min_response_tokens: 5
        max_response_tokens: 100
        timeout: 30s

validation:
  gpu_required: true
  single_gpu_preferred: true
  check_patterns:
    success:
      - "loaded model"
      - "offload.*GPU"
    failure:
      - "CUDA.*error"
      - "out of memory"
      - "CPU backend"

server:
  host: "localhost"
  port: 11434
  startup_timeout: 30s
  health_check_interval: 1s
  health_check_endpoint: "/api/tags"

reporting:
  formats:
    - json
  include_logs: true
  log_excerpt_lines: 50
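The check_patterns section pairs naturally with a small regex classifier in the log monitor. A sketch of how monitor.go might match a log line against these patterns (function and variable names are illustrative; failure patterns are checked first so an error line is never masked by a success match):

```go
package main

import "regexp"

// Success and failure patterns taken verbatim from quick.yaml.
var (
	successRe = regexp.MustCompile(`loaded model|offload.*GPU`)
	failureRe = regexp.MustCompile(`CUDA.*error|out of memory|CPU backend`)
)

// classifyLine labels a single server log line as "failure",
// "success", or "neutral".
func classifyLine(line string) string {
	switch {
	case failureRe.MatchString(line):
		return "failure"
	case successRe.MatchString(line):
		return "success"
	default:
		return "neutral"
	}
}
```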