Simplify test profiles to focus on Tesla K80 capabilities

Changes to test/config/models.yaml:

Quick profile:
- Use gemma3:4b (was gemma2:2b)
- Single prompt: 'Hello, respond with a brief greeting.'
- Timeout: 60s
- Purpose: Fast smoke test (~5 min)
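
For reference, the resulting quick profile reads roughly as follows (a sketch reconstructed from the diff below; exact indentation in test/config/models.yaml may differ):

    profiles:
      quick:
        timeout: 5m
        models:
          - name: gemma3:4b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 60s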

Full profile:
- REMOVED: gemma2:2b, gemma3:4b (gemma3:4b is already covered by the quick profile; gemma2:2b adds no extra coverage)
- ONLY gemma3:12b (largest model that fits on a single K80)
- Single prompt: 'Hello, respond with a brief greeting.' (same as quick)
- Timeout: 120s (sufficient; the model loads in ~24s)
- Purpose: Validate Phase 2 memory optimization for large models
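
And the resulting full profile, again sketched from the diff below with assumed indentation:

      full:
        timeout: 30m
        models:
          - name: gemma3:12b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 120s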

Rationale:
- Quick test validates basic functionality with gemma3:4b
- Full test validates single-GPU capability with gemma3:12b
- No need to test intermediate sizes when both the 4B and 12B models pass
- Consistent prompts make comparison easier
- Tests the critical optimization: 12B model on single K80
Author: Shang Chieh Tseng
Date: 2025-10-30 11:57:30 +08:00
Parent: 4de7dd453b
Commit: 1aa80e9411

--- a/test/config/models.yaml
+++ b/test/config/models.yaml
@@ -2,41 +2,26 @@
 # This file defines test profiles with different model sizes and test scenarios
 profiles:
-  # Quick test profile - small models only, fast execution
+  # Quick test profile - fast smoke test with medium model
   quick:
     timeout: 5m
     models:
-      - name: gemma2:2b
+      - name: gemma3:4b
         prompts:
           - "Hello, respond with a brief greeting."
         min_response_tokens: 5
         max_response_tokens: 100
-        timeout: 30s
+        timeout: 60s
 
-  # Full test profile - comprehensive testing across model sizes
+  # Full test profile - test largest model that fits on single K80
   full:
     timeout: 30m
     models:
-      - name: gemma2:2b
-        prompts:
-          - "Hello, respond with a brief greeting."
-          - "What is 2+2? Answer briefly."
-        min_response_tokens: 5
-        max_response_tokens: 100
-        timeout: 30s
-      - name: gemma3:4b
-        prompts:
-          - "Explain photosynthesis in one sentence."
-        min_response_tokens: 10
-        max_response_tokens: 200
-        timeout: 60s
       - name: gemma3:12b
         prompts:
-          - "Write a short haiku about GPUs."
-        min_response_tokens: 15
-        max_response_tokens: 150
+          - "Hello, respond with a brief greeting."
+        min_response_tokens: 5
+        max_response_tokens: 100
         timeout: 120s
 
   # Stress test profile - larger models and longer prompts
@@ -89,4 +74,4 @@ reporting:
     - json
     - markdown
   include_logs: true
-  log_excerpt_lines: 50 # Lines of log to include per failure
+  log_excerpt_lines: 50  # Lines of log to include per failure