Mirror of https://github.com/dogkeeper886/ollama37.git, synced 2025-12-10 15:57:04 +00:00
Simplify test profiles to focus on Tesla K80 capabilities
Changes to test/config/models.yaml:

Quick profile:
- Use gemma3:4b (was gemma2:2b)
- Single prompt: 'Hello, respond with a brief greeting.'
- Timeout: 60s
- Purpose: Fast smoke test (~5 min)

Full profile:
- REMOVED: gemma2:2b, gemma3:4b (redundant with quick test)
- ONLY gemma3:12b (largest model for single K80)
- Single prompt: 'Hello, respond with a brief greeting.' (same as quick)
- Timeout: 120s (sufficient - loads in ~24s)
- Purpose: Validate Phase 2 memory optimization for large models

Rationale:
- Quick test validates basic functionality with gemma3:4b
- Full test validates single-GPU capability with gemma3:12b
- No need to test multiple sizes if both work
- Consistent prompts make comparison easier
- Tests the critical optimization: 12B model on single K80
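For reference, the profiles section of test/config/models.yaml after this change should read roughly as sketched below. The sketch is assembled from the diff that follows; the exact indentation and any neighbouring keys are assumed.

    profiles:
      # Quick test profile - fast smoke test with medium model
      quick:
        timeout: 5m
        models:
          - name: gemma3:4b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 60s

      # Full test profile - test largest model that fits on single K80
      full:
        timeout: 30m
        models:
          - name: gemma3:12b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 120s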
@@ -2,41 +2,26 @@
 # This file defines test profiles with different model sizes and test scenarios
 
 profiles:
-  # Quick test profile - small models only, fast execution
+  # Quick test profile - fast smoke test with medium model
   quick:
     timeout: 5m
     models:
-      - name: gemma2:2b
+      - name: gemma3:4b
         prompts:
           - "Hello, respond with a brief greeting."
         min_response_tokens: 5
         max_response_tokens: 100
-        timeout: 30s
+        timeout: 60s
 
-  # Full test profile - comprehensive testing across model sizes
+  # Full test profile - test largest model that fits on single K80
   full:
     timeout: 30m
     models:
-      - name: gemma2:2b
-        prompts:
-          - "Hello, respond with a brief greeting."
-          - "What is 2+2? Answer briefly."
-        min_response_tokens: 5
-        max_response_tokens: 100
-        timeout: 30s
-
-      - name: gemma3:4b
-        prompts:
-          - "Explain photosynthesis in one sentence."
-        min_response_tokens: 10
-        max_response_tokens: 200
-        timeout: 60s
-
       - name: gemma3:12b
         prompts:
-          - "Write a short haiku about GPUs."
-        min_response_tokens: 15
-        max_response_tokens: 150
+          - "Hello, respond with a brief greeting."
+        min_response_tokens: 5
+        max_response_tokens: 100
         timeout: 120s
 
   # Stress test profile - larger models and longer prompts
@@ -89,4 +74,4 @@ reporting:
     - json
     - markdown
   include_logs: true
-  log_excerpt_lines: 50  # Lines of log to include per failure
+  log_excerpt_lines: 50  # Lines of log to include per failure
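For context, the reporting section this hunk touches would look roughly like the sketch below. Only the last four lines appear in the hunk; the key holding the json/markdown list is not visible and is assumed here to be formats.

    reporting:
      formats:               # assumed key name; only the list items appear in the hunk
        - json
        - markdown
      include_logs: true
      log_excerpt_lines: 50  # Lines of log to include per failure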