Simplify test profiles to focus on Tesla K80 capabilities

Changes to test/config/models.yaml:

Quick profile:
- Use gemma3:4b (was gemma2:2b)
- Single prompt: 'Hello, respond with a brief greeting.'
- Timeout: 60s
- Purpose: Fast smoke test (~5 min)
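
For reference, the resulting quick profile reads roughly as follows (a sketch reconstructed from the diff below; exact indentation in test/config/models.yaml may differ):

    profiles:
      quick:
        timeout: 5m
        models:
          - name: gemma3:4b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 60s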

Full profile:
- REMOVED: gemma2:2b, gemma3:4b (gemma3:4b is already covered by the quick profile; gemma2:2b adds no extra coverage)
- ONLY gemma3:12b (largest model that fits on a single K80)
- Single prompt: 'Hello, respond with a brief greeting.' (same as quick)
- Timeout: 120s (sufficient; the model loads in ~24s)
- Purpose: Validate Phase 2 memory optimization for large models
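
And the resulting full profile, again sketched from the diff below with assumed indentation:

      full:
        timeout: 30m
        models:
          - name: gemma3:12b
            prompts:
              - "Hello, respond with a brief greeting."
            min_response_tokens: 5
            max_response_tokens: 100
            timeout: 120s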

Rationale:
- Quick test validates basic functionality with gemma3:4b
- Full test validates single-GPU capability with gemma3:12b
- No need to test intermediate sizes when both the 4B and 12B models pass
- Consistent prompts make comparison easier
- Tests the critical optimization: 12B model on single K80
Author: Shang Chieh Tseng
Date: 2025-10-30 11:57:30 +08:00
Parent: 4de7dd453b
Commit: 1aa80e9411

--- a/test/config/models.yaml
+++ b/test/config/models.yaml
@@ -2,41 +2,26 @@
 # This file defines test profiles with different model sizes and test scenarios
 profiles:
-  # Quick test profile - small models only, fast execution
+  # Quick test profile - fast smoke test with medium model
   quick:
     timeout: 5m
     models:
-      - name: gemma2:2b
+      - name: gemma3:4b
         prompts:
           - "Hello, respond with a brief greeting."
         min_response_tokens: 5
         max_response_tokens: 100
-        timeout: 30s
+        timeout: 60s
 
-  # Full test profile - comprehensive testing across model sizes
+  # Full test profile - test largest model that fits on single K80
   full:
     timeout: 30m
     models:
-      - name: gemma2:2b
-        prompts:
-          - "Hello, respond with a brief greeting."
-          - "What is 2+2? Answer briefly."
-        min_response_tokens: 5
-        max_response_tokens: 100
-        timeout: 30s
-      - name: gemma3:4b
-        prompts:
-          - "Explain photosynthesis in one sentence."
-        min_response_tokens: 10
-        max_response_tokens: 200
-        timeout: 60s
       - name: gemma3:12b
         prompts:
-          - "Write a short haiku about GPUs."
-        min_response_tokens: 15
-        max_response_tokens: 150
+          - "Hello, respond with a brief greeting."
+        min_response_tokens: 5
+        max_response_tokens: 100
         timeout: 120s
 
   # Stress test profile - larger models and longer prompts
@@ -89,4 +74,4 @@ reporting:
     - json
     - markdown
   include_logs: true
-  log_excerpt_lines: 50 # Lines of log to include per failure
+  log_excerpt_lines: 50  # Lines of log to include per failure