Commit Graph

6 Commits

Shang Chieh Tseng
40b956b23c Fix false positive CPU backend error in test configuration
The test configuration was treating 'CPU backend' as a failure pattern,
but this is incorrect. Loading the CPU backend library is normal - ollama
loads both CUDA and CPU backends for fallback operations.

The log line 'load_backend: loaded CPU backend from libggml-cpu-*.so'
is a success message, not an error.

Changed failure patterns from:
- 'CPU backend' (too broad, matches normal loading)
- 'failed to load.*CUDA' (too specific)

To more accurate patterns:
- 'failed to load.*backend' (matches actual load failures)
- 'backend.*failed' (matches failure messages)

This prevents false positives while still catching real backend failures.
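
A minimal sketch of how the revised patterns behave against the log lines discussed above; the two regexes are the ones named in this commit, while the Go harness and the sample failure line are illustrative rather than the actual test-runner code:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The two replacement failure patterns from this commit.
	failurePatterns := []*regexp.Regexp{
		regexp.MustCompile(`failed to load.*backend`),
		regexp.MustCompile(`backend.*failed`),
	}

	logLines := []string{
		// Normal, expected line quoted above; must NOT be flagged.
		"load_backend: loaded CPU backend from libggml-cpu-*.so",
		// Hypothetical example of a real load failure; must be flagged.
		"load_backend: failed to load CUDA backend",
	}

	for _, line := range logLines {
		flagged := false
		for _, p := range failurePatterns {
			if p.MatchString(line) {
				flagged = true
				break
			}
		}
		fmt.Printf("failure=%v  %s\n", flagged, line)
	}
}
```
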
2025-10-30 16:00:20 +08:00
Shang Chieh Tseng
f1d4c7f969 Fix test config: don't treat CPU backend loading as failure
The failure pattern 'CPU backend' was incorrectly flagging the normal log
message 'load_backend: loaded CPU backend from...' as an error. This is
expected behavior - both CUDA and CPU backends are loaded, but the GPU is
actually used for computation (as shown by 'offloaded 35/35 layers to GPU').

Changed failure patterns to detect actual GPU failures:
- Removed: 'CPU backend' (too broad, catches normal backend loading)
- Added: 'failed to load.*CUDA' (actual load failures)
- Added: 'no GPU detected' (GPU not available)

Root cause: monitor.go processes failure patterns first (highest priority),
so the 'CPU backend' pattern was creating EventError events before success
patterns could be checked, causing tests to fail even though the GPU was working.
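
A rough sketch of the priority ordering described above, assuming a classification step that checks failure patterns before success patterns; the EventError name comes from this commit, but the function shape and the success pattern are assumptions, not the real monitor.go code:

```go
package main

import (
	"fmt"
	"regexp"
)

type EventType string

const (
	EventError   EventType = "error"
	EventSuccess EventType = "success"
	EventNone    EventType = "none"
)

// classify checks failure patterns first (highest priority), then success patterns.
func classify(line string, failure, success []*regexp.Regexp) EventType {
	for _, p := range failure {
		if p.MatchString(line) {
			return EventError
		}
	}
	for _, p := range success {
		if p.MatchString(line) {
			return EventSuccess
		}
	}
	return EventNone
}

func main() {
	line := "load_backend: loaded CPU backend from libggml-cpu-*.so"

	oldFailure := []*regexp.Regexp{regexp.MustCompile(`CPU backend`)}
	newFailure := []*regexp.Regexp{
		regexp.MustCompile(`failed to load.*CUDA`),
		regexp.MustCompile(`no GPU detected`),
	}
	// Assumed success pattern, modeled on the 'offloaded 35/35 layers to GPU' line.
	success := []*regexp.Regexp{regexp.MustCompile(`offloaded \d+/\d+ layers to GPU`)}

	fmt.Println("old patterns:", classify(line, oldFailure, success)) // error (the false positive)
	fmt.Println("new patterns:", classify(line, newFailure, success)) // none
}
```
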
2025-10-30 15:39:17 +08:00
Shang Chieh Tseng
6c3876a30d Add multi-GPU test workflow and rename single-GPU workflow
- Rename tesla-k80-tests.yml to tesla-k80-single-gpu-tests.yml for clarity
- Add new tesla-k80-multi-gpu-tests.yml workflow for large models
- Add multi-gpu profile to test/config/models.yaml with gemma3:27b and gpt-oss:20b
- Multi-GPU workflow includes GPU count verification (see the sketch after this list) and a weekly schedule
- Profile-specific validation allows multi-GPU splits for large models
- Separate workflows optimize CI efficiency: quick tests vs. thorough tests
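
A hypothetical sketch of the GPU count verification step, assuming it counts the lines printed by 'nvidia-smi -L' (one per GPU); the actual workflow step may be implemented differently:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// nvidia-smi -L prints one "GPU N: ..." line per visible GPU.
	out, err := exec.Command("nvidia-smi", "-L").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "nvidia-smi failed:", err)
		os.Exit(1)
	}

	gpus := 0
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if strings.HasPrefix(line, "GPU ") {
			gpus++
		}
	}

	// A Tesla K80 board exposes two GPUs; the "at least 2" threshold for the
	// multi-GPU profile is an assumption about how the workflow checks this.
	if gpus < 2 {
		fmt.Fprintf(os.Stderr, "expected at least 2 GPUs for multi-GPU tests, found %d\n", gpus)
		os.Exit(1)
	}
	fmt.Printf("found %d GPUs, ok for multi-GPU profile\n", gpus)
}
```
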
2025-10-30 12:04:50 +08:00
Shang Chieh Tseng
1aa80e9411 Simplify test profiles to focus on Tesla K80 capabilities
Changes to test/config/models.yaml:

Quick profile:
- Use gemma3:4b (was gemma2:2b)
- Single prompt: 'Hello, respond with a brief greeting.'
- Timeout: 60s
- Purpose: Fast smoke test (~5 min)

Full profile:
- REMOVED: gemma2:2b, gemma3:4b (redundant with quick test)
- ONLY gemma3:12b (largest model for single K80)
- Single prompt: 'Hello, respond with a brief greeting.' (same as quick)
- Timeout: 120s (sufficient - loads in ~24s)
- Purpose: Validate Phase 2 memory optimization for large models

Rationale:
- Quick test validates basic functionality with gemma3:4b
- Full test validates single-GPU capability with gemma3:12b
- No need to test multiple sizes if both work
- Consistent prompts make comparison easier
- Tests the critical optimization: 12B model on single K80
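
One way the two profiles above could map onto Go types when models.yaml is loaded; the field and key names here are assumptions, and only the model names, prompt, and timeouts come from this commit message:

```go
package main

import "fmt"

// Profile is a hypothetical representation of one entry in models.yaml.
type Profile struct {
	Name    string   `yaml:"name"`
	Models  []string `yaml:"models"`
	Prompts []string `yaml:"prompts"`
	Timeout int      `yaml:"timeout"` // seconds
}

func main() {
	profiles := []Profile{
		{
			Name:    "quick",
			Models:  []string{"gemma3:4b"},
			Prompts: []string{"Hello, respond with a brief greeting."},
			Timeout: 60,
		},
		{
			Name:    "full",
			Models:  []string{"gemma3:12b"},
			Prompts: []string{"Hello, respond with a brief greeting."},
			Timeout: 120,
		},
	}
	for _, p := range profiles {
		fmt.Printf("%s: models=%v timeout=%ds\n", p.Name, p.Models, p.Timeout)
	}
}
```
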
2025-10-30 11:57:30 +08:00
Shang Chieh Tseng
4de7dd453b Add Claude AI-powered response validation and update test model
Changes:
1. Update quick test to use gemma3:4b (was gemma2:2b)
   - Increased timeout to 60s for larger model

2. Implement Claude headless validation (validate.go)
   - Hybrid approach: simple checks first, then Claude validation ALWAYS runs
   - Claude validates response quality, coherence, relevance
   - Detects gibberish, errors, and malformed responses
   - Falls back to simple validation if Claude CLI unavailable
   - Verbose logging shows Claude validation results

3. Validation flow (a sketch follows at the end of this message):
   - Step 1: Fast checks (empty response, token count)
   - Step 2: Claude AI analysis (runs regardless of simple check)
   - Claude result overrides simple checks
   - If Claude unavailable, uses simple validation only

4. Workflow improvements:
   - Remove useless GPU memory check step (server already stopped)
   - Cleaner workflow output

Benefits:
- Intelligent response quality validation
- Catches subtle issues (gibberish, off-topic responses)
- Better than hardcoded pattern matching
- Graceful degradation when Claude unavailable
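
A sketch of the hybrid flow described in points 2 and 3 above, assuming a headless 'claude -p <prompt>' invocation and a PASS/FAIL reply convention; only the ordering (fast checks first, Claude always consulted, fallback when the CLI is unavailable) is taken from this commit:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// simpleValidate covers the fast checks: a non-empty response with some tokens.
func simpleValidate(response string) bool {
	return len(strings.Fields(response)) > 0
}

// claudeValidate asks the Claude CLI to judge the response; the second return
// value reports whether Claude was actually consulted.
func claudeValidate(prompt, response string) (verdict bool, used bool) {
	if _, err := exec.LookPath("claude"); err != nil {
		return false, false // CLI unavailable; caller falls back to simple checks
	}
	question := fmt.Sprintf(
		"Prompt: %q\nResponse: %q\nIs the response coherent, relevant, and free of gibberish? Answer PASS or FAIL.",
		prompt, response)
	out, err := exec.Command("claude", "-p", question).Output() // "-p" print mode is an assumption
	if err != nil {
		return false, false
	}
	return strings.Contains(strings.ToUpper(string(out)), "PASS"), true
}

func validate(prompt, response string) bool {
	ok := simpleValidate(response)
	if verdict, used := claudeValidate(prompt, response); used {
		return verdict // Claude's result overrides the simple checks
	}
	return ok // graceful degradation when Claude is unavailable
}

func main() {
	fmt.Println(validate("Hello, respond with a brief greeting.", "Hi there! How can I help today?"))
}
```
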
2025-10-30 11:42:10 +08:00
Shang Chieh Tseng
d59284d30a Implement Go-based test runner framework for Tesla K80 testing
Add comprehensive test orchestration framework:

Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands
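
For orientation, a minimal sketch of the run/validate/list dispatch that main.go provides; the messages and structure here are illustrative, not the actual CLI implementation:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: test-runner <run|validate|list> [args]")
		os.Exit(2)
	}
	switch os.Args[1] {
	case "run":
		fmt.Println("would load the config, start the ollama server, and run the selected profile")
	case "validate":
		fmt.Println("would re-check an existing JSON report against the pass/fail criteria")
	case "list":
		fmt.Println("would list the profiles and models defined in the config")
	default:
		fmt.Fprintf(os.Stderr, "unknown command %q\n", os.Args[1])
		os.Exit(2)
	}
}
```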

Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b

Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if passing
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts

Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)
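
A sketch of the interrupt handling and graceful shutdown listed above, assuming the server is launched as an 'ollama serve' child process; the real server.go lifecycle code is more involved:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Start the ollama server as a child process (placeholder for server.go logic).
	srv := exec.Command("ollama", "serve")
	if err := srv.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "failed to start server:", err)
		os.Exit(1)
	}

	// On Ctrl+C (SIGINT) or SIGTERM, stop the server before exiting so no
	// stray process keeps the GPU busy.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-sigs
		fmt.Println("interrupted, stopping server...")
		_ = srv.Process.Signal(syscall.SIGTERM)
		os.Exit(1)
	}()

	// ... run the configured tests here, then shut the server down normally.
	_ = srv.Process.Signal(syscall.SIGTERM)
	_ = srv.Wait()
}
```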

Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- Reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing
2025-10-30 11:04:48 +08:00