ollama37

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-11 08:17:03 +00:00

Author	SHA1	Message	Date
Shang Chieh Tseng	40b956b23c	Fix false positive CPU backend error in test configuration The test configuration was treating 'CPU backend' as a failure pattern, but this is incorrect. Loading the CPU backend library is normal - ollama loads both CUDA and CPU backends for fallback operations. The log line 'load_backend: loaded CPU backend from libggml-cpu-*.so' is a success message, not an error. Changed failure patterns from: - 'CPU backend' (too broad, matches normal loading) - 'failed to load.*CUDA' (too specific) To more accurate patterns: - 'failed to load.*backend' (matches actual load failures) - 'backend.*failed' (matches failure messages) This prevents false positives while still catching real backend failures.	2025-10-30 16:00:20 +08:00
Shang Chieh Tseng	f1d4c7f969	Fix test config: don't treat CPU backend loading as failure The failure pattern 'CPU backend' was incorrectly flagging the normal log message 'load_backend: loaded CPU backend from...' as an error. This is expected behavior - both CUDA and CPU backends are loaded, but GPU is actually used for computation (as shown by 'offloaded 35/35 layers to GPU'). Changed failure patterns to detect actual GPU failures: - Removed: 'CPU backend' (too broad, catches normal backend loading) - Added: 'failed to load.*CUDA' (actual load failures) - Added: 'no GPU detected' (GPU not available) Root cause: monitor.go processes failure patterns first (highest priority), so the 'CPU backend' pattern was creating EventError events before success patterns could be checked, causing tests to fail despite GPU working.	2025-10-30 15:39:17 +08:00
Shang Chieh Tseng	4de7dd453b	Add Claude AI-powered response validation and update test model Changes: 1. Update quick test to use gemma3:4b (was gemma2:2b) - Increased timeout to 60s for larger model 2. Implement Claude headless validation (validate.go) - Hybrid approach: simple checks first, then Claude validation ALWAYS runs - Claude validates response quality, coherence, relevance - Detects gibberish, errors, and malformed responses - Falls back to simple validation if Claude CLI unavailable - Verbose logging shows Claude validation results 3. Validation flow: - Step 1: Fast checks (empty response, token count) - Step 2: Claude AI analysis (runs regardless of simple check) - Claude result overrides simple checks - If Claude unavailable, uses simple validation only 4. Workflow improvements: - Remove useless GPU memory check step (server already stopped) - Cleaner workflow output Benefits: - Intelligent response quality validation - Catches subtle issues (gibberish, off-topic responses) - Better than hardcoded pattern matching - Graceful degradation when Claude unavailable	2025-10-30 11:42:10 +08:00
Shang Chieh Tseng	d59284d30a	Implement Go-based test runner framework for Tesla K80 testing Add comprehensive test orchestration framework: Test Runner (cmd/test-runner/): - config.go: YAML configuration loading and validation - server.go: Ollama server lifecycle management (start/stop/health checks) - monitor.go: Real-time log monitoring with pattern matching - test.go: Model testing via Ollama API (pull, chat, validation) - validate.go: Test result validation (GPU usage, response quality, log analysis) - report.go: Structured reporting (JSON and Markdown formats) - main.go: CLI interface with run/validate/list commands Test Configurations (test/config/): - models.yaml: Full test suite with quick/full/stress profiles - quick.yaml: Fast smoke test with gemma2:2b Updated Workflow: - tesla-k80-tests.yml: Use test-runner instead of shell scripts - Run quick tests first, then full tests if passing - Generate structured JSON reports for pass/fail checking - Upload test results as artifacts Features: - Multi-model testing with configurable profiles - API-based testing (not CLI commands) - Real-time log monitoring for GPU events and errors - Automatic validation of GPU loading and response quality - Structured JSON and Markdown reports - Graceful server lifecycle management - Interrupt handling (Ctrl+C cleanup) Addresses limitations of shell-based testing by providing: - Better error handling and reporting - Programmatic test orchestration - Reusable test framework - Clear pass/fail criteria - Detailed test metrics and timing	2025-10-30 11:04:48 +08:00
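The two most recent commits describe a log monitor that checks failure patterns before success patterns, which is why the broad 'CPU backend' pattern produced errors on a healthy run. The following is a minimal sketch of that priority order, not the actual monitor.go code. The `classify` and `compileAll` helpers, the success pattern, and the sample log lines are assumptions made for illustration; only the failure patterns and the `EventError` name come from the commit messages.

```go
package main

import (
	"fmt"
	"regexp"
)

// EventKind labels a classified log line. EventError is named in the commit
// message; the other two values are hypothetical.
type EventKind string

const (
	EventError   EventKind = "error"
	EventSuccess EventKind = "success"
	EventNone    EventKind = "none"
)

// compileAll compiles each pattern and panics on invalid syntax.
func compileAll(patterns ...string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

// classify checks failure patterns first (highest priority), then success
// patterns, mirroring the ordering described in commit f1d4c7f969.
func classify(line string, failure, success []*regexp.Regexp) EventKind {
	for _, re := range failure {
		if re.MatchString(line) {
			return EventError
		}
	}
	for _, re := range success {
		if re.MatchString(line) {
			return EventSuccess
		}
	}
	return EventNone
}

func main() {
	// Old pattern set: too broad, matches normal backend loading.
	oldFailure := compileAll(`CPU backend`)
	// Pattern set from commit 40b956b23c.
	newFailure := compileAll(`failed to load.*backend`, `backend.*failed`)
	// Assumed success pattern, based on the 'offloaded 35/35 layers to GPU' log line.
	success := compileAll(`offloaded \d+/\d+ layers to GPU`)

	// Illustrative log lines; the library path is a placeholder.
	lines := []string{
		"load_backend: loaded CPU backend from /path/libggml-cpu.so",
		"offloaded 35/35 layers to GPU",
		"failed to load CUDA backend",
	}
	for _, l := range lines {
		fmt.Printf("%-60q old=%-8s new=%s\n", l,
			classify(l, oldFailure, success),
			classify(l, newFailure, success))
	}
}
```

With the old pattern set, the normal backend-loading line is classified as an error before any success pattern is consulted. With the narrower patterns it is ignored, while an actual load failure is still flagged.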

4 Commits
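
The validation flow described in commit 4de7dd453b (fast checks first, then an AI check that always runs and overrides the simple result, with a fallback when the AI tool is unavailable) can be sketched as follows. This is a hedged illustration, not the contents of validate.go. The `Verdict` type, the `AIValidator` function type, the `ErrUnavailable` error, and the whitespace-based token count are all hypothetical, and the Claude CLI call is replaced by a stub because the commit message does not show how it is invoked.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrUnavailable is a hypothetical error meaning the AI validator cannot run.
var ErrUnavailable = errors.New("AI validator unavailable")

// Verdict is a hypothetical result type for one validation pass.
type Verdict struct {
	Pass   bool
	Reason string
	Source string // "simple" or "ai"
}

// AIValidator stands in for the Claude-based check; its signature is assumed.
type AIValidator func(prompt, response string) (Verdict, error)

// simpleCheck is step 1: reject empty responses and very short ones.
// Tokens are approximated by whitespace-separated fields (an assumption).
func simpleCheck(response string, minTokens int) Verdict {
	if strings.TrimSpace(response) == "" {
		return Verdict{Pass: false, Reason: "empty response", Source: "simple"}
	}
	if n := len(strings.Fields(response)); n < minTokens {
		return Verdict{Pass: false, Reason: fmt.Sprintf("only %d tokens", n), Source: "simple"}
	}
	return Verdict{Pass: true, Reason: "passed simple checks", Source: "simple"}
}

// validate runs the simple check, then always attempts the AI check.
// The AI verdict overrides the simple one; on error, the simple verdict stands.
func validate(prompt, response string, minTokens int, ai AIValidator) Verdict {
	simple := simpleCheck(response, minTokens)
	if ai == nil {
		return simple
	}
	v, err := ai(prompt, response)
	if err != nil {
		return simple // graceful degradation
	}
	v.Source = "ai"
	return v
}

func main() {
	// Stub validators for demonstration only.
	strict := func(_, resp string) (Verdict, error) {
		if strings.Contains(resp, "asdf") {
			return Verdict{Pass: false, Reason: "gibberish"}, nil
		}
		return Verdict{Pass: true, Reason: "coherent"}, nil
	}
	missing := func(_, _ string) (Verdict, error) { return Verdict{}, ErrUnavailable }

	fmt.Printf("%+v\n", validate("Say hi", "asdf asdf asdf asdf", 3, strict))
	fmt.Printf("%+v\n", validate("Say hi", "Hello there, friend", 3, missing))
}
```

In this sketch, the first call fails even though the simple check passes, because the AI verdict takes precedence. The second call falls back to the simple verdict because the stub reports that the validator is unavailable.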