ollama37

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-11 08:17:03 +00:00

Author	SHA1	Message	Date
Shang Chieh Tseng	46f1038724	Fix Claude validation response format parsing. The Claude AI validator was receiving detailed explanations with markdown formatting (e.g., 'PASS') instead of the expected simple format. Updated the validation prompt to explicitly require responses to start with either 'PASS' or 'FAIL: <reason>', with no additional formatting, explanations, or markdown before the verdict. This fixes the 'Warning: Unexpected Claude response format' error that was causing valid test results to be incorrectly marked as unclear.	2025-10-30 12:34:02 +08:00
Shang Chieh Tseng	c8b7015a2c	Move test-runner temp directory into project: change temp directory from /tmp/test-runner-claude to .test-runner-temp; keep temporary files within project bounds for Claude Code access; add .test-runner-temp to .gitignore to exclude it from version control; fixes Claude AI validation permission issue.	2025-10-30 12:25:25 +08:00
Shang Chieh Tseng	9b487aa5f5	Rename validateConfig function to validateConfigFile to avoid conflict: function in main.go renamed from validateConfig to validateConfigFile; resolves redeclaration error with validateConfig in config.go; config.go has validateConfig(*Config) for internal validation; main.go has validateConfigFile(string) for the CLI command.	2025-10-30 12:16:55 +08:00
Shang Chieh Tseng	a7b3f6eda5	Fix test-runner variable name conflict: rename validateConfig flag variable to validateConfigPath; resolves compilation error where validateConfig was both a *string variable and a function name; function call now uses the correct variable name.	2025-10-30 12:15:12 +08:00
Shang Chieh Tseng	4de7dd453b	Add Claude AI-powered response validation and update test model. Changes: 1. Update quick test to use gemma3:4b (was gemma2:2b); increased timeout to 60s for larger model. 2. Implement Claude headless validation (validate.go): hybrid approach (simple checks first, then Claude validation ALWAYS runs); Claude validates response quality, coherence, relevance; detects gibberish, errors, and malformed responses; falls back to simple validation if Claude CLI unavailable; verbose logging shows Claude validation results. 3. Validation flow: Step 1: fast checks (empty response, token count); Step 2: Claude AI analysis (runs regardless of simple check); Claude result overrides simple checks; if Claude unavailable, uses simple validation only. 4. Workflow improvements: remove useless GPU memory check step (server already stopped); cleaner workflow output. Benefits: intelligent response quality validation; catches subtle issues (gibberish, off-topic responses); better than hardcoded pattern matching; graceful degradation when Claude unavailable.	2025-10-30 11:42:10 +08:00
Shang Chieh Tseng	d59284d30a	Implement Go-based test runner framework for Tesla K80 testing. Add comprehensive test orchestration framework. Test Runner (cmd/test-runner/): config.go: YAML configuration loading and validation; server.go: Ollama server lifecycle management (start/stop/health checks); monitor.go: real-time log monitoring with pattern matching; test.go: model testing via Ollama API (pull, chat, validation); validate.go: test result validation (GPU usage, response quality, log analysis); report.go: structured reporting (JSON and Markdown formats); main.go: CLI interface with run/validate/list commands. Test Configurations (test/config/): models.yaml: full test suite with quick/full/stress profiles; quick.yaml: fast smoke test with gemma2:2b. Updated Workflow: tesla-k80-tests.yml: use test-runner instead of shell scripts; run quick tests first, then full tests if passing; generate structured JSON reports for pass/fail checking; upload test results as artifacts. Features: multi-model testing with configurable profiles; API-based testing (not CLI commands); real-time log monitoring for GPU events and errors; automatic validation of GPU loading and response quality; structured JSON and Markdown reports; graceful server lifecycle management; interrupt handling (Ctrl+C cleanup). Addresses limitations of shell-based testing by providing: better error handling and reporting; programmatic test orchestration; reusable test framework; clear pass/fail criteria; detailed test metrics and timing.	2025-10-30 11:04:48 +08:00
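
The hybrid validation flow described in commit 4de7dd453b (fast checks first, then an AI verdict that overrides them, with a fallback when the AI validator is unavailable) could look roughly like the sketch below. This is not the code from validate.go: the `Verdict` type, the `AIValidator` function type, the `minTokens` threshold, and counting tokens as whitespace-separated words are all assumptions made for illustration, and the Claude call is replaced by a stub.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is a hypothetical result type; the real validate.go may differ.
type Verdict struct {
	Pass   bool
	Reason string
	Source string // "simple" or "claude"
}

// AIValidator stands in for the Claude call (assumed shape).
// ok == false means the validator is unavailable.
type AIValidator func(response string) (v Verdict, ok bool)

// simpleCheck performs the fast checks named in the commit message:
// empty response and token count. Tokens are approximated here as
// whitespace-separated words, which is an assumption.
func simpleCheck(response string, minTokens int) Verdict {
	if strings.TrimSpace(response) == "" {
		return Verdict{Pass: false, Reason: "empty response", Source: "simple"}
	}
	if n := len(strings.Fields(response)); n < minTokens {
		reason := fmt.Sprintf("only %d tokens, want at least %d", n, minTokens)
		return Verdict{Pass: false, Reason: reason, Source: "simple"}
	}
	return Verdict{Pass: true, Source: "simple"}
}

// validate always runs the simple check, then asks the AI validator.
// The AI verdict overrides the simple one; if the validator is nil or
// reports itself unavailable, the simple verdict is returned instead.
func validate(response string, minTokens int, ai AIValidator) Verdict {
	simple := simpleCheck(response, minTokens)
	if ai == nil {
		return simple
	}
	if v, ok := ai(response); ok {
		v.Source = "claude"
		return v
	}
	return simple
}

func main() {
	// Stub validator that rejects everything, to show the override.
	rejectAll := func(string) (Verdict, bool) {
		return Verdict{Pass: false, Reason: "gibberish"}, true
	}
	fmt.Printf("%+v\n", validate("The sky is blue.", 2, rejectAll))
	fmt.Printf("%+v\n", validate("The sky is blue.", 2, nil))
}
```

Passing the validator as a function value keeps the fallback path testable without the Claude CLI installed.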

6 Commits