ollama37

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-20 04:37:00 +00:00

Author	SHA1	Message	Date
Shang Chieh Tseng	6bbdf3e148	Fix test-runner GPU detection by preserving startup events The log monitor was calling Reset() before each model test, which cleared all GPU detection events that occurred during server startup. This caused the validation to fail with 'GPU acceleration not detected' even though GPU was being used successfully. Root cause: GPU detection logs are written during server startup (lines like 'offloaded 35/35 layers to GPU'), but monitor.Reset() was clearing these events before validation could check them. Solution: Comment out the monitor.Reset() call to preserve GPU detection events from server startup. These events are still relevant for validating that the model is using GPU acceleration.	2025-10-30 15:27:40 +08:00
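The interaction behind this fix can be sketched in Go. The LogMonitor type, its Observe/Reset/GPUOffload methods, and the regular expression are hypothetical stand-ins for monitor.go, not its actual API; the only detail taken from the commit is the startup line 'offloaded 35/35 layers to GPU'. Because that line is written once during server startup, clearing the recorded lines before each model test removes the only evidence of offloading.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"sync"
)

// offloadPattern matches startup lines such as "offloaded 35/35 layers to GPU".
var offloadPattern = regexp.MustCompile(`offloaded (\d+)/(\d+) layers to GPU`)

// LogMonitor records server log lines (hypothetical stand-in for monitor.go).
type LogMonitor struct {
	mu    sync.Mutex
	lines []string
}

// Observe records one log line.
func (m *LogMonitor) Observe(line string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.lines = append(m.lines, line)
}

// Reset discards every recorded line, including those from server startup.
func (m *LogMonitor) Reset() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.lines = nil
}

// GPUOffload returns the most recent "offloaded N/M layers" ratio and whether
// any such line has been recorded.
func (m *LogMonitor) GPUOffload() (offloaded, total int, ok bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, line := range m.lines {
		if sm := offloadPattern.FindStringSubmatch(line); sm != nil {
			offloaded, _ = strconv.Atoi(sm[1])
			total, _ = strconv.Atoi(sm[2])
			ok = true
		}
	}
	return offloaded, total, ok
}

func main() {
	m := &LogMonitor{}
	m.Observe("offloaded 35/35 layers to GPU") // written during server startup

	// m.Reset() // calling this before each model test erased the line above

	n, total, ok := m.GPUOffload()
	fmt.Println("GPU offload detected:", ok && n > 0, fmt.Sprintf("(%d/%d layers)", n, total))
}
```

An alternative to dropping Reset() entirely would be to keep startup events in a separate slice that Reset() leaves alone, so per-test errors are still isolated.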
Shang Chieh Tseng	d8ea75a3e2	Fix test-runner to inherit LD_LIBRARY_PATH for CUDA backend loading The test-runner was starting the ollama server subprocess without inheriting environment variables, causing the GGML CUDA backend to fail loading even though LD_LIBRARY_PATH was set in the GitHub Actions workflow. Changes: - Added s.cmd.Env = os.Environ() to inherit all environment variables - This ensures LD_LIBRARY_PATH is passed to the ollama server subprocess - Fixes GPU offloading failure where layers were not being loaded to GPU Root cause analysis from logs: - GPUs were detected: Tesla K80 with 11.1 GiB available - Server scheduled 35 layers for GPU offload - But actual offload was 0/35 layers (all stayed on CPU) - Runner subprocess couldn't find CUDA libraries without LD_LIBRARY_PATH This fix ensures the runner subprocess can dynamically load libggml-cuda.so by inheriting the CUDA library paths from the parent process.	2025-10-30 14:08:24 +08:00
Shang Chieh Tseng	46f1038724	Fix Claude validation response format parsing The Claude AI validator was receiving detailed explanations with markdown formatting (e.g., '**PASS**') instead of the expected simple format. Updated the validation prompt to explicitly require responses to start with either 'PASS' or 'FAIL: <reason>' without any additional formatting, explanations, or markdown before the verdict. This fixes the 'Warning: Unexpected Claude response format' error that was causing valid test results to be incorrectly marked as unclear.	2025-10-30 12:34:02 +08:00
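The stricter 'PASS' / 'FAIL: <reason>' contract makes the verdict easy to parse mechanically. The sketch below assumes that contract; parseVerdict and Verdict are hypothetical names, not the runner's code. A reply that opens with markdown, such as a bolded verdict, is rejected as an unexpected format rather than guessed at.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Verdict is a parsed validator reply (hypothetical type).
type Verdict struct {
	Pass   bool
	Reason string // populated only for FAIL
}

var errUnexpectedFormat = errors.New("unexpected validator response format")

// parseVerdict expects the reply to begin with "PASS" or "FAIL: <reason>".
// Anything else, including a verdict wrapped in markdown, is rejected.
func parseVerdict(reply string) (Verdict, error) {
	s := strings.TrimSpace(reply)
	switch {
	case strings.HasPrefix(s, "PASS"):
		return Verdict{Pass: true}, nil
	case strings.HasPrefix(s, "FAIL:"):
		reason := strings.TrimSpace(strings.TrimPrefix(s, "FAIL:"))
		if i := strings.IndexByte(reason, '\n'); i >= 0 {
			reason = strings.TrimSpace(reason[:i]) // keep the first line only
		}
		return Verdict{Reason: reason}, nil
	default:
		return Verdict{}, errUnexpectedFormat
	}
}

func main() {
	for _, reply := range []string{"PASS", "FAIL: response is off-topic", "**PASS**"} {
		v, err := parseVerdict(reply)
		fmt.Printf("%q -> %+v, err=%v\n", reply, v, err)
	}
}
```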
Shang Chieh Tseng	c8b7015a2c	Move test-runner temp directory into project - Change temp directory from /tmp/test-runner-claude to .test-runner-temp - Keeps temporary files within project bounds for Claude Code access - Add .test-runner-temp to .gitignore to exclude from version control - Fixes Claude AI validation permission issue	2025-10-30 12:25:25 +08:00
Shang Chieh Tseng	9b487aa5f5	Rename validateConfig function to validateConfigFile to avoid conflict - Function in main.go renamed from validateConfig to validateConfigFile - Resolves redeclaration error with validateConfig in config.go - config.go has validateConfig(*Config) for internal validation - main.go has validateConfigFile(string) for CLI command	2025-10-30 12:16:55 +08:00
Shang Chieh Tseng	a7b3f6eda5	Fix test-runner variable name conflict - Rename validateConfig flag variable to validateConfigPath - Resolves compilation error: validateConfig was both a *string variable and function name - Function call now uses correct variable name	2025-10-30 12:15:12 +08:00
Shang Chieh Tseng	4de7dd453b	Add Claude AI-powered response validation and update test model Changes: 1. Update quick test to use gemma3:4b (was gemma2:2b) - Increased timeout to 60s for larger model 2. Implement Claude headless validation (validate.go) - Hybrid approach: simple checks first, then Claude validation ALWAYS runs - Claude validates response quality, coherence, relevance - Detects gibberish, errors, and malformed responses - Falls back to simple validation if Claude CLI unavailable - Verbose logging shows Claude validation results 3. Validation flow: - Step 1: Fast checks (empty response, token count) - Step 2: Claude AI analysis (runs regardless of simple check) - Claude result overrides simple checks - If Claude unavailable, uses simple validation only 4. Workflow improvements: - Remove useless GPU memory check step (server already stopped) - Cleaner workflow output Benefits: - Intelligent response quality validation - Catches subtle issues (gibberish, off-topic responses) - Better than hardcoded pattern matching - Graceful degradation when Claude unavailable	2025-10-30 11:42:10 +08:00
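The validation flow in step 3 can be sketched as follows. Result, AIValidator, simpleCheck, validate, and the minimum-token threshold are hypothetical; the Claude call is abstracted as a function value so it can be stubbed, and availability is approximated by looking up a `claude` executable on PATH with exec.LookPath. As the commit describes, the AI verdict always runs and overrides the simple checks, and the simple result is used only when the validator cannot run.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// Result is the outcome of a validation step (hypothetical type).
type Result struct {
	Pass   bool
	Reason string
	Source string // "simple" or "claude"
}

// AIValidator abstracts the Claude call so it can be stubbed (hypothetical).
type AIValidator func(prompt, response string) (Result, error)

var errValidatorUnavailable = errors.New("validator unavailable")

// simpleCheck is the fast step: non-empty response and a minimum token count.
func simpleCheck(response string, tokens, minTokens int) Result {
	switch {
	case strings.TrimSpace(response) == "":
		return Result{false, "empty response", "simple"}
	case tokens < minTokens:
		return Result{false, fmt.Sprintf("only %d tokens", tokens), "simple"}
	}
	return Result{true, "", "simple"}
}

// validate runs the simple check, then always consults the AI validator,
// whose verdict overrides the simple one. If no validator is configured or
// the call fails, the simple result is returned unchanged.
func validate(prompt, response string, tokens, minTokens int, ai AIValidator) Result {
	simple := simpleCheck(response, tokens, minTokens)
	if ai == nil {
		return simple
	}
	r, err := ai(prompt, response)
	if err != nil {
		return simple // graceful degradation
	}
	return r
}

// claudeAvailable reports whether a `claude` executable is on PATH.
func claudeAvailable() bool {
	_, err := exec.LookPath("claude")
	return err == nil
}

func main() {
	var ai AIValidator
	if claudeAvailable() {
		ai = func(prompt, response string) (Result, error) {
			// Stub: a real implementation would invoke the Claude CLI and
			// parse its PASS/FAIL reply; here it always reports unavailable.
			return Result{}, errValidatorUnavailable
		}
	}
	fmt.Printf("%+v\n", validate("Why is the sky blue?", "Rayleigh scattering ...", 12, 5, ai))
}
```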
Shang Chieh Tseng	d59284d30a	Implement Go-based test runner framework for Tesla K80 testing Add comprehensive test orchestration framework: Test Runner (cmd/test-runner/): - config.go: YAML configuration loading and validation - server.go: Ollama server lifecycle management (start/stop/health checks) - monitor.go: Real-time log monitoring with pattern matching - test.go: Model testing via Ollama API (pull, chat, validation) - validate.go: Test result validation (GPU usage, response quality, log analysis) - report.go: Structured reporting (JSON and Markdown formats) - main.go: CLI interface with run/validate/list commands Test Configurations (test/config/): - models.yaml: Full test suite with quick/full/stress profiles - quick.yaml: Fast smoke test with gemma2:2b Updated Workflow: - tesla-k80-tests.yml: Use test-runner instead of shell scripts - Run quick tests first, then full tests if passing - Generate structured JSON reports for pass/fail checking - Upload test results as artifacts Features: - Multi-model testing with configurable profiles - API-based testing (not CLI commands) - Real-time log monitoring for GPU events and errors - Automatic validation of GPU loading and response quality - Structured JSON and Markdown reports - Graceful server lifecycle management - Interrupt handling (Ctrl+C cleanup) Addresses limitations of shell-based testing by providing: - Better error handling and reporting - Programmatic test orchestration - Reusable test framework - Clear pass/fail criteria - Detailed test metrics and timing	2025-10-30 11:04:48 +08:00

8 Commits