Files
ollama37/.github/workflows/tesla-k80-tests.yml
Shang Chieh Tseng 4de7dd453b Add Claude AI-powered response validation and update test model
Changes:
1. Update quick test to use gemma3:4b (was gemma2:2b)
   - Increased timeout to 60s for larger model

2. Implement Claude headless validation (validate.go)
   - Hybrid approach: simple checks first, then Claude validation ALWAYS runs
   - Claude validates response quality, coherence, relevance
   - Detects gibberish, errors, and malformed responses
   - Falls back to simple validation if Claude CLI unavailable
   - Verbose logging shows Claude validation results

3. Validation flow:
   - Step 1: Fast checks (empty response, token count)
   - Step 2: Claude AI analysis (runs regardless of simple check)
   - Claude result overrides simple checks
   - If Claude unavailable, uses simple validation only

4. Workflow improvements:
   - Remove useless GPU memory check step (server already stopped)
   - Cleaner workflow output

Benefits:
- Intelligent response quality validation
- Catches subtle issues (gibberish, off-topic responses)
- Better than hardcoded pattern matching
- Graceful degradation when Claude unavailable
2025-10-30 11:42:10 +08:00

88 lines
2.4 KiB
YAML

name: Tesla K80 Tests
on:
workflow_dispatch: # Manual trigger only
jobs:
test:
runs-on: self-hosted
timeout-minutes: 60
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Verify ollama binary exists
run: |
if [ ! -f ./ollama ]; then
echo "Error: ollama binary not found. Please run the build workflow first."
exit 1
fi
ls -lh ollama
- name: Build test-runner
run: |
cd cmd/test-runner
go mod init github.com/ollama/ollama/cmd/test-runner || true
go mod tidy
go build -o ../../test-runner .
cd ../..
ls -lh test-runner
- name: Validate test configuration
run: |
./test-runner validate --config test/config/quick.yaml
- name: Run quick tests
run: |
./test-runner run --profile quick --config test/config/quick.yaml --output test-report-quick --verbose
timeout-minutes: 10
- name: Check quick test results
run: |
if ! jq -e '.summary.failed == 0' test-report-quick.json; then
echo "Quick tests failed!"
jq '.results[] | select(.status == "FAILED")' test-report-quick.json
exit 1
fi
echo "Quick tests passed!"
- name: Upload quick test results
if: always()
uses: actions/upload-artifact@v4
with:
name: quick-test-results
path: |
test-report-quick.json
test-report-quick.md
ollama.log
retention-days: 7
- name: Run full tests (if quick tests passed)
if: success()
run: |
./test-runner run --profile full --config test/config/models.yaml --output test-report-full --verbose
timeout-minutes: 45
- name: Check full test results
if: success()
run: |
if ! jq -e '.summary.failed == 0' test-report-full.json; then
echo "Full tests failed!"
jq '.results[] | select(.status == "FAILED")' test-report-full.json
exit 1
fi
echo "All tests passed!"
- name: Upload full test results
if: always()
uses: actions/upload-artifact@v4
with:
name: full-test-results
path: |
test-report-full.json
test-report-full.md
ollama.log
retention-days: 14