Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-10 07:46:59 +00:00)
Implement Go-based test runner framework for Tesla K80 testing
Add a comprehensive test orchestration framework.

Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via the Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands

Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b

Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if they pass
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts

Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)

Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- A reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing
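As a rough illustration of the config.go piece described above, here is a minimal Go sketch of how loading a test/config/*.yaml file could look. The struct fields, file layout, and the use of gopkg.in/yaml.v3 are assumptions for illustration only; just the profile names (quick/full/stress), the gemma2:2b smoke test, and the config paths come from this commit.

// Hypothetical sketch of test-runner config loading. Field names and YAML
// layout are assumptions; only the profile names and gemma2:2b are taken
// from the commit message.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// TestConfig mirrors a possible test/config/*.yaml layout.
type TestConfig struct {
	Profiles map[string]Profile `yaml:"profiles"`
}

// Profile groups the models and limits for one test run
// (e.g. "quick", "full", or "stress").
type Profile struct {
	Models     []ModelTest `yaml:"models"`
	TimeoutMin int         `yaml:"timeout_minutes"`
}

// ModelTest describes a single model to pull and chat with via the Ollama API.
type ModelTest struct {
	Name   string `yaml:"name"`   // e.g. "gemma2:2b"
	Prompt string `yaml:"prompt"` // prompt sent through the chat endpoint
}

// LoadConfig reads and validates a YAML test configuration.
func LoadConfig(path string) (*TestConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read config: %w", err)
	}
	var cfg TestConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	if len(cfg.Profiles) == 0 {
		return nil, fmt.Errorf("config %s defines no profiles", path)
	}
	return &cfg, nil
}

func main() {
	cfg, err := LoadConfig("test/config/quick.yaml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d profile(s)\n", len(cfg.Profiles))
}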
.github/workflows/tesla-k80-tests.yml (vendored): 168 changed lines
@@ -20,132 +20,74 @@ jobs:
           exit 1
         fi
         ls -lh ollama
         file ollama
         ./ollama --version

-      - name: Run Go unit tests
-        run: |
-          go test -v -race -timeout 10m ./...
-        continue-on-error: false
-
-      - name: Start ollama server (background)
-        run: |
-          ./ollama serve > ollama.log 2>&1 &
-          echo $! > ollama.pid
-          echo "Ollama server started with PID $(cat ollama.pid)"
-
-      - name: Wait for server to be ready
-        run: |
-          for i in {1..30}; do
-            if curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
-              echo "Server is ready!"
-              exit 0
-            fi
-            echo "Waiting for server... attempt $i/30"
-            sleep 2
-          done
-          echo "Server failed to start"
-          cat ollama.log
-          exit 1
-
-      - name: Run integration tests
-        run: |
-          go test -v -timeout 20m ./integration/...
-        continue-on-error: false
-
-      - name: Clear server logs for model test
-        run: |
-          # Truncate log file to start fresh for model loading test
-          > ollama.log
-
-      - name: Pull gemma2:2b model
-        run: |
-          echo "Pulling gemma2:2b model..."
-          ./ollama pull gemma2:2b
-          echo "Model pull completed"
-        timeout-minutes: 15
-
-      - name: Run inference with gemma2:2b
-        run: |
-          echo "Running inference test..."
-          ./ollama run gemma2:2b "Hello, this is a test. Please respond with a short greeting." --verbose
-          echo "Inference completed"
-        timeout-minutes: 5
-
-      - name: Wait for logs to flush
-        run: sleep 3
-
-      - name: Analyze server logs with Claude
-        run: |
-          echo "Analyzing ollama server logs for proper model loading..."
-
-          # Create analysis prompt
-          cat > log_analysis_prompt.txt << 'EOF'
-          Analyze the following Ollama server logs from a Tesla K80 (CUDA Compute Capability 3.7) system.
-
-          Verify that:
-          1. The model loaded successfully without errors
-          2. CUDA/GPU acceleration was properly detected and initialized
-          3. The model is using the Tesla K80 GPU (not CPU fallback)
-          4. There are no CUDA compatibility warnings or errors
-          5. Memory allocation was successful
-          6. Inference completed without errors
-
-          Respond with:
-          - "PASS" if all checks pass and model loaded properly with GPU acceleration
-          - "FAIL: <reason>" if there are critical issues
-          - "WARN: <reason>" if there are warnings but model works
-
-          Be specific about what succeeded or failed. Look for CUDA errors, memory issues, or CPU fallback warnings.
-
-          Server logs:
-          ---
-          EOF
-
-          cat ollama.log >> log_analysis_prompt.txt
-
-          # Run Claude in headless mode to analyze
-          claude -p log_analysis_prompt.txt > log_analysis_result.txt
-
-          echo "=== Claude Analysis Result ==="
-          cat log_analysis_result.txt
-
-          # Check if analysis passed
-          if grep -q "^PASS" log_analysis_result.txt; then
-            echo "✓ Log analysis PASSED - Model loaded correctly on Tesla K80"
-            exit 0
-          elif grep -q "^WARN" log_analysis_result.txt; then
-            echo "⚠ Log analysis has WARNINGS - Review needed"
-            cat log_analysis_result.txt
-            exit 0 # Don't fail on warnings, but they're visible
-          else
-            echo "✗ Log analysis FAILED - Model loading issues detected"
-            cat log_analysis_result.txt
-            exit 1
-          fi
-
-      - name: Check GPU memory usage
-        if: always()
-        run: |
-          echo "=== GPU Memory Status ==="
-          nvidia-smi --query-gpu=memory.used,memory.total --format=csv
-
-      - name: Stop ollama server
-        if: always()
-        run: |
-          if [ -f ollama.pid ]; then
-            kill $(cat ollama.pid) || true
-            rm ollama.pid
-          fi
-          pkill -f "ollama serve" || true
-
-      - name: Upload logs and analysis
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: ollama-test-logs-and-analysis
-          path: |
-            ollama.log
-            log_analysis_prompt.txt
-            log_analysis_result.txt
-          retention-days: 7
+      - name: Build test-runner
+        run: |
+          cd cmd/test-runner
+          go mod init github.com/ollama/ollama/cmd/test-runner || true
+          go mod tidy
+          go build -o ../../test-runner .
+          cd ../..
+          ls -lh test-runner
+
+      - name: Validate test configuration
+        run: |
+          ./test-runner validate --config test/config/quick.yaml
+
+      - name: Run quick tests
+        run: |
+          ./test-runner run --profile quick --config test/config/quick.yaml --output test-report-quick --verbose
+        timeout-minutes: 10
+
+      - name: Check quick test results
+        run: |
+          if ! jq -e '.summary.failed == 0' test-report-quick.json; then
+            echo "Quick tests failed!"
+            jq '.results[] | select(.status == "FAILED")' test-report-quick.json
+            exit 1
+          fi
+          echo "Quick tests passed!"
+
+      - name: Upload quick test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: quick-test-results
+          path: |
+            test-report-quick.json
+            test-report-quick.md
+            ollama.log
+          retention-days: 7
+
+      - name: Run full tests (if quick tests passed)
+        if: success()
+        run: |
+          ./test-runner run --profile full --config test/config/models.yaml --output test-report-full --verbose
+        timeout-minutes: 45
+
+      - name: Check full test results
+        if: success()
+        run: |
+          if ! jq -e '.summary.failed == 0' test-report-full.json; then
+            echo "Full tests failed!"
+            jq '.results[] | select(.status == "FAILED")' test-report-full.json
+            exit 1
+          fi
+          echo "All tests passed!"
+
+      - name: Upload full test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: full-test-results
+          path: |
+            test-report-full.json
+            test-report-full.md
+            ollama.log
+          retention-days: 14
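The result-check steps above gate on the JSON report emitted by the test-runner: `jq -e '.summary.failed == 0'` and `jq '.results[] | select(.status == "FAILED")'`. Those two paths imply a report with a summary.failed count and a results array carrying per-test status fields. Below is a minimal Go sketch of structs that would produce JSON of that shape; every field other than summary.failed and results[].status is an assumption for illustration, not the actual report.go schema.

// Hypothetical sketch of the report shape implied by the workflow's jq checks.
// Only `.summary.failed` and `.results[].status` come from the workflow;
// all other fields are illustrative assumptions.
package main

import (
	"encoding/json"
	"os"
	"time"
)

// Summary carries the aggregate counts the workflow gates on.
type Summary struct {
	Total  int `json:"total"`
	Passed int `json:"passed"`
	Failed int `json:"failed"` // checked by: jq -e '.summary.failed == 0'
}

// Result is one model test; Status is matched against "FAILED" by jq.
type Result struct {
	Model    string        `json:"model"`
	Status   string        `json:"status"` // e.g. "PASSED" or "FAILED"
	Duration time.Duration `json:"duration_ns"`
	Error    string        `json:"error,omitempty"`
}

// Report is what test-runner would write to test-report-<profile>.json.
type Report struct {
	Summary Summary  `json:"summary"`
	Results []Result `json:"results"`
}

func main() {
	report := Report{
		Summary: Summary{Total: 1, Passed: 1, Failed: 0},
		Results: []Result{{Model: "gemma2:2b", Status: "PASSED", Duration: 42 * time.Second}},
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	_ = enc.Encode(report) // mirrors the JSON the jq checks would read
}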