Add GitHub Actions CI/CD pipeline and test framework

- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 19:56:59 +00:00 · 2025-12-15 14:06:44 +08:00
parent 2b5aeaf86b
commit d11140c016
23 changed files with 3014 additions and 50 deletions
--- a/tests/testcases/build/TC-BUILD-001.yml
+++ b/tests/testcases/build/TC-BUILD-001.yml
@@ -0,0 +1,31 @@
+id: TC-BUILD-001
+name: Builder Image Verification
+suite: build
+priority: 1
+timeout: 120000
+
+dependencies: []
+
+steps:
+  - name: Check image exists
+    command: docker images ollama37-builder:latest --format '{{.Repository}}:{{.Tag}}'
+
+  - name: Verify CUDA toolkit
+    command: docker run --rm ollama37-builder:latest nvcc --version
+
+  - name: Verify GCC version
+    command: docker run --rm ollama37-builder:latest gcc --version | head -1
+
+  - name: Verify Go version
+    command: docker run --rm ollama37-builder:latest go version
+
+criteria: |
+  All commands should succeed (exit code 0).
+
+  Expected outputs:
+  - Image exists: should show "ollama37-builder:latest"
+  - CUDA: should show version 11.4 (accept 11.4.x)
+  - GCC: should show version 10 (accept GCC 10.x)
+  - Go: should show version 1.25 or higher
+
+  Accept minor version variations. Focus on major versions being correct.
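The version criteria in TC-BUILD-001 could also be checked mechanically. The following is only a sketch: `extractVersion`, `meetsBuildCriteria`, and the regular expressions are hypothetical names and patterns that do not appear in this commit, and the patterns assume the usual banner format of each tool.

```typescript
// Hypothetical helper, not part of the commit's test runner.
type Tool = "cuda" | "gcc" | "go";

// The regexes assume the usual banner shape of each tool, for example:
//   nvcc: "Cuda compilation tools, release 11.4, V11.4.152"
//   gcc:  "gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11)"
//   go:   "go version go1.25.0 linux/amd64"
const VERSION_PATTERNS: Record<Tool, RegExp> = {
  cuda: /release (\d+)\.(\d+)/,
  gcc: /\) (\d+)\.(\d+)/,
  go: /\bgo(\d+)\.(\d+)/,
};

// Returns [major, minor], or null when the banner does not match.
function extractVersion(tool: Tool, output: string): [number, number] | null {
  const match = VERSION_PATTERNS[tool].exec(output);
  return match ? [Number(match[1]), Number(match[2])] : null;
}

// Encodes the criteria above: CUDA 11.4.x, GCC 10.x, Go 1.25 or higher.
function meetsBuildCriteria(tool: Tool, output: string): boolean {
  const version = extractVersion(tool, output);
  if (!version) return false;
  const [major, minor] = version;
  switch (tool) {
    case "cuda":
      return major === 11 && minor === 4;
    case "gcc":
      return major === 10;
    case "go":
      return major > 1 || (major === 1 && minor >= 25);
  }
}
```

Patch-level differences (11.4.152 versus 11.4.48, for instance) are ignored on purpose, matching the instruction to accept minor variations.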
--- a/tests/testcases/build/TC-BUILD-002.yml
+++ b/tests/testcases/build/TC-BUILD-002.yml
@@ -0,0 +1,27 @@
+id: TC-BUILD-002
+name: Runtime Image Build
+suite: build
+priority: 2
+timeout: 900000
+
+dependencies:
+  - TC-BUILD-001
+
+steps:
+  - name: Build runtime image
+    command: cd docker && make build-runtime-no-cache 2>&1 | tail -50
+    timeout: 900000
+
+  - name: Verify runtime image exists
+    command: docker images ollama37:latest --format '{{.Repository}}:{{.Tag}} {{.Size}}'
+
+criteria: |
+  The runtime Docker image should build successfully from GitHub source.
+
+  Expected:
+  - Build completes without fatal errors
+  - Final output should mention "successfully" or similar completion message
+  - Runtime image "ollama37:latest" should exist after build
+  - Image size should be substantial (>10GB is expected due to CUDA)
+
+  Accept build warnings. Only fail on actual build errors.
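The `dependencies` field (here TC-BUILD-002 depends on TC-BUILD-001) implies an execution order. This diff does not show how the runner resolves it, so the following is only a sketch of one possible approach, a depth-first topological sort. `TestCase` and `orderByDependencies` are hypothetical names.

```typescript
// Hypothetical shape: only the two fields needed for ordering.
interface TestCase {
  id: string;
  dependencies: string[];
}

// Returns test case IDs so that every case appears after its dependencies.
// Throws on unknown IDs and on dependency cycles.
function orderByDependencies(cases: TestCase[]): string[] {
  const byId = new Map<string, TestCase>();
  for (const tc of cases) byId.set(tc.id, tc);

  const state = new Map<string, "visiting" | "done">();
  const ordered: string[] = [];

  const visit = (id: string): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") throw new Error(`Dependency cycle at ${id}`);
    const tc = byId.get(id);
    if (!tc) throw new Error(`Unknown test case ${id}`);
    state.set(id, "visiting");
    for (const dep of tc.dependencies) visit(dep);
    state.set(id, "done");
    ordered.push(id);
  };

  for (const tc of cases) visit(tc.id);
  return ordered;
}
```

A runner built this way could also skip a case when one of its dependencies failed, although nothing in this diff confirms that behavior.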
--- a/tests/testcases/build/TC-BUILD-003.yml
+++ b/tests/testcases/build/TC-BUILD-003.yml
@@ -0,0 +1,25 @@
+id: TC-BUILD-003
+name: Image Size Validation
+suite: build
+priority: 3
+timeout: 30000
+
+dependencies:
+  - TC-BUILD-002
+
+steps:
+  - name: Check builder image size
+    command: docker images ollama37-builder:latest --format '{{.Size}}'
+
+  - name: Check runtime image size
+    command: docker images ollama37:latest --format '{{.Size}}'
+
+criteria: |
+  Docker images should be within expected size ranges.
+
+  Expected:
+  - Builder image: 10GB to 20GB (contains CUDA, GCC, CMake, Go)
+  - Runtime image: 15GB to 25GB (contains builder + compiled ollama)
+
+  These are large images due to CUDA toolkit and build tools.
+  Accept sizes within reasonable range of expectations.
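The size ranges in TC-BUILD-003 can be compared against the `{{.Size}}` output with a small parser. This sketch assumes Docker prints decimal units such as `12.3GB` or `950MB`. `parseDockerSizeGB` and `sizeInRange` are hypothetical helpers, not code from this commit.

```typescript
// Conversion factors to gigabytes, assuming Docker's decimal size suffixes.
const UNIT_TO_GB: Record<string, number> = {
  B: 1e-9,
  kB: 1e-6,
  MB: 1e-3,
  GB: 1,
  TB: 1e3,
};

// Parses strings such as "12.3GB" into gigabytes; returns null if unparseable.
function parseDockerSizeGB(size: string): number | null {
  const match = /^(\d+(?:\.\d+)?)\s*(B|kB|MB|GB|TB)$/.exec(size.trim());
  if (!match) return null;
  return Number(match[1]) * UNIT_TO_GB[match[2]];
}

// True when the size parses and lies within [minGB, maxGB].
function sizeInRange(size: string, minGB: number, maxGB: number): boolean {
  const gb = parseDockerSizeGB(size);
  return gb !== null && gb >= minGB && gb <= maxGB;
}
```

With these helpers, the builder check would be `sizeInRange(output, 10, 20)` and the runtime check `sizeInRange(output, 15, 25)`. A looser tolerance could be added to honor "within reasonable range".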
--- a/tests/testcases/inference/TC-INFERENCE-001.yml
+++ b/tests/testcases/inference/TC-INFERENCE-001.yml
@@ -0,0 +1,30 @@
+id: TC-INFERENCE-001
+name: Model Pull
+suite: inference
+priority: 1
+timeout: 600000
+
+dependencies:
+  - TC-RUNTIME-003
+
+steps:
+  - name: Check if model exists
+    command: docker exec ollama37 ollama list | grep -q "gemma3:4b" && echo "Model exists" || echo "Model not found"
+
+  - name: Pull model if needed
+    command: docker exec ollama37 ollama list | grep -q "gemma3:4b" || docker exec ollama37 ollama pull gemma3:4b
+    timeout: 600000
+
+  - name: Verify model available
+    command: docker exec ollama37 ollama list
+
+criteria: |
+  The gemma3:4b model should be available for inference.
+
+  Expected:
+  - Model is either already present or successfully downloaded
+  - "ollama list" shows gemma3:4b in the output
+  - No download errors
+
+  Accept if model already exists (skip download).
+  Model size is ~3GB, so the download may take time.
--- a/tests/testcases/inference/TC-INFERENCE-002.yml
+++ b/tests/testcases/inference/TC-INFERENCE-002.yml
@@ -0,0 +1,28 @@
+id: TC-INFERENCE-002
+name: Basic Inference
+suite: inference
+priority: 2
+timeout: 180000
+
+dependencies:
+  - TC-INFERENCE-001
+
+steps:
+  - name: Run simple math question
+    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2? Answer with just the number." 2>&1
+    timeout: 120000
+
+  - name: Check GPU memory usage
+    command: docker exec ollama37 nvidia-smi --query-compute-apps=pid,used_memory --format=csv 2>/dev/null || echo "No GPU processes"
+
+criteria: |
+  Basic inference should work on Tesla K80.
+
+  Expected:
+  - Model responds to the math question
+  - Response should indicate "4" (accept variations: "4", "four", "The answer is 4", etc.)
+  - GPU memory should be allocated during inference
+  - No CUDA errors in output
+
+  This is AI-generated output - accept reasonable variations.
+  Focus on the model producing a coherent response.
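The criteria for TC-INFERENCE-002 are worded for a lenient reviewer, but a deterministic pre-check is possible. The sketch below is hypothetical. `mentionsFour` and `hasCudaError` are illustrative names, and the `CUDA error` marker is an assumption about what a failure looks like in the output.

```typescript
// True when the output contains "4" or "four" as a standalone word.
function mentionsFour(output: string): boolean {
  return /\b(4|four)\b/i.test(output);
}

// Assumption: CUDA failures surface as text containing "CUDA error".
function hasCudaError(output: string): boolean {
  return /cuda error/i.test(output);
}

// Combined verdict for the first step's output.
function basicInferencePasses(output: string): boolean {
  return mentionsFour(output) && !hasCudaError(output);
}
```

The word-boundary match rejects answers such as "14" while still accepting "The answer is 4." and "Four".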
--- a/tests/testcases/inference/TC-INFERENCE-003.yml
+++ b/tests/testcases/inference/TC-INFERENCE-003.yml
@@ -0,0 +1,34 @@
+id: TC-INFERENCE-003
+name: API Endpoint Test
+suite: inference
+priority: 3
+timeout: 120000
+
+dependencies:
+  - TC-INFERENCE-001
+
+steps:
+  - name: Test generate endpoint (non-streaming)
+    command: |
+      curl -s http://localhost:11434/api/generate \
+        -d '{"model":"gemma3:4b","prompt":"Say hello in one word","stream":false}' \
+        | head -c 500
+
+  - name: Test generate endpoint (streaming)
+    command: |
+      curl -s http://localhost:11434/api/generate \
+        -d '{"model":"gemma3:4b","prompt":"Count from 1 to 3","stream":true}' \
+        | head -5
+
+criteria: |
+  Ollama REST API should handle inference requests.
+
+  Expected for non-streaming:
+  - Returns JSON with "response" field
+  - Response contains some greeting (hello, hi, etc.)
+
+  Expected for streaming:
+  - Returns multiple JSON lines
+  - Each line contains partial response
+
+  Accept any valid JSON response. Content may vary.
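With `"stream": true`, the generate endpoint returns newline-delimited JSON, one object per line, each carrying a `response` fragment and a `done` flag. The sketch below parses such output. `GenerateChunk`, `parseStream`, and `joinResponse` are hypothetical names. Note that the non-streaming step pipes through `head -c 500`, which can cut the JSON body mid-object, so this sketch covers only the streaming case.

```typescript
// Only the fields used here; real chunks carry more (model, created_at, ...).
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

// Parses newline-delimited JSON; throws if any non-empty line is invalid JSON.
function parseStream(body: string): GenerateChunk[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as GenerateChunk);
}

// Concatenates the partial responses into the full generated text.
function joinResponse(chunks: GenerateChunk[]): string {
  return chunks.map((chunk) => chunk.response ?? "").join("");
}
```

Because `head -5` keeps whole lines, its output should still parse line by line, though the final `done: true` chunk may be missing.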
--- a/tests/testcases/inference/TC-INFERENCE-004.yml
+++ b/tests/testcases/inference/TC-INFERENCE-004.yml
@@ -0,0 +1,32 @@
+id: TC-INFERENCE-004
+name: CUBLAS Fallback Verification
+suite: inference
+priority: 4
+timeout: 120000
+
+dependencies:
+  - TC-INFERENCE-002
+
+steps:
+  - name: Check for CUBLAS errors in logs
+    command: cd docker && docker compose logs 2>&1 | grep -i "CUBLAS_STATUS" | grep -v "SUCCESS" | head -10 | grep . || echo "No CUBLAS errors"
+
+  - name: Check compute capability detection
+    command: cd docker && docker compose logs 2>&1 | grep -iE "compute|capability|cc.*3" | head -10 | grep . || echo "No compute capability logs"
+
+  - name: Verify no GPU errors
+    command: cd docker && docker compose logs 2>&1 | grep -iE "error|fail" | grep -i gpu | head -10 | grep . || echo "No GPU errors"
+
+criteria: |
+  CUBLAS should work correctly on Tesla K80 using legacy fallback.
+
+  Expected:
+  - No CUBLAS_STATUS_ARCH_MISMATCH errors
+  - No CUBLAS_STATUS_NOT_SUPPORTED errors
+  - Compute capability 3.7 may be mentioned in debug logs
+  - No fatal GPU-related errors
+
+  The K80 uses legacy CUBLAS functions (cublasSgemmBatched)
+  instead of modern Ex variants. This should work transparently.
+
+  Accept warnings. Only fail on actual CUBLAS errors.
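The log filter in the first step of TC-INFERENCE-004 (match `CUBLAS_STATUS`, drop lines containing `SUCCESS`) can be expressed as a pure function, which makes the pass condition explicit: the returned list must be empty. `findCublasErrors` is a hypothetical name, and the log lines in the usage example are invented for illustration.

```typescript
// Mirrors: grep -i "CUBLAS_STATUS" | grep -v "SUCCESS"
// The first match is case-insensitive, the exclusion is case-sensitive.
function findCublasErrors(log: string): string[] {
  return log
    .split("\n")
    .filter((line) => /CUBLAS_STATUS/i.test(line) && !line.includes("SUCCESS"));
}

// Usage with invented log lines:
const exampleLog = [
  "init: CUBLAS_STATUS_SUCCESS",
  "gemm failed: CUBLAS_STATUS_ARCH_MISMATCH",
  "loaded model",
].join("\n");
const cublasErrors = findCublasErrors(exampleLog);
// cublasErrors holds only the ARCH_MISMATCH line, so this example would fail.
```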
--- a/tests/testcases/runtime/TC-RUNTIME-001.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-001.yml
@@ -0,0 +1,31 @@
+id: TC-RUNTIME-001
+name: Container Startup
+suite: runtime
+priority: 1
+timeout: 120000
+
+dependencies:
+  - TC-BUILD-002
+
+steps:
+  - name: Stop existing container
+    command: cd docker && docker compose down 2>/dev/null || true
+
+  - name: Start container with GPU
+    command: cd docker && docker compose up -d
+
+  - name: Wait for startup
+    command: sleep 15
+
+  - name: Check container status
+    command: cd docker && docker compose ps
+
+criteria: |
+  The ollama37 container should start successfully with GPU access.
+
+  Expected:
+  - Container starts without errors
+  - docker compose ps shows container in "Up" state
+  - No "Exited" or "Restarting" status
+
+  Accept startup warnings. Container should be running.
--- a/tests/testcases/runtime/TC-RUNTIME-002.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-002.yml
@@ -0,0 +1,29 @@
+id: TC-RUNTIME-002
+name: GPU Detection
+suite: runtime
+priority: 2
+timeout: 60000
+
+dependencies:
+  - TC-RUNTIME-001
+
+steps:
+  - name: Check nvidia-smi inside container
+    command: docker exec ollama37 nvidia-smi
+
+  - name: Check CUDA libraries
+    command: docker exec ollama37 ldconfig -p | grep -i cuda | head -5
+
+  - name: Check Ollama GPU detection
+    command: cd docker && docker compose logs 2>&1 | grep -i gpu | head -10
+
+criteria: |
+  Tesla K80 GPU should be detected inside the container.
+
+  Expected:
+  - nvidia-smi shows Tesla K80 GPU(s)
+  - Driver version 470.x (or compatible)
+  - CUDA libraries are available (libcuda, libcublas, etc.)
+  - Ollama logs mention GPU detection
+
+  The K80 has 12GB VRAM per GPU. Accept variations in reported memory.
--- a/tests/testcases/runtime/TC-RUNTIME-003.yml
+++ b/tests/testcases/runtime/TC-RUNTIME-003.yml
@@ -0,0 +1,39 @@
+id: TC-RUNTIME-003
+name: Health Check
+suite: runtime
+priority: 3
+timeout: 180000
+
+dependencies:
+  - TC-RUNTIME-001
+
+steps:
+  - name: Wait for health check
+    command: |
+      for i in $(seq 1 30); do
+        STATUS=$(docker inspect ollama37 --format='{{.State.Health.Status}}' 2>/dev/null || echo "starting")
+        echo "Health status: $STATUS (attempt $i/30)"
+        if [ "$STATUS" = "healthy" ]; then
+          echo "Container is healthy"
+          exit 0
+        fi
+        sleep 5
+      done
+      echo "Health check timeout"
+      exit 1
+
+  - name: Test API endpoint
+    command: curl -s http://localhost:11434/api/tags
+
+  - name: Check Ollama version
+    command: docker exec ollama37 ollama --version
+
+criteria: |
+  Ollama server should be healthy and API responsive.
+
+  Expected:
+  - Container health status becomes "healthy"
+  - /api/tags endpoint returns a JSON response (even if the model list is empty)
+  - ollama --version shows version information
+
+  Accept any valid JSON response from API. Version format may vary.
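The polling loop in TC-RUNTIME-003 (up to 30 attempts, 5 seconds apart) could also live in the runner itself. This is a sketch under assumptions: `waitForHealthy` is a hypothetical function, and the probe and sleep are injected so the logic can be exercised without Docker.

```typescript
// Polls `probe` until it returns "healthy".
// Returns the 1-based attempt number on success, or null after `attempts` tries.
function waitForHealthy(
  probe: () => string,
  sleep: (ms: number) => void,
  attempts = 30,
  delayMs = 5000,
): number | null {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (probe() === "healthy") return attempt;
    // Unlike the shell loop, skip the pointless sleep after the final attempt.
    if (attempt < attempts) sleep(delayMs);
  }
  return null;
}
```

In Node, the probe could wrap `child_process.execSync` around the same `docker inspect` command and trim its output. The default worst case is 29 sleeps of 5 seconds, about 145 seconds, which fits within the case's 180000 ms timeout.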