Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-18 19:56:59 +00:00)

Add GitHub Actions CI/CD pipeline and test framework
- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
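
For orientation, the YAML test cases added by this commit (listed below) share a small, uniform schema. The following is a minimal TypeScript sketch of that schema and a loader, not code from the commit: field names are taken from the files below, their interpretation (e.g. how priority is used) is inferred, and js-yaml is assumed as the parser rather than confirmed from the runner's sources.

// Sketch of the test case schema used by the YAML files in tests/testcases/.
// Field names mirror the files; semantics in comments are inferred, not confirmed.
import { readFileSync } from "node:fs";
import { load } from "js-yaml"; // assumption: the runner could use any YAML parser

interface TestStep {
  name: string;
  command: string;   // shell command the runner executes
  timeout?: number;  // optional per-step timeout in milliseconds
}

interface TestCase {
  id: string;              // e.g. "TC-BUILD-001"
  name: string;
  suite: string;           // "build" | "runtime" | "inference" in this commit
  priority: number;        // presumably lower values run earlier within a suite
  timeout: number;         // whole-case timeout in milliseconds
  dependencies: string[];  // ids of cases that must pass before this one
  steps: TestStep[];
  criteria: string;        // free-text pass/fail criteria for evaluation
}

// Load a single test case from disk.
function loadTestCase(path: string): TestCase {
  return load(readFileSync(path, "utf8")) as TestCase;
}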

tests/testcases/build/TC-BUILD-001.yml (new file)
@@ -0,0 +1,31 @@
id: TC-BUILD-001
name: Builder Image Verification
suite: build
priority: 1
timeout: 120000

dependencies: []

steps:
  - name: Check image exists
    command: docker images ollama37-builder:latest --format '{{.Repository}}:{{.Tag}}'

  - name: Verify CUDA toolkit
    command: docker run --rm ollama37-builder:latest nvcc --version

  - name: Verify GCC version
    command: docker run --rm ollama37-builder:latest gcc --version | head -1

  - name: Verify Go version
    command: docker run --rm ollama37-builder:latest go version

criteria: |
  All commands should succeed (exit code 0).

  Expected outputs:
  - Image exists: should show "ollama37-builder:latest"
  - CUDA: should show version 11.4 (accept 11.4.x)
  - GCC: should show version 10 (accept GCC 10.x)
  - Go: should show version 1.25 or higher

  Accept minor version variations. Focus on major versions being correct.

tests/testcases/build/TC-BUILD-002.yml (new file)
@@ -0,0 +1,27 @@
id: TC-BUILD-002
name: Runtime Image Build
suite: build
priority: 2
timeout: 900000

dependencies:
  - TC-BUILD-001

steps:
  - name: Build runtime image
    command: cd docker && make build-runtime-no-cache 2>&1 | tail -50
    timeout: 900000

  - name: Verify runtime image exists
    command: docker images ollama37:latest --format '{{.Repository}}:{{.Tag}} {{.Size}}'

criteria: |
  The runtime Docker image should build successfully from GitHub source.

  Expected:
  - Build completes without fatal errors
  - Final output should mention "successfully" or similar completion message
  - Runtime image "ollama37:latest" should exist after build
  - Image size should be substantial (>10GB is expected due to CUDA)

  Accept build warnings. Only fail on actual build errors.

tests/testcases/build/TC-BUILD-003.yml (new file)
@@ -0,0 +1,25 @@
id: TC-BUILD-003
name: Image Size Validation
suite: build
priority: 3
timeout: 30000

dependencies:
  - TC-BUILD-002

steps:
  - name: Check builder image size
    command: docker images ollama37-builder:latest --format '{{.Size}}'

  - name: Check runtime image size
    command: docker images ollama37:latest --format '{{.Size}}'

criteria: |
  Docker images should be within expected size ranges.

  Expected:
  - Builder image: 10GB to 20GB (contains CUDA, GCC, CMake, Go)
  - Runtime image: 15GB to 25GB (contains builder + compiled ollama)

  These are large images due to CUDA toolkit and build tools.
  Accept sizes within reasonable range of expectations.
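
As a rough illustration of the check described above (not part of the commit), a TypeScript helper could convert docker's human-readable '{{.Size}}' output into gigabytes and test it against the stated ranges. The size string format and thresholds are assumptions taken from the criteria in TC-BUILD-003.

// Convert docker's human-readable size (e.g. "18.4GB", "431MB") to gigabytes,
// then compare against the ranges stated in TC-BUILD-003's criteria.
function sizeToGB(size: string): number {
  const m = size.trim().match(/^([\d.]+)\s*(KB|MB|GB|TB)$/i);
  if (!m) throw new Error(`unrecognized size: ${size}`);
  const scale: Record<string, number> = { KB: 1e-6, MB: 1e-3, GB: 1, TB: 1e3 };
  return parseFloat(m[1]) * scale[m[2].toUpperCase()];
}

// Example: runtime image expected between 15GB and 25GB ("18.4GB" is illustrative).
const runtimeGB = sizeToGB("18.4GB");
console.log(runtimeGB >= 15 && runtimeGB <= 25 ? "within range" : "out of range");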

tests/testcases/inference/TC-INFERENCE-001.yml (new file)
@@ -0,0 +1,30 @@
id: TC-INFERENCE-001
name: Model Pull
suite: inference
priority: 1
timeout: 600000

dependencies:
  - TC-RUNTIME-003

steps:
  - name: Check if model exists
    command: docker exec ollama37 ollama list | grep -q "gemma3:4b" && echo "Model exists" || echo "Model not found"

  - name: Pull model if needed
    command: docker exec ollama37 ollama list | grep -q "gemma3:4b" || docker exec ollama37 ollama pull gemma3:4b
    timeout: 600000

  - name: Verify model available
    command: docker exec ollama37 ollama list

criteria: |
  The gemma3:4b model should be available for inference.

  Expected:
  - Model is either already present or successfully downloaded
  - "ollama list" shows gemma3:4b in the output
  - No download errors

  Accept if model already exists (skip download).
  Model size is ~3GB, download may take time.

tests/testcases/inference/TC-INFERENCE-002.yml (new file)
@@ -0,0 +1,28 @@
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000

dependencies:
  - TC-INFERENCE-001

steps:
  - name: Run simple math question
    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2? Answer with just the number." 2>&1
    timeout: 120000

  - name: Check GPU memory usage
    command: docker exec ollama37 nvidia-smi --query-compute-apps=pid,used_memory --format=csv 2>/dev/null || echo "No GPU processes"

criteria: |
  Basic inference should work on Tesla K80.

  Expected:
  - Model responds to the math question
  - Response should indicate "4" (accept variations: "4", "four", "The answer is 4", etc.)
  - GPU memory should be allocated during inference
  - No CUDA errors in output

  This is AI-generated output - accept reasonable variations.
  Focus on the model producing a coherent response.

tests/testcases/inference/TC-INFERENCE-003.yml (new file)
@@ -0,0 +1,34 @@
id: TC-INFERENCE-003
name: API Endpoint Test
suite: inference
priority: 3
timeout: 120000

dependencies:
  - TC-INFERENCE-001

steps:
  - name: Test generate endpoint (non-streaming)
    command: |
      curl -s http://localhost:11434/api/generate \
        -d '{"model":"gemma3:4b","prompt":"Say hello in one word","stream":false}' \
        | head -c 500

  - name: Test generate endpoint (streaming)
    command: |
      curl -s http://localhost:11434/api/generate \
        -d '{"model":"gemma3:4b","prompt":"Count from 1 to 3","stream":true}' \
        | head -5

criteria: |
  Ollama REST API should handle inference requests.

  Expected for non-streaming:
  - Returns JSON with "response" field
  - Response contains some greeting (hello, hi, etc.)

  Expected for streaming:
  - Returns multiple JSON lines
  - Each line contains partial response

  Accept any valid JSON response. Content may vary.
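
For comparison with the curl steps above, the same non-streaming request can be made from TypeScript. This is a minimal sketch, assuming Node 18+ so a global fetch is available; the endpoint, payload, and the "response" field come from the test case itself, everything else is illustrative.

// Non-streaming /api/generate call mirroring the first curl step above.
async function generateOnce(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model: "gemma3:4b", prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = (await res.json()) as { response?: string };
  return body.response ?? ""; // the "response" field is what the criteria check for
}

// Usage: generateOnce("Say hello in one word").then(console.log);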

tests/testcases/inference/TC-INFERENCE-004.yml (new file)
@@ -0,0 +1,32 @@
id: TC-INFERENCE-004
name: CUBLAS Fallback Verification
suite: inference
priority: 4
timeout: 120000

dependencies:
  - TC-INFERENCE-002

steps:
  - name: Check for CUBLAS errors in logs
    command: cd docker && docker compose logs 2>&1 | grep -i "CUBLAS_STATUS" | grep -v "SUCCESS" | head -10 || echo "No CUBLAS errors"

  - name: Check compute capability detection
    command: cd docker && docker compose logs 2>&1 | grep -iE "compute|capability|cc.*3" | head -10 || echo "No compute capability logs"

  - name: Verify no GPU errors
    command: cd docker && docker compose logs 2>&1 | grep -iE "error|fail" | grep -i gpu | head -10 || echo "No GPU errors"

criteria: |
  CUBLAS should work correctly on Tesla K80 using legacy fallback.

  Expected:
  - No CUBLAS_STATUS_ARCH_MISMATCH errors
  - No CUBLAS_STATUS_NOT_SUPPORTED errors
  - Compute capability 3.7 may be mentioned in debug logs
  - No fatal GPU-related errors

  The K80 uses legacy CUBLAS functions (cublasSgemmBatched)
  instead of modern Ex variants. This should work transparently.

  Accept warnings. Only fail on actual CUBLAS errors.

tests/testcases/runtime/TC-RUNTIME-001.yml (new file)
@@ -0,0 +1,31 @@
id: TC-RUNTIME-001
name: Container Startup
suite: runtime
priority: 1
timeout: 120000

dependencies:
  - TC-BUILD-002

steps:
  - name: Stop existing container
    command: cd docker && docker compose down 2>/dev/null || true

  - name: Start container with GPU
    command: cd docker && docker compose up -d

  - name: Wait for startup
    command: sleep 15

  - name: Check container status
    command: cd docker && docker compose ps

criteria: |
  The ollama37 container should start successfully with GPU access.

  Expected:
  - Container starts without errors
  - docker compose ps shows container in "Up" state
  - No "Exited" or "Restarting" status

  Accept startup warnings. Container should be running.

tests/testcases/runtime/TC-RUNTIME-002.yml (new file)
@@ -0,0 +1,29 @@
id: TC-RUNTIME-002
name: GPU Detection
suite: runtime
priority: 2
timeout: 60000

dependencies:
  - TC-RUNTIME-001

steps:
  - name: Check nvidia-smi inside container
    command: docker exec ollama37 nvidia-smi

  - name: Check CUDA libraries
    command: docker exec ollama37 ldconfig -p | grep -i cuda | head -5

  - name: Check Ollama GPU detection
    command: cd docker && docker compose logs 2>&1 | grep -i gpu | head -10

criteria: |
  Tesla K80 GPU should be detected inside the container.

  Expected:
  - nvidia-smi shows Tesla K80 GPU(s)
  - Driver version 470.x (or compatible)
  - CUDA libraries are available (libcuda, libcublas, etc.)
  - Ollama logs mention GPU detection

  The K80 has 12GB VRAM per GPU. Accept variations in reported memory.

tests/testcases/runtime/TC-RUNTIME-003.yml (new file)
@@ -0,0 +1,39 @@
id: TC-RUNTIME-003
name: Health Check
suite: runtime
priority: 3
timeout: 180000

dependencies:
  - TC-RUNTIME-001

steps:
  - name: Wait for health check
    command: |
      for i in {1..30}; do
        STATUS=$(docker inspect ollama37 --format='{{.State.Health.Status}}' 2>/dev/null || echo "starting")
        echo "Health status: $STATUS (attempt $i/30)"
        if [ "$STATUS" = "healthy" ]; then
          echo "Container is healthy"
          exit 0
        fi
        sleep 5
      done
      echo "Health check timeout"
      exit 1

  - name: Test API endpoint
    command: curl -s http://localhost:11434/api/tags

  - name: Check Ollama version
    command: docker exec ollama37 ollama --version

criteria: |
  Ollama server should be healthy and API responsive.

  Expected:
  - Container health status becomes "healthy"
  - /api/tags endpoint returns JSON response (even if empty models)
  - ollama --version shows version information

  Accept any valid JSON response from API. Version format may vary.
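
Taken together, the dependencies: fields above form a small DAG (TC-BUILD-001 → TC-BUILD-002 → {TC-BUILD-003, TC-RUNTIME-001} → …). How the actual runner schedules cases is not shown in this commit view; as an illustration only, a depth-first ordering over that graph, breaking ties by priority, could look like the sketch below (reusing the hypothetical TestCase shape sketched after the commit message).

// Order test cases so every case runs after its dependencies, breaking ties by
// priority. Illustrative only; the real runner's scheduling may differ.
function orderCases(cases: TestCase[]): TestCase[] {
  const byId = new Map(cases.map((c) => [c.id, c] as const));
  const ordered: TestCase[] = [];
  const visited = new Set<string>();

  const visit = (c: TestCase): void => {
    if (visited.has(c.id)) return;
    visited.add(c.id);
    for (const dep of c.dependencies) {
      const d = byId.get(dep);
      if (d) visit(d); // ids that were not loaded are simply skipped
    }
    ordered.push(c);
  };

  for (const c of [...cases].sort((a, b) => a.priority - b.priority)) visit(c);
  return ordered;
}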