Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-20 20:57:01 +00:00)
- Add .github/workflows/build-test.yml for automated testing
- Add tests/ directory with TypeScript test runner
- Add docs/CICD.md documentation
- Remove .gitlab-ci.yml (migrated to GitHub Actions)
- Update .gitignore for test artifacts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
29 lines · 856 B · YAML
id: TC-INFERENCE-002
name: Basic Inference
suite: inference
priority: 2
timeout: 180000

dependencies:
  - TC-INFERENCE-001

steps:
  - name: Run simple math question
    command: docker exec ollama37 ollama run gemma3:4b "What is 2+2? Answer with just the number." 2>&1
    timeout: 120000

  - name: Check GPU memory usage
    command: docker exec ollama37 nvidia-smi --query-compute-apps=pid,used_memory --format=csv 2>/dev/null || echo "No GPU processes"

criteria: |
  Basic inference should work on Tesla K80.

  Expected:
  - Model responds to the math question
  - Response should indicate "4" (accept variations: "4", "four", "The answer is 4", etc.)
  - GPU memory should be allocated during inference
  - No CUDA errors in output

  This is AI-generated output - accept reasonable variations.
  Focus on the model producing a coherent response.
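For context, the commit above adds a TypeScript test runner under tests/ that consumes case files like this one. Below is a minimal sketch of such a runner. The interface fields (id, name, suite, priority, timeout, dependencies, steps, criteria) mirror the YAML above; everything else (js-yaml as the parser, execSync for running steps, the file path) is an assumption for illustration, not the repository's actual implementation.

```ts
// Minimal sketch of a runner consuming a test case like the one above.
// Field names mirror the YAML; the runner logic itself is assumed.
import { readFileSync } from "node:fs";
import { execSync } from "node:child_process";
import { load } from "js-yaml";

interface Step {
  name: string;
  command: string;
  timeout?: number; // optional per-step override, in milliseconds
}

interface TestCase {
  id: string;
  name: string;
  suite: string;
  priority: number;
  timeout: number; // test-level default, in milliseconds
  dependencies?: string[];
  steps: Step[];
  criteria: string; // free-form prose, judged outside this sketch
}

function runTestCase(path: string): void {
  const tc = load(readFileSync(path, "utf8")) as TestCase;
  for (const step of tc.steps) {
    console.log(`[${tc.id}] ${step.name}`);
    // execSync throws if the command exits non-zero or the timeout
    // elapses, which fails the test case.
    const output = execSync(step.command, {
      timeout: step.timeout ?? tc.timeout,
      encoding: "utf8",
    });
    console.log(output);
  }
  console.log(`Evaluate against criteria:\n${tc.criteria}`);
}

runTestCase("tests/cases/tc-inference-002.yaml"); // hypothetical path
```

Keeping criteria as a free-form block scalar rather than structured assertions fits the file's own note: the model output is AI-generated, so the check only needs a coherent response that indicates "4", not an exact string match.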