Refactor model unload: each test cleans up its own model

- TC-INFERENCE-003: Add unload step for gemma3:4b at end
- TC-INFERENCE-004: Remove redundant 4b unload at start
- TC-INFERENCE-005: Remove redundant 12b unload at start

Each model-size test now handles its own VRAM cleanup.
The workflow-level unload remains as a safety fallback for failures.
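
The cleanup steps rely on Ollama's behavior that an `/api/generate` request with `keep_alive` set to `0` evicts the model from VRAM. A minimal sketch of the pattern these tests use, factored into a reusable helper (the `unload_payload`/`unload_model` names and the `/api/ps` verification hint are illustrative, not part of this change):

```shell
#!/bin/sh
# Build the unload request body for a given model tag.
unload_payload() {
  printf '{"model":"%s","keep_alive":0}' "$1"
}

# Ask Ollama to evict the model from VRAM; tolerate failures so a
# cleanup step never fails the test run on its own.
unload_model() {
  curl -s http://localhost:11434/api/generate \
    -d "$(unload_payload "$1")" >/dev/null || true
  sleep 2
}

# To confirm the eviction, GET /api/ps lists models currently loaded
# in memory; the tag should no longer appear after the unload:
#   curl -s http://localhost:11434/api/ps

unload_model "gemma3:4b"
```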

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date: 2025-12-17 17:20:44 +08:00
parent 806232d95f
commit 82ab6cc96e
3 changed files with 7 additions and 14 deletions


@@ -79,6 +79,13 @@ steps:
         echo "Recent API requests:"
         echo "$LOGS" | grep '\[GIN\]' | tail -5
+  - name: Unload model after 4b tests complete
+    command: |
+      echo "Unloading gemma3:4b from VRAM..."
+      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:4b","keep_alive":0}' || true
+      sleep 2
+      echo "Model unloaded"
 criteria: |
   Ollama REST API should handle inference requests.


@@ -8,13 +8,6 @@ dependencies:
   - TC-INFERENCE-003
 steps:
-  - name: Unload previous model from VRAM
-    command: |
-      echo "Unloading any loaded models..."
-      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:4b","keep_alive":0}' || true
-      sleep 2
-      echo "Previous model unloaded"
   - name: Check if gemma3:12b model exists
     command: docker exec ollama37 ollama list | grep -q "gemma3:12b" && echo "Model exists" || echo "Model not found"


@@ -8,13 +8,6 @@ dependencies:
   - TC-INFERENCE-004
 steps:
-  - name: Unload previous model from VRAM
-    command: |
-      echo "Unloading any loaded models..."
-      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:12b","keep_alive":0}' || true
-      sleep 2
-      echo "Previous model unloaded"
   - name: Verify dual GPU availability
     command: |
       echo "=== GPU Configuration ==="