Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-20 04:37:00 +00:00)
Refactor model unload: each test cleans up its own model
- TC-INFERENCE-003: Add unload step for gemma3:4b at end
- TC-INFERENCE-004: Remove redundant 4b unload at start
- TC-INFERENCE-005: Remove redundant 12b unload at start

Each model size test now handles its own VRAM cleanup. Workflow-level unload remains as safety fallback for failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
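For context on the unload pattern used in the hunks below: Ollama drops a model from VRAM when a generate request is sent with "keep_alive" set to 0 and no prompt. The following is a minimal standalone sketch of that call, assuming the default localhost:11434 endpoint; the final /api/ps check is only an optional verification and is not part of this commit.

#!/bin/sh
# Ask the server to evict gemma3:4b from VRAM immediately.
# keep_alive=0 with no prompt loads nothing new and unloads the model.
curl -s http://localhost:11434/api/generate \
  -d '{"model":"gemma3:4b","keep_alive":0}' || true

# Give the runner a moment to release GPU memory.
sleep 2

# Optional: list models still resident in memory.
curl -s http://localhost:11434/api/ps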
@@ -79,6 +79,13 @@ steps:
       echo "Recent API requests:"
       echo "$LOGS" | grep '\[GIN\]' | tail -5
 
+  - name: Unload model after 4b tests complete
+    command: |
+      echo "Unloading gemma3:4b from VRAM..."
+      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:4b","keep_alive":0}' || true
+      sleep 2
+      echo "Model unloaded"
+
 criteria: |
   Ollama REST API should handle inference requests.
 
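If the new cleanup step should also prove that GPU memory actually came back before the next test starts, a follow-up check along these lines could be added. This is only a sketch under the assumption that nvidia-smi is available where the step runs; it is not part of the diff above.

# Hypothetical verification: report per-GPU memory usage after the unload.
nvidia-smi --query-gpu=index,memory.used,memory.free --format=csv,noheader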
@@ -8,13 +8,6 @@ dependencies:
   - TC-INFERENCE-003
 
 steps:
-  - name: Unload previous model from VRAM
-    command: |
-      echo "Unloading any loaded models..."
-      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:4b","keep_alive":0}' || true
-      sleep 2
-      echo "Previous model unloaded"
-
   - name: Check if gemma3:12b model exists
     command: docker exec ollama37 ollama list | grep -q "gemma3:12b" && echo "Model exists" || echo "Model not found"
 
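The existence check kept above only reports "Model exists" or "Model not found" and never fails the step. A variant that also pulls the model when it is missing could look like the following; ollama pull is a standard CLI command, but wiring it into this workflow is purely illustrative and not part of this commit.

# Hypothetical: make sure gemma3:12b is present before the 12b tests run.
if ! docker exec ollama37 ollama list | grep -q "gemma3:12b"; then
  echo "Model not found, pulling..."
  docker exec ollama37 ollama pull gemma3:12b
fi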
@@ -8,13 +8,6 @@ dependencies:
   - TC-INFERENCE-004
 
 steps:
-  - name: Unload previous model from VRAM
-    command: |
-      echo "Unloading any loaded models..."
-      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:12b","keep_alive":0}' || true
-      sleep 2
-      echo "Previous model unloaded"
-
   - name: Verify dual GPU availability
     command: |
       echo "=== GPU Configuration ==="
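The commit message notes that a workflow-level unload remains as a safety fallback. That fallback is not shown in this diff; a minimal sketch of what such a step might run, evicting both model sizes regardless of which test failed, could be:

# Hypothetical fallback cleanup: evict both gemma3 sizes, ignoring errors.
for model in gemma3:4b gemma3:12b; do
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\":\"$model\",\"keep_alive\":0}" || true
done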