Refactor model unload: each test cleans up its own model

- TC-INFERENCE-003: Add unload step for gemma3:4b at end
- TC-INFERENCE-004: Remove redundant 4b unload at start
- TC-INFERENCE-005: Remove redundant 12b unload at start

Each model size test now handles its own VRAM cleanup.
Workflow-level unload remains as safety fallback for failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 21:26:59 +00:00 · 2025-12-17 17:20:44 +08:00
parent 806232d95f
commit 82ab6cc96e
3 changed files with 7 additions and 14 deletions
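
The per-test cleanup described in the commit message could be factored into a small shell helper, roughly as sketched below. This is an illustration rather than code from the commit: the function names (`unload_payload`, `unload_model`, `model_loaded`, `unload_and_wait`), the `OLLAMA_URL` variable, and the five-try limit are all assumptions. It relies on Ollama's `/api/generate` endpoint accepting `keep_alive: 0` to evict a model, and on `/api/ps` listing the models currently loaded.

```shell
#!/bin/sh
# Hypothetical helpers: names and OLLAMA_URL are assumptions, not part of the commit.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body that asks Ollama to evict a model immediately.
unload_payload() {
  printf '{"model":"%s","keep_alive":0}' "$1"
}

# Send the unload request; a failed request never fails the calling test step.
unload_model() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(unload_payload "$1")" >/dev/null || true
}

# Succeed (exit 0) while the model still appears in the list of loaded models.
model_loaded() {
  curl -s "$OLLAMA_URL/api/ps" | grep -q "\"$1\""
}

# Unload, then poll until the model is gone or the retry budget (5 tries) runs out.
unload_and_wait() {
  unload_model "$1"
  tries=0
  while model_loaded "$1" && [ "$tries" -lt 5 ]; do
    sleep 1
    tries=$((tries + 1))
  done
}
```

A test step would then end with `unload_and_wait gemma3:4b`. Polling `/api/ps` is one possible alternative to a fixed `sleep 2`; whether it suits this test suite depends on how the workflow-level fallback is wired up.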
--- a/tests/testcases/inference/TC-INFERENCE-003.yml
+++ b/tests/testcases/inference/TC-INFERENCE-003.yml
@@ -79,6 +79,13 @@ steps:
      echo "Recent API requests:"
      echo "$LOGS" | grep '\[GIN\]' | tail -5

+  - name: Unload model after 4b tests complete
+    command: |
+      echo "Unloading gemma3:4b from VRAM..."
+      curl -s http://localhost:11434/api/generate -d '{"model":"gemma3:4b","keep_alive":0}' || true
+      sleep 2
+      echo "Model unloaded"
+
 criteria: |
  Ollama REST API should handle inference requests.