ollama37/ml/backend
Shang Chieh Tseng 08f38b19ea Fix Tesla K80 multi-GPU model switching deadlocks and silent failures
Resolves two critical issues preventing robust model switching:

1. Scheduler deadlock: Fixed improper loop control flow that prevented
   model unloading from triggering after conflict detection. Added proper
   multi-GPU conflict detection and unload sequencing (see the first
   sketch after this list).

2. Silent inference failures: Changed critical cudaSetDevice() calls from
   graceful error handling back to CUDA_CHECK, so models can no longer
   appear to load successfully while failing silently during inference
   (see the second sketch after this list).

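A hedged sketch of the fail-fast change in item 2. CUDA_CHECK is written here
as a generic abort-on-error macro; the exact macro definition and call sites
in the CUDA backend may differ:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Generic abort-on-error macro in the spirit of the backend's CUDA_CHECK.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,         \
                    __LINE__, cudaGetErrorString(err_));                   \
            abort();                                                       \
        }                                                                  \
    } while (0)

void select_device(int device) {
    // Before: a failed cudaSetDevice() was logged and swallowed, so the model
    // appeared to load but later kernels targeted the wrong device and failed
    // silently during inference.
    // After: abort immediately so the failure is visible and the scheduler's
    // unload-and-retry recovery can run instead of serving broken inference.
    CUDA_CHECK(cudaSetDevice(device));
}
```
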
Result: robust Tesla K80 dual-GPU model switching with self-healing
recovery, instead of requiring a system reboot.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 01:30:10 +08:00