Resolves two critical issues preventing robust model switching:
1. Scheduler deadlock: Fixed faulty loop control flow that kept model
unloading from triggering after a conflict was detected. Added
multi-GPU conflict detection and unload sequencing.
2. Silent inference failures: Reverted critical cudaSetDevice() calls
from graceful error handling to CUDA_CHECK, so a failed device switch
is reported at once instead of letting a model appear to load
successfully and then fail silently during inference.
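
For reviewers, a minimal sketch of why the second fix matters. It is not
the project's code: sketch_set_device() is a hypothetical stand-in for
cudaSetDevice() so the example builds without the CUDA toolkit, and
SKETCH_CHECK is an assumed simplification of a CUDA_CHECK-style macro
that returns the error to the caller (the real macro may abort the
process instead).

```c
#include <stdio.h>

/* Hypothetical stand-ins so the sketch builds without the CUDA toolkit. */
typedef int sketch_err_t;                 /* plays the role of cudaError_t */
enum { SKETCH_OK = 0, SKETCH_INVALID_DEVICE = 1 };

/* Stand-in for cudaSetDevice(): pretend only devices 0 and 1 exist. */
static sketch_err_t sketch_set_device(int device) {
    return (device == 0 || device == 1) ? SKETCH_OK : SKETCH_INVALID_DEVICE;
}

/* Checked style: any failure leaves the enclosing function immediately.
 * Assumption: a real CUDA_CHECK-style macro may abort instead of returning. */
#define SKETCH_CHECK(call)                                              \
    do {                                                                \
        sketch_err_t err_ = (call);                                     \
        if (err_ != SKETCH_OK) {                                        \
            fprintf(stderr, "%s failed with code %d\n", #call, err_);   \
            return err_;                                                \
        }                                                               \
    } while (0)

/* Graceful style: the failure is logged, then the load reports success. */
static sketch_err_t load_model_graceful(int device) {
    if (sketch_set_device(device) != SKETCH_OK) {
        fprintf(stderr, "warning: could not select device %d\n", device);
    }
    return SKETCH_OK;   /* caller believes the model is ready */
}

/* Checked style: the load fails at the point the device switch fails. */
static sketch_err_t load_model_checked(int device) {
    SKETCH_CHECK(sketch_set_device(device));
    return SKETCH_OK;
}
```

With a nonexistent device such as 7, load_model_graceful(7) still
returns SKETCH_OK, which mirrors the silent failure described above,
while load_model_checked(7) returns SKETCH_INVALID_DEVICE at load time.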
Result: Robust model switching on the dual-GPU Tesla K80, with
self-healing recovery in place of system reboots.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>