ollama37/llm
Shang Chieh Tseng 6d87524e22 Fix gemma3:12b to load on single Tesla K80 GPU
Problem: gemma3:12b (10.2 GiB actual) was splitting across 2 GPUs
despite fitting on a single Tesla K80 (11.2 GiB available).

Root Cause: Graph memory estimates for CC 3.7 were 15-20% too high
(estimated 1.3 GiB, actual 1.1 GiB), causing the single-GPU fit check
to fail by a ~200 MiB margin.
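
In rough numbers (totals taken from the Results below):

    estimated: 11.9 GiB  >  11.2 GiB available  -> forced 2-GPU split
    corrected: 11.0 GiB  <= 11.2 GiB available  -> fits on one GPU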

Solution: Apply empirical 85% correction factor to graph estimates
for Tesla K80 (CC 3.7) based on measured actual usage.
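
The shape of the fix, as a minimal Go sketch (the constant, function,
and compute-capability parameters are illustrative, not the actual
ollama code):

    package llm

    // Empirical correction for Tesla K80 (CC 3.7): measured graph
    // memory runs ~15-20% below the generic estimate (~1.1 GiB actual
    // vs ~1.3 GiB estimated), so scale the estimate down by 15%.
    const k80GraphCorrection = 0.85

    // correctedGraphSize applies the correction on CC 3.7 GPUs and
    // leaves the estimate unchanged for everything else.
    func correctedGraphSize(estimate uint64, ccMajor, ccMinor int) uint64 {
        if ccMajor == 3 && ccMinor == 7 {
            return uint64(float64(estimate) * k80GraphCorrection)
        }
        return estimate
    }

Gating on compute capability keeps the generic estimates intact for
newer GPUs.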

Results:
- Memory estimate: 11.9 GiB → 11.0 GiB (-900 MiB)
- GPU split: 1,48 layers (1 on GPU 0, 48 on GPU 1) → single GPU (no split)
- GPU 0: 10,015 MiB (was 617 MiB)
- GPU 1: 7 MiB (was 9,866 MiB)
- Inference: 94% GPU utilization, no cross-GPU overhead

Testing: gemma3:12b loads on a single GPU with correct inference

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-30 00:15:59 +08:00