Optimize GPU memory estimation for single-GPU preference on Tesla K80

Implemented multi-GPU memory optimization to reduce unnecessary model splits
across dual Tesla K80 GPUs by fixing graph memory overestimation.

Changes:
1. Per-GPU graph allocation strategy
   - Secondary GPUs: 190 MiB (empirically measured)
   - Primary GPU: Full 1.3 GiB graph allocation
   - Applied during layer distribution, not just final allocation

2. Reverse-order layer distribution
   - Prefer loading all layers on last GPU (GPU 1) first
   - Only use secondary GPUs when primary is full
   - Changed GPU selection from round-robin (i % j) to reverse order (start at j-1); see the sketches below
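
A minimal sketch of change 1, under stated assumptions: the names (graphAllocFor, the MiB constants) are hypothetical and the 1.3 GiB full graph is approximated as 1331 MiB; the real logic lives in llm/memory.go around lines 230-288.

```go
const (
	MiB               = uint64(1024 * 1024)
	secondaryGraphMiB = uint64(190)  // empirically measured graph cost on a K80 secondary (assumption: constant name)
	primaryGraphMiB   = uint64(1331) // ~1.3 GiB full compute-graph reservation (assumption: exact MiB value)
)

// graphAllocFor returns the graph reservation to budget for GPU g while
// distributing layers across j GPUs: only the primary (last) GPU reserves
// the full graph; every secondary reserves the small measured amount.
func graphAllocFor(g, j int) uint64 {
	if g == j-1 { // primary GPU carries the full compute graph
		return primaryGraphMiB * MiB
	}
	return secondaryGraphMiB * MiB
}
```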
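And a sketch of change 2's reverse-order distribution, again with hypothetical names rather than the actual memory.go code; free[g] is assumed to be the memory left on GPU g after subtracting graphAllocFor(g, j).

```go
// distributeLayers assigns each layer to a GPU, filling the last GPU first
// and spilling to earlier GPUs only when it runs out of room, instead of
// the old round-robin assignment (gpu = i % j).
func distributeLayers(layerSizes, free []uint64) []int {
	assignment := make([]int, len(layerSizes))
	g := len(free) - 1 // start at the primary GPU (index j-1)
	for i, size := range layerSizes {
		for g > 0 && free[g] < size {
			g-- // primary is full: fall back to the next secondary GPU
		}
		if free[g] >= size {
			free[g] -= size
			assignment[i] = g
		} else {
			assignment[i] = -1 // nothing fits; this layer stays on the CPU
		}
	}
	return assignment
}
```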

Results:
 gemma3:4b: Single GPU (no split, was already working)
 gemma3:12b: 1/48 layer split (improved from a 25/24 split)
   - GPU 0: 1 layer, 610 MiB (down from 4156 MiB)
   - GPU 1: 48 layers, 9857 MiB (primary)
   - Total actual: 10.5 GiB (fits in a single K80's 11.2 GiB)

Memory estimate reduced from 13.0 GiB → 11.9 GiB, enabling more models
to run on a single GPU with better performance (no cross-GPU transfer overhead).
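
The 1.1 GiB drop is consistent with the graph change alone: replacing one full ~1.3 GiB (assumed 1331 MiB) reservation on the secondary GPU with the measured 190 MiB saves 1141 MiB ≈ 1.1 GiB. A sketch of the resulting estimate, reusing the hypothetical graphAllocFor from the first sketch:

```go
// totalEstimate sums the layer weights plus the per-GPU graph reservations.
// With graphAllocFor, a dual-GPU estimate includes one full graph plus one
// 190 MiB secondary graph, rather than two full graphs as before.
func totalEstimate(layerSizes []uint64, j int) uint64 {
	var total uint64
	for _, s := range layerSizes {
		total += s
	}
	for g := 0; g < j; g++ {
		total += graphAllocFor(g, j)
	}
	return total
}
```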

Files modified:
- llm/memory.go: Core allocation logic (lines 230-288)
- llm/CLAUDE.md: Detailed implementation guide
- CLAUDE.md: Project status and results summary

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date:   2025-10-29 19:58:20 +08:00
Parent: 5077ab3fb4
Commit: 241a03402e

3 changed files with 35 additions and 16 deletions


llm/CLAUDE.md
@@ -1,6 +1,6 @@
 # LLM Package - Memory Estimation Optimization Guide
-**Status**: ⚠️ **IN PROGRESS** - Implementation pending
+**Status**: **COMPLETED** - Implemented and tested successfully
 This file contains instructions for optimizing GPU memory estimation to reduce unnecessary multi-GPU splits on Tesla K80 dual-GPU systems.