ollama37/llm/memory.go at d75557747357bfb3afd441a0cc207ec944bd3a18

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-13 01:07:12 +00:00


Jesse Gross d755577473 llm: Estimate projector memory correctly for Ollama engine

The Llama engine always places the vision projector on the first GPU
if one exists. The Ollama engine, however, groups the projector with
the output layer, so the projector is only offloaded if all other
layers are offloaded. Until now the memory estimation code assumed the
former layout for both engines; this change makes it use the layout
that matches the engine.

This addresses two impacts of the current behavior:
 - In multi-GPU setups, we can crash with OOM errors when we try to
   allocate memory on a full GPU while another still has space.
 - If the vision projector is large, it may prevent us from offloading
   anything when we could have fit some of the text layers.
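
The difference between the two layouts can be illustrated with a minimal sketch. This is not the code in memory.go: the names `Engine`, `Placement`, `estimate`, and `place` are hypothetical, sizes are abstract units, and the first-fit placement is a simplifying assumption. It only models the rule stated above: the Llama engine reserves the projector on the first GPU up front, while the Ollama engine adds the projector to the output layer, which is placed last.

```go
package main

import "fmt"

// Engine selects which layout rule to model. Hypothetical type for illustration.
type Engine int

const (
	LlamaEngine Engine = iota
	OllamaEngine
)

// Placement summarizes the estimated result of a layout.
type Placement struct {
	TextLayers int  // number of repeating text layers placed on a GPU
	Projector  bool // whether the vision projector was placed on a GPU
}

// place puts size units on the first GPU with enough free space (first fit).
func place(free []uint64, size uint64) bool {
	for i := range free {
		if free[i] >= size {
			free[i] -= size
			return true
		}
	}
	return false
}

// estimate is a simplified model, not the real estimator.
// Assumption: if the Llama-engine projector does not fit on the first GPU,
// nothing is offloaded at all.
func estimate(engine Engine, gpuFree []uint64, layers int, layerSize, outputSize, projectorSize uint64) Placement {
	free := append([]uint64(nil), gpuFree...) // do not mutate the caller's slice
	var p Placement
	if len(free) == 0 {
		return p
	}

	// Llama engine: the projector is reserved on the first GPU before any layer.
	if engine == LlamaEngine {
		if free[0] < projectorSize {
			return p
		}
		free[0] -= projectorSize
		p.Projector = true
	}

	// Text layers are placed one by one; stop at the first that does not fit.
	for i := 0; i < layers; i++ {
		if !place(free, layerSize) {
			return p
		}
		p.TextLayers++
	}

	// Ollama engine: the projector travels with the output layer, so it is
	// only considered once every text layer has been offloaded.
	tail := outputSize
	if engine == OllamaEngine {
		tail += projectorSize
	}
	if place(free, tail) && engine == OllamaEngine {
		p.Projector = true
	}
	return p
}

func main() {
	// One GPU with 10 free units, 4 text layers of 2 units each,
	// an output layer of 1 unit, and a large projector of 9 units.
	fmt.Printf("%+v\n", estimate(LlamaEngine, []uint64{10}, 4, 2, 1, 9))
	fmt.Printf("%+v\n", estimate(OllamaEngine, []uint64{10}, 4, 2, 1, 9))
}
```

With these made-up numbers, the Llama-style layout spends the GPU on the projector and fits no text layers, while the Ollama-style layout fits all four text layers and leaves the projector and output layer on the CPU. An estimator that assumes the wrong layout would therefore mispredict what the engine actually allocates.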

2025-05-19 09:52:48 -07:00

12 KiB
