Currently we assume that each image takes 768 tokens of context size for the purposes of clipping old messages that exceed the context window. However, our mllama implementation stores the full image embedding in a single token, so this assumption wastes a significant amount of context space. Ideally, we would handle this more generically and have the implementation report the number of tokens per image. At the moment, though, that would just result in a similar set of 'if' conditions in the runner plus APIs to report the count back, so for now we keep this simple.
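A minimal sketch of the estimate-and-clip logic described above, assuming a hypothetical `Message` type, an `isMllama` flag, and the constant names shown; this is an illustration of the idea, not the runner's actual API.

```go
package main

import "fmt"

const (
	defaultImageTokens = 768 // assumed per-image cost in context tokens
	mllamaImageTokens  = 1   // mllama packs the whole image embedding into one token
)

// Message is a stand-in for a chat message with pre-counted text tokens.
type Message struct {
	TextTokens int // tokens used by the message text
	NumImages  int // number of images attached to the message
}

// messageTokens estimates how many context tokens a message consumes,
// switching the per-image estimate based on the model family.
func messageTokens(m Message, isMllama bool) int {
	perImage := defaultImageTokens
	if isMllama {
		perImage = mllamaImageTokens
	}
	return m.TextTokens + m.NumImages*perImage
}

// clipToContext drops the oldest messages until the remainder fits within
// numCtx context tokens, keeping the most recent messages.
func clipToContext(msgs []Message, numCtx int, isMllama bool) []Message {
	total := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		total += messageTokens(msgs[i], isMllama)
		if total > numCtx {
			return msgs[i+1:]
		}
	}
	return msgs
}

func main() {
	msgs := []Message{
		{TextTokens: 200, NumImages: 1},
		{TextTokens: 100, NumImages: 0},
		{TextTokens: 50, NumImages: 1},
	}
	fmt.Println(len(clipToContext(msgs, 1024, false))) // 768-token images: oldest message clipped
	fmt.Println(len(clipToContext(msgs, 1024, true)))  // single-token images: everything fits
}
```

The example shows the trade-off the note describes: the same history fits comfortably when each image costs one token but forces clipping under the blanket 768-token assumption.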