Files
ollama37/llama/runner/cache.go
Jesse Gross c3ff916431 runner.go: Don't add inputs to cache view until actually processed
We need to track which tokens are in the cache ourselves. We currently
add tokens to the cache tracker when we add them to batch but they are
not actually in the cache until we call Decode. This can cause
confusion when we are shifting the cache.

Avoids "could not find a KV slot for the batch" issues.

Bug #7545
2024-11-20 12:49:24 -08:00

5.9 KiB