kvcache: Add check for values that fall out of sliding window cache

The sliding window cache trims entries that fall outside the window for
the latest token. This works when we are extending the cache, such as
when the conversation continues. However, if a new request only partially
overlaps the cached conversation (including the BOS tokens), we resume
from an earlier point in the conversation, and the tokens needed there
are no longer stored in memory. This change verifies that the new window
overlaps with the old one before reusing the cache.

Co-authored-by: Jesse Gross <jesse@ollama.com>
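
The overlap check described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the slidingWindow type, its fields, and the windowStart helper are hypothetical names, and the sketch assumes the cache keeps the positions from last-size through last after trimming.

```go
package main

import "fmt"

// slidingWindow is a hypothetical stand-in for one sequence's state in a
// sliding window cache. It is not the type used in the commit.
type slidingWindow struct {
	size int32 // how many positions before the latest token are kept (assumed)
	last int32 // position of the most recent cached token, -1 if empty
}

// windowStart returns the oldest position still needed when the token at
// pos is the latest one, clamped at 0.
func windowStart(pos, size int32) int32 {
	if pos-size < 0 {
		return 0
	}
	return pos - size
}

// canResume reports whether decoding can restart at pos using only the
// entries that survived trimming for the token at w.last.
func (w slidingWindow) canResume(pos int32) bool {
	if w.last < 0 {
		return false // nothing cached for this sequence
	}
	// Everything older than windowStart(w.last) has been trimmed, so the
	// window needed for pos must not begin before that point.
	return windowStart(pos, w.size) >= windowStart(w.last, w.size)
}

func main() {
	w := slidingWindow{size: 4, last: 10}
	fmt.Println(w.canResume(10)) // true: continuing from the latest token
	fmt.Println(w.canResume(3))  // false: positions 0..3 were already trimmed
}
```

Under these assumptions, a short sequence that has not yet exceeded the window (for example size 4, last 3) can resume from any earlier position, because nothing has been trimmed.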
jmorganca
2025-03-30 16:05:40 -07:00
committed by Jesse Gross
parent 493385eb3e
commit b42970063d
7 changed files with 131 additions and 2 deletions

@@ -87,6 +87,16 @@ func (c *WrapperCache) CopyPrefix(srcSeq, dstSeq int, len int32) {
	}
}

func (c *WrapperCache) CanResume(seq int, pos int32) bool {
	for _, cache := range c.caches {
		if !cache.CanResume(seq, pos) {
			return false
		}
	}

	return true
}

func (c *WrapperCache) Remove(seq int, beginIndex, endIndex int32) error {
	// If the one of these fails, the caller is supposed to retry with endIndex set to math.MaxInt32, which should not fail
	for _, cache := range c.caches {