ollamarunner: Don't truncate a SameBatch

When truncating inputs to the context window at the beginning of
a sequence, we remove the minimum amount possible. However, this
may cause us to truncate to the middle of a set of inputs that
the model specified should not be split up. To avoid this, we
need to remove the rest of the partial batch.
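
A minimal sketch of the idea described above, not the code in this commit. It assumes a hypothetical Input type whose SameBatch field gives the number of following inputs that must stay in the same batch as this one; the name truncateStart and the omission of any kept-prefix handling are simplifications for illustration.

```go
package main

import "fmt"

// Input is a hypothetical stand-in for the runner's input type.
// SameBatch > 0 means this input and the next SameBatch inputs
// must not be split apart (an assumption made for this sketch).
type Input struct {
	Token     int
	SameBatch int
}

// truncateStart drops inputs from the front so that at most numCtx remain.
// If the minimal cut point falls inside an unbreakable batch, the cut is
// moved forward so the rest of that batch is discarded as well.
func truncateStart(inputs []Input, numCtx int) []Input {
	discard := len(inputs) - numCtx
	if discard <= 0 {
		return inputs
	}

	remaining := 0 // inputs still belonging to the current batch
	for i, inp := range inputs {
		if remaining > 0 {
			remaining--
			if discard == i {
				// The cut would land inside the batch; push it past this input.
				discard++
			}
		} else if discard == i {
			// The cut lands on a batch boundary; nothing more to adjust.
			break
		}
		if inp.SameBatch > 0 {
			remaining = inp.SameBatch
		}
	}
	return inputs[discard:]
}

func main() {
	// Inputs 1, 2 and 3 form one unbreakable batch.
	inputs := []Input{
		{Token: 0}, {Token: 1, SameBatch: 2}, {Token: 2},
		{Token: 3}, {Token: 4}, {Token: 5},
	}

	// A context of 4 would cut at index 2, in the middle of the batch,
	// so the rest of the batch is removed too.
	var kept []int
	for _, inp := range truncateStart(inputs, 4) {
		kept = append(kept, inp.Token)
	}
	fmt.Println(kept) // prints [4 5]
}
```

The trade-off is that the result can be shorter than the context window allows, in exchange for never handing the model a partial batch.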
Jesse Gross
2025-04-01 15:01:23 -07:00
committed by Jesse Gross
parent 9876c9faa4
commit 493385eb3e
2 changed files with 31 additions and 4 deletions

@@ -225,6 +225,8 @@ func countCommonPrefix(a []input.Input, b []input.Input) int32 {
	return count
}

// TODO(jessegross): If we need to reprocess the inputs we should ensure that
// we don't split up a SameBatch
func (c *InputCache) ShiftDiscard(inputLen int32, numKeep int32) int32 {
	targetFree := (c.numCtx - numKeep) / 2
	targetFree = max(targetFree, 1)