ml: Allow models to constrain inputs to a single batch

Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697
Author: Jesse Gross
Date: 2025-03-12 16:56:11 -07:00
Committed-by: Jesse Gross
Parent: 3892c3a703
Commit: 9679f40146
5 changed files with 64 additions and 66 deletions

@@ -15,6 +15,12 @@ type Input struct {
 	// stored in Multimodal, used for caching and comparing
 	// equality.
 	MultimodalHash uint64
+
+	// SameBatch forces the following number of tokens to be processed
+	// in a single batch, breaking and extending batches as needed.
+	// Useful for things like images that must be processed in one
+	// shot.
+	SameBatch int
 }
 
 // MultimodalIndex is a multimodal element (such as an image)
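To illustrate how a consumer of `SameBatch` might behave, here is a minimal, self-contained sketch (not the actual batching code from this commit) of a splitter that extends a batch past its nominal size rather than cutting through a `SameBatch` run. The `splitBatches` helper and its `Token` field are hypothetical; only the `SameBatch` semantics come from the diff above.

```go
package main

import "fmt"

// Input mirrors the relevant fields of the struct in the diff.
type Input struct {
	Token     int32
	SameBatch int // tokens after this one that must share its batch
}

// splitBatches is a hypothetical helper showing one way SameBatch can
// influence batch boundaries: if a SameBatch run would cross the end of
// a batch, the batch is extended so the run stays together. (The real
// scheduler may also break a batch early; this sketch only extends.)
func splitBatches(inputs []Input, batchSize int) [][]Input {
	var batches [][]Input
	for i := 0; i < len(inputs); {
		end := i + batchSize
		if end > len(inputs) {
			end = len(inputs)
		}
		// Extend the batch whenever a SameBatch run crosses its boundary.
		for j := i; j < end; j++ {
			if runEnd := j + 1 + inputs[j].SameBatch; runEnd > end {
				end = runEnd
			}
		}
		if end > len(inputs) {
			end = len(inputs)
		}
		batches = append(batches, inputs[i:end])
		i = end
	}
	return batches
}

func main() {
	// Ten tokens; token 2 begins an image whose next 5 tokens must be
	// processed in the same batch as token 2.
	inputs := make([]Input, 10)
	inputs[2].SameBatch = 5
	for _, b := range splitBatches(inputs, 4) {
		fmt.Println(len(b))
	}
}
```

With a nominal batch size of 4, the first batch grows to 8 tokens so the image is never split mid-run, and the remaining 2 tokens form the second batch.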