Jesse Gross
9679f40146
ml: Allow models to constrain inputs to a single batch
Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.
Fixes #9697
2025-03-14 15:38:54 -07:00
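The constraint in 9679f40146 can be pictured as a batching loop that closes a batch early rather than cutting a group of inputs in half. The Go sketch below is illustrative only. `Input`, its `SameBatch` field, and `splitBatches` are hypothetical names, and the rule that an oversized group is kept whole (so its batch exceeds the limit) is an assumption, not a description of the actual code.

```go
package main

import "fmt"

// Input is a hypothetical stand-in for one model input. A non-zero SameBatch
// means this input and the SameBatch inputs after it must be processed in the
// same batch, e.g. all patches of one image.
type Input struct {
	Token     int
	SameBatch int
}

// splitBatches packs inputs into batches of at most batchSize, closing a batch
// early instead of splitting a constrained group. A group longer than
// batchSize is kept whole, so its batch may exceed batchSize (an assumption).
func splitBatches(inputs []Input, batchSize int) [][]Input {
	var batches [][]Input
	var cur []Input
	for i := 0; i < len(inputs); {
		// n is the number of inputs starting at i that must stay together.
		n := 1 + inputs[i].SameBatch
		if i+n > len(inputs) {
			n = len(inputs) - i
		}
		// Close the current batch if the whole group would not fit in it.
		if len(cur) > 0 && len(cur)+n > batchSize {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, inputs[i:i+n]...)
		i += n
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// Two text tokens, a four-patch image (the first patch carries
	// SameBatch: 3), then one more text token.
	inputs := []Input{
		{Token: 1}, {Token: 2},
		{Token: 100, SameBatch: 3}, {Token: 101}, {Token: 102}, {Token: 103},
		{Token: 3},
	}
	for i, b := range splitBatches(inputs, 4) {
		fmt.Printf("batch %d: %d inputs\n", i, len(b))
	}
}
```

With a batch size of 4 this yields batches of 2, 4, and 1 inputs: the image moves whole into the second batch instead of being split after its first two patches.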
Michael Yang
5e2e0b46b1
fix: error if image requested without vision model
2025-03-13 10:52:09 -07:00
Bruce MacDonald
a70820daa0
models/gemma3: remove final logit softcap (#9692)
Softcap isn't in the whitepaper/implementation for the language model, so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
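For reference, a final logit softcap is commonly computed as `cap * tanh(logit / cap)`. The Go sketch below shows that common form of the operation a70820daa0 drops. The function name `softcap` and the cap of 30 are illustrative assumptions, not values taken from the commit.

```go
package main

import (
	"fmt"
	"math"
)

// softcap applies the common softcapping form limit * tanh(x / limit),
// which smoothly squashes every logit into (-limit, limit).
func softcap(logits []float32, limit float32) []float32 {
	out := make([]float32, len(logits))
	for i, x := range logits {
		out[i] = limit * float32(math.Tanh(float64(x/limit)))
	}
	return out
}

func main() {
	// 30 is an illustrative cap, not a value taken from the commit.
	fmt.Println(softcap([]float32{1, 10, 100}, 30))
}
```

Because tanh is strictly increasing, the cap never changes which token has the highest logit. It only compresses large logits, which alters the sampling distribution but not greedy decoding.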
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Michael Yang
8934324b72
use fast attention
2025-03-11 14:49:18 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d
fix drift from main
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00