Commit Graph

14 Commits

Author SHA1 Message Date
Bruce MacDonald
a70820daa0 models/gemma3: remove final logit softcap (#9692)
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
Jesse Gross
a8e83a7654 Disable causal attention based on batch index
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e Fix follow up images and images split across batches 2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b use non-causal mask only for image positions 2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763 use non-causal mask for inputs with images 2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549 compat with upstream gguf 2025-03-11 14:49:19 -07:00
Michael Yang
f888912870 fix vision encoder 2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69 fix configs 2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4 update model 2025-03-11 14:49:19 -07:00
Patrick Devine
c62861f4fa fix conversion 2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436 set non-causal attention 2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d fix drift from main 2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc add gemma vision encoder 2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47 gemma2 impl 2025-03-11 14:35:08 -07:00