Author | Commit | Message | Date |
Jesse Gross | a8e83a7654 | Disable causal attention based on batch index. Currently we are using positions, which are relative to a sequence and may not be unique. | 2025-03-11 14:49:20 -07:00 |
Jesse Gross | 2c40c4d35e | Fix follow up images and images split across batches | 2025-03-11 14:49:19 -07:00 |
Michael Yang | e95278932b | use non-causal mask only for image positions | 2025-03-11 14:49:19 -07:00 |
Michael Yang | 9d2a20a763 | use non-causal mask for inputs with images | 2025-03-11 14:49:19 -07:00 |
Michael Yang | 6b32a2d549 | compat with upstream gguf | 2025-03-11 14:49:19 -07:00 |
Michael Yang | f888912870 | fix vision encoder | 2025-03-11 14:49:19 -07:00 |
Patrick Devine | 9b54267e69 | fix configs | 2025-03-11 14:49:19 -07:00 |
Michael Yang | 46bb0169c4 | update model | 2025-03-11 14:49:19 -07:00 |
Patrick Devine | c62861f4fa | fix conversion | 2025-03-11 14:49:18 -07:00 |
Michael Yang | 0df1800436 | set non-causal attention | 2025-03-11 14:49:18 -07:00 |
Jesse Gross | 4346c2409d | fix drift from main | 2025-03-11 14:49:18 -07:00 |
Michael Yang | 4b037a97dc | add gemma vision encoder | 2025-03-11 14:49:17 -07:00 |
Patrick Devine | 5f74d1fd47 | gemma2 impl | 2025-03-11 14:35:08 -07:00 |