ollama37/ml/backend/ggml/ggml.go at 26a26998fb24f1aaa1f0a95980050086d6cf64f0

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-16 18:57:09 +00:00


Jesse Gross 4100ed7bdd ml: Add support for quantized KV cache

Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.

2025-03-07 18:43:39 -08:00
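
The commit message above ties quantized KV cache support to flash attention being enabled on the server. A minimal sketch of that dependency follows. It assumes the server is configured through the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables, that `"1"` means enabled, and that an unsupported combination falls back to `f16`. The helper `effectiveKVCacheType` is hypothetical and is not taken from `ggml.go`.

```go
package main

import (
	"fmt"
	"os"
)

// effectiveKVCacheType is a hypothetical helper (not part of ggml.go).
// It returns the requested quantized cache type only when flash attention
// is enabled; in every other case it returns the unquantized "f16" type.
// The fallback to "f16" is an assumption made for this sketch.
func effectiveKVCacheType(flashAttention bool, requested string) string {
	switch requested {
	case "q8_0", "q4_0":
		if !flashAttention {
			// Quantized KV cache requires flash attention.
			return "f16"
		}
		return requested
	default:
		// Empty or unrecognized values use the unquantized cache.
		return "f16"
	}
}

func main() {
	// Assumption: "1" is treated as enabled in this sketch.
	flashAttention := os.Getenv("OLLAMA_FLASH_ATTENTION") == "1"
	requested := os.Getenv("OLLAMA_KV_CACHE_TYPE")
	fmt.Println("KV cache type:", effectiveKVCacheType(flashAttention, requested))
}
```

Running this with `OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0` set selects the quantized type; leaving flash attention unset selects `f16` regardless of the requested type.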
