gpt-oss works best with a context length of at least 8k. However, on GPUs with a limited amount of VRAM, the larger context carries a significant performance hit. In those cases, we fall back to the Ollama default of 4k.
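
A minimal sketch of that selection logic, assuming hypothetical names (`pickContextLength`, `isGptOss`, `freeVRAM`, `vramThreshold`) rather than the actual functions in the codebase:

```go
package main

import "fmt"

const (
	defaultCtx = 4096 // Ollama default context length
	gptOssCtx  = 8192 // preferred context length for gpt-oss
)

// pickContextLength prefers an 8k context for gpt-oss, but falls back to
// the 4k default when the available VRAM is below a chosen threshold.
// Names and threshold handling here are illustrative assumptions only.
func pickContextLength(isGptOss bool, freeVRAM, vramThreshold uint64) int {
	if isGptOss && freeVRAM >= vramThreshold {
		return gptOssCtx
	}
	return defaultCtx
}

func main() {
	fmt.Println(pickContextLength(true, 24<<30, 16<<30)) // ample VRAM: 8192
	fmt.Println(pickContextLength(true, 8<<30, 16<<30))  // limited VRAM: 4096
}
```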