ollama37/server
Jesse Gross f2e9c9aff5 server: Reduce gpt-oss context length for small VRAM GPUs
gpt-oss works best with a context length of at least 8k. However,
for GPUs with a limited amount of VRAM, there is a significant
performance hit from this increased context. In these cases, we
switch to the Ollama default of 4k.
2025-08-07 14:23:55 -07:00
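The selection logic the commit describes can be sketched roughly as follows. This is a hypothetical helper, not Ollama's actual code; the function name, the VRAM threshold parameter, and the byte-based check are all assumptions for illustration:

```go
package main

import "fmt"

const (
	defaultContext = 4096 // Ollama's default context length
	gptOSSContext  = 8192 // context length gpt-oss works best with
)

// pickContextLength is a hypothetical sketch of the decision in the
// commit message: prefer 8k for gpt-oss, but fall back to the 4k
// default when the GPU's free VRAM is below some threshold, since the
// larger context would cause a significant performance hit there.
func pickContextLength(freeVRAMBytes, lowVRAMThreshold uint64) int {
	if freeVRAMBytes < lowVRAMThreshold {
		return defaultContext
	}
	return gptOSSContext
}

func main() {
	// Example with an assumed 8 GiB threshold: a 6 GiB GPU falls back
	// to the default, while a 16 GiB GPU keeps the larger context.
	fmt.Println(pickContextLength(6<<30, 8<<30))  // 4096
	fmt.Println(pickContextLength(16<<30, 8<<30)) // 8192
}
```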