ollama37/server/sched.go at 34b9db5afc43b352c5ef04fe6ef52684bfdd57b5

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-10 15:57:04 +00:00


Daniel Hiltgen 34b9db5afc Request and model concurrency

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
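
As a rough illustration of the settings described above, the sketch below reads the two environment variables and falls back to 1 for each, matching the defaults stated in the commit message. It is not the code from sched.go: the helper name parsePositiveInt is hypothetical, and treating empty, non-numeric, or non-positive values as "use the default" is an assumption, since the commit message does not say how such values are handled.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parsePositiveInt is a hypothetical helper (not taken from sched.go).
// It returns fallback when raw is empty, not an integer, or not positive.
// That fallback behavior is an assumption made for this sketch.
func parsePositiveInt(raw string, fallback int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	// Defaults of 1 follow the commit message: 1 concurrent request per
	// model and 1 loaded model at a time.
	numParallel := parsePositiveInt(os.Getenv("OLLAMA_NUM_PARALLEL"), 1)
	maxLoaded := parsePositiveInt(os.Getenv("OLLAMA_MAX_LOADED_MODELS"), 1)

	fmt.Printf("concurrent requests per model: %d\n", numParallel)
	fmt.Printf("max loaded models: %d\n", maxLoaded)
}
```

Running the sketch with, for example, OLLAMA_NUM_PARALLEL=4 and OLLAMA_MAX_LOADED_MODELS=2 set in the environment would report 4 and 2; with neither variable set it reports the defaults.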

2024-04-22 19:29:12 -07:00

18 KiB
