mirror of
https://github.com/dogkeeper886/ollama37.git
synced 2025-12-11 00:07:07 +00:00
Make maximum pending request configurable
This also bumps up the default to be 50 queued requests instead of 10.
This commit is contained in:
@@ -232,3 +232,9 @@ curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0
|
||||
Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter types mentioned above. Refer to section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
|
||||
|
||||
If you wish to override the `OLLAMA_KEEP_ALIVE` setting, use the `keep_alive` API parameter with the `/api/generate` or `/api/chat` API endpoints.
|
||||
|
||||
## How do I manage the maximum number of requests the server can queue
|
||||
|
||||
If too many requests are sent to the server, it will respond with a 503 error
|
||||
indicating the server is overloaded. You can adjust how many requests may be
|
||||
queue by setting `OLLAMA_MAX_QUEUE`
|
||||
Reference in New Issue
Block a user