llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT

With the new very large parameter-count models, some users are willing to
wait much longer for a model to load, so the load stall timeout can now be
set through the OLLAMA_LOAD_TIMEOUT environment variable.
Daniel Hiltgen
2024-09-05 14:00:08 -07:00
committed by GitHub
parent b05c9e83d9
commit 6719097649
4 changed files with 60 additions and 7 deletions


@@ -1422,6 +1422,7 @@ func NewCLI() *cobra.Command {
 envVars["OLLAMA_FLASH_ATTENTION"],
 envVars["OLLAMA_LLM_LIBRARY"],
 envVars["OLLAMA_GPU_OVERHEAD"],
+envVars["OLLAMA_LOAD_TIMEOUT"],
 })
 default:
 appendEnvDocs(cmd, envs)