ollama37/llm/ext_server/server.cpp at 74d45f010276c2f2653f3ca8c4f76cb0552fb46e

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-10 15:57:04 +00:00

Jeffrey Morgan 15c2d8fe14 server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)

For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.

2024-08-11 11:57:10 -07:00

122 KiB

Vendored
