mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-10 07:46:59 +00:00

Files

Jesse Gross 9679f40146 ml: Allow models to constrain inputs to a single batch

Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697

2025-03-14 15:38:54 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | llm: remove internal subprocess req and resp types (#9324) | 2025-03-14 15:21:53 -07:00 |
| ollamarunner | ml: Allow models to constrain inputs to a single batch | 2025-03-14 15:38:54 -07:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding