mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-18 11:47:07 +00:00

Files

Jesse Gross c3ff916431 runner.go: Don't add inputs to cache view until actually processed

We need to track which tokens are in the cache ourselves. We currently
add tokens to the cache tracker when we add them to batch but they are
not actually in the cache until we call Decode. This can cause
confusion when we are shifting the cache.

Avoids "could not find a KV slot for the batch" issues.

Bug #7545

2024-11-20 12:49:24 -08:00

cache_test.go

runner.go: Better abstract vision model integration

2024-10-30 14:53:43 -07:00

cache.go

runner.go: Don't add inputs to cache view until actually processed

2024-11-20 12:49:24 -08:00

image_test.go

runner.go: Better abstract vision model integration

2024-10-30 14:53:43 -07:00

image.go

runner.go: Check for zero length images

2024-11-08 09:39:32 -08:00

README.md

Re-introduce the llama package (#5034)

2024-10-08 08:53:54 -07:00

requirements.go

Re-introduce the llama package (#5034)

2024-10-08 08:53:54 -07:00

runner.go

runner.go: Don't add inputs to cache view until actually processed

2024-11-20 12:49:24 -08:00

stop_test.go

runner.go: Handle truncation of tokens for stop sequences

2024-10-09 20:39:04 -07:00

stop.go

runner.go: Handle truncation of tokens for stop sequences

2024-10-09 20:39:04 -07:00

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings
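
The same requests can be built from Go. The sketch below mirrors the curl commands above: it assumes only what those commands show (a POST with a JSON body containing a `prompt` field, sent to `/completion` or `/embeddings` on `localhost:8080`). The `promptRequest` type and `newPromptRequest` helper are hypothetical names for illustration, not part of the runner.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// promptRequest mirrors the JSON body used in the curl examples.
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// newPromptRequest builds a POST request for the given endpoint
// ("/completion" or "/embeddings") without sending it.
func newPromptRequest(baseURL, endpoint, prompt string) (*http.Request, error) {
	body, err := json.Marshal(promptRequest{Prompt: prompt})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newPromptRequest("http://localhost:8080", "/completion", "hi")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do(req)` sends it to a running runner. The response format is not described here, so the sketch stops short of decoding it.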