mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-12 00:37:04 +00:00

Files

Jesse Gross 8a35bb926e runner.go: Increase survivability of main processing loop

Currently, if an error occurs during the prep stages (such as
tokenizing) of a single request, it will only affect that request.
However, if an error happens during decoding, it can take down the
entire runner.

Instead, it's better to drop the tokens that triggered the error and try to
keep going. However, we also need to stop when we run out of tokens,
otherwise, this just causes an infinite loop. This is likely the cause
of at least some of the hanging issues that have been reported.

Bug #7573

2024-11-14 17:18:41 -08:00

cache_test.go

runner.go: Better abstract vision model integration

2024-10-30 14:53:43 -07:00

cache.go

runner.go: Make KV entry accounting more robust

2024-11-11 20:23:03 -08:00

image_test.go

runner.go: Better abstract vision model integration

2024-10-30 14:53:43 -07:00

image.go

runner.go: Check for zero length images

2024-11-08 09:39:32 -08:00

README.md

Re-introduce the llama package (#5034)

2024-10-08 08:53:54 -07:00

requirements.go

Re-introduce the llama package (#5034)

2024-10-08 08:53:54 -07:00

runner.go

runner.go: Increase survivability of main processing loop

2024-11-14 17:18:41 -08:00

stop_test.go

runner.go: Handle truncation of tokens for stop sequences

2024-10-09 20:39:04 -07:00

stop.go

runner.go: Handle truncation of tokens for stop sequences

2024-10-09 20:39:04 -07:00

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings