next ollama runner (#7913)

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-12 08:47:01 +00:00

feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It adds three high-level interfaces, plus several smaller helper interfaces that support them:

- `model.Model` defines the interface for a model architecture. Architectures such as `llama` and `mllama`, included as examples, implement their forward pass in the `Forward` method, which is called to generate completions. This interface is defined in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model onto hardware (GPU, CPU, etc.) and giving Models access to the loaded tensors. This interface is defined in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and its operations.
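To make the division of responsibilities concrete, here is a heavily simplified, hypothetical Go sketch of how the three interfaces relate. The method names and signatures (`Shape`, `Floats`, `Add`, `Get`, `Forward(b Backend, input Tensor)`) are assumptions for illustration only, not the real definitions in `model/model.go` or `ml/backend.go`, and the toy in-memory tensor stands in for ggml.

```go
package main

import "fmt"

// Tensor is a hypothetical, heavily simplified stand-in for ml.Tensor:
// a flat float32 buffer plus a shape, with one example operation.
type Tensor interface {
	Shape() []int
	Floats() []float32
	Add(other Tensor) (Tensor, error)
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// weights and hands them to models by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical stand-in for model.Model: Forward runs the
// architecture's forward pass using tensors provided by a Backend.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// cpuTensor is a toy in-memory Tensor implementation.
type cpuTensor struct {
	shape []int
	data  []float32
}

func (t cpuTensor) Shape() []int      { return t.shape }
func (t cpuTensor) Floats() []float32 { return t.data }

// Add performs element-wise addition; for brevity it only checks that
// both tensors hold the same number of elements.
func (t cpuTensor) Add(other Tensor) (Tensor, error) {
	o := other.Floats()
	if len(o) != len(t.data) {
		return nil, fmt.Errorf("shape mismatch: %v vs %v", t.shape, other.Shape())
	}
	out := make([]float32, len(t.data))
	for i := range t.data {
		out[i] = t.data[i] + o[i]
	}
	return cpuTensor{shape: t.shape, data: out}, nil
}

// mapBackend serves pre-loaded tensors from a map, standing in for weights
// that a real backend would load onto a GPU or CPU.
type mapBackend map[string]Tensor

// Get returns the named tensor, or nil if it was never loaded.
func (b mapBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy "architecture" whose forward pass adds a bias weight.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, fmt.Errorf("missing tensor %q", "bias")
	}
	return input.Add(bias)
}
```

The toy model depends only on the `Backend` and `Tensor` interfaces, so the same `Forward` code could run against any backend that implements them; the real interfaces carry many more methods than this sketch.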

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- support for more models (#9080), with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

Author: Michael Yang, 2025-02-14 00:31:21 +00:00 (committed by GitHub)
Parent: 8cf16063a5
Commit: 58245413f4

57 changed files with 475427 additions and 494 deletions

sample/greedy.go (new file, 13 lines)

```go
package sample

import "gonum.org/v1/gonum/floats"

type greedy struct{}

func Greedy() Sampler {
	return greedy{}
}

func (s greedy) Sample(t []float64) ([]float64, error) {
	return []float64{float64(floats.MaxIdx(t))}, nil
}
```
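The greedy sampler returns the argmax index encoded as a single-element `[]float64`. Because gonum's `floats.MaxIdx` panics on a zero-length slice, a caller may want an explicit check. The sketch below is a dependency-free equivalent with that check added. The `Sampler` interface is not part of this diff, so its method set here is an assumption inferred from `greedy`'s `Sample` method, and `argmaxSampler` is a hypothetical name.

```go
package main

import "errors"

// Sampler is assumed to be the interface greedy satisfies; its definition
// is not in this diff, so the method set is inferred from greedy.Sample.
type Sampler interface {
	Sample(t []float64) ([]float64, error)
}

// argmaxSampler mirrors greedy without the gonum dependency and returns an
// error on empty input instead of panicking like floats.MaxIdx.
type argmaxSampler struct{}

func (argmaxSampler) Sample(t []float64) ([]float64, error) {
	if len(t) == 0 {
		return nil, errors.New("sample: empty input")
	}
	best := 0
	for i, v := range t {
		// Strict > keeps the first index on ties, as floats.MaxIdx does
		// for NaN-free input.
		if v > t[best] {
			best = i
		}
	}
	return []float64{float64(best)}, nil
}
```

For example, `Sample([]float64{0.1, 2.5, -1.0, 2.5})` returns `[1]`: the two 2.5 entries tie, and the first one wins.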
