Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-11 00:07:07 +00:00)
subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
@@ -1,19 +1,21 @@
 # Development
 
+- Install cmake or (optionally, required tools for GPUs)
+- run `go generate ./...`
+- run `go build .`
+
 Install required tools:
 
 ```
-brew install go
+brew install go cmake gcc
 ```
 
-Enable CGO:
+Get the required libraries:
 
 ```
-export CGO_ENABLED=1
+go generate ./...
 ```
 
-You will also need a C/C++ compiler such as GCC for MacOS and Linux or Mingw-w64 GCC for Windows.
-
 Then build ollama:
 
 ```