subprocess llama.cpp server (#401)

* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
Bruce MacDonald
2023-08-30 16:35:03 -04:00
committed by GitHub
parent f4432e1dba
commit 42998d797d
37 changed files with 958 additions and 43928 deletions


@@ -1,19 +1,21 @@
 # Development
+- Install cmake or (optionally, required tools for GPUs)
+- run `go generate ./...`
+- run `go build .`
 Install required tools:
 ```
-brew install go
+brew install go cmake gcc
 ```
-Enable CGO:
+Get the required libraries:
 ```
-export CGO_ENABLED=1
+go generate ./...
 ```
-You will also need a C/C++ compiler such as GCC for MacOS and Linux or Mingw-w64 GCC for Windows.
 Then build ollama:
 ```