subprocess llama.cpp server (#401)

* remove c code * pack llama.cpp * use request context for llama_cpp * let llama_cpp decide the number of threads to use * stop llama runner when app stops * remove sample count and duration metrics * use go generate to get libraries * tmp dir for running llm
2025-12-10 15:57:04 +00:00 · 2023-08-30 16:35:03 -04:00
parent f4432e1dba
commit 42998d797d
37 changed files with 958 additions and 43928 deletions
--- a/.gitmodules
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "llm/llama.cpp/ggml"]
+	path = llm/llama.cpp/ggml
+	url = https://github.com/ggerganov/llama.cpp.git