Switch back to subprocessing for llama.cpp

This should resolve a number of memory-leak and stability defects by isolating
llama.cpp in a separate process, which lets us shut it down when idle and
gracefully restart it if it has problems. This is also a first step toward
running multiple copies to support multiple models concurrently.
Author: Daniel Hiltgen
Date: 2024-03-14 10:24:13 -07:00
Parent: 3b6a9154dd
Commit: 58d95cc9bd
35 changed files with 1416 additions and 1910 deletions

llm/llm_darwin_arm64.go (new file, +8 lines)

@@ -0,0 +1,8 @@
package llm

import (
	"embed"
)

//go:embed build/darwin/arm64/*/bin/*
var libEmbed embed.FS