Add support for new models and fix GitHub issues

- Add Gemma3n model support with text generation capabilities
- Add new CUDA mean operations for improved performance
- Add macOS documentation and performance tests
- Update LLAMA patches for ROCm/CUDA compatibility
- Fix various model conversion and processing issues
- Update CI workflows and build configurations
- Add library model tests and Shakespeare test data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Shang Chieh Tseng
2025-07-20 00:12:36 +08:00
parent 1fa71c2670
commit cbcbc9ae07
81 changed files with 132316 additions and 747 deletions

@@ -219,7 +219,7 @@ func Uint(key string, defaultValue uint) func() uint {
var (
// NumParallel sets the number of parallel model requests. NumParallel can be configured via the OLLAMA_NUM_PARALLEL environment variable.
-NumParallel = Uint("OLLAMA_NUM_PARALLEL", 0)
+NumParallel = Uint("OLLAMA_NUM_PARALLEL", 1)
// MaxRunners sets the maximum number of loaded models. MaxRunners can be configured via the OLLAMA_MAX_LOADED_MODELS environment variable.
MaxRunners = Uint("OLLAMA_MAX_LOADED_MODELS", 0)
// MaxQueue sets the maximum number of queued requests. MaxQueue can be configured via the OLLAMA_MAX_QUEUE environment variable.