73 Commits

Author SHA1 Message Date
Shang Chieh Tseng
ef14fb5b26 Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from
official ollama/ollama repository and re-applying Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Due to GCC 10.5 limitation, sacrificed newer CPU optimizations:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7) which was dropped in
official Ollama due to legacy driver/CUDA requirements. The toolchain constraint
creates a deadlock:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:03:05 +08:00
Shang Chieh Tseng
cbcbc9ae07 Add support for new models and fix GitHub issues
- Add Gemma3n model support with text generation capabilities
- Add new CUDA mean operations for improved performance
- Add macOS documentation and performance tests
- Update LLAMA patches for ROCm/CUDA compatibility
- Fix various model conversion and processing issues
- Update CI workflows and build configurations
- Add library model tests and Shakespeare test data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-20 00:12:36 +08:00
Patrick Devine
aa25aff10d client: add request signing to the client (#10881)
If OLLAMA_AUTH is set, sign each request w/ a timestamp and pass the signature in the token header
2025-05-27 16:50:57 -07:00
Steven Hartland
be2ac1ed93 docs: fix api examples link (#9360)
Fix the examples link in the go package documentation for the API.
2025-02-27 10:51:12 -08:00
Bruce MacDonald
14b5a9a150 api: document client stream behavior with a test (#8996)
Added unit tests to verify error handling behavior in the Client.stream and Client.do methods.
Tests cover various error scenarios including:
- Error responses with status codes >= 400
- Error messages with successful status codes
- Empty error messages
- Successful responses
2025-02-20 13:19:58 -08:00
Evan
76b2b723b2 api: fix typo in python ClientFromEnvironment docs (#7604) 2024-11-10 17:30:27 -08:00
longtao
0a8d6ea86d Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
4f1afd575d host 2024-07-22 11:25:30 -07:00
royjhan
b9f5e16c80 Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine
c69bc19e46 move OLLAMA_HOST to envconfig (#5009) 2024-06-12 18:48:16 -04:00
royjhan
4bf1da4944 Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842)
* Remove false time fields

* Struct Separation for List and Process

* Remove Marshaler
2024-06-06 10:11:45 -07:00
Patrick Devine
6845988807 Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00
Eli Bendersky
d77c1c5f9d api: fill up API documentation (#3596)
* api: fill up API documentation

Followup for #2878

Now that the documentation is more complete, mention it in the README.

Updates #2840

* fix typo/lint

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 16:27:46 -07:00
Patrick Devine
9009bedf13 better checking for OLLAMA_HOST variable (#3661) 2024-04-29 19:14:07 -04:00
Daniel Hiltgen
34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Eli Bendersky
ad90b9ab3d api: start adding documentation to package api (#2878)
* api: start adding documentation to package api

Updates #2840

* Fix lint typo report
2024-04-10 13:31:55 -04:00
Michael Yang
e1c9a2a00f no blob create if already exists 2024-04-08 15:09:48 -07:00
Patrick Devine
1b272d5bcd change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Michael Yang
897b213468 use http.DefaultClient (#2530)
default client already handles proxy
2024-02-20 18:34:47 -05:00
Brian Murray
0d6e3565ae Add embeddings to API (#1773) 2024-01-04 15:00:52 -05:00
Michael Yang
c3ff36088b Merge pull request #774 from jmorganca/mxyng/server-version
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Bruce MacDonald
195e3d9dbd chat api endpoint (#1392) 2023-12-05 14:57:33 -05:00
Michael Yang
0db4706ec2 api: add version api handler 2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1 Revert "chat api (#991)" while context variable is fixed
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Bruce MacDonald
7a0899d62d chat api (#991)
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Michael Yang
1901044b07 use checksum reference 2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f client create modelfile 2023-11-15 15:16:23 -08:00
Michael Yang
60bb3c03a1 use http.Method 2023-11-02 13:12:45 -07:00
Bruce MacDonald
5c3491f425 allow for a configurable ollama model storage directory (#897)
* allow for a configurable ollama models directory

- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs

Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
Michael Yang
28c3f288e2 client: fix trailing slash 2023-10-26 11:09:38 -07:00
Michael Yang
459f4a7889 fix: ollama host for hostname 2023-10-20 11:32:41 -07:00
Michael Yang
92189a5855 fix memory check 2023-10-13 14:47:29 -07:00
Michael Yang
b599946b74 add format bytes 2023-10-11 14:08:23 -07:00
Bruce MacDonald
274d5a5fdf optional parameter to not stream response (#639)
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Michael Yang
2cfffea02e handle client proxy 2023-10-09 12:33:47 -07:00
Bruce MacDonald
9e2de1bd2c increase streaming buffer size (#692) 2023-10-04 14:09:00 -04:00
Patrick Devine
790d24eb7b add show command (#474) 2023-09-06 11:04:17 -07:00
Michael Yang
246dc65417 loosen http status code checks 2023-08-28 18:34:53 -04:00
Jeffrey Morgan
22ab7f5f88 default host to 127.0.0.1, fixes #424 2023-08-26 11:59:28 -07:00
Michael Yang
2c7f956b38 add version 2023-08-22 09:40:58 -07:00
Jeffrey Morgan
54bb49a502 parse protocol for OLLAMA_HOST 2023-08-17 18:20:44 -04:00
Jeffrey Morgan
5ee6116420 set default OLLAMA_HOST to http://localhost:11434 2023-08-16 12:22:59 -04:00
Blake Mizerany
67e593e355 cmd: support OLLAMA_CLIENT_HOST environment variable (#262)
* cmd: support OLLAMA_HOST environment variable

This commit adds support for the OLLAMA_HOST environment
variable. This variable can be used to specify the host to which
the client should connect. This is useful when the client is
running somewhere other than the host where the server is running.

The new api.FromEnv function is used to read configure clients from the
environment. Clients wishing to use the environment variable being
consistent with the Ollama CLI can use this new function.

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-16 11:03:48 -04:00
Bruce MacDonald
765994362c use head to check heartbeat 2023-08-01 14:50:38 -04:00
Bruce MacDonald
e72fe7945f check server is running before running command 2023-07-31 16:25:57 -04:00
Bruce MacDonald
4c1caa3733 download models when creating from modelfile 2023-07-25 14:25:13 -04:00
Bruce MacDonald
536028c35a better error message when model not found on pull 2023-07-24 17:48:17 -04:00
Patrick Devine
4cb42ca55e add copy command (#191) 2023-07-24 11:27:28 -04:00
Patrick Devine
6d6b0d3321 change error handler behavior and fix error when a model isn't found (#173) 2023-07-21 23:02:12 -07:00