Commit Graph

20 Commits

Author SHA1 Message Date
Shang Chieh Tseng
ef14fb5b26 Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit is a complete rework: it pulls the latest changes from the
official ollama/ollama repository and re-applies the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum forward compatibility; see the sketch after this list
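
For context, GPU discovery gates devices on a minimum compute capability, and this fork lowers that floor to 3.7 (upstream requires compute 5.0+). A minimal Go sketch of such a gate, using hypothetical names (`supportsGPU`, `minMajor`/`minMinor`) rather than ollama's actual discovery code:

```go
// Sketch of the compute-capability floor this fork relaxes.
package main

import "fmt"

const (
	minMajor = 3 // lowered to admit the Tesla K80
	minMinor = 7
)

// supportsGPU reports whether a device's compute capability meets the
// minimum this build was compiled for (sm_37 / 37-virtual PTX).
func supportsGPU(major, minor int) bool {
	if major != minMajor {
		return major > minMajor
	}
	return minor >= minMinor
}

func main() {
	fmt.Println(supportsGPU(3, 7)) // Tesla K80 -> true
	fmt.Println(supportsGPU(3, 5)) // Tesla K40 -> false
}
```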

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Due to the GCC 10.5 limitation, some newer CPU optimizations were sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA (host-check sketch after this list)
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)
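
To verify that a host actually provides the instruction sets this build still targets, a standalone Go check is enough. A hedged sketch, not part of ollama's build: it assumes the `golang.org/x/sys/cpu` package and covers only a subset of the features listed above:

```go
// Prints whether the host CPU exposes the instruction sets this build
// assumes (subset only; e.g. F16C is not checked here).
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	checks := []struct {
		name string
		ok   bool
	}{
		{"SSE4.2", cpu.X86.HasSSE42},
		{"AVX", cpu.X86.HasAVX},
		{"AVX2", cpu.X86.HasAVX2},
		{"BMI2", cpu.X86.HasBMI2},
		{"FMA", cpu.X86.HasFMA},
	}
	for _, c := range checks {
		fmt.Printf("%-6s %v\n", c.name, c.ok)
	}
}
```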

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged the latest llama.cpp changes, including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which official
Ollama dropped because of the legacy driver/CUDA requirements. Those
constraints form a rigid dependency chain:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:03:05 +08:00
Bruce MacDonald
feb8923ada cmd: add ellipses to truncated show metadata (#10717)
When a piece of information has been truncated in the show output, append an ellipsis to indicate that more data has not been displayed.
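A hedged sketch of the behavior in Go; `truncate` is a hypothetical helper, not the name used in ollama's cmd package:

```go
// Cap a metadata value at a display width, appending "..." when it was cut.
package main

import "fmt"

func truncate(s string, max int) string {
	runes := []rune(s) // count characters, not bytes
	if len(runes) <= max {
		return s
	}
	return string(runes[:max-3]) + "..." // assumes max >= 3
}

func main() {
	fmt.Println(truncate("a very long metadata value", 15)) // "a very long ..."
}
```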
2025-05-15 15:45:52 -07:00
Michael Yang
6e9a7a2568 lint: enable usetesting, disable tenv (#10594) 2025-05-08 11:42:14 -07:00
Michael Yang
d931ee8f22 create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-05-05 11:59:26 -07:00
CYJiang
64a9cc8f05 cmd: add missing file close in tests (#10179) 2025-04-14 07:49:41 -04:00
Bruce MacDonald
e172f095ba api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint lets clients discover what a model can do.
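A sketch of reading the new field through the Go client; it assumes the `github.com/ollama/ollama/api` package, the `Capabilities` field this change adds to `ShowResponse`, and a placeholder model name (`llama3.2`):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}
	resp, err := client.Show(context.Background(), &api.ShowRequest{Model: "llama3.2"})
	if err != nil {
		log.Fatal(err)
	}
	// e.g. [completion tools] for a text model; vision models add more
	fmt.Println(resp.Capabilities)
}
```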
2025-04-01 15:21:46 -07:00
Patrick Devine
6d1103048e fix: show correct bool value for kv in verbose show information (#9928) 2025-03-21 11:13:54 -07:00
Patrick Devine
2c8b484643 fix: correctly save in interactive mode (#9788)
This fixes the case where a FROM line in a previous modelfile points to a
file that may or may not be present on a different ollama instance. We
shouldn't rely on the filename; instead, check whether the FROM line names
a valid model and point to that.
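A hedged sketch of that resolution order; `isKnownModel` is hypothetical, standing in for whatever lookup the real fix performs:

```go
package main

import (
	"fmt"
	"os"
)

// resolveFrom prefers treating the FROM value as a model name known to this
// instance, and only falls back to a file that happens to exist on disk.
func resolveFrom(from string, isKnownModel func(string) bool) (string, bool) {
	if isKnownModel(from) {
		return from, true // reference the model by name
	}
	if _, err := os.Stat(from); err == nil {
		return from, true // fall back to a local file
	}
	return "", false // neither a known model nor a readable file
}

func main() {
	known := func(name string) bool { return name == "llama3.2" } // placeholder
	fmt.Println(resolveFrom("llama3.2", known))
	fmt.Println(resolveFrom("./missing.gguf", known))
}
```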
2025-03-15 12:09:02 -07:00
Patrick Devine
4bed739259 add verbose mode to the show command (#9640)
Add metadata and tensor information to the show command so users can see
more about a model. This outputs the same data as shown on the model
details page on ollama.com.
2025-03-13 14:24:27 -07:00
yuiseki
d721a02e7d test: add test cases for ListHandler (#9146) 2025-02-19 13:24:27 -08:00
Patrick Devine
a420a453b4 fix default modelfile for create (#8452) 2025-01-16 01:14:04 -08:00
Patrick Devine
32bd37adf8 make the modelfile path relative for ollama create (#8380) 2025-01-10 16:14:08 -08:00
Jeffrey Morgan
ff6c2d6dc8 cmd: don't rely on reading repo file for test (#7898) 2024-11-30 14:12:53 -08:00
Bruce MacDonald
a210ec74d2 cmd: print location of model after pushing (#7695)
After a user pushes their model, it is not clear what to do next. Add a link
to the output of `ollama push` that tells the user where their model can now
be found.
2024-11-25 09:40:16 -08:00
Patrick Devine
d78fb62056 default to "FROM ." if a Modelfile isn't present (#7250) 2024-10-22 13:32:24 -07:00
Alex Mavrogiannis
f40bb398f6 Stop model before deletion if loaded (fixed #6957) (#7050) 2024-10-01 15:45:43 -07:00
Michael Yang
ecab6f1cc5 refactor show output
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Jeffrey Morgan
7e26a8df31 cmd: use environment variables for server options 2023-08-10 14:17:53 -07:00
Jeffrey Morgan
040a5b9750 clean up cli flags 2023-08-10 09:27:03 -07:00
cmiller01
93492f1e18 correct precedence of serve params (args over env over default) 2023-08-07 19:55:20 +00:00
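A hedged sketch of that precedence in Go; the `-host` flag name is illustrative (OLLAMA_HOST is ollama's real environment variable, but this is not the server's actual startup code):

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	flagHost := flag.String("host", "", "server listen address")
	flag.Parse()

	host := "127.0.0.1:11434" // built-in default
	if env := os.Getenv("OLLAMA_HOST"); env != "" {
		host = env // an environment variable overrides the default...
	}
	if *flagHost != "" {
		host = *flagHost // ...and an explicit flag overrides both
	}
	fmt.Println("listening on", host)
}
```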