ollama37

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-09 23:37:06 +00:00

Author	SHA1	Message	Date
Shang Chieh Tseng	ef14fb5b26	Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support This commit represents a complete rework after pulling the latest changes from official ollama/ollama repository and re-applying Tesla K80 compatibility patches. ## Key Changes ### CUDA Compute Capability 3.7 Support (Tesla K80) - Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt - Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset - Using 37-virtual (PTX with JIT compilation) for maximum compatibility ### Legacy Toolchain Compatibility - NVIDIA Driver: 470.256.02 (last version supporting Kepler/K80) - CUDA Version: 11.4.4 (last CUDA 11.x supporting compute 3.7) - GCC Version: 10.5.0 (required by CUDA 11.4 host_config.h) ### CPU Architecture Trade-offs Due to GCC 10.5 limitation, sacrificed newer CPU optimizations: - Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+) - Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA - Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility) ### Build System Updates - Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7 - Added -Wno-deprecated-gpu-targets flag to suppress warnings - Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI ### Upstream Sync Merged latest llama.cpp changes including: - Enhanced KV cache management with ISWA and hybrid memory support - Improved multi-modal support (mtmd framework) - New model architectures (Gemma3, Llama4, Qwen3, etc.) - GPU backend improvements for CUDA, Metal, and ROCm - Updated quantization support and GGUF format handling ### Documentation - Updated CLAUDE.md with comprehensive build instructions - Documented toolchain constraints and CPU architecture trade-offs - Removed outdated CI/CD workflows (tesla-k80-*.yml) - Cleaned up temporary development artifacts ## Rationale This fork maintains Tesla K80 GPU support (compute 3.7) which was dropped in official Ollama due to legacy driver/CUDA requirements. The toolchain constraint creates a deadlock: - K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI We accept the loss of cutting-edge CPU optimizations to enable running modern LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 14:03:05 +08:00
Ruyut	b72e5adb14	CONTRIBUTING: fix typo in commit message example (#11528)	2025-07-25 14:24:06 -07:00
Devon Rifkin	c38680b8a1	CONTRIBUTING: fix code block formatting There were only 3 spaces instead of 4, so the example was being considered to include html elements	2025-04-07 13:53:33 -07:00
Blake Mizerany	2099e2d267	CONTRIBUTING: provide clarity on good commit messages, and bad (#9405) Also, our commit messages have been getting better, but we can do better, and be more consistent. This adds more clarity on how to write commit messages and provides examples of good and bad messages. Also, our contributing guide was lacking helpful guidance on how to start change proposals. This commit adds the start of that section. Soon, we should add a proposal template to the issue tracker with a link back to the proposal section, which should also be expanded upon.	2025-02-27 19:22:26 -08:00
Carter	369479cc30	docs: fix spelling error (#6391) change "dorrect" to "correct"	2024-09-04 09:42:33 -04:00
Jeffrey Morgan	8200c371ae	add `CONTRIBUTING.md` (#6349)	2024-08-14 15:19:50 -07:00

6 Commits