Phase 9 resolved the runtime loading issue where the CUDA backend failed to load due to undefined Flash Attention symbols.

Solution:
- Disabled the flash attention helper functions (lines 126-274 in fattn.cu)
- Simplified ggml_cuda_flash_attn_ext() to abort immediately on CC 3.7
- Added GGML_UNUSED macros to prevent compiler warnings
- Added a ggml_backend_cuda_score() function for backend selection

Testing Results:
✅ CUDA backend loads without undefined symbol errors
✅ GPU layers offload correctly (e.g., 35/35 for gemma3:4b)
✅ Fast GPU inference confirmed working

Flash Attention is not supported on CC 3.7 (it requires Volta-class Tensor Cores). If attempted, it aborts gracefully with a clear error message.

All 9 phases of the CC 3.7-only optimization are now complete and tested.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>