
Ollama37 🚀

Tesla K80 Compatible Ollama Fork

Run modern LLMs on NVIDIA Tesla K80 and other CUDA Compute Capability 3.7 GPUs. While official Ollama dropped legacy GPU support, Ollama37 keeps your Tesla K80 hardware functional with the latest models and features.

Key Features

  • Tesla K80 Support - Full compatibility with CUDA Compute Capability 3.7
  • 🔄 Always Current - Synced with upstream Ollama for latest models and fixes
  • 🛠️ Optimized Build - CUDA 11 toolchain for maximum legacy GPU compatibility
  • 💰 Cost Effective - Leverage existing hardware without expensive upgrades

Quick Start

# Pull and run
docker pull dogkeeper886/ollama37
docker run --runtime=nvidia --gpus all -p 11434:11434 dogkeeper886/ollama37
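
To check that the server is up and that the K80 is visible inside the container, a quick sanity check (run from another shell if the container is in the foreground; the container name is a placeholder, and nvidia-smi is normally injected by the NVIDIA container runtime):

# Check that the API responds
curl http://localhost:11434/api/version

# Confirm the K80 is visible inside the container
docker exec <container-name> nvidia-smi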

Docker Compose

services:
  ollama:
    image: dogkeeper886/ollama37
    ports: ["11434:11434"]
    volumes: ["./.ollama:/root/.ollama"]
    runtime: nvidia
    restart: unless-stopped

docker-compose up -d
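
Once the stack is running, a model can be pulled directly into the service (assuming the service name ollama from the snippet above):

docker-compose exec ollama ollama pull gemma3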

Usage

Run Your First Model

# Download and run a model
ollama pull gemma3
ollama run gemma3 "Why is the sky blue?"

# Interactive chat
ollama run gemma3
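
To confirm inference is running on the K80 rather than falling back to CPU, check where the loaded model is placed (run from another terminal while a model is loaded):

# The processor column should report GPU when the K80 is in use
ollama ps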

Supported Models

All models from ollama.com/library including Llama 3.2, Gemma3n, Qwen 2.5, Phi-4, and Code Llama.

REST API

# Generate response
curl http://localhost:11434/api/generate -d '{"model": "gemma3", "prompt": "Hello Tesla K80!"}'

# Chat
curl http://localhost:11434/api/chat -d '{"model": "gemma3", "messages": [{"role": "user", "content": "Hello!"}]}'
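
The server also exposes the usual Ollama management endpoints; for example, listing the models available locally:

# List locally available models
curl http://localhost:11434/api/tags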

Technical Details

Tesla K80 Support

  • Compute Capability 3.7 Support: Maintained via CMAKE_CUDA_ARCHITECTURES "37;50;61;70;75;80" (see the sketch below)
  • CUDA 11 Toolchain: Compatible with legacy GPUs (CUDA 12 dropped 3.7 support)
  • Optimized Builds: Tesla K80-specific performance tuning
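
As a rough sketch of what that architecture list looks like in a CMake configure step (illustrative only; the exact flags and targets used by Ollama37 live in ollama37.Dockerfile and the Manual Build Guide):

# Illustrative configure with the legacy architecture list on the CUDA 11 toolchain
cmake -B build -DCMAKE_CUDA_ARCHITECTURES="37;50;61;70;75;80"
cmake --build build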

Recent Updates

  • v1.3.0 (2025-07-19): Added Gemma3n, Qwen2.5VL, latest upstream sync
  • v1.2.0 (2025-05-06): Qwen3, Gemma 3 12B, Phi-4 14B support

Building from Source

Docker Build

docker build -f ollama37.Dockerfile -t ollama37 .
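
The locally built image runs with the same flags as the published one in Quick Start:

docker run --runtime=nvidia --gpus all -p 11434:11434 ollama37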

Manual Build

For detailed manual compilation instructions including CUDA 11.4, GCC 10, and CMake setup, see our Manual Build Guide.
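
Before starting a manual build, it is worth confirming that the expected toolchain is on the PATH:

# The guide assumes CUDA 11.4, GCC 10, and a recent CMake
nvcc --version
gcc --version
cmake --version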

Contributing

Found an issue or want to contribute? Check our GitHub issues or submit Tesla K80-specific bug reports and compatibility fixes.

License

Same license as upstream Ollama. See LICENSE file for details.

Advanced Usage

Custom Models

# Import a GGUF model (the Modelfile points at the weights, e.g. FROM ./model.gguf)
ollama create custom-model -f Modelfile

# Customize existing model
echo 'FROM llama3.2
PARAMETER temperature 0.8
SYSTEM "You are a helpful Tesla K80 expert."' > Modelfile
ollama create tesla-expert -f Modelfile
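
The new model then runs like any other (the prompt is just an example):

ollama run tesla-expert "What should I know about running LLMs on a Tesla K80?"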

CLI Commands

ollama list              # List models
ollama show llama3.2     # Model info
ollama ps                # Running models
ollama stop llama3.2     # Stop model
ollama serve             # Start server
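
If the server runs elsewhere (for example inside the Docker container above), the CLI can be pointed at it with the OLLAMA_HOST environment variable:

# Point the CLI at a remote or containerized server
OLLAMA_HOST=http://localhost:11434 ollama list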

Libraries & Community

See the API documentation for the complete REST API reference.
