diff --git a/README.md b/README.md index 18cc405b..9a018975 100644 --- a/README.md +++ b/README.md @@ -1,624 +1,548 @@ -
- ollama - -
+# Ollama37 Docker Build System -# Ollama +**Two-stage Docker build for Ollama with CUDA 11.4 and Compute Capability 3.7 support (Tesla K80)** -Get up and running with large language models. +## Overview -### macOS +This Docker build system uses a two-stage architecture to build and run Ollama with Tesla K80 (compute capability 3.7) support: -[Download](https://ollama.com/download/Ollama.dmg) +1. **Builder Image** (`builder/Dockerfile`) - Base environment with build tools + - Rocky Linux 8 + - CUDA 11.4 toolkit (required for Tesla K80) + - GCC 10 (built from source, required by CUDA 11.4) + - CMake 4.0 (built from source) + - Go 1.25.3 -### Windows +2. **Runtime Image** (`runtime/Dockerfile`) - Two-stage build process + - **Stage 1 (compile)**: Clone source → Configure CMake → Build C/C++/CUDA → Build Go binary + - **Stage 2 (runtime)**: Copy artifacts → Setup runtime environment -[Download](https://ollama.com/download/OllamaSetup.exe) +The runtime uses the builder image as its base to ensure library path compatibility between build and runtime environments. -### Linux +## Prerequisites -```shell -curl -fsSL https://ollama.com/install.sh | sh +- Docker with NVIDIA Container Runtime +- Docker Compose +- NVIDIA GPU drivers (470+ for Tesla K80) +- Verify GPU access: + ```bash + docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi + ``` + +## Quick Start + +### 1. Build Images + +```bash +cd /home/jack/Documents/ollama37/docker +make build ``` -[Manual install instructions](https://docs.ollama.com/linux#manual-install) +This will: +1. Build the builder image (if not present) - **~90 minutes first time** +2. Build the runtime image - **~10 minutes** -### Docker +**First-time build:** ~100 minutes total (includes building GCC 10 and CMake 4 from source) -The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub. +**Subsequent builds:** ~10 minutes (builder image is cached) -### Libraries +### 2. Run with Docker Compose (Recommended) -- [ollama-python](https://github.com/ollama/ollama-python) -- [ollama-js](https://github.com/ollama/ollama-js) - -### Community - -- [Discord](https://discord.gg/ollama) -- [Reddit](https://reddit.com/r/ollama) - -## Quickstart - -To run and chat with [Gemma 3](https://ollama.com/library/gemma3): - -```shell -ollama run gemma3 +```bash +docker compose up -d ``` -## Model library - -Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library 'ollama model library') - -Here are some example models that can be downloaded: - -| Model | Parameters | Size | Download | -| ------------------ | ---------- | ----- | -------------------------------- | -| Gemma 3 | 1B | 815MB | `ollama run gemma3:1b` | -| Gemma 3 | 4B | 3.3GB | `ollama run gemma3` | -| Gemma 3 | 12B | 8.1GB | `ollama run gemma3:12b` | -| Gemma 3 | 27B | 17GB | `ollama run gemma3:27b` | -| QwQ | 32B | 20GB | `ollama run qwq` | -| DeepSeek-R1 | 7B | 4.7GB | `ollama run deepseek-r1` | -| DeepSeek-R1 | 671B | 404GB | `ollama run deepseek-r1:671b` | -| Llama 4 | 109B | 67GB | `ollama run llama4:scout` | -| Llama 4 | 400B | 245GB | `ollama run llama4:maverick` | -| Llama 3.3 | 70B | 43GB | `ollama run llama3.3` | -| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` | -| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` | -| Llama 3.2 Vision | 11B | 7.9GB | `ollama run llama3.2-vision` | -| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` | -| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` | -| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` | -| Phi 4 | 14B | 9.1GB | `ollama run phi4` | -| Phi 4 Mini | 3.8B | 2.5GB | `ollama run phi4-mini` | -| Mistral | 7B | 4.1GB | `ollama run mistral` | -| Moondream 2 | 1.4B | 829MB | `ollama run moondream` | -| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` | -| Starling | 7B | 4.1GB | `ollama run starling-lm` | -| Code Llama | 7B | 3.8GB | `ollama run codellama` | -| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` | -| LLaVA | 7B | 4.5GB | `ollama run llava` | -| Granite-3.3 | 8B | 4.9GB | `ollama run granite3.3` | - -> [!NOTE] -> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. - -## Customize a model - -### Import from GGUF - -Ollama supports importing GGUF models in the Modelfile: - -1. Create a file named `Modelfile`, with a `FROM` instruction with the local filepath to the model you want to import. - - ``` - FROM ./vicuna-33b.Q4_0.gguf - ``` - -2. Create the model in Ollama - - ```shell - ollama create example -f Modelfile - ``` - -3. Run the model - - ```shell - ollama run example - ``` - -### Import from Safetensors - -See the [guide](https://docs.ollama.com/import) on importing models for more information. - -### Customize a prompt - -Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model: - -```shell -ollama pull llama3.2 +Check logs: +```bash +docker compose logs -f ``` -Create a `Modelfile`: - -``` -FROM llama3.2 - -# set the temperature to 1 [higher is more creative, lower is more coherent] -PARAMETER temperature 1 - -# set the system message -SYSTEM """ -You are Mario from Super Mario Bros. Answer as Mario, the assistant, only. -""" +Stop the server: +```bash +docker compose down ``` -Next, create and run the model: +### 3. Run Manually -``` -ollama create mario -f ./Modelfile -ollama run mario ->>> hi -Hello! It's your friend Mario. +```bash +docker run -d \ + --name ollama37 \ + --runtime=nvidia \ + --gpus all \ + -p 11434:11434 \ + -v ollama-data:/root/.ollama \ + ollama37:latest ``` -For more information on working with a Modelfile, see the [Modelfile](https://docs.ollama.com/modelfile) documentation. +## Usage -## CLI Reference +### Using the API -### Create a model +```bash +# List models +curl http://localhost:11434/api/tags -`ollama create` is used to create a model from a Modelfile. +# Pull a model +curl http://localhost:11434/api/pull -d '{"name": "gemma3:4b"}' -```shell -ollama create mymodel -f ./Modelfile -``` - -### Pull a model - -```shell -ollama pull llama3.2 -``` - -> This command can also be used to update a local model. Only the diff will be pulled. - -### Remove a model - -```shell -ollama rm llama3.2 -``` - -### Copy a model - -```shell -ollama cp llama3.2 my-model -``` - -### Multiline input - -For multiline input, you can wrap text with `"""`: - -``` ->>> """Hello, -... world! -... """ -I'm a basic program that prints the famous "Hello, world!" message to the console. -``` - -### Multimodal models - -``` -ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png" -``` - -> **Output**: The image features a yellow smiley face, which is likely the central focus of the picture. - -### Pass the prompt as an argument - -```shell -ollama run llama3.2 "Summarize this file: $(cat README.md)" -``` - -> **Output**: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. - -### Show model information - -```shell -ollama show llama3.2 -``` - -### List models on your computer - -```shell -ollama list -``` - -### List which models are currently loaded - -```shell -ollama ps -``` - -### Stop a model which is currently running - -```shell -ollama stop llama3.2 -``` - -### Start Ollama - -`ollama serve` is used when you want to start ollama without running the desktop application. - -## Building - -See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md) - -### Running local builds - -Next, start the server: - -```shell -./ollama serve -``` - -Finally, in a separate shell, run a model: - -```shell -./ollama run llama3.2 -``` - -## REST API - -Ollama has a REST API for running and managing models. - -### Generate a response - -```shell +# Run inference curl http://localhost:11434/api/generate -d '{ - "model": "llama3.2", - "prompt":"Why is the sky blue?" + "model": "gemma3:4b", + "prompt": "Why is the sky blue?", + "stream": false }' ``` -### Chat with a model +### Using the CLI -```shell -curl http://localhost:11434/api/chat -d '{ - "model": "llama3.2", - "messages": [ - { "role": "user", "content": "why is the sky blue?" } - ] -}' +```bash +# List models +docker exec ollama37 ollama list + +# Pull a model +docker exec ollama37 ollama pull gemma3:4b + +# Run a model +docker exec ollama37 ollama run gemma3:4b "Hello!" ``` -See the [API documentation](./docs/api.md) for all endpoints. +## Architecture -## Community Integrations +### Build System Components -### Web & Desktop +``` +docker/ +├── builder/ +│ └── Dockerfile # Base image: CUDA 11.4, GCC 10, CMake 4, Go 1.25.3 +├── runtime/ +│ └── Dockerfile # Two-stage: compile ollama37, package runtime +├── Makefile # Build orchestration (images only) +├── docker-compose.yml # Runtime orchestration +└── README.md # This file +``` -- [Open WebUI](https://github.com/open-webui/open-webui) -- [SwiftChat (macOS with ReactNative)](https://github.com/aws-samples/swift-chat) -- [Enchanted (macOS native)](https://github.com/AugustDev/enchanted) -- [Hollama](https://github.com/fmaclen/hollama) -- [Lollms-Webui](https://github.com/ParisNeo/lollms-webui) -- [LibreChat](https://github.com/danny-avila/LibreChat) -- [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt) -- [HTML UI](https://github.com/rtcfirefly/ollama-ui) -- [Saddle](https://github.com/jikkuatwork/saddle) -- [TagSpaces](https://www.tagspaces.org) (A platform for file-based apps, [utilizing Ollama](https://docs.tagspaces.org/ai/) for the generation of tags and descriptions) -- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama) -- [Chatbot UI v2](https://github.com/mckaywrigley/chatbot-ui) -- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file) -- [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui) -- [Ollamac](https://github.com/kevinhermawan/Ollamac) -- [big-AGI](https://github.com/enricoros/big-AGI) -- [Cheshire Cat assistant framework](https://github.com/cheshire-cat-ai/core) -- [Amica](https://github.com/semperai/amica) -- [chatd](https://github.com/BruceMacD/chatd) -- [Ollama-SwiftUI](https://github.com/kghandour/Ollama-SwiftUI) -- [Dify.AI](https://github.com/langgenius/dify) -- [MindMac](https://mindmac.app) -- [NextJS Web Interface for Ollama](https://github.com/jakobhoeg/nextjs-ollama-llm-ui) -- [Msty](https://msty.app) -- [Chatbox](https://github.com/Bin-Huang/Chatbox) -- [WinForm Ollama Copilot](https://github.com/tgraupmann/WinForm_Ollama_Copilot) -- [NextChat](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) with [Get Started Doc](https://docs.nextchat.dev/models/ollama) -- [Alpaca WebUI](https://github.com/mmo80/alpaca-webui) -- [OllamaGUI](https://github.com/enoch1118/ollamaGUI) -- [OpenAOE](https://github.com/InternLM/OpenAOE) -- [Odin Runes](https://github.com/leonid20000/OdinRunes) -- [LLM-X](https://github.com/mrdjohnson/llm-x) (Progressive Web App) -- [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm) -- [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat) -- [Ollama-chats RPG](https://github.com/drazdra/ollama-chats) -- [IntelliBar](https://intellibar.app/) (AI-powered assistant for macOS) -- [Jirapt](https://github.com/AliAhmedNada/jirapt) (Jira Integration to generate issues, tasks, epics) -- [ojira](https://github.com/AliAhmedNada/ojira) (Jira chrome plugin to easily generate descriptions for tasks) -- [QA-Pilot](https://github.com/reid41/QA-Pilot) (Interactive chat tool that can leverage Ollama models for rapid understanding and navigation of GitHub code repositories) -- [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases) -- [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG) -- [RAGFlow](https://github.com/infiniflow/ragflow) (Open-source Retrieval-Augmented Generation engine based on deep document understanding) -- [StreamDeploy](https://github.com/StreamDeploy-DevRel/streamdeploy-llm-app-scaffold) (LLM Application Scaffold) -- [chat](https://github.com/swuecho/chat) (chat web app for teams) -- [Lobe Chat](https://github.com/lobehub/lobe-chat) with [Integrating Doc](https://lobehub.com/docs/self-hosting/examples/ollama) -- [Ollama RAG Chatbot](https://github.com/datvodinh/rag-chatbot.git) (Local Chat with multiple PDFs using Ollama and RAG) -- [BrainSoup](https://www.nurgo-software.com/products/brainsoup) (Flexible native client with RAG & multi-agent automation) -- [macai](https://github.com/Renset/macai) (macOS client for Ollama, ChatGPT, and other compatible API back-ends) -- [RWKV-Runner](https://github.com/josStorer/RWKV-Runner) (RWKV offline LLM deployment tool, also usable as a client for ChatGPT and Ollama) -- [Ollama Grid Search](https://github.com/dezoito/ollama-grid-search) (app to evaluate and compare models) -- [Olpaka](https://github.com/Otacon/olpaka) (User-friendly Flutter Web App for Ollama) -- [Casibase](https://casibase.org) (An open source AI knowledge base and dialogue system combining the latest RAG, SSO, ollama support, and multiple large language models.) -- [OllamaSpring](https://github.com/CrazyNeil/OllamaSpring) (Ollama Client for macOS) -- [LLocal.in](https://github.com/kartikm7/llocal) (Easy to use Electron Desktop Client for Ollama) -- [Shinkai Desktop](https://github.com/dcSpark/shinkai-apps) (Two click install Local AI using Ollama + Files + RAG) -- [AiLama](https://github.com/zeyoyt/ailama) (A Discord User App that allows you to interact with Ollama anywhere in Discord) -- [Ollama with Google Mesop](https://github.com/rapidarchitect/ollama_mesop/) (Mesop Chat Client implementation with Ollama) -- [R2R](https://github.com/SciPhi-AI/R2R) (Open-source RAG engine) -- [Ollama-Kis](https://github.com/elearningshow/ollama-kis) (A simple easy-to-use GUI with sample custom LLM for Drivers Education) -- [OpenGPA](https://opengpa.org) (Open-source offline-first Enterprise Agentic Application) -- [Painting Droid](https://github.com/mateuszmigas/painting-droid) (Painting app with AI integrations) -- [Kerlig AI](https://www.kerlig.com/) (AI writing assistant for macOS) -- [AI Studio](https://github.com/MindWorkAI/AI-Studio) -- [Sidellama](https://github.com/gyopak/sidellama) (browser-based LLM client) -- [LLMStack](https://github.com/trypromptly/LLMStack) (No-code multi-agent framework to build LLM agents and workflows) -- [BoltAI for Mac](https://boltai.com) (AI Chat Client for Mac) -- [Harbor](https://github.com/av/harbor) (Containerized LLM Toolkit with Ollama as default backend) -- [PyGPT](https://github.com/szczyglis-dev/py-gpt) (AI desktop assistant for Linux, Windows, and Mac) -- [Alpaca](https://github.com/Jeffser/Alpaca) (An Ollama client application for Linux and macOS made with GTK4 and Adwaita) -- [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT/blob/master/docs/content/platform/ollama.md) (AutoGPT Ollama integration) -- [Go-CREW](https://www.jonathanhecl.com/go-crew/) (Powerful Offline RAG in Golang) -- [PartCAD](https://github.com/openvmp/partcad/) (CAD model generation with OpenSCAD and CadQuery) -- [Ollama4j Web UI](https://github.com/ollama4j/ollama4j-web-ui) - Java-based Web UI for Ollama built with Vaadin, Spring Boot, and Ollama4j -- [PyOllaMx](https://github.com/kspviswa/pyOllaMx) - macOS application capable of chatting with both Ollama and Apple MLX models. -- [Cline](https://github.com/cline/cline) - Formerly known as Claude Dev is a VSCode extension for multi-file/whole-repo coding -- [Cherry Studio](https://github.com/kangfenmao/cherry-studio) (Desktop client with Ollama support) -- [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy-focused LLM chat interface with optional encryption) -- [Archyve](https://github.com/nickthecook/archyve) (RAG-enabling document library) -- [crewAI with Mesop](https://github.com/rapidarchitect/ollama-crew-mesop) (Mesop Web Interface to run crewAI with Ollama) -- [Tkinter-based client](https://github.com/chyok/ollama-gui) (Python tkinter-based Client for Ollama) -- [LLMChat](https://github.com/trendy-design/llmchat) (Privacy focused, 100% local, intuitive all-in-one chat interface) -- [Local Multimodal AI Chat](https://github.com/Leon-Sander/Local-Multimodal-AI-Chat) (Ollama-based LLM Chat with support for multiple features, including PDF RAG, voice chat, image-based interactions, and integration with OpenAI.) -- [ARGO](https://github.com/xark-argo/argo) (Locally download and run Ollama and Huggingface models with RAG and deep research on Mac/Windows/Linux) -- [OrionChat](https://github.com/EliasPereirah/OrionChat) - OrionChat is a web interface for chatting with different AI providers -- [G1](https://github.com/bklieger-groq/g1) (Prototype of using prompting strategies to improve the LLM's reasoning through o1-like reasoning chains.) -- [Web management](https://github.com/lemonit-eric-mao/ollama-web-management) (Web management page) -- [Promptery](https://github.com/promptery/promptery) (desktop client for Ollama.) -- [Ollama App](https://github.com/JHubi1/ollama-app) (Modern and easy-to-use multi-platform client for Ollama) -- [chat-ollama](https://github.com/annilq/chat-ollama) (a React Native client for Ollama) -- [SpaceLlama](https://github.com/tcsenpai/spacellama) (Firefox and Chrome extension to quickly summarize web pages with ollama in a sidebar) -- [YouLama](https://github.com/tcsenpai/youlama) (Webapp to quickly summarize any YouTube video, supporting Invidious as well) -- [DualMind](https://github.com/tcsenpai/dualmind) (Experimental app allowing two models to talk to each other in the terminal or in a web interface) -- [ollamarama-matrix](https://github.com/h1ddenpr0cess20/ollamarama-matrix) (Ollama chatbot for the Matrix chat protocol) -- [ollama-chat-app](https://github.com/anan1213095357/ollama-chat-app) (Flutter-based chat app) -- [Perfect Memory AI](https://www.perfectmemory.ai/) (Productivity AI assists personalized by what you have seen on your screen, heard, and said in the meetings) -- [Hexabot](https://github.com/hexastack/hexabot) (A conversational AI builder) -- [Reddit Rate](https://github.com/rapidarchitect/reddit_analyzer) (Search and Rate Reddit topics with a weighted summation) -- [OpenTalkGpt](https://github.com/adarshM84/OpenTalkGpt) (Chrome Extension to manage open-source models supported by Ollama, create custom models, and chat with models from a user-friendly UI) -- [VT](https://github.com/vinhnx/vt.ai) (A minimal multimodal AI chat app, with dynamic conversation routing. Supports local models via Ollama) -- [Nosia](https://github.com/nosia-ai/nosia) (Easy to install and use RAG platform based on Ollama) -- [Witsy](https://github.com/nbonamy/witsy) (An AI Desktop application available for Mac/Windows/Linux) -- [Abbey](https://github.com/US-Artificial-Intelligence/abbey) (A configurable AI interface server with notebooks, document storage, and YouTube support) -- [Minima](https://github.com/dmayboroda/minima) (RAG with on-premises or fully local workflow) -- [aidful-ollama-model-delete](https://github.com/AidfulAI/aidful-ollama-model-delete) (User interface for simplified model cleanup) -- [Perplexica](https://github.com/ItzCrazyKns/Perplexica) (An AI-powered search engine & an open-source alternative to Perplexity AI) -- [Ollama Chat WebUI for Docker ](https://github.com/oslook/ollama-webui) (Support for local docker deployment, lightweight ollama webui) -- [AI Toolkit for Visual Studio Code](https://aka.ms/ai-tooklit/ollama-docs) (Microsoft-official VSCode extension to chat, test, evaluate models with Ollama support, and use them in your AI applications.) -- [MinimalNextOllamaChat](https://github.com/anilkay/MinimalNextOllamaChat) (Minimal Web UI for Chat and Model Control) -- [Chipper](https://github.com/TilmanGriesel/chipper) AI interface for tinkerers (Ollama, Haystack RAG, Python) -- [ChibiChat](https://github.com/CosmicEventHorizon/ChibiChat) (Kotlin-based Android app to chat with Ollama and Koboldcpp API endpoints) -- [LocalLLM](https://github.com/qusaismael/localllm) (Minimal Web-App to run ollama models on it with a GUI) -- [Ollamazing](https://github.com/buiducnhat/ollamazing) (Web extension to run Ollama models) -- [OpenDeepResearcher-via-searxng](https://github.com/benhaotang/OpenDeepResearcher-via-searxng) (A Deep Research equivalent endpoint with Ollama support for running locally) -- [AntSK](https://github.com/AIDotNet/AntSK) (Out-of-the-box & Adaptable RAG Chatbot) -- [MaxKB](https://github.com/1Panel-dev/MaxKB/) (Ready-to-use & flexible RAG Chatbot) -- [yla](https://github.com/danielekp/yla) (Web interface to freely interact with your customized models) -- [LangBot](https://github.com/RockChinQ/LangBot) (LLM-based instant messaging bots platform, with Agents, RAG features, supports multiple platforms) -- [1Panel](https://github.com/1Panel-dev/1Panel/) (Web-based Linux Server Management Tool) -- [AstrBot](https://github.com/Soulter/AstrBot/) (User-friendly LLM-based multi-platform chatbot with a WebUI, supporting RAG, LLM agents, and plugins integration) -- [Reins](https://github.com/ibrahimcetin/reins) (Easily tweak parameters, customize system prompts per chat, and enhance your AI experiments with reasoning model support.) -- [Flufy](https://github.com/Aharon-Bensadoun/Flufy) (A beautiful chat interface for interacting with Ollama's API. Built with React, TypeScript, and Material-UI.) -- [Ellama](https://github.com/zeozeozeo/ellama) (Friendly native app to chat with an Ollama instance) -- [screenpipe](https://github.com/mediar-ai/screenpipe) Build agents powered by your screen history -- [Ollamb](https://github.com/hengkysteen/ollamb) (Simple yet rich in features, cross-platform built with Flutter and designed for Ollama. Try the [web demo](https://hengkysteen.github.io/demo/ollamb/).) -- [Writeopia](https://github.com/Writeopia/Writeopia) (Text editor with integration with Ollama) -- [AppFlowy](https://github.com/AppFlowy-IO/AppFlowy) (AI collaborative workspace with Ollama, cross-platform and self-hostable) -- [Lumina](https://github.com/cushydigit/lumina.git) (A lightweight, minimal React.js frontend for interacting with Ollama servers) -- [Tiny Notepad](https://pypi.org/project/tiny-notepad) (A lightweight, notepad-like interface to chat with ollama available on PyPI) -- [macLlama (macOS native)](https://github.com/hellotunamayo/macLlama) (A native macOS GUI application for interacting with Ollama models, featuring a chat interface.) -- [GPTranslate](https://github.com/philberndt/GPTranslate) (A fast and lightweight, AI powered desktop translation application written with Rust and Tauri. Features real-time translation with OpenAI/Azure/Ollama.) -- [ollama launcher](https://github.com/NGC13009/ollama-launcher) (A launcher for Ollama, aiming to provide users with convenient functions such as ollama server launching, management, or configuration.) -- [ai-hub](https://github.com/Aj-Seven/ai-hub) (AI Hub supports multiple models via API keys and Chat support via Ollama API.) -- [Mayan EDMS](https://gitlab.com/mayan-edms/mayan-edms) (Open source document management system to organize, tag, search, and automate your files with powerful Ollama driven workflows.) -- [Serene Pub](https://github.com/doolijb/serene-pub) (Beginner friendly, open source AI Roleplaying App for Windows, Mac OS and Linux. Search, download and use models with Ollama all inside the app.) -- [Andes](https://github.com/aqerd/andes) (A Visual Studio Code extension that provides a local UI interface for Ollama models) -- [Clueless](https://github.com/KashyapTan/clueless) (Open Source & Local Cluely: A desktop application LLM assistant to help you talk to anything on your screen using locally served Ollama models. Also undetectable to screenshare) -- [ollama-co2](https://github.com/carbonatedWaterOrg/ollama-co2) (FastAPI web interface for monitoring and managing local and remote Ollama servers with real-time model monitoring and concurrent downloads) -- [Hillnote](https://hillnote.com) (A Markdown-first workspace designed to supercharge your AI workflow. Create documents ready to integrate with Claude, ChatGPT, Gemini, Cursor, and more - all while keeping your work on your device.) +### Two-Stage Build Process -### Cloud +#### Stage 1: Builder Image (`builder/Dockerfile`) +**Purpose**: Provide consistent build environment -- [Google Cloud](https://cloud.google.com/run/docs/tutorials/gpu-gemma2-with-ollama) -- [Fly.io](https://fly.io/docs/python/do-more/add-ollama/) -- [Koyeb](https://www.koyeb.com/deploy/ollama) +**Contents:** +- Rocky Linux 8 base +- CUDA 11.4 toolkit (compilation only, no driver) +- GCC 10 from source (~60 min build time) +- CMake 4.0 from source (~8 min build time) +- Go 1.25.3 binary +- All build dependencies -### Terminal +**Build time:** ~90 minutes (first time), cached thereafter -- [oterm](https://github.com/ggozad/oterm) -- [Ellama Emacs client](https://github.com/s-kostyaev/ellama) -- [Emacs client](https://github.com/zweifisch/ollama) -- [neollama](https://github.com/paradoxical-dev/neollama) UI client for interacting with models from within Neovim -- [gen.nvim](https://github.com/David-Kunz/gen.nvim) -- [ollama.nvim](https://github.com/nomnivore/ollama.nvim) -- [ollero.nvim](https://github.com/marco-souza/ollero.nvim) -- [ollama-chat.nvim](https://github.com/gerazov/ollama-chat.nvim) -- [ogpt.nvim](https://github.com/huynle/ogpt.nvim) -- [gptel Emacs client](https://github.com/karthink/gptel) -- [Oatmeal](https://github.com/dustinblackman/oatmeal) -- [cmdh](https://github.com/pgibler/cmdh) -- [ooo](https://github.com/npahlfer/ooo) -- [shell-pilot](https://github.com/reid41/shell-pilot)(Interact with models via pure shell scripts on Linux or macOS) -- [tenere](https://github.com/pythops/tenere) -- [llm-ollama](https://github.com/taketwo/llm-ollama) for [Datasette's LLM CLI](https://llm.datasette.io/en/stable/). -- [typechat-cli](https://github.com/anaisbetts/typechat-cli) -- [ShellOracle](https://github.com/djcopley/ShellOracle) -- [tlm](https://github.com/yusufcanb/tlm) -- [podman-ollama](https://github.com/ericcurtin/podman-ollama) -- [gollama](https://github.com/sammcj/gollama) -- [ParLlama](https://github.com/paulrobello/parllama) -- [Ollama eBook Summary](https://github.com/cognitivetech/ollama-ebook-summary/) -- [Ollama Mixture of Experts (MOE) in 50 lines of code](https://github.com/rapidarchitect/ollama_moe) -- [vim-intelligence-bridge](https://github.com/pepo-ec/vim-intelligence-bridge) Simple interaction of "Ollama" with the Vim editor -- [x-cmd ollama](https://x-cmd.com/mod/ollama) -- [bb7](https://github.com/drunkwcodes/bb7) -- [SwollamaCLI](https://github.com/marcusziade/Swollama) bundled with the Swollama Swift package. [Demo](https://github.com/marcusziade/Swollama?tab=readme-ov-file#cli-usage) -- [aichat](https://github.com/sigoden/aichat) All-in-one LLM CLI tool featuring Shell Assistant, Chat-REPL, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more. -- [PowershAI](https://github.com/rrg92/powershai) PowerShell module that brings AI to terminal on Windows, including support for Ollama -- [DeepShell](https://github.com/Abyss-c0re/deepshell) Your self-hosted AI assistant. Interactive Shell, Files and Folders analysis. -- [orbiton](https://github.com/xyproto/orbiton) Configuration-free text editor and IDE with support for tab completion with Ollama. -- [orca-cli](https://github.com/molbal/orca-cli) Ollama Registry CLI Application - Browse, pull, and download models from Ollama Registry in your terminal. -- [GGUF-to-Ollama](https://github.com/jonathanhecl/gguf-to-ollama) - Importing GGUF to Ollama made easy (multiplatform) -- [AWS-Strands-With-Ollama](https://github.com/rapidarchitect/ollama_strands) - AWS Strands Agents with Ollama Examples -- [ollama-multirun](https://github.com/attogram/ollama-multirun) - A bash shell script to run a single prompt against any or all of your locally installed ollama models, saving the output and performance statistics as easily navigable web pages. ([Demo](https://attogram.github.io/ai_test_zone/)) -- [ollama-bash-toolshed](https://github.com/attogram/ollama-bash-toolshed) - Bash scripts to chat with tool using models. Add new tools to your shed with ease. Runs on Ollama. -- [VT Code](https://github.com/vinhnx/vtcode) - VT Code is a Rust-based terminal coding agent with semantic code intelligence via Tree-sitter. Ollama integration for running local/cloud models with configurable endpoints. +**Image size:** ~15GB -### Apple Vision Pro +#### Stage 2: Runtime Image (`runtime/Dockerfile`) -- [SwiftChat](https://github.com/aws-samples/swift-chat) (Cross-platform AI chat app supporting Apple Vision Pro via "Designed for iPad") -- [Enchanted](https://github.com/AugustDev/enchanted) +**Stage 2.1 - Compile** (FROM ollama37-builder) +1. Clone ollama37 source from GitHub +2. Configure with CMake ("CUDA 11" preset for compute 3.7) +3. Build C/C++/CUDA libraries +4. Build Go binary -### Database +**Stage 2.2 - Runtime** (FROM ollama37-builder) +1. Copy entire source tree (includes compiled artifacts) +2. Copy binary to /usr/local/bin/ollama +3. Setup LD_LIBRARY_PATH for runtime libraries +4. Configure server, expose ports, setup volumes -- [pgai](https://github.com/timescale/pgai) - PostgreSQL as a vector database (Create and search embeddings from Ollama models using pgvector) - - [Get started guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md) -- [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md) (Connects Ollama models with nearly 200 data platforms and apps) -- [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama) -- [Kangaroo](https://github.com/dbkangaroo/kangaroo) (AI-powered SQL client and admin tool for popular databases) +**Build time:** ~10 minutes -### Package managers +**Image size:** ~18GB (includes build environment + compiled Ollama) -- [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/) -- [Gentoo](https://github.com/gentoo/guru/tree/master/app-misc/ollama) -- [Homebrew](https://formulae.brew.sh/formula/ollama) -- [Helm Chart](https://artifacthub.io/packages/helm/ollama-helm/ollama) -- [Guix channel](https://codeberg.org/tusharhero/ollama-guix) -- [Nix package](https://search.nixos.org/packages?show=ollama&from=0&size=50&sort=relevance&type=packages&query=ollama) -- [Flox](https://flox.dev/blog/ollama-part-one) +### Why Both Stages Use Builder Base? -### Libraries +**Problem**: Compiled binaries have hardcoded library paths (via rpath/LD_LIBRARY_PATH) -- [LangChain](https://python.langchain.com/docs/integrations/chat/ollama/) and [LangChain.js](https://js.langchain.com/docs/integrations/chat/ollama/) with [example](https://js.langchain.com/docs/tutorials/local_rag/) -- [Firebase Genkit](https://firebase.google.com/docs/genkit/plugins/ollama) -- [crewAI](https://github.com/crewAIInc/crewAI) -- [Yacana](https://remembersoftwares.github.io/yacana/) (User-friendly multi-agent framework for brainstorming and executing predetermined flows with built-in tool integration) -- [Strands Agents](https://github.com/strands-agents/sdk-python) (A model-driven approach to building AI agents in just a few lines of code) -- [Spring AI](https://github.com/spring-projects/spring-ai) with [reference](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html) and [example](https://github.com/tzolov/ollama-tools) -- [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example) -- [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java) -- [LangChainRust](https://github.com/Abraxas-365/langchain-rust) with [example](https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs) -- [LangChain for .NET](https://github.com/tryAGI/LangChain) with [example](https://github.com/tryAGI/LangChain/blob/main/examples/LangChain.Samples.OpenAI/Program.cs) -- [LLPhant](https://github.com/theodo-group/LLPhant?tab=readme-ov-file#ollama) -- [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/llm/ollama/) and [LlamaIndexTS](https://ts.llamaindex.ai/modules/llms/available_llms/ollama) -- [LiteLLM](https://github.com/BerriAI/litellm) -- [OllamaFarm for Go](https://github.com/presbrey/ollamafarm) -- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp) -- [Ollama for Ruby](https://github.com/gbaptista/ollama-ai) -- [Ollama-rs for Rust](https://github.com/pepperoni21/ollama-rs) -- [Ollama-hpp for C++](https://github.com/jmont-dev/ollama-hpp) -- [Ollama4j for Java](https://github.com/ollama4j/ollama4j) -- [ModelFusion Typescript Library](https://modelfusion.dev/integration/model-provider/ollama) -- [OllamaKit for Swift](https://github.com/kevinhermawan/OllamaKit) -- [Ollama for Dart](https://github.com/breitburg/dart-ollama) -- [Ollama for Laravel](https://github.com/cloudstudio/ollama-laravel) -- [LangChainDart](https://github.com/davidmigloz/langchain_dart) -- [Semantic Kernel - Python](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/ai/ollama) -- [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md) -- [Elixir LangChain](https://github.com/brainlid/langchain) -- [Ollama for R - rollama](https://github.com/JBGruber/rollama) -- [Ollama for R - ollama-r](https://github.com/hauselin/ollama-r) -- [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex) -- [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama) -- [Testcontainers](https://testcontainers.com/modules/ollama/) -- [Portkey](https://portkey.ai/docs/welcome/integration-guides/ollama) -- [PromptingTools.jl](https://github.com/svilupp/PromptingTools.jl) with an [example](https://svilupp.github.io/PromptingTools.jl/dev/examples/working_with_ollama) -- [LlamaScript](https://github.com/Project-Llama/llamascript) -- [llm-axe](https://github.com/emirsahin1/llm-axe) (Python Toolkit for Building LLM Powered Apps) -- [Gollm](https://docs.gollm.co/examples/ollama-example) -- [Gollama for Golang](https://github.com/jonathanhecl/gollama) -- [Ollamaclient for Golang](https://github.com/xyproto/ollamaclient) -- [High-level function abstraction in Go](https://gitlab.com/tozd/go/fun) -- [Ollama PHP](https://github.com/ArdaGnsrn/ollama-php) -- [Agents-Flex for Java](https://github.com/agents-flex/agents-flex) with [example](https://github.com/agents-flex/agents-flex/tree/main/agents-flex-llm/agents-flex-llm-ollama/src/test/java/com/agentsflex/llm/ollama) -- [Parakeet](https://github.com/parakeet-nest/parakeet) is a GoLang library, made to simplify the development of small generative AI applications with Ollama. -- [Haverscript](https://github.com/andygill/haverscript) with [examples](https://github.com/andygill/haverscript/tree/main/examples) -- [Ollama for Swift](https://github.com/mattt/ollama-swift) -- [Swollama for Swift](https://github.com/marcusziade/Swollama) with [DocC](https://marcusziade.github.io/Swollama/documentation/swollama/) -- [GoLamify](https://github.com/prasad89/golamify) -- [Ollama for Haskell](https://github.com/tusharad/ollama-haskell) -- [multi-llm-ts](https://github.com/nbonamy/multi-llm-ts) (A Typescript/JavaScript library allowing access to different LLM in a unified API) -- [LlmTornado](https://github.com/lofcz/llmtornado) (C# library providing a unified interface for major FOSS & Commercial inference APIs) -- [Ollama for Zig](https://github.com/dravenk/ollama-zig) -- [Abso](https://github.com/lunary-ai/abso) (OpenAI-compatible TypeScript SDK for any LLM provider) -- [Nichey](https://github.com/goodreasonai/nichey) is a Python package for generating custom wikis for your research topic -- [Ollama for D](https://github.com/kassane/ollama-d) -- [OllamaPlusPlus](https://github.com/HardCodeDev777/OllamaPlusPlus) (Very simple C++ library for Ollama) -- [any-llm](https://github.com/mozilla-ai/any-llm) (A single interface to use different llm providers by [mozilla.ai](https://www.mozilla.ai/)) -- [any-agent](https://github.com/mozilla-ai/any-agent) (A single interface to use and evaluate different agent frameworks by [mozilla.ai](https://www.mozilla.ai/)) -- [Neuro SAN](https://github.com/cognizant-ai-lab/neuro-san-studio) (Data-driven multi-agent orchestration framework) with [example](https://github.com/cognizant-ai-lab/neuro-san-studio/blob/main/docs/user_guide.md#ollama) -- [achatbot-go](https://github.com/ai-bot-pro/achatbot-go) a multimodal(text/audio/image) chatbot. -- [Ollama Bash Lib](https://github.com/attogram/ollama-bash-lib) - A Bash Library for Ollama. Run LLM prompts straight from your shell, and more +**Solution**: Use identical base images for compile and runtime stages -### Mobile +**Benefits:** +- ✅ Library paths match between build and runtime +- ✅ All GCC 10 runtime libraries present +- ✅ All CUDA libraries at expected paths +- ✅ No complex artifact extraction/copying +- ✅ Guaranteed compatibility -- [SwiftChat](https://github.com/aws-samples/swift-chat) (Lightning-fast Cross-platform AI chat app with native UI for Android, iOS, and iPad) -- [Enchanted](https://github.com/AugustDev/enchanted) -- [Maid](https://github.com/Mobile-Artificial-Intelligence/maid) -- [Ollama App](https://github.com/JHubi1/ollama-app) (Modern and easy-to-use multi-platform client for Ollama) -- [ConfiChat](https://github.com/1runeberg/confichat) (Lightweight, standalone, multi-platform, and privacy-focused LLM chat interface with optional encryption) -- [Ollama Android Chat](https://github.com/sunshine0523/OllamaServer) (No need for Termux, start the Ollama service with one click on an Android device) -- [Reins](https://github.com/ibrahimcetin/reins) (Easily tweak parameters, customize system prompts per chat, and enhance your AI experiments with reasoning model support.) +**Trade-off:** Larger runtime image (~18GB) vs complexity and reliability issues -### Extensions & Plugins +### Alternative: Single-Stage Build -- [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama) -- [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel) -- [Continue](https://github.com/continuedev/continue) -- [Vibe](https://github.com/thewh1teagle/vibe) (Transcribe and analyze meetings with Ollama) -- [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama) -- [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq) -- [NotesOllama](https://github.com/andersrex/notesollama) (Apple Notes Ollama plugin) -- [Dagger Chatbot](https://github.com/samalba/dagger-chatbot) -- [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot) -- [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram) -- [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation) -- [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama) -- [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot) -- [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support) -- [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot) -- [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt) -- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama) -- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama) -- [Ollama Copilot](https://github.com/bernardo-bruning/ollama-copilot) (Proxy that allows you to use Ollama as a copilot like GitHub Copilot) -- [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama) -- [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and Hugging Face) -- [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension) -- [Plasmoid Ollama Control](https://github.com/imoize/plasmoid-ollamacontrol) (KDE Plasma extension that allows you to quickly manage/control Ollama model) -- [AI Telegram Bot](https://github.com/tusharhero/aitelegrambot) (Telegram bot using Ollama in backend) -- [AI ST Completion](https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (Sublime Text 4 AI assistant plugin with Ollama support) -- [Discord-Ollama Chat Bot](https://github.com/kevinthedang/discord-ollama) (Generalized TypeScript Discord Bot w/ Tuning Documentation) -- [ChatGPTBox: All in one browser extension](https://github.com/josStorer/chatGPTBox) with [Integrating Tutorial](https://github.com/josStorer/chatGPTBox/issues/616#issuecomment-1975186467) -- [Discord AI chat/moderation bot](https://github.com/rapmd73/Companion) Chat/moderation bot written in python. Uses Ollama to create personalities. -- [Headless Ollama](https://github.com/nischalj10/headless-ollama) (Scripts to automatically install ollama client & models on any OS for apps that depend on ollama server) -- [Terraform AWS Ollama & Open WebUI](https://github.com/xuyangbocn/terraform-aws-self-host-llm) (A Terraform module to deploy on AWS a ready-to-use Ollama service, together with its front-end Open WebUI service.) -- [node-red-contrib-ollama](https://github.com/jakubburkiewicz/node-red-contrib-ollama) -- [Local AI Helper](https://github.com/ivostoykov/localAI) (Chrome and Firefox extensions that enable interactions with the active tab and customisable API endpoints. Includes secure storage for user prompts.) -- [vnc-lm](https://github.com/jake83741/vnc-lm) (Discord bot for messaging with LLMs through Ollama and LiteLLM. Seamlessly move between local and flagship models.) -- [LSP-AI](https://github.com/SilasMarvin/lsp-ai) (Open-source language server for AI-powered functionality) -- [QodeAssist](https://github.com/Palm1r/QodeAssist) (AI-powered coding assistant plugin for Qt Creator) -- [Obsidian Quiz Generator plugin](https://github.com/ECuiDev/obsidian-quiz-generator) -- [AI Summmary Helper plugin](https://github.com/philffm/ai-summary-helper) -- [TextCraft](https://github.com/suncloudsmoon/TextCraft) (Copilot in Word alternative using Ollama) -- [Alfred Ollama](https://github.com/zeitlings/alfred-ollama) (Alfred Workflow) -- [TextLLaMA](https://github.com/adarshM84/TextLLaMA) A Chrome Extension that helps you write emails, correct grammar, and translate into any language -- [Simple-Discord-AI](https://github.com/zyphixor/simple-discord-ai) -- [LLM Telegram Bot](https://github.com/innightwolfsleep/llm_telegram_bot) (telegram bot, primary for RP. Oobabooga-like buttons, [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) API integration e.t.c) -- [mcp-llm](https://github.com/sammcj/mcp-llm) (MCP Server to allow LLMs to call other LLMs) -- [SimpleOllamaUnity](https://github.com/HardCodeDev777/SimpleOllamaUnity) (Unity Engine extension for communicating with Ollama in a few lines of code. Also works at runtime) -- [UnityCodeLama](https://github.com/HardCodeDev777/UnityCodeLama) (Unity Edtior tool to analyze scripts via Ollama) -- [NativeMind](https://github.com/NativeMindBrowser/NativeMindExtension) (Private, on-device AI Assistant, no cloud dependencies) -- [GMAI - Gradle Managed AI](https://gmai.premex.se/) (Gradle plugin for automated Ollama lifecycle management during build phases) -- [NOMYO Router](https://github.com/nomyo-ai/nomyo-router) (A transparent Ollama proxy with model deployment aware routing which auto-manages multiple Ollama instances in a given network) +See `Dockerfile.single-stage.archived` for the original single-stage design that inspired this architecture. -### Supported backends +## Build Commands -- [llama.cpp](https://github.com/ggml-org/llama.cpp) project founded by Georgi Gerganov. +### Using the Makefile -### Observability -- [Opik](https://www.comet.com/docs/opik/cookbook/ollama) is an open-source platform to debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. Opik supports native intergration to Ollama. -- [Lunary](https://lunary.ai/docs/integrations/ollama) is the leading open-source LLM observability platform. It provides a variety of enterprise-grade features such as real-time analytics, prompt templates management, PII masking, and comprehensive agent tracing. -- [OpenLIT](https://github.com/openlit/openlit) is an OpenTelemetry-native tool for monitoring Ollama Applications & GPUs using traces and metrics. -- [HoneyHive](https://docs.honeyhive.ai/integrations/ollama) is an AI observability and evaluation platform for AI agents. Use HoneyHive to evaluate agent performance, interrogate failures, and monitor quality in production. -- [Langfuse](https://langfuse.com/docs/integrations/ollama) is an open source LLM observability platform that enables teams to collaboratively monitor, evaluate and debug AI applications. -- [MLflow Tracing](https://mlflow.org/docs/latest/llms/tracing/index.html#automatic-tracing) is an open source LLM observability tool with a convenient API to log and visualize traces, making it easy to debug and evaluate GenAI applications. +```bash +# Build both builder and runtime images +make build + +# Build only builder image +make build-builder + +# Build only runtime image (will auto-build builder if needed) +make build-runtime + +# Remove all images +make clean + +# Show help +make help +``` + +### Direct Docker Commands + +```bash +# Build builder image +docker build -f builder/Dockerfile -t ollama37-builder:latest builder/ + +# Build runtime image +docker build -f runtime/Dockerfile -t ollama37:latest . +``` + +## Runtime Management + +### Using Docker Compose (Recommended) + +```bash +# Start server +docker compose up -d + +# View logs (live tail) +docker compose logs -f + +# Stop server +docker compose down + +# Stop and remove volumes +docker compose down -v + +# Restart server +docker compose restart +``` + +### Manual Docker Commands + +```bash +# Start container +docker run -d \ + --name ollama37 \ + --runtime=nvidia \ + --gpus all \ + -p 11434:11434 \ + -v ollama-data:/root/.ollama \ + ollama37:latest + +# View logs +docker logs -f ollama37 + +# Stop container +docker stop ollama37 +docker rm ollama37 + +# Shell access +docker exec -it ollama37 bash +``` + +## Configuration + +### Environment Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `OLLAMA_HOST` | `0.0.0.0:11434` | Server listen address | +| `LD_LIBRARY_PATH` | `/usr/local/src/ollama37/build/lib/ollama:/usr/local/lib64:/usr/local/cuda-11.4/lib64:/usr/lib64` | Library search path | +| `NVIDIA_VISIBLE_DEVICES` | `all` | Which GPUs to use | +| `NVIDIA_DRIVER_CAPABILITIES` | `compute,utility` | GPU capabilities | +| `OLLAMA_DEBUG` | (unset) | Enable verbose Ollama logging | +| `GGML_CUDA_DEBUG` | (unset) | Enable CUDA/CUBLAS debug logging | + +### Volume Mounts + +- `/root/.ollama` - Model storage (use Docker volume `ollama-data`) + +### Customizing docker-compose.yml + +```yaml +# Change port +ports: + - "11435:11434" # Host:Container + +# Use specific GPU +environment: + - NVIDIA_VISIBLE_DEVICES=0 # Use GPU 0 only + +# Enable debug logging +environment: + - OLLAMA_DEBUG=1 + - GGML_CUDA_DEBUG=1 +``` + +## GPU Support + +### Supported Compute Capabilities +- **3.7** - Tesla K80 (primary target) +- **5.0-5.2** - Maxwell (GTX 900 series) +- **6.0-6.1** - Pascal (GTX 10 series) +- **7.0-7.5** - Volta, Turing (RTX 20 series) +- **8.0-8.6** - Ampere (RTX 30 series) + +### Tesla K80 Recommendations + +**VRAM:** 12GB per GPU (24GB for dual-GPU K80) + +**Model sizes:** +- Small (1-4B): Full precision or Q8 quantization +- Medium (7-8B): Q4_K_M quantization +- Large (13B+): Q4_0 quantization or multi-GPU + +**Tested models:** +- ✅ gemma3:4b +- ✅ gpt-oss +- ✅ deepseek-r1 + +**Multi-GPU:** +```bash +# Use all GPUs +docker run --gpus all ... + +# Use specific GPU +docker run --gpus '"device=0"' ... + +# Use multiple specific GPUs +docker run --gpus '"device=0,1"' ... +``` + +## Troubleshooting + +### GPU not detected + +```bash +# Check GPU visibility in container +docker exec ollama37 nvidia-smi + +# Check CUDA libraries +docker exec ollama37 ldconfig -p | grep cuda + +# Check NVIDIA runtime +docker info | grep -i runtime +``` + +### NVIDIA UVM Device Files Missing + +**Symptom:** `nvidia-smi` works inside the container, but Ollama reports **0 GPUs detected** (CUDA runtime cannot find GPUs). + +**Root Cause:** + +The nvidia-uvm device files were missing on the host system. + +While the `nvidia-uvm` kernel module was loaded, the device files `/dev/nvidia-uvm` and `/dev/nvidia-uvm-tools` were not created. + +These device files are critical for CUDA runtime: +- `nvidia-smi` only needs the basic driver (works without UVM) +- **CUDA applications require UVM** for GPU memory allocation and kernel execution +- Without UVM devices: CUDA reports 0 GPUs even though they exist + +**The Fix:** + +Run this single command on the **host system** (not inside the container): + +```bash +nvidia-modprobe -u -c=0 +``` + +This creates the required device files: +- `/dev/nvidia-uvm` (major 239, minor 0) +- `/dev/nvidia-uvm-tools` (major 239, minor 1) + +Then restart the container: + +```bash +docker compose restart +``` + +**Result:** GPUs now properly detected by CUDA runtime. + +**Verify the fix:** + +```bash +# Check UVM device files exist on host +ls -l /dev/nvidia-uvm* + +# Check Ollama logs for GPU detection +docker compose logs | grep -i gpu + +# You should see output like: +# ollama37 | time=... level=INFO msg="Nvidia GPU detected" name="Tesla K80" vram=11441 MiB +# ollama37 | time=... level=INFO msg="Nvidia GPU detected" name="Tesla K80" vram=11441 MiB +``` + +### Model fails to load + +```bash +# Check logs with CUDA debug +docker run --rm --runtime=nvidia --gpus all \ + -e OLLAMA_DEBUG=1 \ + -e GGML_CUDA_DEBUG=1 \ + -p 11434:11434 \ + ollama37:latest + +# Check library paths +docker exec ollama37 bash -c 'echo $LD_LIBRARY_PATH' + +# Verify CUBLAS functions +docker exec ollama37 bash -c 'ldd /usr/local/bin/ollama | grep cublas' +``` + +### Build fails with "out of memory" + +```bash +# Edit runtime/Dockerfile line for cmake build +# Change: cmake --build build -j$(nproc) +# To: cmake --build build -j2 + +# Or set Docker memory limit +docker build --memory=8g ... +``` + +### Port already in use + +```bash +# Find process using port 11434 +sudo lsof -i :11434 + +# Kill the process or change port in docker-compose.yml +ports: + - "11435:11434" +``` + +### Build cache issues + +```bash +# Rebuild runtime image without cache +docker build --no-cache -f runtime/Dockerfile -t ollama37:latest . + +# Rebuild builder image without cache +docker build --no-cache -f builder/Dockerfile -t ollama37-builder:latest builder/ + +# Remove all images and rebuild +make clean +make build +``` + +## Rebuilding + +### Rebuild with latest code + +```bash +# Runtime Dockerfile clones from GitHub, so rebuild to get latest +make build-runtime + +# Restart container +docker compose restart +``` + +### Rebuild everything from scratch + +```bash +# Stop and remove containers +docker compose down -v + +# Remove images +make clean + +# Rebuild all +make build + +# Start fresh +docker compose up -d +``` + +### Rebuild only builder (rare) + +```bash +# Only needed if you change CUDA/GCC/CMake/Go versions +make clean +make build-builder +make build-runtime +``` + +## Development + +### Modifying the build + +1. **Change build tools** - Edit `builder/Dockerfile` +2. **Change Ollama build process** - Edit `runtime/Dockerfile` +3. **Change build orchestration** - Edit `Makefile` +4. **Change runtime config** - Edit `docker-compose.yml` + +### Testing changes + +```bash +# Build with your changes +make build + +# Run and test +docker compose up -d +docker compose logs -f + +# If issues, check inside container +docker exec -it ollama37 bash +``` + +### Shell access for debugging + +```bash +# Enter running container +docker exec -it ollama37 bash + +# Check GPU +nvidia-smi + +# Check libraries +ldd /usr/local/bin/ollama +ldconfig -p | grep -E "cuda|cublas" + +# Test binary +/usr/local/bin/ollama --version +``` + +## Image Sizes + +| Image | Size | Contents | +|-------|------|----------| +| `ollama37-builder:latest` | ~15GB | CUDA, GCC, CMake, Go, build deps | +| `ollama37:latest` | ~18GB | Builder + Ollama binary + libraries | + +**Note**: Large size ensures all runtime dependencies are present and properly linked. + +## Build Times + +| Task | First Build | Cached Build | +|------|-------------|--------------| +| Builder image | ~90 min | <1 min | +| Runtime image | ~10 min | ~10 min | +| **Total** | **~100 min** | **~10 min** | + +**Breakdown (first build):** +- GCC 10: ~60 min +- CMake 4: ~8 min +- CUDA toolkit: ~10 min +- Go install: ~1 min +- Ollama build: ~10 min + +## Documentation + +- **[../CLAUDE.md](../CLAUDE.md)** - Project goals, implementation details, and technical notes +- **[Upstream Ollama](https://github.com/ollama/ollama)** - Original Ollama project +- **[dogkeeper886/ollama37](https://github.com/dogkeeper886/ollama37)** - This fork with K80 support + +## License + +MIT (same as upstream Ollama)