Mirror of https://github.com/dogkeeper886/ollama37.git (synced 2025-12-18 03:37:09 +00:00)
Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from the official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in the "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x release supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Due to the GCC 10.5 limitation, newer CPU optimizations were sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added the -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged the latest llama.cpp changes, including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale
This fork maintains Tesla K80 GPU support (compute 3.7), which was dropped from official Ollama because of its legacy driver/CUDA requirements. The toolchain constraint forms a fixed chain:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → no AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern LLMs on legacy but still capable Tesla K80 hardware (12 GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
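For readers reproducing the build, a minimal sketch of the configure invocation implied by the notes above. The preset name, architecture value, and warning flag are taken from the commit text; the exact options in the repository's CMakePresets.json may differ.

```bash
# Hypothetical Tesla K80 configure/build commands (preset and flag names
# assumed from the commit notes, not verified against the repository)
cmake --preset "CUDA 11" \
      -DCMAKE_CUDA_ARCHITECTURES="37-virtual" \
      -DCMAKE_CUDA_FLAGS="-Wno-deprecated-gpu-targets"
cmake --build --preset "CUDA 11"
```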
docs/integrations/cline.mdx (new file, 38 lines)
@@ -0,0 +1,38 @@
---
title: Cline
---

## Install

Install [Cline](https://docs.cline.bot/getting-started/installing-cline) in your IDE.

## Usage with Ollama

1. Open Cline settings > `API Configuration` and set `API Provider` to `Ollama`
2. Select a model under `Model` or type one (e.g. `qwen3`)
3. Update the context window to at least 32K tokens under `Context Window`

<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/cline-settings.png"
    alt="Cline settings configuration showing API Provider set to Ollama"
    width="50%"
  />
</div>
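If Cline cannot connect, a quick way to confirm the local server is running (assuming a default install on port 11434):

```bash
# Check that the Ollama server responds on the default port
curl http://localhost:11434/api/version
```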

## Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Click on `Use custom base URL` and set it to `https://ollama.com`
3. Enter your **Ollama API Key**
4. Select a model from the list

### Recommended Models

- `qwen3-coder:480b`
- `deepseek-v3.1:671b`
docs/integrations/codex.mdx (new file, 56 lines)
@@ -0,0 +1,56 @@
---
title: Codex
---

## Install

Install the [Codex CLI](https://developers.openai.com/codex/cli/):

```
npm install -g @openai/codex
```

## Usage with Ollama

<Note>Codex requires a larger context window. It is recommended to use a context window of at least 32K tokens.</Note>

To use `codex` with Ollama, use the `--oss` flag:

```
codex --oss
```

### Changing Models

By default, Codex uses the local `gpt-oss:20b` model. You can specify a different model with the `-m` flag:

```
codex --oss -m gpt-oss:120b
```

### Cloud Models

To use a cloud model, pass a model name with the `-cloud` suffix:

```
codex --oss -m gpt-oss:120b-cloud
```

## Connecting to ollama.com

Create an [API key](https://ollama.com/settings/keys) from ollama.com and export it as `OLLAMA_API_KEY`.
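For example, in the shell where you run Codex (the value below is a placeholder):

```bash
# Placeholder value — use the key created at https://ollama.com/settings/keys
export OLLAMA_API_KEY="your-api-key"
```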

To use ollama.com directly, edit your `~/.codex/config.toml` file to point to ollama.com:

```toml
model = "gpt-oss:120b"
model_provider = "ollama"

[model_providers.ollama]
name = "Ollama"
base_url = "https://ollama.com/v1"
env_key = "OLLAMA_API_KEY"
```

Run `codex` in a new terminal to load the new settings.
docs/integrations/droid.mdx (new file, 76 lines)
@@ -0,0 +1,76 @@
---
title: Droid
---

## Install

Install the [Droid CLI](https://factory.ai/):

```bash
curl -fsSL https://app.factory.ai/cli | sh
```

<Note>Droid requires a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>

## Usage with Ollama

Add a local configuration block to `~/.factory/config.json`:

```json
{
  "custom_models": [
    {
      "model_display_name": "qwen3-coder [Ollama]",
      "model": "qwen3-coder",
      "base_url": "http://localhost:11434/v1/",
      "api_key": "not-needed",
      "provider": "generic-chat-completion-api",
      "max_tokens": 32000
    }
  ]
}
```
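To confirm the `base_url` above points at a running server, you can list the models exposed through Ollama's OpenAI-compatible endpoint (assuming a default local install on port 11434):

```bash
# List models via the OpenAI-compatible API used in the config above
curl http://localhost:11434/v1/models
```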

## Cloud Models

`qwen3-coder:480b-cloud` is the recommended model for use with Droid.

Add the cloud configuration block to `~/.factory/config.json`:

```json
{
  "custom_models": [
    {
      "model_display_name": "qwen3-coder [Ollama Cloud]",
      "model": "qwen3-coder:480b-cloud",
      "base_url": "http://localhost:11434/v1/",
      "api_key": "not-needed",
      "provider": "generic-chat-completion-api",
      "max_tokens": 128000
    }
  ]
}
```

## Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) from ollama.com and export it as `OLLAMA_API_KEY`.
2. Add the cloud configuration block to `~/.factory/config.json`:

```json
{
  "custom_models": [
    {
      "model_display_name": "qwen3-coder [Ollama Cloud]",
      "model": "qwen3-coder:480b",
      "base_url": "https://ollama.com/v1/",
      "api_key": "OLLAMA_API_KEY",
      "provider": "generic-chat-completion-api",
      "max_tokens": 128000
    }
  ]
}
```

Run `droid` in a new terminal to load the new settings.
docs/integrations/goose.mdx (new file, 49 lines)
@@ -0,0 +1,49 @@
---
title: Goose
---

## Goose Desktop

Install [Goose](https://block.github.io/goose/docs/getting-started/installation/) Desktop.

### Usage with Ollama

1. In Goose, open **Settings** → **Configure Provider**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/goose-settings.png"
    alt="Goose settings panel"
    width="75%"
  />
</div>

2. Find **Ollama**, click **Configure**
3. Confirm **API Host** is `http://localhost:11434` and click **Submit**

### Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
2. In Goose, set **API Host** to `https://ollama.com`

## Goose CLI

Install the [Goose](https://block.github.io/goose/docs/getting-started/installation/) CLI.

### Usage with Ollama

1. Run `goose configure`
2. Select **Configure Providers** and select **Ollama**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/goose-cli.png"
    alt="Goose CLI"
    width="50%"
  />
</div>

3. Enter a model name (e.g. `qwen3`)
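The model you enter must already be available on the local server; for example, assuming the `qwen3` model referenced above:

```bash
# Download the model so Goose can use it locally
ollama pull qwen3
```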

### Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
2. Run `goose configure`
3. Select **Configure Providers** and select **Ollama**
4. Update **OLLAMA_HOST** to `https://ollama.com`
docs/integrations/jetbrains.mdx (new file, 47 lines)
@@ -0,0 +1,47 @@
---
title: JetBrains
---

<Note>This example uses **IntelliJ**; the same steps apply to other JetBrains IDEs (e.g., PyCharm).</Note>

## Install

Install [IntelliJ](https://www.jetbrains.com/idea/).

## Usage with Ollama

<Note>
To use **Ollama**, you will need a [JetBrains AI Subscription](https://www.jetbrains.com/ai-ides/buy/?section=personal&billing=yearly).
</Note>

1. In IntelliJ, click the **chat icon** in the right sidebar

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/intellij-chat-sidebar.png"
    alt="IntelliJ chat sidebar"
    width="50%"
  />
</div>

2. Select the **current model** in the sidebar, then click **Set up Local Models**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/intellij-current-model.png"
    alt="IntelliJ model selector in the bottom right corner"
    width="50%"
  />
</div>

3. Under **Third Party AI Providers**, choose **Ollama**
4. Confirm the **Host URL** is `http://localhost:11434`, then click **OK**
5. Once connected, select a model under **Local models by Ollama**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/intellij-local-models.png"
    alt="IntelliJ local models provided by Ollama"
    width="50%"
  />
</div>
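The models listed under **Local models by Ollama** are whatever the local server has already pulled; you can check that list from a terminal (assuming a default install):

```bash
# Show the models the local Ollama server has available
ollama list
```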
docs/integrations/n8n.mdx (new file, 53 lines)
@@ -0,0 +1,53 @@
---
title: n8n
---

## Install

Install [n8n](https://docs.n8n.io/choose-n8n/).

## Using Ollama Locally

1. In the top right corner, click the dropdown and select **Create Credential**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/n8n-credential-creation.png"
    alt="Create an n8n credential"
    width="75%"
  />
</div>

2. Under **Add new credential**, select **Ollama**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/n8n-ollama-form.png"
    alt="Select Ollama under Credential"
    width="75%"
  />
</div>

3. Confirm the Base URL is set to `http://localhost:11434` and click **Save**

<Note>If connecting to `http://localhost:11434` fails, use `http://127.0.0.1:11434`</Note>

4. When creating a new workflow, select **Add a first step** and select an **Ollama node**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/n8n-chat-node.png"
    alt="Add a first step with an Ollama node"
    width="75%"
  />
</div>

5. Select your model of choice (e.g. `qwen3-coder`)

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/n8n-models.png"
    alt="Select an Ollama model"
    width="75%"
  />
</div>
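The node's model dropdown typically shows what the configured server has available, so pull the example model locally first:

```bash
# Download the model referenced in step 5
ollama pull qwen3-coder
```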

## Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**
2. In n8n, click **Create Credential** and select **Ollama**
3. Set the **API URL** to `https://ollama.com`
4. Enter your **API Key** and click **Save**
docs/integrations/roo-code.mdx (new file, 30 lines)
@@ -0,0 +1,30 @@
---
title: Roo Code
---

## Install

Install [Roo Code](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline) from the VS Code Marketplace.

## Usage with Ollama

1. Open Roo Code in VS Code and click the **gear icon** in the top right corner of the Roo Code window to open **Provider Settings**
2. Set `API Provider` to `Ollama`
3. (Optional) Update `Base URL` if your Ollama instance is running remotely. The default is `http://localhost:11434`
4. Enter a valid `Model ID` (for example `qwen3` or `qwen3-coder:480b-cloud`)
5. Adjust the `Context Window` to at least 32K tokens for coding tasks

<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
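If `Base URL` points at a remote instance (step 3 above), the remote Ollama server must be reachable over the network. A minimal sketch, assuming you want it to listen on all interfaces — restrict access as appropriate for your environment:

```bash
# On the remote machine: make Ollama listen on all interfaces, not just localhost
OLLAMA_HOST=0.0.0.0 ollama serve
```

Then set `Base URL` to `http://<remote-host>:11434`, where `<remote-host>` is a placeholder for that machine's address.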

## Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Enable `Use custom base URL` and set it to `https://ollama.com`
3. Enter your **Ollama API Key**
4. Select a model from the list

### Recommended Models

- `qwen3-coder:480b`
- `deepseek-v3.1:671b`
docs/integrations/vscode.mdx (new file, 34 lines)
@@ -0,0 +1,34 @@
---
title: VS Code
---

## Install

Install [VS Code](https://code.visualstudio.com/download).

## Usage with Ollama

1. Open the Copilot sidebar from the top right of the window

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/vscode-sidebar.png"
    alt="VS Code chat sidebar"
    width="75%"
  />
</div>

2. Select the model dropdown > **Manage models**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/vscode-models.png"
    alt="VS Code model picker"
    width="75%"
  />
</div>

3. Select **Ollama** in the provider dropdown, then select the desired models (e.g. `qwen3`, `qwen3-coder:480b-cloud`)

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/vscode-model-options.png"
    alt="VS Code model options dropdown"
    width="75%"
  />
</div>
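Local models such as `qwen3` need to be available on the local server before they appear as options; for example:

```bash
# Download the local model used in the example above
ollama pull qwen3
```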
docs/integrations/xcode.mdx (new file, 45 lines)
@@ -0,0 +1,45 @@
---
title: Xcode
---

## Install

Install [Xcode](https://developer.apple.com/xcode/).

## Usage with Ollama

<Note>Ensure Apple Intelligence is set up and that you are running Xcode 26.0 or later.</Note>

1. Click **Xcode** in the top left corner > **Settings**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/xcode-intelligence-window.png"
    alt="Xcode Intelligence window"
    width="50%"
  />
</div>

2. Select **Locally Hosted**, enter port **11434** and click **Add**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/xcode-locally-hosted.png"
    alt="Xcode locally hosted provider settings"
    width="50%"
  />
</div>

3. Select the **star icon** in the top left corner and click the **dropdown**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/xcode-chat-icon.png"
    alt="Xcode chat model dropdown"
    width="50%"
  />
</div>

4. Click **My Account** and select your desired model
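Xcode talks to whatever is listening on the port entered in step 2; with a default local install you can confirm the server is up from a terminal:

```bash
# Check that the Ollama server responds on port 11434
curl http://localhost:11434/api/version
```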

## Connecting to ollama.com directly

1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Select **Internet Hosted** and enter the URL `https://ollama.com`
3. Enter your **Ollama API Key** and click **Add**
docs/integrations/zed.mdx (new file, 38 lines)
@@ -0,0 +1,38 @@
---
title: Zed
---

## Install

Install [Zed](https://zed.dev/download).

## Usage with Ollama

1. In Zed, click the **star icon** in the bottom-right corner, then select **Configure**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/zed-settings.png"
    alt="Zed star icon in the bottom right corner"
    width="50%"
  />
</div>

2. Under **LLM Providers**, choose **Ollama**
3. Confirm the **Host URL** is `http://localhost:11434`, then click **Connect**
4. Once connected, select a model under **Ollama**

<div style={{ display: 'flex', justifyContent: 'center' }}>
  <img
    src="/images/zed-ollama-dropdown.png"
    alt="Zed Ollama model dropdown"
    width="50%"
  />
</div>
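The models shown under **Ollama** come from the local server; you can check what it has available from a terminal (assuming a default install):

```bash
# Show the models the local Ollama server has available
ollama list
```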

## Connecting to ollama.com

1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**
2. In Zed, open the **star icon** → **Configure**
3. Under **LLM Providers**, select **Ollama**
4. Set the **API URL** to `https://ollama.com`