Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

This commit represents a complete rework after pulling the latest changes from the
official ollama/ollama repository and re-applying the Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility
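
A minimal configure sketch with compute 3.7 included (hedged: the exact architecture list and the preset's build directory are assumptions, not copied from the tree):

```bash
# Configure and build the CUDA backend with sm_37 (Tesla K80) included.
# "37-virtual" emits PTX that the driver JIT-compiles at load time.
cmake --preset "CUDA 11" -DCMAKE_CUDA_ARCHITECTURES="37-virtual;50;61;70;75;80"
cmake --build build --parallel   # assumes the preset writes its build tree to ./build
```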

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)
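
A quick way to confirm the pinned toolchain on the build host (hedged sketch; binary names may differ if your distro versions them, e.g. `gcc-10`):

```bash
# Verify the legacy toolchain required for compute 3.7 builds.
nvidia-smi --query-gpu=name,driver_version --format=csv   # expect Tesla K80 / 470.xx
nvcc --version | grep release                             # expect "release 11.4"
gcc --version | head -n1                                  # expect gcc 10.5.x
```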

### CPU Architecture Trade-offs
Due to the GCC 10.5 limitation, newer CPU optimizations were sacrificed:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)
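
The flag set can be sanity-checked against the pinned compiler; a minimal sketch, assuming the `gcc` on PATH is the 10.5 toolchain above:

```bash
# GCC 10.5 accepts every ISA flag this build relies on...
echo 'int main(void){return 0;}' > /tmp/isa_check.c
gcc -msse4.2 -mavx -mf16c -mavx2 -mbmi2 -mfma -o /tmp/isa_check /tmp/isa_check.c \
  && echo "baseline ISA flags OK"
# ...but rejects AVX_VNNI, which first shipped with GCC 11.
gcc -mavxvnni -o /tmp/isa_check /tmp/isa_check.c \
  || echo "AVX_VNNI unavailable (expected with GCC 10.5)"
```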

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI
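
For reference, the warning suppression can also be passed at configure time instead of editing CMakeLists.txt; a hedged, roughly equivalent illustration:

```bash
# Silence nvcc's deprecation warnings for sm_37 without changing other flags.
cmake --preset "CUDA 11" -DCMAKE_CUDA_FLAGS="-Wno-deprecated-gpu-targets"
```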

### Upstream Sync
Merged the latest llama.cpp changes, including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7), which was dropped from
official Ollama because of its legacy driver/CUDA requirements. The toolchain
constraints form a rigid dependency chain:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date: 2025-11-05 14:03:05 +08:00
Parent: fabe2c5cb7
Commit: ef14fb5b26
817 changed files with 241634 additions and 70888 deletions


@@ -0,0 +1,38 @@
---
title: Cline
---
## Install
Install [Cline](https://docs.cline.bot/getting-started/installing-cline) in your IDE.
## Usage with Ollama
1. Open Cline settings > `API Configuration` and set `API Provider` to `Ollama`
2. Select a model under `Model` or type one (e.g. `qwen3`)
3. Update the context window to at least 32K tokens under `Context Window`
<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/cline-settings.png"
alt="Cline settings configuration showing API Provider set to Ollama"
width="50%"
/>
</div>
## Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Click on `Use custom base URL` and set it to `https://ollama.com`
3. Enter your **Ollama API Key**
4. Select a model from the list
### Recommended Models
- `qwen3-coder:480b`
- `deepseek-v3.1:671b`


@@ -0,0 +1,56 @@
---
title: Codex
---
## Install
Install the [Codex CLI](https://developers.openai.com/codex/cli/):
```
npm install -g @openai/codex
```
## Usage with Ollama
<Note>Codex requires a larger context window. It is recommended to use a context window of at least 32K tokens.</Note>
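One way to raise the limit on the Ollama side is a server-wide default; a minimal sketch, assuming your Ollama build honors the `OLLAMA_CONTEXT_LENGTH` environment variable:
```bash
# Restart the Ollama server with a 32K default context window
# (assumption: OLLAMA_CONTEXT_LENGTH is supported by your Ollama version).
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
```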
To use `codex` with Ollama, use the `--oss` flag:
```
codex --oss
```
### Changing Models
By default, codex will use the local `gpt-oss:20b` model. However, you can specify a different model with the `-m` flag:
```
codex --oss -m gpt-oss:120b
```
### Cloud Models
To run the model on Ollama's cloud instead, use the `-cloud` variant of the model name:
```
codex --oss -m gpt-oss:120b-cloud
```
## Connecting to ollama.com
Create an [API key](https://ollama.com/settings/keys) from ollama.com and export it as `OLLAMA_API_KEY`.
To use ollama.com directly, edit your `~/.codex/config.toml` file to point to ollama.com.
```toml
model = "gpt-oss:120b"
model_provider = "ollama"
[model_providers.ollama]
name = "Ollama"
base_url = "https://ollama.com/v1"
env_key = "OLLAMA_API_KEY"
```
Run `codex` in a new terminal to load the new settings.
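Optionally confirm the key and endpoint respond before launching Codex; this hedged check assumes ollama.com serves an OpenAI-compatible `/v1/models` route under the base URL configured above:
```bash
# List the models visible to your API key (assumption: /v1/models is exposed).
curl -s https://ollama.com/v1/models -H "Authorization: Bearer $OLLAMA_API_KEY"
```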


@@ -0,0 +1,76 @@
---
title: Droid
---
## Install
Install the [Droid CLI](https://factory.ai/):
```bash
curl -fsSL https://app.factory.ai/cli | sh
```
<Note>Droid requires a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
## Usage with Ollama
Add a local configuration block to `~/.factory/config.json`:
```json
{
"custom_models": [
{
"model_display_name": "qwen3-coder [Ollama]",
"model": "qwen3-coder",
"base_url": "http://localhost:11434/v1/",
"api_key": "not-needed",
"provider": "generic-chat-completion-api",
"max_tokens": 32000
}
]
}
```
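To confirm the endpoint Droid will call is reachable, a quick hedged check (Ollama's OpenAI-compatible API is assumed to serve `/v1/models` on port 11434):
```bash
# Verify Ollama is running and pull the model referenced in the config if needed.
curl -s http://localhost:11434/v1/models
ollama pull qwen3-coder
```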
## Cloud Models
`qwen3-coder:480b-cloud` is the recommended model for use with Droid.
Add the cloud configuration block to `~/.factory/config.json`:
```json
{
"custom_models": [
{
"model_display_name": "qwen3-coder [Ollama Cloud]",
"model": "qwen3-coder:480b-cloud",
"base_url": "http://localhost:11434/v1/",
"api_key": "not-needed",
"provider": "generic-chat-completion-api",
"max_tokens": 128000
}
]
}
```
## Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com and export it as `OLLAMA_API_KEY`.
2. Add the cloud configuration block to `~/.factory/config.json`:
```json
{
"custom_models": [
{
"model_display_name": "qwen3-coder [Ollama Cloud]",
"model": "qwen3-coder:480b",
"base_url": "https://ollama.com/v1/",
"api_key": "OLLAMA_API_KEY",
"provider": "generic-chat-completion-api",
"max_tokens": 128000
}
]
}
```
Run `droid` in a new terminal to load the new settings.


@@ -0,0 +1,49 @@
---
title: Goose
---
## Goose Desktop
Install [Goose](https://block.github.io/goose/docs/getting-started/installation/) Desktop.
### Usage with Ollama
1. In Goose, open **Settings** → **Configure Provider**.
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/goose-settings.png"
alt="Goose settings Panel"
width="75%"
/>
</div>
2. Find **Ollama**, click **Configure**
3. Confirm **API Host** is `http://localhost:11434` and click Submit
### Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
2. In Goose, set **API Host** to `https://ollama.com`
## Goose CLI
Install the [Goose](https://block.github.io/goose/docs/getting-started/installation/) CLI.
### Usage with Ollama
1. Run `goose configure`
2. Select **Configure Providers** and select **Ollama**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/goose-cli.png"
alt="Goose CLI"
width="50%"
/>
</div>
3. Enter a model name (e.g. `qwen3`)
### Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) on ollama.com and save it in your `.env`
2. Run `goose configure`
3. Select **Configure Providers** and select **Ollama**
4. Update **OLLAMA_HOST** to `https://ollama.com`


@@ -0,0 +1,47 @@
---
title: JetBrains
---
<Note>This example uses **IntelliJ**; the same steps apply to other JetBrains IDEs (e.g., PyCharm).</Note>
## Install
Install [IntelliJ](https://www.jetbrains.com/idea/).
## Usage with Ollama
<Note>
To use **Ollama**, you will need a [JetBrains AI Subscription](https://www.jetbrains.com/ai-ides/buy/?section=personal&billing=yearly).
</Note>
1. In IntelliJ, click the **chat icon** located in the right sidebar
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/intellij-chat-sidebar.png"
alt="Intellij Sidebar Chat"
width="50%"
/>
</div>
2. Select the **current model** in the sidebar, then click **Set up Local Models**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/intellij-current-model.png"
alt="Intellij model bottom right corner"
width="50%"
/>
</div>
3. Under **Third Party AI Providers**, choose **Ollama**
4. Confirm the **Host URL** is `http://localhost:11434`, then click **Ok**
5. Once connected, select a model under **Local models by Ollama**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/intellij-local-models.png"
alt="Zed star icon in bottom right corner"
width="50%"
/>
</div>

docs/integrations/n8n.mdx (new file, 53 lines)

@@ -0,0 +1,53 @@
---
title: n8n
---
## Install
Install [n8n](https://docs.n8n.io/choose-n8n/).
## Using Ollama Locally
1. In the top right corner, click the dropdown and select **Create Credential**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/n8n-credential-creation.png"
alt="Create a n8n Credential"
width="75%"
/>
</div>
2. Under **Add new credential** select **Ollama**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/n8n-ollama-form.png"
alt="Select Ollama under Credential"
width="75%"
/>
</div>
3. Confirm Base URL is set to `http://localhost:11434` and click **Save**
<Note> If connecting to `http://localhost:11434` fails, use `http://127.0.0.1:11434`</Note>
4. When creating a new workflow, select **Add a first step** and select an **Ollama node**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/n8n-chat-node.png"
alt="Add a first step with Ollama node"
width="75%"
/>
</div>
5. Select your model of choice (e.g. `qwen3-coder`)
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/n8n-models.png"
alt="Set up Ollama credentials"
width="75%"
/>
</div>
## Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**.
2. In n8n, click **Create Credential** and select **Ollama**
3. Set the **API URL** to `https://ollama.com`
4. Enter your **API Key** and click **Save**


@@ -0,0 +1,30 @@
---
title: Roo Code
---
## Install
Install [Roo Code](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline) from the VS Code Marketplace.
## Usage with Ollama
1. Open Roo Code in VS Code and click the **gear icon** on the top right corner of the Roo Code window to open **Provider Settings**
2. Set `API Provider` to `Ollama`
3. (Optional) Update `Base URL` if your Ollama instance is running remotely. The default is `http://localhost:11434`
4. Enter a valid `Model ID` (for example `qwen3` or `qwen3-coder:480b-cloud`)
5. Adjust the `Context Window` to at least 32K tokens for coding tasks
<Note>Coding tools require a larger context window. It is recommended to use a context window of at least 32K tokens. See [Context length](/context-length) for more information.</Note>
## Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Enable `Use custom base URL` and set it to `https://ollama.com`
3. Enter your **Ollama API Key**
4. Select a model from the list
### Recommended Models
- `qwen3-coder:480b`
- `deepseek-v3.1:671b`


@@ -0,0 +1,34 @@
---
title: VS Code
---
## Install
Install [VSCode](https://code.visualstudio.com/download).
## Usage with Ollama
1. Open the Copilot sidebar from the top right of the window
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/vscode-sidebar.png"
alt="VSCode chat Sidebar"
width="75%"
/>
</div>
2. Select the model dropdown > **Manage models**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/vscode-models.png"
alt="VSCode model picker"
width="75%"
/>
</div>
3. Choose **Ollama** in the **Provider** dropdown and select the desired models (e.g. `qwen3`, `qwen3-coder:480b-cloud`)
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/vscode-model-options.png"
alt="VSCode model options dropdown"
width="75%"
/>
</div>
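A model generally has to be present in your local Ollama before it shows up in the picker; a minimal, hedged example:
```bash
# Pull a local model so it appears in the VS Code model picker.
ollama pull qwen3
```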


@@ -0,0 +1,45 @@
---
title: Xcode
---
## Install
Install [Xcode](https://developer.apple.com/xcode/).
## Usage with Ollama
<Note>Ensure Apple Intelligence is set up and that Xcode is updated to the latest version (26.0 or later).</Note>
1. Click **Xcode** in the top left corner > **Settings**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/xcode-intelligence-window.png"
alt="Xcode Intelligence window"
width="50%"
/>
</div>
2. Select **Locally Hosted**, enter port **11434** and click **Add**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/xcode-locally-hosted.png"
alt="Xcode settings"
width="50%"
/>
</div>
3. Select the **star icon** on the top left corner and click the **dropdown**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/xcode-chat-icon.png"
alt="Xcode settings"
width="50%"
/>
</div>
4. Click **My Account** and select your desired model
## Connecting to ollama.com directly
1. Create an [API key](https://ollama.com/settings/keys) from ollama.com
2. Select **Internet Hosted** and enter URL as `https://ollama.com`
3. Enter your **Ollama API Key** and click **Add**

docs/integrations/zed.mdx (new file, 38 lines)

@@ -0,0 +1,38 @@
---
title: Zed
---
## Install
Install [Zed](https://zed.dev/download).
## Usage with Ollama
1. In Zed, click the **star icon** in the bottom-right corner, then select **Configure**.
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/zed-settings.png"
alt="Zed star icon in bottom right corner"
width="50%"
/>
</div>
2. Under **LLM Providers**, choose **Ollama**
3. Confirm the **Host URL** is `http://localhost:11434`, then click **Connect**
4. Once connected, select a model under **Ollama**
<div style={{ display: 'flex', justifyContent: 'center' }}>
<img
src="/images/zed-ollama-dropdown.png"
alt="Zed star icon in bottom right corner"
width="50%"
/>
</div>
## Connecting to ollama.com
1. Create an [API key](https://ollama.com/settings/keys) on **ollama.com**
2. In Zed, open the **star icon** → **Configure**
3. Under **LLM Providers**, select **Ollama**
4. Set the **API URL** to `https://ollama.com`