Files
ollama37/docs/capabilities/web-search.mdx
Shang Chieh Tseng ef14fb5b26 Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support
This commit represents a complete rework after pulling the latest changes from
official ollama/ollama repository and re-applying Tesla K80 compatibility patches.

## Key Changes

### CUDA Compute Capability 3.7 Support (Tesla K80)
- Added sm_37 (compute 3.7) to CMAKE_CUDA_ARCHITECTURES in CMakeLists.txt
- Updated CMakePresets.json to include compute 3.7 in "CUDA 11" preset
- Using 37-virtual (PTX with JIT compilation) for maximum compatibility

### Legacy Toolchain Compatibility
- **NVIDIA Driver**: 470.256.02 (last version supporting Kepler/K80)
- **CUDA Version**: 11.4.4 (last CUDA 11.x supporting compute 3.7)
- **GCC Version**: 10.5.0 (required by CUDA 11.4 host_config.h)

### CPU Architecture Trade-offs
Due to GCC 10.5 limitation, sacrificed newer CPU optimizations:
- Alderlake CPU variant enabled WITHOUT AVX_VNNI (requires GCC 11+)
- Still supports: SSE4.2, AVX, F16C, AVX2, BMI2, FMA
- Performance impact: ~3-7% on newer CPUs (acceptable for K80 compatibility)

### Build System Updates
- Modified ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt for compute 3.7
- Added -Wno-deprecated-gpu-targets flag to suppress warnings
- Updated ml/backend/ggml/ggml/src/CMakeLists.txt for Alderlake without AVX_VNNI

### Upstream Sync
Merged latest llama.cpp changes including:
- Enhanced KV cache management with ISWA and hybrid memory support
- Improved multi-modal support (mtmd framework)
- New model architectures (Gemma3, Llama4, Qwen3, etc.)
- GPU backend improvements for CUDA, Metal, and ROCm
- Updated quantization support and GGUF format handling

### Documentation
- Updated CLAUDE.md with comprehensive build instructions
- Documented toolchain constraints and CPU architecture trade-offs
- Removed outdated CI/CD workflows (tesla-k80-*.yml)
- Cleaned up temporary development artifacts

## Rationale

This fork maintains Tesla K80 GPU support (compute 3.7) which was dropped in
official Ollama due to legacy driver/CUDA requirements. The toolchain constraint
creates a deadlock:
- K80 → Driver 470 → CUDA 11.4 → GCC 10 → No AVX_VNNI

We accept the loss of cutting-edge CPU optimizations to enable running modern
LLMs on legacy but still capable Tesla K80 hardware (12GB VRAM per GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 14:03:05 +08:00

361 lines
11 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: Web search
---
Ollama's web search API can be used to augment models with the latest information to reduce hallucinations and improve accuracy.
Web search is provided as a REST API with deeper tool integrations in the Python and JavaScript libraries. This also enables models like OpenAIs gpt-oss models to conduct long-running research tasks.
## Authentication
For access to Ollama's web search API, create an [API key](https://ollama.com/settings/keys). A free Ollama account is required.
## Web search API
Performs a web search for a single query and returns relevant results.
### Request
`POST https://ollama.com/api/web_search`
- `query` (string, required): the search query string
- `max_results` (integer, optional): maximum results to return (default 5, max 10)
### Response
Returns an object containing:
- `results` (array): array of search result objects, each containing:
- `title` (string): the title of the web page
- `url` (string): the URL of the web page
- `content` (string): relevant content snippet from the web page
### Examples
<Note>
Ensure OLLAMA_API_KEY is set or it must be passed in the Authorization header.
</Note>
#### cURL Request
```bash
curl https://ollama.com/api/web_search \
--header "Authorization: Bearer $OLLAMA_API_KEY" \
-d '{
"query":"what is ollama?"
}'
```
**Response**
```json
{
"results": [
{
"title": "Ollama",
"url": "https://ollama.com/",
"content": "Cloud models are now available..."
},
{
"title": "What is Ollama? Introduction to the AI model management tool",
"url": "https://www.hostinger.com/tutorials/what-is-ollama",
"content": "Ariffud M. 6min Read..."
},
{
"title": "Ollama Explained: Transforming AI Accessibility and Language ...",
"url": "https://www.geeksforgeeks.org/artificial-intelligence/ollama-explained-transforming-ai-accessibility-and-language-processing/",
"content": "Data Science Data Science Projects Data Analysis..."
}
]
}
```
#### Python library
```python
import ollama
response = ollama.web_search("What is Ollama?")
print(response)
```
**Example output**
```python
results = [
{
"title": "Ollama",
"url": "https://ollama.com/",
"content": "Cloud models are now available in Ollama..."
},
{
"title": "What is Ollama? Features, Pricing, and Use Cases - Walturn",
"url": "https://www.walturn.com/insights/what-is-ollama-features-pricing-and-use-cases",
"content": "Our services..."
},
{
"title": "Complete Ollama Guide: Installation, Usage & Code Examples",
"url": "https://collabnix.com/complete-ollama-guide-installation-usage-code-examples",
"content": "Join our Discord Server..."
}
]
```
More Ollama [Python example](https://github.com/ollama/ollama-python/blob/main/examples/web-search.py)
#### JavaScript Library
```tsx
import { Ollama } from "ollama";
const client = new Ollama();
const results = await client.webSearch({ query: "what is ollama?" });
console.log(JSON.stringify(results, null, 2));
```
**Example output**
```json
{
"results": [
{
"title": "Ollama",
"url": "https://ollama.com/",
"content": "Cloud models are now available..."
},
{
"title": "What is Ollama? Introduction to the AI model management tool",
"url": "https://www.hostinger.com/tutorials/what-is-ollama",
"content": "Ollama is an open-source tool..."
},
{
"title": "Ollama Explained: Transforming AI Accessibility and Language Processing",
"url": "https://www.geeksforgeeks.org/artificial-intelligence/ollama-explained-transforming-ai-accessibility-and-language-processing/",
"content": "Ollama is a groundbreaking..."
}
]
}
```
More Ollama [JavaScript example](https://github.com/ollama/ollama-js/blob/main/examples/websearch/websearch-tools.ts)
## Web fetch API
Fetches a single web page by URL and returns its content.
### Request
`POST https://ollama.com/api/web_fetch`
- `url` (string, required): the URL to fetch
### Response
Returns an object containing:
- `title` (string): the title of the web page
- `content` (string): the main content of the web page
- `links` (array): array of links found on the page
### Examples
#### cURL Request
```python
curl --request POST \
--url https://ollama.com/api/web_fetch \
--header "Authorization: Bearer $OLLAMA_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"url": "ollama.com"
}'
```
**Response**
```json
{
"title": "Ollama",
"content": "[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama...",
"links": [
"http://ollama.com/",
"http://ollama.com/models",
"https://github.com/ollama/ollama"
]
```
#### Python SDK
```python
from ollama import web_fetch
result = web_fetch('https://ollama.com')
print(result)
```
**Result**
```python
WebFetchResponse(
title='Ollama',
content='[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama\n\n**Chat & build
with open models**\n\n[Download](https://ollama.com/download) [Explore
models](https://ollama.com/models)\n\nAvailable for macOS, Windows, and Linux',
links=['https://ollama.com/', 'https://ollama.com/models', 'https://github.com/ollama/ollama']
)
```
#### JavaScript SDK
```tsx
import { Ollama } from "ollama";
const client = new Ollama();
const fetchResult = await client.webFetch({ url: "https://ollama.com" });
console.log(JSON.stringify(fetchResult, null, 2));
```
**Result**
```json
{
"title": "Ollama",
"content": "[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama...",
"links": [
"https://ollama.com/",
"https://ollama.com/models",
"https://github.com/ollama/ollama"
]
}
```
## Building a search agent
Use Ollamas web search API as a tool to build a mini search agent.
This example uses Alibabas Qwen 3 model with 4B parameters.
```bash
ollama pull qwen3:4b
```
```python
from ollama import chat, web_fetch, web_search
available_tools = {'web_search': web_search, 'web_fetch': web_fetch}
messages = [{'role': 'user', 'content': "what is ollama's new engine"}]
while True:
response = chat(
model='qwen3:4b',
messages=messages,
tools=[web_search, web_fetch],
think=True
)
if response.message.thinking:
print('Thinking: ', response.message.thinking)
if response.message.content:
print('Content: ', response.message.content)
messages.append(response.message)
if response.message.tool_calls:
print('Tool calls: ', response.message.tool_calls)
for tool_call in response.message.tool_calls:
function_to_call = available_tools.get(tool_call.function.name)
if function_to_call:
args = tool_call.function.arguments
result = function_to_call(**args)
print('Result: ', str(result)[:200]+'...')
# Result is truncated for limited context lengths
messages.append({'role': 'tool', 'content': str(result)[:2000 * 4], 'tool_name': tool_call.function.name})
else:
messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
else:
break
```
**Result**
```
Thinking: Okay, the user is asking about Ollama's new engine. I need to figure out what they're referring to. Ollama is a company that develops large language models, so maybe they've released a new model or an updated version of their existing engine....
Tool calls: [ToolCall(function=Function(name='web_search', arguments={'max_results': 3, 'query': 'Ollama new engine'}))]
Result: results=[WebSearchResult(content='# New model scheduling\n\n## September 23, 2025\n\nOllama now includes a significantly improved model scheduling system. Ahead of running a model, Ollamas new engine
Thinking: Okay, the user asked about Ollama's new engine. Let me look at the search results.
First result is from September 23, 2025, talking about new model scheduling. It mentions improved memory management, reduced crashes, better GPU utilization, and multi-GPU performance. Examples show speed improvements and accurate memory reporting. Supported models include gemma3, llama4, qwen3, etc...
Content: Ollama has introduced two key updates to its engine, both released in 2025:
1. **Enhanced Model Scheduling (September 23, 2025)**
- **Precision Memory Management**: Exact memory allocation reduces out-of-memory crashes and optimizes GPU utilization.
- **Performance Gains**: Examples show significant speed improvements (e.g., 85.54 tokens/s vs 52.02 tokens/s) and full GPU layer utilization.
- **Multi-GPU Support**: Improved efficiency across multiple GPUs, with accurate memory reporting via tools like `nvidia-smi`.
- **Supported Models**: Includes `gemma3`, `llama4`, `qwen3`, `mistral-small3.2`, and more.
2. **Multimodal Engine (May 15, 2025)**
- **Vision Support**: First-class support for vision models, including `llama4:scout` (109B parameters), `gemma3`, `qwen2.5vl`, and `mistral-small3.1`.
- **Multimodal Tasks**: Examples include identifying animals in multiple images, answering location-based questions from videos, and document scanning.
These updates highlight Ollama's focus on efficiency, performance, and expanded capabilities for both text and vision tasks.
```
### Context length and agents
Web search results can return thousands of tokens. It is recommended to increase the context length of the model to at least ~32000 tokens. Search agents work best with full context length. [Ollama's cloud models](https://docs.ollama.com/cloud) run at the full context length.
## MCP Server
You can enable web search in any MCP client through the [Python MCP server](https://github.com/ollama/ollama-python/blob/main/examples/web-search-mcp.py).
### Cline
Ollama's web search can be integrated with Cline easily using the MCP server configuration.
`Manage MCP Servers` > `Configure MCP Servers` > Add the following configuration:
```json
{
"mcpServers": {
"web_search_and_fetch": {
"type": "stdio",
"command": "uv",
"args": ["run", "path/to/web-search-mcp.py"],
"env": { "OLLAMA_API_KEY": "your_api_key_here" }
}
}
}
```
![Cline MCP Configuration](/images/cline-mcp.png)
### Codex
Ollama works well with OpenAI's Codex tool.
Add the following configuration to `~/.codex/config.toml`
```python
[mcp_servers.web_search]
command = "uv"
args = ["run", "path/to/web-search-mcp.py"]
env = { "OLLAMA_API_KEY" = "your_api_key_here" }
```
![Codex MCP Configuration](/images/codex-mcp.png)
### Goose
Ollama can integrate with Goose via its MCP feature.
![Goose MCP Configuration 1](/images/goose-mcp-1.png)
![Goose MCP Configuration 2](/images/goose-mcp-2.png)
### Other integrations
Ollama can be integrated into most of the tools available either through direct integration of Ollama's API, Python / JavaScript libraries, OpenAI compatible API, and MCP server integration.