docs: improve syntax highlighting in code blocks (#8854)

This commit is contained in:
Azis Alvriyanto
2025-02-08 00:55:07 +07:00
committed by GitHub
parent abb8dd57f8
commit b901a712c6
16 changed files with 158 additions and 127 deletions

View File

@@ -31,7 +31,7 @@ Certain endpoints stream responses as JSON objects. Streaming can be disabled by
## Generate a completion
```shell
```
POST /api/generate
```
@@ -485,7 +485,7 @@ A single JSON object is returned:
## Generate a chat completion
```shell
```
POST /api/chat
```
@@ -878,6 +878,7 @@ curl http://localhost:11434/api/chat -d '{
```
##### Response
```json
{
"model": "llama3.2",
@@ -924,7 +925,7 @@ A single JSON object is returned:
## Create a Model
```shell
```
POST /api/create
```
@@ -1020,7 +1021,7 @@ curl http://localhost:11434/api/create -d '{
A stream of JSON objects is returned:
```
```json
{"status":"quantizing F16 model to Q4_K_M"}
{"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
{"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
@@ -1051,7 +1052,7 @@ curl http://localhost:11434/api/create -d '{
A stream of JSON objects is returned:
```
```json
{"status":"parsing GGUF"}
{"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
{"status":"writing manifest"}
@@ -1118,7 +1119,7 @@ Return 200 OK if the blob exists, 404 Not Found if it does not.
## Push a Blob
```shell
```
POST /api/blobs/:digest
```
@@ -1142,7 +1143,7 @@ Return 201 Created if the blob was successfully created, 400 Bad Request if the
## List Local Models
```shell
```
GET /api/tags
```
@@ -1195,7 +1196,7 @@ A single JSON object will be returned.
## Show Model Information
```shell
```
POST /api/show
```
@@ -1261,7 +1262,7 @@ curl http://localhost:11434/api/show -d '{
## Copy a Model
```shell
```
POST /api/copy
```
@@ -1284,7 +1285,7 @@ Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't e
## Delete a Model
```shell
```
DELETE /api/delete
```
@@ -1310,7 +1311,7 @@ Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't
## Pull a Model
```shell
```
POST /api/pull
```
@@ -1382,7 +1383,7 @@ if `stream` is set to false, then the response is a single JSON object:
## Push a Model
```shell
```
POST /api/push
```
@@ -1447,7 +1448,7 @@ If `stream` is set to `false`, then the response is a single JSON object:
## Generate Embeddings
```shell
```
POST /api/embed
```
@@ -1515,7 +1516,7 @@ curl http://localhost:11434/api/embed -d '{
```
## List Running Models
```shell
```
GET /api/ps
```
@@ -1562,7 +1563,7 @@ A single JSON object will be returned.
> Note: this endpoint has been superseded by `/api/embed`
```shell
```
POST /api/embeddings
```
@@ -1602,7 +1603,7 @@ curl http://localhost:11434/api/embeddings -d '{
## Version
```shell
```
GET /api/version
```

View File

@@ -7,7 +7,7 @@ Install prerequisites:
Then build and run Ollama from the root directory of the repository:
```
```shell
go run . serve
```
@@ -23,14 +23,14 @@ Install prerequisites:
Then, configure and build the project:
```
```shell
cmake -B build
cmake --build build
```
Lastly, run Ollama:
```
```shell
go run . serve
```
@@ -57,14 +57,14 @@ Install prerequisites:
Then, configure and build the project:
```
```shell
cmake -B build
cmake --build build --config Release
```
Lastly, run Ollama:
```
```shell
go run . serve
```
@@ -88,26 +88,26 @@ Install prerequisites:
Then, configure and build the project:
```
```shell
cmake -B build
cmake --build build
```
Lastly, run Ollama:
```
```shell
go run . serve
```
## Docker
```
```shell
docker build .
```
### ROCm
```
```shell
docker build --build-arg FLAVOR=rocm .
```
@@ -115,7 +115,7 @@ docker build --build-arg FLAVOR=rocm .
To run tests, use `go test`:
```
```shell
go test ./...
```

View File

@@ -2,7 +2,7 @@
### CPU only
```bash
```shell
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
@@ -11,42 +11,46 @@ Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-
#### Install with Apt
1. Configure the repository
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```
```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```
2. Install the NVIDIA Container Toolkit packages
```bash
sudo apt-get install -y nvidia-container-toolkit
```
```shell
sudo apt-get install -y nvidia-container-toolkit
```
#### Install with Yum or Dnf
1. Configure the repository
```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```
```shell
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```
2. Install the NVIDIA Container Toolkit packages
```bash
sudo yum install -y nvidia-container-toolkit
```
```shell
sudo yum install -y nvidia-container-toolkit
```
#### Configure Docker to use Nvidia driver
```
```shell
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
#### Start the container
```bash
```shell
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
@@ -57,7 +61,7 @@ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ol
To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command:
```
```shell
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
```
@@ -65,7 +69,7 @@ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 114
Now you can run a model:
```
```shell
docker exec -it ollama ollama run llama3.2
```

View File

@@ -24,7 +24,7 @@ By default, Ollama uses a context window size of 2048 tokens.
To change this when using `ollama run`, use `/set parameter`:
```
```shell
/set parameter num_ctx 4096
```
@@ -46,10 +46,15 @@ Use the `ollama ps` command to see what models are currently loaded into memory.
```shell
ollama ps
NAME ID SIZE PROCESSOR UNTIL
llama3:70b bcfb190ca3a7 42 GB 100% GPU 4 minutes from now
```
> **Output**:
>
> ```
> NAME ID SIZE PROCESSOR UNTIL
> llama3:70b bcfb190ca3a7 42 GB 100% GPU 4 minutes from now
> ```
The `Processor` column will show which memory the model was loaded in to:
* `100% GPU` means the model was loaded entirely into the GPU
* `100% CPU` means the model was loaded entirely in system memory
@@ -88,7 +93,7 @@ If Ollama is run as a systemd service, environment variables should be set using
4. Reload `systemd` and restart Ollama:
```bash
```shell
systemctl daemon-reload
systemctl restart ollama
```
@@ -221,16 +226,19 @@ properties.
If you are using the API you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints.
To preload the mistral model using the generate endpoint, use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "mistral"}'
```
To use the chat completions endpoint, use:
```shell
curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
```
To preload a model using the CLI, use the command:
```shell
ollama run llama3.2 ""
```
@@ -250,11 +258,13 @@ If you're using the API, use the `keep_alive` parameter with the `/api/generate`
* '0' which will unload the model immediately after generating a response
For example, to preload a model and leave it in memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}'
```
To unload the model and free up memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
```

View File

@@ -20,13 +20,13 @@ Make sure that you use the same base model in the `FROM` command as you used to
Now run `ollama create` from the directory where the `Modelfile` was created:
```bash
```shell
ollama create my-model
```
Lastly, test the model:
```bash
```shell
ollama run my-model
```

View File

@@ -119,7 +119,7 @@ sudo systemctl status ollama
To customize the installation of Ollama, you can edit the systemd service file or the environment variables by running:
```
```shell
sudo systemctl edit ollama
```

View File

@@ -28,7 +28,7 @@ A model file is the blueprint to create and share models with Ollama.
The format of the `Modelfile`:
```modelfile
```
# comment
INSTRUCTION arguments
```
@@ -49,7 +49,7 @@ INSTRUCTION arguments
An example of a `Modelfile` creating a mario blueprint:
```modelfile
```
FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
@@ -69,24 +69,30 @@ To use this:
To view the Modelfile of a given model, use the `ollama show --modelfile` command.
```bash
> ollama show --modelfile llama3.2
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
```shell
ollama show --modelfile llama3.2
```
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
> **Output**:
>
> ```
> # Modelfile generated by "ollama show"
> # To build a new Modelfile based on this one, replace the FROM line with:
> # FROM llama3.2:latest
> FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
> TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
>
> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
>
> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
>
> {{ .Response }}<|eot_id|>"""
> PARAMETER stop "<|start_header_id|>"
> PARAMETER stop "<|end_header_id|>"
> PARAMETER stop "<|eot_id|>"
> PARAMETER stop "<|reserved_special_token"
> ```
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
```
## Instructions
@@ -94,13 +100,13 @@ To view the Modelfile of a given model, use the `ollama show --modelfile` comman
The `FROM` instruction defines the base model to use when creating a model.
```modelfile
```
FROM <model name>:<tag>
```
#### Build from existing model
```modelfile
```
FROM llama3.2
```
@@ -111,7 +117,7 @@ Additional models can be found at:
#### Build from a Safetensors model
```modelfile
```
FROM <model directory>
```
@@ -125,7 +131,7 @@ Currently supported model architectures:
#### Build from a GGUF file
```modelfile
```
FROM ./ollama-model.gguf
```
@@ -136,7 +142,7 @@ The GGUF file location should be specified as an absolute path or relative to th
The `PARAMETER` instruction defines a parameter that can be set when the model is run.
```modelfile
```
PARAMETER <parameter> <parametervalue>
```
@@ -183,7 +189,7 @@ TEMPLATE """{{ if .System }}<|im_start|>system
The `SYSTEM` instruction specifies the system message to be used in the template, if applicable.
```modelfile
```
SYSTEM """<system message>"""
```
@@ -193,7 +199,7 @@ The `ADAPTER` instruction specifies a fine tuned LoRA adapter that should apply
#### Safetensor adapter
```modelfile
```
ADAPTER <path to safetensor adapter>
```
@@ -204,7 +210,7 @@ Currently supported Safetensor adapters:
#### GGUF adapter
```modelfile
```
ADAPTER ./ollama-lora.gguf
```
@@ -212,7 +218,7 @@ ADAPTER ./ollama-lora.gguf
The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.
```modelfile
```
LICENSE """
<license text>
"""
@@ -222,7 +228,7 @@ LICENSE """
The `MESSAGE` instruction allows you to specify a message history for the model to use when responding. Use multiple iterations of the MESSAGE command to build up a conversation which will guide the model to answer in a similar way.
```modelfile
```
MESSAGE <role> <message>
```
@@ -237,7 +243,7 @@ MESSAGE <role> <message>
#### Example conversation
```modelfile
```
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?

View File

@@ -1,6 +1,7 @@
# OpenAI compatibility
> **Note:** OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama [Python library](https://github.com/ollama/ollama-python), [JavaScript library](https://github.com/ollama/ollama-js) and [REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).
> [!NOTE]
> OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama [Python library](https://github.com/ollama/ollama-python), [JavaScript library](https://github.com/ollama/ollama-js) and [REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).
Ollama provides experimental compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama.
@@ -59,8 +60,10 @@ embeddings = client.embeddings.create(
input=["why is the sky blue?", "why is the grass green?"],
)
```
#### Structured outputs
```py
```python
from pydantic import BaseModel
from openai import OpenAI
@@ -144,7 +147,7 @@ const embedding = await openai.embeddings.create({
### `curl`
``` shell
```shell
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
@@ -319,7 +322,7 @@ ollama pull llama3.2
For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
```
```shell
ollama cp llama3.2 gpt-3.5-turbo
```
@@ -343,7 +346,7 @@ curl http://localhost:11434/v1/chat/completions \
The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a `Modelfile` which looks like:
```modelfile
```
FROM <some model>
PARAMETER num_ctx <context size>
```

View File

@@ -17,6 +17,7 @@ When you run Ollama in a **container**, the logs go to stdout/stderr in the cont
```shell
docker logs <container-name>
```
(Use `docker ps` to find the container name)
If manually running `ollama serve` in a terminal, the logs will be on that terminal.
@@ -28,6 +29,7 @@ When you run Ollama on **Windows**, there are a few different locations. You can
- `explorer %TEMP%` where temporary executable files are stored in one or more `ollama*` directories
To enable additional debug logging to help troubleshoot problems, first **Quit the running app from the tray menu** then in a powershell terminal
```powershell
$env:OLLAMA_DEBUG="1"
& "ollama app.exe"
@@ -49,12 +51,13 @@ Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11 rocm_v5]
You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass autodetection, so for example, if you have a CUDA card, but want to force the CPU LLM library with AVX2 vector support, use:
```
```shell
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve
```
You can see what features your CPU has with the following.
```
```shell
cat /proc/cpuinfo| grep flags | head -1
```
@@ -62,8 +65,8 @@ cat /proc/cpuinfo| grep flags | head -1
If you run into problems on Linux and want to install an older version, or you'd like to try out a pre-release before it's officially released, you can tell the install script which version to install.
```sh
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION="0.1.29" sh
```shell
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh
```
## Linux tmp noexec

View File

@@ -47,6 +47,7 @@ If Ollama is already running, Quit the tray application and relaunch it from the
## API Access
Here's a quick example showing API access from `powershell`
```powershell
(Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
```