update default model to llama3.2 (#6959)

Jeffrey Morgan
2024-09-25 11:11:22 -07:00
committed by GitHub
parent e9e9bdb8d9
commit 55ea963c9e
29 changed files with 102 additions and 100 deletions

View File

@@ -69,7 +69,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
```
@@ -80,7 +80,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
@@ -102,7 +102,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
@@ -124,7 +124,7 @@ A response can be received in one reply when streaming is off.
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
@@ -136,7 +136,7 @@ If `stream` is set to `false`, the response will be a single JSON object:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@@ -194,7 +194,7 @@ curl http://localhost:11434/api/generate -d '{
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
@@ -205,7 +205,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
@@ -327,7 +327,7 @@ If you want to set custom options for the model at runtime rather than in the Mo
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
@@ -368,7 +368,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@@ -390,7 +390,7 @@ If an empty prompt is provided, the model will be loaded into memory.
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1"
"model": "llama3.2"
}'
```
@@ -400,7 +400,7 @@ A single JSON object is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
@@ -415,7 +415,7 @@ If an empty prompt is provided and the `keep_alive` parameter is set to `0`, a m
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"keep_alive": 0
}'
```
@@ -426,7 +426,7 @@ A single JSON object is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2024-09-12T03:54:03.516566Z",
"response": "",
"done": true,
@@ -472,7 +472,7 @@ Send a chat message with a streaming response.
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "user",
@@ -488,7 +488,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
@@ -503,7 +503,7 @@ Final response:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
@@ -521,7 +521,7 @@ Final response:
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "user",
@@ -536,7 +536,7 @@ curl http://localhost:11434/api/chat -d '{
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
@@ -560,7 +560,7 @@ Send a chat message with a conversation history. You can use this same approach
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "user",
@@ -584,7 +584,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
@@ -598,7 +598,7 @@ Final response:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
@@ -656,7 +656,7 @@ curl http://localhost:11434/api/chat -d '{
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "user",
@@ -674,7 +674,7 @@ curl http://localhost:11434/api/chat -d '{
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
@@ -696,7 +696,7 @@ curl http://localhost:11434/api/chat -d '{
```
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "user",
@@ -735,7 +735,7 @@ curl http://localhost:11434/api/chat -d '{
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at": "2024-07-22T20:33:28.123648Z",
"message": {
"role": "assistant",
@@ -771,7 +771,7 @@ If the messages array is empty, the model will be loaded into memory.
```
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": []
}'
```
@@ -779,7 +779,7 @@ curl http://localhost:11434/api/chat -d '{
##### Response
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at":"2024-09-12T21:17:29.110811Z",
"message": {
"role": "assistant",
@@ -798,7 +798,7 @@ If the messages array is empty and the `keep_alive` parameter is set to `0`, a m
```
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [],
"keep_alive": 0
}'
@@ -810,7 +810,7 @@ A single JSON object is returned:
```json
{
"model": "llama3.1",
"model": "llama3.2",
"created_at":"2024-09-12T21:33:17.547535Z",
"message": {
"role": "assistant",
@@ -989,7 +989,7 @@ Show information about a model including details, modelfile, template, parameter
```shell
curl http://localhost:11434/api/show -d '{
"name": "llama3.1"
"name": "llama3.2"
}'
```
@@ -1050,7 +1050,7 @@ Copy a model. Creates a model with another name from an existing model.
```shell
curl http://localhost:11434/api/copy -d '{
"source": "llama3.1",
"source": "llama3.2",
"destination": "llama3-backup"
}'
```
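A backup made this way can be removed later with the delete endpoint; a sketch using the same `name` convention as the surrounding examples:

```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama3-backup"
}'
```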
@@ -1105,7 +1105,7 @@ Download a model from the ollama library. Cancelled pulls are resumed from where
```shell
curl http://localhost:11434/api/pull -d '{
"name": "llama3.1"
"name": "llama3.2"
}'
```

View File

@@ -63,7 +63,7 @@ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 114
Now you can run a model:
```
- docker exec -it ollama ollama run llama3.1
+ docker exec -it ollama ollama run llama3.2
```
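Additional models can be pulled inside the same container before running them, e.g.:

```
docker exec -it ollama ollama pull llama3.2
```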
### Try different models

View File

@@ -32,7 +32,7 @@ When using the API, specify the `num_ctx` parameter:
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"options": {
"num_ctx": 4096
@@ -232,7 +232,7 @@ curl http://localhost:11434/api/chat -d '{"model": "mistral"}'
To preload a model using the CLI, use the command:
```shell
- ollama run llama3.1 ""
+ ollama run llama3.2 ""
```
## How do I keep a model loaded in memory or make it unload immediately?
@@ -240,7 +240,7 @@ ollama run llama3.1 ""
By default models are kept in memory for 5 minutes before being unloaded. This allows for quicker response times if you're making numerous requests to the LLM. If you want to immediately unload a model from memory, use the `ollama stop` command:
```shell
- ollama stop llama3.1
+ ollama stop llama3.2
```
If you're using the API, use the `keep_alive` parameter with the `/api/generate` and `/api/chat` endpoints to set the amount of time that a model stays in memory. The `keep_alive` parameter can be set to:
@@ -251,12 +251,12 @@ If you're using the API, use the `keep_alive` parameter with the `/api/generate`
For example, to preload a model and leave it in memory use:
```shell
- curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": -1}'
+ curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}'
```
To unload the model and free up memory use:
```shell
- curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": 0}'
+ curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
```
Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter types mentioned above. Refer to the section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
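For example, when launching the server directly, the variable can be set inline; it accepts the same values as `keep_alive` (the one-hour duration here is just an illustration):

```shell
OLLAMA_KEEP_ALIVE=1h ollama serve
```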

View File

@@ -50,7 +50,7 @@ INSTRUCTION arguments
An example of a `Modelfile` creating a mario blueprint:
```modelfile
- FROM llama3.1
+ FROM llama3.2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096; this controls how many tokens the LLM can use as context to generate the next token
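To build and run a model from a Modelfile like this, the usual flow is `ollama create` followed by `ollama run` (the `mario` name is just an example):

```bash
ollama create mario -f ./Modelfile
ollama run mario
```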
@@ -72,10 +72,10 @@ More examples are available in the [examples directory](../examples).
To view the Modelfile of a given model, use the `ollama show --modelfile` command.
```bash
- > ollama show --modelfile llama3.1
+ > ollama show --modelfile llama3.2
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
- # FROM llama3.1:latest
+ # FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
@@ -103,7 +103,7 @@ FROM <model name>:<tag>
#### Build from existing model
```modelfile
- FROM llama3.1
+ FROM llama3.2
```
A list of available base models:

View File

@@ -25,7 +25,7 @@ chat_completion = client.chat.completions.create(
'content': 'Say this is a test',
}
],
- model='llama3.1',
+ model='llama3.2',
)
response = client.chat.completions.create(
@@ -46,13 +46,13 @@ response = client.chat.completions.create(
)
completion = client.completions.create(
model="llama3.1",
model="llama3.2",
prompt="Say this is a test",
)
list_completion = client.models.list()
- model = client.models.retrieve("llama3.1")
+ model = client.models.retrieve("llama3.2")
embeddings = client.embeddings.create(
model="all-minilm",
@@ -74,7 +74,7 @@ const openai = new OpenAI({
const chatCompletion = await openai.chat.completions.create({
messages: [{ role: 'user', content: 'Say this is a test' }],
- model: 'llama3.1',
+ model: 'llama3.2',
})
const response = await openai.chat.completions.create({
@@ -94,13 +94,13 @@ const response = await openai.chat.completions.create({
})
const completion = await openai.completions.create({
model: "llama3.1",
model: "llama3.2",
prompt: "Say this is a test.",
})
const listCompletion = await openai.models.list()
- const model = await openai.models.retrieve("llama3.1")
+ const model = await openai.models.retrieve("llama3.2")
const embedding = await openai.embeddings.create({
model: "all-minilm",
@@ -114,7 +114,7 @@ const embedding = await openai.embeddings.create({
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1",
"model": "llama3.2",
"messages": [
{
"role": "system",
@@ -154,13 +154,13 @@ curl http://localhost:11434/v1/chat/completions \
curl http://localhost:11434/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1",
"model": "llama3.2",
"prompt": "Say this is a test"
}'
curl http://localhost:11434/v1/models
- curl http://localhost:11434/v1/models/llama3.1
+ curl http://localhost:11434/v1/models/llama3.2
curl http://localhost:11434/v1/embeddings \
-H "Content-Type: application/json" \
@@ -274,7 +274,7 @@ curl http://localhost:11434/v1/embeddings \
Before using a model, pull it locally with `ollama pull`:
```shell
- ollama pull llama3.1
+ ollama pull llama3.2
```
### Default model names
@@ -282,7 +282,7 @@ ollama pull llama3.1
For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
```
- ollama cp llama3.1 gpt-3.5-turbo
+ ollama cp llama3.2 gpt-3.5-turbo
```
Afterwards, this new model name can be specified in the `model` field:
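For example, a sketch mirroring the chat completions call shown earlier:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```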

View File

@@ -33,7 +33,7 @@ Omitting a template in these models puts the responsibility of correctly templat
To add templates in your model, you'll need to add a `TEMPLATE` command to the Modelfile. Here's an example using Meta's Llama 3.
```dockerfile
- FROM llama3.1
+ FROM llama3.2
TEMPLATE """{{- if .System }}<|start_header_id|>system<|end_header_id|>

View File

@@ -15,7 +15,7 @@ import { Ollama } from "@langchain/community/llms/ollama";
const ollama = new Ollama({
baseUrl: "http://localhost:11434",
model: "llama3.1",
model: "llama3.2",
});
const answer = await ollama.invoke(`why is the sky blue?`);
@@ -23,7 +23,7 @@ const answer = await ollama.invoke(`why is the sky blue?`);
console.log(answer);
```
- That will get us the same thing as if we ran `ollama run llama3.1 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
+ That will get us the same thing as if we ran `ollama run llama3.2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
```bash
npm install cheerio

View File

@@ -29,7 +29,7 @@ Ollama uses unicode characters for progress indication, which may render as unkn
Here's a quick example showing API access from `powershell`
```powershell
- (Invoke-WebRequest -method POST -Body '{"model":"llama3.1", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
+ (Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
```
## Troubleshooting