ollama37/server at main

matt/ollama37

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-09 23:37:06 +00:00

Shang Chieh Tseng 68f9b1580e Add timing instrumentation and user progress messages for model loading

Problem: Model loading takes 2-3 minutes on first load with no user feedback,
causing confusion about whether the system is frozen or working.

Root Cause: GPU initialization (reserveWorstCaseGraph) takes ~164 seconds on
Tesla K80 GPUs due to CUDA kernel compilation (PTX JIT for compute 3.7). This
is by design - it validates GPU compatibility before committing to full load.

Solution:
1. Add comprehensive timing instrumentation to identify bottlenecks
2. Add user-facing progress messages explaining the delay

Changes:
- cmd/cmd.go: Update spinner with informative message for users
- llama/llama.go: Add timing logs for CGO model loading
- runner/llamarunner/runner.go: Add detailed timing for llama runner
- runner/ollamarunner/runner.go: Add timing + stderr messages for new engine
- server/sched.go: Add timing for scheduler load operation

User Experience:
Before: Silent wait with blinking cursor for 2-3 minutes
After: Rotating spinner with message "loading model (may take 1-3 min on first load)"
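A spinner of this kind can be sketched in a few lines of Go. This is an illustrative, self-contained example only, not the code in cmd/cmd.go; the names `frame` and `spin`, the frame characters, and the 100 ms interval are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// frames are the characters the (hypothetical) spinner cycles through.
var frames = []rune{'|', '/', '-', '\\'}

// frame renders the i-th spinner frame followed by msg.
// The leading carriage return redraws over the previous frame.
func frame(i int, msg string) string {
	return fmt.Sprintf("\r%c %s", frames[i%len(frames)], msg)
}

// spin redraws the spinner on w every interval until stop is closed,
// then clears the line and closes done.
func spin(w io.Writer, msg string, interval time.Duration, stop <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; ; i++ {
		fmt.Fprint(w, frame(i, msg))
		select {
		case <-stop:
			fmt.Fprint(w, "\r\033[K") // ANSI escape: erase to end of line
			return
		case <-ticker.C:
		}
	}
}

func main() {
	stop, done := make(chan struct{}), make(chan struct{})
	go spin(os.Stderr, "loading model (may take 1-3 min on first load)", 100*time.Millisecond, stop, done)

	time.Sleep(500 * time.Millisecond) // stand-in for the slow model load

	close(stop)
	<-done
	fmt.Fprintln(os.Stderr, "model loaded")
}
```

Writing to stderr keeps the progress output separate from any generated text on stdout.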

Performance Metrics Captured:
- GGUF file reading: ~0.4s
- GPU kernel compilation: ~164s (bottleneck identified)
- Model weight loading: ~0.002s
- Total end-to-end: ~165s
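Per-phase figures like these can be collected with a small timer that logs the elapsed time at each phase boundary. The sketch below is a hypothetical illustration, not the instrumentation added in this commit; `phaseTimer`, `mark`, and the phase names are invented for the example, and the sleeps stand in for real work.

```go
package main

import (
	"fmt"
	"log/slog"
	"time"
)

// phase is one named step of a load and how long it took.
type phase struct {
	name string
	dur  time.Duration
}

// phaseTimer (hypothetical) records the duration of consecutive phases.
type phaseTimer struct {
	start  time.Time
	last   time.Time
	phases []phase
}

func newPhaseTimer() *phaseTimer {
	now := time.Now()
	return &phaseTimer{start: now, last: now}
}

// mark ends the current phase, logs its duration, and starts the next one.
func (t *phaseTimer) mark(name string) time.Duration {
	now := time.Now()
	d := now.Sub(t.last)
	t.last = now
	t.phases = append(t.phases, phase{name: name, dur: d})
	slog.Info("load phase complete", "phase", name, "duration", d)
	return d
}

// total is the time from creation to the most recent mark.
func (t *phaseTimer) total() time.Duration {
	return t.last.Sub(t.start)
}

func main() {
	t := newPhaseTimer()

	time.Sleep(20 * time.Millisecond) // stand-in for reading the GGUF file
	t.mark("gguf_read")

	time.Sleep(50 * time.Millisecond) // stand-in for GPU kernel compilation
	t.mark("gpu_init")

	time.Sleep(5 * time.Millisecond) // stand-in for loading weights
	t.mark("weights_load")

	for _, p := range t.phases {
		fmt.Printf("%-14s %v\n", p.name, p.dur)
	}
	fmt.Printf("%-14s %v\n", "total", t.total())
}
```

Logging each phase separately is what makes a single dominant step, such as the ~164s kernel compilation above, stand out from the total.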

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-12 19:09:37 +08:00

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

auth.go

fix nil deref in auth.go

2024-07-26 14:14:48 -07:00

create_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

create.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

download.go

server: abort download on empty digest

2025-05-27 11:28:48 -07:00

fixblobs_test.go

server: replace blob prefix separator from ':' to '-' (#3146)

2024-03-14 20:18:06 -07:00

fixblobs.go

server: replace blob prefix separator from ':' to '-' (#3146)

2024-03-14 20:18:06 -07:00

images_test.go

Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)

2025-06-20 11:11:40 -07:00

images.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

layer.go

One corrupt manifest should not wedge model operations (#7515)

2024-11-05 14:21:45 -08:00

manifest_test.go

One corrupt manifest should not wedge model operations (#7515)

2024-11-05 14:21:45 -08:00

manifest.go

One corrupt manifest should not wedge model operations (#7515)

2024-11-05 14:21:45 -08:00

model.go

tools: refactor tool call parsing and enable streaming (#10415)

2025-05-23 14:19:31 -07:00

modelpath_test.go

lint: enable usetesting, disable tenv (#10594)

2025-05-08 11:42:14 -07:00

modelpath.go

server: add hint to the error message when model path access fails (#10843)

2025-05-24 13:17:04 -07:00

prompt_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

prompt.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

quantization_test.go

Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)

2025-06-20 11:11:40 -07:00

quantization.go

skip quantizing per_layer_token_embd (#11207)

2025-06-26 21:49:35 -07:00

routes_create_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_debug_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_delete_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_generate_renderer_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_generate_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_harmony_streaming_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes_list_test.go

Update the /api/create endpoint to use JSON (#7935)

2024-12-31 18:02:30 -08:00

routes_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

routes.go

Fix Tesla K80 CUBLAS compatibility with two-tier fallback strategy

2025-11-05 23:52:45 +08:00

sched_test.go

Sync with upstream ollama/ollama and restore Tesla K80 (compute 3.7) support

2025-11-05 14:03:05 +08:00

sched.go

Add timing instrumentation and user progress messages for model loading

2025-11-12 19:09:37 +08:00

sparse_common.go

Don't hard fail on sparse setup error

2024-08-09 12:16:19 -07:00

sparse_windows.go

Don't hard fail on sparse setup error

2024-08-09 12:16:19 -07:00

upload.go

server: always print upload/download part info (#8832)

2025-02-04 19:30:49 -08:00