mirror of
https://github.com/dogkeeper886/ollama37.git
synced 2025-12-17 19:27:00 +00:00
Add timing instrumentation and user progress messages for model loading
Problem: Model loading takes 2-3 minutes on first load with no user feedback, causing confusion about whether the system is frozen or working.

Root Cause: GPU initialization (reserveWorstCaseGraph) takes ~164 seconds on Tesla K80 GPUs due to CUDA kernel compilation (PTX JIT for compute 3.7). This is by design - it validates GPU compatibility before committing to a full load.

Solution:
1. Add comprehensive timing instrumentation to identify bottlenecks
2. Add user-facing progress messages explaining the delay

Changes:
- cmd/cmd.go: Update spinner with an informative message for users
- llama/llama.go: Add timing logs for CGO model loading
- runner/llamarunner/runner.go: Add detailed timing for the llama runner
- runner/ollamarunner/runner.go: Add timing + stderr messages for the new engine
- server/sched.go: Add timing for the scheduler load operation

User Experience:
- Before: Silent wait with a blinking cursor for 2-3 minutes
- After: Rotating spinner with the message "loading model (may take 1-3 min on first load)"

Performance Metrics Captured:
- GGUF file reading: ~0.4s
- GPU kernel compilation: ~164s (bottleneck identified)
- Model weight loading: ~0.002s
- Total end-to-end: ~165s

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -480,7 +480,10 @@ func (s *Scheduler) load(req *LlmRequest, f *ggml.GGML, systemInfo ml.SystemInfo
 	// Returns:
 	//   - gpuIDs: List of GPU device IDs where model layers were loaded
 	//   - err: Error if model doesn't fit or loading fails
+	slog.Info("scheduler.load: calling llama.Load() to load model into memory")
+	loadStart := time.Now()
 	gpuIDs, err := llama.Load(req.ctx, systemInfo, gpus, requireFull)
+	slog.Info("scheduler.load: llama.Load() completed", "duration_sec", time.Since(loadStart).Seconds(), "error", err)
 	if err != nil {
 		if errors.Is(err, llm.ErrLoadRequiredFull) {
 			if !requireFull {