Reduce default parallelism to 1 (#11330)

The scheduler currently picks the parallelism based on available VRAM, which complicates the upcoming dynamic layer memory allocation algorithm. This changes the default from 2 to 1, with the intent going forward that parallelism is explicit and no longer dynamically determined. Removal of the dynamic logic will come in a follow-up.
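To illustrate the intended behavior, here is a minimal sketch of "explicit parallelism with a default of 1". It is not the scheduler's code: `resolveParallel` and its `requested` parameter are hypothetical names used for illustration only, and the sketch assumes a non-positive value means the setting was not provided.

```go
package main

import "fmt"

// defaultParallel mirrors the new default introduced by this commit.
var defaultParallel = 1

// resolveParallel is a hypothetical helper (not part of server/sched.go).
// It assumes requested <= 0 means "no explicit setting was given".
// No VRAM-based adjustment happens here: the value is either the
// explicit request or the fixed default.
func resolveParallel(requested int) int {
	if requested <= 0 {
		return defaultParallel
	}
	return requested
}

func main() {
	fmt.Println(resolveParallel(0)) // unset: falls back to defaultParallel
	fmt.Println(resolveParallel(4)) // explicit: used as given
}
```

The point of the sketch is that the result depends only on the caller's input and a constant default, which is what makes the value predictable for a later memory allocation step.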
2025-12-12 00:37:04 +00:00 · 2025-07-08 12:08:37 -07:00
parent 34088dbcfb
commit 20c3266e94
3 changed files with 4 additions and 6 deletions
--- a/server/sched.go
+++ b/server/sched.go
@@ -57,9 +57,7 @@ type Scheduler struct {
 var defaultModelsPerGPU = 3

 // Default automatic value for parallel setting
-// Model will still need to fit in VRAM.  If this setting won't fit
-// we'll back off down to 1 to try to get it to fit
-var defaultParallel = 2
+var defaultParallel = 1

 var ErrMaxQueue = errors.New("server busy, please try again.  maximum pending requests exceeded")