Reduce default parallelism to 1 (#11330)

The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow-up.
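
As a rough illustration of the intended behavior (not the scheduler's actual
code), the sketch below assumes a hypothetical helper, resolveParallel, that
treats a value of 0 or less as "not set" and falls back to the new default of
1. Any explicitly requested value is used as given, with no VRAM-based
adjustment.

```go
package main

import "fmt"

// defaultParallel mirrors the new default introduced by this commit.
var defaultParallel = 1

// resolveParallel is a hypothetical helper for illustration only.
// Assumption: a requested value of 0 or less means "not set".
func resolveParallel(requested int) int {
	if requested <= 0 {
		// Nothing explicit was requested, so use the fixed default
		// instead of deriving a value from available VRAM.
		return defaultParallel
	}
	// An explicit request is honored unchanged.
	return requested
}

func main() {
	fmt.Println(resolveParallel(0)) // unset: falls back to defaultParallel
	fmt.Println(resolveParallel(4)) // explicit: used as given
}
```

The point of the sketch is that the result depends only on the caller's input
and the constant default, which is what keeps it independent of the memory
allocation logic.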
Daniel Hiltgen
2025-07-08 12:08:37 -07:00
committed by GitHub
parent 34088dbcfb
commit 20c3266e94
3 changed files with 4 additions and 6 deletions

@@ -57,9 +57,7 @@ type Scheduler struct {
 var defaultModelsPerGPU = 3
 // Default automatic value for parallel setting
-// Model will still need to fit in VRAM. If this setting won't fit
-// we'll back off down to 1 to try to get it to fit
-var defaultParallel = 2
+var defaultParallel = 1
 var ErrMaxQueue = errors.New("server busy, please try again. maximum pending requests exceeded")