ollama37/llm/llama.cpp/generate_linux.go at 7a1b37ac64f0fb0585e279a0a840707843511ed3

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-12 08:47:01 +00:00

Jongwook Choi 12e8c12d2b Disable CUDA peer access as a workaround for multi-GPU inference bug (#1261)

When CUDA peer access is enabled, multi-GPU inference produces
garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the
upstream bug is fixed, we disable CUDA peer access temporarily
to ensure correct output.

See #961.
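
The file contents are not shown on this page, so the following is only a rough sketch of how such a workaround might be wired into a CMake configure step from Go. It assumes the llama.cpp CMake option `LLAMA_CUDA_PEER_MAX_BATCH_SIZE` (the maximum batch size for which peer access is used, so `0` effectively turns it off) and the `LLAMA_CUBLAS` option from llama.cpp builds of that period. The helper names and the `gguf` paths are hypothetical and are not taken from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// peerAccessOff is an assumption: it relies on llama.cpp's CMake option
// LLAMA_CUDA_PEER_MAX_BATCH_SIZE, where 0 means peer access is never used.
const peerAccessOff = "-DLLAMA_CUDA_PEER_MAX_BATCH_SIZE=0"

// cudaConfigureArgs is a hypothetical helper that assembles the arguments
// for a `cmake` configure call targeting a CUDA build.
func cudaConfigureArgs(srcDir, buildDir string) []string {
	return []string{
		"-S", srcDir,
		"-B", buildDir,
		"-DLLAMA_CUBLAS=on", // CUDA backend switch in llama.cpp builds of that era
		peerAccessOff,       // the workaround: keep peer access disabled
	}
}

// hasArg reports whether want appears verbatim in args.
func hasArg(args []string, want string) bool {
	for _, a := range args {
		if a == want {
			return true
		}
	}
	return false
}

func main() {
	// Source and build directories here are illustrative only.
	args := cudaConfigureArgs("gguf", "gguf/build/cuda")
	fmt.Println("cmake " + strings.Join(args, " "))
}
```

Keeping the flag in a named constant makes the workaround easy to find and remove once the upstream bug is fixed.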

2023-11-24 14:05:57 -05:00

1.8 KiB
