ollama37/llm/llama.cpp/generate_linux.go at 12e8c12d2b5c0658cad014b58a8baf597b0741df

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-12 16:57:04 +00:00

Jongwook Choi 12e8c12d2b Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)

When CUDA peer access is enabled, multi-GPU inference produces
garbage output. This is a known bug in llama.cpp (or NVIDIA). Until the
upstream bug is fixed, we can temporarily disable CUDA peer access
to ensure correct output.

See #961.
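
As a rough illustration of how a workaround like this can be expressed at build time: llama.cpp's CMake build has an option named LLAMA_CUDA_PEER_MAX_BATCH_SIZE, the largest batch size for which peer access is used, and setting it to 0 means peer access is never enabled. The sketch below assumes the CUDA build is driven by a go:generate directive. The package name, the source and build directories, and the accompanying -DLLAMA_CUBLAS flag are illustrative assumptions, not lines copied from this commit.

```go
// Sketch only: the package name, the directories, and the -DLLAMA_CUBLAS flag
// are assumptions made for illustration.
package llm

// LLAMA_CUDA_PEER_MAX_BATCH_SIZE=0 keeps llama.cpp from enabling CUDA peer
// access for any batch size, trading some multi-GPU speed for correct output.
//go:generate cmake -S gguf -B gguf/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_CUDA_PEER_MAX_BATCH_SIZE=0
```

Running `go generate ./...` from the module root would then configure the build with peer access disabled. Removing the flag restores llama.cpp's default once the upstream bug is fixed.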

2023-11-24 14:05:57 -05:00

1.8 KiB
