ollama37/server/quantization.go at 73d6a82cce18f84ff5c67148783224cf25b30b32

mirror of https://github.com/dogkeeper886/ollama37.git synced 2025-12-11 16:26:59 +00:00

Bruce MacDonald fbe6ae285a server: improve tensor quantization fallback logic (#10806)

Fall back to alternative quantization types when a tensor's dimensions aren't divisible by the block size required by the originally requested quantization type. If the retried quantization types also fail, the system ultimately falls back to F16 (half-precision floating point), which has a block size of 1 and can handle any tensor dimension.
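
The fallback described in this commit message can be illustrated with a minimal sketch. This is not the code from quantization.go: the `quantType` struct, the `chooseType` function, and the particular candidate chain (Q4_K, then Q5_0) are hypothetical names and choices made for illustration. Only the block sizes (256 for Q4_K, 32 for Q5_0, 1 for F16) follow the usual GGML conventions.

```go
package main

import "fmt"

// quantType is a hypothetical stand-in for a tensor quantization type.
// It is not the type used in quantization.go.
type quantType struct {
	name      string
	blockSize uint64 // number of elements packed into one quantization block
}

var (
	q4K = quantType{"Q4_K", 256}
	q50 = quantType{"Q5_0", 32}
	f16 = quantType{"F16", 1} // block size 1 divides every row length
)

// chooseType returns the first candidate whose block size evenly divides
// rowLen. If none fits, it falls back to F16, which always fits.
func chooseType(rowLen uint64, candidates []quantType) quantType {
	for _, c := range candidates {
		if c.blockSize > 0 && rowLen%c.blockSize == 0 {
			return c
		}
	}
	return f16
}

func main() {
	// Assumed fallback chain: desired type first, then one alternative.
	chain := []quantType{q4K, q50}
	for _, n := range []uint64{4096, 160, 100} {
		fmt.Printf("row length %d -> %s\n", n, chooseType(n, chain).name)
	}
}
```

In this sketch, a row length of 4096 keeps the desired Q4_K, 160 is not a multiple of 256 but is a multiple of 32 so it lands on Q5_0, and 100 fits neither and ends at F16.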

2025-05-22 10:48:08 -07:00

8.1 KiB
