backend: API to support full precision matmul

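Most tensor backends speed up matmuls by computing them in reduced precision. However, some operations on some models (such as the kq matmul in attention) are sensitive to this and require full precision. This adds MulmatFullPrec to the Tensor interface so callers can request it explicitly.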
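
To illustrate why accumulation precision can matter, here is a minimal, self-contained sketch. It is not the backend's implementation: toBF16, dotLowPrec and dotFullPrec are hypothetical helpers that emulate a reduced-precision accumulator by truncating float32 values to a bfloat16-style layout, and the all-ones inputs are chosen only to make the effect obvious.

```go
package main

import (
	"fmt"
	"math"
)

// toBF16 (hypothetical helper) emulates reduced precision by keeping only
// the top 16 bits of a float32, which matches the bfloat16 bit layout.
func toBF16(x float32) float32 {
	return math.Float32frombits(math.Float32bits(x) &^ 0xFFFF)
}

// dotLowPrec truncates every product and every partial sum, so the
// accumulator never holds more than 8 significant bits.
func dotLowPrec(a, b []float32) float32 {
	var acc float32
	for i := range a {
		acc = toBF16(acc + toBF16(a[i]*b[i]))
	}
	return acc
}

// dotFullPrec accumulates in plain float32.
func dotFullPrec(a, b []float32) float32 {
	var acc float32
	for i := range a {
		acc += a[i] * b[i]
	}
	return acc
}

func main() {
	// 1024 ones: the exact dot product is 1024.
	a := make([]float32, 1024)
	b := make([]float32, 1024)
	for i := range a {
		a[i], b[i] = 1, 1
	}

	// The truncated accumulator stalls once it reaches 256, because
	// 257 is not representable with 8 significant bits.
	fmt.Println("low precision: ", dotLowPrec(a, b))  // 256
	fmt.Println("full precision:", dotFullPrec(a, b)) // 1024
}
```

In a model, a caller sensitive to this kind of error would presumably call MulmatFullPrec in place of Mulmat for the affected product, while leaving other matmuls on the faster default path.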
2025-12-10 07:46:59 +00:00 · 2025-02-13 10:01:14 -08:00
parent 4d4463b2bd
commit d773b7d671
4 changed files with 12 additions and 2 deletions
--- a/ml/backend.go
+++ b/ml/backend.go
@@ -66,6 +66,7 @@ type Tensor interface {
 	Add(ctx Context, t2 Tensor) Tensor
 	Mul(ctx Context, t2 Tensor) Tensor
 	Mulmat(ctx Context, t2 Tensor) Tensor
+	MulmatFullPrec(ctx Context, t2 Tensor) Tensor

 	Softmax(ctx Context) Tensor
 	LayerNorm(ctx Context, weight, bias Tensor, eps float32) Tensor