Fix gpt-oss model architecture to match GGUF tensor format
The gpt-oss model architecture code expected fused tensors (attn_qkv, ffn_gate_up_exps), but the actual GGUF files contain separate tensors (attn_q/k/v, ffn_gate_exps/up_exps), causing nil pointer panics during model loading.

Changes:

- model/models/gptoss/model.go: Updated AttentionBlock to use separate Query/Key/Value fields instead of fused QKV; modified Forward() to compute the projections separately
- model/models/gptoss/model.go: Updated MLPBlock to use separate Gate/Up fields instead of fused GateUp; simplified the Forward() logic
- fs/ggml/type.go: Reorganized the MXFP4 tensor type constant ordering
- ml/backend/ggml/ggml/include/ggml.h: Moved GGML_TYPE_MXFP4 to the end of the enum to match the GGUF file format specification
- ml/backend/ggml/ggml/src/ggml.c: Updated the type name array to match the reordered enum
- CLAUDE.md: Documented the gpt-oss model compatibility fix

Result: The gpt-oss:20b model now loads and runs successfully on the Tesla K80; all 25 layers offload to the GPU correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
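For context, a minimal sketch of why the constant ordering in fs/ggml/type.go and ggml.h matters. This is a generic illustration under assumed names, not the fork's actual definitions; only F32 = 0 and F16 = 1 are standard ggml type IDs:

```go
package ggml

// GGUF files record each tensor's quantization as a numeric type ID, so
// these Go constants must carry exactly the same values as the
// GGML_TYPE_* enum in ggml.h. Appending new types such as MXFP4 at the
// end of the enum keeps every existing ID stable in both places.
type TensorType uint32

const (
	TensorTypeF32 TensorType = iota // 0, matches GGML_TYPE_F32
	TensorTypeF16                   // 1, matches GGML_TYPE_F16
	// ... remaining types in ggml.h order ...
	// A TensorTypeMXFP4 constant goes last, mirroring GGML_TYPE_MXFP4
	// at the end of the C enum.
)
```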
CLAUDE.md | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
@@ -149,6 +149,24 @@ Analysis of real-world usage (gemma3:12b) revealed a **2.6 GiB memory overestima
- Simpler deployment for single-model workloads
- Empirically validated with real Tesla K80 measurements

## Model Architecture Compatibility

### GPT-OSS Model Fix (2025-10-29)

**Issue**: The `gpt-oss` model architecture code expected fused tensor formats that didn't match the actual GGUF file structure, causing nil pointer panics during model loading.

**Root Cause**: Mismatch between code expectations and the GGUF file format (illustrated in the sketch below):

- Code expected: `attn_qkv` (fused), `ffn_gate_up_exps` (fused)
- GGUF contains: `attn_q/k/v` (separate), `ffn_gate_exps/up_exps` (separate)
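For illustration, a minimal sketch of the separated fields, assuming the `gguf` struct-tag convention ollama's model packages use to bind struct fields to GGUF tensor names. Field types, import paths, and the omitted projections are assumptions, not the fork's exact code:

```go
package gptoss

import (
	"github.com/ollama/ollama/ml"
	"github.com/ollama/ollama/ml/nn"
)

// With the old fused tags (attn_qkv, ffn_gate_up_exps) no tensor in the
// GGUF matched, the fields were left nil, and Forward() panicked on them.
type AttentionBlock struct {
	Query *nn.Linear `gguf:"attn_q"`
	Key   *nn.Linear `gguf:"attn_k"`
	Value *nn.Linear `gguf:"attn_v"`
	// output projection and norms omitted
}

type MLPBlock struct {
	Gate ml.Tensor `gguf:"ffn_gate_exps"` // per-expert gate weights
	Up   ml.Tensor `gguf:"ffn_up_exps"`   // per-expert up weights
	// down projection and expert router omitted
}
```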
**Fix Applied** (`model/models/gptoss/model.go`):

1. Updated the `AttentionBlock` struct to use separate `Query`, `Key`, `Value` fields instead of a fused `QKV` field
2. Modified `AttentionBlock.Forward()` to compute the Q/K/V projections separately
3. Updated the `MLPBlock` struct to use separate `Gate` and `Up` fields instead of a fused `GateUp` field
4. Modified `MLPBlock.Forward()` to compute gate and up separately and removed an incorrect reshape (see the sketch below)
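A correspondingly simplified sketch of the Forward changes; signatures are reduced, and RoPE, attention, activation details, and expert routing are elided, so treat the exact method shapes as assumptions:

```go
// Q/K/V are now three independent projections rather than slices of one
// fused QKV matmul result.
func (b *AttentionBlock) Forward(ctx ml.Context, hidden ml.Tensor) (q, k, v ml.Tensor) {
	q = b.Query.Forward(ctx, hidden)
	k = b.Key.Forward(ctx, hidden)
	v = b.Value.Forward(ctx, hidden)
	return q, k, v
}

// Gate and up arrive as separate GGUF tensors, so the old reshape-and-split
// of a fused gate_up product (the removed reshape) is no longer needed.
func (b *MLPBlock) Forward(ctx ml.Context, hidden ml.Tensor) ml.Tensor {
	gate := b.Gate.Mulmat(ctx, hidden)
	up := b.Up.Mulmat(ctx, hidden)
	return gate.SILU(ctx).Mul(ctx, up) // down projection omitted
}
```

Computing separate projections costs a few extra kernel launches but matches the tensor layout the GGUF files actually provide, which is what makes the weights resolve at load time.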
**Result**: ✅ The `gpt-oss:20b` model now loads and runs successfully on the Tesla K80

## Documentation Structure

The project documentation is organized as follows: