From 83973336d644a6cad685e63a3a0e002aea7ea32c Mon Sep 17 00:00:00 2001
From: Shang Chieh Tseng <shangchieh.tseng@tsengsyu.com>
Date: Fri, 8 Aug 2025 11:44:59 +0800
Subject: [PATCH] Optimize Docker build performance with parallel compilation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add -j$(nproc) flag to cmake build in ollama37.Dockerfile
- Use all available CPU cores for faster compilation
- Add sync-upstream.md documentation for future maintenance

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 docs/sync-upstream.md | 147 ++++++++++++++++++++++++++++++++++++++++++
 ollama37.Dockerfile   |   2 +-
 2 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 docs/sync-upstream.md
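
Note on the `-j$(nproc)` change: the snippet below is a minimal sketch, not part of this patch, of how the job count could be resolved defensively before it is passed to `cmake --build`. The helper name `build_jobs` and the fallback to a single job are assumptions made for illustration. The Dockerfile itself calls `nproc` directly, which is fine as long as the builder image ships GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch only: build_jobs is a hypothetical helper, not something this patch adds.
build_jobs() {
    local n
    # nproc (GNU coreutils) prints the number of processing units available
    # to the current process; fall back to 1 if the command is missing.
    n="$(nproc 2>/dev/null)" || n=1
    # Guard against empty or non-numeric output.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}

jobs_flag="-j$(build_jobs)"
# Print the command instead of running it, so the sketch has no side effects.
echo "cmake --build build ${jobs_flag}"
```

Parallel compilation typically raises peak memory use along with CPU use, so on a memory-constrained build host it may be worth capping the job count below what `nproc` reports.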

diff --git a/docs/sync-upstream.md b/docs/sync-upstream.md
new file mode 100644
index 00000000..9af84817
--- /dev/null
+++ b/docs/sync-upstream.md
@@ -0,0 +1,147 @@
+# Syncing with Upstream Ollama
+
+This document describes the process for syncing the ollama37 fork with the official ollama/ollama repository while preserving CUDA Compute Capability 3.7 support for Tesla K80 GPUs.
+
+## Prerequisites
+
+- Git configured with the upstream remote: `https://github.com/ollama/ollama.git`
+- An understanding of which files contain changes specific to CUDA Compute Capability 3.7
+
+## Key Files to Preserve
+
+When merging from upstream, always preserve CUDA 3.7 support in these files:
+
+1. **`ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt`** - Contains `CMAKE_CUDA_ARCHITECTURES "37;50;61;70;75;80"`
+2. **`CMakePresets.json`** - Keep both "CUDA 11" (with arch 37) and "CUDA 12" presets
+3. **`README.md`** - Maintain ollama37-specific documentation
+4. **`docs/*.md`** - Keep our custom documentation
+
+## Sync Process
+
+### 1. Create a New Branch
+
+```bash
+git checkout main
+git checkout -b sync-upstream-models
+```
+
+### 2. Add the Upstream Remote (if it does not exist)
+
+```bash
+git remote add upstream https://github.com/ollama/ollama.git
+```
+
+### 3. Fetch Latest Changes
+
+```bash
+git fetch upstream main
+```
+
+### 4. Merge Upstream Changes
+
+```bash
+git merge upstream/main -m "Merge upstream ollama/ollama main branch while preserving CUDA 3.7 support"
+```
+
+### 5. Resolve Conflicts
+
+Common conflict resolutions:
+
+#### CMakePresets.json
+- **Resolution**: Keep both CUDA 11 (with architecture 37) and CUDA 12 configurations
+- **Example**: Preserve the "CUDA 11" preset with `"CMAKE_CUDA_ARCHITECTURES": "37;50;52;53;60;61;70;75;80;86"`
+
+#### README.md and Documentation
+- **Resolution**: Keep our version with `git checkout --ours README.md`
+- **Reason**: Maintains ollama37-specific instructions and branding
+
+#### Model Support Files
+- **Resolution**: Accept upstream changes for new model support
+- **Files**: `model/models/models.go`, new model directories
+- **Example**: Accept new imports like `_ "github.com/ollama/ollama/model/models/gptoss"`
+
+#### Backend/Tools Updates
+- **Resolution**: Generally accept upstream improvements
+- **Files**: `tools/tools.go`, `ml/backend/ggml/ggml.go`
+- **Caution**: Verify that no code specific to Compute Capability 3.7 is removed
+
+#### Test Files
+- **Resolution**: Accept upstream test additions
+- **Files**: `integration/utils_test.go`
+- **Action**: Include new model tests in the test lists
+
+### 6. Commit the Merge (after resolving conflicts)
+
+```bash
+git add -A
+git commit -m "Merge upstream ollama/ollama main branch while preserving CUDA 3.7 support
+
+- Added support for new [model name] from upstream
+- Preserved CUDA Compute Capability 3.7 (Tesla K80) support
+- Kept CUDA 11 configuration alongside CUDA 12
+- Maintained all documentation specific to ollama37 fork
+- [List other significant changes]"
+```
+
+### 7. Test the Build
+
+Build with Docker to verify CUDA 3.7 support:
+
+```bash
+docker build -f ollama37.Dockerfile -t ollama37:test .
+```
+
+Or build manually:
+
+```bash
+CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build
+CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build
+go build -o ollama .
+```
+
+### 8. Test with Tesla K80
+
+```bash
+# Run the server
+./ollama serve
+
+# In another terminal, test a model
+./ollama run llama3.2:1b
+```
+
+### 9. Merge to Main
+
+After successful testing:
+
+```bash
+git checkout main
+git merge sync-upstream-models
+git push origin main
+```
+
+## Troubleshooting
+
+### Conflict in CUDA Architecture Settings
+- Always ensure "37" is included in CMAKE_CUDA_ARCHITECTURES
+- CUDA 11 is required for Compute Capability 3.7 support
+- CUDA 12 dropped support for compute capabilities below 5.0 (architecture `50`)
+
+### Build Failures
+- Check that GCC 10 is being used (required for CUDA 11.4)
+- Verify CUDA 11.x is installed (not CUDA 12)
+- Ensure all patches specific to Compute Capability 3.7 are preserved
+
+### New Model Integration Issues
+- New models should work automatically if they don't have specific CUDA version requirements
+- If a model requires CUDA 12 features, document the limitation in README.md
+
+## Recent Sync History
+
+- **2025-08-08**: Synced with upstream, added gpt-oss model support
+- **Previous syncs**: See git log for merge commits from upstream/main
+
+## Notes
+
+- This process maintains our fork's ability to run on Tesla K80 GPUs while staying current with upstream features
+- Always test thoroughly before merging to main
+- Document any model-specific limitations discovered during testing
\ No newline at end of file
diff --git a/ollama37.Dockerfile b/ollama37.Dockerfile
index 15a2b81a..80972bfb 100644
--- a/ollama37.Dockerfile
+++ b/ollama37.Dockerfile
@@ -5,7 +5,7 @@ FROM dogkeeper886/ollama37-builder AS builder
 COPY . /usr/local/src/ollama37
 WORKDIR /usr/local/src/ollama37
 RUN CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake -B build \
-    && CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build \
+    && CC=/usr/local/bin/gcc CXX=/usr/local/bin/g++ cmake --build build -j$(nproc) \
     && go build -o ollama .
 
 # ===== Stage 2: Runtime image =====