- Added GPT-OSS model to the supported models list with multi-GPU optimization notes
- Documented Tesla K80 multi-GPU usage example with nvidia-smi monitoring (see the monitoring sketch after these notes)
- Added comprehensive Tesla K80 Memory Improvements section covering:
  * VMM pool crash fixes with granularity alignment (see the sketch after this list)
* Multi-GPU model switching scheduler improvements
* Silent inference failure resolution
- Updated the Recent Updates section for the v1.4.0 release
- Enhanced technical details with multi-GPU optimization specs
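Why the alignment matters: CUDA's virtual memory management API (cuMemCreate) requires each physical allocation size to be a multiple of the granularity reported by cuMemGetAllocationGranularity, so an unaligned pool growth request can fail at runtime. Below is a minimal sketch of the round-up rule, in plain Python for illustration only; the actual fix lives in the CUDA pool code:

```python
def align_up(size: int, granularity: int) -> int:
    """Round a requested pool size (bytes) up to the next multiple
    of the VMM allocation granularity."""
    return ((size + granularity - 1) // granularity) * granularity

# With a typical 2 MiB granularity, a 5 MiB request maps to 6 MiB of pages.
GRANULARITY = 2 * 1024 * 1024
assert align_up(5 * 1024 * 1024, GRANULARITY) == 6 * 1024 * 1024
```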
These improvements enable robust production use of Tesla K80 hardware
for LLM inference with seamless model switching capabilities.
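For reference, a minimal way to watch both K80 dies during inference: a sketch that shells out to nvidia-smi using its standard query flags (this script is illustrative and not part of the repo):

```python
import subprocess
import time

QUERY = "index,name,memory.used,memory.total,utilization.gpu"

def poll_gpus(interval: float = 2.0) -> None:
    """Print per-GPU memory and utilization every `interval` seconds."""
    while True:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            # One CSV row per GPU; a K80 board shows up as two entries.
            idx, name, used, total, util = [f.strip() for f in line.split(",")]
            print(f"GPU {idx} ({name}): {used}/{total} MiB, {util}% util")
        time.sleep(interval)

if __name__ == "__main__":
    poll_gpus()
```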
Update README.md and CLAUDE.md to correctly reference Gemma3n model
support that was added in version 1.3.0, replacing generic "Gemma 3"
references with the specific "Gemma3n" model name.
- Restructure README.md for better readability and organization
- Reduce README word count by 75% while maintaining key information
- Move detailed installation guides to docs/manual-build.md
- Add Tesla K80-specific build instructions and optimizations
- Update CLAUDE.md with new documentation structure and references
- Improve title formatting with emoji and clear tagline
- Add Gemma3n model support with text generation capabilities
- Add new CUDA mean operations for improved performance
- Add macOS documentation and performance tests
- Update LLAMA patches for ROCm/CUDA compatibility
- Fix various model conversion and processing issues
- Update CI workflows and build configurations
- Add library model tests and Shakespeare test data
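As a quick usage reference for the new model, here is a minimal non-streaming request to a local server. This sketch assumes Ollama's standard /api/generate endpoint on the default port 11434 and that a model tagged gemma3n has already been pulled:

```python
import json
from urllib.request import Request, urlopen

def generate(prompt: str, model: str = "gemma3n") -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    req = Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["response"]

print(generate("Summarize Hamlet in one sentence."))
```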
This commit updates the README to include macLlama within the community integrations section.
macLlama is a native macOS application built for lightweight and efficient LLM interaction. Key features include:
* **Lightweight & Native:** Designed to be resource-friendly and perform optimally on macOS.
* **Chat-like Interface:** Provides a user-friendly, conversational interface.
* **Multiple Window Support:** Allows users to manage multiple conversations simultaneously.
The primary goal of macLlama is to offer a simple and easy-to-run LLM experience on macOS.
This PR adds Tiny Notepad, a lightweight, notepad-like interface to chat with local LLMs via Ollama.
- It's designed as a simple, distraction-free alternative to heavier chat front ends.
- The app supports basic note-taking, timestamped logs, and model parameter controls.
- Built with Tkinter, it runs entirely offline and is available via PyPI.
The aim is to provide a lightweight interface for Ollama that is easy to install and run.
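To illustrate the pattern (this is not Tiny Notepad's source), here is a minimal Tkinter window that sends a prompt to a local Ollama server and appends the reply to a text log. The model tag is a placeholder, and the request blocks the UI thread, which a real app would hand off to a worker:

```python
import json
import tkinter as tk
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL = "llama3.2"  # placeholder; any locally pulled model tag works

def ask(prompt: str) -> str:
    """Blocking, non-streaming request to the local Ollama server."""
    req = Request(
        OLLAMA_URL,
        data=json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["response"]

def on_send(event=None):
    prompt = entry.get().strip()
    if not prompt:
        return
    entry.delete(0, tk.END)
    try:
        reply = ask(prompt)
    except OSError as exc:  # e.g. server not running
        reply = f"[error: {exc}]"
    log.insert(tk.END, f"> {prompt}\n{reply}\n\n")
    log.see(tk.END)

root = tk.Tk()
root.title("Notepad-style Ollama chat")
log = tk.Text(root, wrap="word", height=20, width=72)
log.pack(fill="both", expand=True)
entry = tk.Entry(root)
entry.pack(fill="x")
entry.bind("<Return>", on_send)
root.mainloop()
```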