Update README with v1.4.0 release notes and GPT-OSS video

- Add v1.4.0 release information with GPT-OSS support and K80 stability improvements
- Add GPT-OSS 20B performance video to showcase section
- Reorganize video numbering for consistency

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Shang Chieh Tseng
Date: 2025-08-16 12:22:23 +08:00
Parent: 9cbf972aae
Commit: a9c26eb3b0


@@ -20,6 +20,21 @@ This repository includes a customized version of Ollama, specifically optimized
### 📦 Version History
+#### v1.4.0 (2025-08-10)
+This release introduces GPT-OSS support and delivers critical stability improvements for Tesla K80 GPUs:
+
+**New Model Support:**
+- **GPT-OSS**: Open-source GPT implementation with optimized context management for smaller VRAM GPUs
+
+**Tesla K80 Improvements:**
+- Fixed VMM pool crashes through proper memory alignment granularity
+- Resolved multi-GPU model switching deadlocks and silent failures
+- Enhanced BF16 compatibility for Compute Capability 3.7 devices
+- Optimized Docker build performance with parallel compilation
+
+This release ensures reliable operation across single and multi-GPU Tesla K80 configurations while expanding model support with the latest open-source innovations.
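The VMM pool fix above comes down to alignment arithmetic: CUDA's virtual memory management APIs report a minimum allocation granularity (via `cuMemGetAllocationGranularity`), and requests that are not rounded up to a multiple of it can fail or corrupt the pool. A minimal sketch of the align-up step in Go, the repo's language (the helper name and the 2 MiB granularity are illustrative assumptions, not the actual patch):

```go
package main

import "fmt"

// alignUp rounds size up to the next multiple of granularity.
// granularity must be a power of two, which CUDA VMM granularities are.
func alignUp(size, granularity uint64) uint64 {
	return (size + granularity - 1) &^ (granularity - 1)
}

func main() {
	// Assume a 2 MiB granularity, a common value reported for device memory.
	const granularity = 2 << 20
	fmt.Println(alignUp(3_000_000, granularity)) // rounds up to 4 MiB (4194304)
	fmt.Println(alignUp(4_194_304, granularity)) // already aligned: unchanged
}
```

Every reservation and physical allocation handed to the VMM pool would be rounded this way, so no request ever falls between granularity boundaries.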
#### v1.3.0 (2025-07-01)
This release expands model support while maintaining full Tesla K80 compatibility:
@@ -77,29 +92,33 @@ For detailed build instructions and further customization, refer to the [GitHub
Check out these videos showcasing different aspects of running Ollama on a Tesla K80 GPU:
-1. **How to Build an AI Home Lab with a Tesla Graphic Card**
+1. **GPT-OSS 20B on a Tesla K80 Real Time Performance Analysis**
+- [Watch here](https://youtu.be/58azOBe_tGM)
+Description: Experience the power of GPT-OSS 20B running on a Tesla K80 GPU with real-time performance analysis. This video demonstrates the latest open-source GPT implementation with optimized context management, showcasing its capabilities on K80 hardware.
+2. **How to Build an AI Home Lab with a Tesla Graphic Card**
- [Watch here](https://youtu.be/-5gMpGI49PA)
Description: Want to dive into the exciting world of AI development at home? In this video, I'm walking you through the entire process of building an AI lab using a Tesla graphic card from setup to running models! This isn't your typical gaming build; we're tackling the unique challenges these cards present.
-2. **We Fixed Our LLM Test!**
+3. **We Fixed Our LLM Test!**
- [Watch here](https://youtu.be/TUwjZ20rr-U)
-Description: We messed up! In our last video, we tested big AI models (called LLMs), but we found some mistakes in how we did it. So, were back with a new test! This time, we're checking out smaller, but still powerful, AI models: Ollama 3.2 Vision 11B, Gemma 3 12B, and Phi-4 14B. These models are easier to run on regular computers.
+Description: We messed up! In our last video, we tested big AI models (called LLMs), but we found some mistakes in how we did it. So, we're back with a new test! This time, we're checking out smaller, but still powerful, AI models: Ollama 3.2 Vision 11B, Gemma 3 12B, and Phi-4 14B. These models are easier to run on regular computers.
-3. **How to Set Up Ollama for Tesla K80**
+4. **How to Set Up Ollama for Tesla K80**
- [Watch here](https://youtu.be/nJ0W6xCdp_c)
-Description: 🚀 Set up Ollama with a Tesla K80 Full Guide for CUDA 11.4 + GCC 10 + Go + CMake 💻 In this video, Ill walk you step-by-step through setting up Ollama to run on an NVIDIA Tesla K80 GPU using CUDA 11.4. We will handle all the heavy lifting — from installing the correct NVIDIA CUDA Toolkit suitable for a K80 Building GCC 10 from source (to meet compatibility needs) Compiling CMake manually, and Go installation Prepping your system with everything needed for Ollama development workflows 🔧 Whether you're setting up an dev box using similar hardware or just want to explore LLMs on Tesla K80, this guide has got you covered.
+Description: 🚀 Set up Ollama with a Tesla K80 Full Guide for CUDA 11.4 + GCC 10 + Go + CMake 💻 In this video, I'll walk you step-by-step through setting up Ollama to run on an NVIDIA Tesla K80 GPU using CUDA 11.4. We will handle all the heavy lifting — from installing the correct NVIDIA CUDA Toolkit suitable for a K80 Building GCC 10 from source (to meet compatibility needs) Compiling CMake manually, and Go installation Prepping your system with everything needed for Ollama development workflows 🔧 Whether you're setting up a dev box using similar hardware or just want to explore LLMs on Tesla K80, this guide has got you covered.
-4. **LLM-Powered Text Refinement with Dify Studio**
+5. **LLM-Powered Text Refinement with Dify Studio**
- [Watch here](https://youtu.be/FcAjriKB74M)
Description: This video showcases how to use a Large Language Model (LLM) integrated with Dify Studio for text refinement tasks. We'll walk through the setup, demonstrate real-time processing of text for quality improvement, and evaluate response accuracy and coherence. Whether you're an AI enthusiast or looking to enhance your writing workflow, this tutorial will provide valuable insights.
-5. **DeepSeek-R1:32B on Intel i3-14100 CPU - Real-Time Performance Analysis**
+6. **DeepSeek-R1:32B on Intel i3-14100 CPU - Real-Time Performance Analysis**
- [Watch here](https://youtu.be/aCqV4hmMxtM)
Description: In this video, we dive into the real-time performance of DeepSeek-R1:32B running on an Intel i3-14100 CPU. Following our previous showcase on a Tesla K80 GPU, we now evaluate its capabilities on a more accessible platform.
-6. **DeepSeek-R1:32b in Action on Tesla K80 GPU - Real-Time Performance Showcase**
+7. **DeepSeek-R1:32b in Action on Tesla K80 GPU - Real-Time Performance Showcase**
- [Watch here](https://youtu.be/k8jHMa_cHCI)
-Description: Whether youre a developer looking to optimize AI models on similar hardware, or just curious about high-performance computing setups, this video offers valuable insights. From technical setup tips to performance benchmarks, we cover it all. What You'll See: - NVIDIA-SMI Status - Ollama Log Insights - Real-Time Response Time Analysis
+Description: Whether you're a developer looking to optimize AI models on similar hardware, or just curious about high-performance computing setups, this video offers valuable insights. From technical setup tips to performance benchmarks, we cover it all. What You'll See: - NVIDIA-SMI Status - Ollama Log Insights - Real-Time Response Time Analysis
## License