diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..4a13865
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,92 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a laboratory for running Ollama (a local LLM runner) on NVIDIA K80 GPUs with custom Docker builds optimized for CUDA 11.4 compatibility. The project focuses on LLM-powered workflow automation for software quality assurance, integrating with tools such as Dify, the VS Code Continue plugin, N8N, and auto-webui.
+
+## Docker Commands
+
+### Running Ollama
+```bash
+# Pull and run the custom K80-optimized Ollama image
+docker pull dogkeeper886/ollama37
+docker run --runtime=nvidia --gpus all -p 11434:11434 dogkeeper886/ollama37
+
+# Using docker-compose (recommended for persistent data)
+cd ollama37/
+docker-compose up -d
+
+# Stop the service
+docker-compose down
+```
+
+### Building Custom Images
+```bash
+# Build the builder image (contains CUDA 11.4, GCC 10, CMake, Go)
+cd ollama37-builder/
+docker build -t dogkeeper886/ollama37-builder .
+
+# Build the runtime image
+cd ollama37/
+docker build -t dogkeeper886/ollama37 .
+```
+
+## Architecture
+
+### Core Components
+
+1. **ollama37-builder/**: Multi-stage Docker build environment
+   - Rocky Linux 8 base with the NVIDIA 470 driver
+   - CUDA 11.4 toolkit for K80 GPU compatibility
+   - Custom-compiled GCC 10, CMake 4.0, and Go 1.24.2
+   - Environment setup scripts for proper library paths
+
+2. **ollama37/**: Runtime Docker image
+   - Compiled Ollama binary optimized for the K80
+   - Minimal runtime environment with the required CUDA libraries
+   - Exposes the Ollama API on port 11434
+   - Persistent volume support for model storage
+
+3. **dify/**: Workflow automation configurations
+   - YAML workflow definitions for LLM-powered QA tasks
+   - Python utilities for Atlassian/Jira integration (`format_jira_ticket.py`)
+   - Workflow templates: BugBlitz, QualityQuest, ER2Test, etc.
+   - Knowledge base with PDF documentation for various systems
+
+4. **mcp-servers/**: Model Context Protocol integrations
+   - Web browser MCP server for enhanced LLM capabilities
+
+### Key Environment Variables
+- `OLLAMA_HOST=0.0.0.0:11434` - API endpoint
+- `LD_LIBRARY_PATH="/usr/local/lib64:/usr/local/cuda-11.4/lib64"` - CUDA libraries
+- `NVIDIA_DRIVER_CAPABILITIES=compute,utility` - GPU capabilities
+- `NVIDIA_VISIBLE_DEVICES=all` - GPU visibility
+
+### Hardware Requirements
+- NVIDIA Tesla K80 GPU
+- NVIDIA Tesla K80 driver installed
+- NVIDIA Container Runtime for Docker
+- Sufficient storage for model downloads (models are stored in `./volume/` when using docker-compose)
+
+## Development Workflow
+
+### Model Testing
+The project supports running various LLM models optimized for the K80:
+- Qwen2.5-VL (multi-modal vision-language model)
+- Qwen3 Dense & Sparse variants
+- Improved MLLama models
+- Gemma 3 12B
+- Phi-4 Reasoning 14B
+- DeepSeek-R1:32B
+
+### Quality Assurance Integration
+The Dify workflows enable automated processing of:
+- Jira ticket to Markdown conversion
+- Requirements analysis and test generation
+- Documentation refinement
+- Bug report processing
+
+### Persistent Data
+When using docker-compose, model data persists in the `./volume/` directory, which is mapped to `/root/.ollama` inside the container.
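+
+The environment variables and volume mapping above can be combined into a single `docker run` invocation. A minimal sketch follows; the container name and host path are illustrative, not required values.
+
+```bash
+# Illustrative run command mirroring the documented env vars, port, and volume mapping
+docker run -d --name ollama37 \
+  --runtime=nvidia --gpus all \
+  -p 11434:11434 \
+  -e OLLAMA_HOST=0.0.0.0:11434 \
+  -e NVIDIA_VISIBLE_DEVICES=all \
+  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
+  -v "$(pwd)/volume:/root/.ollama" \
+  dogkeeper886/ollama37
+```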
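+
+### Verifying the Deployment
+For a quick sanity check against a running container, query the standard Ollama HTTP API on port 11434. The model tag and container name below are placeholders; substitute whatever you have actually pulled and named.
+
+```bash
+# List models known to the server (empty until something is pulled)
+curl http://localhost:11434/api/tags
+
+# Pull a model inside the container (example tag)
+docker exec -it <container-name> ollama pull qwen3:14b
+
+# Request a completion through the REST API
+curl http://localhost:11434/api/generate -d '{
+  "model": "qwen3:14b",
+  "prompt": "Say hello in one sentence.",
+  "stream": false
+}'
+```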
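+
+### Example: Multi-Modal Prompt
+Vision-language models such as Qwen2.5-VL accept base64-encoded images through the `images` field of the Ollama generate API. A sketch under assumptions: the model tag is illustrative, and `base64 -w0` assumes GNU coreutils.
+
+```bash
+# Describe a local image with a vision-language model (assumes the tag exists locally)
+curl http://localhost:11434/api/generate -d "{
+  \"model\": \"qwen2.5vl\",
+  \"prompt\": \"Describe this image.\",
+  \"images\": [\"$(base64 -w0 screenshot.png)\"],
+  \"stream\": false
+}"
+```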
\ No newline at end of file
diff --git a/README.md b/README.md
index eab4422..39715c4 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,19 @@ This repository includes a customized version of Ollama, specifically optimized
 
 ### 📦 Version History
 
+#### v1.3.0 (2025-07-01)
+
+This release expands model support while maintaining full Tesla K80 compatibility:
+
+**New Model Support:**
+- **Qwen2.5-VL**: Multi-modal vision-language model for image understanding
+- **Qwen3 Dense & Sparse**: Enhanced Qwen3 model variants
+- **Improved MLLama**: Better support for Meta's LLaMA models
+
+**Documentation Updates:**
+- Updated installation guides for Tesla K80 compatibility
+- Enhanced Docker Hub documentation with the latest model information
+
 #### v1.2.0 (2025-05-06)
 
 This release introduces support for Qwen3 models, marking a significant step in our commitment to keeping the Tesla K80 compatible with leading open-source language models. Testing includes successful execution of Gemma 3 12B, Phi-4 Reasoning 14B, and Qwen3 14B, ensuring compatibility with models expected to be widely used in May 2025.
diff --git a/ollama37/README.md b/ollama37/README.md
index 20bec11..21bc198 100644
--- a/ollama37/README.md
+++ b/ollama37/README.md
@@ -14,8 +14,11 @@ This setup ensures that users can start experimenting with AI models without the
 ## Features
 
 - **GPU Acceleration**: Fully supports NVIDIA K80 GPUs to accelerate model computations.
+- **Multi-Modal AI**: Supports vision-language models such as Qwen2.5-VL for image understanding.
+- **Advanced Reasoning**: Built-in thinking support for enhanced AI reasoning capabilities.
 - **Pre-built Binary**: Contains the compiled Ollama binary for immediate use.
 - **CUDA Libraries**: Includes the necessary CUDA libraries and drivers for GPU operations.
+- **Enhanced Tool Support**: Improved tool calling and WebP image input support.
 - **Environment Variables**: Configured to facilitate seamless interaction with the GPU and network settings.
 
 ## Usage
@@ -99,6 +102,19 @@ This will stop and remove the container, but the data stored in the `.ollama` di
 
 ## 📦 Version History
 
+### v1.3.0 (2025-07-01)
+
+This release expands model support while maintaining full Tesla K80 compatibility:
+
+**New Model Support:**
+- **Qwen2.5-VL**: Multi-modal vision-language model for image understanding
+- **Qwen3 Dense & Sparse**: Enhanced Qwen3 model variants
+- **Improved MLLama**: Better support for Meta's LLaMA models
+
+**Documentation Updates:**
+- Updated installation guides for Tesla K80 compatibility
+- Enhanced Docker Hub documentation with the latest model information
+
 ### v1.2.0 (2025-05-06)
 
 This release introduces support for Qwen3 models, marking a significant step in our commitment to keeping the Tesla K80 compatible with leading open-source language models. Testing includes successful execution of Gemma 3 12B, Phi-4 Reasoning 14B, and Qwen3 14B, ensuring compatibility with models expected to be widely used in May 2025.