CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a laboratory for running Ollama (a local LLM runner) on NVIDIA Tesla K80 GPUs, using custom Docker builds tuned for CUDA 11.4 compatibility. The project focuses on LLM-powered workflow automation for software quality assurance, integrating with tools such as Dify, the VS Code Continue plugin, N8N, and auto-webui.

Docker Commands

Running Ollama

# Pull and run the custom K80-optimized Ollama image
docker pull dogkeeper886/ollama37
docker run --runtime=nvidia --gpus all -p 11434:11434 dogkeeper886/ollama37

# Using docker-compose (recommended for persistent data)
cd ollama37/
docker-compose up -d

# Stop the service
docker-compose down
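
Once the container is up, the API on port 11434 can be used to check the service and run models. A minimal sketch (the model tag is an example and must already be pulled):

# List models currently installed
curl http://localhost:11434/api/tags

# Generate a completion (assumes the model has been pulled first)
curl http://localhost:11434/api/generate -d '{"model": "gemma3:12b", "prompt": "Hello", "stream": false}'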

Building Custom Images

# Build the builder image (contains CUDA 11.4, GCC 10, CMake, Go)
cd ollama37-builder/
docker build -t dogkeeper886/ollama37-builder .

# Build the runtime image
cd ollama37/
docker build -t dogkeeper886/ollama37 .
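
A quick smoke test of a freshly built image (run in the foreground, then query the API from a second shell):

# Start the image and watch the logs for GPU detection
docker run --rm --runtime=nvidia --gpus all -p 11434:11434 dogkeeper886/ollama37

# From another shell, confirm the API responds
curl http://localhost:11434/api/version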

Architecture

Core Components

  1. ollama37-builder/: Multi-stage Docker build environment

    • Rocky Linux 8 base with NVIDIA drivers 470
    • CUDA 11.4 toolkit for K80 GPU compatibility
    • Custom-compiled GCC 10, CMake 4.0, Go 1.24.2
    • Environment setup scripts for proper library paths
  2. ollama37/: Runtime Docker image

    • Compiled Ollama binary optimized for K80
    • Minimal runtime environment with required CUDA libraries
    • Exposes Ollama API on port 11434
    • Persistent volume support for model storage
  3. dify/: Workflow automation configurations

    • YAML workflow definitions for LLM-powered QA tasks
    • Python utilities for Atlassian/Jira integration (format_jira_ticket.py)
    • Workflow templates: BugBlitz, QualityQuest, ER2Test, etc.
    • Knowledge base with PDF documentation for various systems
  4. mcp-servers/: Model Context Protocol integrations

    • Web browser MCP server for enhanced LLM capabilities

Key Environment Variables

  • OLLAMA_HOST=0.0.0.0:11434 - binds the Ollama API to all interfaces on port 11434
  • LD_LIBRARY_PATH="/usr/local/lib64:/usr/local/cuda-11.4/lib64" - CUDA library search path
  • NVIDIA_DRIVER_CAPABILITIES=compute,utility - GPU capabilities exposed to the container
  • NVIDIA_VISIBLE_DEVICES=all - makes all GPUs visible to the container
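
These defaults are baked into the images; they can also be set explicitly when starting a container. A sketch using docker run (values mirror the defaults above):

docker run --runtime=nvidia --gpus all \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -p 11434:11434 dogkeeper886/ollama37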

Hardware Requirements

  • NVIDIA Tesla K80 GPU
  • NVIDIA driver that supports the K80 (470 series) installed
  • NVIDIA Container Runtime for Docker
  • Sufficient storage for model downloads (models stored in ./volume/ when using docker-compose)
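
To verify the driver and NVIDIA Container Runtime before pulling the Ollama image, a stock CUDA base image can be used (the exact tag here is an example):

# Should print the Tesla K80 in the nvidia-smi table if the runtime is configured correctly
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi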

Development Workflow

Model Testing

The K80-optimized build supports running various LLM models, including:

  • Qwen2.5-VL (multi-modal vision-language model)
  • Qwen3 Dense & Sparse variants
  • Improved MLLama models
  • Gemma 3 12B
  • Phi-4 Reasoning 14B
  • DeepSeek-R1:32B
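
A sketch of pulling and exercising one of these models through the containerized CLI (the container name and model tag are examples; check the Ollama model library for exact tags):

# Pull a model into the persistent volume
docker exec -it ollama37 ollama pull deepseek-r1:32b

# Run a one-off prompt against it
docker exec -it ollama37 ollama run deepseek-r1:32b "Summarize this repository."

# List models available on the host
docker exec -it ollama37 ollama list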

Quality Assurance Integration

The Dify workflows enable automated processing of:

  • Jira ticket to Markdown conversion
  • Requirements analysis and test generation
  • Documentation refinement
  • Bug report processing
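
Dify and the other clients listed above all talk to the same Ollama API. A minimal sketch of the kind of chat request such a workflow issues (the endpoint and fields are standard Ollama API; the model tag and ticket text are examples):

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:32b",
  "messages": [
    {"role": "system", "content": "Convert the following Jira ticket into a Markdown bug report."},
    {"role": "user", "content": "PROJ-123: Login button unresponsive on Safari"}
  ],
  "stream": false
}'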

Persistent Data

When using docker-compose, model data persists in the ./volume/ directory, which is mapped to /root/.ollama inside the container.
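
Without docker-compose, the same mapping can be passed directly to docker run (the host path is an example):

# Map a host directory to Ollama's model store so downloads survive container restarts
docker run --runtime=nvidia --gpus all \
  -v "$(pwd)/volume:/root/.ollama" \
  -p 11434:11434 dogkeeper886/ollama37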