# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a laboratory for running Ollama (local LLM runner) on NVIDIA K80 GPUs with custom Docker builds optimized for CUDA 11.4 compatibility. The project focuses on LLM-powered workflow automation for software quality assurance, integrating with tools like Dify, the VS Code Continue plugin, N8N, and auto-webui.

## Docker Commands

### Running Ollama

```bash
# Pull and run the custom K80-optimized Ollama image
docker pull dogkeeper886/ollama37
docker run --runtime=nvidia --gpus all -p 11434:11434 dogkeeper886/ollama37

# Using docker-compose (recommended for persistent data)
cd ollama37/
docker-compose up -d

# Stop the service
docker-compose down
```
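
Once the container is up, you can confirm the API is reachable before wiring in any clients. A minimal sketch, assuming the port mapping shown above and a local Docker host:

```bash
# Check that the Ollama API answers on the published port
curl http://localhost:11434/api/version

# List models currently available in the container (empty until you pull one)
curl http://localhost:11434/api/tags
```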

### Building Custom Images

```bash
# Build the builder image (contains CUDA 11.4, GCC 10, CMake, Go)
cd ollama37-builder/
docker build -t dogkeeper886/ollama37-builder .

# Build the runtime image
cd ollama37/
docker build -t dogkeeper886/ollama37 .
```

## Architecture

### Core Components

1. **ollama37-builder/**: Multi-stage Docker build environment
   - Rocky Linux 8 base with NVIDIA drivers 470
   - CUDA 11.4 toolkit for K80 GPU compatibility
   - Custom-compiled GCC 10, CMake 4.0, Go 1.24.2
   - Environment setup scripts for proper library paths

2. **ollama37/**: Runtime Docker image
   - Compiled Ollama binary optimized for K80
   - Minimal runtime environment with required CUDA libraries
   - Exposes Ollama API on port 11434
   - Persistent volume support for model storage

3. **dify/**: Workflow automation configurations
   - YAML workflow definitions for LLM-powered QA tasks
   - Python utilities for Atlassian/Jira integration (`format_jira_ticket.py`)
   - Workflow templates: BugBlitz, QualityQuest, ER2Test, etc.
   - Knowledge base with PDF documentation for various systems

4. **mcp-servers/**: Model Context Protocol integrations
   - Web browser MCP server for enhanced LLM capabilities

### Key Environment Variables

- `OLLAMA_HOST=0.0.0.0:11434` - API endpoint
- `LD_LIBRARY_PATH="/usr/local/lib64:/usr/local/cuda-11.4/lib64"` - CUDA libraries
- `NVIDIA_DRIVER_CAPABILITIES=compute,utility` - GPU capabilities
- `NVIDIA_VISIBLE_DEVICES=all` - GPU visibility
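
When launching with plain `docker run` rather than the provided compose file, the same variables can be passed explicitly. A minimal sketch using the values from the list above; `LD_LIBRARY_PATH` is assumed to be baked into the image and is normally not overridden:

```bash
# Run the K80-optimized image with the key environment variables set explicitly
docker run -d --runtime=nvidia --gpus all \
  -p 11434:11434 \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  -e NVIDIA_VISIBLE_DEVICES=all \
  dogkeeper886/ollama37
```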

### Hardware Requirements

- NVIDIA K80 GPU
- NVIDIA Tesla K80 driver installed
- NVIDIA Container Runtime for Docker
- Sufficient storage for model downloads (models stored in `./volume/` when using docker-compose)
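
Before starting Ollama it can be worth confirming that the driver and NVIDIA Container Runtime are visible to Docker. A sketch of a common check; the CUDA image tag is only an example and any CUDA 11.4-era base image will do:

```bash
# Should print a table listing the Tesla K80(s) if the driver and container runtime are set up
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi
```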

## Development Workflow

### Model Testing

The project supports running various LLM models optimized for K80:

- Qwen2.5-VL (multi-modal vision-language model)
- Qwen3 Dense & Sparse variants
- Improved MLLama models
- Gemma 3 12B
- Phi-4 Reasoning 14B
- DeepSeek-R1:32B
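
Models can be pulled and exercised through the `ollama` CLI inside the running container. A sketch, assuming the container is named `ollama37` (check `docker ps` for the actual name) and using an illustrative model tag; consult the Ollama model library for the exact tags of the models listed above:

```bash
# Pull a model and run a quick interactive smoke test (container name and tag are assumptions)
docker exec -it ollama37 ollama pull gemma3:12b
docker exec -it ollama37 ollama run gemma3:12b "Reply with one sentence to confirm the GPU path works."
```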

### Quality Assurance Integration

The Dify workflows enable automated processing of:

- Jira ticket to Markdown conversion
- Requirements analysis and test generation
- Documentation refinement
- Bug report processing
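
Dify, N8N, and the other clients mentioned in the overview reach this stack through the standard Ollama HTTP API on port 11434. A minimal sketch of the kind of request such a workflow issues; the model tag and prompt are illustrative:

```bash
# Example chat-style request a QA workflow might send to the Ollama API
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [
    {"role": "user", "content": "Convert this Jira ticket description into Markdown: ..."}
  ],
  "stream": false
}'
```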

### Persistent Data

When using docker-compose, model data persists in the `./volume/` directory, which is mapped to `/root/.ollama` inside the container.
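
The equivalent bind mount with plain `docker run` is sketched below; the host path mirrors the compose layout described above and is an assumption if you keep your models elsewhere:

```bash
# Bind-mount ./volume so downloaded models survive container restarts
docker run -d --runtime=nvidia --gpus all -p 11434:11434 \
  -v "$(pwd)/volume:/root/.ollama" \
  dogkeeper886/ollama37
```

After a pull, Ollama's model store (blobs and manifests) should appear under `./volume/models/` on the host.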