- Replace actions/download-artifact@v4 with dawidd6/action-download-artifact@v6
- The default download-artifact action only works within the same workflow run
- The third-party action enables downloading artifacts from a different workflow (see the sketch below)
- Both test workflows now download from the latest successful tesla-k80-ci.yml run
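A minimal sketch of the swap, assuming the compiled binaries are published under the artifact name ollama-build (the name and path are hypothetical; the workflow and workflow_conclusion inputs are what enable the cross-workflow download):

```yaml
- name: Download binaries from the latest successful CI run
  uses: dawidd6/action-download-artifact@v6
  with:
    workflow: tesla-k80-ci.yml
    workflow_conclusion: success
    name: ollama-build   # hypothetical artifact name
    path: ./bin          # hypothetical destination
```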
- Rename tesla-k80-tests.yml to tesla-k80-single-gpu-tests.yml for clarity
- Add new tesla-k80-multi-gpu-tests.yml workflow for large models
- Add a multi-gpu profile to test/config/models.yaml with gemma3:27b and gpt-oss:20b (sketched after this list)
- Multi-GPU workflow includes GPU count verification and weekly schedule
- Profile-specific validation allows multi-GPU splits for large models
- Separate workflows optimize CI efficiency: quick tests vs. thorough tests
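A hedged sketch of the new profile in test/config/models.yaml; apart from the two model names, every key here is an assumption, since the authoritative schema lives in the test runner's config loader:

```yaml
profiles:
  multi-gpu:
    require_gpus: 2                   # assumed knob backing the GPU count verification
    models:
      - name: gemma3:27b
        allow_multi_gpu_split: true   # assumed flag for profile-specific validation
      - name: gpt-oss:20b
        allow_multi_gpu_split: true
```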
Add comprehensive test orchestration framework:
Test Runner (cmd/test-runner/):
- config.go: YAML configuration loading and validation
- server.go: Ollama server lifecycle management (start/stop/health checks)
- monitor.go: Real-time log monitoring with pattern matching
- test.go: Model testing via Ollama API (pull, chat, validation)
- validate.go: Test result validation (GPU usage, response quality, log analysis)
- report.go: Structured reporting (JSON and Markdown formats)
- main.go: CLI interface with run/validate/list commands
Test Configurations (test/config/):
- models.yaml: Full test suite with quick/full/stress profiles
- quick.yaml: Fast smoke test with gemma2:2b
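For illustration, quick.yaml might look roughly like this (all keys are assumptions; config.go defines the real schema):

```yaml
profile: quick
models:
  - name: gemma2:2b
    prompt: "Reply with the single word: ok"   # assumed smoke-test prompt
    expect_gpu: true                           # assumed validation flag
timeout: 5m
```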
Updated Workflow:
- tesla-k80-tests.yml: Use test-runner instead of shell scripts
- Run quick tests first, then full tests if they pass (sketched after this list)
- Generate structured JSON reports for pass/fail checking
- Upload test results as artifacts
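Sketch of the reworked workflow steps; the binary path and flags are assumptions based on the run/validate/list CLI described above:

```yaml
- name: Quick tests
  run: ./test-runner run --config test/config/quick.yaml --report quick.json
- name: Full tests
  if: success()
  run: ./test-runner run --config test/config/models.yaml --report full.json
- name: Upload test results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: "*.json"
```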
Features:
- Multi-model testing with configurable profiles
- API-based testing (not CLI commands)
- Real-time log monitoring for GPU events and errors
- Automatic validation of GPU loading and response quality
- Structured JSON and Markdown reports
- Graceful server lifecycle management
- Interrupt handling (Ctrl+C cleanup)
Addresses limitations of shell-based testing by providing:
- Better error handling and reporting
- Programmatic test orchestration
- Reusable test framework
- Clear pass/fail criteria
- Detailed test metrics and timing
- Changed tesla-k80-ci.yml to manual trigger only and simplified it to a build-only workflow
- Created tesla-k80-tests.yml for separate test execution (manual trigger)
- Added .github/workflows/CLAUDE.md with comprehensive test framework design
- Removed binary artifact upload (not needed for single self-hosted runner)
- Replaced README.md with CLAUDE.md for better documentation structure
Test framework plan:
- Go-based test runner at cmd/test-runner/
- YAML configuration for multi-model testing
- Server lifecycle management with log monitoring
- API-based testing with structured reporting
- Support for test profiles (quick/full/stress)
- Tesla K80 build and test workflow with self-hosted runner
- Build using GCC 10 and CUDA 11.4 for Compute Capability 3.7 (see the sketch after this list)
- Run unit tests, integration tests, and model inference tests
- Test gemma2:2b model loading and GPU acceleration
- Use Claude headless mode to analyze server logs and verify proper GPU initialization
- Upload logs, analysis results, and binary artifacts
- Comprehensive documentation in workflows README
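A hedged sketch of the build step; only GCC 10, CUDA 11.4, and Compute Capability 3.7 come from this commit, while the paths and CMake invocation are assumptions:

```yaml
- name: Build for Tesla K80
  env:
    CC: gcc-10
    CXX: g++-10
    CUDACXX: /usr/local/cuda-11.4/bin/nvcc   # assumed install path
  run: |
    cmake -B build -DCMAKE_CUDA_ARCHITECTURES=37
    cmake --build build -j
```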
Removed all GitHub Actions workflows (.github/workflows/) as they're not needed
for this Tesla K80 support fork. The workflows were designed for the official
Ollama repository's CI/CD pipeline and would fail in a fork since they:
- Attempt to push to Ollama's Docker Hub
- Run automated tests on PRs (not needed for personal fork)
- Handle official release process
- Add Gemma3n model support with text generation capabilities
- Add new CUDA mean operations for improved performance
- Add macOS documentation and performance tests
- Update LLAMA patches for ROCm/CUDA compatibility
- Fix various model conversion and processing issues
- Update CI workflows and build configurations
- Add library model tests and Shakespeare test data
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023. Hardware support is unchanged.
Linux default bundle sizes are reduced by ~600M to 1G.
During work on our new registry client, I ran into frustrations with CI
where a misspelling in a comment caused the linter to fail, which caused
the tests to not run, which caused the build to not be cached, which
caused the next run to be slow, which caused me to be sad.
This commit addresses these issues, and pulls in some helpful changes
we've had in CI on ollama.com for some time now.
They are:
* Always run tests, even if the other checks fail.
Tests are the most important part of CI, and should always run. Failures
in tests can be correlated with failures in other checks, and can help
surface the root cause of the failure sooner. This is especially
important when the failure is platform specific, and the tests are not
platform independent.
* Check that `go generate` is clean.
This prevents 'go generate' abuse regressions. This codebase used to use
it to generate platform specific binary build artifacts. Let's make sure
that does not happen again, that this powerful tool is used correctly, and
that the generated code is checked in (a sketch of the check follows below).
Also, while adding the `go generate` check, it was revealed that the
generated metal code was putting dates in the comments, resulting in
non-deterministic builds. This is a bad practice, and this commit fixes
that. Git already records the most important date: the commit date, alongside
the other associated changes.
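One common way to express the check as a CI step (the actual job layout may differ):

```yaml
- name: go generate is clean
  run: |
    go generate ./...
    git diff --exit-code   # fail if generated code was not checked in
```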
* Check that `go mod tidy` is clean.
A new job was added to check that `go mod tidy` is clean, to prevent easily
avoidable merge conflicts and to keep go.mod changes from being deferred to a
future PR unrelated to the change that caused them.
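The tidy check can be expressed the same way (again a sketch; the real job may differ):

```yaml
- name: go mod tidy is clean
  run: |
    go mod tidy
    git diff --exit-code -- go.mod go.sum
```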
* More robust caching.
We now cache the go build cache, and the go mod download cache
independently. This is because the download cache contains zips that can
be unpacked in parallel faster than they can be fetched and extracted by
tar. This speeds up the build significantly (see the sketch below).
The linter is hostile enough. It does not need to also punish us with
longer build times due to small failures like misspellings.
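A sketch of caching the two caches independently with actions/cache; the paths assume a Linux runner's defaults:

```yaml
- name: Cache go build cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/go-build
    key: ${{ runner.os }}-go-build-${{ hashFiles('**/*.go') }}
- name: Cache go mod download cache
  uses: actions/cache@v4
  with:
    path: ~/go/pkg/mod
    key: ${{ runner.os }}-go-mod-${{ hashFiles('go.sum') }}
```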
* Bump cuda and rocm versions
Update ROCm to linux:6.3 win:6.2 and CUDA v12 to 12.8.
Yum has some silent failure modes, so largely switch to dnf.
* Fix windows build script
This commit copies (without history) the bmizerany/ollama-go repository
with the intention of integrating it into ollama as a replacement for
pushing and pulling models, and for managing the cache they are pushed
to and pulled from.
New homes for these packages will be determined as they are integrated
and we have a better understanding of proper package boundaries.
Clang-built outputs are faster. We were previously building with clang via
a gcc wrapper in cgo, but this was missed during the build updates, so
there was a drop in performance.
Set the owner and group when building the Linux tarball so extracted files
are consistent. This is the behaviour of release tarballs in version
0.5.7 and lower.
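Expressed as a packaging step for illustration (the archive name and layout are assumptions; the substance is GNU tar's ownership flags):

```yaml
- name: Package linux tarball
  run: |
    # pin owner/group so extraction is consistent regardless of the build user
    tar --owner=root --group=root -czf ollama-linux-amd64.tgz -C dist/linux-amd64 .
```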
Ollama requires vcruntime140_1.dll, which isn't found on the 2019 runner
image. Previously the job used the Windows (2019) runner but explicitly
installs 2022 to build the app. Since the sign job doesn't actually build
anything, it can use the windows-2022 runner instead.
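The relevant change, in sketch form:

```yaml
sign:
  runs-on: windows-2022   # image ships vcruntime140_1.dll; windows-2019 does not
```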
* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
* Don't clobber gpu list for default runner
This ensures the GPU specific environment variables are set properly
* explicitly set CXX compiler for HIP
* Update build_windows.ps1
This isn't complete, but is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587)
* tweak development.md
* update docs
* add windows cuda/rocm tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
upload-artifact strips off leading common paths, so when
the ./build/ artifacts were removed, the ./dist/windows-amd64
prefix became common and was stripped, causing the
later download-artifact step to place the files in the wrong location
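An illustration of the pitfall with a hypothetical step: with both entries, only ./ is common, so the subdirectories survive in the artifact; once ./build/ is removed, ./dist/windows-amd64 itself becomes the common root and is stripped:

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: windows-dist
    path: |
      ./build/
      ./dist/windows-amd64/
```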
* llama: wire up builtin runner
This adds a new entrypoint into the ollama CLI to run the cgo-built runner.
On Mac arm64 this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build. After we fully transition
to the new Go runners, more tech debt can be removed and we can stop building
the "default" runner via make, relying on the builtin one always.
* build: Make target improvements
Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.
* Support customized CPU flags for runners
This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash. If the user builds a customized set, we omit the naming
scheme and don't check for compatibility. This avoids checking
requirements at runtime, so that logic has been removed as well. This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.
* Use relative paths
If the user checks out the repo in a path that contains spaces, make gets
really confused, so use relative paths for everything in-repo to avoid breakage.
* Remove payloads from main binary
* install: clean up prior libraries
This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
This leverages caching and a reduced installer scope to try
to speed up builds. It also tidies up some Windows build logic
that was only relevant for the older generate/cmake builds.
This will no longer error if built with regular gcc on Windows. To help
triage issues that may come in related to different compilers, the runner now
reports the compiler used by cgo.