
Scripts

This directory contains all benchmarking and utility scripts for the GPU LLM Benchmarking project.

  1. Available Scripts
  2. Running Locally
  3. Running on Vast.ai

Available Scripts

| Script | About |
| --- | --- |
| `llm_benchmark.py` | The main benchmarking script that tests LLM inference performance across different context window sizes using dual-scenario testing (short prompts vs half-context), comprehensive GPU monitoring, and statistical analysis across multiple runs. |
| `run_vast.ai_benchmark.py` | Remote benchmarking runner for executing LLM benchmarks on Vast.ai GPU instances. Handles the complete lifecycle, including instance provisioning, Ollama installation and configuration, benchmark execution, results retrieval, and resource cleanup. |

Running Locally

For testing on your own hardware:

# Install uv if missing
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or update your existing copy
uv self update

# Run from the repository root so the .env one level above scripts/ is loaded
uv run scripts/llm_benchmark.py

Copy `.env.example` to `.env` locally and configure it as required. This script does not install Ollama; it expects Ollama to already be running, with your desired model pre-pulled.
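As an illustration, a local `.env` might look something like the sketch below. The variable names here are assumptions for the purpose of the example, not the project's actual keys; `.env.example` in the repository root is the authoritative reference.

```shell
# Hypothetical .env sketch for a local run. Variable names are illustrative
# assumptions; check .env.example (one directory above scripts/) for the real ones.
OLLAMA_HOST=http://localhost:11434   # Ollama's default listen address
MODEL_NAME=llama3.1:8b               # must already be pulled: ollama pull llama3.1:8b
NUM_RUNS=3                           # repeat count for the statistical analysis
```

You can confirm Ollama is up and the model is present with `ollama list` before starting a run.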

Running on Vast.ai

For testing across different GPU configurations:

# Install uv if missing
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or update your existing copy
uv self update

# Run from the repository root so the .env one level above scripts/ is loaded
uv run scripts/run_vast.ai_benchmark.py

Copy `.env.example` to `.env` locally and configure it as required, including an API key from Vast.ai. The script automatically provisions the cheapest instance that meets your configured parameters, installs dependencies, executes the benchmarks, and retrieves the results before cleaning up the instance.
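For the Vast.ai runner, the same `.env` would additionally carry the API key and, presumably, the instance-selection parameters. Only the API key requirement is stated above; the filter variables in this sketch are illustrative assumptions, so again defer to `.env.example` for the real names.

```shell
# Hypothetical Vast.ai settings. VAST_API_KEY is required per the README;
# the selection filters below are illustrative assumptions only.
VAST_API_KEY=your-api-key-here    # generated in your Vast.ai account settings
GPU_NAME=RTX_4090                 # assumed filter for the instance search
MAX_PRICE=0.50                    # assumed $/hr ceiling when picking the cheapest offer
```

Keep the key out of version control; `.env` should stay untracked while `.env.example` documents the expected variables.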