Switch to llama-cpp-python

Tom Foster 2025-08-08 21:40:15 +01:00
parent ef7df1a8c3
commit d937f2d5fa
25 changed files with 2957 additions and 1181 deletions

# 🤖 LLM GGUF Tools

Python tools for transforming language models into optimised
[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) using proven quantisation
strategies. Based on analysis of community patterns, these tools replicate Bartowski's acclaimed
quantisation profiles whilst handling edge cases that break naive conversion approaches.

The project bridges the gap between HuggingFace's SafeTensors ecosystem and llama.cpp's GGUF
inference engine, with particular focus on models that fall outside llama.cpp's supported
architecture list.

> 💡 **Looking for quantised models?** Check out [tcpipuk's HuggingFace profile](https://huggingface.co/tcpipuk)
> for models quantised using these tools!
| Tool | Purpose | Documentation |
|------|---------|---------------|
| [quantise_gguf.py](./quantise_gguf.py) | Advanced GGUF quantisation with Bartowski's proven profiles (Q3_K-Q6_K variants) | [📖 Docs](docs/quantise_gguf.md) • [🔬 Analysis](docs/bartowski_analysis.md) |
| [safetensors2gguf.py](./safetensors2gguf.py) | Direct SafeTensors conversion for unsupported architectures | [📖 Docs](docs/safetensors2gguf.md) |
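
For a rough sense of what those profiles mean for disk and VRAM, quantised file size scales with bits per weight. The figures below are approximate community numbers for llama.cpp k-quants, not values emitted by these tools — a minimal sketch:

```python
# Approximate bits-per-weight for common llama.cpp k-quant profiles.
# Illustrative community figures only, not output from quantise_gguf.py.
BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
}


def estimated_gguf_size_gb(n_params: float, profile: str) -> float:
    """Rough quantised file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[profile] / 8 / 1e9


# A 1B-parameter model at Q4_K_M lands around 0.6 GB.
print(f"{estimated_gguf_size_gb(1e9, 'Q4_K_M'):.2f} GB")
```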
## Quick Start

The project uses [`uv`](https://docs.astral.sh/uv/) for Rust-fast dependency management with
automatic Python version handling:

```bash
# Install uv (or update existing: uv self update)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
uv sync  # Installs llama-cpp-python with CUDA support if available

# Generate a HuggingFace token for uploads (optional)
# Visit https://huggingface.co/settings/tokens
export HF_TOKEN=your_token_here
```

Then quantise any HuggingFace model:

```bash
# Fire-and-forget quantisation with automatic upload
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B
# Or convert unsupported architectures directly
uv run safetensors2gguf.py ./path/to/model
```
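
Since `uv sync` pulls in the `llama-cpp-python` bindings, it can help to confirm the backend imports before kicking off a long quantisation run. A minimal check, not part of the repository (the import name for the package is `llama_cpp`):

```python
import importlib.util


def backend_available(module: str = "llama_cpp") -> bool:
    """Return True when the named module can be imported in this environment."""
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    if backend_available():
        print("llama-cpp-python backend: ready")
    else:
        print("llama-cpp-python backend: missing - try `uv sync`")
```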
For importance matrix (imatrix) data and calibration techniques, see the
[📖 IMatrix Data Guide](docs/imatrix_data.md).
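
As background, the underlying llama.cpp flow is two steps: compute an importance matrix from calibration text, then feed it to the quantiser. The dry-run sketch below only echoes the commands it would run; the `llama-imatrix`/`llama-quantize` flags shown are the commonly documented ones, so verify them against your build's `--help`:

```shell
#!/bin/sh
# Dry-run sketch: print the two-step imatrix -> quantise flow without running it.
imatrix_flow() {
    model="$1"   # f16/bf16 GGUF to quantise
    calib="$2"   # calibration text, e.g. resources/imatrix_data.txt
    profile="$3" # target k-quant profile, e.g. Q4_K_M

    # Step 1: compute an importance matrix from the calibration data
    echo llama-imatrix -m "$model" -f "$calib" -o imatrix.dat
    # Step 2: quantise guided by that matrix
    echo llama-quantize --imatrix imatrix.dat "$model" "${model%.gguf}-$profile.gguf" "$profile"
}

imatrix_flow model-f16.gguf resources/imatrix_data.txt Q4_K_M
```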
## Notes

The `resources/imatrix_data.txt` file contains importance matrix calibration data from
[Bartowski's Gist](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8),
based on calibration data provided by Dampf, building upon Kalomaze's foundational work.

Contributions are welcome, with a preference for pragmatic solutions. See the
[📖 Development Guide](docs/development.md) for setup, standards, and architectural decisions.

## License