# 🤖 LLM GGUF Tools

Python tools for transforming language models into optimised GGUF format using proven quantisation strategies. Based on analysis of community patterns, these tools replicate Bartowski's acclaimed quantisation profiles whilst handling edge cases that break naive conversion approaches.

The project bridges the gap between HuggingFace's SafeTensors ecosystem and llama.cpp's GGUF inference engine, with particular focus on models that fall outside llama.cpp's supported architecture list.

> 💡 **Looking for quantised models?** Check out tcpipuk's HuggingFace profile for models quantised using these tools!

## Available Tools

| Tool | Purpose | Documentation |
|------|---------|---------------|
| `quantise_gguf.py` | Advanced GGUF quantisation with Bartowski's proven profiles (Q3_K–Q6_K variants) | 📖 Docs · 🔬 Analysis |
| `safetensors2gguf.py` | Direct SafeTensors conversion for unsupported architectures | 📖 Docs |

## Quick Start

The project uses uv for Rust-fast dependency management with automatic Python version handling:

```bash
# Install uv (or update existing: uv self update)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
uv sync  # Installs llama-cpp-python with CUDA support if available

# Generate HuggingFace token for uploads (optional)
# Visit https://huggingface.co/settings/tokens
export HF_TOKEN=your_token_here
```
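If you set `HF_TOKEN`, it's worth confirming the token is valid before starting a long quantisation run. A minimal sketch using `huggingface_hub` (assumed to be available as an upload dependency; `uv add huggingface-hub` if it isn't):

```python
import os

from huggingface_hub import whoami

# whoami() authenticates with the given token (falling back to any
# cached login) and raises if the token is missing or invalid.
user = whoami(token=os.environ.get("HF_TOKEN"))
print(f"Authenticated to HuggingFace as: {user['name']}")
```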

Then quantise any HuggingFace model:

```bash
# Fire-and-forget quantisation with automatic upload
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B

# Or convert unsupported architectures directly
uv run safetensors2gguf.py ./path/to/model
```
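Once a run finishes, a quick way to sanity-check the output is to load it with the llama-cpp-python bindings that `uv sync` installed. A minimal sketch; the model path below is hypothetical, as actual filenames depend on the quantisation profile and output directory:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Llama-3.2-1B-Q4_K_M.gguf",  # hypothetical output file
    n_ctx=2048,       # a small context window is plenty for a smoke test
    n_gpu_layers=-1,  # offload all layers to GPU if built with CUDA
)

# A short completion confirms the quantised weights still produce
# coherent text.
result = llm("The GGUF format is", max_tokens=24)
print(result["choices"][0]["text"])
```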

For importance matrix (imatrix) data and calibration techniques, see the 📖 IMatrix Data Guide.

## Development

Contributions are welcome, with a preference for pragmatic solutions. See the 📖 Development Guide for setup, standards, and architectural decisions.

## License

Apache 2.0 License - see the LICENSE file for details.