# 🤖 LLM GGUF Tools

Python tools for transforming language models into optimised
[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) using proven quantisation
strategies. Based on analysis of community patterns, these tools replicate Bartowski's acclaimed
quantisation profiles whilst handling edge cases that break naive conversion approaches.

The project bridges the gap between HuggingFace's SafeTensors ecosystem and llama.cpp's GGUF
inference engine, with particular focus on models that fall outside llama.cpp's supported
architecture list.

> 💡 **Looking for quantised models?** Check out [tcpipuk's HuggingFace profile](https://huggingface.co/tcpipuk)
> for models quantised using these tools!

## Available Tools

| Tool | Purpose | Documentation |
|------|---------|---------------|
| [quantise_gguf.py](./quantise_gguf.py) | Advanced GGUF quantisation with Bartowski's proven profiles (Q3_K-Q6_K variants) | [📖 Docs](docs/quantise_gguf.md) • [🔬 Analysis](docs/bartowski_analysis.md) |
| [safetensors2gguf.py](./safetensors2gguf.py) | Direct SafeTensors conversion for unsupported architectures | [📖 Docs](docs/safetensors2gguf.md) |
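
Both scripts are ordinary command-line tools, so a quick way to see what each one accepts is a minimal sketch like the following, assuming they follow the usual argparse convention of a `--help` flag; the linked docs remain the authoritative reference for options:

```bash
# Assumption: each script exposes a standard --help flag;
# see the linked documentation for the authoritative option list
uv run quantise_gguf.py --help
uv run safetensors2gguf.py --help
```
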
## Quick Start

The project uses [`uv`](https://docs.astral.sh/uv/) for Rust-fast dependency management with
automatic Python version handling:

```bash
# Install uv (or update an existing install: uv self update)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
uv sync  # Installs llama-cpp-python with CUDA support if available

# Generate a HuggingFace token for uploads (optional)
# Visit https://huggingface.co/settings/tokens
export HF_TOKEN=your_token_here
```
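
To confirm the environment resolved correctly, you can import the bundled binding before running anything heavier. A minimal sketch; `llama_cpp` is the standard import name of the llama-cpp-python package that `uv sync` installs, not something specific to this project:

```bash
# Sanity-check the synced environment: llama_cpp is the import name
# of llama-cpp-python, installed as a project dependency by uv sync
uv run python -c "import llama_cpp; print(llama_cpp.__version__)"
```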

Then quantise any HuggingFace model:

```bash
# Fire-and-forget quantisation with automatic upload
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B

# Or convert unsupported architectures directly
uv run safetensors2gguf.py ./path/to/model
```
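
Once a quantised file exists, any llama.cpp build can smoke-test it. A minimal sketch, assuming llama.cpp's `llama-cli` binary is on your PATH; the model path below is illustrative, so substitute the `.gguf` file the tool actually produced:

```bash
# Hypothetical output path; substitute the .gguf file the tool produced
llama-cli -m ./Llama-3.2-1B-Q4_K_M.gguf -p "Hello, world" -n 32
```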

For importance matrix (imatrix) data and calibration techniques, see the
[📖 IMatrix Data Guide](docs/imatrix_data.md).
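
Generating the imatrix data itself happens upstream in llama.cpp rather than in this repository. A minimal sketch, assuming you have llama.cpp's `llama-imatrix` binary available and a calibration text file of your own choosing (both file names below are placeholders):

```bash
# Computes importance matrix statistics from calibration text;
# model and file names here are placeholders for your own data
llama-imatrix -m ./model-f16.gguf -f calibration.txt -o imatrix.dat
```
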
## Development

Contributions are welcome, with a preference for pragmatic solutions. See the
[📖 Development Guide](docs/development.md) for setup, standards, and architectural decisions.

## License

Apache 2.0 License - see the [LICENSE](./LICENSE) file for details.