Switch to llama-cpp-python

Tom Foster 2025-08-08 21:40:15 +01:00
parent ef7df1a8c3
commit d937f2d5fa
25 changed files with 2957 additions and 1181 deletions

# 🤖 LLM GGUF Tools

Python tools for transforming language models into optimised
[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) using proven quantisation
strategies. Based on analysis of community patterns, these tools replicate Bartowski's acclaimed
quantisation profiles whilst handling edge cases that break naive conversion approaches.

The project bridges the gap between HuggingFace's SafeTensors ecosystem and llama.cpp's GGUF
inference engine, with particular focus on models that fall outside llama.cpp's supported
architecture list.

> 💡 **Looking for quantised models?** Check out [tcpipuk's HuggingFace profile](https://huggingface.co/tcpipuk)
> for models quantised using these tools!
| Tool | Purpose | Documentation |
|------|---------|---------------|
| [quantise_gguf.py](./quantise_gguf.py) | Advanced GGUF quantisation with Bartowski's proven profiles (Q3_K-Q6_K variants) | [📖 Docs](docs/quantise_gguf.md) • [🔬 Analysis](docs/bartowski_analysis.md) |
| [safetensors2gguf.py](./safetensors2gguf.py) | Direct SafeTensors conversion for unsupported architectures | [📖 Docs](docs/safetensors2gguf.md) |
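
For a rough sense of what those profiles mean for disk and VRAM, quantised file size scales with bits per weight. The figures below are approximate community numbers for llama.cpp k-quants, not values emitted by these tools — a minimal sketch:

```python
# Approximate bits-per-weight for common llama.cpp k-quant profiles.
# Illustrative community figures only, not output from quantise_gguf.py.
BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
}


def estimated_gguf_size_gb(n_params: float, profile: str) -> float:
    """Rough quantised file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[profile] / 8 / 1e9


# A 1B-parameter model at Q4_K_M lands around 0.6 GB.
print(f"{estimated_gguf_size_gb(1e9, 'Q4_K_M'):.2f} GB")
```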
## Quick Start

The project uses [`uv`](https://docs.astral.sh/uv/) for Rust-fast dependency management with
automatic Python version handling:

```bash
# Install uv (or update existing: uv self update)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
uv sync  # Installs llama-cpp-python with CUDA support if available

# Generate a HuggingFace token for uploads (optional)
# Visit https://huggingface.co/settings/tokens
export HF_TOKEN=your_token_here
```

Then quantise any HuggingFace model:

```bash
# Fire-and-forget quantisation with automatic upload
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B
# Or convert unsupported architectures directly
uv run safetensors2gguf.py ./path/to/model
```
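
Since `uv sync` pulls in the `llama-cpp-python` bindings, it can help to confirm the backend imports before kicking off a long quantisation run. A minimal check, not part of the repository (the import name for the package is `llama_cpp`):

```python
import importlib.util


def backend_available(module: str = "llama_cpp") -> bool:
    """Return True when the named module can be imported in this environment."""
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    if backend_available():
        print("llama-cpp-python backend: ready")
    else:
        print("llama-cpp-python backend: missing - try `uv sync`")
```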
For importance matrix (imatrix) data and calibration techniques, see the
[📖 IMatrix Data Guide](docs/imatrix_data.md).
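
As background, the underlying llama.cpp flow is two steps: compute an importance matrix from calibration text, then feed it to the quantiser. The dry-run sketch below only echoes the commands it would run; the `llama-imatrix`/`llama-quantize` flags shown are the commonly documented ones, so verify them against your build's `--help`:

```shell
#!/bin/sh
# Dry-run sketch: print the two-step imatrix -> quantise flow without running it.
imatrix_flow() {
    model="$1"   # f16/bf16 GGUF to quantise
    calib="$2"   # calibration text, e.g. resources/imatrix_data.txt
    profile="$3" # target k-quant profile, e.g. Q4_K_M

    # Step 1: compute an importance matrix from the calibration data
    echo llama-imatrix -m "$model" -f "$calib" -o imatrix.dat
    # Step 2: quantise guided by that matrix
    echo llama-quantize --imatrix imatrix.dat "$model" "${model%.gguf}-$profile.gguf" "$profile"
}

imatrix_flow model-f16.gguf resources/imatrix_data.txt Q4_K_M
```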
## Notes

The `resources/imatrix_data.txt` file contains importance matrix calibration data from
[Bartowski's Gist](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8),
based on calibration data provided by Dampf, building upon Kalomaze's foundational work.

Contributions are welcome, with a preference for pragmatic solutions. See the
[📖 Development Guide](docs/development.md) for setup, standards, and architectural decisions.

## License