Refactor to extract helpers

This commit is contained in:
parent 485e838bf5
commit 0294945904

27 changed files with 6701 additions and 115 deletions
113 .gitignore (vendored)

@@ -1,4 +1,3 @@
-# ---> Python
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -27,16 +26,6 @@ share/python-wheels/
 *.egg
 MANIFEST
 
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
 # Unit test / coverage reports
 htmlcov/
 .tox/
@@ -52,76 +41,6 @@ coverage.xml
 .pytest_cache/
 cover/
 
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
 # Environments
 .env
 .venv
@@ -130,35 +49,3 @@ venv/
 ENV/
 env.bak/
 venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-
57 README.md

@@ -1,3 +1,56 @@
-# llm-gguf-tools
+# LLM GGUF Tools
 
-Tools to convert/quantise language models in GGUF format
+A collection of Python tools for converting and quantising language models to
+[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md), featuring advanced
+quantisation methods and direct SafeTensors conversion capabilities.
+
+## Available Tools
+
+| Tool | Purpose | Documentation |
+|------|---------|---------------|
+| [quantize_gguf.py](./quantize_gguf.py) | GGUF quantisation using [Bartowski's method](https://huggingface.co/bartowski) | [📖 Docs](docs/quantize_gguf.md) |
+| [safetensors2gguf.py](./safetensors2gguf.py) | Direct SafeTensors to GGUF conversion | [📖 Docs](docs/safetensors2gguf.md) |
+
+## Installation
+
+1. Install [`uv`](https://docs.astral.sh/uv/) to manage the dependencies:
+
+   ```bash
+   # Install uv (see https://docs.astral.sh/uv/#installation for more options)
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Or update your existing instance
+   uv self update
+   ```
+
+2. Set up the environment for these scripts:
+
+   ```bash
+   # Clone the repository
+   git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
+   cd llm-gguf-tools
+
+   # Set up virtual environment and install dependencies
+   uv sync
+   ```
+
+## Requirements
+
+- **For quantisation**: [llama.cpp](https://github.com/ggerganov/llama.cpp) binaries
+  (`llama-quantize`, `llama-cli`, `llama-imatrix`)
+- **For BFloat16 models**: PyTorch (optional, auto-detected)
+- **For uploads**: HuggingFace API token (set `HF_TOKEN` environment variable)
+
+## Development
+
+For development setup and contribution guidelines, see the [📖 Development Guide](docs/development.md).
+
+## Notes
+
+The `resources/imatrix_data.txt` file contains importance matrix calibration data from
+[Bartowski's Gist](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8),
+based on calibration data provided by Dampf, building upon Kalomaze's foundational work.
+
+## License
+
+Apache 2.0 License - see the [LICENSE](./LICENSE) file for details.
86 docs/development.md (new file)

@@ -0,0 +1,86 @@
# Development Guide

This guide covers development setup, code quality standards, and project structure for contributors.

## Code Quality

```bash
# Run linting
uv run ruff check

# Format code
uv run ruff format

# Run with debug logging
DEBUG=true uv run <script>
```

## Project Structure

```plain
llm-gguf-tools/
├── quantize_gguf.py        # Bartowski quantisation tool
├── safetensors2gguf.py     # Direct conversion tool
├── helpers/                # Shared utilities
│   ├── __init__.py
│   ├── logger.py           # Colour-coded logging
│   ├── config/             # Quantisation configurations
│   ├── models/             # Pydantic data models
│   └── services/           # Filesystem, GGUF and HuggingFace services
├── resources/              # Resource files
│   └── imatrix_data.txt    # Calibration data for imatrix
├── docs/                   # Detailed documentation
│   ├── quantize_gguf.md
│   ├── safetensors2gguf.md
│   └── development.md
└── pyproject.toml          # Project configuration
```

## Contributing Guidelines

Contributions are welcome! Please ensure:

1. Code follows the existing style (run `uv run ruff format`)
2. All functions have Google-style docstrings
3. Type hints are used throughout
4. Tests pass (if applicable)

## Development Workflow

### Setting Up Development Environment

```bash
# Clone the repository
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools

# Install all dependencies including dev
uv sync --all-groups
```

### Code Style

- Follow PEP 8 with ruff enforcement
- Use UK English spelling in comments and documentation
- Maximum line length: 100 characters
- Use type hints for all function parameters and returns
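
For reference, a function following these conventions might look like this (an illustrative sketch, not code from the repository):

```python
from pathlib import Path


def count_gguf_files(model_dir: Path) -> int:
    """Count GGUF files in a model directory.

    Args:
        model_dir: Directory to search for .gguf files.

    Returns:
        Number of GGUF files found.
    """
    return len(list(model_dir.glob("*.gguf")))
```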

### Testing

While formal tests are not yet implemented, ensure:

- Scripts run without errors on sample models
- Logger output is correctly formatted
- File I/O operations handle errors gracefully

### Debugging

Enable debug logging for verbose output:

```bash
DEBUG=true uv run quantize_gguf.py <model_url>
```

This will show additional information about:

- Model download progress
- Conversion steps
- File operations
- Error details
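
The verbose output comes from the shared colour-coded logger in `helpers/logger.py`; a minimal usage sketch (the message text is illustrative):

```python
from helpers.logger import logger

logger.debug("Checking llama.cpp binaries...")  # only shown when DEBUG=true
logger.info("Conversion complete")
logger.warning("PyTorch not available, BFloat16 models may not convert properly")
```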
102 docs/quantize_gguf.md (new file)

@@ -0,0 +1,102 @@
# quantize_gguf.py - Advanced GGUF Quantisation

Advanced GGUF quantisation tool implementing Bartowski's sophisticated quantisation pipeline.

## Overview

This tool automates the complete quantisation workflow for converting models to GGUF format with
multiple precision variants, importance matrix generation, and automatic upload to HuggingFace.

## Quantisation Variants

The tool produces four quantisation variants based on Bartowski's method:

- **Q4_K_M**: Standard baseline quantisation
- **Q4_K_L**: Q6_K embeddings + Q6_K attention layers for better quality
- **Q4_K_XL**: Q8_0 embeddings + Q6_K attention layers for enhanced precision
- **Q4_K_XXL**: Q8_0 embeddings + Q8_0 attention for maximum precision
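
These variants map to tensor-level overrides defined in `helpers/config/quantisation_configs.py` (added in this commit). For example, the Q4_K_L entry raises the embedding, output, and attention tensors to Q6_K while the rest of the model stays at Q4_K_M:

```python
# Q4_K_L tensor overrides, from helpers/config/quantisation_configs.py
tensor_types = {
    "token_embd.weight": "Q6_K",
    "output.weight": "Q6_K",
    "lm_head.weight": "Q6_K",
    "blk.*.attn_q.weight": "Q6_K",
    "blk.*.attn_k.weight": "Q6_K",
    "blk.*.attn_v.weight": "Q6_K",
}
```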

## Features

- **Automatic model download**: Downloads models from HuggingFace automatically
- **Importance matrix generation**: Creates an imatrix for improved quantisation quality
- **Parallel processing**: Uploads multiple variants simultaneously
- **Progress tracking**: Real-time status updates during conversion
- **README generation**: Automatically creates model cards with quantisation details
- **HuggingFace integration**: Direct upload to HuggingFace with proper metadata

## Usage

### Basic Usage

```bash
# Quantise a model from HuggingFace
uv run quantize_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B
```

### Command Line Options

```bash
# Skip imatrix generation for faster processing
uv run quantize_gguf.py <model_url> --no-imatrix

# Local testing without upload
uv run quantize_gguf.py <model_url> --no-upload

# Custom output directory
uv run quantize_gguf.py <model_url> --output-dir ./my-models

# Use a specific HuggingFace token
uv run quantize_gguf.py <model_url> --hf-token YOUR_TOKEN
```

## Environment Variables

- `HF_TOKEN`: HuggingFace API token for uploads
- `LLAMA_CPP_DIR`: Custom path to llama.cpp binaries
- `DEBUG`: Enable debug logging when set to "true"

## Requirements

- **llama.cpp binaries**: `llama-quantize`, `llama-cli`, `llama-imatrix`
- **Calibration data**: `resources/imatrix_data.txt` for importance matrix generation
- **HuggingFace account**: For uploading quantised models (optional)

## Workflow

1. **Download**: Fetches the model from HuggingFace
2. **Convert**: Converts to initial GGUF format (F32)
3. **Generate imatrix**: Creates an importance matrix using calibration data
4. **Quantise**: Produces multiple quantisation variants in parallel
5. **Upload**: Pushes quantised models to HuggingFace with metadata
6. **Clean up**: Removes temporary files and caches

## Output Structure

```plain
output_dir/
├── model-F32.gguf            # Full precision conversion
├── model-Q4_K_M.gguf         # Standard quantisation
├── model-Q4_K_M-imat.gguf    # With importance matrix
├── model-Q4_K_L-imat.gguf    # Enhanced embeddings/attention
├── model-Q4_K_XL-imat.gguf   # High precision embeddings
├── model-Q4_K_XXL-imat.gguf  # Maximum precision
└── imatrix.dat               # Generated importance matrix
```

## Error Handling

The tool includes comprehensive error handling for:

- Network failures during download
- Missing binaries or dependencies
- Insufficient disk space
- HuggingFace API errors
- Conversion failures

## Performance Considerations

- **Disk space**: Requires ~3x the model size in free space
- **Memory**: Needs RAM proportional to model size
- **Processing time**: Varies from minutes to hours based on model size
- **Network**: Downloads can be large (10-100+ GB for large models)
164 docs/safetensors2gguf.md (new file)

@@ -0,0 +1,164 @@
# safetensors2gguf.py - Direct SafeTensors Conversion

Direct SafeTensors to GGUF converter for unsupported architectures.

## Overview

This tool converts SafeTensors models directly to GGUF format without requiring specific
architecture support in llama.cpp. It's particularly useful for experimental models, custom
architectures, or when llama.cpp's standard conversion tools don't recognise your model
architecture.

## Features

- **Architecture-agnostic**: Works with unsupported model architectures
- **Automatic mapping**: Intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: Handles BF16 tensors with PyTorch (optional)
- **Vision models**: Supports models with vision components
- **Tokeniser preservation**: Extracts and includes tokeniser metadata
- **Fallback mechanisms**: Provides sensible defaults for unknown architectures

## Usage

### Basic Usage

```bash
# Convert a local SafeTensors model
uv run safetensors2gguf.py /path/to/model/directory
```

### Command Line Options

```bash
# Specify the output file
uv run safetensors2gguf.py /path/to/model -o output.gguf

# Force a specific architecture mapping
uv run safetensors2gguf.py /path/to/model --force-arch qwen2

# Convert with a custom output path
uv run safetensors2gguf.py ./my-model --output ./converted/my-model.gguf
```

## Supported Input Formats

The tool automatically detects and handles:

1. **Single-file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: any `*.safetensors` files in the directory
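
This priority order is implemented by `FilesystemService.find_safetensor_files` in `helpers/services/filesystem.py` (part of this commit); a trimmed sketch (the real method raises `FileNotFoundError` when nothing matches):

```python
from pathlib import Path


def find_safetensor_files(model_path: Path) -> list[Path]:
    """Return SafeTensors files in priority order (trimmed sketch)."""
    # 1. A single-file model takes priority
    single_file = model_path / "model.safetensors"
    if single_file.exists():
        return [single_file]
    # 2. Sharded files, sorted for deterministic ordering
    sharded = sorted(model_path.glob("model-*-of-*.safetensors"))
    if sharded:
        return sharded
    # 3. Fall back to any .safetensors files in the directory
    return sorted(model_path.glob("*.safetensors"))
```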

## Architecture Mapping

The tool includes built-in mappings for several architectures:

- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)

You can override these with the `--force-arch` parameter.

## Tensor Name Mapping

The converter automatically maps common tensor patterns:

| Original Pattern | GGUF Name |
|-----------------|-----------|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |
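
A minimal sketch of how this pattern-based renaming can be applied (illustrative only; the converter's actual rules live in the `TensorMapping` model in `helpers/models/conversion.py`):

```python
import re

DIRECT = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}
LAYER = {
    "self_attn.q_proj.weight": "attn_q.weight",
    "self_attn.k_proj.weight": "attn_k.weight",
    "self_attn.v_proj.weight": "attn_v.weight",
    "mlp.gate_proj.weight": "ffn_gate.weight",
    "mlp.up_proj.weight": "ffn_up.weight",
    "mlp.down_proj.weight": "ffn_down.weight",
}


def map_tensor_name(name: str) -> str | None:
    """Map a HuggingFace tensor name to its GGUF equivalent, or None to skip."""
    if name in DIRECT:
        return DIRECT[name]
    match = re.match(r"model\.layers\.(\d+)\.(.+)", name)
    if match:
        layer, rest = match.groups()
        if rest in LAYER:
            return f"blk.{layer}.{LAYER[rest]}"
    return None  # unmapped tensors are skipped
```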

## Configuration Requirements

The model directory must contain:

- **config.json**: Model configuration file (required)
- **\*.safetensors**: One or more SafeTensors files (required)
- **tokenizer_config.json**: Tokeniser configuration (optional)
- **tokenizer.json**: Tokeniser data (optional)

## Output Format

The tool produces a single GGUF file containing:

- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)
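
Where the tokeniser configuration omits these IDs, `GGUFWriter.add_tokeniser` in `helpers/services/gguf.py` (also part of this commit) falls back to BOS = 1, EOS = 2, UNK = 0 and PAD = 0, and records `llama` as the default tokeniser model.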

## Error Handling

| Error | Message | Solution |
|-------|---------|----------|
| Missing config.json | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |

## Technical Details

### Parameter Inference

The tool infers GGUF parameters from the model configuration:

- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to attention heads)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → layer normalisation epsilon (default: 1e-5)
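
In this commit, this inference is implemented by `ModelConfig.to_gguf_params` in `helpers/models/conversion.py`. A self-contained sketch of the same logic over a raw `config.json` dictionary (`infer_gguf_params` is an illustrative name, not repository code):

```python
from typing import Any


def infer_gguf_params(config: dict[str, Any]) -> dict[str, Any]:
    """Map HuggingFace config.json fields to GGUF parameters, with defaults."""
    heads = config.get("num_attention_heads", 32)
    hidden = config.get("hidden_size", 4096)
    return {
        "vocab_size": config.get("vocab_size", 32000),
        "context_length": config.get("max_position_embeddings", 2048),
        "embedding_length": hidden,
        "block_count": config.get("num_hidden_layers", 32),
        "attention.head_count": heads,
        # KV head count falls back to the full attention head count (plain MHA)
        "attention.head_count_kv": config.get("num_key_value_heads") or heads,
        "rope.freq_base": config.get("rope_theta", 10000.0),
        "attention.layer_norm_rms_epsilon": config.get("rms_norm_eps", 1e-5),
        # RoPE dimension count is derived from the per-head dimension
        "rope.dimension_count": hidden // heads,
    }
```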

### Vision Model Support

For models with vision components, the tool extracts:

- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters

## Limitations

- **F32 only**: Currently outputs only full-precision (F32) models
- **Architecture guessing**: May require manual architecture specification
- **Tokeniser compatibility**: Uses the llama tokeniser as the default fallback
- **Memory usage**: Requires loading full tensors into memory

## Examples

### Converting a custom model

```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model

# Convert to GGUF
uv run safetensors2gguf.py ./my-model

# Output will be at ./my-model/my-model-f32.gguf
```

### Converting with a specific architecture

```bash
# For a Qwen2-based model
uv run safetensors2gguf.py ./qwen-model --force-arch qwen2
```

### Batch conversion

```bash
# Convert multiple models
for model in ./models/*; do
    uv run safetensors2gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```
6 helpers/__init__.py (new file)

@@ -0,0 +1,6 @@
"""Helper utilities for LLM GGUF tools.

This package provides common utilities, logging, and shared functionality
used across the quantisation and conversion tools. Uses UK English spelling
conventions throughout.
"""
6 helpers/config/__init__.py (new file)

@@ -0,0 +1,6 @@
"""Configuration module for quantisation settings and tensor-level precision control.

Provides structured configuration definitions for Bartowski quantisation methods
including Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with fallback strategies
for different model architectures and deployment scenarios.
"""
95 helpers/config/quantisation_configs.py (new file)

@@ -0,0 +1,95 @@
"""Quantisation configuration definitions.

Pre-defined quantisation configurations for the Bartowski method, supporting
Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with tensor-level precision control.
"""

from __future__ import annotations

from helpers.models.quantisation import QuantisationConfig, QuantisationType

QUANTISATION_CONFIGS: dict[QuantisationType, QuantisationConfig] = {
    QuantisationType.Q4_K_M: QuantisationConfig(
        name="Q4_K_M",
        description="Standard Q4_K_M quantisation (baseline)",
        tensor_types={},  # No special tensor overrides - uses default Q4_K_M
        fallback_methods=[],
    ),
    QuantisationType.Q4_K_L: QuantisationConfig(
        name="Q4_K_L",
        description="Q6_K embeddings + Q6_K attention (+753MB for vocab + reasoning)",
        tensor_types={
            "token_embd.weight": "Q6_K",
            "output.weight": "Q6_K",
            "lm_head.weight": "Q6_K",
            "blk.*.attn_q.weight": "Q6_K",
            "blk.*.attn_k.weight": "Q6_K",
            "blk.*.attn_v.weight": "Q6_K",
        },
        fallback_methods=[
            {
                "embed_tokens.weight": "Q6_K",
                "output.weight": "Q6_K",
                "lm_head.weight": "Q6_K",
                "blk.*.attn_q.weight": "Q6_K",
                "blk.*.attn_k.weight": "Q6_K",
                "blk.*.attn_v.weight": "Q6_K",
            },
            {"token-embedding-type": "Q6_K", "output-tensor-type": "Q6_K"},
        ],
    ),
    QuantisationType.Q4_K_XL: QuantisationConfig(
        name="Q4_K_XL",
        description="Q8_0 embeddings + Q6_K attention (+2.1GB for vocabulary + reasoning)",
        tensor_types={
            "token_embd.weight": "Q8_0",
            "output.weight": "Q8_0",
            "lm_head.weight": "Q8_0",
            "blk.*.attn_q.weight": "Q6_K",
            "blk.*.attn_k.weight": "Q6_K",
            "blk.*.attn_v.weight": "Q6_K",
        },
        fallback_methods=[
            {
                "embed_tokens.weight": "Q8_0",
                "output.weight": "Q8_0",
                "lm_head.weight": "Q8_0",
                "blk.*.attn_q.weight": "Q6_K",
                "blk.*.attn_k.weight": "Q6_K",
                "blk.*.attn_v.weight": "Q6_K",
            },
            {"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
        ],
    ),
    QuantisationType.Q4_K_XXL: QuantisationConfig(
        name="Q4_K_XXL",
        description="Q8_0 embeddings + Q8_0 attention (+2.8GB total, maximum precision)",
        tensor_types={
            "token_embd.weight": "Q8_0",
            "output.weight": "Q8_0",
            "lm_head.weight": "Q8_0",
            "blk.*.attn_q.weight": "Q8_0",
            "blk.*.attn_k.weight": "Q8_0",
            "blk.*.attn_v.weight": "Q8_0",
        },
        fallback_methods=[
            {
                "embed_tokens.weight": "Q8_0",
                "output.weight": "Q8_0",
                "lm_head.weight": "Q8_0",
                "blk.*.attn_q.weight": "Q8_0",
                "blk.*.attn_k.weight": "Q8_0",
                "blk.*.attn_v.weight": "Q8_0",
            },
            {"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
        ],
    ),
}


SUPPORTED_QUANTISATION_TYPES: list[QuantisationType] = [
    QuantisationType.Q4_K_M,
    QuantisationType.Q4_K_L,
    QuantisationType.Q4_K_XL,
    QuantisationType.Q4_K_XXL,
]
94 helpers/logger.py (new file)

@@ -0,0 +1,94 @@
"""Colour-coded logging configuration for LLM GGUF tools.

Provides a consistent logging interface with colour-coded output for different
log levels, making it easier to identify warnings, errors, and informational
messages at a glance during tool execution and debugging sessions.
"""

from __future__ import annotations

from logging import (
    CRITICAL,
    DEBUG,
    ERROR,
    INFO,
    WARNING,
    Formatter as LoggingFormatter,
    Logger,
    LogRecord,
    StreamHandler as LoggingStreamHandler,
    getLogger,
)
from os import getenv as os_getenv
from sys import stdout as sys_stdout
from typing import ClassVar

DEBUG_MODE = os_getenv("DEBUG", "false").lower() == "true"


class ColourFormatter(LoggingFormatter):
    """Custom formatter adding colours to log messages based on severity level.

    Uses ANSI escape codes to provide visual distinction between different
    log levels in terminal output. Supports standard logging levels with
    appropriate colour coding: DEBUG (cyan), INFO (green), WARNING (yellow),
    ERROR (red), and CRITICAL (bold red) for immediate visual feedback.
    """

    # ANSI colour codes
    COLOURS: ClassVar[dict[int, str]] = {
        DEBUG: "\033[36m",  # Cyan
        INFO: "\033[32m",  # Green
        WARNING: "\033[33m",  # Yellow
        ERROR: "\033[31m",  # Red
        CRITICAL: "\033[1;31m",  # Bold Red
    }
    RESET = "\033[0m"

    # Emoji prefixes for different levels
    EMOJIS: ClassVar[dict[int, str]] = {
        DEBUG: "🔍",
        INFO: "ℹ️ ",  # noqa: RUF001
        WARNING: "⚠️ ",
        ERROR: "❌",
        CRITICAL: "🔥",
    }

    def format(self, record: LogRecord) -> str:
        """Format log record with colour and emoji based on severity level.

        Enhances standard log formatting by prepending ANSI colour codes and
        emoji indicators, then appending reset codes to prevent colour bleeding.
        Maintains standard log structure whilst adding visual enhancements for
        improved readability in terminal environments.

        Returns:
            str: Formatted log message with colour and emoji.
        """
        # Get colour for this level
        colour = self.COLOURS.get(record.levelno, "")
        emoji = self.EMOJIS.get(record.levelno, "")

        # Format the message
        record.msg = f"{emoji} {record.msg}"
        formatted = super().format(record)

        # Add colour codes
        return f"{colour}{formatted}{self.RESET}"


# Create and configure the logger
logger: Logger = getLogger("llm-gguf-tools")
logger.setLevel(DEBUG if DEBUG_MODE else INFO)

# Create console handler with colour formatter
handler = LoggingStreamHandler(sys_stdout)
handler.setLevel(DEBUG if DEBUG_MODE else INFO)

# Set formatter without timestamp for cleaner output
formatter = ColourFormatter(fmt="%(message)s", datefmt="%H:%M:%S")
handler.setFormatter(formatter)
logger.addHandler(handler)

# Prevent propagation to root logger
logger.propagate = False
35 helpers/models/__init__.py (new file)

@@ -0,0 +1,35 @@
"""Pydantic models for llm-gguf-tools.

This module provides structured data models for quantisation and conversion
operations, ensuring type safety and validation across the toolset.
"""

from __future__ import annotations

from helpers.models.conversion import (
    GGUFParameters,
    ModelConfig,
    TensorMapping,
    VisionConfig,
)
from helpers.models.quantisation import (
    LlamaCppEnvironment,
    ModelSource,
    QuantisationConfig,
    QuantisationResult,
    QuantisationType,
    URLType,
)

__all__ = [
    "GGUFParameters",
    "LlamaCppEnvironment",
    "ModelConfig",
    "ModelSource",
    "QuantisationConfig",
    "QuantisationResult",
    "QuantisationType",
    "TensorMapping",
    "URLType",
    "VisionConfig",
]
150 helpers/models/conversion.py (new file)

@@ -0,0 +1,150 @@
"""Pydantic models for GGUF conversion operations.

Contains data models for SafeTensors to GGUF conversion including
model configurations, parameter mappings, and tensor specifications.
Uses UK English spelling conventions throughout.
"""

from __future__ import annotations

from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class ModelConfig(BaseModel):
    """Parsed model configuration from HuggingFace config.json.

    Represents the standard configuration metadata extracted from HuggingFace
    models, providing structured access to architecture details, hyperparameters,
    and quantisation settings required for GGUF conversion.
    """

    model_config = ConfigDict(extra="allow")

    architectures: list[str] = Field(default_factory=lambda: ["Unknown"])
    model_type: str = "unknown"
    vocab_size: int = 32000
    max_position_embeddings: int = 2048
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    num_key_value_heads: int | None = None
    rope_theta: float = 10000.0
    rope_scaling: dict[str, Any] | None = None
    rms_norm_eps: float = 1e-5
    vision_config: VisionConfig | None = None

    def to_gguf_params(self) -> GGUFParameters:
        """Convert model configuration to GGUF parameters.

        Translates HuggingFace model configuration values to GGUF-specific
        parameter format, handling defaults and calculating derived values
        like RoPE dimension count from head dimensions.

        Returns:
            GGUFParameters instance with converted values.
        """
        params = {
            "vocab_size": self.vocab_size,
            "context_length": self.max_position_embeddings,
            "embedding_length": self.hidden_size,
            "block_count": self.num_hidden_layers,
            "feed_forward_length": self.intermediate_size,
            "attention.head_count": self.num_attention_heads,
            "attention.head_count_kv": self.num_key_value_heads or self.num_attention_heads,
            "attention.layer_norm_rms_epsilon": self.rms_norm_eps,
            "rope.freq_base": self.rope_theta,
            "rope.dimension_count": self.hidden_size // self.num_attention_heads,
        }
        return GGUFParameters(**params)  # type: ignore[arg-type]


class VisionConfig(BaseModel):
    """Vision model configuration for multimodal models.

    Contains parameters specific to vision components in multimodal architectures,
    including patch sizes, embedding dimensions, and spatial merge configurations
    for proper GGUF metadata generation.
    """

    model_config = ConfigDict(extra="allow")

    hidden_size: int = 1536
    num_hidden_layers: int = 42
    num_attention_heads: int = 12
    intermediate_size: int = 4224
    patch_size: int = 14
    spatial_merge_size: int = 2
    rms_norm_eps: float | None = None


class GGUFParameters(BaseModel):
    """GGUF-specific parameters inferred from model configuration.

    Translates HuggingFace configuration values to GGUF parameter names and
    formats, providing a standardised interface for GGUF writer configuration
    across different model architectures and quantisation strategies.
    """

    model_config = ConfigDict(extra="allow")

    # Basic parameters
    vocab_size: int
    context_length: int
    embedding_length: int
    block_count: int
    feed_forward_length: int

    # Attention parameters
    attention_head_count: int = Field(alias="attention.head_count")
    attention_head_count_kv: int = Field(alias="attention.head_count_kv")
    attention_layer_norm_rms_epsilon: float = Field(alias="attention.layer_norm_rms_epsilon")

    # RoPE parameters
    rope_freq_base: float = Field(alias="rope.freq_base")
    rope_dimension_count: int = Field(alias="rope.dimension_count")
    rope_scaling_type: str | None = Field(default=None, alias="rope.scaling.type")
    rope_scaling_factor: float | None = Field(default=None, alias="rope.scaling.factor")


class TensorMapping(BaseModel):
    """Mapping configuration for tensor name conversion.

    Defines rules for translating between HuggingFace tensor naming conventions
    and GGUF tensor names, supporting both direct mappings and pattern-based
    transformations for layer-specific tensors.
    """

    model_config = ConfigDict(frozen=True)

    # Direct mappings (exact name matches)
    direct_mappings: dict[str, str] = Field(
        default_factory=lambda: {
            "model.embed_tokens.weight": "token_embd.weight",
            "model.norm.weight": "output_norm.weight",
            "lm_head.weight": "output.weight",
        }
    )

    # Layer component patterns (for .layers.N. tensors)
    layer_patterns: dict[str, str] = Field(
        default_factory=lambda: {
            "self_attn.q_proj.weight": "attn_q.weight",
            "self_attn.q_proj.bias": "attn_q.bias",
            "self_attn.k_proj.weight": "attn_k.weight",
            "self_attn.k_proj.bias": "attn_k.bias",
            "self_attn.v_proj.weight": "attn_v.weight",
            "self_attn.v_proj.bias": "attn_v.bias",
            "self_attn.o_proj": "attn_output.weight",
            "mlp.gate_proj": "ffn_gate.weight",
            "mlp.up_proj": "ffn_up.weight",
            "mlp.down_proj": "ffn_down.weight",
            "input_layernorm": "attn_norm.weight",
            "post_attention_layernorm": "ffn_norm.weight",
        }
    )

    # Architecture-specific overrides
    architecture_overrides: dict[str, dict[str, str]] = Field(default_factory=dict)
168 helpers/models/quantisation.py (new file)

@@ -0,0 +1,168 @@
"""Pydantic models for quantisation operations.

Contains data models specific to the quantisation workflow including
quantisation types, configurations, and results. Uses UK English spelling
conventions throughout (quantisation, not quantization).
"""

from __future__ import annotations

from enum import StrEnum
from typing import TYPE_CHECKING

from pydantic import BaseModel, ConfigDict, Field, field_validator

if TYPE_CHECKING:
    from pathlib import Path


class QuantisationType(StrEnum):
    """Available quantisation types for Bartowski-method GGUF model conversion.

    Defines the specific quantisation strategies supported by this tool, ranging
    from Q4_K_M baseline to Q4_K_XXL maximum precision variants. Each type
    represents different trade-offs between model size and quality preservation
    for embeddings, attention layers, and feed-forward networks.
    """

    Q4_K_M = "Q4_K_M"
    Q4_K_L = "Q4_K_L"
    Q4_K_XL = "Q4_K_XL"
    Q4_K_XXL = "Q4_K_XXL"


class URLType(StrEnum):
    """Supported URL formats for model source specification.

    Categorises input URL formats to enable appropriate handling strategies.
    HuggingFace URLs require full model download and conversion, whilst Ollama
    GGUF URLs allow direct GGUF file downloads with pattern matching for
    efficient processing of pre-quantised models.
    """

    HUGGINGFACE = "huggingface"
    OLLAMA_GGUF = "ollama_gguf"


class QuantisationConfig(BaseModel):
    """Configuration for a specific quantisation method with tensor-level precision control.

    Defines quantisation parameters including tensor type mappings and fallback
    methods for handling different model architectures. Enables fine-grained
    control over which layers receive higher precision treatment whilst
    maintaining compatibility across diverse model structures.
    """

    model_config = ConfigDict(use_enum_values=True)

    name: str
    description: str
    tensor_types: dict[str, str] = Field(default_factory=dict)
    fallback_methods: list[dict[str, str]] = Field(default_factory=list)


class ModelSource(BaseModel):
    """Represents a model source with parsed information from URL analysis.

    Contains comprehensive metadata extracted from model URLs including source
    repository details, author information, and GGUF file patterns. Enables
    differentiation between regular HuggingFace repositories requiring conversion
    and GGUF repositories allowing direct file downloads.
    """

    model_config = ConfigDict(use_enum_values=True, protected_namespaces=())

    url: str
    url_type: URLType
    source_model: str
    original_author: str
    model_name: str
    gguf_file_pattern: str | None = None
    is_gguf_repo: bool = False

    @field_validator("url")
    @classmethod
    def validate_url(cls, v: str) -> str:
        """Validate that URL is not empty.

        Ensures the provided URL string is not empty or None,
        as this is required for model source identification.

        Returns:
            The validated URL string.

        Raises:
            ValueError: If URL is empty or None.
        """
        if not v:
            msg = "URL cannot be empty"
            raise ValueError(msg)
        return v


class QuantisationResult(BaseModel):
    """Result of a quantisation operation with comprehensive status tracking.

    Captures the outcome of individual quantisation attempts including success
    status, file paths, sizes, and error details. Supports workflow status
    tracking from planning through processing to completion, enabling real-time
    progress reporting and parallel upload coordination.
    """

    model_config = ConfigDict(use_enum_values=True, arbitrary_types_allowed=True)

    quantisation_type: QuantisationType
    success: bool
    file_path: Path | None = None
    file_size: str | None = None
    method_used: str | None = None
    error_message: str | None = None
    status: str = "pending"  # planned, processing, uploading, completed, failed


class LlamaCppEnvironment(BaseModel):
    """Represents llama.cpp environment setup with binary and script locations.

    Encapsulates the runtime environment for llama.cpp tools including paths
    to quantisation binaries, CLI tools, and conversion scripts. Handles both
    local binary installations and repository-based setups to provide flexible
    deployment options across different system configurations.
    """

    model_config = ConfigDict(arbitrary_types_allowed=True)

    quantise_binary: Path  # UK spelling
    cli_binary: Path
    convert_script: str
    use_repo: bool = False


class QuantisationContext(BaseModel):
    """Context object containing all parameters needed for quantisation execution.

    Encapsulates quantisation parameters to reduce method argument counts
    and improve code maintainability following the parameter object pattern.
    """

    model_config = ConfigDict(frozen=True)

    f16_model_path: Path
    model_source: ModelSource
    config: QuantisationConfig
    llama_env: LlamaCppEnvironment
    models_dir: Path
    imatrix_path: Path | None = None
    base_quant: str = "Q4_K_M"

    def get_output_path(self) -> Path:
        """Generate output path for quantised model.

        Returns:
            Path to the output GGUF file.
        """
        output_filename = (
            f"{self.model_source.original_author}-"
            f"{self.model_source.model_name}-"
            f"{self.config.name}.gguf"
        )
        return self.models_dir / self.model_source.model_name / output_filename
20 helpers/services/__init__.py (new file)

@@ -0,0 +1,20 @@
"""Service layer for llm-gguf-tools.

Provides high-level service interfaces for interacting with external systems
including HuggingFace, llama.cpp, and filesystem operations. Uses UK English
spelling conventions throughout.
"""

from __future__ import annotations

from helpers.services.filesystem import FilesystemService
from helpers.services.huggingface import HuggingFaceService, ReadmeGenerator
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator

__all__ = [
    "EnvironmentManager",
    "FilesystemService",
    "HuggingFaceService",
    "IMatrixGenerator",
    "ReadmeGenerator",
]
174 helpers/services/filesystem.py (new file)

@@ -0,0 +1,174 @@
"""Filesystem operations service.

Provides unified filesystem operations including file discovery, size
calculation, and path management. Consolidates common filesystem patterns
used across quantisation and conversion workflows.
"""

from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Any

from helpers.logger import logger

BYTES_PER_UNIT = 1024.0


class FilesystemService:
    """Handles filesystem operations with consistent error handling.

    Provides methods for file discovery, size formatting, and JSON loading
    with proper error handling and logging. Ensures consistent behaviour
    across different tools and workflows.
    """

    @staticmethod
    def get_file_size(file_path: Path) -> str:
        """Get human-readable file size using system utilities.

        Attempts to use `du -h` for human-readable output, falling back to
        Python calculation if the system command fails. Provides consistent
        size formatting across the toolset.

        Returns:
            Human-readable file size string (e.g., "1.5G", "750M").
        """
        try:
            result = subprocess.run(
                ["du", "-h", str(file_path)], capture_output=True, text=True, check=True
            )
            return result.stdout.split()[0]
        except (subprocess.CalledProcessError, FileNotFoundError):
            # Fallback to Python calculation
            try:
                size_bytes: float = float(file_path.stat().st_size)
                for unit in ["B", "K", "M", "G", "T"]:
                    if size_bytes < BYTES_PER_UNIT:
                        return f"{size_bytes:.1f}{unit}"
                    size_bytes /= BYTES_PER_UNIT
            except Exception:
                return "Unknown"
            else:
                return f"{size_bytes:.1f}P"

    @staticmethod
    def load_json_config(config_path: Path) -> dict[str, Any]:
        """Load and parse JSON configuration file.

        Provides consistent JSON loading with proper error handling and
        encoding specification. Used for loading model configurations,
        tokeniser settings, and other JSON-based metadata.

        Returns:
            Parsed JSON content as dictionary.

        Raises:
            FileNotFoundError: If config file doesn't exist.
        """
        if not config_path.exists():
            msg = f"Configuration file not found: {config_path}"
            raise FileNotFoundError(msg)

        with Path(config_path).open(encoding="utf-8") as f:
            return json.load(f)

    @staticmethod
    def find_safetensor_files(model_path: Path) -> list[Path]:
        """Find all SafeTensor files in model directory using priority search.

        Searches for tensor files in order of preference: single model.safetensors,
        sharded model-*-of-*.safetensors files, then any *.safetensors files. This
        approach handles both single-file and multi-shard model distributions whilst
        ensuring predictable file ordering for conversion consistency.

        Returns:
            List of SafeTensor file paths in priority order.

        Raises:
            FileNotFoundError: If no SafeTensor files are found.
        """
        # Check for single file
        single_file = model_path / "model.safetensors"
        if single_file.exists():
            return [single_file]

        # Check for sharded files
        pattern = "model-*-of-*.safetensors"
        sharded_files = sorted(model_path.glob(pattern))
        if sharded_files:
            return sharded_files

        # Check for any safetensor files
        any_files = sorted(model_path.glob("*.safetensors"))
        if any_files:
            return any_files

        msg = f"No SafeTensor files found in {model_path}"
        raise FileNotFoundError(msg)

    @staticmethod
    def find_gguf_files(model_path: Path, pattern: str | None = None) -> list[Path]:
        """Find GGUF files in directory, optionally filtered by pattern.

        Searches for GGUF files with optional pattern matching. Prioritises
        multi-part files (00001-of-*) over single files for proper handling
        of large models split across multiple files.

        Returns:
            List of GGUF file paths, sorted with multi-part files first.
        """
        if pattern:
            gguf_files = list(model_path.glob(f"*{pattern}*.gguf"))
        else:
            gguf_files = list(model_path.glob("*.gguf"))

        # Sort to prioritise 00001-of-* files
        gguf_files.sort(
            key=lambda x: (
                "00001-of-" not in x.name,  # False sorts before True
                x.name,
            )
        )

        return gguf_files

    @staticmethod
    def ensure_directory(path: Path) -> Path:
        """Ensure directory exists, creating if necessary.

        Creates directory and all parent directories if they don't exist.
        Returns the path for method chaining convenience.

        Returns:
            The directory path.
        """
        path.mkdir(parents=True, exist_ok=True)
        return path

    @staticmethod
    def cleanup_directory(path: Path, pattern: str = "*") -> int:
        """Remove files matching pattern from directory.

        Safely removes files matching the specified glob pattern. Returns
        count of files removed for logging purposes.

        Returns:
            Number of files removed.
        """
        if not path.exists():
            return 0

        files_removed = 0
        for file_path in path.glob(pattern):
            if file_path.is_file():
                try:
                    file_path.unlink()
                    files_removed += 1
                except Exception as e:
                    logger.warning(f"Failed to remove {file_path}: {e}")

        return files_removed
210 helpers/services/gguf.py (new file)

@@ -0,0 +1,210 @@
"""GGUF file operations service.

Provides unified interface for creating, writing, and manipulating GGUF files.
Consolidates GGUF-specific operations from conversion and quantisation workflows.
Uses UK English spelling conventions throughout.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import gguf
from safetensors import safe_open

from helpers.logger import logger
from helpers.services.filesystem import FilesystemService
from helpers.utils.config_parser import ConfigParser

try:
    import torch
except ImportError:
    # PyTorch is optional: without it, BFloat16 tensors cannot be upcast
    torch = None

if TYPE_CHECKING:
    from pathlib import Path

    import numpy as np

    from helpers.models.conversion import ModelConfig


class GGUFWriter:
    """Manages GGUF file creation and metadata writing.

    Provides high-level interface for GGUF file operations including metadata
    configuration, tensor addition, and tokeniser integration. Encapsulates
    low-level GGUF library interactions for consistent error handling.
    """

    def __init__(self, output_path: Path, architecture: str) -> None:
        """Initialise GGUF writer with output path and architecture.

        Creates the underlying GGUF writer instance and prepares for metadata
        and tensor addition. Sets up the file structure for the specified
        model architecture.
        """
        self.output_path = output_path
        self.architecture = architecture
        self.writer = gguf.GGUFWriter(str(output_path), architecture)
        logger.info(f"Created GGUF writer for {architecture} architecture")

    def add_metadata(self, model_config: ModelConfig, model_name: str) -> None:
        """Add comprehensive metadata from model configuration.

        Writes general model information, architectural parameters, and
        quantisation settings to the GGUF file header. Handles both standard
        and vision model configurations with appropriate parameter mapping.
        """
        # General metadata
        self.writer.add_name(model_name)
        self.writer.add_description(f"Converted from {model_config.architectures[0]}")
        self.writer.add_file_type(gguf.LlamaFileType.ALL_F32)

        # Model parameters from config
        params = model_config.to_gguf_params()
        self.writer.add_context_length(params.context_length)
        self.writer.add_embedding_length(params.embedding_length)
        self.writer.add_block_count(params.block_count)
        self.writer.add_feed_forward_length(params.feed_forward_length)
        self.writer.add_head_count(params.attention_head_count)
        self.writer.add_head_count_kv(params.attention_head_count_kv)
        self.writer.add_layer_norm_rms_eps(params.attention_layer_norm_rms_epsilon)
        self.writer.add_rope_freq_base(params.rope_freq_base)
        self.writer.add_rope_dimension_count(params.rope_dimension_count)

        logger.info(f"Added metadata: {params.block_count} layers, {params.context_length} context")

    def add_vision_metadata(self, vision_config: Any) -> None:
        """Add vision model parameters to GGUF metadata.

        Configures vision-specific parameters for multimodal models including
        embedding dimensions, attention heads, and spatial processing settings.
        """
        if not vision_config:
            return

        logger.info("Adding vision model parameters...")
        self.writer.add_vision_embedding_length(vision_config.hidden_size)
        self.writer.add_vision_block_count(vision_config.num_hidden_layers)
        self.writer.add_vision_head_count(vision_config.num_attention_heads)
        self.writer.add_vision_feed_forward_length(vision_config.intermediate_size)
        self.writer.add_vision_patch_size(vision_config.patch_size)
        self.writer.add_vision_spatial_merge_size(vision_config.spatial_merge_size)

        if hasattr(vision_config, "rms_norm_eps") and vision_config.rms_norm_eps:
            self.writer.add_vision_attention_layernorm_eps(vision_config.rms_norm_eps)

    def add_tokeniser(self, tokeniser_config: dict[str, Any]) -> None:
        """Add tokeniser metadata to GGUF file.

        Writes special token IDs and tokeniser model type to enable proper
        text processing during inference. Uses sensible defaults for missing
        configuration values.
        """
        self.writer.add_bos_token_id(tokeniser_config.get("bos_token_id", 1))
        self.writer.add_eos_token_id(tokeniser_config.get("eos_token_id", 2))
        self.writer.add_unk_token_id(tokeniser_config.get("unk_token_id", 0))
        self.writer.add_pad_token_id(tokeniser_config.get("pad_token_id", 0))
        self.writer.add_tokenizer_model(tokeniser_config.get("model_type", "llama"))

        logger.info("Added tokeniser configuration")

    def add_tensor(self, name: str, data: np.ndarray) -> None:
        """Add a tensor to the GGUF file.

        Writes tensor data with the specified name to the file. Handles
        data type conversions and validates tensor shapes.
        """
        self.writer.add_tensor(name, data)

    def finalise(self) -> None:
        """Write all data to file and close writer.

        Completes the GGUF file creation by writing headers, key-value data,
        and tensor data in the correct order. Ensures proper file closure.
        """
        logger.info(f"Writing GGUF file to {self.output_path}")
        self.writer.write_header_to_file()
        self.writer.write_kv_data_to_file()
        self.writer.write_tensors_to_file()
        self.writer.close()
        logger.info("GGUF file written successfully")


class GGUFConverter:
    """High-level GGUF conversion orchestrator.

    Coordinates the complete conversion workflow from source models to GGUF
    format, managing metadata extraction, tensor mapping, and file writing.
    """

    @staticmethod
    def convert_safetensors(
        model_path: Path,
        output_path: Path,
        model_config: ModelConfig,
        architecture: str,
        tensor_mapper: Any,
    ) -> bool:
        """Convert SafeTensors model to GGUF format.

        Orchestrates the conversion process including metadata setup, tensor
        loading with BFloat16 support, name mapping, and tokeniser integration.

        Returns:
            True if conversion successful, False otherwise.
        """
        logger.info(f"Converting {model_path.name} to GGUF...")

        # Create writer
        writer_wrapper = GGUFWriter(output_path, architecture)

        # Add metadata
        writer_wrapper.add_metadata(model_config, model_path.name)

        # Add vision metadata if present
        if model_config.vision_config:
            writer_wrapper.add_vision_metadata(model_config.vision_config)

        # Load and add tensors
        fs = FilesystemService()
        tensor_files = fs.find_safetensor_files(model_path)
        logger.info(f"Found {len(tensor_files)} tensor file(s)")

        tensor_count = 0
        for tensor_file in tensor_files:
            logger.info(f"Loading {tensor_file.name}...")
            with safe_open(tensor_file, framework="pt") as f:
                for tensor_name in f.keys():  # noqa: SIM118 - safe_open handle is not a dict
                    tensor_data = f.get_tensor(tensor_name)

                    # Convert BFloat16 to Float32
                    if hasattr(tensor_data, "numpy"):
                        if torch and tensor_data.dtype == torch.bfloat16:
                            tensor_data = tensor_data.float()
                        tensor_data = tensor_data.numpy()

                    # Map tensor name
                    gguf_name = tensor_mapper.map_tensor_name(tensor_name)

                    if gguf_name:
                        writer_wrapper.add_tensor(gguf_name, tensor_data)
                        tensor_count += 1

                    if tensor_count % 100 == 0:
                        logger.info(f"  Processed {tensor_count} tensors...")

        logger.info(f"Total tensors processed: {tensor_count}")

        # Add tokeniser
        try:
            tok_config = ConfigParser.load_tokeniser_config(model_path)
            writer_wrapper.add_tokeniser(tok_config)
            logger.info("Tokeniser added")
        except Exception as e:
            logger.warning(f"Could not add tokeniser: {e}")

        # Finalise file
        writer_wrapper.finalise()

        file_size = fs.get_file_size(output_path)
        logger.info(f"Conversion complete! Output: {output_path} ({file_size})")

        return True
454
helpers/services/huggingface.py
Normal file
454
helpers/services/huggingface.py
Normal file
|
@ -0,0 +1,454 @@
|
|||
"""HuggingFace operations service.
|
||||
|
||||
Handles all interactions with HuggingFace including model downloads,
|
||||
uploads, README generation, and repository management. Uses UK English
|
||||
spelling conventions throughout.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
import subprocess
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from helpers.logger import logger
|
||||
from helpers.models.quantisation import QuantisationType
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from helpers.models.quantisation import ModelSource, QuantisationResult
|
||||
|
||||
|
||||
class HuggingFaceService:
|
||||
"""Manages HuggingFace repository operations.
|
||||
|
||||
Provides methods for downloading models, uploading files, and managing
|
||||
repositories. Handles authentication, error recovery, and progress tracking
|
||||
for robust interaction with HuggingFace services.
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def get_username() -> str:
|
||||
"""Get authenticated HuggingFace username.
|
||||
|
||||
Retrieves the current user's HuggingFace username using the CLI.
|
||||
Requires prior authentication via `huggingface-cli login`.
|
||||
|
||||
Returns:
|
||||
HuggingFace username.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If not authenticated or CLI not available.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["huggingface-cli", "whoami"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
)
|
||||
return result.stdout.strip()
|
||||
except (subprocess.CalledProcessError, FileNotFoundError) as err:
|
||||
msg = "Please log in to HuggingFace first: huggingface-cli login"
|
||||
raise RuntimeError(msg) from err
|
||||
|
||||
@staticmethod
|
||||
def download_model(
|
||||
model_name: str, output_dir: Path, include_pattern: str | None = None
|
||||
) -> None:
|
||||
"""Download model from HuggingFace.
|
||||
|
||||
Downloads a complete model or specific files matching a pattern.
|
||||
Creates the output directory if it doesn't exist. Supports filtered
|
||||
downloads for efficient bandwidth usage when only certain files are needed.
|
||||
"""
|
||||
logger.info(f"Downloading {model_name} to {output_dir}")
|
||||
|
||||
cmd = [
|
||||
"huggingface-cli",
|
||||
"download",
|
||||
model_name,
|
||||
"--local-dir",
|
||||
str(output_dir),
|
||||
]
|
||||
|
||||
if include_pattern:
|
||||
cmd.extend(["--include", include_pattern])
|
||||
|
||||
subprocess.run(cmd, check=True)
|
||||
logger.info("Download complete")
|
||||
|
||||
@staticmethod
|
||||
def upload_file(
|
||||
repo_id: str,
|
||||
local_path: Path,
|
||||
repo_path: str | None = None,
|
||||
create_repo: bool = False,
|
||||
) -> None:
|
||||
"""Upload a file to HuggingFace repository.
|
||||
|
||||
Uploads a single file to the specified repository path. Can create
|
||||
the repository if it doesn't exist. Handles repository creation conflicts
|
||||
gracefully by retrying without the create flag when needed.
|
||||
|
||||
Raises:
|
||||
CalledProcessError: If upload fails.
|
||||
"""
|
||||
repo_path = repo_path or local_path.name
|
||||
logger.info(f"Uploading {local_path.name} to {repo_id}/{repo_path}")
|
||||
|
||||
cmd = [
|
||||
"huggingface-cli",
|
||||
"upload",
|
||||
repo_id,
|
||||
str(local_path),
|
||||
repo_path,
|
||||
]
|
||||
|
||||
if create_repo:
|
||||
cmd.append("--create")
|
||||
|
||||
try:
|
||||
subprocess.run(cmd, check=True, capture_output=True)
|
||||
logger.info(f"Uploaded {repo_path}")
|
||||
except subprocess.CalledProcessError:
|
||||
if create_repo:
|
||||
# Repository might already exist, retry without --create
|
||||
cmd = cmd[:-1] # Remove --create flag
|
||||
subprocess.run(cmd, check=True)
|
||||
logger.info(f"Updated {repo_path}")
|
||||
else:
|
||||
raise
|
||||
|
||||
|
||||
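# --- Illustrative usage sketch (not part of this commit) ---
# A minimal driver for the service above; the repository names are
# placeholders, not real repos.
#
#   from pathlib import Path
#
#   user = HuggingFaceService.get_username()  # requires `huggingface-cli login`
#   HuggingFaceService.download_model(
#       "some-org/some-model", Path("./work/some-model"),
#       include_pattern="*.safetensors",
#   )
#   HuggingFaceService.upload_file(
#       f"{user}/some-model-GGUF", Path("./README.md"), create_repo=True
#   )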
class ReadmeGenerator:
    """Generates README files for quantised models.

    Creates comprehensive README documentation including model cards,
    quantisation details, and status tracking. Supports both initial
    planning documentation and final result summaries.
    """

    def generate(
        self,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        models_dir: Path,
        output_repo: str | None = None,
    ) -> Path:
        """Generate README file for quantised model repository.

        Creates a comprehensive README with frontmatter, quantisation table,
        and original model information. Handles status tracking for planned,
        processing, and completed quantisations.

        Returns:
            Path to generated README file.
        """
        logger.info("Creating model card...")

        model_dir = models_dir / model_source.model_name
        readme_path = model_dir / "README.md"

        # Get original README content
        original_content = self._get_original_readme(model_source, model_dir)

        # Generate new README
        readme_content = self._generate_readme_content(
            model_source, results, original_content, output_repo
        )

        readme_path.write_text(readme_content)
        return readme_path

    def _get_original_readme(self, model_source: ModelSource, model_dir: Path) -> dict[str, str]:
        """Extract original README and metadata.

        Downloads or reads the original model's README for inclusion in the
        quantised model documentation. Parses YAML frontmatter if present.

        Returns:
            Dictionary with readme content, licence, tags, and frontmatter.
        """
        content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}

        # Try local file first
        readme_path = model_dir / "README.md"
        if readme_path.exists():
            content["readme"] = readme_path.read_text(encoding="utf-8")
            logger.info(f"Found original README ({len(content['readme'])} characters)")
        else:
            # Download separately
            content = self._download_readme(model_source)

        # Parse frontmatter if present
        if content["readme"].startswith("---\n"):
            content = self._parse_frontmatter(content["readme"])

        return content

    def _download_readme(self, model_source: ModelSource) -> dict[str, str]:
        """Download README from HuggingFace repository.

        Attempts to download just the README.md file from the source repository
        for efficient documentation extraction.

        Returns:
            Dictionary with readme content and default metadata.
        """
        content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}

        with tempfile.TemporaryDirectory() as temp_dir:
            try:
                logger.info(f"Downloading README from {model_source.source_model}...")
                subprocess.run(
                    [
                        "huggingface-cli",
                        "download",
                        model_source.source_model,
                        "--include",
                        "README.md",
                        "--local-dir",
                        temp_dir,
                    ],
                    check=True,
                    capture_output=True,
                )

                readme_path = Path(temp_dir) / "README.md"
                if readme_path.exists():
                    content["readme"] = readme_path.read_text(encoding="utf-8")
                    logger.info(f"Downloaded README ({len(content['readme'])} characters)")
            except subprocess.CalledProcessError as e:
                logger.warning(f"Failed to download README: {e}")

        return content

    def _parse_frontmatter(self, readme_text: str) -> dict[str, str]:
        """Parse YAML frontmatter from README.

        Extracts metadata from YAML frontmatter including licence, tags,
        and other model card fields.

        Returns:
            Dictionary with separated content and metadata.
        """
        lines = readme_text.split("\n")
        if lines[0] != "---":
            return {
                "readme": readme_text,
                "licence": "apache-2.0",
                "tags": "",
                "frontmatter": "",
            }

        frontmatter_end = -1
        for i, line in enumerate(lines[1:], 1):
            if line == "---":
                frontmatter_end = i
                break

        if frontmatter_end == -1:
            return {
                "readme": readme_text,
                "licence": "apache-2.0",
                "tags": "",
                "frontmatter": "",
            }

        frontmatter = "\n".join(lines[1:frontmatter_end])
        content = "\n".join(lines[frontmatter_end + 1 :])

        # Extract licence
        licence_match = re.search(r"^license:\s*(.+)$", frontmatter, re.MULTILINE)
        licence_val = licence_match.group(1).strip().strip('"') if licence_match else "apache-2.0"

        # Extract tags
        tags = []
        in_tags = False
        for line in frontmatter.split("\n"):
            if line.startswith("tags:"):
                in_tags = True
                continue
            if in_tags:
                if line.startswith("- "):
                    tags.append(line[2:].strip())
                elif line and not line.startswith(" "):
                    break

        return {
            "readme": content,
            "licence": licence_val,
            "tags": ",".join(tags),
            "frontmatter": frontmatter,
        }
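    # --- Worked example (not part of this commit) ---
    # Given a README starting with YAML frontmatter, _parse_frontmatter splits
    # it from the body and pulls out licence and tags:
    #
    #   text = "---\nlicense: mit\ntags:\n- chat\n- code\n---\nBody text\n"
    #   parsed = ReadmeGenerator()._parse_frontmatter(text)
    #   # parsed["licence"] == "mit"
    #   # parsed["tags"] == "chat,code"
    #   # parsed["readme"] == "Body text\n"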
    def _generate_readme_content(
        self,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        original_content: dict[str, str],
        output_repo: str | None = None,
    ) -> str:
        """Generate complete README content with quantisation details.

        Creates the full README including YAML frontmatter, quantisation status
        table, and original model information.

        Returns:
            Complete README markdown content.
        """
        # Build tags
        our_tags = [
            "quantised",
            "gguf",
            "q4_k_m",
            "q4_k_l",
            "q4_k_xl",
            "q4_k_xxl",
            "bartowski-method",
        ]
        original_tags = original_content["tags"].split(",") if original_content["tags"] else []
        all_tags = sorted(set(our_tags + original_tags))

        # Build frontmatter
        frontmatter = f"""---
license: {original_content["licence"]}
library_name: gguf
base_model: {model_source.source_model}
tags:
"""
        for tag in all_tags:
            if tag.strip():
                frontmatter += f"- {tag.strip()}\n"

        frontmatter += "---\n\n"

        # Build main content
        hf_url = f"https://huggingface.co/{model_source.source_model}"
        content = f"""# {model_source.original_author}-{model_source.model_name}-GGUF

GGUF quantisations of [{model_source.source_model}]({hf_url}) using Bartowski's method.

| Quantisation | Embeddings/Output | Attention | Feed-Forward | Status |
|--------------|-------------------|-----------|--------------|--------|
"""

        # Add results table
        for quant_type in [
            QuantisationType.Q4_K_M,
            QuantisationType.Q4_K_L,
            QuantisationType.Q4_K_XL,
            QuantisationType.Q4_K_XXL,
        ]:
            result = results.get(quant_type)
            if not result:
                result = type("Result", (), {"status": "planned", "success": False})()

            layers = self._get_layers_config(quant_type)
            status = self._format_status(result, model_source, quant_type, output_repo)

            content += (
                f"| {quant_type.value} | {layers['embeddings']} | "
                f"{layers['attention']} | {layers['ffn']} | {status} |\n"
            )

        content += "\n---\n\n"

        # Add original content
        if original_content["readme"]:
            content += "# Original Model Information\n\n" + original_content["readme"]
        else:
            content += f"## Original Model\n\nQuantisation of [{model_source.source_model}](https://huggingface.co/{model_source.source_model}).\n"

        return frontmatter + content

    def _get_layers_config(self, quant_type: QuantisationType) -> dict[str, str]:
        """Get layer configuration for quantisation type.

        Returns layer precision specifications for the quantisation table.

        Returns:
            Dictionary with embeddings, attention, and ffn precision labels.
        """
        configs = {
            QuantisationType.Q4_K_M: {
                "embeddings": "Q4_K_M",
                "attention": "Q4_K_M",
                "ffn": "Q4_K_M",
            },
            QuantisationType.Q4_K_L: {"embeddings": "Q6_K", "attention": "Q6_K", "ffn": "Q4_K_M"},
            QuantisationType.Q4_K_XL: {"embeddings": "Q8_0", "attention": "Q6_K", "ffn": "Q4_K_M"},
            QuantisationType.Q4_K_XXL: {"embeddings": "Q8_0", "attention": "Q8_0", "ffn": "Q4_K_M"},
        }
        return configs.get(
            quant_type, {"embeddings": "Unknown", "attention": "Unknown", "ffn": "Unknown"}
        )

    def _format_status(
        self,
        result: QuantisationResult,
        model_source: ModelSource,
        quant_type: QuantisationType,
        output_repo: str | None,
    ) -> str:
        """Format status indicator for README table.

        Creates appropriate status indicator based on quantisation state
        including progress indicators, file sizes, and download links.

        Returns:
            Formatted status string for table cell.
        """
        status_map = {
            "planned": "⏳ Planned",
            "processing": "🔄 Processing...",
            "uploading": "⬆️ Uploading...",
            "failed": "❌ Failed",
        }

        if hasattr(result, "status") and result.status in status_map:
            base_status = status_map[result.status]

            if result.status == "uploading" and hasattr(result, "file_size") and result.file_size:
                return f"{base_status} ({result.file_size})"
            if result.status == "completed" or (hasattr(result, "success") and result.success):
                return self._format_success_status(result, model_source, quant_type, output_repo)
            return base_status

        # Legacy support
        if hasattr(result, "success") and result.success:
            return self._format_success_status(result, model_source, quant_type, output_repo)
        return "❌ Failed"

    def _format_success_status(
        self,
        result: QuantisationResult,
        model_source: ModelSource,
        quant_type: QuantisationType,
        output_repo: str | None,
    ) -> str:
        """Format successful quantisation status with download link.

        Creates a download link if repository information is available,
        otherwise shows file size.

        Returns:
            Formatted success status string.
        """
        if not output_repo:
            return (
                f"✅ {result.file_size}"
                if hasattr(result, "file_size") and result.file_size
                else "✅ Available"
            )

        filename = (
            f"{model_source.original_author}-{model_source.model_name}-{quant_type.value}.gguf"
        )
        url = f"https://huggingface.co/{output_repo}?show_file_info={filename}"

        if hasattr(result, "file_size") and result.file_size:
            return f"[✅ {result.file_size}]({url})"
        return f"[✅ Available]({url})"
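# --- For reference (not part of this commit) ---
# A completed row rendered by the table-building code above looks like this
# (author, model, and size illustrative):
#
#   | Q4_K_L | Q6_K | Q6_K | Q4_K_M | [✅ 4.5GB](https://huggingface.co/user/Author-Model-GGUF?show_file_info=Author-Model-Q4_K_L.gguf) |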
417 helpers/services/llama_cpp.py Normal file
@@ -0,0 +1,417 @@
"""llama.cpp environment and operations service.
|
||||
|
||||
Manages llama.cpp binary discovery, environment setup, and imatrix generation.
|
||||
Provides consistent interface for interacting with llama.cpp tools across
|
||||
different installation methods.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
|
||||
from helpers.logger import logger
|
||||
from helpers.models.quantisation import LlamaCppEnvironment
|
||||
from helpers.services.filesystem import FilesystemService
|
||||
|
||||
|
||||
class EnvironmentManager:
|
||||
"""Manages llama.cpp environment setup and binary discovery.
|
||||
|
||||
Handles detection of local binaries, repository setup, and conversion
|
||||
script location. Provides fallback strategies for different installation
|
||||
scenarios including local builds and repository-based setups.
|
||||
"""
|
||||
|
||||
def __init__(self, work_dir: Path) -> None:
|
||||
"""Initialise EnvironmentManager."""
|
||||
self.work_dir = work_dir
|
||||
self.llama_cpp_dir = work_dir / "llama.cpp"
|
||||
self.fs = FilesystemService()
|
||||
|
||||
def setup(self) -> LlamaCppEnvironment:
|
||||
"""Set up llama.cpp environment with automatic detection.
|
||||
|
||||
Checks for local llama.cpp binaries first, then falls back to
|
||||
repository-based setup if needed. Handles conversion script location,
|
||||
dependency installation, and path resolution.
|
||||
|
||||
Returns:
|
||||
Configured LlamaCppEnvironment instance.
|
||||
"""
|
||||
# Check for local binaries first
|
||||
local_env = self._check_local_binaries()
|
||||
if local_env:
|
||||
return local_env
|
||||
|
||||
# Setup repository if needed
|
||||
return self.setup_repository()
|
||||
|
||||
def _check_local_binaries(self) -> LlamaCppEnvironment | None:
|
||||
"""Check for existing llama.cpp binaries in current directory.
|
||||
|
||||
Searches for quantise and CLI binaries in the current directory
|
||||
and standard installation paths. Also locates conversion scripts.
|
||||
|
||||
Returns:
|
||||
LlamaCppEnvironment if binaries found, None otherwise.
|
||||
"""
|
||||
quantise_bin = Path("./llama-quantize")
|
||||
cli_bin = Path("./llama-cli")
|
||||
|
||||
if not (quantise_bin.exists() and cli_bin.exists()):
|
||||
return None
|
||||
|
||||
logger.info("Found llama.cpp binaries in current directory")
|
||||
|
||||
# Check for conversion script
|
||||
convert_script = self._find_convert_script()
|
||||
if convert_script:
|
||||
logger.info(f"Found conversion script: {convert_script}")
|
||||
return LlamaCppEnvironment(
|
||||
quantise_binary=quantise_bin.resolve(),
|
||||
cli_binary=cli_bin.resolve(),
|
||||
convert_script=convert_script,
|
||||
use_repo=False,
|
||||
)
|
||||
|
||||
logger.warning("No conversion script found in current directory")
|
||||
logger.info("Will use llama.cpp repository method for conversion")
|
||||
return LlamaCppEnvironment(
|
||||
quantise_binary=quantise_bin.resolve(),
|
||||
cli_binary=cli_bin.resolve(),
|
||||
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
|
||||
use_repo=True,
|
||||
)
|
||||
|
||||
def _find_convert_script(self) -> str | None:
|
||||
"""Find conversion script in current directory.
|
||||
|
||||
Searches for various naming conventions of the HF to GGUF
|
||||
conversion script.
|
||||
|
||||
Returns:
|
||||
Command to run conversion script, or None if not found.
|
||||
"""
|
||||
scripts = [
|
||||
"./llama-convert-hf-to-gguf",
|
||||
"python3 ./convert_hf_to_gguf.py",
|
||||
"python3 ./convert-hf-to-gguf.py",
|
||||
]
|
||||
|
||||
for script in scripts:
|
||||
if script.startswith("python3"):
|
||||
script_path = script.split(" ", 1)[1]
|
||||
if Path(script_path).exists():
|
||||
return script
|
||||
elif Path(script).exists():
|
||||
return script
|
||||
return None
|
||||
|
||||
def setup_repository(self) -> LlamaCppEnvironment:
|
||||
"""Setup llama.cpp repository for conversion scripts.
|
||||
|
||||
Clones the llama.cpp repository if not present and installs
|
||||
Python dependencies for model conversion.
|
||||
|
||||
Returns:
|
||||
LlamaCppEnvironment configured with repository paths.
|
||||
"""
|
||||
if not self.llama_cpp_dir.exists():
|
||||
logger.info("Cloning llama.cpp for conversion script...")
|
||||
subprocess.run(
|
||||
[
|
||||
"git",
|
||||
"clone",
|
||||
"https://github.com/ggerganov/llama.cpp.git",
|
||||
str(self.llama_cpp_dir),
|
||||
],
|
||||
check=True,
|
||||
)
|
||||
|
||||
# Install Python requirements
|
||||
logger.info("Installing Python requirements...")
|
||||
subprocess.run(
|
||||
[
|
||||
"pip3",
|
||||
"install",
|
||||
"-r",
|
||||
"requirements.txt",
|
||||
"--break-system-packages",
|
||||
"--root-user-action=ignore",
|
||||
],
|
||||
cwd=self.llama_cpp_dir,
|
||||
check=True,
|
||||
)
|
||||
|
||||
# Install additional conversion dependencies
|
||||
logger.info("Installing additional conversion dependencies...")
|
||||
subprocess.run(
|
||||
[
|
||||
"pip3",
|
||||
"install",
|
||||
"transformers",
|
||||
"sentencepiece",
|
||||
"protobuf",
|
||||
"--break-system-packages",
|
||||
"--root-user-action=ignore",
|
||||
],
|
||||
check=True,
|
||||
)
|
||||
else:
|
||||
logger.info("llama.cpp repository already exists")
|
||||
|
||||
# Use local binaries but repo conversion script
|
||||
return LlamaCppEnvironment(
|
||||
quantise_binary=Path("./llama-quantize").resolve(),
|
||||
cli_binary=Path("./llama-cli").resolve(),
|
||||
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
|
||||
use_repo=False,
|
||||
)
|
||||
|
||||
|
||||
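# --- Illustrative usage sketch (not part of this commit) ---
# Resolution order implemented above: local ./llama-quantize and ./llama-cli
# with a local conversion script; the same binaries with the cloned
# repository's convert_hf_to_gguf.py; otherwise a full repository setup.
#
#   from pathlib import Path
#
#   env = EnvironmentManager(Path.cwd() / "quantisation_work").setup()
#   print(env.quantise_binary, env.convert_script, env.use_repo)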
class IMatrixGenerator:
    """Handles importance matrix generation for quantisation guidance.

    Generates or locates importance matrices that guide quantisation
    decisions, helping preserve model quality by identifying critical
    tensors requiring higher precision.
    """

    def __init__(self) -> None:
        """Initialise IMatrixGenerator."""
        self.fs = FilesystemService()

    def generate_imatrix(
        self, f16_model_path: Path, llama_env: LlamaCppEnvironment, model_dir: Path
    ) -> Path | None:
        """Generate importance matrix for quantisation guidance.

        Searches for existing imatrix files first, provides interactive
        prompts for user-supplied matrices, then generates new matrices
        using calibration data if necessary.

        Returns:
            Path to imatrix file, or None if generation fails.
        """
        imatrix_path = model_dir / "imatrix.dat"

        # Check for existing imatrix
        if imatrix_path.exists():
            logger.info(f"Found existing imatrix: {imatrix_path.name}")
            return imatrix_path

        # Try user-provided imatrix
        user_imatrix = self._prompt_for_user_imatrix(model_dir, imatrix_path)
        if user_imatrix:
            return user_imatrix

        # Generate new imatrix
        calibration_file = self._get_calibration_file()
        if not calibration_file:
            return None

        return self._generate_new_imatrix(f16_model_path, llama_env, imatrix_path, calibration_file)

    def _prompt_for_user_imatrix(self, model_dir: Path, imatrix_path: Path) -> Path | None:
        """Prompt user for existing imatrix file.

        Returns:
            Path to user-provided imatrix, or None if not available.
        """
        logger.info(f"Model directory: {model_dir}")
        logger.info(f"Looking for imatrix file at: {imatrix_path}")
        logger.info(
            "Tip: You can download pre-computed imatrix files from Bartowski's repositories!"
        )
        logger.info(
            "  Example: https://huggingface.co/bartowski/MODEL-NAME-GGUF/resolve/main/MODEL-NAME.imatrix"
        )

        response = (
            input("\n❓ Do you have an imatrix file to place in the model directory? (y/N): ")
            .strip()
            .lower()
        )

        if response != "y":
            return None

        logger.info(f"Please place your imatrix.dat file in: {model_dir}")
        input("⏳ Press Enter when you've placed the imatrix.dat file (or Ctrl+C to cancel)...")

        if imatrix_path.exists():
            file_size = self.fs.get_file_size(imatrix_path)
            logger.info(f"Found imatrix file! ({file_size})")
            return imatrix_path

        logger.warning("No imatrix.dat file found - continuing with automatic generation")
        return None

    def _get_calibration_file(self) -> Path | None:
        """Get calibration data file for imatrix generation.

        Returns:
            Path to calibration file, or None if not found.
        """
        calibration_file = Path(__file__).parent.parent.parent / "resources" / "imatrix_data.txt"
        if not calibration_file.exists():
            logger.warning("resources/imatrix_data.txt not found - skipping imatrix generation")
            logger.info(
                "Download from: https://gist.githubusercontent.com/bartowski1182/"
                "eb213dccb3571f863da82e99418f81e8/raw/calibration_datav3.txt"
            )
            return None
        return calibration_file

    def _generate_new_imatrix(
        self,
        f16_model_path: Path,
        llama_env: LlamaCppEnvironment,
        imatrix_path: Path,
        calibration_file: Path,
    ) -> Path | None:
        """Generate new importance matrix using calibration data.

        Returns:
            Path to generated imatrix, or None if generation fails.
        """
        logger.info("Generating importance matrix (this may take 1-4 hours for large models)...")
        logger.info(f"Model: {f16_model_path.name}")
        logger.info(f"Calibration: {calibration_file}")
        logger.info(f"Output: {imatrix_path}")

        # Find imatrix binary
        imatrix_binary = self._find_imatrix_binary(llama_env)
        if not imatrix_binary:
            logger.warning("llama-imatrix binary not found - skipping imatrix generation")
            logger.info("Make sure llama-imatrix is in the same directory as llama-quantize")
            return None

        # Build and execute command
        cmd = self._build_imatrix_command(
            imatrix_binary, f16_model_path, calibration_file, imatrix_path
        )
        return self._execute_imatrix_generation(cmd, imatrix_path)

    def _build_imatrix_command(
        self, binary: Path, model_path: Path, calibration_file: Path, output_path: Path
    ) -> list[str]:
        """Build imatrix generation command.

        Returns:
            Command arguments as list.
        """
        return [
            str(binary),
            "-m",
            str(model_path),
            "-f",
            str(calibration_file),
            "-o",
            str(output_path),
            "--process-output",
            "--output-frequency",
            "10",
            "--save-frequency",
            "50",
            "-t",
            "8",
            "-c",
            "2048",
            "-b",
            "512",
        ]

    def _execute_imatrix_generation(self, cmd: list[str], imatrix_path: Path) -> Path | None:
        """Execute imatrix generation command with real-time output.

        Returns:
            Path to generated imatrix file, or None if generation fails.
        """
        logger.info(f"Running: {' '.join(cmd)}")
        logger.info("Starting imatrix generation... (progress will be shown)")

        try:
            process = subprocess.Popen(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                bufsize=1,
            )

            self._stream_imatrix_output(process)

            return_code = process.poll()
            if return_code == 0:
                return self._validate_imatrix_output(imatrix_path)

        except KeyboardInterrupt:
            logger.info("imatrix generation cancelled by user")
            process.terminate()
            return None
        except Exception as e:
            logger.error(f"imatrix generation failed with exception: {e}")
            return None
        else:
            logger.error(f"imatrix generation failed with return code {return_code}")
            return None

    def _stream_imatrix_output(self, process: subprocess.Popen) -> None:
        """Stream imatrix generation output in real-time."""
        while True:
            if process.stdout is not None:
                output = process.stdout.readline()
            else:
                break
            if not output and process.poll() is not None:
                break
            if output:
                line = output.strip()
                if self._should_log_imatrix_line(line):
                    logger.info(line)

    def _should_log_imatrix_line(self, line: str) -> bool:
        """Determine if imatrix output line should be logged.

        Returns:
            True if line should be logged, False otherwise.
        """
        keywords = ["Computing imatrix", "perplexity:", "save_imatrix", "entries =", "ETA"]
        return any(keyword in line for keyword in keywords) or line.startswith("[")

    def _validate_imatrix_output(self, imatrix_path: Path) -> Path | None:
        """Validate generated imatrix file.

        Returns:
            Path to imatrix if valid, None otherwise.
        """
        if imatrix_path.exists():
            file_size = self.fs.get_file_size(imatrix_path)
            logger.info(f"imatrix generation successful! ({file_size})")
            return imatrix_path
        logger.error("imatrix generation completed but file not found")
        return None

    def _find_imatrix_binary(self, llama_env: LlamaCppEnvironment) -> Path | None:
        """Find llama-imatrix binary in common locations.

        Searches for the imatrix binary in the current directory and
        standard installation paths.

        Returns:
            Path to imatrix binary, or None if not found.
        """
        candidates = [
            Path("./llama-imatrix"),
            llama_env.quantise_binary.parent / "llama-imatrix",
            Path("/usr/local/bin/llama-imatrix"),
            Path("/usr/bin/llama-imatrix"),
        ]

        for candidate in candidates:
            if candidate.exists() and candidate.is_file():
                return candidate

        return None
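# --- For reference (not part of this commit) ---
# The argument list built by _build_imatrix_command above expands to a
# command line of roughly this shape (paths illustrative):
#
#   ./llama-imatrix -m model-f16.gguf -f imatrix_data.txt -o imatrix.dat \
#       --process-output --output-frequency 10 --save-frequency 50 \
#       -t 8 -c 2048 -b 512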
397 helpers/services/orchestrator.py Normal file
@@ -0,0 +1,397 @@
"""Quantisation orchestration service.
|
||||
|
||||
High-level orchestration of the complete quantisation workflow from model
|
||||
acquisition through processing to upload. Manages parallel processing,
|
||||
status tracking, and cleanup operations for efficient resource utilisation.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from concurrent.futures import Future, ThreadPoolExecutor
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from helpers.config.quantisation_configs import QUANTISATION_CONFIGS, SUPPORTED_QUANTISATION_TYPES
|
||||
from helpers.logger import logger
|
||||
from helpers.models.quantisation import (
|
||||
ModelSource,
|
||||
QuantisationContext,
|
||||
QuantisationResult,
|
||||
QuantisationType,
|
||||
)
|
||||
from helpers.services.huggingface import ReadmeGenerator
|
||||
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator
|
||||
from helpers.services.quantisation import HuggingFaceUploader, ModelManager, QuantisationEngine
|
||||
from helpers.utils.tensor_mapping import URLParser
|
||||
|
||||
|
||||
@dataclass(slots=True)
|
||||
class QuantisationOrchestrator:
|
||||
"""Orchestrates the complete quantisation workflow.
|
||||
|
||||
Uses dataclass with slots for efficient memory usage and dependency injection
|
||||
for modular service interaction following SOLID principles.
|
||||
"""
|
||||
|
||||
work_dir: Path = field(default_factory=lambda: Path.cwd() / "quantisation_work")
|
||||
use_imatrix: bool = True
|
||||
imatrix_base: str = "Q4_K_M"
|
||||
no_upload: bool = False
|
||||
|
||||
# Service dependencies with factory defaults
|
||||
url_parser: URLParser = field(default_factory=URLParser)
|
||||
quantisation_engine: QuantisationEngine = field(default_factory=QuantisationEngine)
|
||||
imatrix_generator: IMatrixGenerator = field(default_factory=IMatrixGenerator)
|
||||
readme_generator: ReadmeGenerator = field(default_factory=ReadmeGenerator)
|
||||
uploader: HuggingFaceUploader = field(default_factory=HuggingFaceUploader)
|
||||
|
||||
# Computed properties
|
||||
models_dir: Path = field(init=False)
|
||||
environment_manager: EnvironmentManager = field(init=False)
|
||||
model_manager: ModelManager = field(init=False)
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
"""Initialise computed properties after dataclass construction."""
|
||||
self.models_dir = self.work_dir / "models"
|
||||
self.environment_manager = EnvironmentManager(self.work_dir)
|
||||
self.model_manager = ModelManager(self.models_dir, self.environment_manager)
|
||||
|
||||
def quantise(self, url: str) -> dict[QuantisationType, QuantisationResult]:
|
||||
"""Main quantisation workflow orchestrating model processing from URL to upload.
|
||||
|
||||
Returns:
|
||||
dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
|
||||
"""
|
||||
logger.info("Starting Bartowski quantisation process...")
|
||||
|
||||
# Setup and preparation
|
||||
model_source, llama_env, f16_model_path, imatrix_path, output_repo = (
|
||||
self._setup_environment(url)
|
||||
)
|
||||
|
||||
# Create initial repository
|
||||
self._create_initial_repository(model_source, output_repo)
|
||||
|
||||
# Execute all quantisations
|
||||
results = self._execute_quantisations(
|
||||
model_source, llama_env, f16_model_path, imatrix_path, output_repo
|
||||
)
|
||||
|
||||
# Cleanup
|
||||
self._cleanup_files(f16_model_path, model_source)
|
||||
|
||||
self._print_completion_summary(model_source, results, output_repo)
|
||||
return results
|
||||
|
||||
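    # --- Illustrative usage sketch (not part of this commit) ---
    # End-to-end driver for the orchestrator; the URL is a placeholder.
    #
    #   orchestrator = QuantisationOrchestrator(no_upload=True)
    #   results = orchestrator.quantise("https://huggingface.co/some-org/some-model")
    #   for quant_type, result in results.items():
    #       print(quant_type.value, result.status)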
    def _setup_environment(self, url: str) -> tuple[ModelSource, Any, Path, Path | None, str]:
        """Set up environment and prepare model for quantisation.

        Returns:
            Tuple of (model_source, llama_env, f16_model_path, imatrix_path, output_repo).
        """
        model_source = self.url_parser.parse(url)
        self._print_model_info(model_source)

        self.models_dir.mkdir(parents=True, exist_ok=True)
        llama_env = self.environment_manager.setup()

        f16_model_path = self.model_manager.prepare_model(model_source, llama_env)

        imatrix_path = None
        if self.use_imatrix:
            logger.info("Generating importance matrix (imatrix)...")
            imatrix_path = self.imatrix_generator.generate_imatrix(
                f16_model_path, llama_env, self.models_dir / model_source.model_name
            )

        output_repo = (
            f"{self.uploader.get_username()}/"
            f"{model_source.original_author}-{model_source.model_name}-GGUF"
        )

        return model_source, llama_env, f16_model_path, imatrix_path, output_repo

    def _create_initial_repository(self, model_source: ModelSource, output_repo: str) -> None:
        """Create initial repository with planned quantisations."""
        logger.info("Creating initial README with planned quantisations...")
        planned_results = {
            qt: QuantisationResult(quantisation_type=qt, success=False, status="planned")
            for qt in SUPPORTED_QUANTISATION_TYPES
        }
        readme_path = self.readme_generator.generate(
            model_source, planned_results, self.models_dir, output_repo
        )

        if not self.no_upload:
            logger.info("Creating repository with planned quantisations...")
            self.uploader.upload_readme(output_repo, readme_path)
        else:
            logger.info("Skipping repository creation (--no-upload specified)")

    def _execute_quantisations(
        self,
        model_source: ModelSource,
        llama_env: Any,
        f16_model_path: Path,
        imatrix_path: Path | None,
        output_repo: str,
    ) -> dict[QuantisationType, QuantisationResult]:
        """Execute all quantisation types with parallel uploads.

        Returns:
            dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
        """
        results: dict[QuantisationType, QuantisationResult] = {}
        upload_futures: list[Future[None]] = []

        with ThreadPoolExecutor(max_workers=1, thread_name_prefix="uploader") as upload_executor:
            for quant_type in SUPPORTED_QUANTISATION_TYPES:
                result = self._process_single_quantisation(
                    quant_type,
                    model_source,
                    llama_env,
                    f16_model_path,
                    imatrix_path,
                    output_repo,
                    results,
                    upload_executor,
                    upload_futures,
                )
                results[quant_type] = result

            self._wait_for_uploads(upload_futures)

        return results
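    # --- Design note (not part of this commit) ---
    # The ThreadPoolExecutor above is deliberately capped at max_workers=1:
    # quantisations run sequentially in the main thread while each finished
    # GGUF uploads in the background, so at most one multi-gigabyte upload is
    # in flight and disk space is reclaimed as soon as each upload completes.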
    def _process_single_quantisation(
        self,
        quant_type: QuantisationType,
        model_source: ModelSource,
        llama_env: Any,
        f16_model_path: Path,
        imatrix_path: Path | None,
        output_repo: str,
        results: dict[QuantisationType, QuantisationResult],
        upload_executor: ThreadPoolExecutor,
        upload_futures: list,
    ) -> QuantisationResult:
        """Process a single quantisation type.

        Returns:
            QuantisationResult: Result of the quantisation attempt.
        """
        try:
            logger.info(f"Starting {quant_type.value} quantisation...")
            config = QUANTISATION_CONFIGS[quant_type]

            # Update status to processing
            result = QuantisationResult(quantisation_type=quant_type, success=False)
            result.status = "processing"
            results[quant_type] = result

            self._update_readme_status(model_source, results, output_repo)

            # Perform quantisation
            context = QuantisationContext(
                f16_model_path=f16_model_path,
                model_source=model_source,
                config=config,
                llama_env=llama_env,
                models_dir=self.models_dir,
                imatrix_path=imatrix_path,
                base_quant=self.imatrix_base,
            )
            result = self.quantisation_engine.quantise(context)

            self._handle_quantisation_result(
                result,
                quant_type,
                model_source,
                results,
                output_repo,
                upload_executor,
                upload_futures,
            )
        except Exception as e:
            return self._handle_quantisation_error(
                e, quant_type, model_source, results, output_repo
            )
        else:
            return result

    def _handle_quantisation_result(
        self,
        result: QuantisationResult,
        quant_type: QuantisationType,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        output_repo: str,
        upload_executor: ThreadPoolExecutor,
        upload_futures: list,
    ) -> None:
        """Handle successful or failed quantisation result."""
        if result.success and result.file_path:
            quant_str = getattr(result.quantisation_type, "value", result.quantisation_type)
            logger.info(f"Starting parallel upload of {quant_str}...")
            upload_future = upload_executor.submit(
                self._upload_and_cleanup,
                output_repo,
                result.file_path,
                quant_type,
                model_source,
                results,
            )
            upload_futures.append(upload_future)
            result.file_path = None  # Mark as being uploaded
            result.status = "uploading"
        else:
            result.status = "failed"

        self._update_readme_status(model_source, results, output_repo)

    def _handle_quantisation_error(
        self,
        error: Exception,
        quant_type: QuantisationType,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        output_repo: str,
    ) -> QuantisationResult:
        """Handle quantisation processing error.

        Returns:
            QuantisationResult: Failed quantisation result with error information.
        """
        logger.error(f"Error processing {quant_type.value}: {error}")
        result = QuantisationResult(quantisation_type=quant_type, success=False)
        result.status = "failed"
        result.error_message = str(error)

        try:
            self._update_readme_status(model_source, results, output_repo)
        except Exception as readme_error:
            logger.error(f"Failed to update README after error: {readme_error}")

        return result

    def _update_readme_status(
        self,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        output_repo: str,
    ) -> None:
        """Update README with current quantisation status."""
        if not self.no_upload:
            updated_readme_path = self.readme_generator.generate(
                model_source, results, self.models_dir, output_repo
            )
            self.uploader.upload_readme(output_repo, updated_readme_path)

    def _wait_for_uploads(self, upload_futures: list) -> None:
        """Wait for all parallel uploads to complete."""
        logger.info("Waiting for any remaining uploads to complete...")
        for future in upload_futures:
            try:
                future.result(timeout=300)  # 5 minute timeout per upload
            except Exception as e:
                logger.warning(f"Upload error: {e}")

    def _cleanup_files(self, f16_model_path: Path, model_source: ModelSource) -> None:
        """Clean up temporary files after processing."""
        if f16_model_path.exists():
            logger.info(f"Removing F16 model {f16_model_path.name} to save disk space...")
            f16_model_path.unlink()

        if not model_source.is_gguf_repo:
            self._cleanup_original_model(model_source)

    def _cleanup_original_model(self, model_source: ModelSource) -> None:
        """Clean up original safetensors/PyTorch files after successful conversion."""
        model_dir = self.models_dir / model_source.model_name

        pytorch_files = list(model_dir.glob("pytorch_model*.bin"))
        if pytorch_files:
            logger.info(f"Removing {len(pytorch_files)} PyTorch model files to save disk space...")
            for file in pytorch_files:
                file.unlink()

        logger.info("Keeping config files, tokeniser, and metadata for reference")

    def _upload_and_cleanup(
        self,
        output_repo: str,
        file_path: Path,
        quant_type: QuantisationType,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
    ) -> None:
        """Upload file and clean up (runs in background thread)."""
        try:
            logger.info(f"[PARALLEL] Uploading {quant_type}...")
            self.uploader.upload_model_file(output_repo, file_path)

            logger.info(f"[PARALLEL] Removing {file_path.name} to save disk space...")
            file_path.unlink()

            results[quant_type].status = "completed"
            updated_readme_path = self.readme_generator.generate(
                model_source, results, self.models_dir, output_repo
            )
            self.uploader.upload_readme(output_repo, updated_readme_path)

            logger.info(f"[PARALLEL] {quant_type} upload and cleanup complete")
        except Exception as e:
            logger.error(f"[PARALLEL] Failed to upload {quant_type}: {e}")
            results[quant_type].status = "failed"
            results[quant_type].error_message = str(e)

            updated_readme_path = self.readme_generator.generate(
                model_source, results, self.models_dir, output_repo
            )
            self.uploader.upload_readme(output_repo, updated_readme_path)
            raise

    def _print_model_info(self, model_source: ModelSource) -> None:
        """Print model information."""
        logger.info(f"Source URL: {model_source.url}")
        logger.info(f"Source model: {model_source.source_model}")
        logger.info(f"Original author: {model_source.original_author}")
        logger.info(f"Model name: {model_source.model_name}")
        logger.info(f"Your HF username: {self.uploader.get_username()}")
        logger.info(f"Working directory: {self.work_dir}")

    def _print_completion_summary(
        self,
        model_source: ModelSource,
        results: dict[QuantisationType, QuantisationResult],
        output_repo: str,
    ) -> None:
        """Print completion summary."""
        successful_results = [r for r in results.values() if r.success]

        if successful_results:
            logger.info("Complete! Your quantised models are available at:")
            logger.info(f"  https://huggingface.co/{output_repo}")
            logger.info("Model info:")
            logger.info(f"  - Source URL: {model_source.url}")
            logger.info(f"  - Original: {model_source.source_model}")
            logger.info(
                "  - Method: "
                f"{'Direct GGUF download' if model_source.is_gguf_repo else 'HF model conversion'}"
            )
            logger.info(f"  - Quantised: {output_repo}")

            for result in successful_results:
                if result.file_size:
                    filename = (
                        f"{model_source.original_author}-{model_source.model_name}-"
                        f"{result.quantisation_type}.gguf"
                    )
                    logger.info(f"  - {result.quantisation_type}: {filename} ({result.file_size})")
        else:
            logger.error(
                "All quantisations failed - repository created with documentation "
                "but no model files"
            )
            logger.error(f"  Repository: https://huggingface.co/{output_repo}")
486 helpers/services/quantisation.py Normal file
@@ -0,0 +1,486 @@
"""Quantisation operations service.
|
||||
|
||||
Provides modular quantisation engine, model management, and upload capabilities
|
||||
for GGUF model processing. Consolidates quantisation logic from various tools
|
||||
into reusable components following SOLID principles.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import shutil
|
||||
import subprocess
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from helpers.logger import logger
|
||||
from helpers.models.quantisation import (
|
||||
ModelSource,
|
||||
QuantisationContext,
|
||||
QuantisationResult,
|
||||
QuantisationType,
|
||||
)
|
||||
from helpers.services.filesystem import FilesystemService
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from helpers.models.quantisation import LlamaCppEnvironment
|
||||
from helpers.services.llama_cpp import EnvironmentManager
|
||||
|
||||
|
||||
class QuantisationEngine:
|
||||
"""Handles the actual quantisation process with configurable methods.
|
||||
|
||||
Provides flexible quantisation execution supporting multiple tensor
|
||||
precision configurations, importance matrices, and fallback strategies.
|
||||
Encapsulates llama-quantize binary interactions with real-time output.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
"""Initialise quantisation engine."""
|
||||
self.fs = FilesystemService()
|
||||
|
||||
def quantise(self, context: QuantisationContext) -> QuantisationResult:
|
||||
"""Perform quantisation using the specified configuration.
|
||||
|
||||
Executes quantisation with primary and fallback methods, handling
|
||||
tensor-specific precision overrides and importance matrix guidance.
|
||||
|
||||
Returns:
|
||||
QuantisationResult with success status and file information.
|
||||
"""
|
||||
logger.info(
|
||||
f"⚙️ Creating {context.config.name} quantisation ({context.config.description})..."
|
||||
)
|
||||
|
||||
output_path = context.get_output_path()
|
||||
|
||||
logger.info(f"🎯 Attempting {context.config.name} quantisation...")
|
||||
logger.info(f"📝 Source: {context.f16_model_path}")
|
||||
logger.info(f"📝 Target: {output_path}")
|
||||
|
||||
# Try primary method
|
||||
if self._try_quantisation_method(
|
||||
context, output_path, context.config.tensor_types, "method 1"
|
||||
):
|
||||
return self._create_success_result(context.config.name, output_path, "method 1")
|
||||
|
||||
# Try fallback methods
|
||||
for i, fallback_method in enumerate(context.config.fallback_methods, 2):
|
||||
method_name = f"method {i}"
|
||||
if self._try_quantisation_method(context, output_path, fallback_method, method_name):
|
||||
return self._create_success_result(context.config.name, output_path, method_name)
|
||||
|
||||
logger.error("All %s quantisation methods failed", context.config.name)
|
||||
return QuantisationResult(
|
||||
quantisation_type=QuantisationType(context.config.name),
|
||||
success=False,
|
||||
error_message="All quantisation methods failed",
|
||||
)
|
||||
|
||||
def _try_quantisation_method(
|
||||
self,
|
||||
context: QuantisationContext,
|
||||
output_path: Path,
|
||||
tensor_config: dict[str, str],
|
||||
method_name: str,
|
||||
) -> bool:
|
||||
"""Try a specific quantisation method with real-time output.
|
||||
|
||||
Builds and executes llama-quantize command with appropriate parameters,
|
||||
streaming output for progress monitoring.
|
||||
|
||||
Returns:
|
||||
True if quantisation successful, False otherwise.
|
||||
"""
|
||||
logger.info(f"🔍 Trying {method_name}...")
|
||||
|
||||
cmd = self._build_quantisation_command(context, output_path, tensor_config)
|
||||
return self._execute_quantisation_command(cmd, method_name)
|
||||
|
||||
def _build_quantisation_command(
|
||||
self, context: QuantisationContext, output_path: Path, tensor_config: dict[str, str]
|
||||
) -> list[str]:
|
||||
"""Build quantisation command with all required parameters.
|
||||
|
||||
Returns:
|
||||
List of command arguments.
|
||||
"""
|
||||
cmd = [str(context.llama_env.quantise_binary)]
|
||||
|
||||
# Add imatrix if available
|
||||
if context.imatrix_path and context.imatrix_path.exists():
|
||||
cmd.extend(["--imatrix", str(context.imatrix_path)])
|
||||
logger.info(f"🧮 Using imatrix: {context.imatrix_path.name}")
|
||||
|
||||
# Add tensor type arguments
|
||||
self._add_tensor_type_arguments(cmd, tensor_config)
|
||||
|
||||
cmd.extend([str(context.f16_model_path), str(output_path), context.base_quant])
|
||||
return cmd
|
||||
|
||||
def _add_tensor_type_arguments(self, cmd: list[str], tensor_config: dict[str, str]) -> None:
|
||||
"""Add tensor type arguments to command."""
|
||||
if not tensor_config:
|
||||
return
|
||||
|
||||
for tensor_name, quant_type in tensor_config.items():
|
||||
if tensor_name.startswith(("token-embedding-type", "output-tensor-type")):
|
||||
cmd.extend([f"--{tensor_name}", quant_type])
|
||||
else:
|
||||
cmd.extend(["--tensor-type", f"{tensor_name}={quant_type}"])
|
||||
|
||||
def _execute_quantisation_command(self, cmd: list[str], method_name: str) -> bool:
|
||||
"""Execute quantisation command with real-time output.
|
||||
|
||||
Returns:
|
||||
True if quantisation successful, False otherwise.
|
||||
"""
|
||||
logger.info(f"💻 Running: {' '.join(cmd)}")
|
||||
logger.info("⏳ Quantisation in progress... (this may take several minutes)")
|
||||
|
||||
try:
|
||||
process = subprocess.Popen(
|
||||
cmd,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.STDOUT,
|
||||
universal_newlines=True,
|
||||
bufsize=1,
|
||||
)
|
||||
|
||||
self._stream_quantisation_output(process)
|
||||
|
||||
return_code = process.poll()
|
||||
if return_code == 0:
|
||||
logger.info(f"✅ {method_name} quantisation successful!")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.info(f"❌ {method_name} failed with exception: {e}")
|
||||
return False
|
||||
else:
|
||||
logger.info(f"❌ {method_name} failed with return code {return_code}")
|
||||
return False
|
||||
|
||||
def _stream_quantisation_output(self, process: subprocess.Popen) -> None:
|
||||
"""Stream quantisation output in real-time."""
|
||||
while True:
|
||||
if process.stdout is not None:
|
||||
output = process.stdout.readline()
|
||||
else:
|
||||
break
|
||||
if not output and process.poll() is not None:
|
||||
break
|
||||
if output:
|
||||
logger.info(f"📊 {output.strip()}")
|
||||
|
||||
def _create_success_result(
|
||||
self, quant_type: str, output_path: Path, method_used: str
|
||||
) -> QuantisationResult:
|
||||
"""Create successful quantisation result with file metadata.
|
||||
|
||||
Returns:
|
||||
QuantisationResult with file path and size information.
|
||||
"""
|
||||
file_size = self.fs.get_file_size(output_path)
|
||||
return QuantisationResult(
|
||||
quantisation_type=QuantisationType(quant_type),
|
||||
success=True,
|
||||
file_path=output_path,
|
||||
file_size=file_size,
|
||||
method_used=method_used,
|
||||
)
|
||||
|
||||
|
||||
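# --- For reference (not part of this commit) ---
# _add_tensor_type_arguments above turns a tensor_types mapping into
# llama-quantize flags. A hypothetical config such as
#
#   {"token-embedding-type": "q8_0", "output-tensor-type": "q8_0", "ffn_down": "q6_k"}
#
# would yield:
#
#   --token-embedding-type q8_0 --output-tensor-type q8_0 --tensor-type ffn_down=q6_k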
class ModelManager:
    """Handles model downloading and preparation for quantisation.

    Manages both GGUF repository downloads and HuggingFace model conversions,
    providing a unified interface for model acquisition and preparation.
    """

    def __init__(self, models_dir: Path, environment_manager: EnvironmentManager) -> None:
        """Initialise model manager with storage and environment configuration.

        Sets up model storage directory and links to environment manager for
        conversion script access and llama.cpp tool discovery.
        """
        self.models_dir = models_dir
        self.environment_manager = environment_manager
        self.fs = FilesystemService()

    def prepare_model(self, model_source: ModelSource, llama_env: LlamaCppEnvironment) -> Path:
        """Prepare model for quantisation and return F16 model path.

        Handles both GGUF repository downloads and regular HuggingFace model
        conversion workflows with automatic format detection.

        Returns:
            Path to F16 GGUF model ready for quantisation.
        """
        model_dir = self.models_dir / model_source.model_name

        if model_source.is_gguf_repo:
            return self._handle_gguf_repo(model_source, model_dir)
        return self._handle_regular_repo(model_source, model_dir, llama_env)

    def _handle_gguf_repo(self, model_source: ModelSource, model_dir: Path) -> Path:
        """Handle GGUF repository download with pattern matching.

        Downloads GGUF files matching specified patterns, prioritising
        multi-part files and F16 variants.

        Returns:
            Path to downloaded or existing GGUF file.
        """
        logger.info(f"⬇️ Downloading GGUF file from repository: {model_source.source_model}")
        logger.info(f"🔍 Looking for file pattern: *{model_source.gguf_file_pattern}*")

        f16_model = model_dir / f"{model_source.model_name}-f16.gguf"

        if f16_model.exists():
            logger.info(f"✅ Found existing F16 file: {f16_model.name}")
            return f16_model

        # Check for existing GGUF files
        model_dir.mkdir(parents=True, exist_ok=True)
        existing_gguf = self.fs.find_gguf_files(model_dir)

        if existing_gguf:
            logger.info(f"✅ Found existing GGUF file: {existing_gguf[0].name}")
            return existing_gguf[0]

        # Download with patterns
        downloaded_file = self._download_gguf_with_patterns(
            model_source.source_model, model_source.gguf_file_pattern, model_dir
        )

        if downloaded_file:
            # Handle multi-part files
            if "00001-of-" in downloaded_file.name:
                return downloaded_file
            if "-00002-of-" in downloaded_file.name or "-00003-of-" in downloaded_file.name:
                base_name = downloaded_file.name.replace("-00002-of-", "-00001-of-").replace(
                    "-00003-of-", "-00001-of-"
                )
                first_part = downloaded_file.parent / base_name
                if first_part.exists():
                    logger.info(f"🔄 Using first part: {first_part.name}")
                    return first_part

            # Rename single file to standard name
            downloaded_file.rename(f16_model)
            return f16_model

        # Fallback to regular conversion
        logger.info("💡 Falling back to downloading full repository and converting...")
        return self._handle_regular_repo(
            ModelSource(**{**model_source.dict(), "is_gguf_repo": False}),
            model_dir,
            None,
        )

    def _download_gguf_with_patterns(
        self, source_model: str, pattern: str | None, model_dir: Path
    ) -> Path | None:
        """Download GGUF file using various pattern strategies.

        Tries multiple pattern variations to find and download appropriate
        GGUF files, handling timeouts and temporary directories.

        Returns:
            Path to downloaded file, or None if all patterns fail.
        """
        if pattern:
            patterns = [
                f"*{pattern}*",
                f"*{pattern.lower()}*",
                f"*{pattern.upper()}*",
                "*f16*",
                "*F16*",
                "*fp16*",
            ]
        else:
            patterns = ["*f16*", "*F16*", "*fp16*"]

        temp_dir = model_dir / "gguf_temp"

        for search_pattern in patterns:
            logger.info(f"🔍 Trying pattern: {search_pattern}")
            temp_dir.mkdir(exist_ok=True)

            try:
                subprocess.run(
                    [
                        "timeout",
                        "300",
                        "huggingface-cli",
                        "download",
                        source_model,
                        "--include",
                        search_pattern,
                        "--local-dir",
                        str(temp_dir),
                    ],
                    check=True,
                    capture_output=True,
                )

                # Find downloaded GGUF files
                gguf_files = self.fs.find_gguf_files(temp_dir, pattern)
                if gguf_files:
                    found_file = gguf_files[0]
                    logger.info(f"✅ Found GGUF file: {found_file.name}")

                    # Move to parent directory
                    final_path = model_dir / found_file.name
                    shutil.move(str(found_file), str(final_path))
                    shutil.rmtree(temp_dir)
                    return final_path

            except subprocess.CalledProcessError:
                logger.info(f"⚠️ Pattern {search_pattern} failed or timed out")
                continue
            finally:
                if temp_dir.exists():
                    shutil.rmtree(temp_dir, ignore_errors=True)

        return None

    def _handle_regular_repo(
        self,
        model_source: ModelSource,
        model_dir: Path,
        llama_env: LlamaCppEnvironment | None,
    ) -> Path:
        """Handle regular HuggingFace repository conversion.

        Downloads the full model repository and converts to F16 GGUF format
        using llama.cpp conversion scripts.

        Returns:
            Path to converted F16 GGUF model.
        """
        logger.info(f"⬇️ Downloading source model: {model_source.source_model}")

        if not model_dir.exists():
            subprocess.run(
                [
                    "huggingface-cli",
                    "download",
                    model_source.source_model,
                    "--local-dir",
                    str(model_dir),
                ],
                check=True,
            )
        else:
            logger.info("✅ Model already downloaded")

        logger.info("🔄 Converting to GGUF F16 format...")
        f16_model = model_dir / f"{model_source.model_name}-f16.gguf"

        if not f16_model.exists():
            if not llama_env:
                llama_env = self.environment_manager.setup()

            # Ensure conversion script is available
            if llama_env.use_repo or not self.environment_manager.llama_cpp_dir.exists():
                logger.info("Getting conversion script from llama.cpp repository...")
                llama_env = self.environment_manager.setup_repository()

            subprocess.run(
                [
                    *llama_env.convert_script.split(),
                    str(model_dir),
                    "--outtype",
                    "f16",
                    "--outfile",
                    str(f16_model),
                ],
                check=True,
            )
        else:
            logger.info("✅ F16 model already exists")

        return f16_model
class HuggingFaceUploader:
|
||||
"""Handles uploading models and documentation to HuggingFace.
|
||||
|
||||
Provides methods for repository creation, file uploads, and README
|
||||
updates with proper error handling and retry logic.
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def get_username() -> str:
|
||||
"""Get authenticated HuggingFace username.
|
||||
|
||||
Returns:
|
||||
HuggingFace username from CLI authentication.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If not authenticated.
|
||||
"""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["huggingface-cli", "whoami"],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True,
|
||||
)
|
||||
return result.stdout.strip()
|
||||
except (subprocess.CalledProcessError, FileNotFoundError) as err:
|
||||
msg = "Please log in to HuggingFace first: huggingface-cli login"
|
||||
raise RuntimeError(msg) from err
|
||||
|
||||
def upload_readme(self, output_repo: str, readme_path: Path) -> None:
|
||||
"""Upload or update README file to repository.
|
||||
|
||||
Creates repository if needed, handles existing repository updates.
|
||||
"""
|
||||
logger.info("Uploading README...")
|
||||
try:
|
||||
subprocess.run(
|
||||
[
|
||||
"huggingface-cli",
|
||||
"upload",
|
||||
output_repo,
|
||||
str(readme_path),
|
||||
"README.md",
|
||||
"--create",
|
||||
],
|
||||
check=True,
|
||||
capture_output=True,
|
||||
)
|
||||
logger.info("README uploaded")
|
||||
except subprocess.CalledProcessError:
|
||||
# Repository exists, update without --create
|
||||
subprocess.run(
|
||||
[
|
||||
"huggingface-cli",
|
||||
"upload",
|
||||
output_repo,
|
||||
str(readme_path),
|
||||
"README.md",
|
||||
],
|
||||
check=True,
|
||||
)
|
||||
logger.info("README updated")
|
||||
|
||||
def upload_model_file(self, output_repo: str, model_path: Path) -> None:
|
||||
"""Upload model file to repository.
|
||||
|
||||
Uploads GGUF model file to specified repository path.
|
||||
"""
|
||||
logger.info(f"Uploading {model_path.name}...")
|
||||
subprocess.run(
|
||||
[
|
||||
"huggingface-cli",
|
||||
"upload",
|
||||
output_repo,
|
||||
str(model_path),
|
||||
model_path.name,
|
||||
],
|
||||
check=True,
|
||||
)
|
||||
logger.info(f"{model_path.name} uploaded")
|
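A minimal sketch of how this uploader is driven, assuming an authenticated huggingface-cli session; the repository name and file paths below are illustrative:

from pathlib import Path

uploader = HuggingFaceUploader()
username = HuggingFaceUploader.get_username()  # raises RuntimeError if not logged in
output_repo = f"{username}/model-name-GGUF"    # illustrative repository id
uploader.upload_readme(output_repo, Path("README.md"))
uploader.upload_model_file(output_repo, Path("model-name-Q4_K_M.gguf"))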
16
helpers/utils/__init__.py
Normal file
@@ -0,0 +1,16 @@
"""Utility functions for llm-gguf-tools.
|
||||
|
||||
Provides low-level utilities for tensor mapping, configuration parsing,
|
||||
and other common operations. Uses UK English spelling conventions throughout.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from helpers.utils.config_parser import ConfigParser
|
||||
from helpers.utils.tensor_mapping import TensorMapper, URLParser
|
||||
|
||||
__all__ = [
|
||||
"ConfigParser",
|
||||
"TensorMapper",
|
||||
"URLParser",
|
||||
]
|
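Since the package re-exports these names, callers can import them from the package root rather than the individual modules:

from helpers.utils import ConfigParser, TensorMapper, URLParser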
171
helpers/utils/config_parser.py
Normal file
@@ -0,0 +1,171 @@
"""Configuration parsing utilities.
|
||||
|
||||
Provides utilities for parsing model configurations, inferring parameters,
|
||||
and handling architecture-specific settings. Uses UK English spelling
|
||||
conventions throughout.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from helpers.models.conversion import GGUFParameters, ModelConfig, VisionConfig
|
||||
from helpers.services.filesystem import FilesystemService
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class ConfigParser:
|
||||
"""Parses and transforms model configuration files.
|
||||
|
||||
Handles loading of HuggingFace config.json files, parameter inference,
|
||||
and conversion to GGUF-compatible formats. Provides sensible defaults
|
||||
for missing values and architecture-specific handling.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
"""Initialise ConfigParser."""
|
||||
self.fs = FilesystemService()
|
||||
|
||||
def load_model_config(self, model_path: Path) -> ModelConfig:
|
||||
"""Load model configuration from config.json file.
|
||||
|
||||
Reads the standard HuggingFace config.json file and parses it into
|
||||
a structured ModelConfig instance with proper type validation. Handles
|
||||
vision model configurations and provides sensible defaults for missing values.
|
||||
|
||||
Returns:
|
||||
Parsed ModelConfig instance.
|
||||
"""
|
||||
config_file = model_path / "config.json"
|
||||
raw_config = self.fs.load_json_config(config_file)
|
||||
|
||||
# Parse vision config if present
|
||||
vision_config = None
|
||||
if "vision_config" in raw_config:
|
||||
vision_config = VisionConfig(**raw_config["vision_config"])
|
||||
|
||||
# Create ModelConfig with parsed values
|
||||
return ModelConfig(
|
||||
architectures=raw_config.get("architectures", ["Unknown"]),
|
||||
model_type=raw_config.get("model_type", "unknown"),
|
||||
vocab_size=raw_config.get("vocab_size", 32000),
|
||||
max_position_embeddings=raw_config.get("max_position_embeddings", 2048),
|
||||
hidden_size=raw_config.get("hidden_size", 4096),
|
||||
num_hidden_layers=raw_config.get("num_hidden_layers", 32),
|
||||
intermediate_size=raw_config.get("intermediate_size", 11008),
|
||||
num_attention_heads=raw_config.get("num_attention_heads", 32),
|
||||
num_key_value_heads=raw_config.get("num_key_value_heads"),
|
||||
rope_theta=raw_config.get("rope_theta", 10000.0),
|
||||
rope_scaling=raw_config.get("rope_scaling"),
|
||||
rms_norm_eps=raw_config.get("rms_norm_eps", 1e-5),
|
||||
vision_config=vision_config,
|
||||
)
|
||||
|
||||
def infer_gguf_parameters(self, config: ModelConfig) -> GGUFParameters:
|
||||
"""Infer GGUF parameters from model configuration.
|
||||
|
||||
Translates HuggingFace model configuration to GGUF parameter format,
|
||||
providing sensible defaults for missing values and handling various
|
||||
architecture conventions.
|
||||
|
||||
Args:
|
||||
config: Parsed ModelConfig instance.
|
||||
|
||||
Returns:
|
||||
GGUFParameters with inferred values.
|
||||
"""
|
||||
# Calculate derived parameters
|
||||
num_heads = config.num_attention_heads
|
||||
embedding_length = config.hidden_size
|
||||
rope_dimension_count = embedding_length // num_heads
|
||||
|
||||
# Handle KV heads (for GQA models)
|
||||
num_kv_heads = config.num_key_value_heads or num_heads
|
||||
|
||||
# Create GGUFParameters using dict with aliases
|
||||
params_dict = {
|
||||
"vocab_size": config.vocab_size,
|
||||
"context_length": config.max_position_embeddings,
|
||||
"embedding_length": embedding_length,
|
||||
"block_count": config.num_hidden_layers,
|
||||
"feed_forward_length": config.intermediate_size,
|
||||
"attention.head_count": num_heads,
|
||||
"attention.head_count_kv": num_kv_heads,
|
||||
"attention.layer_norm_rms_epsilon": config.rms_norm_eps,
|
||||
"rope.freq_base": config.rope_theta,
|
||||
"rope.dimension_count": rope_dimension_count,
|
||||
}
|
||||
|
||||
params = GGUFParameters.model_validate(params_dict)
|
||||
|
||||
# Add RoPE scaling if present
|
||||
if config.rope_scaling:
|
||||
params.rope_scaling_type = config.rope_scaling.get("type", "linear")
|
||||
params.rope_scaling_factor = config.rope_scaling.get("factor", 1.0)
|
||||
|
||||
return params
|
||||
|
||||
@staticmethod
|
||||
def get_architecture_mapping(architecture: str) -> str:
|
||||
"""Map architecture names to known GGUF architectures.
|
||||
|
||||
Provides fallback mappings for architectures not directly supported
|
||||
by GGUF, mapping them to similar known architectures.
|
||||
|
||||
Args:
|
||||
architecture: Original architecture name from config.
|
||||
|
||||
Returns:
|
||||
GGUF-compatible architecture name.
|
||||
"""
|
||||
# Architecture mappings to known GGUF types
|
||||
mappings = {
|
||||
"DotsOCRForCausalLM": "qwen2", # Similar architecture
|
||||
"GptOssForCausalLM": "llama", # Use llama as fallback
|
||||
"MistralForCausalLM": "llama", # Mistral is llama-like
|
||||
"Qwen2ForCausalLM": "qwen2",
|
||||
"LlamaForCausalLM": "llama",
|
||||
"GemmaForCausalLM": "gemma",
|
||||
"Phi3ForCausalLM": "phi3",
|
||||
# Add more mappings as needed
|
||||
}
|
||||
|
||||
return mappings.get(architecture, "llama") # Default to llama
|
||||
|
||||
@staticmethod
|
||||
def load_tokeniser_config(model_path: Path) -> dict[str, Any]:
|
||||
"""Load tokeniser configuration from model directory.
|
||||
|
||||
Reads tokenizer_config.json to extract special token IDs and
|
||||
other tokenisation parameters.
|
||||
|
||||
Args:
|
||||
model_path: Path to model directory.
|
||||
|
||||
Returns:
|
||||
Tokeniser configuration dictionary.
|
||||
"""
|
||||
fs = FilesystemService()
|
||||
tokeniser_config_path = model_path / "tokenizer_config.json"
|
||||
|
||||
if not tokeniser_config_path.exists():
|
||||
# Return defaults if no config found
|
||||
return {
|
||||
"bos_token_id": 1,
|
||||
"eos_token_id": 2,
|
||||
"unk_token_id": 0,
|
||||
"pad_token_id": 0,
|
||||
}
|
||||
|
||||
config = fs.load_json_config(tokeniser_config_path)
|
||||
|
||||
# Extract token IDs with defaults
|
||||
return {
|
||||
"bos_token_id": config.get("bos_token_id", 1),
|
||||
"eos_token_id": config.get("eos_token_id", 2),
|
||||
"unk_token_id": config.get("unk_token_id", 0),
|
||||
"pad_token_id": config.get("pad_token_id", 0),
|
||||
"model_type": config.get("model_type", "llama"),
|
||||
}
|
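A minimal sketch of the parsing flow, assuming a downloaded checkpoint directory at ./model-dir containing config.json and tokenizer_config.json (the path is illustrative):

from pathlib import Path

from helpers.utils.config_parser import ConfigParser

parser = ConfigParser()
config = parser.load_model_config(Path("./model-dir"))    # parses config.json
params = parser.infer_gguf_parameters(config)             # HF config -> GGUF parameters
arch = ConfigParser.get_architecture_mapping(config.architectures[0])
tokens = ConfigParser.load_tokeniser_config(Path("./model-dir"))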
196
helpers/utils/tensor_mapping.py
Normal file
@@ -0,0 +1,196 @@
"""Tensor mapping and URL parsing utilities.
|
||||
|
||||
Provides utilities for mapping tensor names between different formats,
|
||||
parsing model URLs, and handling architecture-specific conversions.
|
||||
Uses UK English spelling conventions throughout.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from typing import ClassVar
|
||||
|
||||
from helpers.models.quantisation import ModelSource, URLType
|
||||
|
||||
|
||||
class TensorMapper:
|
||||
"""Maps tensor names between HuggingFace and GGUF conventions.
|
||||
|
||||
Provides flexible tensor name translation supporting direct mappings,
|
||||
layer-aware transformations, and architecture-specific overrides.
|
||||
Handles both simple renames and complex pattern-based conversions.
|
||||
"""
|
||||
|
||||
# Common direct mappings across architectures
|
||||
DIRECT_MAPPINGS: ClassVar[dict[str, str]] = {
|
||||
"model.embed_tokens.weight": "token_embd.weight",
|
||||
"model.norm.weight": "output_norm.weight",
|
||||
"lm_head.weight": "output.weight",
|
||||
}
|
||||
|
||||
# Layer component patterns for transformer blocks
|
||||
LAYER_PATTERNS: ClassVar[dict[str, str]] = {
|
||||
"self_attn.q_proj.weight": "attn_q.weight",
|
||||
"self_attn.q_proj.bias": "attn_q.bias",
|
||||
"self_attn.k_proj.weight": "attn_k.weight",
|
||||
"self_attn.k_proj.bias": "attn_k.bias",
|
||||
"self_attn.v_proj.weight": "attn_v.weight",
|
||||
"self_attn.v_proj.bias": "attn_v.bias",
|
||||
"self_attn.o_proj": "attn_output.weight",
|
||||
"mlp.gate_proj": "ffn_gate.weight",
|
||||
"mlp.up_proj": "ffn_up.weight",
|
||||
"mlp.down_proj": "ffn_down.weight",
|
||||
"input_layernorm": "attn_norm.weight",
|
||||
"post_attention_layernorm": "ffn_norm.weight",
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def map_tensor_name(cls, original_name: str) -> str | None:
|
||||
"""Map original tensor name to GGUF format.
|
||||
|
||||
Translates HuggingFace tensor naming to GGUF format, handling embeddings,
|
||||
attention layers, feed-forward networks, and normalisation layers. Uses
|
||||
layer-aware mapping for transformer blocks whilst maintaining consistency
|
||||
across different model architectures.
|
||||
|
||||
Returns:
|
||||
GGUF tensor name, or None if unmappable.
|
||||
"""
|
||||
# Check direct mappings first
|
||||
if original_name in cls.DIRECT_MAPPINGS:
|
||||
return cls.DIRECT_MAPPINGS[original_name]
|
||||
|
||||
# Handle layer-specific tensors
|
||||
if ".layers." in original_name:
|
||||
return cls._map_layer_tensor(original_name)
|
||||
|
||||
# Return None for unmapped tensors
|
||||
return None
|
||||
|
||||
@classmethod
|
||||
def _map_layer_tensor(cls, tensor_name: str) -> str | None:
|
||||
"""Map layer-specific tensor names.
|
||||
|
||||
Handles tensors within transformer layers, extracting layer indices
|
||||
and mapping component names to GGUF conventions.
|
||||
|
||||
Args:
|
||||
tensor_name: Layer tensor name containing .layers.N. pattern.
|
||||
|
||||
Returns:
|
||||
Mapped GGUF tensor name, or None if unmappable.
|
||||
"""
|
||||
# Extract layer number
|
||||
parts = tensor_name.split(".")
|
||||
layer_idx = None
|
||||
for i, part in enumerate(parts):
|
||||
if part == "layers" and i + 1 < len(parts):
|
||||
layer_idx = parts[i + 1]
|
||||
break
|
||||
|
||||
if layer_idx is None:
|
||||
return None
|
||||
|
||||
# Check each pattern
|
||||
for pattern, replacement in cls.LAYER_PATTERNS.items():
|
||||
if pattern in tensor_name:
|
||||
return f"blk.{layer_idx}.{replacement}"
|
||||
|
||||
return None
|
||||
|
||||
|
||||
class URLParser:
|
||||
"""Parses and validates model URLs from various sources.
|
||||
|
||||
Handles HuggingFace URLs, Ollama-style GGUF references, and other
|
||||
model source formats. Extracts metadata including author, model name,
|
||||
and file patterns for appropriate download strategies.
|
||||
"""
|
||||
|
||||
@staticmethod
|
||||
def parse(url: str) -> ModelSource:
|
||||
"""Parse URL and extract model source information.
|
||||
|
||||
Analyses URL format to determine source type and extract relevant
|
||||
metadata for model download and processing.
|
||||
|
||||
Args:
|
||||
url: Model URL in supported format.
|
||||
|
||||
Returns:
|
||||
ModelSource with parsed information.
|
||||
|
||||
Raises:
|
||||
ValueError: If URL format is not recognised.
|
||||
"""
|
||||
if not url:
|
||||
msg = "URL cannot be empty"
|
||||
raise ValueError(msg)
|
||||
|
||||
# Try Ollama-style GGUF URL first (hf.co/author/model:pattern)
|
||||
ollama_match = re.match(r"^hf\.co/([^:]+):(.+)$", url)
|
||||
if ollama_match:
|
||||
source_model = ollama_match.group(1)
|
||||
gguf_pattern = ollama_match.group(2)
|
||||
return URLParser._create_model_source(
|
||||
url,
|
||||
URLType.OLLAMA_GGUF,
|
||||
source_model,
|
||||
gguf_file_pattern=gguf_pattern,
|
||||
is_gguf_repo=True,
|
||||
)
|
||||
|
||||
# Try regular HuggingFace URL
|
||||
hf_match = re.match(r"https://huggingface\.co/([^/]+/[^/?]+)", url)
|
||||
if hf_match:
|
||||
source_model = hf_match.group(1)
|
||||
return URLParser._create_model_source(
|
||||
url, URLType.HUGGINGFACE, source_model, is_gguf_repo=False
|
||||
)
|
||||
|
||||
msg = (
|
||||
"Invalid URL format\n"
|
||||
"Supported formats:\n"
|
||||
" - https://huggingface.co/username/model-name\n"
|
||||
" - hf.co/username/model-name-GGUF:F16"
|
||||
)
|
||||
raise ValueError(msg)
|
||||
|
||||
@staticmethod
|
||||
def _create_model_source(
|
||||
url: str,
|
||||
url_type: URLType,
|
||||
source_model: str,
|
||||
gguf_file_pattern: str | None = None,
|
||||
is_gguf_repo: bool = False,
|
||||
) -> ModelSource:
|
||||
"""Create ModelSource with parsed information.
|
||||
|
||||
Constructs a ModelSource instance with extracted metadata,
|
||||
handling author/model name splitting and GGUF suffix removal.
|
||||
|
||||
Args:
|
||||
url: Original URL.
|
||||
url_type: Type of URL (HuggingFace or Ollama GGUF).
|
||||
source_model: Repository identifier (author/model).
|
||||
gguf_file_pattern: Optional GGUF file pattern.
|
||||
is_gguf_repo: Whether this is a GGUF repository.
|
||||
|
||||
Returns:
|
||||
Configured ModelSource instance.
|
||||
"""
|
||||
author, model_name = source_model.split("/", 1)
|
||||
|
||||
# Strip -GGUF suffix for GGUF repos
|
||||
if is_gguf_repo and model_name.endswith("-GGUF"):
|
||||
model_name = model_name[:-5]
|
||||
|
||||
return ModelSource(
|
||||
url=url,
|
||||
url_type=url_type,
|
||||
source_model=source_model,
|
||||
original_author=author,
|
||||
model_name=model_name,
|
||||
gguf_file_pattern=gguf_file_pattern,
|
||||
is_gguf_repo=is_gguf_repo,
|
||||
)
|
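Both helpers operate purely on strings, so their behaviour is easy to illustrate; the repository name below is made up:

from helpers.utils.tensor_mapping import TensorMapper, URLParser

source = URLParser.parse("hf.co/example/TinyModel-GGUF:F16")
# source.model_name == "TinyModel", source.gguf_file_pattern == "F16"

TensorMapper.map_tensor_name("model.embed_tokens.weight")
# -> "token_embd.weight" (direct mapping)
TensorMapper.map_tensor_name("model.layers.3.mlp.up_proj.weight")
# -> "blk.3.ffn_up.weight" (layer-aware mapping)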
96
pyproject.toml
Normal file
@@ -0,0 +1,96 @@
[project]
name = "llm-gguf-tools"
version = "0.1.0"
description = "Tools to convert and quantise language models in GGUF format"
readme = "README.md"
license = { text = "Apache-2.0" }
authors = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
maintainers = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
requires-python = ">=3.13"
classifiers = [
    "Development Status :: 3 - Alpha",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.13",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
    "Topic :: Software Development :: Libraries :: Python Modules",
]
dependencies = ["gguf>=0", "pydantic>=2", "safetensors>=0", "torch>=2"]

[project.urls]
Homepage = "https://git.tomfos.tr/tom/llm-gguf-tools"
"Bug Reports" = "https://git.tomfos.tr/tom/llm-gguf-tools/issues"
"Source" = "https://git.tomfos.tr/tom/llm-gguf-tools"

[dependency-groups]
dev = ["pytest>=8", "ruff>=0", "uv>=0"]

[tool.uv]
package = true

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"

[tool.uv.sources]
torch = { index = "pytorch-cpu" }

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project.scripts]
quantise = "quantise:main"
safetensors-to-gguf = "direct_safetensors_to_gguf:main"

[tool.setuptools]
packages = { find = {} }

[tool.ruff]
cache-dir = "/tmp/.ruff_cache"
fix = true
line-length = 100
preview = true
show-fixes = false
target-version = "py313"
unsafe-fixes = true

[tool.ruff.format]
line-ending = "auto"
skip-magic-trailing-comma = false

[tool.ruff.lint]
fixable = ["ALL"]
ignore = [
    "ANN401",  # use of Any type
    "BLE001",  # blind Exception usage
    "COM812",  # missing trailing comma
    "CPY",     # flake8-copyright
    "FBT",     # boolean arguments
    "PLR0912", # too many branches
    "PLR0913", # too many arguments
    "PLR0915", # too many statements
    "PLR0917", # too many positional arguments
    "PLR6301", # method could be static
    "RUF029",  # async methods that don't await
    "S104",    # binding to all interfaces
    "S110",    # passed exceptions
    "S404",    # use of subprocess
    "S603",    # check subprocess input
    "S607",    # subprocess with partial path
    "TRY301",  # raise inside try block
]
select = ["ALL"]
unfixable = [
    "F841",   # local variable assigned but never used
    "RUF100", # unused noqa comments
    "T201",   # don't strip print statement
]

[tool.ruff.lint.isort]
combine-as-imports = true
required-imports = ["from __future__ import annotations"]

[tool.ruff.lint.pydocstyle]
convention = "google"
101
quantize_gguf.py
Normal file
@@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""Bartowski Quantisation Script for advanced GGUF model processing.

Implements a sophisticated quantisation pipeline supporting Q4_K_M, Q4_K_L,
Q4_K_XL, and Q4_K_XXL methods with tensor-level precision control. Features
parallel processing, status tracking, automatic README generation, and
HuggingFace integration for streamlined model distribution workflows.

Usage: python quantise.py <huggingface_url>
"""

from __future__ import annotations

import argparse
import shutil
import sys
from pathlib import Path

from helpers.logger import logger
from helpers.services.orchestrator import QuantisationOrchestrator


def main() -> None:
    """Main entry point for the Bartowski quantisation workflow.

    Parses command-line arguments, initialises the quantisation orchestrator,
    and executes the complete model processing pipeline from HuggingFace URL
    to quantised GGUF files with optional HuggingFace upload and cleanup.
    """
    parser = argparse.ArgumentParser(
        description="Bartowski Quantisation Script - Supports Q4_K_M, Q4_K_L, Q4_K_XL, Q4_K_XXL",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python quantise.py https://huggingface.co/DavidAU/Gemma-3-4b-it-Uncensored-DBL-X
  python quantise.py hf.co/DavidAU/Gemma-3-it-4B-Uncensored-DBL-X-GGUF:F16
        """,
    )
    parser.add_argument("url", help="HuggingFace model URL")
    parser.add_argument(
        "--work-dir", type=Path, help="Working directory (default: ./quantisation_work)"
    )
    parser.add_argument(
        "--no-imatrix",
        action="store_true",
        help="Skip imatrix generation (faster but lower quality)",
    )
    parser.add_argument(
        "--imatrix-base",
        choices=[
            "Q2_K",
            "Q3_K_L",
            "Q3_K_M",
            "Q3_K_S",
            "Q4_K_S",
            "Q4_K_M",
            "Q5_K_S",
            "Q5_K_M",
            "Q6_K",
            "Q8_0",
        ],
        default="Q4_K_M",
        help="Base quantisation for imatrix generation",
    )
    parser.add_argument(
        "--no-upload",
        action="store_true",
        help="Skip uploading to HuggingFace (local testing only)",
    )

    args = parser.parse_args()

    if not args.url:
        parser.print_help()
        sys.exit(1)

    try:
        orchestrator = QuantisationOrchestrator(
            work_dir=args.work_dir or Path.cwd() / "quantisation_work",
            use_imatrix=not args.no_imatrix,
            imatrix_base=args.imatrix_base,
            no_upload=args.no_upload,
        )
        orchestrator.quantise(args.url)

        # Cleanup prompt
        logger.info("Cleaning up...")
        response = input("Delete working files? (y/N): ").strip().lower()
        if response == "y":
            shutil.rmtree(orchestrator.work_dir)
            logger.info("Cleanup complete")
        else:
            logger.info(f"Working files kept in: {orchestrator.work_dir}")

    except Exception as e:
        logger.error(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
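The orchestrator can also be driven programmatically rather than via the CLI; a minimal sketch using the same constructor arguments the script passes (the model URL is illustrative):

from pathlib import Path

from helpers.services.orchestrator import QuantisationOrchestrator

orchestrator = QuantisationOrchestrator(
    work_dir=Path.cwd() / "quantisation_work",
    use_imatrix=True,
    imatrix_base="Q4_K_M",
    no_upload=True,  # keep everything local
)
orchestrator.quantise("https://huggingface.co/example/TinyModel")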
2482
resources/imatrix_data.txt
Normal file
File diff suppressed because one or more lines are too long
95
safetensors2gguf.py
Normal file
@@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""Direct SafeTensors to GGUF converter for unsupported architectures.

This script attempts to convert SafeTensors models to GGUF format directly,
without relying on llama.cpp's architecture-specific conversion logic.
"""

from __future__ import annotations

import sys
import traceback
from argparse import ArgumentParser
from pathlib import Path

from helpers.logger import logger
from helpers.services.gguf import GGUFConverter
from helpers.utils.config_parser import ConfigParser
from helpers.utils.tensor_mapping import TensorMapper


def convert_safetensors_to_gguf(
    model_path: Path, output_path: Path, force_architecture: str | None = None
) -> bool:
    """Convert SafeTensors model to GGUF format with comprehensive metadata handling.

    Orchestrates the complete conversion workflow: loads configuration, maps
    architecture to known GGUF types, creates writer with proper metadata,
    processes all tensor files with name mapping, and adds tokeniser data.
    Handles BFloat16 conversion and provides fallback architecture mapping
    for unsupported model types to ensure maximum compatibility.

    Returns:
        True if conversion was successful, False otherwise.
    """
    # Use ConfigParser to load configuration
    config_parser = ConfigParser()
    model_config = config_parser.load_model_config(model_path)

    arch_name = model_config.architectures[0]
    model_type = model_config.model_type

    logger.info(f"Architecture: {arch_name}")
    logger.info(f"Model type: {model_type}")

    # Use forced architecture or try to map to a known one
    if force_architecture:
        arch = force_architecture
        logger.warning(f"Using forced architecture: {arch}")
    else:
        # Use ConfigParser's architecture mapping
        arch = config_parser.get_architecture_mapping(arch_name)
        if arch != arch_name:
            logger.warning(f"Unknown architecture {arch_name}, using {arch} as fallback")

    # Use the new GGUFConverter for the conversion
    tensor_mapper = TensorMapper()
    return GGUFConverter.convert_safetensors(
        model_path, output_path, model_config, arch, tensor_mapper
    )


def main() -> None:
    """Main entry point for SafeTensors to GGUF conversion command-line interface.

    Parses command-line arguments, validates input paths, and orchestrates the
    conversion process with proper error handling. Supports forced architecture
    mapping and flexible output path specification. Provides comprehensive
    error reporting and exit codes for integration with automated workflows.
    """
    parser = ArgumentParser(description="Convert SafeTensors to GGUF directly")
    parser.add_argument("model_path", help="Path to SafeTensors model directory")
    parser.add_argument("-o", "--output", help="Output GGUF file path")
    parser.add_argument("--force-arch", help="Force a specific architecture mapping")

    args = parser.parse_args()

    model_path = Path(args.model_path)
    if not model_path.exists():
        logger.error(f"Model path not found: {model_path}")
        sys.exit(1)

    output_path = Path(args.output) if args.output else model_path / f"{model_path.name}-f32.gguf"

    try:
        success = convert_safetensors_to_gguf(model_path, output_path, args.force_arch)
        sys.exit(0 if success else 1)
    except Exception as e:
        logger.error(f"Conversion failed: {e}")

        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()
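The converter is equally usable as a library call; a minimal sketch with illustrative paths:

from pathlib import Path

ok = convert_safetensors_to_gguf(
    Path("./model-dir"),                    # directory with config.json + *.safetensors
    Path("./model-dir/model-f32.gguf"),     # output file
    force_architecture=None,                # fall back to ConfigParser's mapping
)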
425
uv.lock
generated
Normal file
@@ -0,0 +1,425 @@
version = 1
|
||||
revision = 2
|
||||
requires-python = ">=3.13"
|
||||
resolution-markers = [
|
||||
"sys_platform != 'darwin'",
|
||||
"sys_platform == 'darwin'",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "annotated-types"
|
||||
version = "0.7.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "colorama"
|
||||
version = "0.4.6"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "filelock"
|
||||
version = "3.13.1"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl", hash = "sha256:57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "fsspec"
|
||||
version = "2024.6.1"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl", hash = "sha256:3cb443f8bcd2efb31295a5b9fdb02aee81d8452c80d28f97a6d0959e6cee101e" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "gguf"
|
||||
version = "0.17.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "numpy" },
|
||||
{ name = "pyyaml" },
|
||||
{ name = "tqdm" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/08/08/7de1ca4b71e7bf33b547f82bb22505e221b5fa42f67d635e200e0ad22ad6/gguf-0.17.1.tar.gz", hash = "sha256:36ad71aad900a3e75fc94ebe96ea6029f03a4e44be7627ef7ad3d03e8c7bcb53", size = 89338, upload-time = "2025-06-19T14:00:33.705Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/fc/31/6a93a887617ee7deeaa602ca3d02d1c12a6cb8a742a695de5d128f5fa46a/gguf-0.17.1-py3-none-any.whl", hash = "sha256:7bc5aa7eeb1931f7d39b48fdc5b38fda6b294b9dca75cf607ac69557840a3943", size = 96224, upload-time = "2025-06-19T14:00:32.88Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "iniconfig"
|
||||
version = "2.1.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/f2/97/ebf4da567aa6827c909642694d71c9fcf53e5b504f2d96afea02718862f3/iniconfig-2.1.0.tar.gz", hash = "sha256:3abbd2e30b36733fee78f9c7f7308f2d0050e88f0087fd25c2645f63c773e1c7", size = 4793, upload-time = "2025-03-19T20:09:59.721Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760", size = 6050, upload-time = "2025-03-19T20:10:01.071Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "jinja2"
|
||||
version = "3.1.4"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
dependencies = [
|
||||
{ name = "markupsafe" },
|
||||
]
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl", hash = "sha256:bc5dd2abb727a5319567b7a813e6a2e7318c39f4f487cfe6c89c6f9c7d25197d" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "llm-gguf-tools"
|
||||
version = "0.1.0"
|
||||
source = { editable = "." }
|
||||
dependencies = [
|
||||
{ name = "gguf" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "safetensors" },
|
||||
{ name = "torch", version = "2.8.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
|
||||
{ name = "torch", version = "2.8.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform != 'darwin'" },
|
||||
]
|
||||
|
||||
[package.dev-dependencies]
|
||||
dev = [
|
||||
{ name = "pytest" },
|
||||
{ name = "ruff" },
|
||||
{ name = "uv" },
|
||||
]
|
||||
|
||||
[package.metadata]
|
||||
requires-dist = [
|
||||
{ name = "gguf", specifier = ">=0" },
|
||||
{ name = "pydantic", specifier = ">=2" },
|
||||
{ name = "safetensors", specifier = ">=0" },
|
||||
{ name = "torch", specifier = ">=2", index = "https://download.pytorch.org/whl/cpu" },
|
||||
]
|
||||
|
||||
[package.metadata.requires-dev]
|
||||
dev = [
|
||||
{ name = "pytest", specifier = ">=8" },
|
||||
{ name = "ruff", specifier = ">=0" },
|
||||
{ name = "uv", specifier = ">=0" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "markupsafe"
|
||||
version = "3.0.2"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "mpmath"
|
||||
version = "1.3.0"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "networkx"
|
||||
version = "3.3"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl", hash = "sha256:28575580c6ebdaf4505b22c6256a2b9de86b316dc63ba9e93abde3d78dfdbcf2" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "numpy"
|
||||
version = "2.1.2"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a84498e0d0a1174f2b3ed769b67b656aa5460c92c9554039e11f20a05650f00d" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:4d6ec0d4222e8ffdab1744da2560f07856421b367928026fb540e1945f2eeeaf" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:259ec80d54999cc34cd1eb8ded513cb053c3bf4829152a2e00de2371bd406f5e" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:675c741d4739af2dc20cd6c6a5c4b7355c728167845e3c6b0e824e4e5d36a6c3" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:05b2d4e667895cc55e3ff2b56077e4c8a5604361fc21a042845ea3ad67465aa8" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:43cca367bf94a14aca50b89e9bc2061683116cfe864e56740e083392f533ce7a" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-win_amd64.whl", hash = "sha256:f2ded8d9b6f68cc26f8425eda5d3877b47343e68ca23d0d0846f4d312ecaa445" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:2ffef621c14ebb0188a8633348504a35c13680d6da93ab5cb86f4e54b7e922b5" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ad369ed238b1959dfbade9018a740fb9392c5ac4f9b5173f420bd4f37ba1f7a0" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d82075752f40c0ddf57e6e02673a17f6cb0f8eb3f587f63ca1eaab5594da5b17" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:1600068c262af1ca9580a527d43dc9d959b0b1d8e56f8a05d830eea39b7c8af6" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a26ae94658d3ba3781d5e103ac07a876b3e9b29db53f68ed7df432fd033358a8" },
|
||||
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:13311c2db4c5f7609b462bc0f43d3c465424d25c626d95040f073e30f7570e35" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "packaging"
|
||||
version = "24.1"
|
||||
source = { registry = "https://download.pytorch.org/whl/cpu" }
|
||||
wheels = [
|
||||
{ url = "https://download.pytorch.org/whl/packaging-24.1-py3-none-any.whl", hash = "sha256:5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pluggy"
|
||||
version = "1.6.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic"
|
||||
version = "2.11.7"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "annotated-types" },
|
||||
{ name = "pydantic-core" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "typing-inspection" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/00/dd/4325abf92c39ba8623b5af936ddb36ffcfe0beae70405d456ab1fb2f5b8c/pydantic-2.11.7.tar.gz", hash = "sha256:d989c3c6cb79469287b1569f7447a17848c998458d49ebe294e975b9baf0f0db", size = 788350, upload-time = "2025-06-14T08:33:17.137Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl", hash = "sha256:dde5df002701f6de26248661f6835bbe296a47bf73990135c7d07ce741b9623b", size = 444782, upload-time = "2025-06-14T08:33:14.905Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pydantic-core"
|
||||
version = "2.33.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pygments"
|
||||
version = "2.19.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/b0/77/a5b8c569bf593b0140bde72ea885a803b82086995367bf2037de0159d924/pygments-2.19.2.tar.gz", hash = "sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887", size = 4968631, upload-time = "2025-06-21T13:39:12.283Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pytest"
|
||||
version = "8.4.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "colorama", marker = "sys_platform == 'win32'" },
|
||||
{ name = "iniconfig" },
|
||||
{ name = "packaging" },
|
||||
{ name = "pluggy" },
|
||||
{ name = "pygments" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/08/ba/45911d754e8eba3d5a841a5ce61a65a685ff1798421ac054f85aa8747dfb/pytest-8.4.1.tar.gz", hash = "sha256:7c67fd69174877359ed9371ec3af8a3d2b04741818c51e5e99cc1742251fa93c", size = 1517714, upload-time = "2025-06-18T05:48:06.109Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/29/16/c8a903f4c4dffe7a12843191437d7cd8e32751d5de349d45d3fe69544e87/pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7", size = 365474, upload-time = "2025-06-18T05:48:03.955Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pyyaml"
|
||||
version = "6.0.2"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/54/ed/79a089b6be93607fa5cdaedf301d7dfb23af5f25c398d5ead2525b063e17/pyyaml-6.0.2.tar.gz", hash = "sha256:d584d9ec91ad65861cc08d42e834324ef890a082e591037abe114850ff7bbc3e", size = 130631, upload-time = "2024-08-06T20:33:50.674Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ef/e3/3af305b830494fa85d95f6d95ef7fa73f2ee1cc8ef5b495c7c3269fb835f/PyYAML-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efdca5630322a10774e8e98e1af481aad470dd62c3170801852d752aa7a783ba", size = 181309, upload-time = "2024-08-06T20:32:43.4Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/45/9f/3b1c20a0b7a3200524eb0076cc027a970d320bd3a6592873c85c92a08731/PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:50187695423ffe49e2deacb8cd10510bc361faac997de9efef88badc3bb9e2d1", size = 171679, upload-time = "2024-08-06T20:32:44.801Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7c/9a/337322f27005c33bcb656c655fa78325b730324c78620e8328ae28b64d0c/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0ffe8360bab4910ef1b9e87fb812d8bc0a308b0d0eef8c8f44e0254ab3b07133", size = 733428, upload-time = "2024-08-06T20:32:46.432Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a3/69/864fbe19e6c18ea3cc196cbe5d392175b4cf3d5d0ac1403ec3f2d237ebb5/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:17e311b6c678207928d649faa7cb0d7b4c26a0ba73d41e99c4fff6b6c3276484", size = 763361, upload-time = "2024-08-06T20:32:51.188Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/04/24/b7721e4845c2f162d26f50521b825fb061bc0a5afcf9a386840f23ea19fa/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b189594dbe54f75ab3a1acec5f1e3faa7e8cf2f1e08d9b561cb41b845f69d5", size = 759523, upload-time = "2024-08-06T20:32:53.019Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/2b/b2/e3234f59ba06559c6ff63c4e10baea10e5e7df868092bf9ab40e5b9c56b6/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:41e4e3953a79407c794916fa277a82531dd93aad34e29c2a514c2c0c5fe971cc", size = 726660, upload-time = "2024-08-06T20:32:54.708Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/fe/0f/25911a9f080464c59fab9027482f822b86bf0608957a5fcc6eaac85aa515/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:68ccc6023a3400877818152ad9a1033e3db8625d899c72eacb5a668902e4d652", size = 751597, upload-time = "2024-08-06T20:32:56.985Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/14/0d/e2c3b43bbce3cf6bd97c840b46088a3031085179e596d4929729d8d68270/PyYAML-6.0.2-cp313-cp313-win32.whl", hash = "sha256:bc2fa7c6b47d6bc618dd7fb02ef6fdedb1090ec036abab80d4681424b84c1183", size = 140527, upload-time = "2024-08-06T20:33:03.001Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ruff"
|
||||
version = "0.12.7"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/a1/81/0bd3594fa0f690466e41bd033bdcdf86cba8288345ac77ad4afbe5ec743a/ruff-0.12.7.tar.gz", hash = "sha256:1fc3193f238bc2d7968772c82831a4ff69252f673be371fb49663f0068b7ec71", size = 5197814, upload-time = "2025-07-29T22:32:35.877Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/e1/d2/6cb35e9c85e7a91e8d22ab32ae07ac39cc34a71f1009a6f9e4a2a019e602/ruff-0.12.7-py3-none-linux_armv6l.whl", hash = "sha256:76e4f31529899b8c434c3c1dede98c4483b89590e15fb49f2d46183801565303", size = 11852189, upload-time = "2025-07-29T22:31:41.281Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/63/5b/a4136b9921aa84638f1a6be7fb086f8cad0fde538ba76bda3682f2599a2f/ruff-0.12.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:789b7a03e72507c54fb3ba6209e4bb36517b90f1a3569ea17084e3fd295500fb", size = 12519389, upload-time = "2025-07-29T22:31:54.265Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a8/c9/3e24a8472484269b6b1821794141f879c54645a111ded4b6f58f9ab0705f/ruff-0.12.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:2e1c2a3b8626339bb6369116e7030a4cf194ea48f49b64bb505732a7fce4f4e3", size = 11743384, upload-time = "2025-07-29T22:31:59.575Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/26/7c/458dd25deeb3452c43eaee853c0b17a1e84169f8021a26d500ead77964fd/ruff-0.12.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:32dec41817623d388e645612ec70d5757a6d9c035f3744a52c7b195a57e03860", size = 11943759, upload-time = "2025-07-29T22:32:01.95Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7f/8b/658798472ef260ca050e400ab96ef7e85c366c39cf3dfbef4d0a46a528b6/ruff-0.12.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47ef751f722053a5df5fa48d412dbb54d41ab9b17875c6840a58ec63ff0c247c", size = 11654028, upload-time = "2025-07-29T22:32:04.367Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a8/86/9c2336f13b2a3326d06d39178fd3448dcc7025f82514d1b15816fe42bfe8/ruff-0.12.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a828a5fc25a3efd3e1ff7b241fd392686c9386f20e5ac90aa9234a5faa12c423", size = 13225209, upload-time = "2025-07-29T22:32:06.952Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/76/69/df73f65f53d6c463b19b6b312fd2391dc36425d926ec237a7ed028a90fc1/ruff-0.12.7-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:5726f59b171111fa6a69d82aef48f00b56598b03a22f0f4170664ff4d8298efb", size = 14182353, upload-time = "2025-07-29T22:32:10.053Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/58/1e/de6cda406d99fea84b66811c189b5ea139814b98125b052424b55d28a41c/ruff-0.12.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:74e6f5c04c4dd4aba223f4fe6e7104f79e0eebf7d307e4f9b18c18362124bccd", size = 13631555, upload-time = "2025-07-29T22:32:12.644Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/ae/625d46d5164a6cc9261945a5e89df24457dc8262539ace3ac36c40f0b51e/ruff-0.12.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d0bfe4e77fba61bf2ccadf8cf005d6133e3ce08793bbe870dd1c734f2699a3e", size = 12667556, upload-time = "2025-07-29T22:32:15.312Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/55/bf/9cb1ea5e3066779e42ade8d0cd3d3b0582a5720a814ae1586f85014656b6/ruff-0.12.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:06bfb01e1623bf7f59ea749a841da56f8f653d641bfd046edee32ede7ff6c606", size = 12939784, upload-time = "2025-07-29T22:32:17.69Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/55/7f/7ead2663be5627c04be83754c4f3096603bf5e99ed856c7cd29618c691bd/ruff-0.12.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:e41df94a957d50083fd09b916d6e89e497246698c3f3d5c681c8b3e7b9bb4ac8", size = 11771356, upload-time = "2025-07-29T22:32:20.134Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/17/40/a95352ea16edf78cd3a938085dccc55df692a4d8ba1b3af7accbe2c806b0/ruff-0.12.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:4000623300563c709458d0ce170c3d0d788c23a058912f28bbadc6f905d67afa", size = 11612124, upload-time = "2025-07-29T22:32:22.645Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4d/74/633b04871c669e23b8917877e812376827c06df866e1677f15abfadc95cb/ruff-0.12.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:69ffe0e5f9b2cf2b8e289a3f8945b402a1b19eff24ec389f45f23c42a3dd6fb5", size = 12479945, upload-time = "2025-07-29T22:32:24.765Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/be/34/c3ef2d7799c9778b835a76189c6f53c179d3bdebc8c65288c29032e03613/ruff-0.12.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a07a5c8ffa2611a52732bdc67bf88e243abd84fe2d7f6daef3826b59abbfeda4", size = 12998677, upload-time = "2025-07-29T22:32:27.022Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/77/ab/aca2e756ad7b09b3d662a41773f3edcbd262872a4fc81f920dc1ffa44541/ruff-0.12.7-py3-none-win32.whl", hash = "sha256:c928f1b2ec59fb77dfdf70e0419408898b63998789cc98197e15f560b9e77f77", size = 11756687, upload-time = "2025-07-29T22:32:29.381Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b4/71/26d45a5042bc71db22ddd8252ca9d01e9ca454f230e2996bb04f16d72799/ruff-0.12.7-py3-none-win_amd64.whl", hash = "sha256:9c18f3d707ee9edf89da76131956aba1270c6348bfee8f6c647de841eac7194f", size = 12912365, upload-time = "2025-07-29T22:32:31.517Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4c/9b/0b8aa09817b63e78d94b4977f18b1fcaead3165a5ee49251c5d5c245bb2d/ruff-0.12.7-py3-none-win_arm64.whl", hash = "sha256:dfce05101dbd11833a0776716d5d1578641b7fddb537fe7fa956ab85d1769b69", size = 11982083, upload-time = "2025-07-29T22:32:33.881Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "safetensors"
|
||||
version = "0.6.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/6c/d2/94fe37355a1d4ff86b0f43b9a018515d5d29bf7ad6d01318a80f5db2fd6a/safetensors-0.6.1.tar.gz", hash = "sha256:a766ba6e19b198eff09be05f24cd89eda1670ed404ae828e2aa3fc09816ba8d8", size = 197968, upload-time = "2025-08-06T09:39:38.376Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/6b/c0/40263a2103511917f9a92b4e114ecaff68586df07f12d1d877312f1261f3/safetensors-0.6.1-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:81ed1b69d6f8acd7e759a71197ce3a69da4b7e9faa9dbb005eb06a83b1a4e52d", size = 455232, upload-time = "2025-08-06T09:39:32.037Z" },
{ url = "https://files.pythonhosted.org/packages/86/bf/432cb4bb1c336d338dd9b29f78622b1441ee06e5868bf1de2ca2bec74c08/safetensors-0.6.1-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:01b51af8cb7a3870203f2735e3c7c24d1a65fb2846e75613c8cf9d284271eccc", size = 432150, upload-time = "2025-08-06T09:39:31.008Z" },
{ url = "https://files.pythonhosted.org/packages/05/d7/820c99032a53d57279ae199df7d114a8c9e2bbce4fa69bc0de53743495f0/safetensors-0.6.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:64a733886d79e726899b9d9643813e48a2eec49f3ef0fdb8cd4b8152046101c3", size = 471634, upload-time = "2025-08-06T09:39:22.17Z" },
{ url = "https://files.pythonhosted.org/packages/ea/8b/bcd960087eded7690f118ceeda294912f92a3b508a1d9a504f9c2e02041b/safetensors-0.6.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f233dc3b12fb641b36724844754b6bb41349615a0e258087560968d6da92add5", size = 487855, upload-time = "2025-08-06T09:39:24.142Z" },
{ url = "https://files.pythonhosted.org/packages/41/64/b44eac4ad87c4e1c0cf5ba5e204c032b1b1eac8ce2b8f65f87791e647bd6/safetensors-0.6.1-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6f16289e2af54affd591dd78ed12b5465e4dc5823f818beaeddd49a010cf3ba7", size = 607240, upload-time = "2025-08-06T09:39:25.463Z" },
{ url = "https://files.pythonhosted.org/packages/52/75/0347fa0c080af8bd3341af26a30b85939f6362d4f5240add1a0c9d793354/safetensors-0.6.1-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1b62eab84e2c69918b598272504c5d2ebfe64da6c16fdf8682054eec9572534d", size = 519864, upload-time = "2025-08-06T09:39:26.872Z" },
{ url = "https://files.pythonhosted.org/packages/ea/f3/83843d1fe9164f44a267373c55cba706530b209b58415f807b40edddcd3e/safetensors-0.6.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d498363746555dccffc02a47dfe1dee70f7784f3f37f1d66b408366c5d3a989e", size = 485926, upload-time = "2025-08-06T09:39:29.109Z" },
{ url = "https://files.pythonhosted.org/packages/b8/26/f6b0cb5210bab0e343214fdba7c2df80a69b019e62e760ddc61b18bec383/safetensors-0.6.1-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:eed2079dca3ca948d7b0d7120396e776bbc6680637cf199d393e157fde25c937", size = 518999, upload-time = "2025-08-06T09:39:28.054Z" },
{ url = "https://files.pythonhosted.org/packages/90/b7/8910b165c97d3bd6d445c6ca8b704ec23d0fa33849ce9a51dc783827a302/safetensors-0.6.1-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:294040ff20ebe079a2b4976cfa9a5be0202f56ca4f7f190b4e52009e8c026ceb", size = 650669, upload-time = "2025-08-06T09:39:32.997Z" },
{ url = "https://files.pythonhosted.org/packages/00/bc/2eeb025381d0834ae038aae2d383dfa830c2e0068e2e4e512ea99b135a4b/safetensors-0.6.1-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:75693208b492a026b926edeebbae888cc644433bee4993573ead2dc44810b519", size = 750019, upload-time = "2025-08-06T09:39:34.397Z" },
{ url = "https://files.pythonhosted.org/packages/f9/38/5dda9a8e056eb1f17ed3a7846698fd94623a1648013cdf522538845755da/safetensors-0.6.1-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:a8687b71ac67a0b3f8ce87df9e8024edf087e94c34ef46eaaad694dce8d2f83f", size = 689888, upload-time = "2025-08-06T09:39:35.584Z" },
{ url = "https://files.pythonhosted.org/packages/dd/60/15ee3961996d951002378d041bd82863a5c70738a71375b42d6dd5d2a6d3/safetensors-0.6.1-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5dd969a01c738104f707fa0e306b757f5beb3ebdcd682fe0724170a0bf1c21fb", size = 655539, upload-time = "2025-08-06T09:39:37.093Z" },
{ url = "https://files.pythonhosted.org/packages/91/d6/01172a9c77c566800286d379bfc341d75370eae2118dfd339edfd0394c4a/safetensors-0.6.1-cp38-abi3-win32.whl", hash = "sha256:7c3d8d34d01673d1a917445c9437ee73a9d48bc6af10352b84bbd46c5da93ca5", size = 308594, upload-time = "2025-08-06T09:39:40.916Z" },
{ url = "https://files.pythonhosted.org/packages/6c/5d/195dc1917d7ae93dd990d9b2f8b9c88e451bcc78e0b63ee107beebc1e4be/safetensors-0.6.1-cp38-abi3-win_amd64.whl", hash = "sha256:4720957052d57c5ac48912c3f6e07e9a334d9632758c9b0c054afba477fcbe2d", size = 320282, upload-time = "2025-08-06T09:39:39.54Z" },
]
[[package]]
name = "setuptools"
version = "70.2.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/setuptools-70.2.0-py3-none-any.whl", hash = "sha256:b8b8060bb426838fbe942479c90296ce976249451118ef566a5a0b7d8b78fb05" },
]
[[package]]
name = "sympy"
version = "1.13.3"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "mpmath" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl" },
]
[[package]]
name = "torch"
version = "2.8.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform == 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform == 'darwin'" },
{ name = "fsspec", marker = "sys_platform == 'darwin'" },
{ name = "jinja2", marker = "sys_platform == 'darwin'" },
{ name = "networkx", marker = "sys_platform == 'darwin'" },
{ name = "setuptools", marker = "sys_platform == 'darwin'" },
{ name = "sympy", marker = "sys_platform == 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
]
[[package]]
name = "torch"
version = "2.8.0+cpu"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform != 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform != 'darwin'" },
{ name = "fsspec", marker = "sys_platform != 'darwin'" },
{ name = "jinja2", marker = "sys_platform != 'darwin'" },
{ name = "networkx", marker = "sys_platform != 'darwin'" },
{ name = "setuptools", marker = "sys_platform != 'darwin'" },
{ name = "sympy", marker = "sys_platform != 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform != 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
]
[[package]]
name = "tqdm"
version = "4.66.5"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/tqdm-4.66.5-py3-none-any.whl", hash = "sha256:90279a3770753eafc9194a0364852159802111925aa30eb3f9d85b0e805ac7cd" },
]
[[package]]
name = "typing-extensions"
version = "4.12.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl", hash = "sha256:04e5ca0351e0f3f85c6853954072df659d0d13fac324d0072316b67d7794700d" },
]
[[package]]
name = "typing-inspection"
version = "0.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f8/b1/0c11f5058406b3af7609f121aaa6b609744687f1d158b3c3a5bf4cc94238/typing_inspection-0.4.1.tar.gz", hash = "sha256:6ae134cc0203c33377d43188d4064e9b357dba58cff3185f22924610e70a9d28", size = 75726, upload-time = "2025-05-21T18:55:23.885Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl", hash = "sha256:389055682238f53b04f7badcb49b989835495a96700ced5dab2d8feae4b26f51", size = 14552, upload-time = "2025-05-21T18:55:22.152Z" },
]
[[package]]
name = "uv"
version = "0.8.5"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/83/94/e18a40fe6f6d724c1fbf2c9328806359e341710b2fd42dc928a1a8fc636b/uv-0.8.5.tar.gz", hash = "sha256:078cf2935062d5b61816505f9d6f30b0221943a1433b4a1de8f31a1dfe55736b", size = 3451272, upload-time = "2025-08-05T20:50:21.159Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/b9/78cde56283b6b9a8a84b0bf9334442ed75a843310229aaf7f1a71fe67818/uv-0.8.5-py3-none-linux_armv6l.whl", hash = "sha256:e236372a260e312aef5485a0e5819a0ec16c9197af06d162ad5a3e8bd62f9bba", size = 18146198, upload-time = "2025-08-05T20:49:18.859Z" },
{ url = "https://files.pythonhosted.org/packages/ed/83/5deda1a19362ce426da7f9cc4764a0dd57e665ecbaddd9900d4200bc10ab/uv-0.8.5-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:53a40628329e543a5c5414553f5898131d5c1c6f963708cb0afc2ecf3e8d8167", size = 18242690, upload-time = "2025-08-05T20:49:23.409Z" },
{ url = "https://files.pythonhosted.org/packages/06/6e/80b08ee544728317d9c8003d4c10234007e12f384da1c3dfe579489833c9/uv-0.8.5-py3-none-macosx_11_0_arm64.whl", hash = "sha256:43a689027696bc9c62e6da3f06900c52eafc4debbf4fba9ecb906196730b34c8", size = 16913881, upload-time = "2025-08-05T20:49:26.631Z" },
{ url = "https://files.pythonhosted.org/packages/34/f6/47a44dabfc25b598ea6f2ab9aa32ebf1cbd87ed8af18ccde6c5d36f35476/uv-0.8.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64.whl", hash = "sha256:a34d783f5cef00f1918357c0cd9226666e22640794e9e3862820abf4ee791141", size = 17527439, upload-time = "2025-08-05T20:49:30.464Z" },
{ url = "https://files.pythonhosted.org/packages/ef/7d/ee7c2514e064412133ee9f01c4c42de20da24617b8c25d81cf7021b774d8/uv-0.8.5-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2140383bc25228281090cc34c00500d8e5822877c955f691d69bbf967e8efa73", size = 17833275, upload-time = "2025-08-05T20:49:33.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/e7/5233cf5cbcca8ea65aa1f1e48bf210dc9773fb86b8104ffbc523be7f6a3f/uv-0.8.5-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6b449779ff463b059504dc30316a634f810149e02482ce36ea35daea8f6ce7af", size = 18568916, upload-time = "2025-08-05T20:49:37.031Z" },
{ url = "https://files.pythonhosted.org/packages/d8/54/6cabb2a0347c51c8366ca3bffeeebd7f829a15f6b29ad20f51fd5ca9c4bd/uv-0.8.5-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:a7f8739d05cc513eee2f1f8a7e6c482a9c1e8860d77cd078d1ea7c3fe36d7a65", size = 19993334, upload-time = "2025-08-05T20:49:40.361Z" },
{ url = "https://files.pythonhosted.org/packages/3c/7a/b84d994d52f20bc56229840c31e77aff4653e5902ea7b7c2616e9381b5b8/uv-0.8.5-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62ebbd22f780ba2585690332765caf9e29c9758e48a678148e8b1ea90580cdb9", size = 19643358, upload-time = "2025-08-05T20:49:43.955Z" },
{ url = "https://files.pythonhosted.org/packages/c8/f1/7552f2bea528456d34bc245f2959ce910631e01571c4b7ea421ead9a9fc6/uv-0.8.5-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4f8dd0555f05d66ff46fdab551137cc2b1ea9c5363358913e2af175e367f4398", size = 18947757, upload-time = "2025-08-05T20:49:47.381Z" },
{ url = "https://files.pythonhosted.org/packages/57/9b/46aadd186a1e16a23cd0701dda0e640197db49a3add074a47231fed45a4f/uv-0.8.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:38c04408ad5eae7a178a1e3b0e09afeb436d0c97075530a3c82de453b78d0448", size = 18906135, upload-time = "2025-08-05T20:49:50.985Z" },
{ url = "https://files.pythonhosted.org/packages/c0/31/6661adedaba9ebac8bb449ec9901f8cbf124fa25e0db3a9e6cf3053cee88/uv-0.8.5-py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:73e772caf7310af4b21eaf8c25531b934391f1e84f3afa8e67822d7c432f6dad", size = 17787943, upload-time = "2025-08-05T20:49:54.59Z" },
{ url = "https://files.pythonhosted.org/packages/11/f2/73fb5c3156fdae830b83edec2f430db84cb4bc4b78f61d21694bd59004cb/uv-0.8.5-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:3ddd7d8c01073f23ba2a4929ab246adb30d4f8a55c5e007ad7c8341f7bf06978", size = 18675864, upload-time = "2025-08-05T20:49:57.87Z" },
{ url = "https://files.pythonhosted.org/packages/b5/29/774c6f174c53d68ae9a51c2fabf1b09003b93a53c24591a108be0dc338d7/uv-0.8.5-py3-none-musllinux_1_1_armv7l.whl", hash = "sha256:7d601f021cbc179320ea3a75cd1d91bd49af03d2a630c4d04ebd38ff6b87d419", size = 17808770, upload-time = "2025-08-05T20:50:01.566Z" },
{ url = "https://files.pythonhosted.org/packages/a9/b0/5d164ce84691f5018c5832e9e3371c0196631b1f1025474a179de1d6a70a/uv-0.8.5-py3-none-musllinux_1_1_i686.whl", hash = "sha256:6ee97b7299990026619c20e30e253972c6c0fb6fba4f5658144e62aa1c07785a", size = 18076516, upload-time = "2025-08-05T20:50:04.94Z" },
{ url = "https://files.pythonhosted.org/packages/d1/73/4d8baefb4f4b07df6a4db7bbd604cb361d4f5215b94d3f66553ea26edfd4/uv-0.8.5-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:09804055d6346febf0767767c04bdd2fab7d911535639f9c18de2ea744b2954c", size = 19031195, upload-time = "2025-08-05T20:50:08.211Z" },
{ url = "https://files.pythonhosted.org/packages/44/2a/3d074391df2c16c79fc6bf333e4bde75662e64dac465050a03391c75b289/uv-0.8.5-py3-none-win32.whl", hash = "sha256:6362a2e1fa535af0e4c0a01f83e666a4d5f9024d808f9e64e3b6ef07c97aff54", size = 18026273, upload-time = "2025-08-05T20:50:11.868Z" },
{ url = "https://files.pythonhosted.org/packages/3c/2f/e850d3e745ccd1125b7a48898421824700fd3e996d27d835139160650124/uv-0.8.5-py3-none-win_amd64.whl", hash = "sha256:dd89836735860461c3a5563731e77c011d1831f14ada540f94bf1a7011dbea14", size = 19822158, upload-time = "2025-08-05T20:50:15.428Z" },
{ url = "https://files.pythonhosted.org/packages/6f/df/e5565b3faf2c6147a877ab7e96ef31e2333f08c5138a98ce77003b1bf65e/uv-0.8.5-py3-none-win_arm64.whl", hash = "sha256:37c1a22915392014d8b4ade9e69e157c8e5ccdf32f37070a84f749a708268335", size = 18430102, upload-time = "2025-08-05T20:50:18.785Z" },
]