Initial commit

Tom Foster 2025-08-07 18:29:12 +01:00
commit ef7df1a8c3
28 changed files with 6829 additions and 0 deletions

.gitignore vendored Normal file (51 lines added)

@@ -0,0 +1,51 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

LICENSE Normal file (73 lines added)

@@ -0,0 +1,73 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.
Copyright 2025 tom
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md Normal file (59 lines added)

@@ -0,0 +1,59 @@
# 🤖 LLM GGUF Tools
A collection of Python tools for converting and quantising language models to
[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md), featuring advanced
quantisation methods and direct SafeTensors conversion capabilities.
> 💡 **Looking for quantised models?** Check out [tcpipuk's HuggingFace profile](https://huggingface.co/tcpipuk)
> for models quantised using these tools!
## Available Tools
| Tool | Purpose | Documentation |
|------|---------|---------------|
| [quantise_gguf.py](./quantise_gguf.py) | ⚡ GGUF quantisation using a variant of [Bartowski's method](https://huggingface.co/bartowski) | [📖 Docs](docs/quantise_gguf.md) |
| [safetensors2gguf.py](./safetensors2gguf.py) | 🔄 Direct SafeTensors to GGUF conversion | [📖 Docs](docs/safetensors2gguf.md) |
## Installation
1. Install [`uv`](https://docs.astral.sh/uv/) to manage the dependencies:
```bash
# Install uv (see https://docs.astral.sh/uv/#installation for more options)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or update your existing instance
uv self update
```
2. Then set up the environment for these scripts:
```bash
# Clone the repository
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
# Set up virtual environment and install dependencies
uv sync
```
## Requirements
- **For quantisation**: [llama.cpp](https://github.com/ggerganov/llama.cpp) binaries
(`llama-quantize`, `llama-cli`, `llama-imatrix`)
- **For BFloat16 models**: PyTorch (optional, auto-detected)
- **For uploads**: HuggingFace API token (set `HF_TOKEN` environment variable)
## Development
For development setup and contribution guidelines, see [📖 Development Guide](docs/development.md).
## Notes
The `resources/imatrix_data.txt` file contains importance matrix calibration data from
[Bartowski's Gist](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8),
based on calibration data provided by Dampf, building upon Kalomaze's foundational work.
## License
Apache 2.0 License - see [LICENSE](./LICENSE) file for details.

docs/development.md Normal file (86 lines added)

@@ -0,0 +1,86 @@
# Development Guide
This guide covers development setup, code quality standards, and project structure for contributors.
## Code Quality
```bash
# Run linting
uv run ruff check
# Format code
uv run ruff format
# Run with debug logging
DEBUG=true uv run <script>
```
## Project Structure
```plain
llm-gguf-tools/
├── quantise_gguf.py        # Bartowski quantisation tool
├── safetensors2gguf.py     # Direct conversion tool
├── helpers/                # Shared utilities
│   ├── __init__.py
│   ├── logger.py           # Colour-coded logging
│   ├── config/             # Quantisation configuration definitions
│   ├── models/             # Pydantic data models
│   ├── services/           # Filesystem, GGUF and HuggingFace services
│   └── utils/              # Miscellaneous utilities (e.g. config parsing)
├── resources/              # Resource files
│   └── imatrix_data.txt    # Calibration data for imatrix
├── docs/                   # Detailed documentation
│   ├── quantise_gguf.md
│   ├── safetensors2gguf.md
│   └── development.md
└── pyproject.toml          # Project configuration
```
## Contributing Guidelines
Contributions are welcome! Please ensure:
1. Code follows the existing style (run `uv run ruff format`)
2. All functions have Google-style docstrings
3. Type hints are used throughout
4. Tests pass (if applicable)
## Development Workflow
### Setting Up Development Environment
```bash
# Clone the repository
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
# Install all dependencies including dev
uv sync --all-groups
```
### Code Style
- Follow PEP 8 with ruff enforcement
- Use UK English spelling in comments and documentation
- Maximum line length: 100 characters
- Use type hints for all function parameters and returns
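A minimal illustration of the conventions listed above, using a hypothetical helper (the function name and behaviour are for illustration only, not part of the codebase):

```python
def format_file_size(size_bytes: int, precision: int = 1) -> str:
    """Convert a size in bytes to a human-readable string.

    Uses binary (1024-based) units and UK English spelling in its
    documentation, matching the conventions used across this project.

    Args:
        size_bytes: File size in bytes.
        precision: Number of decimal places in the formatted value.

    Returns:
        Human-readable size such as "1.5G" or "750.0M".
    """
    size = float(size_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024.0:
            return f"{size:.{precision}f}{unit}"
        size /= 1024.0
    return f"{size:.{precision}f}P"
```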
### Testing
While formal tests are not yet implemented, ensure:
- Scripts run without errors on sample models
- Logger output is correctly formatted
- File I/O operations handle errors gracefully
### Debugging
Enable debug logging for verbose output:
```bash
DEBUG=true uv run quantise_gguf.py <model_url>
```
This will show additional information about:
- Model download progress
- Conversion steps
- File operations
- Error details

docs/quantise_gguf.md Normal file (102 lines added)

@@ -0,0 +1,102 @@
# quantise_gguf.py - Advanced GGUF Quantisation
Advanced GGUF quantisation tool implementing Bartowski's sophisticated quantisation pipeline.
## Overview
This tool automates the complete quantisation workflow for converting models to GGUF format with
multiple precision variants, importance matrix generation, and automatic upload to HuggingFace.
## Quantisation Variants
The tool produces four quantisation variants based on Bartowski's method:
- **Q4_K_M**: Standard baseline quantisation
- **Q4_K_L**: Q6_K embeddings + Q6_K attention layers for better quality
- **Q4_K_XL**: Q8_0 embeddings + Q6_K attention layers for enhanced precision
- **Q4_K_XXL**: Q8_0 embeddings + Q8_0 attention for maximum precision
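Under the hood, these variants are produced by passing per-tensor type overrides to `llama-quantize`. As a rough sketch only (assuming `llama-quantize` is on your `PATH`; file names are placeholders and the tool's real invocation goes through its own wrappers), the Q4_K_L fallback flags defined in this repository's configuration correspond to a call like:

```python
import subprocess

# Sketch: Q4_K_L via llama-quantize's embedding/output overrides,
# mirroring the fallback flags defined in helpers/config.
cmd = [
    "llama-quantize",
    "--imatrix", "imatrix.dat",         # optional importance matrix
    "--token-embedding-type", "Q6_K",   # higher-precision embeddings
    "--output-tensor-type", "Q6_K",     # higher-precision output tensor
    "model-F32.gguf",                   # full-precision input
    "model-Q4_K_L.gguf",                # quantised output
    "Q4_K_M",                           # base quantisation type
]
subprocess.run(cmd, check=True)
```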
## Features
- **Automatic model download**: Downloads models from HuggingFace automatically
- **Importance matrix generation**: Creates imatrix for improved quantisation quality
- **Parallel processing**: Uploads multiple variants simultaneously
- **Progress tracking**: Real-time status updates during conversion
- **README generation**: Automatically creates model cards with quantisation details
- **HuggingFace integration**: Direct upload to HuggingFace with proper metadata
## Usage
### Basic Usage
```bash
# Quantise a model from HuggingFace
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B
```
### Command Line Options
```bash
# Skip imatrix generation for faster processing
uv run quantise_gguf.py <model_url> --no-imatrix
# Local testing without upload
uv run quantise_gguf.py <model_url> --no-upload
# Custom output directory
uv run quantise_gguf.py <model_url> --output-dir ./my-models
# Use specific HuggingFace token
uv run quantise_gguf.py <model_url> --hf-token YOUR_TOKEN
```
## Environment Variables
- `HF_TOKEN`: HuggingFace API token for uploads
- `LLAMA_CPP_DIR`: Custom path to llama.cpp binaries
- `DEBUG`: Enable debug logging when set to "true"
## Requirements
- **llama.cpp binaries**: `llama-quantize`, `llama-cli`, `llama-imatrix`
- **Calibration data**: `resources/imatrix_data.txt` for importance matrix generation
- **HuggingFace account**: For uploading quantised models (optional)
## Workflow
1. **Download**: Fetches the model from HuggingFace
2. **Convert**: Converts to initial GGUF format (F32)
3. **Generate imatrix**: Creates importance matrix using calibration data
4. **Quantise**: Produces multiple quantisation variants in parallel
5. **Upload**: Pushes quantised models to HuggingFace with metadata
6. **Clean up**: Removes temporary files and caches
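For orientation, the core of steps 2-4 corresponds roughly to the following llama.cpp invocations (a simplified sketch, assuming the llama.cpp binaries and its `convert_hf_to_gguf.py` script are available locally; the paths are placeholders, and the actual tool adds error handling, variant overrides, and parallelism):

```python
import subprocess

model_dir = "./work/Llama-3.2-1B"   # downloaded model (placeholder path)
f32_gguf = "./work/model-F32.gguf"

# Step 2: convert the HuggingFace model to full-precision GGUF
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir, "--outfile", f32_gguf, "--outtype", "f32"],
    check=True,
)

# Step 3: generate an importance matrix from the calibration data
subprocess.run(
    ["llama-imatrix", "-m", f32_gguf, "-f", "resources/imatrix_data.txt", "-o", "imatrix.dat"],
    check=True,
)

# Step 4: produce one quantisation variant using that imatrix
subprocess.run(
    ["llama-quantize", "--imatrix", "imatrix.dat", f32_gguf, "./work/model-Q4_K_M-imat.gguf", "Q4_K_M"],
    check=True,
)
```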
## Output Structure
```plain
output_dir/
├── model-F32.gguf # Full precision conversion
├── model-Q4_K_M.gguf # Standard quantisation
├── model-Q4_K_M-imat.gguf # With importance matrix
├── model-Q4_K_L-imat.gguf # Enhanced embeddings/attention
├── model-Q4_K_XL-imat.gguf # High precision embeddings
├── model-Q4_K_XXL-imat.gguf # Maximum precision
└── imatrix.dat # Generated importance matrix
```
## Error Handling
The tool includes comprehensive error handling for:
- Network failures during download
- Missing binaries or dependencies
- Insufficient disk space
- HuggingFace API errors
- Conversion failures
## Performance Considerations
- **Disk space**: Requires ~3x model size in free space
- **Memory**: Needs RAM proportional to model size
- **Processing time**: Varies from minutes to hours based on model size
- **Network**: Downloads can be large (10-100+ GB for large models)

docs/safetensors2gguf.md Normal file (164 lines added)

@@ -0,0 +1,164 @@
# safetensors2gguf.py - Direct SafeTensors Conversion
Direct SafeTensors to GGUF converter for unsupported architectures.
## Overview
This tool converts SafeTensors models directly to GGUF format without requiring specific
architecture support in llama.cpp. It's particularly useful for experimental models, custom
architectures, or when llama.cpp's standard conversion tools don't recognise your model
architecture.
## Features
- **Architecture-agnostic**: Works with unsupported model architectures
- **Automatic mapping**: Intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: Handles BF16 tensors with PyTorch (optional)
- **Vision models**: Supports models with vision components
- **Tokeniser preservation**: Extracts and includes tokeniser metadata
- **Fallback mechanisms**: Provides sensible defaults for unknown architectures
## Usage
### Basic Usage
```bash
# Convert a local SafeTensors model
uv run safetensors2gguf.py /path/to/model/directory
```
### Command Line Options
```bash
# Specify output file
uv run safetensors2gguf.py /path/to/model -o output.gguf
# Force specific architecture mapping
uv run safetensors2gguf.py /path/to/model --force-arch qwen2
# Convert with custom output path
uv run safetensors2gguf.py ./my-model --output ./converted/my-model.gguf
```
## Supported Input Formats
The tool automatically detects and handles:
1. **Single file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: Any `*.safetensors` files in the directory
## Architecture Mapping
The tool includes built-in mappings for several architectures:
- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)
You can override these with the `--force-arch` parameter.
## Tensor Name Mapping
The converter automatically maps common tensor patterns:
| Original Pattern | GGUF Name |
|-----------------|-----------|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |
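The logic behind this table is essentially a small direct lookup plus a pattern rewrite for per-layer tensors. A simplified sketch (not the tool's actual implementation, which lives in the helpers package; biases and norm layers are omitted for brevity):

```python
import re

DIRECT = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

LAYER = {
    "self_attn.q_proj": "attn_q",
    "self_attn.k_proj": "attn_k",
    "self_attn.v_proj": "attn_v",
    "self_attn.o_proj": "attn_output",
    "mlp.gate_proj": "ffn_gate",
    "mlp.up_proj": "ffn_up",
    "mlp.down_proj": "ffn_down",
}

def map_tensor_name(name: str) -> str | None:
    """Translate a HuggingFace tensor name to its GGUF equivalent."""
    if name in DIRECT:
        return DIRECT[name]
    match = re.match(r"model\.layers\.(\d+)\.(.+)\.weight$", name)
    if match:
        layer, component = match.groups()
        if component in LAYER:
            return f"blk.{layer}.{LAYER[component]}.weight"
    return None  # unmapped tensors are skipped

print(map_tensor_name("model.layers.3.self_attn.q_proj.weight"))  # blk.3.attn_q.weight
```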
## Configuration Requirements
The model directory must contain:
- **config.json**: Model configuration file (required)
- **\*.safetensors**: One or more SafeTensors files (required)
- **tokenizer_config.json**: Tokeniser configuration (optional)
- **tokenizer.json**: Tokeniser data (optional)
## Output Format
The tool produces a single GGUF file containing:
- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)
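To sanity-check the result, the written metadata and tensor count can be inspected with the `gguf` Python package (a quick sketch; the output path is a placeholder matching the example later in this document):

```python
from gguf import GGUFReader

reader = GGUFReader("./my-model/my-model-f32.gguf")

# List the metadata keys and count the tensors that were written
for key in reader.fields:
    print(key)
print(f"{len(reader.tensors)} tensors")
```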
## Error Handling
| Error | Message | Solution |
|-------|---------|----------|
| Missing config.json | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |
## Technical Details
### Parameter Inference
The tool infers GGUF parameters from the model configuration:
- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to attention heads)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → layer normalisation epsilon (default: 1e-5)
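Most of these values are copied across directly; the main derived value is the RoPE dimension count, which is computed from the head geometry. A worked illustration with typical 7B-class numbers (not tool output):

```python
# Derived RoPE dimension count from the model configuration
hidden_size = 4096
num_attention_heads = 32
rope_dimension_count = hidden_size // num_attention_heads  # 128
```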
### Vision Model Support
For models with vision components, the tool extracts:
- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters
## Limitations
- **F32 only**: Currently outputs only full precision (F32) models
- **Architecture guessing**: May require manual architecture specification
- **Tokeniser compatibility**: Uses llama tokeniser as default fallback
- **Memory usage**: Requires loading full tensors into memory
## Examples
### Converting a custom model
```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model
# Convert to GGUF
uv run safetensors2gguf.py ./my-model
# Output will be at ./my-model/my-model-f32.gguf
```
### Converting with specific architecture
```bash
# For a Qwen2-based model
uv run safetensors2gguf.py ./qwen-model --force-arch qwen2
```
### Batch conversion
```bash
# Convert multiple models
for model in ./models/*; do
uv run safetensors2gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```

helpers/__init__.py Normal file (6 lines added)

@@ -0,0 +1,6 @@
"""Helper utilities for LLM GGUF tools.
This package provides common utilities, logging, and shared functionality
used across the quantisation and conversion tools. Uses UK English spelling
conventions throughout.
"""

@@ -0,0 +1,6 @@
"""Configuration module for quantisation settings and tensor-level precision control.
Provides structured configuration definitions for Bartowski quantisation methods
including Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with fallback strategies
for different model architectures and deployment scenarios.
"""

@@ -0,0 +1,95 @@
"""Quantisation configuration definitions.
Pre-defined quantisation configurations for the Bartowski method, supporting
Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with tensor-level precision control.
"""
from __future__ import annotations
from helpers.models.quantisation import QuantisationConfig, QuantisationType
QUANTISATION_CONFIGS: dict[QuantisationType, QuantisationConfig] = {
QuantisationType.Q4_K_M: QuantisationConfig(
name="Q4_K_M",
description="Standard Q4_K_M quantisation (baseline)",
tensor_types={}, # No special tensor overrides - uses default Q4_K_M
fallback_methods=[],
),
QuantisationType.Q4_K_L: QuantisationConfig(
name="Q4_K_L",
description="Q6_K embeddings + Q6_K attention (+753MB for vocab + reasoning)",
tensor_types={
"token_embd.weight": "Q6_K",
"output.weight": "Q6_K",
"lm_head.weight": "Q6_K",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
fallback_methods=[
{
"embed_tokens.weight": "Q6_K",
"output.weight": "Q6_K",
"lm_head.weight": "Q6_K",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
{"token-embedding-type": "Q6_K", "output-tensor-type": "Q6_K"},
],
),
QuantisationType.Q4_K_XL: QuantisationConfig(
name="Q4_K_XL",
description="Q8_0 embeddings + Q6_K attention (+2.1GB for vocabulary + reasoning)",
tensor_types={
"token_embd.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
fallback_methods=[
{
"embed_tokens.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
{"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
],
),
QuantisationType.Q4_K_XXL: QuantisationConfig(
name="Q4_K_XXL",
description="Q8_0 embeddings + Q8_0 attention (+2.8GB total, maximum precision)",
tensor_types={
"token_embd.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q8_0",
"blk.*.attn_k.weight": "Q8_0",
"blk.*.attn_v.weight": "Q8_0",
},
fallback_methods=[
{
"embed_tokens.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q8_0",
"blk.*.attn_k.weight": "Q8_0",
"blk.*.attn_v.weight": "Q8_0",
},
{"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
],
),
}
SUPPORTED_QUANTISATION_TYPES: list[QuantisationType] = [
QuantisationType.Q4_K_M,
QuantisationType.Q4_K_L,
QuantisationType.Q4_K_XL,
QuantisationType.Q4_K_XXL,
]

helpers/logger.py Normal file (94 lines added)

@@ -0,0 +1,94 @@
"""Colour-coded logging configuration for LLM GGUF tools.
Provides a consistent logging interface with colour-coded output for different
log levels, making it easier to identify warnings, errors, and informational
messages at a glance during tool execution and debugging sessions.
"""
from __future__ import annotations
from logging import (
CRITICAL,
DEBUG,
ERROR,
INFO,
WARNING,
Formatter as LoggingFormatter,
Logger,
LogRecord,
StreamHandler as LoggingStreamHandler,
getLogger,
)
from os import getenv as os_getenv
from sys import stdout as sys_stdout
from typing import ClassVar
DEBUG_MODE = os_getenv("DEBUG", "false").lower() == "true"
class ColourFormatter(LoggingFormatter):
"""Custom formatter adding colours to log messages based on severity level.
Uses ANSI escape codes to provide visual distinction between different
log levels in terminal output. Supports standard logging levels with
appropriate colour coding: DEBUG (cyan), INFO (green), WARNING (yellow),
ERROR (red), and CRITICAL (bold red) for immediate visual feedback.
"""
# ANSI colour codes
COLOURS: ClassVar[dict[int, str]] = {
DEBUG: "\033[36m", # Cyan
INFO: "\033[32m", # Green
WARNING: "\033[33m", # Yellow
ERROR: "\033[31m", # Red
CRITICAL: "\033[1;31m", # Bold Red
}
RESET = "\033[0m"
# Emoji prefixes for different levels
EMOJIS: ClassVar[dict[int, str]] = {
DEBUG: "🔍",
INFO: "ℹ️ ", # noqa: RUF001
WARNING: "⚠️ ",
ERROR: "❌",
CRITICAL: "🔥",
}
def format(self, record: LogRecord) -> str:
"""Format log record with colour and emoji based on severity level.
Enhances standard log formatting by prepending ANSI colour codes and
emoji indicators, then appending reset codes to prevent colour bleeding.
Maintains standard log structure whilst adding visual enhancements for
improved readability in terminal environments.
Returns:
str: Formatted log message with colour and emoji.
"""
# Get colour for this level
colour = self.COLOURS.get(record.levelno, "")
emoji = self.EMOJIS.get(record.levelno, "")
# Format the message
record.msg = f"{emoji} {record.msg}"
formatted = super().format(record)
# Add colour codes
return f"{colour}{formatted}{self.RESET}"
# Create and configure the logger
logger: Logger = getLogger("llm-gguf-tools")
logger.setLevel(DEBUG if DEBUG_MODE else INFO)
# Create console handler with colour formatter
handler = LoggingStreamHandler(sys_stdout)
handler.setLevel(DEBUG if DEBUG_MODE else INFO)
# Set formatter without timestamp for cleaner output
formatter = ColourFormatter(fmt="%(message)s", datefmt="%H:%M:%S")
handler.setFormatter(formatter)
logger.addHandler(handler)
# Prevent propagation to root logger
logger.propagate = False

@@ -0,0 +1,35 @@
"""Pydantic models for llm-gguf-tools.
This module provides structured data models for quantisation and conversion
operations, ensuring type safety and validation across the toolset.
"""
from __future__ import annotations
from helpers.models.conversion import (
GGUFParameters,
ModelConfig,
TensorMapping,
VisionConfig,
)
from helpers.models.quantisation import (
LlamaCppEnvironment,
ModelSource,
QuantisationConfig,
QuantisationResult,
QuantisationType,
URLType,
)
__all__ = [
"GGUFParameters",
"LlamaCppEnvironment",
"ModelConfig",
"ModelSource",
"QuantisationConfig",
"QuantisationResult",
"QuantisationType",
"TensorMapping",
"URLType",
"VisionConfig",
]

@@ -0,0 +1,150 @@
"""Pydantic models for GGUF conversion operations.
Contains data models for SafeTensors to GGUF conversion including
model configurations, parameter mappings, and tensor specifications.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from typing import Any
from pydantic import BaseModel, ConfigDict, Field
class ModelConfig(BaseModel):
"""Parsed model configuration from HuggingFace config.json.
Represents the standard configuration metadata extracted from HuggingFace
models, providing structured access to architecture details, hyperparameters,
and quantisation settings required for GGUF conversion.
"""
model_config = ConfigDict(extra="allow")
architectures: list[str] = Field(default_factory=lambda: ["Unknown"])
model_type: str = "unknown"
vocab_size: int = 32000
max_position_embeddings: int = 2048
hidden_size: int = 4096
num_hidden_layers: int = 32
intermediate_size: int = 11008
num_attention_heads: int = 32
num_key_value_heads: int | None = None
rope_theta: float = 10000.0
rope_scaling: dict[str, Any] | None = None
rms_norm_eps: float = 1e-5
vision_config: VisionConfig | None = None
def to_gguf_params(self) -> GGUFParameters:
"""Convert model configuration to GGUF parameters.
Translates HuggingFace model configuration values to GGUF-specific
parameter format, handling defaults and calculating derived values
like RoPE dimension count from head dimensions.
Returns:
GGUFParameters instance with converted values.
"""
params = {
"vocab_size": self.vocab_size,
"context_length": self.max_position_embeddings,
"embedding_length": self.hidden_size,
"block_count": self.num_hidden_layers,
"feed_forward_length": self.intermediate_size,
"attention.head_count": self.num_attention_heads,
"attention.head_count_kv": self.num_key_value_heads or self.num_attention_heads,
"attention.layer_norm_rms_epsilon": self.rms_norm_eps,
"rope.freq_base": self.rope_theta,
"rope.dimension_count": self.hidden_size // self.num_attention_heads,
}
return GGUFParameters(**params) # type: ignore[arg-type]
class VisionConfig(BaseModel):
"""Vision model configuration for multimodal models.
Contains parameters specific to vision components in multimodal architectures,
including patch sizes, embedding dimensions, and spatial merge configurations
for proper GGUF metadata generation.
"""
model_config = ConfigDict(extra="allow")
hidden_size: int = 1536
num_hidden_layers: int = 42
num_attention_heads: int = 12
intermediate_size: int = 4224
patch_size: int = 14
spatial_merge_size: int = 2
rms_norm_eps: float | None = None
class GGUFParameters(BaseModel):
"""GGUF-specific parameters inferred from model configuration.
Translates HuggingFace configuration values to GGUF parameter names and
formats, providing a standardised interface for GGUF writer configuration
across different model architectures and quantisation strategies.
"""
model_config = ConfigDict(extra="allow")
# Basic parameters
vocab_size: int
context_length: int
embedding_length: int
block_count: int
feed_forward_length: int
# Attention parameters
attention_head_count: int = Field(alias="attention.head_count")
attention_head_count_kv: int = Field(alias="attention.head_count_kv")
attention_layer_norm_rms_epsilon: float = Field(alias="attention.layer_norm_rms_epsilon")
# RoPE parameters
rope_freq_base: float = Field(alias="rope.freq_base")
rope_dimension_count: int = Field(alias="rope.dimension_count")
rope_scaling_type: str | None = Field(default=None, alias="rope.scaling.type")
rope_scaling_factor: float | None = Field(default=None, alias="rope.scaling.factor")
class TensorMapping(BaseModel):
"""Mapping configuration for tensor name conversion.
Defines rules for translating between HuggingFace tensor naming conventions
and GGUF tensor names, supporting both direct mappings and pattern-based
transformations for layer-specific tensors.
"""
model_config = ConfigDict(frozen=True)
# Direct mappings (exact name matches)
direct_mappings: dict[str, str] = Field(
default_factory=lambda: {
"model.embed_tokens.weight": "token_embd.weight",
"model.norm.weight": "output_norm.weight",
"lm_head.weight": "output.weight",
}
)
# Layer component patterns (for .layers.N. tensors)
layer_patterns: dict[str, str] = Field(
default_factory=lambda: {
"self_attn.q_proj.weight": "attn_q.weight",
"self_attn.q_proj.bias": "attn_q.bias",
"self_attn.k_proj.weight": "attn_k.weight",
"self_attn.k_proj.bias": "attn_k.bias",
"self_attn.v_proj.weight": "attn_v.weight",
"self_attn.v_proj.bias": "attn_v.bias",
"self_attn.o_proj": "attn_output.weight",
"mlp.gate_proj": "ffn_gate.weight",
"mlp.up_proj": "ffn_up.weight",
"mlp.down_proj": "ffn_down.weight",
"input_layernorm": "attn_norm.weight",
"post_attention_layernorm": "ffn_norm.weight",
}
)
# Architecture-specific overrides
architecture_overrides: dict[str, dict[str, str]] = Field(default_factory=dict)

@@ -0,0 +1,168 @@
"""Pydantic models for quantisation operations.
Contains data models specific to the quantisation workflow including
quantisation types, configurations, and results. Uses UK English spelling
conventions throughout (quantisation, not quantization).
"""
from __future__ import annotations
from enum import StrEnum
from typing import TYPE_CHECKING
from pydantic import BaseModel, ConfigDict, Field, field_validator
if TYPE_CHECKING:
from pathlib import Path
class QuantisationType(StrEnum):
"""Available quantisation types for Bartowski-method GGUF model conversion.
Defines the specific quantisation strategies supported by this tool, ranging
from Q4_K_M baseline to Q4_K_XXL maximum precision variants. Each type
represents different trade-offs between model size and quality preservation
for embeddings, attention layers, and feed-forward networks.
"""
Q4_K_M = "Q4_K_M"
Q4_K_L = "Q4_K_L"
Q4_K_XL = "Q4_K_XL"
Q4_K_XXL = "Q4_K_XXL"
class URLType(StrEnum):
"""Supported URL formats for model source specification.
Categorises input URL formats to enable appropriate handling strategies.
HuggingFace URLs require full model download and conversion, whilst Ollama
GGUF URLs allow direct GGUF file downloads with pattern matching for
efficient processing of pre-quantised models.
"""
HUGGINGFACE = "huggingface"
OLLAMA_GGUF = "ollama_gguf"
class QuantisationConfig(BaseModel):
"""Configuration for a specific quantisation method with tensor-level precision control.
Defines quantisation parameters including tensor type mappings and fallback
methods for handling different model architectures. Enables fine-grained
control over which layers receive higher precision treatment whilst
maintaining compatibility across diverse model structures.
"""
model_config = ConfigDict(use_enum_values=True)
name: str
description: str
tensor_types: dict[str, str] = Field(default_factory=dict)
fallback_methods: list[dict[str, str]] = Field(default_factory=list)
class ModelSource(BaseModel):
"""Represents a model source with parsed information from URL analysis.
Contains comprehensive metadata extracted from model URLs including source
repository details, author information, and GGUF file patterns. Enables
differentiation between regular HuggingFace repositories requiring conversion
and GGUF repositories allowing direct file downloads.
"""
model_config = ConfigDict(use_enum_values=True, protected_namespaces=())
url: str
url_type: URLType
source_model: str
original_author: str
model_name: str
gguf_file_pattern: str | None = None
is_gguf_repo: bool = False
@field_validator("url")
@classmethod
def validate_url(cls, v: str) -> str:
"""Validate that URL is not empty.
Ensures the provided URL string is not empty or None,
as this is required for model source identification.
Returns:
The validated URL string.
Raises:
ValueError: If URL is empty or None.
"""
if not v:
msg = "URL cannot be empty"
raise ValueError(msg)
return v
class QuantisationResult(BaseModel):
"""Result of a quantisation operation with comprehensive status tracking.
Captures the outcome of individual quantisation attempts including success
status, file paths, sizes, and error details. Supports workflow status
tracking from planning through processing to completion, enabling real-time
progress reporting and parallel upload coordination.
"""
model_config = ConfigDict(use_enum_values=True, arbitrary_types_allowed=True)
quantisation_type: QuantisationType
success: bool
file_path: Path | None = None
file_size: str | None = None
method_used: str | None = None
error_message: str | None = None
status: str = "pending" # planned, processing, uploading, completed, failed
class LlamaCppEnvironment(BaseModel):
"""Represents llama.cpp environment setup with binary and script locations.
Encapsulates the runtime environment for llama.cpp tools including paths
to quantisation binaries, CLI tools, and conversion scripts. Handles both
local binary installations and repository-based setups to provide flexible
deployment options across different system configurations.
"""
model_config = ConfigDict(arbitrary_types_allowed=True)
quantise_binary: Path # UK spelling
cli_binary: Path
convert_script: str
use_repo: bool = False
class QuantisationContext(BaseModel):
"""Context object containing all parameters needed for quantisation execution.
Encapsulates quantisation parameters to reduce method argument counts
and improve code maintainability following parameter object pattern.
"""
model_config = ConfigDict(frozen=True)
f16_model_path: Path
model_source: ModelSource
config: QuantisationConfig
llama_env: LlamaCppEnvironment
models_dir: Path
imatrix_path: Path | None = None
base_quant: str = "Q4_K_M"
def get_output_path(self) -> Path:
"""Generate output path for quantised model.
Returns:
Path to the output GGUF file.
"""
output_filename = (
f"{self.model_source.original_author}-"
f"{self.model_source.model_name}-"
f"{self.config.name}.gguf"
)
return self.models_dir / self.model_source.model_name / output_filename

@@ -0,0 +1,20 @@
"""Service layer for llm-gguf-tools.
Provides high-level service interfaces for interacting with external systems
including HuggingFace, llama.cpp, and filesystem operations. Uses UK English
spelling conventions throughout.
"""
from __future__ import annotations
from helpers.services.filesystem import FilesystemService
from helpers.services.huggingface import HuggingFaceService, ReadmeGenerator
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator
__all__ = [
"EnvironmentManager",
"FilesystemService",
"HuggingFaceService",
"IMatrixGenerator",
"ReadmeGenerator",
]

@@ -0,0 +1,174 @@
"""Filesystem operations service.
Provides unified filesystem operations including file discovery, size
calculation, and path management. Consolidates common filesystem patterns
used across quantisation and conversion workflows.
"""
from __future__ import annotations
import json
import subprocess
from pathlib import Path
from typing import Any
from helpers.logger import logger
BYTES_PER_UNIT = 1024.0
class FilesystemService:
"""Handles filesystem operations with consistent error handling.
Provides methods for file discovery, size formatting, and JSON loading
with proper error handling and logging. Ensures consistent behaviour
across different tools and workflows.
"""
@staticmethod
def get_file_size(file_path: Path) -> str:
"""Get human-readable file size using system utilities.
Attempts to use `du -h` for human-readable output, falling back to
Python calculation if the system command fails. Provides consistent
size formatting across the toolset.
Returns:
Human-readable file size string (e.g., "1.5G", "750M").
"""
try:
result = subprocess.run(
["du", "-h", str(file_path)], capture_output=True, text=True, check=True
)
return result.stdout.split()[0]
except (subprocess.CalledProcessError, FileNotFoundError):
# Fallback to Python calculation
try:
size_bytes: float = float(file_path.stat().st_size)
for unit in ["B", "K", "M", "G", "T"]:
if size_bytes < BYTES_PER_UNIT:
return f"{size_bytes:.1f}{unit}"
size_bytes /= BYTES_PER_UNIT
except Exception:
return "Unknown"
else:
return f"{size_bytes:.1f}P"
@staticmethod
def load_json_config(config_path: Path) -> dict[str, Any]:
"""Load and parse JSON configuration file.
Provides consistent JSON loading with proper error handling and
encoding specification. Used for loading model configurations,
tokeniser settings, and other JSON-based metadata.
Returns:
Parsed JSON content as dictionary.
Raises:
FileNotFoundError: If config file doesn't exist.
"""
if not config_path.exists():
msg = f"Configuration file not found: {config_path}"
raise FileNotFoundError(msg)
with Path(config_path).open(encoding="utf-8") as f:
return json.load(f)
@staticmethod
def find_safetensor_files(model_path: Path) -> list[Path]:
"""Find all SafeTensor files in model directory using priority search.
Searches for tensor files in order of preference: single model.safetensors,
sharded model-*-of-*.safetensors files, then any *.safetensors files. This
approach handles both single-file and multi-shard model distributions whilst
ensuring predictable file ordering for conversion consistency.
Returns:
List of SafeTensor file paths in priority order.
Raises:
FileNotFoundError: If no SafeTensor files are found.
"""
# Check for single file
single_file = model_path / "model.safetensors"
if single_file.exists():
return [single_file]
# Check for sharded files
pattern = "model-*-of-*.safetensors"
sharded_files = sorted(model_path.glob(pattern))
if sharded_files:
return sharded_files
# Check for any safetensor files
any_files = sorted(model_path.glob("*.safetensors"))
if any_files:
return any_files
msg = f"No SafeTensor files found in {model_path}"
raise FileNotFoundError(msg)
@staticmethod
def find_gguf_files(model_path: Path, pattern: str | None = None) -> list[Path]:
"""Find GGUF files in directory, optionally filtered by pattern.
Searches for GGUF files with optional pattern matching. Prioritises
multi-part files (00001-of-*) over single files for proper handling
of large models split across multiple files.
Returns:
List of GGUF file paths, sorted with multi-part files first.
"""
if pattern:
gguf_files = list(model_path.glob(f"*{pattern}*.gguf"))
else:
gguf_files = list(model_path.glob("*.gguf"))
# Sort to prioritise 00001-of-* files
gguf_files.sort(
key=lambda x: (
"00001-of-" not in x.name, # False sorts before True
x.name,
)
)
return gguf_files
@staticmethod
def ensure_directory(path: Path) -> Path:
"""Ensure directory exists, creating if necessary.
Creates directory and all parent directories if they don't exist.
Returns the path for method chaining convenience.
Returns:
The directory path.
"""
path.mkdir(parents=True, exist_ok=True)
return path
@staticmethod
def cleanup_directory(path: Path, pattern: str = "*") -> int:
"""Remove files matching pattern from directory.
Safely removes files matching the specified glob pattern. Returns
count of files removed for logging purposes.
Returns:
Number of files removed.
"""
if not path.exists():
return 0
files_removed = 0
for file_path in path.glob(pattern):
if file_path.is_file():
try:
file_path.unlink()
files_removed += 1
except Exception as e:
logger.warning(f"Failed to remove {file_path}: {e}")
return files_removed

helpers/services/gguf.py Normal file (210 lines added)

@@ -0,0 +1,210 @@
"""GGUF file operations service.
Provides unified interface for creating, writing, and manipulating GGUF files.
Consolidates GGUF-specific operations from conversion and quantisation workflows.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from typing import TYPE_CHECKING, Any
import gguf
import torch
from safetensors import safe_open
from helpers.logger import logger
from helpers.services.filesystem import FilesystemService
from helpers.utils.config_parser import ConfigParser
if TYPE_CHECKING:
from pathlib import Path
import numpy as np
from helpers.models.conversion import ModelConfig
class GGUFWriter:
"""Manages GGUF file creation and metadata writing.
Provides high-level interface for GGUF file operations including metadata
configuration, tensor addition, and tokeniser integration. Encapsulates
low-level GGUF library interactions for consistent error handling.
"""
def __init__(self, output_path: Path, architecture: str) -> None:
"""Initialise GGUF writer with output path and architecture.
Creates the underlying GGUF writer instance and prepares for metadata
and tensor addition. Sets up the file structure for the specified
model architecture.
"""
self.output_path = output_path
self.architecture = architecture
self.writer = gguf.GGUFWriter(str(output_path), architecture)
logger.info(f"Created GGUF writer for {architecture} architecture")
def add_metadata(self, model_config: ModelConfig, model_name: str) -> None:
"""Add comprehensive metadata from model configuration.
Writes general model information, architectural parameters, and
quantisation settings to the GGUF file header. Handles both standard
and vision model configurations with appropriate parameter mapping.
"""
# General metadata
self.writer.add_name(model_name)
self.writer.add_description(f"Converted from {model_config.architectures[0]}")
self.writer.add_file_type(gguf.LlamaFileType.ALL_F32)
# Model parameters from config
params = model_config.to_gguf_params()
self.writer.add_context_length(params.context_length)
self.writer.add_embedding_length(params.embedding_length)
self.writer.add_block_count(params.block_count)
self.writer.add_feed_forward_length(params.feed_forward_length)
self.writer.add_head_count(params.attention_head_count)
self.writer.add_head_count_kv(params.attention_head_count_kv)
self.writer.add_layer_norm_rms_eps(params.attention_layer_norm_rms_epsilon)
self.writer.add_rope_freq_base(params.rope_freq_base)
self.writer.add_rope_dimension_count(params.rope_dimension_count)
logger.info(f"Added metadata: {params.block_count} layers, {params.context_length} context")
def add_vision_metadata(self, vision_config: Any) -> None:
"""Add vision model parameters to GGUF metadata.
Configures vision-specific parameters for multimodal models including
embedding dimensions, attention heads, and spatial processing settings.
"""
if not vision_config:
return
logger.info("Adding vision model parameters...")
self.writer.add_vision_embedding_length(vision_config.hidden_size)
self.writer.add_vision_block_count(vision_config.num_hidden_layers)
self.writer.add_vision_head_count(vision_config.num_attention_heads)
self.writer.add_vision_feed_forward_length(vision_config.intermediate_size)
self.writer.add_vision_patch_size(vision_config.patch_size)
self.writer.add_vision_spatial_merge_size(vision_config.spatial_merge_size)
if hasattr(vision_config, "rms_norm_eps") and vision_config.rms_norm_eps:
self.writer.add_vision_attention_layernorm_eps(vision_config.rms_norm_eps)
def add_tokeniser(self, tokeniser_config: dict[str, Any]) -> None:
"""Add tokeniser metadata to GGUF file.
Writes special token IDs and tokeniser model type to enable proper
text processing during inference. Uses sensible defaults for missing
configuration values.
"""
self.writer.add_bos_token_id(tokeniser_config.get("bos_token_id", 1))
self.writer.add_eos_token_id(tokeniser_config.get("eos_token_id", 2))
self.writer.add_unk_token_id(tokeniser_config.get("unk_token_id", 0))
self.writer.add_pad_token_id(tokeniser_config.get("pad_token_id", 0))
self.writer.add_tokenizer_model(tokeniser_config.get("model_type", "llama"))
logger.info("Added tokeniser configuration")
def add_tensor(self, name: str, data: np.ndarray) -> None:
"""Add a tensor to the GGUF file.
Writes tensor data with the specified name to the file. Handles
data type conversions and validates tensor shapes.
"""
self.writer.add_tensor(name, data)
def finalise(self) -> None:
"""Write all data to file and close writer.
Completes the GGUF file creation by writing headers, key-value data,
and tensor data in the correct order. Ensures proper file closure.
"""
logger.info(f"Writing GGUF file to {self.output_path}")
self.writer.write_header_to_file()
self.writer.write_kv_data_to_file()
self.writer.write_tensors_to_file()
self.writer.close()
logger.info("GGUF file written successfully")
class GGUFConverter:
"""High-level GGUF conversion orchestrator.
Coordinates the complete conversion workflow from source models to GGUF
format, managing metadata extraction, tensor mapping, and file writing.
"""
@staticmethod
def convert_safetensors(
model_path: Path,
output_path: Path,
model_config: ModelConfig,
architecture: str,
tensor_mapper: Any,
) -> bool:
"""Convert SafeTensors model to GGUF format.
Orchestrates the conversion process including metadata setup, tensor
loading with BFloat16 support, name mapping, and tokeniser integration.
Returns:
True if conversion successful, False otherwise.
"""
logger.info(f"Converting {model_path.name} to GGUF...")
# Create writer
writer_wrapper = GGUFWriter(output_path, architecture)
# Add metadata
writer_wrapper.add_metadata(model_config, model_path.name)
# Add vision metadata if present
if model_config.vision_config:
writer_wrapper.add_vision_metadata(model_config.vision_config)
# Load and add tensors
fs = FilesystemService()
tensor_files = fs.find_safetensor_files(model_path)
logger.info(f"Found {len(tensor_files)} tensor file(s)")
tensor_count = 0
for tensor_file in tensor_files:
logger.info(f"Loading {tensor_file.name}...")
with safe_open(tensor_file, framework="pt") as f:
for tensor_name in f:
tensor_data = f.get_tensor(tensor_name)
# Convert BFloat16 to Float32
if hasattr(tensor_data, "numpy"):
if torch and tensor_data.dtype == torch.bfloat16:
tensor_data = tensor_data.float()
tensor_data = tensor_data.numpy()
# Map tensor name
gguf_name = tensor_mapper.map_tensor_name(tensor_name)
if gguf_name:
writer_wrapper.add_tensor(gguf_name, tensor_data)
tensor_count += 1
if tensor_count % 100 == 0:
logger.info(f" Processed {tensor_count} tensors...")
logger.info(f"Total tensors processed: {tensor_count}")
# Add tokeniser
try:
tok_config = ConfigParser.load_tokeniser_config(model_path)
writer_wrapper.add_tokeniser(tok_config)
logger.info("Tokeniser added")
except Exception as e:
logger.warning(f"Could not add tokeniser: {e}")
# Finalise file
writer_wrapper.finalise()
file_size = fs.get_file_size(output_path)
logger.info(f"Conversion complete! Output: {output_path} ({file_size})")
return True

@@ -0,0 +1,454 @@
"""HuggingFace operations service.
Handles all interactions with HuggingFace including model downloads,
uploads, README generation, and repository management. Uses UK English
spelling conventions throughout.
"""
from __future__ import annotations
import re
import subprocess
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from helpers.logger import logger
from helpers.models.quantisation import QuantisationType
if TYPE_CHECKING:
from helpers.models.quantisation import ModelSource, QuantisationResult
class HuggingFaceService:
"""Manages HuggingFace repository operations.
Provides methods for downloading models, uploading files, and managing
repositories. Handles authentication, error recovery, and progress tracking
for robust interaction with HuggingFace services.
"""
@staticmethod
def get_username() -> str:
"""Get authenticated HuggingFace username.
Retrieves the current user's HuggingFace username using the CLI.
Requires prior authentication via `huggingface-cli login`.
Returns:
HuggingFace username.
Raises:
RuntimeError: If not authenticated or CLI not available.
"""
try:
result = subprocess.run(
["huggingface-cli", "whoami"],
capture_output=True,
text=True,
check=True,
)
return result.stdout.strip()
except (subprocess.CalledProcessError, FileNotFoundError) as err:
msg = "Please log in to HuggingFace first: huggingface-cli login"
raise RuntimeError(msg) from err
@staticmethod
def download_model(
model_name: str, output_dir: Path, include_pattern: str | None = None
) -> None:
"""Download model from HuggingFace.
Downloads a complete model or specific files matching a pattern.
Creates the output directory if it doesn't exist. Supports filtered
downloads for efficient bandwidth usage when only certain files are needed.
"""
logger.info(f"Downloading {model_name} to {output_dir}")
cmd = [
"huggingface-cli",
"download",
model_name,
"--local-dir",
str(output_dir),
]
if include_pattern:
cmd.extend(["--include", include_pattern])
subprocess.run(cmd, check=True)
logger.info("Download complete")
@staticmethod
def upload_file(
repo_id: str,
local_path: Path,
repo_path: str | None = None,
create_repo: bool = False,
) -> None:
"""Upload a file to HuggingFace repository.
Uploads a single file to the specified repository path. Can create
the repository if it doesn't exist. Handles repository creation conflicts
gracefully by retrying without the create flag when needed.
Raises:
CalledProcessError: If upload fails.
"""
repo_path = repo_path or local_path.name
logger.info(f"Uploading {local_path.name} to {repo_id}/{repo_path}")
cmd = [
"huggingface-cli",
"upload",
repo_id,
str(local_path),
repo_path,
]
if create_repo:
cmd.append("--create")
try:
subprocess.run(cmd, check=True, capture_output=True)
logger.info(f"Uploaded {repo_path}")
except subprocess.CalledProcessError:
if create_repo:
# Repository might already exist, retry without --create
cmd = cmd[:-1] # Remove --create flag
subprocess.run(cmd, check=True)
logger.info(f"Updated {repo_path}")
else:
raise
class ReadmeGenerator:
"""Generates README files for quantised models.
Creates comprehensive README documentation including model cards,
quantisation details, and status tracking. Supports both initial
planning documentation and final result summaries.
"""
def generate(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
models_dir: Path,
output_repo: str | None = None,
) -> Path:
"""Generate README file for quantised model repository.
Creates a comprehensive README with frontmatter, quantisation table,
and original model information. Handles status tracking for planned,
processing, and completed quantisations.
Returns:
Path to generated README file.
"""
logger.info("Creating model card...")
model_dir = models_dir / model_source.model_name
readme_path = model_dir / "README.md"
# Get original README content
original_content = self._get_original_readme(model_source, model_dir)
# Generate new README
readme_content = self._generate_readme_content(
model_source, results, original_content, output_repo
)
readme_path.write_text(readme_content)
return readme_path
def _get_original_readme(self, model_source: ModelSource, model_dir: Path) -> dict[str, str]:
"""Extract original README and metadata.
Downloads or reads the original model's README for inclusion in the
quantised model documentation. Parses YAML frontmatter if present.
Returns:
Dictionary with readme content, licence, tags, and frontmatter.
"""
content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}
# Try local file first
readme_path = model_dir / "README.md"
if readme_path.exists():
content["readme"] = readme_path.read_text(encoding="utf-8")
logger.info(f"Found original README ({len(content['readme'])} characters)")
else:
# Download separately
content = self._download_readme(model_source)
# Parse frontmatter if present
if content["readme"].startswith("---\n"):
content = self._parse_frontmatter(content["readme"])
return content
def _download_readme(self, model_source: ModelSource) -> dict[str, str]:
"""Download README from HuggingFace repository.
Attempts to download just the README.md file from the source repository
for efficient documentation extraction.
Returns:
Dictionary with readme content and default metadata.
"""
content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}
with tempfile.TemporaryDirectory() as temp_dir:
try:
logger.info(f"Downloading README from {model_source.source_model}...")
subprocess.run(
[
"huggingface-cli",
"download",
model_source.source_model,
"--include",
"README.md",
"--local-dir",
temp_dir,
],
check=True,
capture_output=True,
)
readme_path = Path(temp_dir) / "README.md"
if readme_path.exists():
content["readme"] = readme_path.read_text(encoding="utf-8")
logger.info(f"Downloaded README ({len(content['readme'])} characters)")
except subprocess.CalledProcessError as e:
logger.warning(f"Failed to download README: {e}")
return content
def _parse_frontmatter(self, readme_text: str) -> dict[str, str]:
"""Parse YAML frontmatter from README.
Extracts metadata from YAML frontmatter including licence, tags,
and other model card fields.
Returns:
Dictionary with separated content and metadata.
"""
lines = readme_text.split("\n")
if lines[0] != "---":
return {
"readme": readme_text,
"licence": "apache-2.0",
"tags": "",
"frontmatter": "",
}
frontmatter_end = -1
for i, line in enumerate(lines[1:], 1):
if line == "---":
frontmatter_end = i
break
if frontmatter_end == -1:
return {
"readme": readme_text,
"licence": "apache-2.0",
"tags": "",
"frontmatter": "",
}
frontmatter = "\n".join(lines[1:frontmatter_end])
content = "\n".join(lines[frontmatter_end + 1 :])
# Extract licence
licence_match = re.search(r"^license:\s*(.+)$", frontmatter, re.MULTILINE)
licence_val = licence_match.group(1).strip().strip('"') if licence_match else "apache-2.0"
# Extract tags
tags = []
in_tags = False
for line in frontmatter.split("\n"):
if line.startswith("tags:"):
in_tags = True
continue
if in_tags:
if line.startswith("- "):
tags.append(line[2:].strip())
elif line and not line.startswith(" "):
break
return {
"readme": content,
"licence": licence_val,
"tags": ",".join(tags),
"frontmatter": frontmatter,
}
def _generate_readme_content(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
original_content: dict[str, str],
output_repo: str | None = None,
) -> str:
"""Generate complete README content with quantisation details.
Creates the full README including YAML frontmatter, quantisation status
table, and original model information.
Returns:
Complete README markdown content.
"""
# Build tags
our_tags = [
"quantised",
"gguf",
"q4_k_m",
"q4_k_l",
"q4_k_xl",
"q4_k_xxl",
"bartowski-method",
]
original_tags = original_content["tags"].split(",") if original_content["tags"] else []
all_tags = sorted(set(our_tags + original_tags))
# Build frontmatter
frontmatter = f"""---
license: {original_content["licence"]}
library_name: gguf
base_model: {model_source.source_model}
tags:
"""
for tag in all_tags:
if tag.strip():
frontmatter += f"- {tag.strip()}\n"
frontmatter += "---\n\n"
# Build main content
hf_url = f"https://huggingface.co/{model_source.source_model}"
content = f"""# {model_source.original_author}-{model_source.model_name}-GGUF
GGUF quantisations of [{model_source.source_model}]({hf_url}) using Bartowski's method.
| Quantisation | Embeddings/Output | Attention | Feed-Forward | Status |
|--------------|-------------------|-----------|--------------|--------|
"""
# Add results table
for quant_type in [
QuantisationType.Q4_K_M,
QuantisationType.Q4_K_L,
QuantisationType.Q4_K_XL,
QuantisationType.Q4_K_XXL,
]:
result = results.get(quant_type)
if not result:
result = type("Result", (), {"status": "planned", "success": False})()
layers = self._get_layers_config(quant_type)
status = self._format_status(result, model_source, quant_type, output_repo)
content += (
f"| {quant_type.value} | {layers['embeddings']} | "
f"{layers['attention']} | {layers['ffn']} | {status} |\n"
)
content += "\n---\n\n"
# Add original content
if original_content["readme"]:
content += "# Original Model Information\n\n" + original_content["readme"]
else:
content += f"## Original Model\n\nQuantisation of [{model_source.source_model}](https://huggingface.co/{model_source.source_model}).\n"
return frontmatter + content
def _get_layers_config(self, quant_type: QuantisationType) -> dict[str, str]:
"""Get layer configuration for quantisation type.
Returns layer precision specifications for the quantisation table.
Returns:
Dictionary with embeddings, attention, and ffn precision labels.
"""
configs = {
QuantisationType.Q4_K_M: {
"embeddings": "Q4_K_M",
"attention": "Q4_K_M",
"ffn": "Q4_K_M",
},
QuantisationType.Q4_K_L: {"embeddings": "Q6_K", "attention": "Q6_K", "ffn": "Q4_K_M"},
QuantisationType.Q4_K_XL: {"embeddings": "Q8_0", "attention": "Q6_K", "ffn": "Q4_K_M"},
QuantisationType.Q4_K_XXL: {"embeddings": "Q8_0", "attention": "Q8_0", "ffn": "Q4_K_M"},
}
return configs.get(
quant_type, {"embeddings": "Unknown", "attention": "Unknown", "ffn": "Unknown"}
)
def _format_status(
self,
result: QuantisationResult,
model_source: ModelSource,
quant_type: QuantisationType,
output_repo: str | None,
) -> str:
"""Format status indicator for README table.
Creates appropriate status indicator based on quantisation state
including progress indicators, file sizes, and download links.
Returns:
Formatted status string for table cell.
"""
status_map = {
"planned": "⏳ Planned",
"processing": "🔄 Processing...",
"uploading": "⬆️ Uploading...",
"failed": "❌ Failed",
}
if hasattr(result, "status") and result.status in status_map:
base_status = status_map[result.status]
if result.status == "uploading" and hasattr(result, "file_size") and result.file_size:
return f"{base_status} ({result.file_size})"
if result.status == "completed" or (hasattr(result, "success") and result.success):
return self._format_success_status(result, model_source, quant_type, output_repo)
return base_status
# Legacy support
if hasattr(result, "success") and result.success:
return self._format_success_status(result, model_source, quant_type, output_repo)
return "❌ Failed"
def _format_success_status(
self,
result: QuantisationResult,
model_source: ModelSource,
quant_type: QuantisationType,
output_repo: str | None,
) -> str:
"""Format successful quantisation status with download link.
Creates a download link if repository information is available,
otherwise shows file size.
Returns:
Formatted success status string.
"""
if not output_repo:
return (
f"{result.file_size}"
if hasattr(result, "file_size") and result.file_size
else "✅ Available"
)
filename = (
f"{model_source.original_author}-{model_source.model_name}-{quant_type.value}.gguf"
)
url = f"https://huggingface.co/{output_repo}?show_file_info={filename}"
if hasattr(result, "file_size") and result.file_size:
return f"[✅ {result.file_size}]({url})"
return f"[✅ Available]({url})"

417
helpers/services/llama_cpp.py Normal file
View file

@ -0,0 +1,417 @@
"""llama.cpp environment and operations service.
Manages llama.cpp binary discovery, environment setup, and imatrix generation.
Provides consistent interface for interacting with llama.cpp tools across
different installation methods.
"""
from __future__ import annotations
import subprocess
from pathlib import Path
from helpers.logger import logger
from helpers.models.quantisation import LlamaCppEnvironment
from helpers.services.filesystem import FilesystemService
class EnvironmentManager:
"""Manages llama.cpp environment setup and binary discovery.
Handles detection of local binaries, repository setup, and conversion
script location. Provides fallback strategies for different installation
scenarios including local builds and repository-based setups.
"""
def __init__(self, work_dir: Path) -> None:
"""Initialise EnvironmentManager."""
self.work_dir = work_dir
self.llama_cpp_dir = work_dir / "llama.cpp"
self.fs = FilesystemService()
def setup(self) -> LlamaCppEnvironment:
"""Set up llama.cpp environment with automatic detection.
Checks for local llama.cpp binaries first, then falls back to
repository-based setup if needed. Handles conversion script location,
dependency installation, and path resolution.
Returns:
Configured LlamaCppEnvironment instance.
"""
# Check for local binaries first
local_env = self._check_local_binaries()
if local_env:
return local_env
# Setup repository if needed
return self.setup_repository()
def _check_local_binaries(self) -> LlamaCppEnvironment | None:
"""Check for existing llama.cpp binaries in current directory.
Searches for quantise and CLI binaries in the current directory
and standard installation paths. Also locates conversion scripts.
Returns:
LlamaCppEnvironment if binaries found, None otherwise.
"""
quantise_bin = Path("./llama-quantize")
cli_bin = Path("./llama-cli")
if not (quantise_bin.exists() and cli_bin.exists()):
return None
logger.info("Found llama.cpp binaries in current directory")
# Check for conversion script
convert_script = self._find_convert_script()
if convert_script:
logger.info(f"Found conversion script: {convert_script}")
return LlamaCppEnvironment(
quantise_binary=quantise_bin.resolve(),
cli_binary=cli_bin.resolve(),
convert_script=convert_script,
use_repo=False,
)
logger.warning("No conversion script found in current directory")
logger.info("Will use llama.cpp repository method for conversion")
return LlamaCppEnvironment(
quantise_binary=quantise_bin.resolve(),
cli_binary=cli_bin.resolve(),
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
use_repo=True,
)
def _find_convert_script(self) -> str | None:
"""Find conversion script in current directory.
Searches for various naming conventions of the HF to GGUF
conversion script.
Returns:
Command to run conversion script, or None if not found.
"""
scripts = [
"./llama-convert-hf-to-gguf",
"python3 ./convert_hf_to_gguf.py",
"python3 ./convert-hf-to-gguf.py",
]
for script in scripts:
if script.startswith("python3"):
script_path = script.split(" ", 1)[1]
if Path(script_path).exists():
return script
elif Path(script).exists():
return script
return None
def setup_repository(self) -> LlamaCppEnvironment:
"""Setup llama.cpp repository for conversion scripts.
Clones the llama.cpp repository if not present and installs
Python dependencies for model conversion.
Returns:
LlamaCppEnvironment configured with repository paths.
"""
if not self.llama_cpp_dir.exists():
logger.info("Cloning llama.cpp for conversion script...")
subprocess.run(
[
"git",
"clone",
"https://github.com/ggerganov/llama.cpp.git",
str(self.llama_cpp_dir),
],
check=True,
)
# Install Python requirements
logger.info("Installing Python requirements...")
subprocess.run(
[
"pip3",
"install",
"-r",
"requirements.txt",
"--break-system-packages",
"--root-user-action=ignore",
],
cwd=self.llama_cpp_dir,
check=True,
)
# Install additional conversion dependencies
logger.info("Installing additional conversion dependencies...")
subprocess.run(
[
"pip3",
"install",
"transformers",
"sentencepiece",
"protobuf",
"--break-system-packages",
"--root-user-action=ignore",
],
check=True,
)
else:
logger.info("llama.cpp repository already exists")
# Use local binaries but repo conversion script
return LlamaCppEnvironment(
quantise_binary=Path("./llama-quantize").resolve(),
cli_binary=Path("./llama-cli").resolve(),
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
use_repo=False,
)
class IMatrixGenerator:
"""Handles importance matrix generation for quantisation guidance.
Generates or locates importance matrices that guide quantisation
decisions, helping preserve model quality by identifying critical
tensors requiring higher precision.
"""
def __init__(self) -> None:
"""Initialise IMatrixGenerator."""
self.fs = FilesystemService()
def generate_imatrix(
self, f16_model_path: Path, llama_env: LlamaCppEnvironment, model_dir: Path
) -> Path | None:
"""Generate importance matrix for quantisation guidance.
Searches for existing imatrix files first, provides interactive
prompts for user-supplied matrices, then generates new matrices
using calibration data if necessary.
Returns:
Path to imatrix file, or None if generation fails.
"""
imatrix_path = model_dir / "imatrix.dat"
# Check for existing imatrix
if imatrix_path.exists():
logger.info(f"Found existing imatrix: {imatrix_path.name}")
return imatrix_path
# Try user-provided imatrix
user_imatrix = self._prompt_for_user_imatrix(model_dir, imatrix_path)
if user_imatrix:
return user_imatrix
# Generate new imatrix
calibration_file = self._get_calibration_file()
if not calibration_file:
return None
return self._generate_new_imatrix(f16_model_path, llama_env, imatrix_path, calibration_file)
def _prompt_for_user_imatrix(self, model_dir: Path, imatrix_path: Path) -> Path | None:
"""Prompt user for existing imatrix file.
Returns:
Path to user-provided imatrix, or None if not available.
"""
logger.info(f"Model directory: {model_dir}")
logger.info(f"Looking for imatrix file at: {imatrix_path}")
logger.info(
"Tip: You can download pre-computed imatrix files from Bartowski's repositories!"
)
logger.info(
" Example: https://huggingface.co/bartowski/MODEL-NAME-GGUF/resolve/main/MODEL-NAME.imatrix"
)
response = (
input("\n❓ Do you have an imatrix file to place in the model directory? (y/N): ")
.strip()
.lower()
)
if response != "y":
return None
logger.info(f"Please place your imatrix.dat file in: {model_dir}")
input("⏳ Press Enter when you've placed the imatrix.dat file (or Ctrl+C to cancel)...")
if imatrix_path.exists():
file_size = self.fs.get_file_size(imatrix_path)
logger.info(f"Found imatrix file! ({file_size})")
return imatrix_path
logger.warning("No imatrix.dat file found - continuing with automatic generation")
return None
def _get_calibration_file(self) -> Path | None:
"""Get calibration data file for imatrix generation.
Returns:
Path to calibration file, or None if not found.
"""
calibration_file = Path(__file__).parent.parent.parent / "resources" / "imatrix_data.txt"
if not calibration_file.exists():
logger.warning("resources/imatrix_data.txt not found - skipping imatrix generation")
logger.info(
"Download from: https://gist.githubusercontent.com/bartowski1182/"
"eb213dccb3571f863da82e99418f81e8/raw/calibration_datav3.txt"
)
return None
return calibration_file
def _generate_new_imatrix(
self,
f16_model_path: Path,
llama_env: LlamaCppEnvironment,
imatrix_path: Path,
calibration_file: Path,
) -> Path | None:
"""Generate new importance matrix using calibration data.
Returns:
Path to generated imatrix, or None if generation fails.
"""
logger.info("Generating importance matrix (this may take 1-4 hours for large models)...")
logger.info(f"Model: {f16_model_path.name}")
logger.info(f"Calibration: {calibration_file}")
logger.info(f"Output: {imatrix_path}")
# Find imatrix binary
imatrix_binary = self._find_imatrix_binary(llama_env)
if not imatrix_binary:
logger.warning("llama-imatrix binary not found - skipping imatrix generation")
logger.info("Make sure llama-imatrix is in the same directory as llama-quantize")
return None
# Build and execute command
cmd = self._build_imatrix_command(
imatrix_binary, f16_model_path, calibration_file, imatrix_path
)
return self._execute_imatrix_generation(cmd, imatrix_path)
def _build_imatrix_command(
self, binary: Path, model_path: Path, calibration_file: Path, output_path: Path
) -> list[str]:
"""Build imatrix generation command.
Returns:
Command arguments as list.
"""
return [
str(binary),
"-m",
str(model_path),
"-f",
str(calibration_file),
"-o",
str(output_path),
"--process-output",
"--output-frequency",
"10",
"--save-frequency",
"50",
"-t",
"8",
"-c",
"2048",
"-b",
"512",
]
def _execute_imatrix_generation(self, cmd: list[str], imatrix_path: Path) -> Path | None:
"""Execute imatrix generation command with real-time output.
Returns:
Path to generated imatrix file, or None if generation fails.
"""
logger.info(f"Running: {' '.join(cmd)}")
logger.info("Starting imatrix generation... (progress will be shown)")
try:
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1,
)
self._stream_imatrix_output(process)
return_code = process.poll()
if return_code == 0:
return self._validate_imatrix_output(imatrix_path)
except KeyboardInterrupt:
logger.info("imatrix generation cancelled by user")
process.terminate()
return None
except Exception as e:
logger.error(f"imatrix generation failed with exception: {e}")
return None
else:
logger.error(f"imatrix generation failed with return code {return_code}")
return None
def _stream_imatrix_output(self, process: subprocess.Popen) -> None:
"""Stream imatrix generation output in real-time."""
while True:
if process.stdout is not None:
output = process.stdout.readline()
else:
break
if not output and process.poll() is not None:
break
if output:
line = output.strip()
if self._should_log_imatrix_line(line):
logger.info(line)
def _should_log_imatrix_line(self, line: str) -> bool:
"""Determine if imatrix output line should be logged.
Returns:
True if line should be logged, False otherwise.
"""
keywords = ["Computing imatrix", "perplexity:", "save_imatrix", "entries =", "ETA"]
return any(keyword in line for keyword in keywords) or line.startswith("[")
def _validate_imatrix_output(self, imatrix_path: Path) -> Path | None:
"""Validate generated imatrix file.
Returns:
Path to imatrix if valid, None otherwise.
"""
if imatrix_path.exists():
file_size = self.fs.get_file_size(imatrix_path)
logger.info(f"imatrix generation successful! ({file_size})")
return imatrix_path
logger.error("imatrix generation completed but file not found")
return None
def _find_imatrix_binary(self, llama_env: LlamaCppEnvironment) -> Path | None:
"""Find llama-imatrix binary in common locations.
Searches for the imatrix binary in the current directory and
standard installation paths.
Returns:
Path to imatrix binary, or None if not found.
"""
candidates = [
Path("./llama-imatrix"),
llama_env.quantise_binary.parent / "llama-imatrix",
Path("/usr/local/bin/llama-imatrix"),
Path("/usr/bin/llama-imatrix"),
]
for candidate in candidates:
if candidate.exists() and candidate.is_file():
return candidate
return None
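
For reference, the argument list assembled by _build_imatrix_command corresponds to a llama-imatrix invocation along the following lines; a small sketch that builds and prints it (every path here is an invented placeholder):

from pathlib import Path

# Mirrors _build_imatrix_command above with its fixed generation settings.
binary = Path("./llama-imatrix")
model = Path("./models/example/example-f16.gguf")
calibration = Path("./resources/imatrix_data.txt")
output = Path("./models/example/imatrix.dat")

cmd = [
    str(binary),
    "-m", str(model),
    "-f", str(calibration),
    "-o", str(output),
    "--process-output",
    "--output-frequency", "10",
    "--save-frequency", "50",
    "-t", "8",
    "-c", "2048",
    "-b", "512",
]
print(" ".join(cmd))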

397
helpers/services/orchestrator.py Normal file
View file

@ -0,0 +1,397 @@
"""Quantisation orchestration service.
High-level orchestration of the complete quantisation workflow from model
acquisition through processing to upload. Manages parallel processing,
status tracking, and cleanup operations for efficient resource utilisation.
"""
from __future__ import annotations
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from helpers.config.quantisation_configs import QUANTISATION_CONFIGS, SUPPORTED_QUANTISATION_TYPES
from helpers.logger import logger
from helpers.models.quantisation import (
ModelSource,
QuantisationContext,
QuantisationResult,
QuantisationType,
)
from helpers.services.huggingface import ReadmeGenerator
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator
from helpers.services.quantisation import HuggingFaceUploader, ModelManager, QuantisationEngine
from helpers.utils.tensor_mapping import URLParser
@dataclass(slots=True)
class QuantisationOrchestrator:
"""Orchestrates the complete quantisation workflow.
Uses dataclass with slots for efficient memory usage and dependency injection
for modular service interaction following SOLID principles.
"""
work_dir: Path = field(default_factory=lambda: Path.cwd() / "quantisation_work")
use_imatrix: bool = True
imatrix_base: str = "Q4_K_M"
no_upload: bool = False
# Service dependencies with factory defaults
url_parser: URLParser = field(default_factory=URLParser)
quantisation_engine: QuantisationEngine = field(default_factory=QuantisationEngine)
imatrix_generator: IMatrixGenerator = field(default_factory=IMatrixGenerator)
readme_generator: ReadmeGenerator = field(default_factory=ReadmeGenerator)
uploader: HuggingFaceUploader = field(default_factory=HuggingFaceUploader)
# Computed properties
models_dir: Path = field(init=False)
environment_manager: EnvironmentManager = field(init=False)
model_manager: ModelManager = field(init=False)
def __post_init__(self) -> None:
"""Initialise computed properties after dataclass construction."""
self.models_dir = self.work_dir / "models"
self.environment_manager = EnvironmentManager(self.work_dir)
self.model_manager = ModelManager(self.models_dir, self.environment_manager)
def quantise(self, url: str) -> dict[QuantisationType, QuantisationResult]:
"""Main quantisation workflow orchestrating model processing from URL to upload.
Returns:
dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
"""
logger.info("Starting Bartowski quantisation process...")
# Setup and preparation
model_source, llama_env, f16_model_path, imatrix_path, output_repo = (
self._setup_environment(url)
)
# Create initial repository
self._create_initial_repository(model_source, output_repo)
# Execute all quantisations
results = self._execute_quantisations(
model_source, llama_env, f16_model_path, imatrix_path, output_repo
)
# Cleanup
self._cleanup_files(f16_model_path, model_source)
self._print_completion_summary(model_source, results, output_repo)
return results
def _setup_environment(self, url: str) -> tuple[ModelSource, Any, Path, Path | None, str]:
"""Setup environment and prepare model for quantisation.
Returns:
Tuple of (model_source, llama_env, f16_model_path, imatrix_path, output_repo).
"""
model_source = self.url_parser.parse(url)
self._print_model_info(model_source)
self.models_dir.mkdir(parents=True, exist_ok=True)
llama_env = self.environment_manager.setup()
f16_model_path = self.model_manager.prepare_model(model_source, llama_env)
imatrix_path = None
if self.use_imatrix:
logger.info("Generating importance matrix (imatrix)...")
imatrix_path = self.imatrix_generator.generate_imatrix(
f16_model_path, llama_env, self.models_dir / model_source.model_name
)
output_repo = (
f"{self.uploader.get_username()}/"
f"{model_source.original_author}-{model_source.model_name}-GGUF"
)
return model_source, llama_env, f16_model_path, imatrix_path, output_repo
def _create_initial_repository(self, model_source: ModelSource, output_repo: str) -> None:
"""Create initial repository with planned quantisations."""
logger.info("Creating initial README with planned quantisations...")
planned_results = {
qt: QuantisationResult(quantisation_type=qt, success=False, status="planned")
for qt in SUPPORTED_QUANTISATION_TYPES
}
readme_path = self.readme_generator.generate(
model_source, planned_results, self.models_dir, output_repo
)
if not self.no_upload:
logger.info("Creating repository with planned quantisations...")
self.uploader.upload_readme(output_repo, readme_path)
else:
logger.info("Skipping repository creation (--no-upload specified)")
def _execute_quantisations(
self,
model_source: ModelSource,
llama_env: Any,
f16_model_path: Path,
imatrix_path: Path | None,
output_repo: str,
) -> dict[QuantisationType, QuantisationResult]:
"""Execute all quantisation types with parallel uploads.
Returns:
dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
"""
results: dict[QuantisationType, QuantisationResult] = {}
upload_futures: list[Future[None]] = []
with ThreadPoolExecutor(max_workers=1, thread_name_prefix="uploader") as upload_executor:
for quant_type in SUPPORTED_QUANTISATION_TYPES:
result = self._process_single_quantisation(
quant_type,
model_source,
llama_env,
f16_model_path,
imatrix_path,
output_repo,
results,
upload_executor,
upload_futures,
)
results[quant_type] = result
self._wait_for_uploads(upload_futures)
return results
def _process_single_quantisation(
self,
quant_type: QuantisationType,
model_source: ModelSource,
llama_env: Any,
f16_model_path: Path,
imatrix_path: Path | None,
output_repo: str,
results: dict[QuantisationType, QuantisationResult],
upload_executor: ThreadPoolExecutor,
upload_futures: list,
) -> QuantisationResult:
"""Process a single quantisation type.
Returns:
QuantisationResult: Result of the quantisation attempt.
"""
try:
logger.info(f"Starting {quant_type.value} quantisation...")
config = QUANTISATION_CONFIGS[quant_type]
# Update status to processing
result = QuantisationResult(quantisation_type=quant_type, success=False)
result.status = "processing"
results[quant_type] = result
self._update_readme_status(model_source, results, output_repo)
# Perform quantisation
context = QuantisationContext(
f16_model_path=f16_model_path,
model_source=model_source,
config=config,
llama_env=llama_env,
models_dir=self.models_dir,
imatrix_path=imatrix_path,
base_quant=self.imatrix_base,
)
result = self.quantisation_engine.quantise(context)
self._handle_quantisation_result(
result,
quant_type,
model_source,
results,
output_repo,
upload_executor,
upload_futures,
)
except Exception as e:
return self._handle_quantisation_error(
e, quant_type, model_source, results, output_repo
)
else:
return result
def _handle_quantisation_result(
self,
result: QuantisationResult,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
upload_executor: ThreadPoolExecutor,
upload_futures: list,
) -> None:
"""Handle successful or failed quantisation result."""
if result.success and result.file_path:
quant_str = getattr(result.quantisation_type, "value", result.quantisation_type)
logger.info(f"Starting parallel upload of {quant_str}...")
upload_future = upload_executor.submit(
self._upload_and_cleanup,
output_repo,
result.file_path,
quant_type,
model_source,
results,
)
upload_futures.append(upload_future)
result.file_path = None # Mark as being uploaded
result.status = "uploading"
else:
result.status = "failed"
self._update_readme_status(model_source, results, output_repo)
def _handle_quantisation_error(
self,
error: Exception,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> QuantisationResult:
"""Handle quantisation processing error.
Returns:
QuantisationResult: Failed quantisation result with error information.
"""
logger.error(f"Error processing {quant_type.value}: {error}")
result = QuantisationResult(quantisation_type=quant_type, success=False)
result.status = "failed"
result.error_message = str(error)
try:
self._update_readme_status(model_source, results, output_repo)
except Exception as readme_error:
logger.error(f"Failed to update README after error: {readme_error}")
return result
def _update_readme_status(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> None:
"""Update README with current quantisation status."""
if not self.no_upload:
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
def _wait_for_uploads(self, upload_futures: list) -> None:
"""Wait for all parallel uploads to complete."""
logger.info("Waiting for any remaining uploads to complete...")
for future in upload_futures:
try:
future.result(timeout=300) # 5 minute timeout per upload
except Exception as e:
logger.warning(f"Upload error: {e}")
def _cleanup_files(self, f16_model_path: Path, model_source: ModelSource) -> None:
"""Clean up temporary files after processing."""
if f16_model_path.exists():
logger.info(f"Removing F16 model {f16_model_path.name} to save disk space...")
f16_model_path.unlink()
if not model_source.is_gguf_repo:
self._cleanup_original_model(model_source)
def _cleanup_original_model(self, model_source: ModelSource) -> None:
"""Clean up original safetensors/PyTorch files after successful conversion."""
model_dir = self.models_dir / model_source.model_name
pytorch_files = list(model_dir.glob("pytorch_model*.bin"))
if pytorch_files:
logger.info(f"Removing {len(pytorch_files)} PyTorch model files to save disk space...")
for file in pytorch_files:
file.unlink()
logger.info("Keeping config files, tokeniser, and metadata for reference")
def _upload_and_cleanup(
self,
output_repo: str,
file_path: Path,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
) -> None:
"""Upload file and clean up (runs in background thread)."""
try:
logger.info(f"[PARALLEL] Uploading {quant_type}...")
self.uploader.upload_model_file(output_repo, file_path)
logger.info(f"[PARALLEL] Removing {file_path.name} to save disk space...")
file_path.unlink()
results[quant_type].status = "completed"
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
logger.info(f"[PARALLEL] {quant_type} upload and cleanup complete")
except Exception as e:
logger.error(f"[PARALLEL] Failed to upload {quant_type}: {e}")
results[quant_type].status = "failed"
results[quant_type].error_message = str(e)
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
raise
def _print_model_info(self, model_source: ModelSource) -> None:
"""Print model information."""
logger.info(f"Source URL: {model_source.url}")
logger.info(f"Source model: {model_source.source_model}")
logger.info(f"Original author: {model_source.original_author}")
logger.info(f"Model name: {model_source.model_name}")
logger.info(f"Your HF username: {self.uploader.get_username()}")
logger.info(f"Working directory: {self.work_dir}")
def _print_completion_summary(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> None:
"""Print completion summary."""
successful_results = [r for r in results.values() if r.success]
if successful_results:
logger.info("Complete! Your quantised models are available at:")
logger.info(f" https://huggingface.co/{output_repo}")
logger.info("Model info:")
logger.info(f" - Source URL: {model_source.url}")
logger.info(f" - Original: {model_source.source_model}")
logger.info(
" - Method: "
f"{'Direct GGUF download' if model_source.is_gguf_repo else 'HF model conversion'}"
)
logger.info(f" - Quantised: {output_repo}")
for result in successful_results:
if result.file_size:
filename = (
f"{model_source.original_author}-{model_source.model_name}-"
f"{result.quantisation_type}.gguf"
)
logger.info(f" - {result.quantisation_type}: {filename} ({result.file_size})")
else:
logger.error(
"All quantisations failed - repository created with documentation "
"but no model files"
)
logger.error(f" Repository: https://huggingface.co/{output_repo}")

486
helpers/services/quantisation.py Normal file
View file

@ -0,0 +1,486 @@
"""Quantisation operations service.
Provides modular quantisation engine, model management, and upload capabilities
for GGUF model processing. Consolidates quantisation logic from various tools
into reusable components following SOLID principles.
"""
from __future__ import annotations
import shutil
import subprocess
from typing import TYPE_CHECKING
from helpers.logger import logger
from helpers.models.quantisation import (
ModelSource,
QuantisationContext,
QuantisationResult,
QuantisationType,
)
from helpers.services.filesystem import FilesystemService
if TYPE_CHECKING:
from pathlib import Path
from helpers.models.quantisation import LlamaCppEnvironment
from helpers.services.llama_cpp import EnvironmentManager
class QuantisationEngine:
"""Handles the actual quantisation process with configurable methods.
Provides flexible quantisation execution supporting multiple tensor
precision configurations, importance matrices, and fallback strategies.
Encapsulates llama-quantize binary interactions with real-time output.
"""
def __init__(self) -> None:
"""Initialise quantisation engine."""
self.fs = FilesystemService()
def quantise(self, context: QuantisationContext) -> QuantisationResult:
"""Perform quantisation using the specified configuration.
Executes quantisation with primary and fallback methods, handling
tensor-specific precision overrides and importance matrix guidance.
Returns:
QuantisationResult with success status and file information.
"""
logger.info(
f"⚙️ Creating {context.config.name} quantisation ({context.config.description})..."
)
output_path = context.get_output_path()
logger.info(f"🎯 Attempting {context.config.name} quantisation...")
logger.info(f"📝 Source: {context.f16_model_path}")
logger.info(f"📝 Target: {output_path}")
# Try primary method
if self._try_quantisation_method(
context, output_path, context.config.tensor_types, "method 1"
):
return self._create_success_result(context.config.name, output_path, "method 1")
# Try fallback methods
for i, fallback_method in enumerate(context.config.fallback_methods, 2):
method_name = f"method {i}"
if self._try_quantisation_method(context, output_path, fallback_method, method_name):
return self._create_success_result(context.config.name, output_path, method_name)
logger.error("All %s quantisation methods failed", context.config.name)
return QuantisationResult(
quantisation_type=QuantisationType(context.config.name),
success=False,
error_message="All quantisation methods failed",
)
def _try_quantisation_method(
self,
context: QuantisationContext,
output_path: Path,
tensor_config: dict[str, str],
method_name: str,
) -> bool:
"""Try a specific quantisation method with real-time output.
Builds and executes llama-quantize command with appropriate parameters,
streaming output for progress monitoring.
Returns:
True if quantisation successful, False otherwise.
"""
logger.info(f"🔍 Trying {method_name}...")
cmd = self._build_quantisation_command(context, output_path, tensor_config)
return self._execute_quantisation_command(cmd, method_name)
def _build_quantisation_command(
self, context: QuantisationContext, output_path: Path, tensor_config: dict[str, str]
) -> list[str]:
"""Build quantisation command with all required parameters.
Returns:
List of command arguments.
"""
cmd = [str(context.llama_env.quantise_binary)]
# Add imatrix if available
if context.imatrix_path and context.imatrix_path.exists():
cmd.extend(["--imatrix", str(context.imatrix_path)])
logger.info(f"🧮 Using imatrix: {context.imatrix_path.name}")
# Add tensor type arguments
self._add_tensor_type_arguments(cmd, tensor_config)
cmd.extend([str(context.f16_model_path), str(output_path), context.base_quant])
return cmd
def _add_tensor_type_arguments(self, cmd: list[str], tensor_config: dict[str, str]) -> None:
"""Add tensor type arguments to command."""
if not tensor_config:
return
for tensor_name, quant_type in tensor_config.items():
if tensor_name.startswith(("token-embedding-type", "output-tensor-type")):
cmd.extend([f"--{tensor_name}", quant_type])
else:
cmd.extend(["--tensor-type", f"{tensor_name}={quant_type}"])
def _execute_quantisation_command(self, cmd: list[str], method_name: str) -> bool:
"""Execute quantisation command with real-time output.
Returns:
True if quantisation successful, False otherwise.
"""
logger.info(f"💻 Running: {' '.join(cmd)}")
logger.info("⏳ Quantisation in progress... (this may take several minutes)")
try:
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1,
)
self._stream_quantisation_output(process)
return_code = process.poll()
if return_code == 0:
logger.info(f"{method_name} quantisation successful!")
return True
except Exception as e:
logger.info(f"{method_name} failed with exception: {e}")
return False
else:
logger.info(f"{method_name} failed with return code {return_code}")
return False
def _stream_quantisation_output(self, process: subprocess.Popen) -> None:
"""Stream quantisation output in real-time."""
while True:
if process.stdout is not None:
output = process.stdout.readline()
else:
break
if not output and process.poll() is not None:
break
if output:
logger.info(f"📊 {output.strip()}")
def _create_success_result(
self, quant_type: str, output_path: Path, method_used: str
) -> QuantisationResult:
"""Create successful quantisation result with file metadata.
Returns:
QuantisationResult with file path and size information.
"""
file_size = self.fs.get_file_size(output_path)
return QuantisationResult(
quantisation_type=QuantisationType(quant_type),
success=True,
file_path=output_path,
file_size=file_size,
method_used=method_used,
)
class ModelManager:
"""Handles model downloading and preparation for quantisation.
Manages both GGUF repository downloads and HuggingFace model conversions,
providing unified interface for model acquisition and preparation.
"""
def __init__(self, models_dir: Path, environment_manager: EnvironmentManager) -> None:
"""Initialise model manager with storage and environment configuration.
Sets up model storage directory and links to environment manager for
conversion script access and llama.cpp tool discovery.
"""
self.models_dir = models_dir
self.environment_manager = environment_manager
self.fs = FilesystemService()
def prepare_model(self, model_source: ModelSource, llama_env: LlamaCppEnvironment) -> Path:
"""Prepare model for quantisation and return F16 model path.
Handles both GGUF repository downloads and regular HuggingFace model
conversion workflows with automatic format detection.
Returns:
Path to F16 GGUF model ready for quantisation.
"""
model_dir = self.models_dir / model_source.model_name
if model_source.is_gguf_repo:
return self._handle_gguf_repo(model_source, model_dir)
return self._handle_regular_repo(model_source, model_dir, llama_env)
def _handle_gguf_repo(self, model_source: ModelSource, model_dir: Path) -> Path:
"""Handle GGUF repository download with pattern matching.
Downloads GGUF files matching specified patterns, prioritising
multi-part files and F16 variants.
Returns:
Path to downloaded or existing GGUF file.
"""
logger.info(f"⬇️ Downloading GGUF file from repository: {model_source.source_model}")
logger.info(f"🔍 Looking for file pattern: *{model_source.gguf_file_pattern}*")
f16_model = model_dir / f"{model_source.model_name}-f16.gguf"
if f16_model.exists():
logger.info(f"✅ Found existing F16 file: {f16_model.name}")
return f16_model
# Check for existing GGUF files
model_dir.mkdir(parents=True, exist_ok=True)
existing_gguf = self.fs.find_gguf_files(model_dir)
if existing_gguf:
logger.info(f"✅ Found existing GGUF file: {existing_gguf[0].name}")
return existing_gguf[0]
# Download with patterns
downloaded_file = self._download_gguf_with_patterns(
model_source.source_model, model_source.gguf_file_pattern, model_dir
)
if downloaded_file:
# Handle multi-part files
if "00001-of-" in downloaded_file.name:
return downloaded_file
if "-00002-of-" in downloaded_file.name or "-00003-of-" in downloaded_file.name:
base_name = downloaded_file.name.replace("-00002-of-", "-00001-of-").replace(
"-00003-of-", "-00001-of-"
)
first_part = downloaded_file.parent / base_name
if first_part.exists():
logger.info(f"🔄 Using first part: {first_part.name}")
return first_part
# Rename single file to standard name
downloaded_file.rename(f16_model)
return f16_model
# Fallback to regular conversion
logger.info("💡 Falling back to downloading full repository and converting...")
return self._handle_regular_repo(
ModelSource(**{**model_source.model_dump(), "is_gguf_repo": False}),
model_dir,
None,
)
def _download_gguf_with_patterns(
self, source_model: str, pattern: str | None, model_dir: Path
) -> Path | None:
"""Download GGUF file using various pattern strategies.
Tries multiple pattern variations to find and download appropriate
GGUF files, handling timeouts and temporary directories.
Returns:
Path to downloaded file, or None if all patterns fail.
"""
if pattern:
patterns = [
f"*{pattern}*",
f"*{pattern.lower()}*",
f"*{pattern.upper()}*",
"*f16*",
"*F16*",
"*fp16*",
]
else:
patterns = ["*f16*", "*F16*", "*fp16*"]
temp_dir = model_dir / "gguf_temp"
for search_pattern in patterns:
logger.info(f"🔍 Trying pattern: {search_pattern}")
temp_dir.mkdir(exist_ok=True)
try:
subprocess.run(
[
"timeout",
"300",
"huggingface-cli",
"download",
source_model,
"--include",
search_pattern,
"--local-dir",
str(temp_dir),
],
check=True,
capture_output=True,
)
# Find downloaded GGUF files
gguf_files = self.fs.find_gguf_files(temp_dir, pattern)
if gguf_files:
found_file = gguf_files[0]
logger.info(f"✅ Found GGUF file: {found_file.name}")
# Move to parent directory
final_path = model_dir / found_file.name
shutil.move(str(found_file), str(final_path))
shutil.rmtree(temp_dir)
return final_path
except subprocess.CalledProcessError:
logger.info(f"⚠️ Pattern {search_pattern} failed or timed out")
continue
finally:
if temp_dir.exists():
shutil.rmtree(temp_dir, ignore_errors=True)
return None
def _handle_regular_repo(
self,
model_source: ModelSource,
model_dir: Path,
llama_env: LlamaCppEnvironment | None,
) -> Path:
"""Handle regular HuggingFace repository conversion.
Downloads full model repository and converts to F16 GGUF format
using llama.cpp conversion scripts.
Returns:
Path to converted F16 GGUF model.
"""
logger.info(f"⬇️ Downloading source model: {model_source.source_model}")
if not model_dir.exists():
subprocess.run(
[
"huggingface-cli",
"download",
model_source.source_model,
"--local-dir",
str(model_dir),
],
check=True,
)
else:
logger.info("✅ Model already downloaded")
logger.info("🔄 Converting to GGUF F16 format...")
f16_model = model_dir / f"{model_source.model_name}-f16.gguf"
if not f16_model.exists():
if not llama_env:
llama_env = self.environment_manager.setup()
# Ensure conversion script is available
if llama_env.use_repo or not self.environment_manager.llama_cpp_dir.exists():
logger.info("Getting conversion script from llama.cpp repository...")
llama_env = self.environment_manager.setup_repository()
subprocess.run(
[
*llama_env.convert_script.split(),
str(model_dir),
"--outtype",
"f16",
"--outfile",
str(f16_model),
],
check=True,
)
else:
logger.info("✅ F16 model already exists")
return f16_model
class HuggingFaceUploader:
"""Handles uploading models and documentation to HuggingFace.
Provides methods for repository creation, file uploads, and README
updates with proper error handling and retry logic.
"""
@staticmethod
def get_username() -> str:
"""Get authenticated HuggingFace username.
Returns:
HuggingFace username from CLI authentication.
Raises:
RuntimeError: If not authenticated.
"""
try:
result = subprocess.run(
["huggingface-cli", "whoami"],
capture_output=True,
text=True,
check=True,
)
return result.stdout.strip()
except (subprocess.CalledProcessError, FileNotFoundError) as err:
msg = "Please log in to HuggingFace first: huggingface-cli login"
raise RuntimeError(msg) from err
def upload_readme(self, output_repo: str, readme_path: Path) -> None:
"""Upload or update README file to repository.
Creates repository if needed, handles existing repository updates.
"""
logger.info("Uploading README...")
try:
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(readme_path),
"README.md",
"--create",
],
check=True,
capture_output=True,
)
logger.info("README uploaded")
except subprocess.CalledProcessError:
# Repository exists, update without --create
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(readme_path),
"README.md",
],
check=True,
)
logger.info("README updated")
def upload_model_file(self, output_repo: str, model_path: Path) -> None:
"""Upload model file to repository.
Uploads GGUF model file to specified repository path.
"""
logger.info(f"Uploading {model_path.name}...")
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(model_path),
model_path.name,
],
check=True,
)
logger.info(f"{model_path.name} uploaded")

16
helpers/utils/__init__.py Normal file
View file

@ -0,0 +1,16 @@
"""Utility functions for llm-gguf-tools.
Provides low-level utilities for tensor mapping, configuration parsing,
and other common operations. Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from helpers.utils.config_parser import ConfigParser
from helpers.utils.tensor_mapping import TensorMapper, URLParser
__all__ = [
"ConfigParser",
"TensorMapper",
"URLParser",
]

171
helpers/utils/config_parser.py Normal file
View file

@ -0,0 +1,171 @@
"""Configuration parsing utilities.
Provides utilities for parsing model configurations, inferring parameters,
and handling architecture-specific settings. Uses UK English spelling
conventions throughout.
"""
from __future__ import annotations
from typing import TYPE_CHECKING, Any
from helpers.models.conversion import GGUFParameters, ModelConfig, VisionConfig
from helpers.services.filesystem import FilesystemService
if TYPE_CHECKING:
from pathlib import Path
class ConfigParser:
"""Parses and transforms model configuration files.
Handles loading of HuggingFace config.json files, parameter inference,
and conversion to GGUF-compatible formats. Provides sensible defaults
for missing values and architecture-specific handling.
"""
def __init__(self) -> None:
"""Initialise ConfigParser."""
self.fs = FilesystemService()
def load_model_config(self, model_path: Path) -> ModelConfig:
"""Load model configuration from config.json file.
Reads the standard HuggingFace config.json file and parses it into
a structured ModelConfig instance with proper type validation. Handles
vision model configurations and provides sensible defaults for missing values.
Returns:
Parsed ModelConfig instance.
"""
config_file = model_path / "config.json"
raw_config = self.fs.load_json_config(config_file)
# Parse vision config if present
vision_config = None
if "vision_config" in raw_config:
vision_config = VisionConfig(**raw_config["vision_config"])
# Create ModelConfig with parsed values
return ModelConfig(
architectures=raw_config.get("architectures", ["Unknown"]),
model_type=raw_config.get("model_type", "unknown"),
vocab_size=raw_config.get("vocab_size", 32000),
max_position_embeddings=raw_config.get("max_position_embeddings", 2048),
hidden_size=raw_config.get("hidden_size", 4096),
num_hidden_layers=raw_config.get("num_hidden_layers", 32),
intermediate_size=raw_config.get("intermediate_size", 11008),
num_attention_heads=raw_config.get("num_attention_heads", 32),
num_key_value_heads=raw_config.get("num_key_value_heads"),
rope_theta=raw_config.get("rope_theta", 10000.0),
rope_scaling=raw_config.get("rope_scaling"),
rms_norm_eps=raw_config.get("rms_norm_eps", 1e-5),
vision_config=vision_config,
)
def infer_gguf_parameters(self, config: ModelConfig) -> GGUFParameters:
"""Infer GGUF parameters from model configuration.
Translates HuggingFace model configuration to GGUF parameter format,
providing sensible defaults for missing values and handling various
architecture conventions.
Args:
config: Parsed ModelConfig instance.
Returns:
GGUFParameters with inferred values.
"""
# Calculate derived parameters
num_heads = config.num_attention_heads
embedding_length = config.hidden_size
rope_dimension_count = embedding_length // num_heads
# Handle KV heads (for GQA models)
num_kv_heads = config.num_key_value_heads or num_heads
# Create GGUFParameters using dict with aliases
params_dict = {
"vocab_size": config.vocab_size,
"context_length": config.max_position_embeddings,
"embedding_length": embedding_length,
"block_count": config.num_hidden_layers,
"feed_forward_length": config.intermediate_size,
"attention.head_count": num_heads,
"attention.head_count_kv": num_kv_heads,
"attention.layer_norm_rms_epsilon": config.rms_norm_eps,
"rope.freq_base": config.rope_theta,
"rope.dimension_count": rope_dimension_count,
}
params = GGUFParameters.model_validate(params_dict)
# Add RoPE scaling if present
if config.rope_scaling:
params.rope_scaling_type = config.rope_scaling.get("type", "linear")
params.rope_scaling_factor = config.rope_scaling.get("factor", 1.0)
return params
@staticmethod
def get_architecture_mapping(architecture: str) -> str:
"""Map architecture names to known GGUF architectures.
Provides fallback mappings for architectures not directly supported
by GGUF, mapping them to similar known architectures.
Args:
architecture: Original architecture name from config.
Returns:
GGUF-compatible architecture name.
"""
# Architecture mappings to known GGUF types
mappings = {
"DotsOCRForCausalLM": "qwen2", # Similar architecture
"GptOssForCausalLM": "llama", # Use llama as fallback
"MistralForCausalLM": "llama", # Mistral is llama-like
"Qwen2ForCausalLM": "qwen2",
"LlamaForCausalLM": "llama",
"GemmaForCausalLM": "gemma",
"Phi3ForCausalLM": "phi3",
# Add more mappings as needed
}
return mappings.get(architecture, "llama") # Default to llama
@staticmethod
def load_tokeniser_config(model_path: Path) -> dict[str, Any]:
"""Load tokeniser configuration from model directory.
Reads tokenizer_config.json to extract special token IDs and
other tokenisation parameters.
Args:
model_path: Path to model directory.
Returns:
Tokeniser configuration dictionary.
"""
fs = FilesystemService()
tokeniser_config_path = model_path / "tokenizer_config.json"
if not tokeniser_config_path.exists():
# Return defaults if no config found
return {
"bos_token_id": 1,
"eos_token_id": 2,
"unk_token_id": 0,
"pad_token_id": 0,
}
config = fs.load_json_config(tokeniser_config_path)
# Extract token IDs with defaults
return {
"bos_token_id": config.get("bos_token_id", 1),
"eos_token_id": config.get("eos_token_id", 2),
"unk_token_id": config.get("unk_token_id", 0),
"pad_token_id": config.get("pad_token_id", 0),
"model_type": config.get("model_type", "llama"),
}
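
A worked example of the derivation in infer_gguf_parameters, using invented Llama-style values:

# rope.dimension_count is the per-head width; KV heads fall back to the
# full head count when num_key_value_heads is absent (no GQA).
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8  # grouped-query attention

rope_dimension_count = hidden_size // num_attention_heads  # 4096 // 32 = 128
head_count_kv = num_key_value_heads or num_attention_heads  # 8, or 32 if None
print(rope_dimension_count, head_count_kv)  # 128 8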

196
helpers/utils/tensor_mapping.py Normal file
View file

@ -0,0 +1,196 @@
"""Tensor mapping and URL parsing utilities.
Provides utilities for mapping tensor names between different formats,
parsing model URLs, and handling architecture-specific conversions.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
import re
from typing import ClassVar
from helpers.models.quantisation import ModelSource, URLType
class TensorMapper:
"""Maps tensor names between HuggingFace and GGUF conventions.
Provides flexible tensor name translation supporting direct mappings,
layer-aware transformations, and architecture-specific overrides.
Handles both simple renames and complex pattern-based conversions.
"""
# Common direct mappings across architectures
DIRECT_MAPPINGS: ClassVar[dict[str, str]] = {
"model.embed_tokens.weight": "token_embd.weight",
"model.norm.weight": "output_norm.weight",
"lm_head.weight": "output.weight",
}
# Layer component patterns for transformer blocks
LAYER_PATTERNS: ClassVar[dict[str, str]] = {
"self_attn.q_proj.weight": "attn_q.weight",
"self_attn.q_proj.bias": "attn_q.bias",
"self_attn.k_proj.weight": "attn_k.weight",
"self_attn.k_proj.bias": "attn_k.bias",
"self_attn.v_proj.weight": "attn_v.weight",
"self_attn.v_proj.bias": "attn_v.bias",
"self_attn.o_proj": "attn_output.weight",
"mlp.gate_proj": "ffn_gate.weight",
"mlp.up_proj": "ffn_up.weight",
"mlp.down_proj": "ffn_down.weight",
"input_layernorm": "attn_norm.weight",
"post_attention_layernorm": "ffn_norm.weight",
}
@classmethod
def map_tensor_name(cls, original_name: str) -> str | None:
"""Map original tensor name to GGUF format.
Translates HuggingFace tensor naming to GGUF format, handling embeddings,
attention layers, feed-forward networks, and normalisation layers. Uses
layer-aware mapping for transformer blocks whilst maintaining consistency
across different model architectures.
Returns:
GGUF tensor name, or None if unmappable.
"""
# Check direct mappings first
if original_name in cls.DIRECT_MAPPINGS:
return cls.DIRECT_MAPPINGS[original_name]
# Handle layer-specific tensors
if ".layers." in original_name:
return cls._map_layer_tensor(original_name)
# Return None for unmapped tensors
return None
@classmethod
def _map_layer_tensor(cls, tensor_name: str) -> str | None:
"""Map layer-specific tensor names.
Handles tensors within transformer layers, extracting layer indices
and mapping component names to GGUF conventions.
Args:
tensor_name: Layer tensor name containing .layers.N. pattern.
Returns:
Mapped GGUF tensor name, or None if unmappable.
"""
# Extract layer number
parts = tensor_name.split(".")
layer_idx = None
for i, part in enumerate(parts):
if part == "layers" and i + 1 < len(parts):
layer_idx = parts[i + 1]
break
if layer_idx is None:
return None
# Check each pattern
for pattern, replacement in cls.LAYER_PATTERNS.items():
if pattern in tensor_name:
return f"blk.{layer_idx}.{replacement}"
return None
class URLParser:
"""Parses and validates model URLs from various sources.
Handles HuggingFace URLs, Ollama-style GGUF references, and other
model source formats. Extracts metadata including author, model name,
and file patterns for appropriate download strategies.
"""
@staticmethod
def parse(url: str) -> ModelSource:
"""Parse URL and extract model source information.
Analyses URL format to determine source type and extract relevant
metadata for model download and processing.
Args:
url: Model URL in supported format.
Returns:
ModelSource with parsed information.
Raises:
ValueError: If URL format is not recognised.
"""
if not url:
msg = "URL cannot be empty"
raise ValueError(msg)
# Try Ollama-style GGUF URL first (hf.co/author/model:pattern)
ollama_match = re.match(r"^hf\.co/([^:]+):(.+)$", url)
if ollama_match:
source_model = ollama_match.group(1)
gguf_pattern = ollama_match.group(2)
return URLParser._create_model_source(
url,
URLType.OLLAMA_GGUF,
source_model,
gguf_file_pattern=gguf_pattern,
is_gguf_repo=True,
)
# Try regular HuggingFace URL
hf_match = re.match(r"https://huggingface\.co/([^/]+/[^/?]+)", url)
if hf_match:
source_model = hf_match.group(1)
return URLParser._create_model_source(
url, URLType.HUGGINGFACE, source_model, is_gguf_repo=False
)
msg = (
"Invalid URL format\n"
"Supported formats:\n"
" - https://huggingface.co/username/model-name\n"
" - hf.co/username/model-name-GGUF:F16"
)
raise ValueError(msg)
@staticmethod
def _create_model_source(
url: str,
url_type: URLType,
source_model: str,
gguf_file_pattern: str | None = None,
is_gguf_repo: bool = False,
) -> ModelSource:
"""Create ModelSource with parsed information.
Constructs a ModelSource instance with extracted metadata,
handling author/model name splitting and GGUF suffix removal.
Args:
url: Original URL.
url_type: Type of URL (HuggingFace or Ollama GGUF).
source_model: Repository identifier (author/model).
gguf_file_pattern: Optional GGUF file pattern.
is_gguf_repo: Whether this is a GGUF repository.
Returns:
Configured ModelSource instance.
"""
author, model_name = source_model.split("/", 1)
# Strip -GGUF suffix for GGUF repos
if is_gguf_repo and model_name.endswith("-GGUF"):
model_name = model_name[:-5]
return ModelSource(
url=url,
url_type=url_type,
source_model=source_model,
original_author=author,
model_name=model_name,
gguf_file_pattern=gguf_file_pattern,
is_gguf_repo=is_gguf_repo,
)
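
The two URL forms accepted by URLParser can be checked with the same regular expressions used above; a standalone sketch (both URLs are taken from the script's usage examples further below):

import re

urls = [
    "https://huggingface.co/DavidAU/Gemma-3-4b-it-Uncensored-DBL-X",
    "hf.co/DavidAU/Gemma-3-it-4B-Uncensored-DBL-X-GGUF:F16",
]
for url in urls:
    ollama = re.match(r"^hf\.co/([^:]+):(.+)$", url)
    hf = re.match(r"https://huggingface\.co/([^/]+/[^/?]+)", url)
    if ollama:
        print("GGUF repo:", ollama.group(1), "| file pattern:", ollama.group(2))
    elif hf:
        print("HF repo:", hf.group(1))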

96
pyproject.toml Normal file
View file

@ -0,0 +1,96 @@
[project]
name = "llm-gguf-tools"
version = "0.1.0"
description = "Tools to convert and quantise language models in GGUF format"
readme = "README.md"
license = { text = "Apache-2.0" }
authors = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
maintainers = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
requires-python = ">=3.13"
classifiers = [
"Development Status :: 3 - Alpha",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.13",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development :: Libraries :: Python Modules",
]
dependencies = ["gguf>=0", "pydantic>=2", "safetensors>=0", "torch>=2"]
[project.urls]
Homepage = "https://git.tomfos.tr/tom/llm-gguf-tools"
"Bug Reports" = "https://git.tomfos.tr/tom/llm-gguf-tools/issues"
"Source" = "https://git.tomfos.tr/tom/llm-gguf-tools"
[dependency-groups]
dev = ["pytest>=8", "ruff>=0", "uv>=0"]
[tool.uv]
package = true
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
[tool.uv.sources]
torch = { index = "pytorch-cpu" }
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project.scripts]
quantise = "quantise:main"
safetensors-to-gguf = "direct_safetensors_to_gguf:main"
[tool.setuptools]
packages = { find = {} }
[tool.ruff]
cache-dir = "/tmp/.ruff_cache"
fix = true
line-length = 100
preview = true
show-fixes = false
target-version = "py313"
unsafe-fixes = true
[tool.ruff.format]
line-ending = "auto"
skip-magic-trailing-comma = false
[tool.ruff.lint]
fixable = ["ALL"]
ignore = [
"ANN401", # use of Any type
"BLE001", # blind Exception usage
"COM812", # missing trailing comma
"CPY", # flake8-copyright
"FBT", # boolean arguments
"PLR0912", # too many branches
"PLR0913", # too many arguments
"PLR0915", # too many statements
"PLR0917", # too many positional arguments
"PLR6301", # method could be static
"RUF029", # async methods that don't await
"S104", # binding to all interfaces
"S110", # passed exceptions
"S404", # use of subprocess
"S603", # check subprocess input
"S607", # subprocess with partial path
"TRY301", # raise inside try block
]
select = ["ALL"]
unfixable = [
"F841", # local variable assigned but never used
"RUF100", # unused noqa comments
"T201", # don't strip print statement
]
[tool.ruff.lint.isort]
combine-as-imports = true
required-imports = ["from __future__ import annotations"]
[tool.ruff.lint.pydocstyle]
convention = "google"

101
quantise_gguf.py Normal file
View file

@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""Bartowski Quantisation Script for advanced GGUF model processing.
Implements a sophisticated quantisation pipeline supporting Q4_K_M, Q4_K_L,
Q4_K_XL, and Q4_K_XXL methods with tensor-level precision control. Features
parallel processing, status tracking, automatic README generation, and
HuggingFace integration for streamlined model distribution workflows.
Usage: python quantise.py <huggingface_url>
"""
from __future__ import annotations
import argparse
import shutil
import sys
from pathlib import Path
from helpers.logger import logger
from helpers.services.orchestrator import QuantisationOrchestrator
def main() -> None:
"""Main entry point for the Bartowski quantisation workflow.
Parses command-line arguments, initialises the quantisation orchestrator,
and executes the complete model processing pipeline from HuggingFace URL
to quantised GGUF files with optional HuggingFace upload and cleanup.
"""
parser = argparse.ArgumentParser(
description="Bartowski Quantisation Script - Supports Q4_K_M, Q4_K_L, Q4_K_XL, Q4_K_XXL",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
  python quantise_gguf.py https://huggingface.co/DavidAU/Gemma-3-4b-it-Uncensored-DBL-X
  python quantise_gguf.py hf.co/DavidAU/Gemma-3-it-4B-Uncensored-DBL-X-GGUF:F16
""",
)
parser.add_argument("url", help="HuggingFace model URL")
parser.add_argument(
"--work-dir", type=Path, help="Working directory (default: ./quantisation_work)"
)
parser.add_argument(
"--no-imatrix",
action="store_true",
help="Skip imatrix generation (faster but lower quality)",
)
parser.add_argument(
"--imatrix-base",
choices=[
"Q2_K",
"Q3_K_L",
"Q3_K_M",
"Q3_K_S",
"Q4_K_S",
"Q4_K_M",
"Q5_K_S",
"Q5_K_M",
"Q6_K",
"Q8_0",
],
default="Q4_K_M",
help="Base quantisation for imatrix generation",
)
parser.add_argument(
"--no-upload",
action="store_true",
help="Skip uploading to HuggingFace (local testing only)",
)
args = parser.parse_args()
if not args.url:
parser.print_help()
sys.exit(1)
try:
orchestrator = QuantisationOrchestrator(
work_dir=args.work_dir or Path.cwd() / "quantisation_work",
use_imatrix=not args.no_imatrix,
imatrix_base=args.imatrix_base,
no_upload=args.no_upload,
)
orchestrator.quantise(args.url)
# Cleanup prompt
logger.info("Cleaning up...")
response = input("Delete working files? (y/N): ").strip().lower()
if response == "y":
shutil.rmtree(orchestrator.work_dir)
logger.info("Cleanup complete")
else:
logger.info(f"Working files kept in: {orchestrator.work_dir}")
except Exception as e:
logger.error(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
main()
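
# Programmatic usage (a minimal sketch; the URL and argument values below are
# illustrative, mirroring the CLI defaults above):
#
#   from pathlib import Path
#   from helpers.services.orchestrator import QuantisationOrchestrator
#
#   orchestrator = QuantisationOrchestrator(
#       work_dir=Path("./quantisation_work"),
#       use_imatrix=True,         # generate an importance matrix first
#       imatrix_base="Q4_K_M",    # base quantisation used for imatrix generation
#       no_upload=True,           # keep results local, skip HuggingFace upload
#   )
#   orchestrator.quantise("https://huggingface.co/acme/tiny-model")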

2482
resources/imatrix_data.txt Normal file

File diff suppressed because one or more lines are too long

95
safetensors2gguf.py Normal file
View file

@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""Direct SafeTensors to GGUF converter for unsupported architectures.
This script attempts to convert SafeTensors models to GGUF format directly,
without relying on llama.cpp's architecture-specific conversion logic.
"""
from __future__ import annotations
import sys
import traceback
from argparse import ArgumentParser
from pathlib import Path
from helpers.logger import logger
from helpers.services.gguf import GGUFConverter
from helpers.utils.config_parser import ConfigParser
from helpers.utils.tensor_mapping import TensorMapper
def convert_safetensors_to_gguf(
model_path: Path, output_path: Path, force_architecture: str | None = None
) -> bool:
"""Convert SafeTensors model to GGUF format with comprehensive metadata handling.
Orchestrates the complete conversion workflow: loads configuration, maps
architecture to known GGUF types, creates writer with proper metadata,
processes all tensor files with name mapping, and adds tokeniser data.
Handles BFloat16 conversion and provides fallback architecture mapping
for unsupported model types to ensure maximum compatibility.
Returns:
True if conversion was successful, False otherwise.
"""
# Use ConfigParser to load configuration
config_parser = ConfigParser()
model_config = config_parser.load_model_config(model_path)
arch_name = model_config.architectures[0]
model_type = model_config.model_type
logger.info(f"Architecture: {arch_name}")
logger.info(f"Model type: {model_type}")
# Use forced architecture or try to map to a known one
if force_architecture:
arch = force_architecture
logger.warning(f"Using forced architecture: {arch}")
else:
# Use ConfigParser's architecture mapping
arch = config_parser.get_architecture_mapping(arch_name)
if arch != arch_name:
logger.warning(f"Unknown architecture {arch_name}, using {arch} as fallback")
# Use the new GGUFConverter for the conversion
tensor_mapper = TensorMapper()
return GGUFConverter.convert_safetensors(
model_path, output_path, model_config, arch, tensor_mapper
    )


def main() -> None:
    """Main entry point for SafeTensors to GGUF conversion command-line interface.

    Parses command-line arguments, validates input paths, and orchestrates the
    conversion process with proper error handling. Supports forced architecture
    mapping and flexible output path specification. Provides comprehensive
    error reporting and exit codes for integration with automated workflows.
    """
parser = ArgumentParser(description="Convert SafeTensors to GGUF directly")
parser.add_argument("model_path", help="Path to SafeTensors model directory")
parser.add_argument("-o", "--output", help="Output GGUF file path")
parser.add_argument("--force-arch", help="Force a specific architecture mapping")
args = parser.parse_args()
model_path = Path(args.model_path)
if not model_path.exists():
logger.error(f"Model path not found: {model_path}")
sys.exit(1)
output_path = Path(args.output) if args.output else model_path / f"{model_path.name}-f32.gguf"
try:
success = convert_safetensors_to_gguf(model_path, output_path, args.force_arch)
sys.exit(0 if success else 1)
except Exception as e:
logger.error(f"Conversion failed: {e}")
traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
main()
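
# Programmatic usage (a minimal sketch; paths and the architecture override are
# illustrative):
#
#   from pathlib import Path
#
#   ok = convert_safetensors_to_gguf(
#       model_path=Path("./models/acme-tiny-model"),
#       output_path=Path("./models/acme-tiny-model/acme-tiny-model-f32.gguf"),
#       force_architecture=None,  # or e.g. "llama" to force a known mapping
#   )
#   sys.exit(0 if ok else 1)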

425
uv.lock generated Normal file
View file

@ -0,0 +1,425 @@
version = 1
revision = 2
requires-python = ">=3.13"
resolution-markers = [
"sys_platform != 'darwin'",
"sys_platform == 'darwin'",
]
[[package]]
name = "annotated-types"
version = "0.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
]
[[package]]
name = "colorama"
version = "0.4.6"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6" },
]
[[package]]
name = "filelock"
version = "3.13.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl", hash = "sha256:57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c" },
]
[[package]]
name = "fsspec"
version = "2024.6.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl", hash = "sha256:3cb443f8bcd2efb31295a5b9fdb02aee81d8452c80d28f97a6d0959e6cee101e" },
]
[[package]]
name = "gguf"
version = "0.17.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
{ name = "pyyaml" },
{ name = "tqdm" },
]
sdist = { url = "https://files.pythonhosted.org/packages/08/08/7de1ca4b71e7bf33b547f82bb22505e221b5fa42f67d635e200e0ad22ad6/gguf-0.17.1.tar.gz", hash = "sha256:36ad71aad900a3e75fc94ebe96ea6029f03a4e44be7627ef7ad3d03e8c7bcb53", size = 89338, upload-time = "2025-06-19T14:00:33.705Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fc/31/6a93a887617ee7deeaa602ca3d02d1c12a6cb8a742a695de5d128f5fa46a/gguf-0.17.1-py3-none-any.whl", hash = "sha256:7bc5aa7eeb1931f7d39b48fdc5b38fda6b294b9dca75cf607ac69557840a3943", size = 96224, upload-time = "2025-06-19T14:00:32.88Z" },
]
[[package]]
name = "iniconfig"
version = "2.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f2/97/ebf4da567aa6827c909642694d71c9fcf53e5b504f2d96afea02718862f3/iniconfig-2.1.0.tar.gz", hash = "sha256:3abbd2e30b36733fee78f9c7f7308f2d0050e88f0087fd25c2645f63c773e1c7", size = 4793, upload-time = "2025-03-19T20:09:59.721Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760", size = 6050, upload-time = "2025-03-19T20:10:01.071Z" },
]
[[package]]
name = "jinja2"
version = "3.1.4"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "markupsafe" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl", hash = "sha256:bc5dd2abb727a5319567b7a813e6a2e7318c39f4f487cfe6c89c6f9c7d25197d" },
]
[[package]]
name = "llm-gguf-tools"
version = "0.1.0"
source = { editable = "." }
dependencies = [
{ name = "gguf" },
{ name = "pydantic" },
{ name = "safetensors" },
{ name = "torch", version = "2.8.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
{ name = "torch", version = "2.8.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform != 'darwin'" },
]
[package.dev-dependencies]
dev = [
{ name = "pytest" },
{ name = "ruff" },
{ name = "uv" },
]
[package.metadata]
requires-dist = [
{ name = "gguf", specifier = ">=0" },
{ name = "pydantic", specifier = ">=2" },
{ name = "safetensors", specifier = ">=0" },
{ name = "torch", specifier = ">=2", index = "https://download.pytorch.org/whl/cpu" },
]
[package.metadata.requires-dev]
dev = [
{ name = "pytest", specifier = ">=8" },
{ name = "ruff", specifier = ">=0" },
{ name = "uv", specifier = ">=0" },
]
[[package]]
name = "markupsafe"
version = "3.0.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396" },
]
[[package]]
name = "mpmath"
version = "1.3.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c" },
]
[[package]]
name = "networkx"
version = "3.3"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl", hash = "sha256:28575580c6ebdaf4505b22c6256a2b9de86b316dc63ba9e93abde3d78dfdbcf2" },
]
[[package]]
name = "numpy"
version = "2.1.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a84498e0d0a1174f2b3ed769b67b656aa5460c92c9554039e11f20a05650f00d" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:4d6ec0d4222e8ffdab1744da2560f07856421b367928026fb540e1945f2eeeaf" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:259ec80d54999cc34cd1eb8ded513cb053c3bf4829152a2e00de2371bd406f5e" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:675c741d4739af2dc20cd6c6a5c4b7355c728167845e3c6b0e824e4e5d36a6c3" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:05b2d4e667895cc55e3ff2b56077e4c8a5604361fc21a042845ea3ad67465aa8" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:43cca367bf94a14aca50b89e9bc2061683116cfe864e56740e083392f533ce7a" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-win_amd64.whl", hash = "sha256:f2ded8d9b6f68cc26f8425eda5d3877b47343e68ca23d0d0846f4d312ecaa445" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:2ffef621c14ebb0188a8633348504a35c13680d6da93ab5cb86f4e54b7e922b5" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ad369ed238b1959dfbade9018a740fb9392c5ac4f9b5173f420bd4f37ba1f7a0" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d82075752f40c0ddf57e6e02673a17f6cb0f8eb3f587f63ca1eaab5594da5b17" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:1600068c262af1ca9580a527d43dc9d959b0b1d8e56f8a05d830eea39b7c8af6" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a26ae94658d3ba3781d5e103ac07a876b3e9b29db53f68ed7df432fd033358a8" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:13311c2db4c5f7609b462bc0f43d3c465424d25c626d95040f073e30f7570e35" },
]
[[package]]
name = "packaging"
version = "24.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/packaging-24.1-py3-none-any.whl", hash = "sha256:5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124" },
]
[[package]]
name = "pluggy"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
]
[[package]]
name = "pydantic"
version = "2.11.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-types" },
{ name = "pydantic-core" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
]
sdist = { url = "https://files.pythonhosted.org/packages/00/dd/4325abf92c39ba8623b5af936ddb36ffcfe0beae70405d456ab1fb2f5b8c/pydantic-2.11.7.tar.gz", hash = "sha256:d989c3c6cb79469287b1569f7447a17848c998458d49ebe294e975b9baf0f0db", size = 788350, upload-time = "2025-06-14T08:33:17.137Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl", hash = "sha256:dde5df002701f6de26248661f6835bbe296a47bf73990135c7d07ce741b9623b", size = 444782, upload-time = "2025-06-14T08:33:14.905Z" },
]
[[package]]
name = "pydantic-core"
version = "2.33.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
{ url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
{ url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
{ url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
{ url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
{ url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
{ url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
{ url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
{ url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
{ url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
{ url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
{ url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
{ url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
{ url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
{ url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
{ url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
]
[[package]]
name = "pygments"
version = "2.19.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/b0/77/a5b8c569bf593b0140bde72ea885a803b82086995367bf2037de0159d924/pygments-2.19.2.tar.gz", hash = "sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887", size = 4968631, upload-time = "2025-06-21T13:39:12.283Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
]
[[package]]
name = "pytest"
version = "8.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "iniconfig" },
{ name = "packaging" },
{ name = "pluggy" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/08/ba/45911d754e8eba3d5a841a5ce61a65a685ff1798421ac054f85aa8747dfb/pytest-8.4.1.tar.gz", hash = "sha256:7c67fd69174877359ed9371ec3af8a3d2b04741818c51e5e99cc1742251fa93c", size = 1517714, upload-time = "2025-06-18T05:48:06.109Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/29/16/c8a903f4c4dffe7a12843191437d7cd8e32751d5de349d45d3fe69544e87/pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7", size = 365474, upload-time = "2025-06-18T05:48:03.955Z" },
]
[[package]]
name = "pyyaml"
version = "6.0.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/54/ed/79a089b6be93607fa5cdaedf301d7dfb23af5f25c398d5ead2525b063e17/pyyaml-6.0.2.tar.gz", hash = "sha256:d584d9ec91ad65861cc08d42e834324ef890a082e591037abe114850ff7bbc3e", size = 130631, upload-time = "2024-08-06T20:33:50.674Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ef/e3/3af305b830494fa85d95f6d95ef7fa73f2ee1cc8ef5b495c7c3269fb835f/PyYAML-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efdca5630322a10774e8e98e1af481aad470dd62c3170801852d752aa7a783ba", size = 181309, upload-time = "2024-08-06T20:32:43.4Z" },
{ url = "https://files.pythonhosted.org/packages/45/9f/3b1c20a0b7a3200524eb0076cc027a970d320bd3a6592873c85c92a08731/PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:50187695423ffe49e2deacb8cd10510bc361faac997de9efef88badc3bb9e2d1", size = 171679, upload-time = "2024-08-06T20:32:44.801Z" },
{ url = "https://files.pythonhosted.org/packages/7c/9a/337322f27005c33bcb656c655fa78325b730324c78620e8328ae28b64d0c/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0ffe8360bab4910ef1b9e87fb812d8bc0a308b0d0eef8c8f44e0254ab3b07133", size = 733428, upload-time = "2024-08-06T20:32:46.432Z" },
{ url = "https://files.pythonhosted.org/packages/a3/69/864fbe19e6c18ea3cc196cbe5d392175b4cf3d5d0ac1403ec3f2d237ebb5/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:17e311b6c678207928d649faa7cb0d7b4c26a0ba73d41e99c4fff6b6c3276484", size = 763361, upload-time = "2024-08-06T20:32:51.188Z" },
{ url = "https://files.pythonhosted.org/packages/04/24/b7721e4845c2f162d26f50521b825fb061bc0a5afcf9a386840f23ea19fa/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b189594dbe54f75ab3a1acec5f1e3faa7e8cf2f1e08d9b561cb41b845f69d5", size = 759523, upload-time = "2024-08-06T20:32:53.019Z" },
{ url = "https://files.pythonhosted.org/packages/2b/b2/e3234f59ba06559c6ff63c4e10baea10e5e7df868092bf9ab40e5b9c56b6/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:41e4e3953a79407c794916fa277a82531dd93aad34e29c2a514c2c0c5fe971cc", size = 726660, upload-time = "2024-08-06T20:32:54.708Z" },
{ url = "https://files.pythonhosted.org/packages/fe/0f/25911a9f080464c59fab9027482f822b86bf0608957a5fcc6eaac85aa515/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:68ccc6023a3400877818152ad9a1033e3db8625d899c72eacb5a668902e4d652", size = 751597, upload-time = "2024-08-06T20:32:56.985Z" },
{ url = "https://files.pythonhosted.org/packages/14/0d/e2c3b43bbce3cf6bd97c840b46088a3031085179e596d4929729d8d68270/PyYAML-6.0.2-cp313-cp313-win32.whl", hash = "sha256:bc2fa7c6b47d6bc618dd7fb02ef6fdedb1090ec036abab80d4681424b84c1183", size = 140527, upload-time = "2024-08-06T20:33:03.001Z" },
{ url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" },
]
[[package]]
name = "ruff"
version = "0.12.7"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a1/81/0bd3594fa0f690466e41bd033bdcdf86cba8288345ac77ad4afbe5ec743a/ruff-0.12.7.tar.gz", hash = "sha256:1fc3193f238bc2d7968772c82831a4ff69252f673be371fb49663f0068b7ec71", size = 5197814, upload-time = "2025-07-29T22:32:35.877Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/d2/6cb35e9c85e7a91e8d22ab32ae07ac39cc34a71f1009a6f9e4a2a019e602/ruff-0.12.7-py3-none-linux_armv6l.whl", hash = "sha256:76e4f31529899b8c434c3c1dede98c4483b89590e15fb49f2d46183801565303", size = 11852189, upload-time = "2025-07-29T22:31:41.281Z" },
{ url = "https://files.pythonhosted.org/packages/63/5b/a4136b9921aa84638f1a6be7fb086f8cad0fde538ba76bda3682f2599a2f/ruff-0.12.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:789b7a03e72507c54fb3ba6209e4bb36517b90f1a3569ea17084e3fd295500fb", size = 12519389, upload-time = "2025-07-29T22:31:54.265Z" },
{ url = "https://files.pythonhosted.org/packages/a8/c9/3e24a8472484269b6b1821794141f879c54645a111ded4b6f58f9ab0705f/ruff-0.12.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:2e1c2a3b8626339bb6369116e7030a4cf194ea48f49b64bb505732a7fce4f4e3", size = 11743384, upload-time = "2025-07-29T22:31:59.575Z" },
{ url = "https://files.pythonhosted.org/packages/26/7c/458dd25deeb3452c43eaee853c0b17a1e84169f8021a26d500ead77964fd/ruff-0.12.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:32dec41817623d388e645612ec70d5757a6d9c035f3744a52c7b195a57e03860", size = 11943759, upload-time = "2025-07-29T22:32:01.95Z" },
{ url = "https://files.pythonhosted.org/packages/7f/8b/658798472ef260ca050e400ab96ef7e85c366c39cf3dfbef4d0a46a528b6/ruff-0.12.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47ef751f722053a5df5fa48d412dbb54d41ab9b17875c6840a58ec63ff0c247c", size = 11654028, upload-time = "2025-07-29T22:32:04.367Z" },
{ url = "https://files.pythonhosted.org/packages/a8/86/9c2336f13b2a3326d06d39178fd3448dcc7025f82514d1b15816fe42bfe8/ruff-0.12.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a828a5fc25a3efd3e1ff7b241fd392686c9386f20e5ac90aa9234a5faa12c423", size = 13225209, upload-time = "2025-07-29T22:32:06.952Z" },
{ url = "https://files.pythonhosted.org/packages/76/69/df73f65f53d6c463b19b6b312fd2391dc36425d926ec237a7ed028a90fc1/ruff-0.12.7-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:5726f59b171111fa6a69d82aef48f00b56598b03a22f0f4170664ff4d8298efb", size = 14182353, upload-time = "2025-07-29T22:32:10.053Z" },
{ url = "https://files.pythonhosted.org/packages/58/1e/de6cda406d99fea84b66811c189b5ea139814b98125b052424b55d28a41c/ruff-0.12.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:74e6f5c04c4dd4aba223f4fe6e7104f79e0eebf7d307e4f9b18c18362124bccd", size = 13631555, upload-time = "2025-07-29T22:32:12.644Z" },
{ url = "https://files.pythonhosted.org/packages/6f/ae/625d46d5164a6cc9261945a5e89df24457dc8262539ace3ac36c40f0b51e/ruff-0.12.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d0bfe4e77fba61bf2ccadf8cf005d6133e3ce08793bbe870dd1c734f2699a3e", size = 12667556, upload-time = "2025-07-29T22:32:15.312Z" },
{ url = "https://files.pythonhosted.org/packages/55/bf/9cb1ea5e3066779e42ade8d0cd3d3b0582a5720a814ae1586f85014656b6/ruff-0.12.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:06bfb01e1623bf7f59ea749a841da56f8f653d641bfd046edee32ede7ff6c606", size = 12939784, upload-time = "2025-07-29T22:32:17.69Z" },
{ url = "https://files.pythonhosted.org/packages/55/7f/7ead2663be5627c04be83754c4f3096603bf5e99ed856c7cd29618c691bd/ruff-0.12.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:e41df94a957d50083fd09b916d6e89e497246698c3f3d5c681c8b3e7b9bb4ac8", size = 11771356, upload-time = "2025-07-29T22:32:20.134Z" },
{ url = "https://files.pythonhosted.org/packages/17/40/a95352ea16edf78cd3a938085dccc55df692a4d8ba1b3af7accbe2c806b0/ruff-0.12.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:4000623300563c709458d0ce170c3d0d788c23a058912f28bbadc6f905d67afa", size = 11612124, upload-time = "2025-07-29T22:32:22.645Z" },
{ url = "https://files.pythonhosted.org/packages/4d/74/633b04871c669e23b8917877e812376827c06df866e1677f15abfadc95cb/ruff-0.12.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:69ffe0e5f9b2cf2b8e289a3f8945b402a1b19eff24ec389f45f23c42a3dd6fb5", size = 12479945, upload-time = "2025-07-29T22:32:24.765Z" },
{ url = "https://files.pythonhosted.org/packages/be/34/c3ef2d7799c9778b835a76189c6f53c179d3bdebc8c65288c29032e03613/ruff-0.12.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a07a5c8ffa2611a52732bdc67bf88e243abd84fe2d7f6daef3826b59abbfeda4", size = 12998677, upload-time = "2025-07-29T22:32:27.022Z" },
{ url = "https://files.pythonhosted.org/packages/77/ab/aca2e756ad7b09b3d662a41773f3edcbd262872a4fc81f920dc1ffa44541/ruff-0.12.7-py3-none-win32.whl", hash = "sha256:c928f1b2ec59fb77dfdf70e0419408898b63998789cc98197e15f560b9e77f77", size = 11756687, upload-time = "2025-07-29T22:32:29.381Z" },
{ url = "https://files.pythonhosted.org/packages/b4/71/26d45a5042bc71db22ddd8252ca9d01e9ca454f230e2996bb04f16d72799/ruff-0.12.7-py3-none-win_amd64.whl", hash = "sha256:9c18f3d707ee9edf89da76131956aba1270c6348bfee8f6c647de841eac7194f", size = 12912365, upload-time = "2025-07-29T22:32:31.517Z" },
{ url = "https://files.pythonhosted.org/packages/4c/9b/0b8aa09817b63e78d94b4977f18b1fcaead3165a5ee49251c5d5c245bb2d/ruff-0.12.7-py3-none-win_arm64.whl", hash = "sha256:dfce05101dbd11833a0776716d5d1578641b7fddb537fe7fa956ab85d1769b69", size = 11982083, upload-time = "2025-07-29T22:32:33.881Z" },
]
[[package]]
name = "safetensors"
version = "0.6.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6c/d2/94fe37355a1d4ff86b0f43b9a018515d5d29bf7ad6d01318a80f5db2fd6a/safetensors-0.6.1.tar.gz", hash = "sha256:a766ba6e19b198eff09be05f24cd89eda1670ed404ae828e2aa3fc09816ba8d8", size = 197968, upload-time = "2025-08-06T09:39:38.376Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6b/c0/40263a2103511917f9a92b4e114ecaff68586df07f12d1d877312f1261f3/safetensors-0.6.1-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:81ed1b69d6f8acd7e759a71197ce3a69da4b7e9faa9dbb005eb06a83b1a4e52d", size = 455232, upload-time = "2025-08-06T09:39:32.037Z" },
{ url = "https://files.pythonhosted.org/packages/86/bf/432cb4bb1c336d338dd9b29f78622b1441ee06e5868bf1de2ca2bec74c08/safetensors-0.6.1-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:01b51af8cb7a3870203f2735e3c7c24d1a65fb2846e75613c8cf9d284271eccc", size = 432150, upload-time = "2025-08-06T09:39:31.008Z" },
{ url = "https://files.pythonhosted.org/packages/05/d7/820c99032a53d57279ae199df7d114a8c9e2bbce4fa69bc0de53743495f0/safetensors-0.6.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:64a733886d79e726899b9d9643813e48a2eec49f3ef0fdb8cd4b8152046101c3", size = 471634, upload-time = "2025-08-06T09:39:22.17Z" },
{ url = "https://files.pythonhosted.org/packages/ea/8b/bcd960087eded7690f118ceeda294912f92a3b508a1d9a504f9c2e02041b/safetensors-0.6.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f233dc3b12fb641b36724844754b6bb41349615a0e258087560968d6da92add5", size = 487855, upload-time = "2025-08-06T09:39:24.142Z" },
{ url = "https://files.pythonhosted.org/packages/41/64/b44eac4ad87c4e1c0cf5ba5e204c032b1b1eac8ce2b8f65f87791e647bd6/safetensors-0.6.1-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6f16289e2af54affd591dd78ed12b5465e4dc5823f818beaeddd49a010cf3ba7", size = 607240, upload-time = "2025-08-06T09:39:25.463Z" },
{ url = "https://files.pythonhosted.org/packages/52/75/0347fa0c080af8bd3341af26a30b85939f6362d4f5240add1a0c9d793354/safetensors-0.6.1-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1b62eab84e2c69918b598272504c5d2ebfe64da6c16fdf8682054eec9572534d", size = 519864, upload-time = "2025-08-06T09:39:26.872Z" },
{ url = "https://files.pythonhosted.org/packages/ea/f3/83843d1fe9164f44a267373c55cba706530b209b58415f807b40edddcd3e/safetensors-0.6.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d498363746555dccffc02a47dfe1dee70f7784f3f37f1d66b408366c5d3a989e", size = 485926, upload-time = "2025-08-06T09:39:29.109Z" },
{ url = "https://files.pythonhosted.org/packages/b8/26/f6b0cb5210bab0e343214fdba7c2df80a69b019e62e760ddc61b18bec383/safetensors-0.6.1-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:eed2079dca3ca948d7b0d7120396e776bbc6680637cf199d393e157fde25c937", size = 518999, upload-time = "2025-08-06T09:39:28.054Z" },
{ url = "https://files.pythonhosted.org/packages/90/b7/8910b165c97d3bd6d445c6ca8b704ec23d0fa33849ce9a51dc783827a302/safetensors-0.6.1-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:294040ff20ebe079a2b4976cfa9a5be0202f56ca4f7f190b4e52009e8c026ceb", size = 650669, upload-time = "2025-08-06T09:39:32.997Z" },
{ url = "https://files.pythonhosted.org/packages/00/bc/2eeb025381d0834ae038aae2d383dfa830c2e0068e2e4e512ea99b135a4b/safetensors-0.6.1-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:75693208b492a026b926edeebbae888cc644433bee4993573ead2dc44810b519", size = 750019, upload-time = "2025-08-06T09:39:34.397Z" },
{ url = "https://files.pythonhosted.org/packages/f9/38/5dda9a8e056eb1f17ed3a7846698fd94623a1648013cdf522538845755da/safetensors-0.6.1-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:a8687b71ac67a0b3f8ce87df9e8024edf087e94c34ef46eaaad694dce8d2f83f", size = 689888, upload-time = "2025-08-06T09:39:35.584Z" },
{ url = "https://files.pythonhosted.org/packages/dd/60/15ee3961996d951002378d041bd82863a5c70738a71375b42d6dd5d2a6d3/safetensors-0.6.1-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5dd969a01c738104f707fa0e306b757f5beb3ebdcd682fe0724170a0bf1c21fb", size = 655539, upload-time = "2025-08-06T09:39:37.093Z" },
{ url = "https://files.pythonhosted.org/packages/91/d6/01172a9c77c566800286d379bfc341d75370eae2118dfd339edfd0394c4a/safetensors-0.6.1-cp38-abi3-win32.whl", hash = "sha256:7c3d8d34d01673d1a917445c9437ee73a9d48bc6af10352b84bbd46c5da93ca5", size = 308594, upload-time = "2025-08-06T09:39:40.916Z" },
{ url = "https://files.pythonhosted.org/packages/6c/5d/195dc1917d7ae93dd990d9b2f8b9c88e451bcc78e0b63ee107beebc1e4be/safetensors-0.6.1-cp38-abi3-win_amd64.whl", hash = "sha256:4720957052d57c5ac48912c3f6e07e9a334d9632758c9b0c054afba477fcbe2d", size = 320282, upload-time = "2025-08-06T09:39:39.54Z" },
]
[[package]]
name = "setuptools"
version = "70.2.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/setuptools-70.2.0-py3-none-any.whl", hash = "sha256:b8b8060bb426838fbe942479c90296ce976249451118ef566a5a0b7d8b78fb05" },
]
[[package]]
name = "sympy"
version = "1.13.3"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "mpmath" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl" },
]
[[package]]
name = "torch"
version = "2.8.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform == 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform == 'darwin'" },
{ name = "fsspec", marker = "sys_platform == 'darwin'" },
{ name = "jinja2", marker = "sys_platform == 'darwin'" },
{ name = "networkx", marker = "sys_platform == 'darwin'" },
{ name = "setuptools", marker = "sys_platform == 'darwin'" },
{ name = "sympy", marker = "sys_platform == 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
]
[[package]]
name = "torch"
version = "2.8.0+cpu"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform != 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform != 'darwin'" },
{ name = "fsspec", marker = "sys_platform != 'darwin'" },
{ name = "jinja2", marker = "sys_platform != 'darwin'" },
{ name = "networkx", marker = "sys_platform != 'darwin'" },
{ name = "setuptools", marker = "sys_platform != 'darwin'" },
{ name = "sympy", marker = "sys_platform != 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform != 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
]
[[package]]
name = "tqdm"
version = "4.66.5"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/tqdm-4.66.5-py3-none-any.whl", hash = "sha256:90279a3770753eafc9194a0364852159802111925aa30eb3f9d85b0e805ac7cd" },
]
[[package]]
name = "typing-extensions"
version = "4.12.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl", hash = "sha256:04e5ca0351e0f3f85c6853954072df659d0d13fac324d0072316b67d7794700d" },
]
[[package]]
name = "typing-inspection"
version = "0.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f8/b1/0c11f5058406b3af7609f121aaa6b609744687f1d158b3c3a5bf4cc94238/typing_inspection-0.4.1.tar.gz", hash = "sha256:6ae134cc0203c33377d43188d4064e9b357dba58cff3185f22924610e70a9d28", size = 75726, upload-time = "2025-05-21T18:55:23.885Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl", hash = "sha256:389055682238f53b04f7badcb49b989835495a96700ced5dab2d8feae4b26f51", size = 14552, upload-time = "2025-05-21T18:55:22.152Z" },
]
[[package]]
name = "uv"
version = "0.8.5"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/83/94/e18a40fe6f6d724c1fbf2c9328806359e341710b2fd42dc928a1a8fc636b/uv-0.8.5.tar.gz", hash = "sha256:078cf2935062d5b61816505f9d6f30b0221943a1433b4a1de8f31a1dfe55736b", size = 3451272, upload-time = "2025-08-05T20:50:21.159Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/b9/78cde56283b6b9a8a84b0bf9334442ed75a843310229aaf7f1a71fe67818/uv-0.8.5-py3-none-linux_armv6l.whl", hash = "sha256:e236372a260e312aef5485a0e5819a0ec16c9197af06d162ad5a3e8bd62f9bba", size = 18146198, upload-time = "2025-08-05T20:49:18.859Z" },
{ url = "https://files.pythonhosted.org/packages/ed/83/5deda1a19362ce426da7f9cc4764a0dd57e665ecbaddd9900d4200bc10ab/uv-0.8.5-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:53a40628329e543a5c5414553f5898131d5c1c6f963708cb0afc2ecf3e8d8167", size = 18242690, upload-time = "2025-08-05T20:49:23.409Z" },
{ url = "https://files.pythonhosted.org/packages/06/6e/80b08ee544728317d9c8003d4c10234007e12f384da1c3dfe579489833c9/uv-0.8.5-py3-none-macosx_11_0_arm64.whl", hash = "sha256:43a689027696bc9c62e6da3f06900c52eafc4debbf4fba9ecb906196730b34c8", size = 16913881, upload-time = "2025-08-05T20:49:26.631Z" },
{ url = "https://files.pythonhosted.org/packages/34/f6/47a44dabfc25b598ea6f2ab9aa32ebf1cbd87ed8af18ccde6c5d36f35476/uv-0.8.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64.whl", hash = "sha256:a34d783f5cef00f1918357c0cd9226666e22640794e9e3862820abf4ee791141", size = 17527439, upload-time = "2025-08-05T20:49:30.464Z" },
{ url = "https://files.pythonhosted.org/packages/ef/7d/ee7c2514e064412133ee9f01c4c42de20da24617b8c25d81cf7021b774d8/uv-0.8.5-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2140383bc25228281090cc34c00500d8e5822877c955f691d69bbf967e8efa73", size = 17833275, upload-time = "2025-08-05T20:49:33.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/e7/5233cf5cbcca8ea65aa1f1e48bf210dc9773fb86b8104ffbc523be7f6a3f/uv-0.8.5-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6b449779ff463b059504dc30316a634f810149e02482ce36ea35daea8f6ce7af", size = 18568916, upload-time = "2025-08-05T20:49:37.031Z" },
{ url = "https://files.pythonhosted.org/packages/d8/54/6cabb2a0347c51c8366ca3bffeeebd7f829a15f6b29ad20f51fd5ca9c4bd/uv-0.8.5-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:a7f8739d05cc513eee2f1f8a7e6c482a9c1e8860d77cd078d1ea7c3fe36d7a65", size = 19993334, upload-time = "2025-08-05T20:49:40.361Z" },
{ url = "https://files.pythonhosted.org/packages/3c/7a/b84d994d52f20bc56229840c31e77aff4653e5902ea7b7c2616e9381b5b8/uv-0.8.5-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62ebbd22f780ba2585690332765caf9e29c9758e48a678148e8b1ea90580cdb9", size = 19643358, upload-time = "2025-08-05T20:49:43.955Z" },
{ url = "https://files.pythonhosted.org/packages/c8/f1/7552f2bea528456d34bc245f2959ce910631e01571c4b7ea421ead9a9fc6/uv-0.8.5-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4f8dd0555f05d66ff46fdab551137cc2b1ea9c5363358913e2af175e367f4398", size = 18947757, upload-time = "2025-08-05T20:49:47.381Z" },
{ url = "https://files.pythonhosted.org/packages/57/9b/46aadd186a1e16a23cd0701dda0e640197db49a3add074a47231fed45a4f/uv-0.8.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:38c04408ad5eae7a178a1e3b0e09afeb436d0c97075530a3c82de453b78d0448", size = 18906135, upload-time = "2025-08-05T20:49:50.985Z" },
{ url = "https://files.pythonhosted.org/packages/c0/31/6661adedaba9ebac8bb449ec9901f8cbf124fa25e0db3a9e6cf3053cee88/uv-0.8.5-py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:73e772caf7310af4b21eaf8c25531b934391f1e84f3afa8e67822d7c432f6dad", size = 17787943, upload-time = "2025-08-05T20:49:54.59Z" },
{ url = "https://files.pythonhosted.org/packages/11/f2/73fb5c3156fdae830b83edec2f430db84cb4bc4b78f61d21694bd59004cb/uv-0.8.5-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:3ddd7d8c01073f23ba2a4929ab246adb30d4f8a55c5e007ad7c8341f7bf06978", size = 18675864, upload-time = "2025-08-05T20:49:57.87Z" },
{ url = "https://files.pythonhosted.org/packages/b5/29/774c6f174c53d68ae9a51c2fabf1b09003b93a53c24591a108be0dc338d7/uv-0.8.5-py3-none-musllinux_1_1_armv7l.whl", hash = "sha256:7d601f021cbc179320ea3a75cd1d91bd49af03d2a630c4d04ebd38ff6b87d419", size = 17808770, upload-time = "2025-08-05T20:50:01.566Z" },
{ url = "https://files.pythonhosted.org/packages/a9/b0/5d164ce84691f5018c5832e9e3371c0196631b1f1025474a179de1d6a70a/uv-0.8.5-py3-none-musllinux_1_1_i686.whl", hash = "sha256:6ee97b7299990026619c20e30e253972c6c0fb6fba4f5658144e62aa1c07785a", size = 18076516, upload-time = "2025-08-05T20:50:04.94Z" },
{ url = "https://files.pythonhosted.org/packages/d1/73/4d8baefb4f4b07df6a4db7bbd604cb361d4f5215b94d3f66553ea26edfd4/uv-0.8.5-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:09804055d6346febf0767767c04bdd2fab7d911535639f9c18de2ea744b2954c", size = 19031195, upload-time = "2025-08-05T20:50:08.211Z" },
{ url = "https://files.pythonhosted.org/packages/44/2a/3d074391df2c16c79fc6bf333e4bde75662e64dac465050a03391c75b289/uv-0.8.5-py3-none-win32.whl", hash = "sha256:6362a2e1fa535af0e4c0a01f83e666a4d5f9024d808f9e64e3b6ef07c97aff54", size = 18026273, upload-time = "2025-08-05T20:50:11.868Z" },
{ url = "https://files.pythonhosted.org/packages/3c/2f/e850d3e745ccd1125b7a48898421824700fd3e996d27d835139160650124/uv-0.8.5-py3-none-win_amd64.whl", hash = "sha256:dd89836735860461c3a5563731e77c011d1831f14ada540f94bf1a7011dbea14", size = 19822158, upload-time = "2025-08-05T20:50:15.428Z" },
{ url = "https://files.pythonhosted.org/packages/6f/df/e5565b3faf2c6147a877ab7e96ef31e2333f08c5138a98ce77003b1bf65e/uv-0.8.5-py3-none-win_arm64.whl", hash = "sha256:37c1a22915392014d8b4ade9e69e157c8e5ccdf32f37070a84f749a708268335", size = 18430102, upload-time = "2025-08-05T20:50:18.785Z" },
]