Initial commit

Tom Foster 2025-08-07 18:29:12 +01:00
commit ef7df1a8c3
28 changed files with 6829 additions and 0 deletions

.gitignore vendored Normal file (51 lines added)

@@ -0,0 +1,51 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

LICENSE Normal file (73 lines added)

@@ -0,0 +1,73 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.
Copyright 2025 tom
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md Normal file (59 lines added)

@@ -0,0 +1,59 @@
# 🤖 LLM GGUF Tools
A collection of Python tools for converting and quantising language models to
[GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md), featuring advanced
quantisation methods and direct SafeTensors conversion capabilities.
> 💡 **Looking for quantised models?** Check out [tcpipuk's HuggingFace profile](https://huggingface.co/tcpipuk)
> for models quantised using these tools!
## Available Tools
| Tool | Purpose | Documentation |
|------|---------|---------------|
| [quantise_gguf.py](./quantise_gguf.py) | ⚡ GGUF quantisation using a variant of [Bartowski's method](https://huggingface.co/bartowski) | [📖 Docs](docs/quantise_gguf.md) |
| [safetensors2gguf.py](./safetensors2gguf.py) | 🔄 Direct SafeTensors to GGUF conversion | [📖 Docs](docs/safetensors2gguf.md) |
## Installation
1. Install [`uv`](https://docs.astral.sh/uv/) to manage the dependencies:
```bash
# Install uv (see https://docs.astral.sh/uv/#installation for more options)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or update your existing instance
uv self update
```
2. Then set up the environment for these scripts:
```bash
# Clone the repository
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
# Set up virtual environment and install dependencies
uv sync
```
## Requirements
- **For quantisation**: [llama.cpp](https://github.com/ggerganov/llama.cpp) binaries
(`llama-quantize`, `llama-cli`, `llama-imatrix`)
- **For BFloat16 models**: PyTorch (optional, auto-detected)
- **For uploads**: HuggingFace API token (set `HF_TOKEN` environment variable)
## Development
For development setup and contribution guidelines, see [📖 Development Guide](docs/development.md).
## Notes
The `resources/imatrix_data.txt` file contains importance matrix calibration data from
[Bartowski's Gist](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8),
based on calibration data provided by Dampf, building upon Kalomaze's foundational work.
## License
Apache 2.0 License - see [LICENSE](./LICENSE) file for details.

docs/development.md Normal file (86 lines added)

@@ -0,0 +1,86 @@
# Development Guide
This guide covers development setup, code quality standards, and project structure for contributors.
## Code Quality
```bash
# Run linting
uv run ruff check
# Format code
uv run ruff format
# Run with debug logging
DEBUG=true uv run <script>
```
## Project Structure
```plain
llm-gguf-tools/
├── quantise_gguf.py        # Bartowski quantisation tool
├── safetensors2gguf.py     # Direct conversion tool
├── helpers/                # Shared utilities
│   ├── __init__.py
│   ├── logger.py           # Colour-coded logging
│   ├── config/             # Quantisation configuration definitions
│   ├── models/             # Pydantic data models
│   ├── services/           # Filesystem, GGUF and HuggingFace services
│   └── utils/              # Miscellaneous utilities (e.g. config parsing)
├── resources/              # Resource files
│   └── imatrix_data.txt    # Calibration data for imatrix
├── docs/                   # Detailed documentation
│   ├── quantise_gguf.md
│   ├── safetensors2gguf.md
│   └── development.md
└── pyproject.toml          # Project configuration
```
## Contributing Guidelines
Contributions are welcome! Please ensure:
1. Code follows the existing style (run `uv run ruff format`)
2. All functions have Google-style docstrings
3. Type hints are used throughout
4. Tests pass (if applicable)
## Development Workflow
### Setting Up Development Environment
```bash
# Clone the repository
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
# Install all dependencies including dev
uv sync --all-groups
```
### Code Style
- Follow PEP 8 with ruff enforcement
- Use UK English spelling in comments and documentation
- Maximum line length: 100 characters
- Use type hints for all function parameters and returns
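A minimal illustration of the conventions listed above, using a hypothetical helper (the function name and behaviour are for illustration only, not part of the codebase):

```python
def format_file_size(size_bytes: int, precision: int = 1) -> str:
    """Convert a size in bytes to a human-readable string.

    Uses binary (1024-based) units and UK English spelling in its
    documentation, matching the conventions used across this project.

    Args:
        size_bytes: File size in bytes.
        precision: Number of decimal places in the formatted value.

    Returns:
        Human-readable size such as "1.5G" or "750.0M".
    """
    size = float(size_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024.0:
            return f"{size:.{precision}f}{unit}"
        size /= 1024.0
    return f"{size:.{precision}f}P"
```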
### Testing
While formal tests are not yet implemented, ensure:
- Scripts run without errors on sample models
- Logger output is correctly formatted
- File I/O operations handle errors gracefully
### Debugging
Enable debug logging for verbose output:
```bash
DEBUG=true uv run quantise_gguf.py <model_url>
```
This will show additional information about:
- Model download progress
- Conversion steps
- File operations
- Error details

docs/quantise_gguf.md Normal file (102 lines added)

@@ -0,0 +1,102 @@
# quantise_gguf.py - Advanced GGUF Quantisation
Advanced GGUF quantisation tool implementing Bartowski's sophisticated quantisation pipeline.
## Overview
This tool automates the complete quantisation workflow for converting models to GGUF format with
multiple precision variants, importance matrix generation, and automatic upload to HuggingFace.
## Quantisation Variants
The tool produces four quantisation variants based on Bartowski's method:
- **Q4_K_M**: Standard baseline quantisation
- **Q4_K_L**: Q6_K embeddings + Q6_K attention layers for better quality
- **Q4_K_XL**: Q8_0 embeddings + Q6_K attention layers for enhanced precision
- **Q4_K_XXL**: Q8_0 embeddings + Q8_0 attention for maximum precision
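Under the hood, these variants are produced by passing per-tensor type overrides to `llama-quantize`. As a rough sketch only (assuming `llama-quantize` is on your `PATH`; file names are placeholders and the tool's real invocation goes through its own wrappers), the Q4_K_L fallback flags defined in this repository's configuration correspond to a call like:

```python
import subprocess

# Sketch: Q4_K_L via llama-quantize's embedding/output overrides,
# mirroring the fallback flags defined in helpers/config.
cmd = [
    "llama-quantize",
    "--imatrix", "imatrix.dat",         # optional importance matrix
    "--token-embedding-type", "Q6_K",   # higher-precision embeddings
    "--output-tensor-type", "Q6_K",     # higher-precision output tensor
    "model-F32.gguf",                   # full-precision input
    "model-Q4_K_L.gguf",                # quantised output
    "Q4_K_M",                           # base quantisation type
]
subprocess.run(cmd, check=True)
```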
## Features
- **Automatic model download**: Downloads models from HuggingFace automatically
- **Importance matrix generation**: Creates imatrix for improved quantisation quality
- **Parallel processing**: Uploads multiple variants simultaneously
- **Progress tracking**: Real-time status updates during conversion
- **README generation**: Automatically creates model cards with quantisation details
- **HuggingFace integration**: Direct upload to HuggingFace with proper metadata
## Usage
### Basic Usage
```bash
# Quantise a model from HuggingFace
uv run quantise_gguf.py https://huggingface.co/meta-llama/Llama-3.2-1B
```
### Command Line Options
```bash
# Skip imatrix generation for faster processing
uv run quantise_gguf.py <model_url> --no-imatrix
# Local testing without upload
uv run quantise_gguf.py <model_url> --no-upload
# Custom output directory
uv run quantise_gguf.py <model_url> --output-dir ./my-models
# Use specific HuggingFace token
uv run quantise_gguf.py <model_url> --hf-token YOUR_TOKEN
```
## Environment Variables
- `HF_TOKEN`: HuggingFace API token for uploads
- `LLAMA_CPP_DIR`: Custom path to llama.cpp binaries
- `DEBUG`: Enable debug logging when set to "true"
## Requirements
- **llama.cpp binaries**: `llama-quantize`, `llama-cli`, `llama-imatrix`
- **Calibration data**: `resources/imatrix_data.txt` for importance matrix generation
- **HuggingFace account**: For uploading quantised models (optional)
## Workflow
1. **Download**: Fetches the model from HuggingFace
2. **Convert**: Converts to initial GGUF format (F32)
3. **Generate imatrix**: Creates importance matrix using calibration data
4. **Quantise**: Produces multiple quantisation variants in parallel
5. **Upload**: Pushes quantised models to HuggingFace with metadata
6. **Clean up**: Removes temporary files and caches
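For orientation, the core of steps 2-4 corresponds roughly to the following llama.cpp invocations (a simplified sketch, assuming the llama.cpp binaries and its `convert_hf_to_gguf.py` script are available locally; the paths are placeholders, and the actual tool adds error handling, variant overrides, and parallelism):

```python
import subprocess

model_dir = "./work/Llama-3.2-1B"   # downloaded model (placeholder path)
f32_gguf = "./work/model-F32.gguf"

# Step 2: convert the HuggingFace model to full-precision GGUF
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir, "--outfile", f32_gguf, "--outtype", "f32"],
    check=True,
)

# Step 3: generate an importance matrix from the calibration data
subprocess.run(
    ["llama-imatrix", "-m", f32_gguf, "-f", "resources/imatrix_data.txt", "-o", "imatrix.dat"],
    check=True,
)

# Step 4: produce one quantisation variant using that imatrix
subprocess.run(
    ["llama-quantize", "--imatrix", "imatrix.dat", f32_gguf, "./work/model-Q4_K_M-imat.gguf", "Q4_K_M"],
    check=True,
)
```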
## Output Structure
```plain
output_dir/
├── model-F32.gguf # Full precision conversion
├── model-Q4_K_M.gguf # Standard quantisation
├── model-Q4_K_M-imat.gguf # With importance matrix
├── model-Q4_K_L-imat.gguf # Enhanced embeddings/attention
├── model-Q4_K_XL-imat.gguf # High precision embeddings
├── model-Q4_K_XXL-imat.gguf # Maximum precision
└── imatrix.dat # Generated importance matrix
```
## Error Handling
The tool includes comprehensive error handling for:
- Network failures during download
- Missing binaries or dependencies
- Insufficient disk space
- HuggingFace API errors
- Conversion failures
## Performance Considerations
- **Disk space**: Requires ~3x model size in free space
- **Memory**: Needs RAM proportional to model size
- **Processing time**: Varies from minutes to hours based on model size
- **Network**: Downloads can be large (10-100+ GB for large models)

docs/safetensors2gguf.md Normal file (164 lines added)

@@ -0,0 +1,164 @@
# safetensors2gguf.py - Direct SafeTensors Conversion
Direct SafeTensors to GGUF converter for unsupported architectures.
## Overview
This tool converts SafeTensors models directly to GGUF format without requiring specific
architecture support in llama.cpp. It's particularly useful for experimental models, custom
architectures, or when llama.cpp's standard conversion tools don't recognise your model
architecture.
## Features
- **Architecture-agnostic**: Works with unsupported model architectures
- **Automatic mapping**: Intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: Handles BF16 tensors with PyTorch (optional)
- **Vision models**: Supports models with vision components
- **Tokeniser preservation**: Extracts and includes tokeniser metadata
- **Fallback mechanisms**: Provides sensible defaults for unknown architectures
## Usage
### Basic Usage
```bash
# Convert a local SafeTensors model
uv run safetensors2gguf.py /path/to/model/directory
```
### Command Line Options
```bash
# Specify output file
uv run safetensors2gguf.py /path/to/model -o output.gguf
# Force specific architecture mapping
uv run safetensors2gguf.py /path/to/model --force-arch qwen2
# Convert with custom output path
uv run safetensors2gguf.py ./my-model --output ./converted/my-model.gguf
```
## Supported Input Formats
The tool automatically detects and handles:
1. **Single file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: Any `*.safetensors` files in the directory
## Architecture Mapping
The tool includes built-in mappings for several architectures:
- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)
You can override these with the `--force-arch` parameter.
## Tensor Name Mapping
The converter automatically maps common tensor patterns:
| Original Pattern | GGUF Name |
|-----------------|-----------|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |
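The logic behind this table is essentially a small direct lookup plus a pattern rewrite for per-layer tensors. A simplified sketch (not the tool's actual implementation, which lives in the helpers package; biases and norm layers are omitted for brevity):

```python
import re

DIRECT = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

LAYER = {
    "self_attn.q_proj": "attn_q",
    "self_attn.k_proj": "attn_k",
    "self_attn.v_proj": "attn_v",
    "self_attn.o_proj": "attn_output",
    "mlp.gate_proj": "ffn_gate",
    "mlp.up_proj": "ffn_up",
    "mlp.down_proj": "ffn_down",
}

def map_tensor_name(name: str) -> str | None:
    """Translate a HuggingFace tensor name to its GGUF equivalent."""
    if name in DIRECT:
        return DIRECT[name]
    match = re.match(r"model\.layers\.(\d+)\.(.+)\.weight$", name)
    if match:
        layer, component = match.groups()
        if component in LAYER:
            return f"blk.{layer}.{LAYER[component]}.weight"
    return None  # unmapped tensors are skipped

print(map_tensor_name("model.layers.3.self_attn.q_proj.weight"))  # blk.3.attn_q.weight
```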
## Configuration Requirements
The model directory must contain:
- **config.json**: Model configuration file (required)
- **\*.safetensors**: One or more SafeTensors files (required)
- **tokenizer_config.json**: Tokeniser configuration (optional)
- **tokenizer.json**: Tokeniser data (optional)
## Output Format
The tool produces a single GGUF file containing:
- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)
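To sanity-check the result, the written metadata and tensor count can be inspected with the `gguf` Python package (a quick sketch; the output path is a placeholder matching the example later in this document):

```python
from gguf import GGUFReader

reader = GGUFReader("./my-model/my-model-f32.gguf")

# List the metadata keys and count the tensors that were written
for key in reader.fields:
    print(key)
print(f"{len(reader.tensors)} tensors")
```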
## Error Handling
| Error | Message | Solution |
|-------|---------|----------|
| Missing config.json | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |
## Technical Details
### Parameter Inference
The tool infers GGUF parameters from the model configuration:
- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to attention heads)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → layer normalisation epsilon (default: 1e-5)
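Most of these values are copied across directly; the main derived value is the RoPE dimension count, which is computed from the head geometry. A worked illustration with typical 7B-class numbers (not tool output):

```python
# Derived RoPE dimension count from the model configuration
hidden_size = 4096
num_attention_heads = 32
rope_dimension_count = hidden_size // num_attention_heads  # 128
```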
### Vision Model Support
For models with vision components, the tool extracts:
- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters
## Limitations
- **F32 only**: Currently outputs only full precision (F32) models
- **Architecture guessing**: May require manual architecture specification
- **Tokeniser compatibility**: Uses llama tokeniser as default fallback
- **Memory usage**: Requires loading full tensors into memory
## Examples
### Converting a custom model
```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model
# Convert to GGUF
uv run safetensors2gguf.py ./my-model
# Output will be at ./my-model/my-model-f32.gguf
```
### Converting with specific architecture
```bash
# For a Qwen2-based model
uv run safetensors2gguf.py ./qwen-model --force-arch qwen2
```
### Batch conversion
```bash
# Convert multiple models
for model in ./models/*; do
uv run safetensors2gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```

helpers/__init__.py Normal file (6 lines added)

@@ -0,0 +1,6 @@
"""Helper utilities for LLM GGUF tools.
This package provides common utilities, logging, and shared functionality
used across the quantisation and conversion tools. Uses UK English spelling
conventions throughout.
"""

@@ -0,0 +1,6 @@
"""Configuration module for quantisation settings and tensor-level precision control.
Provides structured configuration definitions for Bartowski quantisation methods
including Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with fallback strategies
for different model architectures and deployment scenarios.
"""

@@ -0,0 +1,95 @@
"""Quantisation configuration definitions.
Pre-defined quantisation configurations for the Bartowski method, supporting
Q4_K_M, Q4_K_L, Q4_K_XL, and Q4_K_XXL variants with tensor-level precision control.
"""
from __future__ import annotations
from helpers.models.quantisation import QuantisationConfig, QuantisationType
QUANTISATION_CONFIGS: dict[QuantisationType, QuantisationConfig] = {
QuantisationType.Q4_K_M: QuantisationConfig(
name="Q4_K_M",
description="Standard Q4_K_M quantisation (baseline)",
tensor_types={}, # No special tensor overrides - uses default Q4_K_M
fallback_methods=[],
),
QuantisationType.Q4_K_L: QuantisationConfig(
name="Q4_K_L",
description="Q6_K embeddings + Q6_K attention (+753MB for vocab + reasoning)",
tensor_types={
"token_embd.weight": "Q6_K",
"output.weight": "Q6_K",
"lm_head.weight": "Q6_K",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
fallback_methods=[
{
"embed_tokens.weight": "Q6_K",
"output.weight": "Q6_K",
"lm_head.weight": "Q6_K",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
{"token-embedding-type": "Q6_K", "output-tensor-type": "Q6_K"},
],
),
QuantisationType.Q4_K_XL: QuantisationConfig(
name="Q4_K_XL",
description="Q8_0 embeddings + Q6_K attention (+2.1GB for vocabulary + reasoning)",
tensor_types={
"token_embd.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
fallback_methods=[
{
"embed_tokens.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q6_K",
"blk.*.attn_k.weight": "Q6_K",
"blk.*.attn_v.weight": "Q6_K",
},
{"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
],
),
QuantisationType.Q4_K_XXL: QuantisationConfig(
name="Q4_K_XXL",
description="Q8_0 embeddings + Q8_0 attention (+2.8GB total, maximum precision)",
tensor_types={
"token_embd.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q8_0",
"blk.*.attn_k.weight": "Q8_0",
"blk.*.attn_v.weight": "Q8_0",
},
fallback_methods=[
{
"embed_tokens.weight": "Q8_0",
"output.weight": "Q8_0",
"lm_head.weight": "Q8_0",
"blk.*.attn_q.weight": "Q8_0",
"blk.*.attn_k.weight": "Q8_0",
"blk.*.attn_v.weight": "Q8_0",
},
{"token-embedding-type": "Q8_0", "output-tensor-type": "Q8_0"},
],
),
}
SUPPORTED_QUANTISATION_TYPES: list[QuantisationType] = [
QuantisationType.Q4_K_M,
QuantisationType.Q4_K_L,
QuantisationType.Q4_K_XL,
QuantisationType.Q4_K_XXL,
]

helpers/logger.py Normal file (94 lines added)

@@ -0,0 +1,94 @@
"""Colour-coded logging configuration for LLM GGUF tools.
Provides a consistent logging interface with colour-coded output for different
log levels, making it easier to identify warnings, errors, and informational
messages at a glance during tool execution and debugging sessions.
"""
from __future__ import annotations
from logging import (
CRITICAL,
DEBUG,
ERROR,
INFO,
WARNING,
Formatter as LoggingFormatter,
Logger,
LogRecord,
StreamHandler as LoggingStreamHandler,
getLogger,
)
from os import getenv as os_getenv
from sys import stdout as sys_stdout
from typing import ClassVar
DEBUG_MODE = os_getenv("DEBUG", "false").lower() == "true"
class ColourFormatter(LoggingFormatter):
"""Custom formatter adding colours to log messages based on severity level.
Uses ANSI escape codes to provide visual distinction between different
log levels in terminal output. Supports standard logging levels with
appropriate colour coding: DEBUG (cyan), INFO (green), WARNING (yellow),
ERROR (red), and CRITICAL (bold red) for immediate visual feedback.
"""
# ANSI colour codes
COLOURS: ClassVar[dict[int, str]] = {
DEBUG: "\033[36m", # Cyan
INFO: "\033[32m", # Green
WARNING: "\033[33m", # Yellow
ERROR: "\033[31m", # Red
CRITICAL: "\033[1;31m", # Bold Red
}
RESET = "\033[0m"
# Emoji prefixes for different levels
EMOJIS: ClassVar[dict[int, str]] = {
DEBUG: "🔍",
INFO: "ℹ️ ", # noqa: RUF001
WARNING: "⚠️ ",
ERROR: "❌",
CRITICAL: "🔥",
}
def format(self, record: LogRecord) -> str:
"""Format log record with colour and emoji based on severity level.
Enhances standard log formatting by prepending ANSI colour codes and
emoji indicators, then appending reset codes to prevent colour bleeding.
Maintains standard log structure whilst adding visual enhancements for
improved readability in terminal environments.
Returns:
str: Formatted log message with colour and emoji.
"""
# Get colour for this level
colour = self.COLOURS.get(record.levelno, "")
emoji = self.EMOJIS.get(record.levelno, "")
# Format the message
record.msg = f"{emoji} {record.msg}"
formatted = super().format(record)
# Add colour codes
return f"{colour}{formatted}{self.RESET}"
# Create and configure the logger
logger: Logger = getLogger("llm-gguf-tools")
logger.setLevel(DEBUG if DEBUG_MODE else INFO)
# Create console handler with colour formatter
handler = LoggingStreamHandler(sys_stdout)
handler.setLevel(DEBUG if DEBUG_MODE else INFO)
# Set formatter without timestamp for cleaner output
formatter = ColourFormatter(fmt="%(message)s", datefmt="%H:%M:%S")
handler.setFormatter(formatter)
logger.addHandler(handler)
# Prevent propagation to root logger
logger.propagate = False

@@ -0,0 +1,35 @@
"""Pydantic models for llm-gguf-tools.
This module provides structured data models for quantisation and conversion
operations, ensuring type safety and validation across the toolset.
"""
from __future__ import annotations
from helpers.models.conversion import (
GGUFParameters,
ModelConfig,
TensorMapping,
VisionConfig,
)
from helpers.models.quantisation import (
LlamaCppEnvironment,
ModelSource,
QuantisationConfig,
QuantisationResult,
QuantisationType,
URLType,
)
__all__ = [
"GGUFParameters",
"LlamaCppEnvironment",
"ModelConfig",
"ModelSource",
"QuantisationConfig",
"QuantisationResult",
"QuantisationType",
"TensorMapping",
"URLType",
"VisionConfig",
]

@@ -0,0 +1,150 @@
"""Pydantic models for GGUF conversion operations.
Contains data models for SafeTensors to GGUF conversion including
model configurations, parameter mappings, and tensor specifications.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from typing import Any
from pydantic import BaseModel, ConfigDict, Field
class ModelConfig(BaseModel):
"""Parsed model configuration from HuggingFace config.json.
Represents the standard configuration metadata extracted from HuggingFace
models, providing structured access to architecture details, hyperparameters,
and quantisation settings required for GGUF conversion.
"""
model_config = ConfigDict(extra="allow")
architectures: list[str] = Field(default_factory=lambda: ["Unknown"])
model_type: str = "unknown"
vocab_size: int = 32000
max_position_embeddings: int = 2048
hidden_size: int = 4096
num_hidden_layers: int = 32
intermediate_size: int = 11008
num_attention_heads: int = 32
num_key_value_heads: int | None = None
rope_theta: float = 10000.0
rope_scaling: dict[str, Any] | None = None
rms_norm_eps: float = 1e-5
vision_config: VisionConfig | None = None
def to_gguf_params(self) -> GGUFParameters:
"""Convert model configuration to GGUF parameters.
Translates HuggingFace model configuration values to GGUF-specific
parameter format, handling defaults and calculating derived values
like RoPE dimension count from head dimensions.
Returns:
GGUFParameters instance with converted values.
"""
params = {
"vocab_size": self.vocab_size,
"context_length": self.max_position_embeddings,
"embedding_length": self.hidden_size,
"block_count": self.num_hidden_layers,
"feed_forward_length": self.intermediate_size,
"attention.head_count": self.num_attention_heads,
"attention.head_count_kv": self.num_key_value_heads or self.num_attention_heads,
"attention.layer_norm_rms_epsilon": self.rms_norm_eps,
"rope.freq_base": self.rope_theta,
"rope.dimension_count": self.hidden_size // self.num_attention_heads,
}
return GGUFParameters(**params) # type: ignore[arg-type]
class VisionConfig(BaseModel):
"""Vision model configuration for multimodal models.
Contains parameters specific to vision components in multimodal architectures,
including patch sizes, embedding dimensions, and spatial merge configurations
for proper GGUF metadata generation.
"""
model_config = ConfigDict(extra="allow")
hidden_size: int = 1536
num_hidden_layers: int = 42
num_attention_heads: int = 12
intermediate_size: int = 4224
patch_size: int = 14
spatial_merge_size: int = 2
rms_norm_eps: float | None = None
class GGUFParameters(BaseModel):
"""GGUF-specific parameters inferred from model configuration.
Translates HuggingFace configuration values to GGUF parameter names and
formats, providing a standardised interface for GGUF writer configuration
across different model architectures and quantisation strategies.
"""
model_config = ConfigDict(extra="allow")
# Basic parameters
vocab_size: int
context_length: int
embedding_length: int
block_count: int
feed_forward_length: int
# Attention parameters
attention_head_count: int = Field(alias="attention.head_count")
attention_head_count_kv: int = Field(alias="attention.head_count_kv")
attention_layer_norm_rms_epsilon: float = Field(alias="attention.layer_norm_rms_epsilon")
# RoPE parameters
rope_freq_base: float = Field(alias="rope.freq_base")
rope_dimension_count: int = Field(alias="rope.dimension_count")
rope_scaling_type: str | None = Field(default=None, alias="rope.scaling.type")
rope_scaling_factor: float | None = Field(default=None, alias="rope.scaling.factor")
class TensorMapping(BaseModel):
"""Mapping configuration for tensor name conversion.
Defines rules for translating between HuggingFace tensor naming conventions
and GGUF tensor names, supporting both direct mappings and pattern-based
transformations for layer-specific tensors.
"""
model_config = ConfigDict(frozen=True)
# Direct mappings (exact name matches)
direct_mappings: dict[str, str] = Field(
default_factory=lambda: {
"model.embed_tokens.weight": "token_embd.weight",
"model.norm.weight": "output_norm.weight",
"lm_head.weight": "output.weight",
}
)
# Layer component patterns (for .layers.N. tensors)
layer_patterns: dict[str, str] = Field(
default_factory=lambda: {
"self_attn.q_proj.weight": "attn_q.weight",
"self_attn.q_proj.bias": "attn_q.bias",
"self_attn.k_proj.weight": "attn_k.weight",
"self_attn.k_proj.bias": "attn_k.bias",
"self_attn.v_proj.weight": "attn_v.weight",
"self_attn.v_proj.bias": "attn_v.bias",
"self_attn.o_proj": "attn_output.weight",
"mlp.gate_proj": "ffn_gate.weight",
"mlp.up_proj": "ffn_up.weight",
"mlp.down_proj": "ffn_down.weight",
"input_layernorm": "attn_norm.weight",
"post_attention_layernorm": "ffn_norm.weight",
}
)
# Architecture-specific overrides
architecture_overrides: dict[str, dict[str, str]] = Field(default_factory=dict)

@@ -0,0 +1,168 @@
"""Pydantic models for quantisation operations.
Contains data models specific to the quantisation workflow including
quantisation types, configurations, and results. Uses UK English spelling
conventions throughout (quantisation, not quantization).
"""
from __future__ import annotations
from enum import StrEnum
from typing import TYPE_CHECKING
from pydantic import BaseModel, ConfigDict, Field, field_validator
if TYPE_CHECKING:
from pathlib import Path
class QuantisationType(StrEnum):
"""Available quantisation types for Bartowski-method GGUF model conversion.
Defines the specific quantisation strategies supported by this tool, ranging
from Q4_K_M baseline to Q4_K_XXL maximum precision variants. Each type
represents different trade-offs between model size and quality preservation
for embeddings, attention layers, and feed-forward networks.
"""
Q4_K_M = "Q4_K_M"
Q4_K_L = "Q4_K_L"
Q4_K_XL = "Q4_K_XL"
Q4_K_XXL = "Q4_K_XXL"
class URLType(StrEnum):
"""Supported URL formats for model source specification.
Categorises input URL formats to enable appropriate handling strategies.
HuggingFace URLs require full model download and conversion, whilst Ollama
GGUF URLs allow direct GGUF file downloads with pattern matching for
efficient processing of pre-quantised models.
"""
HUGGINGFACE = "huggingface"
OLLAMA_GGUF = "ollama_gguf"
class QuantisationConfig(BaseModel):
"""Configuration for a specific quantisation method with tensor-level precision control.
Defines quantisation parameters including tensor type mappings and fallback
methods for handling different model architectures. Enables fine-grained
control over which layers receive higher precision treatment whilst
maintaining compatibility across diverse model structures.
"""
model_config = ConfigDict(use_enum_values=True)
name: str
description: str
tensor_types: dict[str, str] = Field(default_factory=dict)
fallback_methods: list[dict[str, str]] = Field(default_factory=list)
class ModelSource(BaseModel):
"""Represents a model source with parsed information from URL analysis.
Contains comprehensive metadata extracted from model URLs including source
repository details, author information, and GGUF file patterns. Enables
differentiation between regular HuggingFace repositories requiring conversion
and GGUF repositories allowing direct file downloads.
"""
model_config = ConfigDict(use_enum_values=True, protected_namespaces=())
url: str
url_type: URLType
source_model: str
original_author: str
model_name: str
gguf_file_pattern: str | None = None
is_gguf_repo: bool = False
@field_validator("url")
@classmethod
def validate_url(cls, v: str) -> str:
"""Validate that URL is not empty.
Ensures the provided URL string is not empty or None,
as this is required for model source identification.
Returns:
The validated URL string.
Raises:
ValueError: If URL is empty or None.
"""
if not v:
msg = "URL cannot be empty"
raise ValueError(msg)
return v
class QuantisationResult(BaseModel):
"""Result of a quantisation operation with comprehensive status tracking.
Captures the outcome of individual quantisation attempts including success
status, file paths, sizes, and error details. Supports workflow status
tracking from planning through processing to completion, enabling real-time
progress reporting and parallel upload coordination.
"""
model_config = ConfigDict(use_enum_values=True, arbitrary_types_allowed=True)
quantisation_type: QuantisationType
success: bool
file_path: Path | None = None
file_size: str | None = None
method_used: str | None = None
error_message: str | None = None
status: str = "pending" # planned, processing, uploading, completed, failed
class LlamaCppEnvironment(BaseModel):
"""Represents llama.cpp environment setup with binary and script locations.
Encapsulates the runtime environment for llama.cpp tools including paths
to quantisation binaries, CLI tools, and conversion scripts. Handles both
local binary installations and repository-based setups to provide flexible
deployment options across different system configurations.
"""
model_config = ConfigDict(arbitrary_types_allowed=True)
quantise_binary: Path # UK spelling
cli_binary: Path
convert_script: str
use_repo: bool = False
class QuantisationContext(BaseModel):
"""Context object containing all parameters needed for quantisation execution.
Encapsulates quantisation parameters to reduce method argument counts
and improve code maintainability following parameter object pattern.
"""
model_config = ConfigDict(frozen=True)
f16_model_path: Path
model_source: ModelSource
config: QuantisationConfig
llama_env: LlamaCppEnvironment
models_dir: Path
imatrix_path: Path | None = None
base_quant: str = "Q4_K_M"
def get_output_path(self) -> Path:
"""Generate output path for quantised model.
Returns:
Path to the output GGUF file.
"""
output_filename = (
f"{self.model_source.original_author}-"
f"{self.model_source.model_name}-"
f"{self.config.name}.gguf"
)
return self.models_dir / self.model_source.model_name / output_filename

@@ -0,0 +1,20 @@
"""Service layer for llm-gguf-tools.
Provides high-level service interfaces for interacting with external systems
including HuggingFace, llama.cpp, and filesystem operations. Uses UK English
spelling conventions throughout.
"""
from __future__ import annotations
from helpers.services.filesystem import FilesystemService
from helpers.services.huggingface import HuggingFaceService, ReadmeGenerator
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator
__all__ = [
"EnvironmentManager",
"FilesystemService",
"HuggingFaceService",
"IMatrixGenerator",
"ReadmeGenerator",
]

@@ -0,0 +1,174 @@
"""Filesystem operations service.
Provides unified filesystem operations including file discovery, size
calculation, and path management. Consolidates common filesystem patterns
used across quantisation and conversion workflows.
"""
from __future__ import annotations
import json
import subprocess
from pathlib import Path
from typing import Any
from helpers.logger import logger
BYTES_PER_UNIT = 1024.0
class FilesystemService:
"""Handles filesystem operations with consistent error handling.
Provides methods for file discovery, size formatting, and JSON loading
with proper error handling and logging. Ensures consistent behaviour
across different tools and workflows.
"""
@staticmethod
def get_file_size(file_path: Path) -> str:
"""Get human-readable file size using system utilities.
Attempts to use `du -h` for human-readable output, falling back to
Python calculation if the system command fails. Provides consistent
size formatting across the toolset.
Returns:
Human-readable file size string (e.g., "1.5G", "750M").
"""
try:
result = subprocess.run(
["du", "-h", str(file_path)], capture_output=True, text=True, check=True
)
return result.stdout.split()[0]
except (subprocess.CalledProcessError, FileNotFoundError):
# Fallback to Python calculation
try:
size_bytes: float = float(file_path.stat().st_size)
for unit in ["B", "K", "M", "G", "T"]:
if size_bytes < BYTES_PER_UNIT:
return f"{size_bytes:.1f}{unit}"
size_bytes /= BYTES_PER_UNIT
except Exception:
return "Unknown"
else:
return f"{size_bytes:.1f}P"
@staticmethod
def load_json_config(config_path: Path) -> dict[str, Any]:
"""Load and parse JSON configuration file.
Provides consistent JSON loading with proper error handling and
encoding specification. Used for loading model configurations,
tokeniser settings, and other JSON-based metadata.
Returns:
Parsed JSON content as dictionary.
Raises:
FileNotFoundError: If config file doesn't exist.
"""
if not config_path.exists():
msg = f"Configuration file not found: {config_path}"
raise FileNotFoundError(msg)
with Path(config_path).open(encoding="utf-8") as f:
return json.load(f)
@staticmethod
def find_safetensor_files(model_path: Path) -> list[Path]:
"""Find all SafeTensor files in model directory using priority search.
Searches for tensor files in order of preference: single model.safetensors,
sharded model-*-of-*.safetensors files, then any *.safetensors files. This
approach handles both single-file and multi-shard model distributions whilst
ensuring predictable file ordering for conversion consistency.
Returns:
List of SafeTensor file paths in priority order.
Raises:
FileNotFoundError: If no SafeTensor files are found.
"""
# Check for single file
single_file = model_path / "model.safetensors"
if single_file.exists():
return [single_file]
# Check for sharded files
pattern = "model-*-of-*.safetensors"
sharded_files = sorted(model_path.glob(pattern))
if sharded_files:
return sharded_files
# Check for any safetensor files
any_files = sorted(model_path.glob("*.safetensors"))
if any_files:
return any_files
msg = f"No SafeTensor files found in {model_path}"
raise FileNotFoundError(msg)
@staticmethod
def find_gguf_files(model_path: Path, pattern: str | None = None) -> list[Path]:
"""Find GGUF files in directory, optionally filtered by pattern.
Searches for GGUF files with optional pattern matching. Prioritises
multi-part files (00001-of-*) over single files for proper handling
of large models split across multiple files.
Returns:
List of GGUF file paths, sorted with multi-part files first.
"""
if pattern:
gguf_files = list(model_path.glob(f"*{pattern}*.gguf"))
else:
gguf_files = list(model_path.glob("*.gguf"))
# Sort to prioritise 00001-of-* files
gguf_files.sort(
key=lambda x: (
"00001-of-" not in x.name, # False sorts before True
x.name,
)
)
return gguf_files
@staticmethod
def ensure_directory(path: Path) -> Path:
"""Ensure directory exists, creating if necessary.
Creates directory and all parent directories if they don't exist.
Returns the path for method chaining convenience.
Returns:
The directory path.
"""
path.mkdir(parents=True, exist_ok=True)
return path
@staticmethod
def cleanup_directory(path: Path, pattern: str = "*") -> int:
"""Remove files matching pattern from directory.
Safely removes files matching the specified glob pattern. Returns
count of files removed for logging purposes.
Returns:
Number of files removed.
"""
if not path.exists():
return 0
files_removed = 0
for file_path in path.glob(pattern):
if file_path.is_file():
try:
file_path.unlink()
files_removed += 1
except Exception as e:
logger.warning(f"Failed to remove {file_path}: {e}")
return files_removed

helpers/services/gguf.py Normal file (210 lines added)

@@ -0,0 +1,210 @@
"""GGUF file operations service.
Provides unified interface for creating, writing, and manipulating GGUF files.
Consolidates GGUF-specific operations from conversion and quantisation workflows.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from typing import TYPE_CHECKING, Any
import gguf
import torch
from safetensors import safe_open
from helpers.logger import logger
from helpers.services.filesystem import FilesystemService
from helpers.utils.config_parser import ConfigParser
if TYPE_CHECKING:
from pathlib import Path
import numpy as np
from helpers.models.conversion import ModelConfig
class GGUFWriter:
"""Manages GGUF file creation and metadata writing.
Provides high-level interface for GGUF file operations including metadata
configuration, tensor addition, and tokeniser integration. Encapsulates
low-level GGUF library interactions for consistent error handling.
"""
def __init__(self, output_path: Path, architecture: str) -> None:
"""Initialise GGUF writer with output path and architecture.
Creates the underlying GGUF writer instance and prepares for metadata
and tensor addition. Sets up the file structure for the specified
model architecture.
"""
self.output_path = output_path
self.architecture = architecture
self.writer = gguf.GGUFWriter(str(output_path), architecture)
logger.info(f"Created GGUF writer for {architecture} architecture")
def add_metadata(self, model_config: ModelConfig, model_name: str) -> None:
"""Add comprehensive metadata from model configuration.
Writes general model information, architectural parameters, and
quantisation settings to the GGUF file header. Handles both standard
and vision model configurations with appropriate parameter mapping.
"""
# General metadata
self.writer.add_name(model_name)
self.writer.add_description(f"Converted from {model_config.architectures[0]}")
self.writer.add_file_type(gguf.LlamaFileType.ALL_F32)
# Model parameters from config
params = model_config.to_gguf_params()
self.writer.add_context_length(params.context_length)
self.writer.add_embedding_length(params.embedding_length)
self.writer.add_block_count(params.block_count)
self.writer.add_feed_forward_length(params.feed_forward_length)
self.writer.add_head_count(params.attention_head_count)
self.writer.add_head_count_kv(params.attention_head_count_kv)
self.writer.add_layer_norm_rms_eps(params.attention_layer_norm_rms_epsilon)
self.writer.add_rope_freq_base(params.rope_freq_base)
self.writer.add_rope_dimension_count(params.rope_dimension_count)
logger.info(f"Added metadata: {params.block_count} layers, {params.context_length} context")
def add_vision_metadata(self, vision_config: Any) -> None:
"""Add vision model parameters to GGUF metadata.
Configures vision-specific parameters for multimodal models including
embedding dimensions, attention heads, and spatial processing settings.
"""
if not vision_config:
return
logger.info("Adding vision model parameters...")
self.writer.add_vision_embedding_length(vision_config.hidden_size)
self.writer.add_vision_block_count(vision_config.num_hidden_layers)
self.writer.add_vision_head_count(vision_config.num_attention_heads)
self.writer.add_vision_feed_forward_length(vision_config.intermediate_size)
self.writer.add_vision_patch_size(vision_config.patch_size)
self.writer.add_vision_spatial_merge_size(vision_config.spatial_merge_size)
if hasattr(vision_config, "rms_norm_eps") and vision_config.rms_norm_eps:
self.writer.add_vision_attention_layernorm_eps(vision_config.rms_norm_eps)
def add_tokeniser(self, tokeniser_config: dict[str, Any]) -> None:
"""Add tokeniser metadata to GGUF file.
Writes special token IDs and tokeniser model type to enable proper
text processing during inference. Uses sensible defaults for missing
configuration values.
"""
self.writer.add_bos_token_id(tokeniser_config.get("bos_token_id", 1))
self.writer.add_eos_token_id(tokeniser_config.get("eos_token_id", 2))
self.writer.add_unk_token_id(tokeniser_config.get("unk_token_id", 0))
self.writer.add_pad_token_id(tokeniser_config.get("pad_token_id", 0))
self.writer.add_tokenizer_model(tokeniser_config.get("model_type", "llama"))
logger.info("Added tokeniser configuration")
def add_tensor(self, name: str, data: np.ndarray) -> None:
"""Add a tensor to the GGUF file.
Writes tensor data with the specified name to the file. Handles
data type conversions and validates tensor shapes.
"""
self.writer.add_tensor(name, data)
def finalise(self) -> None:
"""Write all data to file and close writer.
Completes the GGUF file creation by writing headers, key-value data,
and tensor data in the correct order. Ensures proper file closure.
"""
logger.info(f"Writing GGUF file to {self.output_path}")
self.writer.write_header_to_file()
self.writer.write_kv_data_to_file()
self.writer.write_tensors_to_file()
self.writer.close()
logger.info("GGUF file written successfully")
class GGUFConverter:
"""High-level GGUF conversion orchestrator.
Coordinates the complete conversion workflow from source models to GGUF
format, managing metadata extraction, tensor mapping, and file writing.
"""
@staticmethod
def convert_safetensors(
model_path: Path,
output_path: Path,
model_config: ModelConfig,
architecture: str,
tensor_mapper: Any,
) -> bool:
"""Convert SafeTensors model to GGUF format.
Orchestrates the conversion process including metadata setup, tensor
loading with BFloat16 support, name mapping, and tokeniser integration.
Returns:
True if conversion successful, False otherwise.
"""
logger.info(f"Converting {model_path.name} to GGUF...")
# Create writer
writer_wrapper = GGUFWriter(output_path, architecture)
# Add metadata
writer_wrapper.add_metadata(model_config, model_path.name)
# Add vision metadata if present
if model_config.vision_config:
writer_wrapper.add_vision_metadata(model_config.vision_config)
# Load and add tensors
fs = FilesystemService()
tensor_files = fs.find_safetensor_files(model_path)
logger.info(f"Found {len(tensor_files)} tensor file(s)")
tensor_count = 0
for tensor_file in tensor_files:
logger.info(f"Loading {tensor_file.name}...")
with safe_open(tensor_file, framework="pt") as f:
for tensor_name in f:
tensor_data = f.get_tensor(tensor_name)
# Convert BFloat16 to Float32
if hasattr(tensor_data, "numpy"):
if torch and tensor_data.dtype == torch.bfloat16:
tensor_data = tensor_data.float()
tensor_data = tensor_data.numpy()
# Map tensor name
gguf_name = tensor_mapper.map_tensor_name(tensor_name)
if gguf_name:
writer_wrapper.add_tensor(gguf_name, tensor_data)
tensor_count += 1
if tensor_count % 100 == 0:
logger.info(f" Processed {tensor_count} tensors...")
logger.info(f"Total tensors processed: {tensor_count}")
# Add tokeniser
try:
tok_config = ConfigParser.load_tokeniser_config(model_path)
writer_wrapper.add_tokeniser(tok_config)
logger.info("Tokeniser added")
except Exception as e:
logger.warning(f"Could not add tokeniser: {e}")
# Finalise file
writer_wrapper.finalise()
file_size = fs.get_file_size(output_path)
logger.info(f"Conversion complete! Output: {output_path} ({file_size})")
return True

@@ -0,0 +1,454 @@
"""HuggingFace operations service.
Handles all interactions with HuggingFace including model downloads,
uploads, README generation, and repository management. Uses UK English
spelling conventions throughout.
"""
from __future__ import annotations
import re
import subprocess
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from helpers.logger import logger
from helpers.models.quantisation import QuantisationType
if TYPE_CHECKING:
from helpers.models.quantisation import ModelSource, QuantisationResult
class HuggingFaceService:
"""Manages HuggingFace repository operations.
Provides methods for downloading models, uploading files, and managing
repositories. Handles authentication, error recovery, and progress tracking
for robust interaction with HuggingFace services.
"""
@staticmethod
def get_username() -> str:
"""Get authenticated HuggingFace username.
Retrieves the current user's HuggingFace username using the CLI.
Requires prior authentication via `huggingface-cli login`.
Returns:
HuggingFace username.
Raises:
RuntimeError: If not authenticated or CLI not available.
"""
try:
result = subprocess.run(
["huggingface-cli", "whoami"],
capture_output=True,
text=True,
check=True,
)
return result.stdout.strip()
except (subprocess.CalledProcessError, FileNotFoundError) as err:
msg = "Please log in to HuggingFace first: huggingface-cli login"
raise RuntimeError(msg) from err
@staticmethod
def download_model(
model_name: str, output_dir: Path, include_pattern: str | None = None
) -> None:
"""Download model from HuggingFace.
Downloads a complete model or specific files matching a pattern.
Creates the output directory if it doesn't exist. Supports filtered
downloads for efficient bandwidth usage when only certain files are needed.
"""
logger.info(f"Downloading {model_name} to {output_dir}")
cmd = [
"huggingface-cli",
"download",
model_name,
"--local-dir",
str(output_dir),
]
if include_pattern:
cmd.extend(["--include", include_pattern])
subprocess.run(cmd, check=True)
logger.info("Download complete")
@staticmethod
def upload_file(
repo_id: str,
local_path: Path,
repo_path: str | None = None,
create_repo: bool = False,
) -> None:
"""Upload a file to HuggingFace repository.
Uploads a single file to the specified repository path. Can create
the repository if it doesn't exist. Handles repository creation conflicts
gracefully by retrying without the create flag when needed.
Raises:
CalledProcessError: If upload fails.
"""
repo_path = repo_path or local_path.name
logger.info(f"Uploading {local_path.name} to {repo_id}/{repo_path}")
cmd = [
"huggingface-cli",
"upload",
repo_id,
str(local_path),
repo_path,
]
if create_repo:
cmd.append("--create")
try:
subprocess.run(cmd, check=True, capture_output=True)
logger.info(f"Uploaded {repo_path}")
except subprocess.CalledProcessError:
if create_repo:
# Repository might already exist, retry without --create
cmd = cmd[:-1] # Remove --create flag
subprocess.run(cmd, check=True)
logger.info(f"Updated {repo_path}")
else:
raise
class ReadmeGenerator:
"""Generates README files for quantised models.
Creates comprehensive README documentation including model cards,
quantisation details, and status tracking. Supports both initial
planning documentation and final result summaries.
"""
def generate(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
models_dir: Path,
output_repo: str | None = None,
) -> Path:
"""Generate README file for quantised model repository.
Creates a comprehensive README with frontmatter, quantisation table,
and original model information. Handles status tracking for planned,
processing, and completed quantisations.
Returns:
Path to generated README file.
"""
logger.info("Creating model card...")
model_dir = models_dir / model_source.model_name
readme_path = model_dir / "README.md"
# Get original README content
original_content = self._get_original_readme(model_source, model_dir)
# Generate new README
readme_content = self._generate_readme_content(
model_source, results, original_content, output_repo
)
readme_path.write_text(readme_content)
return readme_path
def _get_original_readme(self, model_source: ModelSource, model_dir: Path) -> dict[str, str]:
"""Extract original README and metadata.
Downloads or reads the original model's README for inclusion in the
quantised model documentation. Parses YAML frontmatter if present.
Returns:
Dictionary with readme content, licence, tags, and frontmatter.
"""
content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}
# Try local file first
readme_path = model_dir / "README.md"
if readme_path.exists():
content["readme"] = readme_path.read_text(encoding="utf-8")
logger.info(f"Found original README ({len(content['readme'])} characters)")
else:
# Download separately
content = self._download_readme(model_source)
# Parse frontmatter if present
if content["readme"].startswith("---\n"):
content = self._parse_frontmatter(content["readme"])
return content
def _download_readme(self, model_source: ModelSource) -> dict[str, str]:
"""Download README from HuggingFace repository.
Attempts to download just the README.md file from the source repository
for efficient documentation extraction.
Returns:
Dictionary with readme content and default metadata.
"""
content = {"readme": "", "licence": "apache-2.0", "tags": "", "frontmatter": ""}
with tempfile.TemporaryDirectory() as temp_dir:
try:
logger.info(f"Downloading README from {model_source.source_model}...")
subprocess.run(
[
"huggingface-cli",
"download",
model_source.source_model,
"--include",
"README.md",
"--local-dir",
temp_dir,
],
check=True,
capture_output=True,
)
readme_path = Path(temp_dir) / "README.md"
if readme_path.exists():
content["readme"] = readme_path.read_text(encoding="utf-8")
logger.info(f"Downloaded README ({len(content['readme'])} characters)")
except subprocess.CalledProcessError as e:
logger.warning(f"Failed to download README: {e}")
return content
def _parse_frontmatter(self, readme_text: str) -> dict[str, str]:
"""Parse YAML frontmatter from README.
Extracts metadata from YAML frontmatter including licence, tags,
and other model card fields.
Returns:
Dictionary with separated content and metadata.
"""
lines = readme_text.split("\n")
if lines[0] != "---":
return {
"readme": readme_text,
"licence": "apache-2.0",
"tags": "",
"frontmatter": "",
}
frontmatter_end = -1
for i, line in enumerate(lines[1:], 1):
if line == "---":
frontmatter_end = i
break
if frontmatter_end == -1:
return {
"readme": readme_text,
"licence": "apache-2.0",
"tags": "",
"frontmatter": "",
}
frontmatter = "\n".join(lines[1:frontmatter_end])
content = "\n".join(lines[frontmatter_end + 1 :])
# Extract licence
licence_match = re.search(r"^license:\s*(.+)$", frontmatter, re.MULTILINE)
licence_val = licence_match.group(1).strip().strip('"') if licence_match else "apache-2.0"
# Extract tags
tags = []
in_tags = False
for line in frontmatter.split("\n"):
if line.startswith("tags:"):
in_tags = True
continue
if in_tags:
if line.startswith("- "):
tags.append(line[2:].strip())
elif line and not line.startswith(" "):
break
return {
"readme": content,
"licence": licence_val,
"tags": ",".join(tags),
"frontmatter": frontmatter,
}
def _generate_readme_content(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
original_content: dict[str, str],
output_repo: str | None = None,
) -> str:
"""Generate complete README content with quantisation details.
Creates the full README including YAML frontmatter, quantisation status
table, and original model information.
Returns:
Complete README markdown content.
"""
# Build tags
our_tags = [
"quantised",
"gguf",
"q4_k_m",
"q4_k_l",
"q4_k_xl",
"q4_k_xxl",
"bartowski-method",
]
original_tags = original_content["tags"].split(",") if original_content["tags"] else []
all_tags = sorted(set(our_tags + original_tags))
# Build frontmatter
frontmatter = f"""---
license: {original_content["licence"]}
library_name: gguf
base_model: {model_source.source_model}
tags:
"""
for tag in all_tags:
if tag.strip():
frontmatter += f"- {tag.strip()}\n"
frontmatter += "---\n\n"
# Build main content
hf_url = f"https://huggingface.co/{model_source.source_model}"
content = f"""# {model_source.original_author}-{model_source.model_name}-GGUF
GGUF quantisations of [{model_source.source_model}]({hf_url}) using Bartowski's method.
| Quantisation | Embeddings/Output | Attention | Feed-Forward | Status |
|--------------|-------------------|-----------|--------------|--------|
"""
# Add results table
for quant_type in [
QuantisationType.Q4_K_M,
QuantisationType.Q4_K_L,
QuantisationType.Q4_K_XL,
QuantisationType.Q4_K_XXL,
]:
result = results.get(quant_type)
if not result:
result = type("Result", (), {"status": "planned", "success": False})()
layers = self._get_layers_config(quant_type)
status = self._format_status(result, model_source, quant_type, output_repo)
content += (
f"| {quant_type.value} | {layers['embeddings']} | "
f"{layers['attention']} | {layers['ffn']} | {status} |\n"
)
content += "\n---\n\n"
# Add original content
if original_content["readme"]:
content += "# Original Model Information\n\n" + original_content["readme"]
else:
content += f"## Original Model\n\nQuantisation of [{model_source.source_model}](https://huggingface.co/{model_source.source_model}).\n"
return frontmatter + content
def _get_layers_config(self, quant_type: QuantisationType) -> dict[str, str]:
"""Get layer configuration for quantisation type.
Returns layer precision specifications for the quantisation table.
Returns:
Dictionary with embeddings, attention, and ffn precision labels.
"""
configs = {
QuantisationType.Q4_K_M: {
"embeddings": "Q4_K_M",
"attention": "Q4_K_M",
"ffn": "Q4_K_M",
},
QuantisationType.Q4_K_L: {"embeddings": "Q6_K", "attention": "Q6_K", "ffn": "Q4_K_M"},
QuantisationType.Q4_K_XL: {"embeddings": "Q8_0", "attention": "Q6_K", "ffn": "Q4_K_M"},
QuantisationType.Q4_K_XXL: {"embeddings": "Q8_0", "attention": "Q8_0", "ffn": "Q4_K_M"},
}
return configs.get(
quant_type, {"embeddings": "Unknown", "attention": "Unknown", "ffn": "Unknown"}
)
def _format_status(
self,
result: QuantisationResult,
model_source: ModelSource,
quant_type: QuantisationType,
output_repo: str | None,
) -> str:
"""Format status indicator for README table.
Creates appropriate status indicator based on quantisation state
including progress indicators, file sizes, and download links.
Returns:
Formatted status string for table cell.
"""
status_map = {
"planned": "⏳ Planned",
"processing": "🔄 Processing...",
"uploading": "⬆️ Uploading...",
"failed": "❌ Failed",
}
if hasattr(result, "status") and result.status in status_map:
base_status = status_map[result.status]
if result.status == "uploading" and hasattr(result, "file_size") and result.file_size:
return f"{base_status} ({result.file_size})"
if result.status == "completed" or (hasattr(result, "success") and result.success):
return self._format_success_status(result, model_source, quant_type, output_repo)
return base_status
# Legacy support
if hasattr(result, "success") and result.success:
return self._format_success_status(result, model_source, quant_type, output_repo)
return "❌ Failed"
def _format_success_status(
self,
result: QuantisationResult,
model_source: ModelSource,
quant_type: QuantisationType,
output_repo: str | None,
) -> str:
"""Format successful quantisation status with download link.
Creates a download link if repository information is available,
otherwise shows file size.
Returns:
Formatted success status string.
"""
if not output_repo:
return (
f"{result.file_size}"
if hasattr(result, "file_size") and result.file_size
else "✅ Available"
)
filename = (
f"{model_source.original_author}-{model_source.model_name}-{quant_type.value}.gguf"
)
url = f"https://huggingface.co/{output_repo}?show_file_info={filename}"
if hasattr(result, "file_size") and result.file_size:
return f"[✅ {result.file_size}]({url})"
return f"[✅ Available]({url})"

417
helpers/services/llama_cpp.py Normal file
View file

@ -0,0 +1,417 @@
"""llama.cpp environment and operations service.
Manages llama.cpp binary discovery, environment setup, and imatrix generation.
Provides consistent interface for interacting with llama.cpp tools across
different installation methods.
"""
from __future__ import annotations
import subprocess
from pathlib import Path
from helpers.logger import logger
from helpers.models.quantisation import LlamaCppEnvironment
from helpers.services.filesystem import FilesystemService
class EnvironmentManager:
"""Manages llama.cpp environment setup and binary discovery.
Handles detection of local binaries, repository setup, and conversion
script location. Provides fallback strategies for different installation
scenarios including local builds and repository-based setups.
"""
def __init__(self, work_dir: Path) -> None:
"""Initialise EnvironmentManager."""
self.work_dir = work_dir
self.llama_cpp_dir = work_dir / "llama.cpp"
self.fs = FilesystemService()
def setup(self) -> LlamaCppEnvironment:
"""Set up llama.cpp environment with automatic detection.
Checks for local llama.cpp binaries first, then falls back to
repository-based setup if needed. Handles conversion script location,
dependency installation, and path resolution.
Returns:
Configured LlamaCppEnvironment instance.
"""
# Check for local binaries first
local_env = self._check_local_binaries()
if local_env:
return local_env
# Setup repository if needed
return self.setup_repository()
def _check_local_binaries(self) -> LlamaCppEnvironment | None:
"""Check for existing llama.cpp binaries in current directory.
Searches for quantise and CLI binaries in the current directory
and standard installation paths. Also locates conversion scripts.
Returns:
LlamaCppEnvironment if binaries found, None otherwise.
"""
quantise_bin = Path("./llama-quantize")
cli_bin = Path("./llama-cli")
if not (quantise_bin.exists() and cli_bin.exists()):
return None
logger.info("Found llama.cpp binaries in current directory")
# Check for conversion script
convert_script = self._find_convert_script()
if convert_script:
logger.info(f"Found conversion script: {convert_script}")
return LlamaCppEnvironment(
quantise_binary=quantise_bin.resolve(),
cli_binary=cli_bin.resolve(),
convert_script=convert_script,
use_repo=False,
)
logger.warning("No conversion script found in current directory")
logger.info("Will use llama.cpp repository method for conversion")
return LlamaCppEnvironment(
quantise_binary=quantise_bin.resolve(),
cli_binary=cli_bin.resolve(),
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
use_repo=True,
)
def _find_convert_script(self) -> str | None:
"""Find conversion script in current directory.
Searches for various naming conventions of the HF to GGUF
conversion script.
Returns:
Command to run conversion script, or None if not found.
"""
scripts = [
"./llama-convert-hf-to-gguf",
"python3 ./convert_hf_to_gguf.py",
"python3 ./convert-hf-to-gguf.py",
]
for script in scripts:
if script.startswith("python3"):
script_path = script.split(" ", 1)[1]
if Path(script_path).exists():
return script
elif Path(script).exists():
return script
return None
def setup_repository(self) -> LlamaCppEnvironment:
"""Setup llama.cpp repository for conversion scripts.
Clones the llama.cpp repository if not present and installs
Python dependencies for model conversion.
Returns:
LlamaCppEnvironment configured with repository paths.
"""
if not self.llama_cpp_dir.exists():
logger.info("Cloning llama.cpp for conversion script...")
subprocess.run(
[
"git",
"clone",
"https://github.com/ggerganov/llama.cpp.git",
str(self.llama_cpp_dir),
],
check=True,
)
# Install Python requirements
logger.info("Installing Python requirements...")
subprocess.run(
[
"pip3",
"install",
"-r",
"requirements.txt",
"--break-system-packages",
"--root-user-action=ignore",
],
cwd=self.llama_cpp_dir,
check=True,
)
# Install additional conversion dependencies
logger.info("Installing additional conversion dependencies...")
subprocess.run(
[
"pip3",
"install",
"transformers",
"sentencepiece",
"protobuf",
"--break-system-packages",
"--root-user-action=ignore",
],
check=True,
)
else:
logger.info("llama.cpp repository already exists")
# Use local binaries but repo conversion script
return LlamaCppEnvironment(
quantise_binary=Path("./llama-quantize").resolve(),
cli_binary=Path("./llama-cli").resolve(),
convert_script=f"python3 {self.llama_cpp_dir}/convert_hf_to_gguf.py",
use_repo=False,
)
class IMatrixGenerator:
"""Handles importance matrix generation for quantisation guidance.
Generates or locates importance matrices that guide quantisation
decisions, helping preserve model quality by identifying critical
tensors requiring higher precision.
"""
def __init__(self) -> None:
"""Initialise IMatrixGenerator."""
self.fs = FilesystemService()
def generate_imatrix(
self, f16_model_path: Path, llama_env: LlamaCppEnvironment, model_dir: Path
) -> Path | None:
"""Generate importance matrix for quantisation guidance.
Searches for existing imatrix files first, provides interactive
prompts for user-supplied matrices, then generates new matrices
using calibration data if necessary.
Returns:
Path to imatrix file, or None if generation fails.
"""
imatrix_path = model_dir / "imatrix.dat"
# Check for existing imatrix
if imatrix_path.exists():
logger.info(f"Found existing imatrix: {imatrix_path.name}")
return imatrix_path
# Try user-provided imatrix
user_imatrix = self._prompt_for_user_imatrix(model_dir, imatrix_path)
if user_imatrix:
return user_imatrix
# Generate new imatrix
calibration_file = self._get_calibration_file()
if not calibration_file:
return None
return self._generate_new_imatrix(f16_model_path, llama_env, imatrix_path, calibration_file)
def _prompt_for_user_imatrix(self, model_dir: Path, imatrix_path: Path) -> Path | None:
"""Prompt user for existing imatrix file.
Returns:
Path to user-provided imatrix, or None if not available.
"""
logger.info(f"Model directory: {model_dir}")
logger.info(f"Looking for imatrix file at: {imatrix_path}")
logger.info(
"Tip: You can download pre-computed imatrix files from Bartowski's repositories!"
)
logger.info(
" Example: https://huggingface.co/bartowski/MODEL-NAME-GGUF/resolve/main/MODEL-NAME.imatrix"
)
response = (
input("\n❓ Do you have an imatrix file to place in the model directory? (y/N): ")
.strip()
.lower()
)
if response != "y":
return None
logger.info(f"Please place your imatrix.dat file in: {model_dir}")
input("⏳ Press Enter when you've placed the imatrix.dat file (or Ctrl+C to cancel)...")
if imatrix_path.exists():
file_size = self.fs.get_file_size(imatrix_path)
logger.info(f"Found imatrix file! ({file_size})")
return imatrix_path
logger.warning("No imatrix.dat file found - continuing with automatic generation")
return None
def _get_calibration_file(self) -> Path | None:
"""Get calibration data file for imatrix generation.
Returns:
Path to calibration file, or None if not found.
"""
calibration_file = Path(__file__).parent.parent.parent / "resources" / "imatrix_data.txt"
if not calibration_file.exists():
logger.warning("resources/imatrix_data.txt not found - skipping imatrix generation")
logger.info(
"Download from: https://gist.githubusercontent.com/bartowski1182/"
"eb213dccb3571f863da82e99418f81e8/raw/calibration_datav3.txt"
)
return None
return calibration_file
def _generate_new_imatrix(
self,
f16_model_path: Path,
llama_env: LlamaCppEnvironment,
imatrix_path: Path,
calibration_file: Path,
) -> Path | None:
"""Generate new importance matrix using calibration data.
Returns:
Path to generated imatrix, or None if generation fails.
"""
logger.info("Generating importance matrix (this may take 1-4 hours for large models)...")
logger.info(f"Model: {f16_model_path.name}")
logger.info(f"Calibration: {calibration_file}")
logger.info(f"Output: {imatrix_path}")
# Find imatrix binary
imatrix_binary = self._find_imatrix_binary(llama_env)
if not imatrix_binary:
logger.warning("llama-imatrix binary not found - skipping imatrix generation")
logger.info("Make sure llama-imatrix is in the same directory as llama-quantize")
return None
# Build and execute command
cmd = self._build_imatrix_command(
imatrix_binary, f16_model_path, calibration_file, imatrix_path
)
return self._execute_imatrix_generation(cmd, imatrix_path)
def _build_imatrix_command(
self, binary: Path, model_path: Path, calibration_file: Path, output_path: Path
) -> list[str]:
"""Build imatrix generation command.
Returns:
Command arguments as list.
"""
return [
str(binary),
"-m",
str(model_path),
"-f",
str(calibration_file),
"-o",
str(output_path),
"--process-output",
"--output-frequency",
"10",
"--save-frequency",
"50",
"-t",
"8",
"-c",
"2048",
"-b",
"512",
]
def _execute_imatrix_generation(self, cmd: list[str], imatrix_path: Path) -> Path | None:
"""Execute imatrix generation command with real-time output.
Returns:
Path to generated imatrix file, or None if generation fails.
"""
logger.info(f"Running: {' '.join(cmd)}")
logger.info("Starting imatrix generation... (progress will be shown)")
try:
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1,
)
self._stream_imatrix_output(process)
return_code = process.poll()
if return_code == 0:
return self._validate_imatrix_output(imatrix_path)
except KeyboardInterrupt:
logger.info("imatrix generation cancelled by user")
process.terminate()
return None
except Exception as e:
logger.error(f"imatrix generation failed with exception: {e}")
return None
else:
logger.error(f"imatrix generation failed with return code {return_code}")
return None
def _stream_imatrix_output(self, process: subprocess.Popen) -> None:
"""Stream imatrix generation output in real-time."""
while True:
if process.stdout is not None:
output = process.stdout.readline()
else:
break
if not output and process.poll() is not None:
break
if output:
line = output.strip()
if self._should_log_imatrix_line(line):
logger.info(line)
def _should_log_imatrix_line(self, line: str) -> bool:
"""Determine if imatrix output line should be logged.
Returns:
True if line should be logged, False otherwise.
"""
keywords = ["Computing imatrix", "perplexity:", "save_imatrix", "entries =", "ETA"]
return any(keyword in line for keyword in keywords) or line.startswith("[")
def _validate_imatrix_output(self, imatrix_path: Path) -> Path | None:
"""Validate generated imatrix file.
Returns:
Path to imatrix if valid, None otherwise.
"""
if imatrix_path.exists():
file_size = self.fs.get_file_size(imatrix_path)
logger.info(f"imatrix generation successful! ({file_size})")
return imatrix_path
logger.error("imatrix generation completed but file not found")
return None
def _find_imatrix_binary(self, llama_env: LlamaCppEnvironment) -> Path | None:
"""Find llama-imatrix binary in common locations.
Searches for the imatrix binary in the current directory and
standard installation paths.
Returns:
Path to imatrix binary, or None if not found.
"""
candidates = [
Path("./llama-imatrix"),
llama_env.quantise_binary.parent / "llama-imatrix",
Path("/usr/local/bin/llama-imatrix"),
Path("/usr/bin/llama-imatrix"),
]
for candidate in candidates:
if candidate.exists() and candidate.is_file():
return candidate
return None
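
For reference, the argument list assembled by _build_imatrix_command corresponds to a llama-imatrix invocation along the following lines; a small sketch that builds and prints it (every path here is an invented placeholder):

from pathlib import Path

# Mirrors _build_imatrix_command above with its fixed generation settings.
binary = Path("./llama-imatrix")
model = Path("./models/example/example-f16.gguf")
calibration = Path("./resources/imatrix_data.txt")
output = Path("./models/example/imatrix.dat")

cmd = [
    str(binary),
    "-m", str(model),
    "-f", str(calibration),
    "-o", str(output),
    "--process-output",
    "--output-frequency", "10",
    "--save-frequency", "50",
    "-t", "8",
    "-c", "2048",
    "-b", "512",
]
print(" ".join(cmd))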

397
helpers/services/orchestrator.py Normal file
View file

@ -0,0 +1,397 @@
"""Quantisation orchestration service.
High-level orchestration of the complete quantisation workflow from model
acquisition through processing to upload. Manages parallel processing,
status tracking, and cleanup operations for efficient resource utilisation.
"""
from __future__ import annotations
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from helpers.config.quantisation_configs import QUANTISATION_CONFIGS, SUPPORTED_QUANTISATION_TYPES
from helpers.logger import logger
from helpers.models.quantisation import (
ModelSource,
QuantisationContext,
QuantisationResult,
QuantisationType,
)
from helpers.services.huggingface import ReadmeGenerator
from helpers.services.llama_cpp import EnvironmentManager, IMatrixGenerator
from helpers.services.quantisation import HuggingFaceUploader, ModelManager, QuantisationEngine
from helpers.utils.tensor_mapping import URLParser
@dataclass(slots=True)
class QuantisationOrchestrator:
"""Orchestrates the complete quantisation workflow.
Uses dataclass with slots for efficient memory usage and dependency injection
for modular service interaction following SOLID principles.
"""
work_dir: Path = field(default_factory=lambda: Path.cwd() / "quantisation_work")
use_imatrix: bool = True
imatrix_base: str = "Q4_K_M"
no_upload: bool = False
# Service dependencies with factory defaults
url_parser: URLParser = field(default_factory=URLParser)
quantisation_engine: QuantisationEngine = field(default_factory=QuantisationEngine)
imatrix_generator: IMatrixGenerator = field(default_factory=IMatrixGenerator)
readme_generator: ReadmeGenerator = field(default_factory=ReadmeGenerator)
uploader: HuggingFaceUploader = field(default_factory=HuggingFaceUploader)
# Computed properties
models_dir: Path = field(init=False)
environment_manager: EnvironmentManager = field(init=False)
model_manager: ModelManager = field(init=False)
def __post_init__(self) -> None:
"""Initialise computed properties after dataclass construction."""
self.models_dir = self.work_dir / "models"
self.environment_manager = EnvironmentManager(self.work_dir)
self.model_manager = ModelManager(self.models_dir, self.environment_manager)
def quantise(self, url: str) -> dict[QuantisationType, QuantisationResult]:
"""Main quantisation workflow orchestrating model processing from URL to upload.
Returns:
dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
"""
logger.info("Starting Bartowski quantisation process...")
# Setup and preparation
model_source, llama_env, f16_model_path, imatrix_path, output_repo = (
self._setup_environment(url)
)
# Create initial repository
self._create_initial_repository(model_source, output_repo)
# Execute all quantisations
results = self._execute_quantisations(
model_source, llama_env, f16_model_path, imatrix_path, output_repo
)
# Cleanup
self._cleanup_files(f16_model_path, model_source)
self._print_completion_summary(model_source, results, output_repo)
return results
def _setup_environment(self, url: str) -> tuple[ModelSource, Any, Path, Path | None, str]:
"""Setup environment and prepare model for quantisation.
Returns:
Tuple of (model_source, llama_env, f16_model_path, imatrix_path, output_repo).
"""
model_source = self.url_parser.parse(url)
self._print_model_info(model_source)
self.models_dir.mkdir(parents=True, exist_ok=True)
llama_env = self.environment_manager.setup()
f16_model_path = self.model_manager.prepare_model(model_source, llama_env)
imatrix_path = None
if self.use_imatrix:
logger.info("Generating importance matrix (imatrix)...")
imatrix_path = self.imatrix_generator.generate_imatrix(
f16_model_path, llama_env, self.models_dir / model_source.model_name
)
output_repo = (
f"{self.uploader.get_username()}/"
f"{model_source.original_author}-{model_source.model_name}-GGUF"
)
return model_source, llama_env, f16_model_path, imatrix_path, output_repo
def _create_initial_repository(self, model_source: ModelSource, output_repo: str) -> None:
"""Create initial repository with planned quantisations."""
logger.info("Creating initial README with planned quantisations...")
planned_results = {
qt: QuantisationResult(quantisation_type=qt, success=False, status="planned")
for qt in SUPPORTED_QUANTISATION_TYPES
}
readme_path = self.readme_generator.generate(
model_source, planned_results, self.models_dir, output_repo
)
if not self.no_upload:
logger.info("Creating repository with planned quantisations...")
self.uploader.upload_readme(output_repo, readme_path)
else:
logger.info("Skipping repository creation (--no-upload specified)")
def _execute_quantisations(
self,
model_source: ModelSource,
llama_env: Any,
f16_model_path: Path,
imatrix_path: Path | None,
output_repo: str,
) -> dict[QuantisationType, QuantisationResult]:
"""Execute all quantisation types with parallel uploads.
Returns:
dict[QuantisationType, QuantisationResult]: Quantisation results for each type.
"""
results: dict[QuantisationType, QuantisationResult] = {}
upload_futures: list[Future[None]] = []
with ThreadPoolExecutor(max_workers=1, thread_name_prefix="uploader") as upload_executor:
for quant_type in SUPPORTED_QUANTISATION_TYPES:
result = self._process_single_quantisation(
quant_type,
model_source,
llama_env,
f16_model_path,
imatrix_path,
output_repo,
results,
upload_executor,
upload_futures,
)
results[quant_type] = result
self._wait_for_uploads(upload_futures)
return results
def _process_single_quantisation(
self,
quant_type: QuantisationType,
model_source: ModelSource,
llama_env: Any,
f16_model_path: Path,
imatrix_path: Path | None,
output_repo: str,
results: dict[QuantisationType, QuantisationResult],
upload_executor: ThreadPoolExecutor,
upload_futures: list,
) -> QuantisationResult:
"""Process a single quantisation type.
Returns:
QuantisationResult: Result of the quantisation attempt.
"""
try:
logger.info(f"Starting {quant_type.value} quantisation...")
config = QUANTISATION_CONFIGS[quant_type]
# Update status to processing
result = QuantisationResult(quantisation_type=quant_type, success=False)
result.status = "processing"
results[quant_type] = result
self._update_readme_status(model_source, results, output_repo)
# Perform quantisation
context = QuantisationContext(
f16_model_path=f16_model_path,
model_source=model_source,
config=config,
llama_env=llama_env,
models_dir=self.models_dir,
imatrix_path=imatrix_path,
base_quant=self.imatrix_base,
)
result = self.quantisation_engine.quantise(context)
self._handle_quantisation_result(
result,
quant_type,
model_source,
results,
output_repo,
upload_executor,
upload_futures,
)
except Exception as e:
return self._handle_quantisation_error(
e, quant_type, model_source, results, output_repo
)
else:
return result
def _handle_quantisation_result(
self,
result: QuantisationResult,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
upload_executor: ThreadPoolExecutor,
upload_futures: list,
) -> None:
"""Handle successful or failed quantisation result."""
if result.success and result.file_path:
quant_str = getattr(result.quantisation_type, "value", result.quantisation_type)
logger.info(f"Starting parallel upload of {quant_str}...")
upload_future = upload_executor.submit(
self._upload_and_cleanup,
output_repo,
result.file_path,
quant_type,
model_source,
results,
)
upload_futures.append(upload_future)
result.file_path = None # Mark as being uploaded
result.status = "uploading"
else:
result.status = "failed"
self._update_readme_status(model_source, results, output_repo)
def _handle_quantisation_error(
self,
error: Exception,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> QuantisationResult:
"""Handle quantisation processing error.
Returns:
QuantisationResult: Failed quantisation result with error information.
"""
logger.error(f"Error processing {quant_type.value}: {error}")
result = QuantisationResult(quantisation_type=quant_type, success=False)
result.status = "failed"
result.error_message = str(error)
try:
self._update_readme_status(model_source, results, output_repo)
except Exception as readme_error:
logger.error(f"Failed to update README after error: {readme_error}")
return result
def _update_readme_status(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> None:
"""Update README with current quantisation status."""
if not self.no_upload:
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
def _wait_for_uploads(self, upload_futures: list) -> None:
"""Wait for all parallel uploads to complete."""
logger.info("Waiting for any remaining uploads to complete...")
for future in upload_futures:
try:
future.result(timeout=300) # 5 minute timeout per upload
except Exception as e:
logger.warning(f"Upload error: {e}")
def _cleanup_files(self, f16_model_path: Path, model_source: ModelSource) -> None:
"""Clean up temporary files after processing."""
if f16_model_path.exists():
logger.info(f"Removing F16 model {f16_model_path.name} to save disk space...")
f16_model_path.unlink()
if not model_source.is_gguf_repo:
self._cleanup_original_model(model_source)
def _cleanup_original_model(self, model_source: ModelSource) -> None:
"""Clean up original safetensors/PyTorch files after successful conversion."""
model_dir = self.models_dir / model_source.model_name
pytorch_files = list(model_dir.glob("pytorch_model*.bin"))
if pytorch_files:
logger.info(f"Removing {len(pytorch_files)} PyTorch model files to save disk space...")
for file in pytorch_files:
file.unlink()
logger.info("Keeping config files, tokeniser, and metadata for reference")
def _upload_and_cleanup(
self,
output_repo: str,
file_path: Path,
quant_type: QuantisationType,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
) -> None:
"""Upload file and clean up (runs in background thread)."""
try:
logger.info(f"[PARALLEL] Uploading {quant_type}...")
self.uploader.upload_model_file(output_repo, file_path)
logger.info(f"[PARALLEL] Removing {file_path.name} to save disk space...")
file_path.unlink()
results[quant_type].status = "completed"
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
logger.info(f"[PARALLEL] {quant_type} upload and cleanup complete")
except Exception as e:
logger.error(f"[PARALLEL] Failed to upload {quant_type}: {e}")
results[quant_type].status = "failed"
results[quant_type].error_message = str(e)
updated_readme_path = self.readme_generator.generate(
model_source, results, self.models_dir, output_repo
)
self.uploader.upload_readme(output_repo, updated_readme_path)
raise
def _print_model_info(self, model_source: ModelSource) -> None:
"""Print model information."""
logger.info(f"Source URL: {model_source.url}")
logger.info(f"Source model: {model_source.source_model}")
logger.info(f"Original author: {model_source.original_author}")
logger.info(f"Model name: {model_source.model_name}")
logger.info(f"Your HF username: {self.uploader.get_username()}")
logger.info(f"Working directory: {self.work_dir}")
def _print_completion_summary(
self,
model_source: ModelSource,
results: dict[QuantisationType, QuantisationResult],
output_repo: str,
) -> None:
"""Print completion summary."""
successful_results = [r for r in results.values() if r.success]
if successful_results:
logger.info("Complete! Your quantised models are available at:")
logger.info(f" https://huggingface.co/{output_repo}")
logger.info("Model info:")
logger.info(f" - Source URL: {model_source.url}")
logger.info(f" - Original: {model_source.source_model}")
logger.info(
" - Method: "
f"{'Direct GGUF download' if model_source.is_gguf_repo else 'HF model conversion'}"
)
logger.info(f" - Quantised: {output_repo}")
for result in successful_results:
if result.file_size:
filename = (
f"{model_source.original_author}-{model_source.model_name}-"
f"{result.quantisation_type}.gguf"
)
logger.info(f" - {result.quantisation_type}: {filename} ({result.file_size})")
else:
logger.error(
"All quantisations failed - repository created with documentation "
"but no model files"
)
logger.error(f" Repository: https://huggingface.co/{output_repo}")

486
helpers/services/quantisation.py Normal file
View file

@ -0,0 +1,486 @@
"""Quantisation operations service.
Provides modular quantisation engine, model management, and upload capabilities
for GGUF model processing. Consolidates quantisation logic from various tools
into reusable components following SOLID principles.
"""
from __future__ import annotations
import shutil
import subprocess
from typing import TYPE_CHECKING
from helpers.logger import logger
from helpers.models.quantisation import (
ModelSource,
QuantisationContext,
QuantisationResult,
QuantisationType,
)
from helpers.services.filesystem import FilesystemService
if TYPE_CHECKING:
from pathlib import Path
from helpers.models.quantisation import LlamaCppEnvironment
from helpers.services.llama_cpp import EnvironmentManager
class QuantisationEngine:
"""Handles the actual quantisation process with configurable methods.
Provides flexible quantisation execution supporting multiple tensor
precision configurations, importance matrices, and fallback strategies.
Encapsulates llama-quantize binary interactions with real-time output.
"""
def __init__(self) -> None:
"""Initialise quantisation engine."""
self.fs = FilesystemService()
def quantise(self, context: QuantisationContext) -> QuantisationResult:
"""Perform quantisation using the specified configuration.
Executes quantisation with primary and fallback methods, handling
tensor-specific precision overrides and importance matrix guidance.
Returns:
QuantisationResult with success status and file information.
"""
logger.info(
f"⚙️ Creating {context.config.name} quantisation ({context.config.description})..."
)
output_path = context.get_output_path()
logger.info(f"🎯 Attempting {context.config.name} quantisation...")
logger.info(f"📝 Source: {context.f16_model_path}")
logger.info(f"📝 Target: {output_path}")
# Try primary method
if self._try_quantisation_method(
context, output_path, context.config.tensor_types, "method 1"
):
return self._create_success_result(context.config.name, output_path, "method 1")
# Try fallback methods
for i, fallback_method in enumerate(context.config.fallback_methods, 2):
method_name = f"method {i}"
if self._try_quantisation_method(context, output_path, fallback_method, method_name):
return self._create_success_result(context.config.name, output_path, method_name)
logger.error("All %s quantisation methods failed", context.config.name)
return QuantisationResult(
quantisation_type=QuantisationType(context.config.name),
success=False,
error_message="All quantisation methods failed",
)
def _try_quantisation_method(
self,
context: QuantisationContext,
output_path: Path,
tensor_config: dict[str, str],
method_name: str,
) -> bool:
"""Try a specific quantisation method with real-time output.
Builds and executes llama-quantize command with appropriate parameters,
streaming output for progress monitoring.
Returns:
True if quantisation successful, False otherwise.
"""
logger.info(f"🔍 Trying {method_name}...")
cmd = self._build_quantisation_command(context, output_path, tensor_config)
return self._execute_quantisation_command(cmd, method_name)
def _build_quantisation_command(
self, context: QuantisationContext, output_path: Path, tensor_config: dict[str, str]
) -> list[str]:
"""Build quantisation command with all required parameters.
Returns:
List of command arguments.
"""
cmd = [str(context.llama_env.quantise_binary)]
# Add imatrix if available
if context.imatrix_path and context.imatrix_path.exists():
cmd.extend(["--imatrix", str(context.imatrix_path)])
logger.info(f"🧮 Using imatrix: {context.imatrix_path.name}")
# Add tensor type arguments
self._add_tensor_type_arguments(cmd, tensor_config)
cmd.extend([str(context.f16_model_path), str(output_path), context.base_quant])
return cmd
def _add_tensor_type_arguments(self, cmd: list[str], tensor_config: dict[str, str]) -> None:
"""Add tensor type arguments to command."""
if not tensor_config:
return
for tensor_name, quant_type in tensor_config.items():
if tensor_name.startswith(("token-embedding-type", "output-tensor-type")):
cmd.extend([f"--{tensor_name}", quant_type])
else:
cmd.extend(["--tensor-type", f"{tensor_name}={quant_type}"])
def _execute_quantisation_command(self, cmd: list[str], method_name: str) -> bool:
"""Execute quantisation command with real-time output.
Returns:
True if quantisation successful, False otherwise.
"""
logger.info(f"💻 Running: {' '.join(cmd)}")
logger.info("⏳ Quantisation in progress... (this may take several minutes)")
try:
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
universal_newlines=True,
bufsize=1,
)
self._stream_quantisation_output(process)
return_code = process.poll()
if return_code == 0:
logger.info(f"{method_name} quantisation successful!")
return True
except Exception as e:
logger.info(f"{method_name} failed with exception: {e}")
return False
else:
logger.info(f"{method_name} failed with return code {return_code}")
return False
def _stream_quantisation_output(self, process: subprocess.Popen) -> None:
"""Stream quantisation output in real-time."""
while True:
if process.stdout is not None:
output = process.stdout.readline()
else:
break
if not output and process.poll() is not None:
break
if output:
logger.info(f"📊 {output.strip()}")
def _create_success_result(
self, quant_type: str, output_path: Path, method_used: str
) -> QuantisationResult:
"""Create successful quantisation result with file metadata.
Returns:
QuantisationResult with file path and size information.
"""
file_size = self.fs.get_file_size(output_path)
return QuantisationResult(
quantisation_type=QuantisationType(quant_type),
success=True,
file_path=output_path,
file_size=file_size,
method_used=method_used,
)
class ModelManager:
"""Handles model downloading and preparation for quantisation.
Manages both GGUF repository downloads and HuggingFace model conversions,
providing unified interface for model acquisition and preparation.
"""
def __init__(self, models_dir: Path, environment_manager: EnvironmentManager) -> None:
"""Initialise model manager with storage and environment configuration.
Sets up model storage directory and links to environment manager for
conversion script access and llama.cpp tool discovery.
"""
self.models_dir = models_dir
self.environment_manager = environment_manager
self.fs = FilesystemService()
def prepare_model(self, model_source: ModelSource, llama_env: LlamaCppEnvironment) -> Path:
"""Prepare model for quantisation and return F16 model path.
Handles both GGUF repository downloads and regular HuggingFace model
conversion workflows with automatic format detection.
Returns:
Path to F16 GGUF model ready for quantisation.
"""
model_dir = self.models_dir / model_source.model_name
if model_source.is_gguf_repo:
return self._handle_gguf_repo(model_source, model_dir)
return self._handle_regular_repo(model_source, model_dir, llama_env)
def _handle_gguf_repo(self, model_source: ModelSource, model_dir: Path) -> Path:
"""Handle GGUF repository download with pattern matching.
Downloads GGUF files matching specified patterns, prioritising
multi-part files and F16 variants.
Returns:
Path to downloaded or existing GGUF file.
"""
logger.info(f"⬇️ Downloading GGUF file from repository: {model_source.source_model}")
logger.info(f"🔍 Looking for file pattern: *{model_source.gguf_file_pattern}*")
f16_model = model_dir / f"{model_source.model_name}-f16.gguf"
if f16_model.exists():
logger.info(f"✅ Found existing F16 file: {f16_model.name}")
return f16_model
# Check for existing GGUF files
model_dir.mkdir(parents=True, exist_ok=True)
existing_gguf = self.fs.find_gguf_files(model_dir)
if existing_gguf:
logger.info(f"✅ Found existing GGUF file: {existing_gguf[0].name}")
return existing_gguf[0]
# Download with patterns
downloaded_file = self._download_gguf_with_patterns(
model_source.source_model, model_source.gguf_file_pattern, model_dir
)
if downloaded_file:
# Handle multi-part files
if "00001-of-" in downloaded_file.name:
return downloaded_file
if "-00002-of-" in downloaded_file.name or "-00003-of-" in downloaded_file.name:
base_name = downloaded_file.name.replace("-00002-of-", "-00001-of-").replace(
"-00003-of-", "-00001-of-"
)
first_part = downloaded_file.parent / base_name
if first_part.exists():
logger.info(f"🔄 Using first part: {first_part.name}")
return first_part
# Rename single file to standard name
downloaded_file.rename(f16_model)
return f16_model
# Fallback to regular conversion
logger.info("💡 Falling back to downloading full repository and converting...")
return self._handle_regular_repo(
ModelSource(**{**model_source.model_dump(), "is_gguf_repo": False}),
model_dir,
None,
)
def _download_gguf_with_patterns(
self, source_model: str, pattern: str | None, model_dir: Path
) -> Path | None:
"""Download GGUF file using various pattern strategies.
Tries multiple pattern variations to find and download appropriate
GGUF files, handling timeouts and temporary directories.
Returns:
Path to downloaded file, or None if all patterns fail.
"""
if pattern:
patterns = [
f"*{pattern}*",
f"*{pattern.lower()}*",
f"*{pattern.upper()}*",
"*f16*",
"*F16*",
"*fp16*",
]
else:
patterns = ["*f16*", "*F16*", "*fp16*"]
temp_dir = model_dir / "gguf_temp"
for search_pattern in patterns:
logger.info(f"🔍 Trying pattern: {search_pattern}")
temp_dir.mkdir(exist_ok=True)
try:
subprocess.run(
[
"timeout",
"300",
"huggingface-cli",
"download",
source_model,
"--include",
search_pattern,
"--local-dir",
str(temp_dir),
],
check=True,
capture_output=True,
)
# Find downloaded GGUF files
gguf_files = self.fs.find_gguf_files(temp_dir, pattern)
if gguf_files:
found_file = gguf_files[0]
logger.info(f"✅ Found GGUF file: {found_file.name}")
# Move to parent directory
final_path = model_dir / found_file.name
shutil.move(str(found_file), str(final_path))
shutil.rmtree(temp_dir)
return final_path
except subprocess.CalledProcessError:
logger.info(f"⚠️ Pattern {search_pattern} failed or timed out")
continue
finally:
if temp_dir.exists():
shutil.rmtree(temp_dir, ignore_errors=True)
return None
def _handle_regular_repo(
self,
model_source: ModelSource,
model_dir: Path,
llama_env: LlamaCppEnvironment | None,
) -> Path:
"""Handle regular HuggingFace repository conversion.
Downloads full model repository and converts to F16 GGUF format
using llama.cpp conversion scripts.
Returns:
Path to converted F16 GGUF model.
"""
logger.info(f"⬇️ Downloading source model: {model_source.source_model}")
if not model_dir.exists():
subprocess.run(
[
"huggingface-cli",
"download",
model_source.source_model,
"--local-dir",
str(model_dir),
],
check=True,
)
else:
logger.info("✅ Model already downloaded")
logger.info("🔄 Converting to GGUF F16 format...")
f16_model = model_dir / f"{model_source.model_name}-f16.gguf"
if not f16_model.exists():
if not llama_env:
llama_env = self.environment_manager.setup()
# Ensure conversion script is available
if llama_env.use_repo or not self.environment_manager.llama_cpp_dir.exists():
logger.info("Getting conversion script from llama.cpp repository...")
llama_env = self.environment_manager.setup_repository()
subprocess.run(
[
*llama_env.convert_script.split(),
str(model_dir),
"--outtype",
"f16",
"--outfile",
str(f16_model),
],
check=True,
)
else:
logger.info("✅ F16 model already exists")
return f16_model
class HuggingFaceUploader:
"""Handles uploading models and documentation to HuggingFace.
Provides methods for repository creation, file uploads, and README
updates with proper error handling and retry logic.
"""
@staticmethod
def get_username() -> str:
"""Get authenticated HuggingFace username.
Returns:
HuggingFace username from CLI authentication.
Raises:
RuntimeError: If not authenticated.
"""
try:
result = subprocess.run(
["huggingface-cli", "whoami"],
capture_output=True,
text=True,
check=True,
)
return result.stdout.strip()
except (subprocess.CalledProcessError, FileNotFoundError) as err:
msg = "Please log in to HuggingFace first: huggingface-cli login"
raise RuntimeError(msg) from err
def upload_readme(self, output_repo: str, readme_path: Path) -> None:
"""Upload or update README file to repository.
Creates repository if needed, handles existing repository updates.
"""
logger.info("Uploading README...")
try:
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(readme_path),
"README.md",
"--create",
],
check=True,
capture_output=True,
)
logger.info("README uploaded")
except subprocess.CalledProcessError:
# Repository exists, update without --create
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(readme_path),
"README.md",
],
check=True,
)
logger.info("README updated")
def upload_model_file(self, output_repo: str, model_path: Path) -> None:
"""Upload model file to repository.
Uploads GGUF model file to specified repository path.
"""
logger.info(f"Uploading {model_path.name}...")
subprocess.run(
[
"huggingface-cli",
"upload",
output_repo,
str(model_path),
model_path.name,
],
check=True,
)
logger.info(f"{model_path.name} uploaded")

16
helpers/utils/__init__.py Normal file
View file

@ -0,0 +1,16 @@
"""Utility functions for llm-gguf-tools.
Provides low-level utilities for tensor mapping, configuration parsing,
and other common operations. Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
from helpers.utils.config_parser import ConfigParser
from helpers.utils.tensor_mapping import TensorMapper, URLParser
__all__ = [
"ConfigParser",
"TensorMapper",
"URLParser",
]

171
helpers/utils/config_parser.py Normal file
View file

@ -0,0 +1,171 @@
"""Configuration parsing utilities.
Provides utilities for parsing model configurations, inferring parameters,
and handling architecture-specific settings. Uses UK English spelling
conventions throughout.
"""
from __future__ import annotations
from typing import TYPE_CHECKING, Any
from helpers.models.conversion import GGUFParameters, ModelConfig, VisionConfig
from helpers.services.filesystem import FilesystemService
if TYPE_CHECKING:
from pathlib import Path
class ConfigParser:
"""Parses and transforms model configuration files.
Handles loading of HuggingFace config.json files, parameter inference,
and conversion to GGUF-compatible formats. Provides sensible defaults
for missing values and architecture-specific handling.
"""
def __init__(self) -> None:
"""Initialise ConfigParser."""
self.fs = FilesystemService()
def load_model_config(self, model_path: Path) -> ModelConfig:
"""Load model configuration from config.json file.
Reads the standard HuggingFace config.json file and parses it into
a structured ModelConfig instance with proper type validation. Handles
vision model configurations and provides sensible defaults for missing values.
Returns:
Parsed ModelConfig instance.
"""
config_file = model_path / "config.json"
raw_config = self.fs.load_json_config(config_file)
# Parse vision config if present
vision_config = None
if "vision_config" in raw_config:
vision_config = VisionConfig(**raw_config["vision_config"])
# Create ModelConfig with parsed values
return ModelConfig(
architectures=raw_config.get("architectures", ["Unknown"]),
model_type=raw_config.get("model_type", "unknown"),
vocab_size=raw_config.get("vocab_size", 32000),
max_position_embeddings=raw_config.get("max_position_embeddings", 2048),
hidden_size=raw_config.get("hidden_size", 4096),
num_hidden_layers=raw_config.get("num_hidden_layers", 32),
intermediate_size=raw_config.get("intermediate_size", 11008),
num_attention_heads=raw_config.get("num_attention_heads", 32),
num_key_value_heads=raw_config.get("num_key_value_heads"),
rope_theta=raw_config.get("rope_theta", 10000.0),
rope_scaling=raw_config.get("rope_scaling"),
rms_norm_eps=raw_config.get("rms_norm_eps", 1e-5),
vision_config=vision_config,
)
def infer_gguf_parameters(self, config: ModelConfig) -> GGUFParameters:
"""Infer GGUF parameters from model configuration.
Translates HuggingFace model configuration to GGUF parameter format,
providing sensible defaults for missing values and handling various
architecture conventions.
Args:
config: Parsed ModelConfig instance.
Returns:
GGUFParameters with inferred values.
"""
# Calculate derived parameters
num_heads = config.num_attention_heads
embedding_length = config.hidden_size
rope_dimension_count = embedding_length // num_heads
# Handle KV heads (for GQA models)
num_kv_heads = config.num_key_value_heads or num_heads
# Create GGUFParameters using dict with aliases
params_dict = {
"vocab_size": config.vocab_size,
"context_length": config.max_position_embeddings,
"embedding_length": embedding_length,
"block_count": config.num_hidden_layers,
"feed_forward_length": config.intermediate_size,
"attention.head_count": num_heads,
"attention.head_count_kv": num_kv_heads,
"attention.layer_norm_rms_epsilon": config.rms_norm_eps,
"rope.freq_base": config.rope_theta,
"rope.dimension_count": rope_dimension_count,
}
params = GGUFParameters.model_validate(params_dict)
# Add RoPE scaling if present
if config.rope_scaling:
params.rope_scaling_type = config.rope_scaling.get("type", "linear")
params.rope_scaling_factor = config.rope_scaling.get("factor", 1.0)
return params
@staticmethod
def get_architecture_mapping(architecture: str) -> str:
"""Map architecture names to known GGUF architectures.
Provides fallback mappings for architectures not directly supported
by GGUF, mapping them to similar known architectures.
Args:
architecture: Original architecture name from config.
Returns:
GGUF-compatible architecture name.
"""
# Architecture mappings to known GGUF types
mappings = {
"DotsOCRForCausalLM": "qwen2", # Similar architecture
"GptOssForCausalLM": "llama", # Use llama as fallback
"MistralForCausalLM": "llama", # Mistral is llama-like
"Qwen2ForCausalLM": "qwen2",
"LlamaForCausalLM": "llama",
"GemmaForCausalLM": "gemma",
"Phi3ForCausalLM": "phi3",
# Add more mappings as needed
}
return mappings.get(architecture, "llama") # Default to llama
@staticmethod
def load_tokeniser_config(model_path: Path) -> dict[str, Any]:
"""Load tokeniser configuration from model directory.
Reads tokenizer_config.json to extract special token IDs and
other tokenisation parameters.
Args:
model_path: Path to model directory.
Returns:
Tokeniser configuration dictionary.
"""
fs = FilesystemService()
tokeniser_config_path = model_path / "tokenizer_config.json"
if not tokeniser_config_path.exists():
# Return defaults if no config found
return {
"bos_token_id": 1,
"eos_token_id": 2,
"unk_token_id": 0,
"pad_token_id": 0,
}
config = fs.load_json_config(tokeniser_config_path)
# Extract token IDs with defaults
return {
"bos_token_id": config.get("bos_token_id", 1),
"eos_token_id": config.get("eos_token_id", 2),
"unk_token_id": config.get("unk_token_id", 0),
"pad_token_id": config.get("pad_token_id", 0),
"model_type": config.get("model_type", "llama"),
}
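
A worked example of the derivation in infer_gguf_parameters, using invented Llama-style values:

# rope.dimension_count is the per-head width; KV heads fall back to the
# full head count when num_key_value_heads is absent (no GQA).
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8  # grouped-query attention

rope_dimension_count = hidden_size // num_attention_heads  # 4096 // 32 = 128
head_count_kv = num_key_value_heads or num_attention_heads  # 8, or 32 if None
print(rope_dimension_count, head_count_kv)  # 128 8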

196
helpers/utils/tensor_mapping.py Normal file
View file

@ -0,0 +1,196 @@
"""Tensor mapping and URL parsing utilities.
Provides utilities for mapping tensor names between different formats,
parsing model URLs, and handling architecture-specific conversions.
Uses UK English spelling conventions throughout.
"""
from __future__ import annotations
import re
from typing import ClassVar
from helpers.models.quantisation import ModelSource, URLType
class TensorMapper:
"""Maps tensor names between HuggingFace and GGUF conventions.
Provides flexible tensor name translation supporting direct mappings,
layer-aware transformations, and architecture-specific overrides.
Handles both simple renames and complex pattern-based conversions.
"""
# Common direct mappings across architectures
DIRECT_MAPPINGS: ClassVar[dict[str, str]] = {
"model.embed_tokens.weight": "token_embd.weight",
"model.norm.weight": "output_norm.weight",
"lm_head.weight": "output.weight",
}
# Layer component patterns for transformer blocks
LAYER_PATTERNS: ClassVar[dict[str, str]] = {
"self_attn.q_proj.weight": "attn_q.weight",
"self_attn.q_proj.bias": "attn_q.bias",
"self_attn.k_proj.weight": "attn_k.weight",
"self_attn.k_proj.bias": "attn_k.bias",
"self_attn.v_proj.weight": "attn_v.weight",
"self_attn.v_proj.bias": "attn_v.bias",
"self_attn.o_proj": "attn_output.weight",
"mlp.gate_proj": "ffn_gate.weight",
"mlp.up_proj": "ffn_up.weight",
"mlp.down_proj": "ffn_down.weight",
"input_layernorm": "attn_norm.weight",
"post_attention_layernorm": "ffn_norm.weight",
}
@classmethod
def map_tensor_name(cls, original_name: str) -> str | None:
"""Map original tensor name to GGUF format.
Translates HuggingFace tensor naming to GGUF format, handling embeddings,
attention layers, feed-forward networks, and normalisation layers. Uses
layer-aware mapping for transformer blocks whilst maintaining consistency
across different model architectures.
Returns:
GGUF tensor name, or None if unmappable.
"""
# Check direct mappings first
if original_name in cls.DIRECT_MAPPINGS:
return cls.DIRECT_MAPPINGS[original_name]
# Handle layer-specific tensors
if ".layers." in original_name:
return cls._map_layer_tensor(original_name)
# Return None for unmapped tensors
return None
@classmethod
def _map_layer_tensor(cls, tensor_name: str) -> str | None:
"""Map layer-specific tensor names.
Handles tensors within transformer layers, extracting layer indices
and mapping component names to GGUF conventions.
Args:
tensor_name: Layer tensor name containing .layers.N. pattern.
Returns:
Mapped GGUF tensor name, or None if unmappable.
"""
# Extract layer number
parts = tensor_name.split(".")
layer_idx = None
for i, part in enumerate(parts):
if part == "layers" and i + 1 < len(parts):
layer_idx = parts[i + 1]
break
if layer_idx is None:
return None
# Check each pattern
for pattern, replacement in cls.LAYER_PATTERNS.items():
if pattern in tensor_name:
return f"blk.{layer_idx}.{replacement}"
return None
class URLParser:
"""Parses and validates model URLs from various sources.
Handles HuggingFace URLs, Ollama-style GGUF references, and other
model source formats. Extracts metadata including author, model name,
and file patterns for appropriate download strategies.
"""
@staticmethod
def parse(url: str) -> ModelSource:
"""Parse URL and extract model source information.
Analyses URL format to determine source type and extract relevant
metadata for model download and processing.
Args:
url: Model URL in supported format.
Returns:
ModelSource with parsed information.
Raises:
ValueError: If URL format is not recognised.
"""
if not url:
msg = "URL cannot be empty"
raise ValueError(msg)
# Try Ollama-style GGUF URL first (hf.co/author/model:pattern)
ollama_match = re.match(r"^hf\.co/([^:]+):(.+)$", url)
if ollama_match:
source_model = ollama_match.group(1)
gguf_pattern = ollama_match.group(2)
return URLParser._create_model_source(
url,
URLType.OLLAMA_GGUF,
source_model,
gguf_file_pattern=gguf_pattern,
is_gguf_repo=True,
)
# Try regular HuggingFace URL
hf_match = re.match(r"https://huggingface\.co/([^/]+/[^/?]+)", url)
if hf_match:
source_model = hf_match.group(1)
return URLParser._create_model_source(
url, URLType.HUGGINGFACE, source_model, is_gguf_repo=False
)
msg = (
"Invalid URL format\n"
"Supported formats:\n"
" - https://huggingface.co/username/model-name\n"
" - hf.co/username/model-name-GGUF:F16"
)
raise ValueError(msg)
@staticmethod
def _create_model_source(
url: str,
url_type: URLType,
source_model: str,
gguf_file_pattern: str | None = None,
is_gguf_repo: bool = False,
) -> ModelSource:
"""Create ModelSource with parsed information.
Constructs a ModelSource instance with extracted metadata,
handling author/model name splitting and GGUF suffix removal.
Args:
url: Original URL.
url_type: Type of URL (HuggingFace or Ollama GGUF).
source_model: Repository identifier (author/model).
gguf_file_pattern: Optional GGUF file pattern.
is_gguf_repo: Whether this is a GGUF repository.
Returns:
Configured ModelSource instance.
"""
author, model_name = source_model.split("/", 1)
# Strip -GGUF suffix for GGUF repos
if is_gguf_repo and model_name.endswith("-GGUF"):
model_name = model_name[:-5]
return ModelSource(
url=url,
url_type=url_type,
source_model=source_model,
original_author=author,
model_name=model_name,
gguf_file_pattern=gguf_file_pattern,
is_gguf_repo=is_gguf_repo,
)
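
The two URL forms accepted by URLParser can be checked with the same regular expressions used above; a standalone sketch (both URLs are taken from the script's usage examples further below):

import re

urls = [
    "https://huggingface.co/DavidAU/Gemma-3-4b-it-Uncensored-DBL-X",
    "hf.co/DavidAU/Gemma-3-it-4B-Uncensored-DBL-X-GGUF:F16",
]
for url in urls:
    ollama = re.match(r"^hf\.co/([^:]+):(.+)$", url)
    hf = re.match(r"https://huggingface\.co/([^/]+/[^/?]+)", url)
    if ollama:
        print("GGUF repo:", ollama.group(1), "| file pattern:", ollama.group(2))
    elif hf:
        print("HF repo:", hf.group(1))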

96
pyproject.toml Normal file
View file

@ -0,0 +1,96 @@
[project]
name = "llm-gguf-tools"
version = "0.1.0"
description = "Tools to convert and quantise language models in GGUF format"
readme = "README.md"
license = { text = "Apache-2.0" }
authors = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
maintainers = [{ name = "Tom Foster", email = "tom@tomfos.tr" }]
requires-python = ">=3.13"
classifiers = [
"Development Status :: 3 - Alpha",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.13",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development :: Libraries :: Python Modules",
]
dependencies = ["gguf>=0", "pydantic>=2", "safetensors>=0", "torch>=2"]
[project.urls]
Homepage = "https://git.tomfos.tr/tom/llm-gguf-tools"
"Bug Reports" = "https://git.tomfos.tr/tom/llm-gguf-tools/issues"
"Source" = "https://git.tomfos.tr/tom/llm-gguf-tools"
[dependency-groups]
dev = ["pytest>=8", "ruff>=0", "uv>=0"]
[tool.uv]
package = true
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
[tool.uv.sources]
torch = { index = "pytorch-cpu" }
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project.scripts]
quantise = "quantise:main"
safetensors-to-gguf = "direct_safetensors_to_gguf:main"
[tool.setuptools]
packages = { find = {} }
[tool.ruff]
cache-dir = "/tmp/.ruff_cache"
fix = true
line-length = 100
preview = true
show-fixes = false
target-version = "py313"
unsafe-fixes = true
[tool.ruff.format]
line-ending = "auto"
skip-magic-trailing-comma = false
[tool.ruff.lint]
fixable = ["ALL"]
ignore = [
"ANN401", # use of Any type
"BLE001", # blind Exception usage
"COM812", # missing trailing comma
"CPY", # flake8-copyright
"FBT", # boolean arguments
"PLR0912", # too many branches
"PLR0913", # too many arguments
"PLR0915", # too many statements
"PLR0917", # too many positional arguments
"PLR6301", # method could be static
"RUF029", # async methods that don't await
"S104", # binding to all interfaces
"S110", # passed exceptions
"S404", # use of subprocess
"S603", # check subprocess input
"S607", # subprocess with partial path
"TRY301", # raise inside try block
]
select = ["ALL"]
unfixable = [
"F841", # local variable assigned but never used
"RUF100", # unused noqa comments
"T201", # don't strip print statement
]
[tool.ruff.lint.isort]
combine-as-imports = true
required-imports = ["from __future__ import annotations"]
[tool.ruff.lint.pydocstyle]
convention = "google"

101
quantise_gguf.py Normal file
View file

@ -0,0 +1,101 @@
#!/usr/bin/env python3
"""Bartowski Quantisation Script for advanced GGUF model processing.
Implements a sophisticated quantisation pipeline supporting Q4_K_M, Q4_K_L,
Q4_K_XL, and Q4_K_XXL methods with tensor-level precision control. Features
parallel processing, status tracking, automatic README generation, and
HuggingFace integration for streamlined model distribution workflows.
Usage: python quantise.py <huggingface_url>
"""
from __future__ import annotations
import argparse
import shutil
import sys
from pathlib import Path
from helpers.logger import logger
from helpers.services.orchestrator import QuantisationOrchestrator
def main() -> None:
"""Main entry point for the Bartowski quantisation workflow.
Parses command-line arguments, initialises the quantisation orchestrator,
and executes the complete model processing pipeline from HuggingFace URL
to quantised GGUF files with optional HuggingFace upload and cleanup.
"""
parser = argparse.ArgumentParser(
description="Bartowski Quantisation Script - Supports Q4_K_M, Q4_K_L, Q4_K_XL, Q4_K_XXL",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
  python quantise_gguf.py https://huggingface.co/DavidAU/Gemma-3-4b-it-Uncensored-DBL-X
  python quantise_gguf.py hf.co/DavidAU/Gemma-3-it-4B-Uncensored-DBL-X-GGUF:F16
""",
)
parser.add_argument("url", help="HuggingFace model URL")
parser.add_argument(
"--work-dir", type=Path, help="Working directory (default: ./quantisation_work)"
)
parser.add_argument(
"--no-imatrix",
action="store_true",
help="Skip imatrix generation (faster but lower quality)",
)
parser.add_argument(
"--imatrix-base",
choices=[
"Q2_K",
"Q3_K_L",
"Q3_K_M",
"Q3_K_S",
"Q4_K_S",
"Q4_K_M",
"Q5_K_S",
"Q5_K_M",
"Q6_K",
"Q8_0",
],
default="Q4_K_M",
help="Base quantisation for imatrix generation",
)
parser.add_argument(
"--no-upload",
action="store_true",
help="Skip uploading to HuggingFace (local testing only)",
)
args = parser.parse_args()
if not args.url:
parser.print_help()
sys.exit(1)
try:
orchestrator = QuantisationOrchestrator(
work_dir=args.work_dir or Path.cwd() / "quantisation_work",
use_imatrix=not args.no_imatrix,
imatrix_base=args.imatrix_base,
no_upload=args.no_upload,
)
orchestrator.quantise(args.url)
# Cleanup prompt
logger.info("Cleaning up...")
response = input("Delete working files? (y/N): ").strip().lower()
if response == "y":
shutil.rmtree(orchestrator.work_dir)
logger.info("Cleanup complete")
else:
logger.info(f"Working files kept in: {orchestrator.work_dir}")
except Exception as e:
logger.error(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
main()
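
# Programmatic usage (a minimal sketch; the URL and argument values below are
# illustrative, mirroring the CLI defaults above):
#
#   from pathlib import Path
#   from helpers.services.orchestrator import QuantisationOrchestrator
#
#   orchestrator = QuantisationOrchestrator(
#       work_dir=Path("./quantisation_work"),
#       use_imatrix=True,         # generate an importance matrix first
#       imatrix_base="Q4_K_M",    # base quantisation used for imatrix generation
#       no_upload=True,           # keep results local, skip HuggingFace upload
#   )
#   orchestrator.quantise("https://huggingface.co/acme/tiny-model")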

2482
resources/imatrix_data.txt Normal file

File diff suppressed because one or more lines are too long

95
safetensors2gguf.py Normal file
View file

@ -0,0 +1,95 @@
#!/usr/bin/env python3
"""Direct SafeTensors to GGUF converter for unsupported architectures.
This script attempts to convert SafeTensors models to GGUF format directly,
without relying on llama.cpp's architecture-specific conversion logic.
"""
from __future__ import annotations
import sys
import traceback
from argparse import ArgumentParser
from pathlib import Path
from helpers.logger import logger
from helpers.services.gguf import GGUFConverter
from helpers.utils.config_parser import ConfigParser
from helpers.utils.tensor_mapping import TensorMapper
def convert_safetensors_to_gguf(
model_path: Path, output_path: Path, force_architecture: str | None = None
) -> bool:
"""Convert SafeTensors model to GGUF format with comprehensive metadata handling.
Orchestrates the complete conversion workflow: loads configuration, maps
architecture to known GGUF types, creates writer with proper metadata,
processes all tensor files with name mapping, and adds tokeniser data.
Handles BFloat16 conversion and provides fallback architecture mapping
for unsupported model types to ensure maximum compatibility.
Returns:
True if conversion was successful, False otherwise.
"""
# Use ConfigParser to load configuration
config_parser = ConfigParser()
model_config = config_parser.load_model_config(model_path)
arch_name = model_config.architectures[0]
model_type = model_config.model_type
logger.info(f"Architecture: {arch_name}")
logger.info(f"Model type: {model_type}")
# Use forced architecture or try to map to a known one
if force_architecture:
arch = force_architecture
logger.warning(f"Using forced architecture: {arch}")
else:
# Use ConfigParser's architecture mapping
arch = config_parser.get_architecture_mapping(arch_name)
if arch != arch_name:
logger.warning(f"Unknown architecture {arch_name}, using {arch} as fallback")
# Use the new GGUFConverter for the conversion
tensor_mapper = TensorMapper()
return GGUFConverter.convert_safetensors(
model_path, output_path, model_config, arch, tensor_mapper
    )


def main() -> None:
    """Main entry point for SafeTensors to GGUF conversion command-line interface.

    Parses command-line arguments, validates input paths, and orchestrates the
    conversion process with proper error handling. Supports forced architecture
    mapping and flexible output path specification. Provides comprehensive
    error reporting and exit codes for integration with automated workflows.
    """
parser = ArgumentParser(description="Convert SafeTensors to GGUF directly")
parser.add_argument("model_path", help="Path to SafeTensors model directory")
parser.add_argument("-o", "--output", help="Output GGUF file path")
parser.add_argument("--force-arch", help="Force a specific architecture mapping")
args = parser.parse_args()
model_path = Path(args.model_path)
if not model_path.exists():
logger.error(f"Model path not found: {model_path}")
sys.exit(1)
output_path = Path(args.output) if args.output else model_path / f"{model_path.name}-f32.gguf"
try:
success = convert_safetensors_to_gguf(model_path, output_path, args.force_arch)
sys.exit(0 if success else 1)
except Exception as e:
logger.error(f"Conversion failed: {e}")
traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
main()
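
# Programmatic usage (a minimal sketch; paths and the architecture override are
# illustrative):
#
#   from pathlib import Path
#
#   ok = convert_safetensors_to_gguf(
#       model_path=Path("./models/acme-tiny-model"),
#       output_path=Path("./models/acme-tiny-model/acme-tiny-model-f32.gguf"),
#       force_architecture=None,  # or e.g. "llama" to force a known mapping
#   )
#   sys.exit(0 if ok else 1)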

425
uv.lock generated Normal file
View file

@ -0,0 +1,425 @@
version = 1
revision = 2
requires-python = ">=3.13"
resolution-markers = [
"sys_platform != 'darwin'",
"sys_platform == 'darwin'",
]
[[package]]
name = "annotated-types"
version = "0.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
]
[[package]]
name = "colorama"
version = "0.4.6"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6" },
]
[[package]]
name = "filelock"
version = "3.13.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl", hash = "sha256:57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c" },
]
[[package]]
name = "fsspec"
version = "2024.6.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl", hash = "sha256:3cb443f8bcd2efb31295a5b9fdb02aee81d8452c80d28f97a6d0959e6cee101e" },
]
[[package]]
name = "gguf"
version = "0.17.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy" },
{ name = "pyyaml" },
{ name = "tqdm" },
]
sdist = { url = "https://files.pythonhosted.org/packages/08/08/7de1ca4b71e7bf33b547f82bb22505e221b5fa42f67d635e200e0ad22ad6/gguf-0.17.1.tar.gz", hash = "sha256:36ad71aad900a3e75fc94ebe96ea6029f03a4e44be7627ef7ad3d03e8c7bcb53", size = 89338, upload-time = "2025-06-19T14:00:33.705Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fc/31/6a93a887617ee7deeaa602ca3d02d1c12a6cb8a742a695de5d128f5fa46a/gguf-0.17.1-py3-none-any.whl", hash = "sha256:7bc5aa7eeb1931f7d39b48fdc5b38fda6b294b9dca75cf607ac69557840a3943", size = 96224, upload-time = "2025-06-19T14:00:32.88Z" },
]
[[package]]
name = "iniconfig"
version = "2.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f2/97/ebf4da567aa6827c909642694d71c9fcf53e5b504f2d96afea02718862f3/iniconfig-2.1.0.tar.gz", hash = "sha256:3abbd2e30b36733fee78f9c7f7308f2d0050e88f0087fd25c2645f63c773e1c7", size = 4793, upload-time = "2025-03-19T20:09:59.721Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2c/e1/e6716421ea10d38022b952c159d5161ca1193197fb744506875fbb87ea7b/iniconfig-2.1.0-py3-none-any.whl", hash = "sha256:9deba5723312380e77435581c6bf4935c94cbfab9b1ed33ef8d238ea168eb760", size = 6050, upload-time = "2025-03-19T20:10:01.071Z" },
]
[[package]]
name = "jinja2"
version = "3.1.4"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "markupsafe" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl", hash = "sha256:bc5dd2abb727a5319567b7a813e6a2e7318c39f4f487cfe6c89c6f9c7d25197d" },
]
[[package]]
name = "llm-gguf-tools"
version = "0.1.0"
source = { editable = "." }
dependencies = [
{ name = "gguf" },
{ name = "pydantic" },
{ name = "safetensors" },
{ name = "torch", version = "2.8.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
{ name = "torch", version = "2.8.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform != 'darwin'" },
]
[package.dev-dependencies]
dev = [
{ name = "pytest" },
{ name = "ruff" },
{ name = "uv" },
]
[package.metadata]
requires-dist = [
{ name = "gguf", specifier = ">=0" },
{ name = "pydantic", specifier = ">=2" },
{ name = "safetensors", specifier = ">=0" },
{ name = "torch", specifier = ">=2", index = "https://download.pytorch.org/whl/cpu" },
]
[package.metadata.requires-dev]
dev = [
{ name = "pytest", specifier = ">=8" },
{ name = "ruff", specifier = ">=0" },
{ name = "uv", specifier = ">=0" },
]
[[package]]
name = "markupsafe"
version = "3.0.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396" },
]
[[package]]
name = "mpmath"
version = "1.3.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c" },
]
[[package]]
name = "networkx"
version = "3.3"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl", hash = "sha256:28575580c6ebdaf4505b22c6256a2b9de86b316dc63ba9e93abde3d78dfdbcf2" },
]
[[package]]
name = "numpy"
version = "2.1.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a84498e0d0a1174f2b3ed769b67b656aa5460c92c9554039e11f20a05650f00d" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:4d6ec0d4222e8ffdab1744da2560f07856421b367928026fb540e1945f2eeeaf" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:259ec80d54999cc34cd1eb8ded513cb053c3bf4829152a2e00de2371bd406f5e" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:675c741d4739af2dc20cd6c6a5c4b7355c728167845e3c6b0e824e4e5d36a6c3" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:05b2d4e667895cc55e3ff2b56077e4c8a5604361fc21a042845ea3ad67465aa8" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:43cca367bf94a14aca50b89e9bc2061683116cfe864e56740e083392f533ce7a" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313-win_amd64.whl", hash = "sha256:f2ded8d9b6f68cc26f8425eda5d3877b47343e68ca23d0d0846f4d312ecaa445" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:2ffef621c14ebb0188a8633348504a35c13680d6da93ab5cb86f4e54b7e922b5" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:ad369ed238b1959dfbade9018a740fb9392c5ac4f9b5173f420bd4f37ba1f7a0" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d82075752f40c0ddf57e6e02673a17f6cb0f8eb3f587f63ca1eaab5594da5b17" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:1600068c262af1ca9580a527d43dc9d959b0b1d8e56f8a05d830eea39b7c8af6" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a26ae94658d3ba3781d5e103ac07a876b3e9b29db53f68ed7df432fd033358a8" },
{ url = "https://download.pytorch.org/whl/numpy-2.1.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:13311c2db4c5f7609b462bc0f43d3c465424d25c626d95040f073e30f7570e35" },
]
[[package]]
name = "packaging"
version = "24.1"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/packaging-24.1-py3-none-any.whl", hash = "sha256:5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124" },
]
[[package]]
name = "pluggy"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
]
[[package]]
name = "pydantic"
version = "2.11.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-types" },
{ name = "pydantic-core" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
]
sdist = { url = "https://files.pythonhosted.org/packages/00/dd/4325abf92c39ba8623b5af936ddb36ffcfe0beae70405d456ab1fb2f5b8c/pydantic-2.11.7.tar.gz", hash = "sha256:d989c3c6cb79469287b1569f7447a17848c998458d49ebe294e975b9baf0f0db", size = 788350, upload-time = "2025-06-14T08:33:17.137Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl", hash = "sha256:dde5df002701f6de26248661f6835bbe296a47bf73990135c7d07ce741b9623b", size = 444782, upload-time = "2025-06-14T08:33:14.905Z" },
]
[[package]]
name = "pydantic-core"
version = "2.33.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ad/88/5f2260bdfae97aabf98f1778d43f69574390ad787afb646292a638c923d4/pydantic_core-2.33.2.tar.gz", hash = "sha256:7cb8bc3605c29176e1b105350d2e6474142d7c1bd1d9327c4a9bdb46bf827acc", size = 435195, upload-time = "2025-04-23T18:33:52.104Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/46/8c/99040727b41f56616573a28771b1bfa08a3d3fe74d3d513f01251f79f172/pydantic_core-2.33.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:1082dd3e2d7109ad8b7da48e1d4710c8d06c253cbc4a27c1cff4fbcaa97a9e3f", size = 2015688, upload-time = "2025-04-23T18:31:53.175Z" },
{ url = "https://files.pythonhosted.org/packages/3a/cc/5999d1eb705a6cefc31f0b4a90e9f7fc400539b1a1030529700cc1b51838/pydantic_core-2.33.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f517ca031dfc037a9c07e748cefd8d96235088b83b4f4ba8939105d20fa1dcd6", size = 1844808, upload-time = "2025-04-23T18:31:54.79Z" },
{ url = "https://files.pythonhosted.org/packages/6f/5e/a0a7b8885c98889a18b6e376f344da1ef323d270b44edf8174d6bce4d622/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a9f2c9dd19656823cb8250b0724ee9c60a82f3cdf68a080979d13092a3b0fef", size = 1885580, upload-time = "2025-04-23T18:31:57.393Z" },
{ url = "https://files.pythonhosted.org/packages/3b/2a/953581f343c7d11a304581156618c3f592435523dd9d79865903272c256a/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2b0a451c263b01acebe51895bfb0e1cc842a5c666efe06cdf13846c7418caa9a", size = 1973859, upload-time = "2025-04-23T18:31:59.065Z" },
{ url = "https://files.pythonhosted.org/packages/e6/55/f1a813904771c03a3f97f676c62cca0c0a4138654107c1b61f19c644868b/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1ea40a64d23faa25e62a70ad163571c0b342b8bf66d5fa612ac0dec4f069d916", size = 2120810, upload-time = "2025-04-23T18:32:00.78Z" },
{ url = "https://files.pythonhosted.org/packages/aa/c3/053389835a996e18853ba107a63caae0b9deb4a276c6b472931ea9ae6e48/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0fb2d542b4d66f9470e8065c5469ec676978d625a8b7a363f07d9a501a9cb36a", size = 2676498, upload-time = "2025-04-23T18:32:02.418Z" },
{ url = "https://files.pythonhosted.org/packages/eb/3c/f4abd740877a35abade05e437245b192f9d0ffb48bbbbd708df33d3cda37/pydantic_core-2.33.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9fdac5d6ffa1b5a83bca06ffe7583f5576555e6c8b3a91fbd25ea7780f825f7d", size = 2000611, upload-time = "2025-04-23T18:32:04.152Z" },
{ url = "https://files.pythonhosted.org/packages/59/a7/63ef2fed1837d1121a894d0ce88439fe3e3b3e48c7543b2a4479eb99c2bd/pydantic_core-2.33.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:04a1a413977ab517154eebb2d326da71638271477d6ad87a769102f7c2488c56", size = 2107924, upload-time = "2025-04-23T18:32:06.129Z" },
{ url = "https://files.pythonhosted.org/packages/04/8f/2551964ef045669801675f1cfc3b0d74147f4901c3ffa42be2ddb1f0efc4/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:c8e7af2f4e0194c22b5b37205bfb293d166a7344a5b0d0eaccebc376546d77d5", size = 2063196, upload-time = "2025-04-23T18:32:08.178Z" },
{ url = "https://files.pythonhosted.org/packages/26/bd/d9602777e77fc6dbb0c7db9ad356e9a985825547dce5ad1d30ee04903918/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:5c92edd15cd58b3c2d34873597a1e20f13094f59cf88068adb18947df5455b4e", size = 2236389, upload-time = "2025-04-23T18:32:10.242Z" },
{ url = "https://files.pythonhosted.org/packages/42/db/0e950daa7e2230423ab342ae918a794964b053bec24ba8af013fc7c94846/pydantic_core-2.33.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:65132b7b4a1c0beded5e057324b7e16e10910c106d43675d9bd87d4f38dde162", size = 2239223, upload-time = "2025-04-23T18:32:12.382Z" },
{ url = "https://files.pythonhosted.org/packages/58/4d/4f937099c545a8a17eb52cb67fe0447fd9a373b348ccfa9a87f141eeb00f/pydantic_core-2.33.2-cp313-cp313-win32.whl", hash = "sha256:52fb90784e0a242bb96ec53f42196a17278855b0f31ac7c3cc6f5c1ec4811849", size = 1900473, upload-time = "2025-04-23T18:32:14.034Z" },
{ url = "https://files.pythonhosted.org/packages/a0/75/4a0a9bac998d78d889def5e4ef2b065acba8cae8c93696906c3a91f310ca/pydantic_core-2.33.2-cp313-cp313-win_amd64.whl", hash = "sha256:c083a3bdd5a93dfe480f1125926afcdbf2917ae714bdb80b36d34318b2bec5d9", size = 1955269, upload-time = "2025-04-23T18:32:15.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/86/1beda0576969592f1497b4ce8e7bc8cbdf614c352426271b1b10d5f0aa64/pydantic_core-2.33.2-cp313-cp313-win_arm64.whl", hash = "sha256:e80b087132752f6b3d714f041ccf74403799d3b23a72722ea2e6ba2e892555b9", size = 1893921, upload-time = "2025-04-23T18:32:18.473Z" },
{ url = "https://files.pythonhosted.org/packages/a4/7d/e09391c2eebeab681df2b74bfe6c43422fffede8dc74187b2b0bf6fd7571/pydantic_core-2.33.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:61c18fba8e5e9db3ab908620af374db0ac1baa69f0f32df4f61ae23f15e586ac", size = 1806162, upload-time = "2025-04-23T18:32:20.188Z" },
{ url = "https://files.pythonhosted.org/packages/f1/3d/847b6b1fed9f8ed3bb95a9ad04fbd0b212e832d4f0f50ff4d9ee5a9f15cf/pydantic_core-2.33.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95237e53bb015f67b63c91af7518a62a8660376a6a0db19b89acc77a4d6199f5", size = 1981560, upload-time = "2025-04-23T18:32:22.354Z" },
{ url = "https://files.pythonhosted.org/packages/6f/9a/e73262f6c6656262b5fdd723ad90f518f579b7bc8622e43a942eec53c938/pydantic_core-2.33.2-cp313-cp313t-win_amd64.whl", hash = "sha256:c2fc0a768ef76c15ab9238afa6da7f69895bb5d1ee83aeea2e3509af4472d0b9", size = 1935777, upload-time = "2025-04-23T18:32:25.088Z" },
]
[[package]]
name = "pygments"
version = "2.19.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/b0/77/a5b8c569bf593b0140bde72ea885a803b82086995367bf2037de0159d924/pygments-2.19.2.tar.gz", hash = "sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887", size = 4968631, upload-time = "2025-06-21T13:39:12.283Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
]
[[package]]
name = "pytest"
version = "8.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "iniconfig" },
{ name = "packaging" },
{ name = "pluggy" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/08/ba/45911d754e8eba3d5a841a5ce61a65a685ff1798421ac054f85aa8747dfb/pytest-8.4.1.tar.gz", hash = "sha256:7c67fd69174877359ed9371ec3af8a3d2b04741818c51e5e99cc1742251fa93c", size = 1517714, upload-time = "2025-06-18T05:48:06.109Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/29/16/c8a903f4c4dffe7a12843191437d7cd8e32751d5de349d45d3fe69544e87/pytest-8.4.1-py3-none-any.whl", hash = "sha256:539c70ba6fcead8e78eebbf1115e8b589e7565830d7d006a8723f19ac8a0afb7", size = 365474, upload-time = "2025-06-18T05:48:03.955Z" },
]
[[package]]
name = "pyyaml"
version = "6.0.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/54/ed/79a089b6be93607fa5cdaedf301d7dfb23af5f25c398d5ead2525b063e17/pyyaml-6.0.2.tar.gz", hash = "sha256:d584d9ec91ad65861cc08d42e834324ef890a082e591037abe114850ff7bbc3e", size = 130631, upload-time = "2024-08-06T20:33:50.674Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ef/e3/3af305b830494fa85d95f6d95ef7fa73f2ee1cc8ef5b495c7c3269fb835f/PyYAML-6.0.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:efdca5630322a10774e8e98e1af481aad470dd62c3170801852d752aa7a783ba", size = 181309, upload-time = "2024-08-06T20:32:43.4Z" },
{ url = "https://files.pythonhosted.org/packages/45/9f/3b1c20a0b7a3200524eb0076cc027a970d320bd3a6592873c85c92a08731/PyYAML-6.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:50187695423ffe49e2deacb8cd10510bc361faac997de9efef88badc3bb9e2d1", size = 171679, upload-time = "2024-08-06T20:32:44.801Z" },
{ url = "https://files.pythonhosted.org/packages/7c/9a/337322f27005c33bcb656c655fa78325b730324c78620e8328ae28b64d0c/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0ffe8360bab4910ef1b9e87fb812d8bc0a308b0d0eef8c8f44e0254ab3b07133", size = 733428, upload-time = "2024-08-06T20:32:46.432Z" },
{ url = "https://files.pythonhosted.org/packages/a3/69/864fbe19e6c18ea3cc196cbe5d392175b4cf3d5d0ac1403ec3f2d237ebb5/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:17e311b6c678207928d649faa7cb0d7b4c26a0ba73d41e99c4fff6b6c3276484", size = 763361, upload-time = "2024-08-06T20:32:51.188Z" },
{ url = "https://files.pythonhosted.org/packages/04/24/b7721e4845c2f162d26f50521b825fb061bc0a5afcf9a386840f23ea19fa/PyYAML-6.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:70b189594dbe54f75ab3a1acec5f1e3faa7e8cf2f1e08d9b561cb41b845f69d5", size = 759523, upload-time = "2024-08-06T20:32:53.019Z" },
{ url = "https://files.pythonhosted.org/packages/2b/b2/e3234f59ba06559c6ff63c4e10baea10e5e7df868092bf9ab40e5b9c56b6/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:41e4e3953a79407c794916fa277a82531dd93aad34e29c2a514c2c0c5fe971cc", size = 726660, upload-time = "2024-08-06T20:32:54.708Z" },
{ url = "https://files.pythonhosted.org/packages/fe/0f/25911a9f080464c59fab9027482f822b86bf0608957a5fcc6eaac85aa515/PyYAML-6.0.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:68ccc6023a3400877818152ad9a1033e3db8625d899c72eacb5a668902e4d652", size = 751597, upload-time = "2024-08-06T20:32:56.985Z" },
{ url = "https://files.pythonhosted.org/packages/14/0d/e2c3b43bbce3cf6bd97c840b46088a3031085179e596d4929729d8d68270/PyYAML-6.0.2-cp313-cp313-win32.whl", hash = "sha256:bc2fa7c6b47d6bc618dd7fb02ef6fdedb1090ec036abab80d4681424b84c1183", size = 140527, upload-time = "2024-08-06T20:33:03.001Z" },
{ url = "https://files.pythonhosted.org/packages/fa/de/02b54f42487e3d3c6efb3f89428677074ca7bf43aae402517bc7cca949f3/PyYAML-6.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:8388ee1976c416731879ac16da0aff3f63b286ffdd57cdeb95f3f2e085687563", size = 156446, upload-time = "2024-08-06T20:33:04.33Z" },
]
[[package]]
name = "ruff"
version = "0.12.7"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a1/81/0bd3594fa0f690466e41bd033bdcdf86cba8288345ac77ad4afbe5ec743a/ruff-0.12.7.tar.gz", hash = "sha256:1fc3193f238bc2d7968772c82831a4ff69252f673be371fb49663f0068b7ec71", size = 5197814, upload-time = "2025-07-29T22:32:35.877Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/d2/6cb35e9c85e7a91e8d22ab32ae07ac39cc34a71f1009a6f9e4a2a019e602/ruff-0.12.7-py3-none-linux_armv6l.whl", hash = "sha256:76e4f31529899b8c434c3c1dede98c4483b89590e15fb49f2d46183801565303", size = 11852189, upload-time = "2025-07-29T22:31:41.281Z" },
{ url = "https://files.pythonhosted.org/packages/63/5b/a4136b9921aa84638f1a6be7fb086f8cad0fde538ba76bda3682f2599a2f/ruff-0.12.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:789b7a03e72507c54fb3ba6209e4bb36517b90f1a3569ea17084e3fd295500fb", size = 12519389, upload-time = "2025-07-29T22:31:54.265Z" },
{ url = "https://files.pythonhosted.org/packages/a8/c9/3e24a8472484269b6b1821794141f879c54645a111ded4b6f58f9ab0705f/ruff-0.12.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:2e1c2a3b8626339bb6369116e7030a4cf194ea48f49b64bb505732a7fce4f4e3", size = 11743384, upload-time = "2025-07-29T22:31:59.575Z" },
{ url = "https://files.pythonhosted.org/packages/26/7c/458dd25deeb3452c43eaee853c0b17a1e84169f8021a26d500ead77964fd/ruff-0.12.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:32dec41817623d388e645612ec70d5757a6d9c035f3744a52c7b195a57e03860", size = 11943759, upload-time = "2025-07-29T22:32:01.95Z" },
{ url = "https://files.pythonhosted.org/packages/7f/8b/658798472ef260ca050e400ab96ef7e85c366c39cf3dfbef4d0a46a528b6/ruff-0.12.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47ef751f722053a5df5fa48d412dbb54d41ab9b17875c6840a58ec63ff0c247c", size = 11654028, upload-time = "2025-07-29T22:32:04.367Z" },
{ url = "https://files.pythonhosted.org/packages/a8/86/9c2336f13b2a3326d06d39178fd3448dcc7025f82514d1b15816fe42bfe8/ruff-0.12.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:a828a5fc25a3efd3e1ff7b241fd392686c9386f20e5ac90aa9234a5faa12c423", size = 13225209, upload-time = "2025-07-29T22:32:06.952Z" },
{ url = "https://files.pythonhosted.org/packages/76/69/df73f65f53d6c463b19b6b312fd2391dc36425d926ec237a7ed028a90fc1/ruff-0.12.7-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:5726f59b171111fa6a69d82aef48f00b56598b03a22f0f4170664ff4d8298efb", size = 14182353, upload-time = "2025-07-29T22:32:10.053Z" },
{ url = "https://files.pythonhosted.org/packages/58/1e/de6cda406d99fea84b66811c189b5ea139814b98125b052424b55d28a41c/ruff-0.12.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:74e6f5c04c4dd4aba223f4fe6e7104f79e0eebf7d307e4f9b18c18362124bccd", size = 13631555, upload-time = "2025-07-29T22:32:12.644Z" },
{ url = "https://files.pythonhosted.org/packages/6f/ae/625d46d5164a6cc9261945a5e89df24457dc8262539ace3ac36c40f0b51e/ruff-0.12.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d0bfe4e77fba61bf2ccadf8cf005d6133e3ce08793bbe870dd1c734f2699a3e", size = 12667556, upload-time = "2025-07-29T22:32:15.312Z" },
{ url = "https://files.pythonhosted.org/packages/55/bf/9cb1ea5e3066779e42ade8d0cd3d3b0582a5720a814ae1586f85014656b6/ruff-0.12.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:06bfb01e1623bf7f59ea749a841da56f8f653d641bfd046edee32ede7ff6c606", size = 12939784, upload-time = "2025-07-29T22:32:17.69Z" },
{ url = "https://files.pythonhosted.org/packages/55/7f/7ead2663be5627c04be83754c4f3096603bf5e99ed856c7cd29618c691bd/ruff-0.12.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:e41df94a957d50083fd09b916d6e89e497246698c3f3d5c681c8b3e7b9bb4ac8", size = 11771356, upload-time = "2025-07-29T22:32:20.134Z" },
{ url = "https://files.pythonhosted.org/packages/17/40/a95352ea16edf78cd3a938085dccc55df692a4d8ba1b3af7accbe2c806b0/ruff-0.12.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:4000623300563c709458d0ce170c3d0d788c23a058912f28bbadc6f905d67afa", size = 11612124, upload-time = "2025-07-29T22:32:22.645Z" },
{ url = "https://files.pythonhosted.org/packages/4d/74/633b04871c669e23b8917877e812376827c06df866e1677f15abfadc95cb/ruff-0.12.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:69ffe0e5f9b2cf2b8e289a3f8945b402a1b19eff24ec389f45f23c42a3dd6fb5", size = 12479945, upload-time = "2025-07-29T22:32:24.765Z" },
{ url = "https://files.pythonhosted.org/packages/be/34/c3ef2d7799c9778b835a76189c6f53c179d3bdebc8c65288c29032e03613/ruff-0.12.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:a07a5c8ffa2611a52732bdc67bf88e243abd84fe2d7f6daef3826b59abbfeda4", size = 12998677, upload-time = "2025-07-29T22:32:27.022Z" },
{ url = "https://files.pythonhosted.org/packages/77/ab/aca2e756ad7b09b3d662a41773f3edcbd262872a4fc81f920dc1ffa44541/ruff-0.12.7-py3-none-win32.whl", hash = "sha256:c928f1b2ec59fb77dfdf70e0419408898b63998789cc98197e15f560b9e77f77", size = 11756687, upload-time = "2025-07-29T22:32:29.381Z" },
{ url = "https://files.pythonhosted.org/packages/b4/71/26d45a5042bc71db22ddd8252ca9d01e9ca454f230e2996bb04f16d72799/ruff-0.12.7-py3-none-win_amd64.whl", hash = "sha256:9c18f3d707ee9edf89da76131956aba1270c6348bfee8f6c647de841eac7194f", size = 12912365, upload-time = "2025-07-29T22:32:31.517Z" },
{ url = "https://files.pythonhosted.org/packages/4c/9b/0b8aa09817b63e78d94b4977f18b1fcaead3165a5ee49251c5d5c245bb2d/ruff-0.12.7-py3-none-win_arm64.whl", hash = "sha256:dfce05101dbd11833a0776716d5d1578641b7fddb537fe7fa956ab85d1769b69", size = 11982083, upload-time = "2025-07-29T22:32:33.881Z" },
]
[[package]]
name = "safetensors"
version = "0.6.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6c/d2/94fe37355a1d4ff86b0f43b9a018515d5d29bf7ad6d01318a80f5db2fd6a/safetensors-0.6.1.tar.gz", hash = "sha256:a766ba6e19b198eff09be05f24cd89eda1670ed404ae828e2aa3fc09816ba8d8", size = 197968, upload-time = "2025-08-06T09:39:38.376Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6b/c0/40263a2103511917f9a92b4e114ecaff68586df07f12d1d877312f1261f3/safetensors-0.6.1-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:81ed1b69d6f8acd7e759a71197ce3a69da4b7e9faa9dbb005eb06a83b1a4e52d", size = 455232, upload-time = "2025-08-06T09:39:32.037Z" },
{ url = "https://files.pythonhosted.org/packages/86/bf/432cb4bb1c336d338dd9b29f78622b1441ee06e5868bf1de2ca2bec74c08/safetensors-0.6.1-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:01b51af8cb7a3870203f2735e3c7c24d1a65fb2846e75613c8cf9d284271eccc", size = 432150, upload-time = "2025-08-06T09:39:31.008Z" },
{ url = "https://files.pythonhosted.org/packages/05/d7/820c99032a53d57279ae199df7d114a8c9e2bbce4fa69bc0de53743495f0/safetensors-0.6.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:64a733886d79e726899b9d9643813e48a2eec49f3ef0fdb8cd4b8152046101c3", size = 471634, upload-time = "2025-08-06T09:39:22.17Z" },
{ url = "https://files.pythonhosted.org/packages/ea/8b/bcd960087eded7690f118ceeda294912f92a3b508a1d9a504f9c2e02041b/safetensors-0.6.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f233dc3b12fb641b36724844754b6bb41349615a0e258087560968d6da92add5", size = 487855, upload-time = "2025-08-06T09:39:24.142Z" },
{ url = "https://files.pythonhosted.org/packages/41/64/b44eac4ad87c4e1c0cf5ba5e204c032b1b1eac8ce2b8f65f87791e647bd6/safetensors-0.6.1-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6f16289e2af54affd591dd78ed12b5465e4dc5823f818beaeddd49a010cf3ba7", size = 607240, upload-time = "2025-08-06T09:39:25.463Z" },
{ url = "https://files.pythonhosted.org/packages/52/75/0347fa0c080af8bd3341af26a30b85939f6362d4f5240add1a0c9d793354/safetensors-0.6.1-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1b62eab84e2c69918b598272504c5d2ebfe64da6c16fdf8682054eec9572534d", size = 519864, upload-time = "2025-08-06T09:39:26.872Z" },
{ url = "https://files.pythonhosted.org/packages/ea/f3/83843d1fe9164f44a267373c55cba706530b209b58415f807b40edddcd3e/safetensors-0.6.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d498363746555dccffc02a47dfe1dee70f7784f3f37f1d66b408366c5d3a989e", size = 485926, upload-time = "2025-08-06T09:39:29.109Z" },
{ url = "https://files.pythonhosted.org/packages/b8/26/f6b0cb5210bab0e343214fdba7c2df80a69b019e62e760ddc61b18bec383/safetensors-0.6.1-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:eed2079dca3ca948d7b0d7120396e776bbc6680637cf199d393e157fde25c937", size = 518999, upload-time = "2025-08-06T09:39:28.054Z" },
{ url = "https://files.pythonhosted.org/packages/90/b7/8910b165c97d3bd6d445c6ca8b704ec23d0fa33849ce9a51dc783827a302/safetensors-0.6.1-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:294040ff20ebe079a2b4976cfa9a5be0202f56ca4f7f190b4e52009e8c026ceb", size = 650669, upload-time = "2025-08-06T09:39:32.997Z" },
{ url = "https://files.pythonhosted.org/packages/00/bc/2eeb025381d0834ae038aae2d383dfa830c2e0068e2e4e512ea99b135a4b/safetensors-0.6.1-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:75693208b492a026b926edeebbae888cc644433bee4993573ead2dc44810b519", size = 750019, upload-time = "2025-08-06T09:39:34.397Z" },
{ url = "https://files.pythonhosted.org/packages/f9/38/5dda9a8e056eb1f17ed3a7846698fd94623a1648013cdf522538845755da/safetensors-0.6.1-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:a8687b71ac67a0b3f8ce87df9e8024edf087e94c34ef46eaaad694dce8d2f83f", size = 689888, upload-time = "2025-08-06T09:39:35.584Z" },
{ url = "https://files.pythonhosted.org/packages/dd/60/15ee3961996d951002378d041bd82863a5c70738a71375b42d6dd5d2a6d3/safetensors-0.6.1-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5dd969a01c738104f707fa0e306b757f5beb3ebdcd682fe0724170a0bf1c21fb", size = 655539, upload-time = "2025-08-06T09:39:37.093Z" },
{ url = "https://files.pythonhosted.org/packages/91/d6/01172a9c77c566800286d379bfc341d75370eae2118dfd339edfd0394c4a/safetensors-0.6.1-cp38-abi3-win32.whl", hash = "sha256:7c3d8d34d01673d1a917445c9437ee73a9d48bc6af10352b84bbd46c5da93ca5", size = 308594, upload-time = "2025-08-06T09:39:40.916Z" },
{ url = "https://files.pythonhosted.org/packages/6c/5d/195dc1917d7ae93dd990d9b2f8b9c88e451bcc78e0b63ee107beebc1e4be/safetensors-0.6.1-cp38-abi3-win_amd64.whl", hash = "sha256:4720957052d57c5ac48912c3f6e07e9a334d9632758c9b0c054afba477fcbe2d", size = 320282, upload-time = "2025-08-06T09:39:39.54Z" },
]
[[package]]
name = "setuptools"
version = "70.2.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/setuptools-70.2.0-py3-none-any.whl", hash = "sha256:b8b8060bb426838fbe942479c90296ce976249451118ef566a5a0b7d8b78fb05" },
]
[[package]]
name = "sympy"
version = "1.13.3"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "mpmath" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl" },
]
[[package]]
name = "torch"
version = "2.8.0"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform == 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform == 'darwin'" },
{ name = "fsspec", marker = "sys_platform == 'darwin'" },
{ name = "jinja2", marker = "sys_platform == 'darwin'" },
{ name = "networkx", marker = "sys_platform == 'darwin'" },
{ name = "setuptools", marker = "sys_platform == 'darwin'" },
{ name = "sympy", marker = "sys_platform == 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
]
[[package]]
name = "torch"
version = "2.8.0+cpu"
source = { registry = "https://download.pytorch.org/whl/cpu" }
resolution-markers = [
"sys_platform != 'darwin'",
]
dependencies = [
{ name = "filelock", marker = "sys_platform != 'darwin'" },
{ name = "fsspec", marker = "sys_platform != 'darwin'" },
{ name = "jinja2", marker = "sys_platform != 'darwin'" },
{ name = "networkx", marker = "sys_platform != 'darwin'" },
{ name = "setuptools", marker = "sys_platform != 'darwin'" },
{ name = "sympy", marker = "sys_platform != 'darwin'" },
{ name = "typing-extensions", marker = "sys_platform != 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
]
[[package]]
name = "tqdm"
version = "4.66.5"
source = { registry = "https://download.pytorch.org/whl/cpu" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/tqdm-4.66.5-py3-none-any.whl", hash = "sha256:90279a3770753eafc9194a0364852159802111925aa30eb3f9d85b0e805ac7cd" },
]
[[package]]
name = "typing-extensions"
version = "4.12.2"
source = { registry = "https://download.pytorch.org/whl/cpu" }
wheels = [
{ url = "https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl", hash = "sha256:04e5ca0351e0f3f85c6853954072df659d0d13fac324d0072316b67d7794700d" },
]
[[package]]
name = "typing-inspection"
version = "0.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f8/b1/0c11f5058406b3af7609f121aaa6b609744687f1d158b3c3a5bf4cc94238/typing_inspection-0.4.1.tar.gz", hash = "sha256:6ae134cc0203c33377d43188d4064e9b357dba58cff3185f22924610e70a9d28", size = 75726, upload-time = "2025-05-21T18:55:23.885Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl", hash = "sha256:389055682238f53b04f7badcb49b989835495a96700ced5dab2d8feae4b26f51", size = 14552, upload-time = "2025-05-21T18:55:22.152Z" },
]
[[package]]
name = "uv"
version = "0.8.5"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/83/94/e18a40fe6f6d724c1fbf2c9328806359e341710b2fd42dc928a1a8fc636b/uv-0.8.5.tar.gz", hash = "sha256:078cf2935062d5b61816505f9d6f30b0221943a1433b4a1de8f31a1dfe55736b", size = 3451272, upload-time = "2025-08-05T20:50:21.159Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/b9/78cde56283b6b9a8a84b0bf9334442ed75a843310229aaf7f1a71fe67818/uv-0.8.5-py3-none-linux_armv6l.whl", hash = "sha256:e236372a260e312aef5485a0e5819a0ec16c9197af06d162ad5a3e8bd62f9bba", size = 18146198, upload-time = "2025-08-05T20:49:18.859Z" },
{ url = "https://files.pythonhosted.org/packages/ed/83/5deda1a19362ce426da7f9cc4764a0dd57e665ecbaddd9900d4200bc10ab/uv-0.8.5-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:53a40628329e543a5c5414553f5898131d5c1c6f963708cb0afc2ecf3e8d8167", size = 18242690, upload-time = "2025-08-05T20:49:23.409Z" },
{ url = "https://files.pythonhosted.org/packages/06/6e/80b08ee544728317d9c8003d4c10234007e12f384da1c3dfe579489833c9/uv-0.8.5-py3-none-macosx_11_0_arm64.whl", hash = "sha256:43a689027696bc9c62e6da3f06900c52eafc4debbf4fba9ecb906196730b34c8", size = 16913881, upload-time = "2025-08-05T20:49:26.631Z" },
{ url = "https://files.pythonhosted.org/packages/34/f6/47a44dabfc25b598ea6f2ab9aa32ebf1cbd87ed8af18ccde6c5d36f35476/uv-0.8.5-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64.whl", hash = "sha256:a34d783f5cef00f1918357c0cd9226666e22640794e9e3862820abf4ee791141", size = 17527439, upload-time = "2025-08-05T20:49:30.464Z" },
{ url = "https://files.pythonhosted.org/packages/ef/7d/ee7c2514e064412133ee9f01c4c42de20da24617b8c25d81cf7021b774d8/uv-0.8.5-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2140383bc25228281090cc34c00500d8e5822877c955f691d69bbf967e8efa73", size = 17833275, upload-time = "2025-08-05T20:49:33.783Z" },
{ url = "https://files.pythonhosted.org/packages/f9/e7/5233cf5cbcca8ea65aa1f1e48bf210dc9773fb86b8104ffbc523be7f6a3f/uv-0.8.5-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6b449779ff463b059504dc30316a634f810149e02482ce36ea35daea8f6ce7af", size = 18568916, upload-time = "2025-08-05T20:49:37.031Z" },
{ url = "https://files.pythonhosted.org/packages/d8/54/6cabb2a0347c51c8366ca3bffeeebd7f829a15f6b29ad20f51fd5ca9c4bd/uv-0.8.5-py3-none-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:a7f8739d05cc513eee2f1f8a7e6c482a9c1e8860d77cd078d1ea7c3fe36d7a65", size = 19993334, upload-time = "2025-08-05T20:49:40.361Z" },
{ url = "https://files.pythonhosted.org/packages/3c/7a/b84d994d52f20bc56229840c31e77aff4653e5902ea7b7c2616e9381b5b8/uv-0.8.5-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62ebbd22f780ba2585690332765caf9e29c9758e48a678148e8b1ea90580cdb9", size = 19643358, upload-time = "2025-08-05T20:49:43.955Z" },
{ url = "https://files.pythonhosted.org/packages/c8/f1/7552f2bea528456d34bc245f2959ce910631e01571c4b7ea421ead9a9fc6/uv-0.8.5-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4f8dd0555f05d66ff46fdab551137cc2b1ea9c5363358913e2af175e367f4398", size = 18947757, upload-time = "2025-08-05T20:49:47.381Z" },
{ url = "https://files.pythonhosted.org/packages/57/9b/46aadd186a1e16a23cd0701dda0e640197db49a3add074a47231fed45a4f/uv-0.8.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:38c04408ad5eae7a178a1e3b0e09afeb436d0c97075530a3c82de453b78d0448", size = 18906135, upload-time = "2025-08-05T20:49:50.985Z" },
{ url = "https://files.pythonhosted.org/packages/c0/31/6661adedaba9ebac8bb449ec9901f8cbf124fa25e0db3a9e6cf3053cee88/uv-0.8.5-py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:73e772caf7310af4b21eaf8c25531b934391f1e84f3afa8e67822d7c432f6dad", size = 17787943, upload-time = "2025-08-05T20:49:54.59Z" },
{ url = "https://files.pythonhosted.org/packages/11/f2/73fb5c3156fdae830b83edec2f430db84cb4bc4b78f61d21694bd59004cb/uv-0.8.5-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:3ddd7d8c01073f23ba2a4929ab246adb30d4f8a55c5e007ad7c8341f7bf06978", size = 18675864, upload-time = "2025-08-05T20:49:57.87Z" },
{ url = "https://files.pythonhosted.org/packages/b5/29/774c6f174c53d68ae9a51c2fabf1b09003b93a53c24591a108be0dc338d7/uv-0.8.5-py3-none-musllinux_1_1_armv7l.whl", hash = "sha256:7d601f021cbc179320ea3a75cd1d91bd49af03d2a630c4d04ebd38ff6b87d419", size = 17808770, upload-time = "2025-08-05T20:50:01.566Z" },
{ url = "https://files.pythonhosted.org/packages/a9/b0/5d164ce84691f5018c5832e9e3371c0196631b1f1025474a179de1d6a70a/uv-0.8.5-py3-none-musllinux_1_1_i686.whl", hash = "sha256:6ee97b7299990026619c20e30e253972c6c0fb6fba4f5658144e62aa1c07785a", size = 18076516, upload-time = "2025-08-05T20:50:04.94Z" },
{ url = "https://files.pythonhosted.org/packages/d1/73/4d8baefb4f4b07df6a4db7bbd604cb361d4f5215b94d3f66553ea26edfd4/uv-0.8.5-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:09804055d6346febf0767767c04bdd2fab7d911535639f9c18de2ea744b2954c", size = 19031195, upload-time = "2025-08-05T20:50:08.211Z" },
{ url = "https://files.pythonhosted.org/packages/44/2a/3d074391df2c16c79fc6bf333e4bde75662e64dac465050a03391c75b289/uv-0.8.5-py3-none-win32.whl", hash = "sha256:6362a2e1fa535af0e4c0a01f83e666a4d5f9024d808f9e64e3b6ef07c97aff54", size = 18026273, upload-time = "2025-08-05T20:50:11.868Z" },
{ url = "https://files.pythonhosted.org/packages/3c/2f/e850d3e745ccd1125b7a48898421824700fd3e996d27d835139160650124/uv-0.8.5-py3-none-win_amd64.whl", hash = "sha256:dd89836735860461c3a5563731e77c011d1831f14ada540f94bf1a7011dbea14", size = 19822158, upload-time = "2025-08-05T20:50:15.428Z" },
{ url = "https://files.pythonhosted.org/packages/6f/df/e5565b3faf2c6147a877ab7e96ef31e2333f08c5138a98ce77003b1bf65e/uv-0.8.5-py3-none-win_arm64.whl", hash = "sha256:37c1a22915392014d8b4ade9e69e157c8e5ccdf32f37070a84f749a708268335", size = 18430102, upload-time = "2025-08-05T20:50:18.785Z" },
]