# Development Guide
Contributing to GGUF tools requires understanding quantisation workflows and Python's modern
dependency ecosystem. This guide covers setup, standards, and architectural decisions for fixing
bugs, adding quantisation profiles, or extending conversion capabilities.
## Code Quality
Ruff replaces the traditional Black/isort/flake8 stack as both linter and formatter, while Mypy
provides static type checking to catch type-related bugs before runtime. The project has zero
tolerance for linting and type errors, which catches issues early. Both tools are configured
extensively in `pyproject.toml` to enforce only the code quality standards we've selected. Debug
logging reveals quantisation internals when models fail.
```bash
# Run linting - catches style violations, potential bugs, and code smells
uvx ruff check
# Format code - enforces consistent style automatically
uvx ruff format
# Run type checking - ensures type safety and catches potential bugs
uv run mypy .
# Run with debug logging - reveals conversion steps and tensor processing
DEBUG=true uv run <script>
```
## Project Structure
The architecture separates concerns cleanly: top-level scripts provide interfaces, helpers
encapsulate reusable logic, and resources contain community data. The structure evolved from
practical needs: helpers emerged to eliminate duplication, and services to abstract external
dependencies.
```plain
llm-gguf-tools/
├── quantise.py                    # Bartowski quantisation tool - the main workflow
├── direct_safetensors_to_gguf.py  # Direct conversion for unsupported architectures
├── helpers/                       # Shared utilities and abstractions
│   ├── __init__.py
│   ├── logger.py                  # Colour-coded logging with context awareness
│   ├── services/                  # External service wrappers
│   │   ├── gguf.py                # GGUF writer abstraction
│   │   └── llama_python.py        # llama-cpp-python integration
│   └── utils/                     # Pure utility functions
│       ├── config_parser.py       # Model configuration handling
│       └── tensor_mapping.py      # Architecture-specific tensor name mapping
├── resources/                     # Resource files and calibration data
│   └── imatrix_data.txt           # Curated calibration data from Bartowski
├── docs/                          # Detailed documentation
│   ├── quantise_gguf.md           # Quantisation strategies and profiles
│   ├── safetensors2gguf.md        # Direct conversion documentation
│   ├── bartowski_analysis.md      # Deep dive into variant strategies
│   ├── imatrix_data.md            # Importance matrix guide
│   └── development.md             # This guide
└── pyproject.toml                 # Modern Python project configuration
```
## Contributing Guidelines
The project values pragmatic solutions over theoretical perfection: working code that handles edge
cases beats elegant abstractions. Contributors should understand how quantisation profiles map to
Bartowski's discoveries and where Python-C++ boundaries limit functionality.
Essential requirements:
1. **Style consistency**: Run `uvx ruff format` before committing to keep diffs focused on logic
2. **Documentation**: Google-style docstrings explain behaviour and rationale beyond type hints
3. **Type safety**: Complete type hints for all public functions enable IDE support
4. **Practical testing**: Test with both 1B and 7B+ models to catch scaling issues
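As an illustration of points 2 and 3, a public function with complete type hints and a Google-style docstring might look like the sketch below (the function itself is hypothetical, not part of the codebase):

```python
def estimate_output_size(bits_per_weight: float, params: int) -> float:
    """Estimate the size of a quantised GGUF file in GiB.

    Args:
        bits_per_weight: Average bits per weight for the chosen quantisation profile.
        params: Total parameter count of the model.

    Returns:
        Estimated file size in GiB, excluding metadata and tokeniser overhead.
    """
    # 8 bits per byte, 1024**3 bytes per GiB
    return params * bits_per_weight / 8 / 1024**3
```

The docstring explains what the numbers mean and what is excluded, which the type hints alone cannot convey.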
## Development Workflow
### Setting Up Development Environment
The project uses `uv` for dependency management: it is Rust-fast, manages Python versions
automatically, and resolves dependencies upfront. Development dependencies include ruff, type
stubs, and optional PyTorch for BFloat16 handling.
```bash
# Clone the repository - uses Forgejo (GitLab-like) hosting
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools
# Install all dependencies including dev tools
# This installs llama-cpp-python with CUDA support if available
uv sync --all-groups
```
### Code Style
Code style reduces cognitive load by letting reviewers focus on logic rather than layout. UK English
maintains llama.cpp consistency. The 100-character line limit balances descriptive names with
readability.
Core conventions:
- **PEP 8 compliance**: Ruff catches mutable defaults and unused imports automatically
- **UK English**: "Optimise" not "optimize", matching upstream llama.cpp
- **Line length**: 100 characters maximum, except for URLs or unbreakable paths
- **Type annotations**: Complete hints for public functions are documentation that can't go stale
- **Import ordering**: Standard library, then third-party, then local; ruff handles this automatically
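Put together, a module following these conventions might open like this (the names are illustrative only):

```python
# Standard library imports come first
import json
from pathlib import Path

# Third-party imports come second (e.g. `import gguf`) and local imports
# (e.g. `from helpers.logger import logger`) come last; ruff's isort rules
# enforce this grouping automatically.


def normalise_config(raw: dict[str, object]) -> dict[str, object]:
    """Normalise configuration keys to lower case (note the UK spelling)."""
    return {key.lower(): value for key, value in raw.items()}


def load_config(path: Path) -> dict[str, object]:
    """Load a JSON model configuration and normalise its keys."""
    return normalise_config(json.loads(path.read_text()))
```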
### Testing
Formal tests are pending: quantisation "correctness" depends on complex interactions between model
architecture, strategy, and downstream usage, and benchmark performance doesn't guarantee
production success.
Current validation approach:
- **End-to-end testing**: Qwen 0.5B for quick iteration, Llama 3.2 1B for architecture compatibility
- **Output validation**: GGUF must load in llama.cpp and degrade gracefully, not produce gibberish
- **Error handling**: Test corrupted SafeTensors, missing configs, insufficient disk space
- **Logger consistency**: Verify colour coding across terminals, progress bars with piped output
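One cheap first-line check before loading anything into llama.cpp is to validate the GGUF header: the format opens with the magic bytes `GGUF` followed by a little-endian `uint32` version. A minimal sketch (this helper is illustrative, not part of the codebase):

```python
import struct
from pathlib import Path


def check_gguf_header(path: Path) -> int:
    """Verify the GGUF magic bytes and return the format version.

    Raises ValueError if the file does not look like a GGUF model.
    """
    with path.open("rb") as fh:
        header = fh.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    # The format version is a little-endian uint32 immediately after the magic
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

This catches truncated downloads and wrong file types early; actual correctness still requires loading the model in llama.cpp and inspecting its output.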
### Debugging
Debug logging turns the black box into a glass box, revealing failure points. Colour coding
highlights stages: blue (info), yellow (warnings), red (errors), green (success). The visual
hierarchy enables efficient log scanning.
```bash
# Enable comprehensive debug output
DEBUG=true uv run direct_safetensors_to_gguf.py ./model # Tensor mapping details
DEBUG=true uv run quantise.py <model_url> # Memory usage tracking
```
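The pattern behind this is simple enough to sketch: an environment-gated log level plus ANSI colour codes per severity. The real `helpers/logger.py` may differ in detail; this is a minimal illustration.

```python
import logging
import os

# Colour mapping per the stages described above; "success" is typically a
# custom level in such loggers, so only the standard levels are shown here.
COLOURS = {
    logging.INFO: "\033[34m",     # blue - informational steps
    logging.WARNING: "\033[33m",  # yellow - recoverable issues
    logging.ERROR: "\033[31m",    # red - failures
}
GREEN = "\033[32m"  # green, for success messages
RESET = "\033[0m"


class ColourFormatter(logging.Formatter):
    """Wrap each formatted record in the colour for its level."""

    def format(self, record: logging.LogRecord) -> str:
        colour = COLOURS.get(record.levelno, "")
        return f"{colour}{super().format(record)}{RESET}"


def make_logger(name: str = "gguf-tools") -> logging.Logger:
    """Create a logger whose verbosity follows the DEBUG environment variable."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(ColourFormatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if os.environ.get("DEBUG") == "true" else logging.INFO)
    return logger
```

Gating on the environment variable rather than a CLI flag means the same switch works for every script in the repository without touching argument parsing.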
Debug output reveals:
- **Download progress**: Bytes transferred, retries, connection issues
- **Conversion pipeline**: SafeTensors→GGUF steps, tensor mappings, dimension changes
- **Quantisation decisions**: Layer bit depths, importance matrix effects on weight selection
- **Memory usage**: Peak consumption for predicting larger model requirements
- **File operations**: Read/write/temp patterns for disk usage analysis
- **Error context**: Stack traces with local variables at failure points