# Development Guide

Contributing to GGUF tools requires understanding quantisation workflows and Python's modern
dependency ecosystem. This guide covers setup, standards, and architectural decisions for fixing
bugs, adding quantisation profiles, or extending conversion capabilities.

## Code Quality

Ruff replaces the traditional Black/isort/flake8 stack as both linter and formatter, while Mypy
provides static type checking to catch type-related bugs before runtime. Zero tolerance for
linting and type errors catches issues early. Both tools are configured extensively in
`pyproject.toml` to enforce only the code quality standards we've selected. Debug logging reveals
quantisation internals when models fail.

```bash
# Run linting - catches style violations, potential bugs, and code smells
uvx ruff check

# Format code - enforces consistent style automatically
uvx ruff format

# Run type checking - ensures type safety and catches potential bugs
uv run mypy .

# Run with debug logging - reveals conversion steps and tensor processing
DEBUG=true uv run <script>
```

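The relevant `pyproject.toml` sections look roughly like the fragment below. This is an illustrative sketch only — the rule selection and strictness settings shown here are assumptions, not the project's actual configuration:

```toml
# Illustrative configuration fragment - the real selections live in the
# repository's own pyproject.toml.
[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "B", "I"]  # pycodestyle, pyflakes, bugbear, import sorting

[tool.mypy]
strict = true
```

Keeping the configuration in `pyproject.toml` means a single file governs linting, formatting, and type checking, so `uvx ruff check` and `uv run mypy .` behave identically locally and in CI.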
## Project Structure

The architecture separates concerns cleanly: top-level scripts provide interfaces, helpers
encapsulate reusable logic, and resources contain community data. The structure evolved from
practical needs – helpers emerged to eliminate duplication, services to abstract external
dependencies.

```plain
llm-gguf-tools/
├── quantise.py                     # Bartowski quantisation tool - the main workflow
├── direct_safetensors_to_gguf.py   # Direct conversion for unsupported architectures
├── helpers/                        # Shared utilities and abstractions
│   ├── __init__.py
│   ├── logger.py                   # Colour-coded logging with context awareness
│   ├── services/                   # External service wrappers
│   │   ├── gguf.py                 # GGUF writer abstraction
│   │   └── llama_python.py         # llama-cpp-python integration
│   └── utils/                      # Pure utility functions
│       ├── config_parser.py        # Model configuration handling
│       └── tensor_mapping.py       # Architecture-specific tensor name mapping
├── resources/                      # Resource files and calibration data
│   └── imatrix_data.txt            # Curated calibration data from Bartowski
├── docs/                           # Detailed documentation
│   ├── quantise_gguf.md            # Quantisation strategies and profiles
│   ├── safetensors2gguf.md         # Direct conversion documentation
│   ├── bartowski_analysis.md       # Deep dive into variant strategies
│   ├── imatrix_data.md             # Importance matrix guide
│   └── development.md              # This guide
└── pyproject.toml                  # Modern Python project configuration
```

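The tensor-mapping layer translates HuggingFace-style tensor names into GGUF's naming scheme. A minimal sketch of the idea, using a small illustrative subset of rules — the function name and rule table here are hypothetical, not the actual `tensor_mapping.py` API:

```python
import re

# Illustrative subset of HuggingFace -> GGUF tensor name rules; the real
# mapping in helpers/utils/tensor_mapping.py is architecture-specific and
# covers far more tensors than shown here.
_RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"^model\.layers\.(\d+)\.self_attn\.q_proj\.weight$"), r"blk.\1.attn_q.weight"),
    (re.compile(r"^model\.layers\.(\d+)\.mlp\.gate_proj\.weight$"), r"blk.\1.ffn_gate.weight"),
    (re.compile(r"^model\.embed_tokens\.weight$"), "token_embd.weight"),
]


def map_tensor_name(name: str) -> str:
    """Map a SafeTensors tensor name to its GGUF equivalent.

    Raises KeyError for unmapped tensors so conversion fails loudly
    rather than silently dropping weights.
    """
    for pattern, replacement in _RULES:
        if pattern.match(name):
            return pattern.sub(replacement, name)
    raise KeyError(f"no GGUF mapping for tensor {name!r}")
```

Failing loudly on unknown names is deliberate: a silently skipped tensor produces a model that loads but generates gibberish, which is far harder to debug.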
## Contributing Guidelines

The project values pragmatic solutions over theoretical perfection – working code that handles edge
cases beats elegant abstractions. Contributors should understand how quantisation profiles map to
Bartowski's discoveries and where Python-C++ boundaries limit functionality.

Essential requirements:

1. **Style consistency**: Run `uvx ruff format` before committing to keep diffs focused on logic
2. **Documentation**: Google-style docstrings explain behaviour and rationale beyond type hints
3. **Type safety**: Complete type hints for all public functions enable IDE support
4. **Practical testing**: Test with both 1B and 7B+ models to catch scaling issues

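Requirements 2 and 3 in practice look something like the following. The function itself is hypothetical, shown only to illustrate the Google-style docstring and type-hint conventions:

```python
def estimate_output_size(weight_count: int, bits_per_weight: float) -> float:
    """Estimate the quantised file size in GiB.

    Args:
        weight_count: Total number of weights across all tensors.
        bits_per_weight: Average bit depth of the chosen quantisation profile.

    Returns:
        Approximate output size in GiB, ignoring metadata overhead.
    """
    # bits -> bytes -> GiB
    return weight_count * bits_per_weight / 8 / 1024**3
```

The docstring explains the *why* (metadata overhead is ignored) that the type hints alone cannot convey.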
## Development Workflow

### Setting Up Development Environment

The project uses `uv` for dependency management – Rust-fast, with automatic Python version
management and upfront dependency resolution. Development dependencies include ruff, type stubs,
and optional PyTorch for BFloat16 handling.

```bash
# Clone the repository - uses Forgejo (GitLab-like) hosting
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools

# Install all dependencies including dev tools
# This installs llama-cpp-python with CUDA support if available
uv sync --all-groups
```

### Code Style

Consistent code style reduces cognitive load by letting reviewers focus on logic rather than
layout. UK English maintains consistency with upstream llama.cpp. The 100-character line limit
balances descriptive names with readability.

Core conventions:

- **PEP 8 compliance**: Ruff catches mutable defaults and unused imports automatically
- **UK English**: "Optimise" not "optimize", matching upstream llama.cpp
- **Line length**: 100 characters maximum, except URLs or unbreakable paths
- **Type annotations**: Complete hints for public functions – documentation that can't go stale
- **Import ordering**: Standard library, third-party, local – ruff handles this automatically

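A compliant module might open like this. The function and its behaviour are hypothetical, shown only to demonstrate the conventions above:

```python
# Standard library imports first, then third-party, then local modules -
# ruff's import-sorting rules enforce this grouping automatically.
from pathlib import Path


def optimise_layout(path: Path) -> str:  # UK spelling, complete type hints
    """Return a normalised absolute string form of a model path."""
    return str(path.expanduser().resolve())
```
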
### Testing

Formal tests are pending. Quantisation "correctness" depends on complex interactions between model
architecture, strategy, and downstream usage, so benchmark performance alone doesn't guarantee
production success.

Current validation approach:

- **End-to-end testing**: Qwen 0.5B for quick iteration, Llama 3.2 1B for architecture compatibility
- **Output validation**: GGUF must load in llama.cpp and degrade gracefully, not produce gibberish
- **Error handling**: Test corrupted SafeTensors, missing configs, and insufficient disk space
- **Logger consistency**: Verify colour coding across terminals and progress bars with piped output

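The "degrade gracefully, not gibberish" check can be partly automated with a crude repetition heuristic — a sketch under the assumption that badly over-quantised models tend to loop on a few tokens; the function name and threshold are illustrative:

```python
def looks_degenerate(text: str, max_repeat_ratio: float = 0.5) -> bool:
    """Flag output where a single token dominates the sample.

    Token looping is a common failure mode of over-aggressive
    quantisation; the 0.5 threshold is an illustrative default.
    """
    tokens = text.split()
    if len(tokens) < 8:
        return False  # too short to judge reliably
    most_common = max(tokens.count(token) for token in set(tokens))
    return most_common / len(tokens) > max_repeat_ratio
```

A heuristic like this cannot replace human review, but it catches the worst regressions cheaply across a batch of quantised variants.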
### Debugging

Debug logging transforms the black box into a glass box, revealing failure points. Colour coding
highlights stages: blue (info), yellow (warnings), red (errors), green (success). The visual
hierarchy enables efficient log scanning.

```bash
# Enable comprehensive debug output
DEBUG=true uv run direct_safetensors_to_gguf.py ./model  # Tensor mapping details
DEBUG=true uv run quantise.py <model_url>                # Memory usage tracking
```

Debug output reveals:

- **Download progress**: Bytes transferred, retries, connection issues
- **Conversion pipeline**: SafeTensors→GGUF steps, tensor mappings, dimension changes
- **Quantisation decisions**: Layer bit depths, importance matrix effects on weight selection
- **Memory usage**: Peak consumption for predicting larger model requirements
- **File operations**: Read/write/temp patterns for disk usage analysis
- **Error context**: Stack traces with local variables at failure points
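A minimal sketch of how a `DEBUG`-gated, colour-coded logger can be wired up with the standard library. The names, colour map, and format string are illustrative assumptions, not the actual `helpers/logger.py` API, and the project-specific green "success" level is omitted:

```python
import logging
import os
import sys

# Illustrative ANSI colour map matching the scheme described above; the
# real helpers/logger.py implementation may differ.
_COLOURS = {
    logging.DEBUG: "\033[36m",    # cyan
    logging.INFO: "\033[34m",     # blue
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}
_RESET = "\033[0m"


class ColourFormatter(logging.Formatter):
    """Wrap log lines in ANSI colours when writing to a real terminal."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        if sys.stderr.isatty():  # skip colours when output is piped
            colour = _COLOURS.get(record.levelno, "")
            return f"{colour}{message}{_RESET}"
        return message


def make_logger(name: str) -> logging.Logger:
    """Create a logger whose verbosity is gated by the DEBUG env var."""
    logger = logging.getLogger(name)
    level = logging.DEBUG if os.environ.get("DEBUG") == "true" else logging.INFO
    logger.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(ColourFormatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

The `isatty()` check is what keeps piped output readable — the same concern the "progress bars with piped output" validation item above covers.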