Switch to llama-cpp-python

parent ef7df1a8c3
commit d937f2d5fa
25 changed files with 2957 additions and 1181 deletions

# Development Guide

Contributing to GGUF tools requires understanding quantisation workflows and Python's modern
dependency ecosystem. This guide covers setup, standards, and architectural decisions for fixing
bugs, adding quantisation profiles, or extending conversion capabilities.

## Code Quality

Ruff replaces the traditional Black/isort/flake8 stack as both linter and formatter. Mypy provides
static type checking to catch type-related bugs before runtime. A zero-tolerance policy for linting
and type errors catches issues early. Both tools are configured extensively in `pyproject.toml` to
enforce only the code quality standards we've selected. Debug logging reveals quantisation
internals when models fail.

```bash
# Run linting - catches style violations, potential bugs, and code smells
uvx ruff check

# Format code - enforces consistent style automatically
uvx ruff format

# Run type checking - ensures type safety and catches potential bugs
uv run mypy .

# Run with debug logging - reveals conversion steps and tensor processing
DEBUG=true uv run <script>
```
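
To illustrate what the type checker buys us, the hypothetical function below shows the kind of
mismatch mypy reports before anything runs; the function and values are invented for this example
and are not part of the codebase.

```python
def bits_per_weight(total_bits: int, n_weights: int) -> float:
    """Return the average number of bits spent per weight."""
    return total_bits / n_weights


# A call like bits_per_weight(4096, "Q4_K_M") fails `uv run mypy .` with:
#   error: Argument 2 to "bits_per_weight" has incompatible type "str"; expected "int"
print(bits_per_weight(34_359_738_368, 8_000_000_000))  # ~4.29 bits per weight
```
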
## Project Structure

The architecture separates concerns cleanly: top-level scripts provide interfaces, helpers
encapsulate reusable logic, and resources contain community data. The structure evolved from
practical needs; helpers emerged to eliminate duplication, and services to abstract external
dependencies.

```plain
llm-gguf-tools/
├── quantise.py                    # Bartowski quantisation tool - the main workflow
├── direct_safetensors_to_gguf.py  # Direct conversion for unsupported architectures
├── helpers/                       # Shared utilities and abstractions
│   ├── __init__.py
│   ├── logger.py                  # Colour-coded logging with context awareness
│   ├── services/                  # External service wrappers
│   │   ├── gguf.py                # GGUF writer abstraction
│   │   └── llama_python.py        # llama-cpp-python integration
│   └── utils/                     # Pure utility functions
│       ├── config_parser.py       # Model configuration handling
│       └── tensor_mapping.py      # Architecture-specific tensor name mapping
├── resources/                     # Resource files and calibration data
│   └── imatrix_data.txt           # Curated calibration data from Bartowski
├── docs/                          # Detailed documentation
│   ├── quantise_gguf.md           # Quantisation strategies and profiles
│   ├── safetensors2gguf.md        # Direct conversion documentation
│   ├── bartowski_analysis.md      # Deep dive into variant strategies
│   ├── imatrix_data.md            # Importance matrix guide
│   └── development.md             # This guide
└── pyproject.toml                 # Modern Python project configuration
```
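
The layering can be illustrated with a brief sketch. The module paths below come from the tree
above, but every function and class name is an assumption made for illustration rather than the
project's actual API.

```python
# Hypothetical glue code: utils parse model metadata, services wrap the GGUF
# and llama-cpp-python machinery, and the logger reports progress.
# All imported names besides the module paths are assumed, not guaranteed.
from helpers.logger import logger                     # assumed to expose a ready logger
from helpers.services.gguf import GGUFWriter          # assumed wrapper class
from helpers.utils.config_parser import ConfigParser  # assumed parser class


def convert(model_dir: str, output_path: str) -> None:
    """Sketch of a conversion entry point built on the helper layers."""
    config = ConfigParser().parse(model_dir)           # assumed method
    logger.info("Converting %s", model_dir)
    GGUFWriter(output_path, config).write()            # assumed signature
```
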
## Contributing Guidelines

Contributions are welcome. The project values pragmatic solutions over theoretical perfection:
working code that handles edge cases beats elegant abstractions. Contributors should understand how
quantisation profiles map to Bartowski's discoveries and where the Python-C++ boundary limits
functionality.

Essential requirements:

1. **Style consistency**: Run `uvx ruff format` before committing to keep diffs focused on logic
2. **Documentation**: Google-style docstrings explain behaviour and rationale beyond type hints
   (see the sketch after this list)
3. **Type safety**: Complete type hints for all public functions enable IDE support
4. **Practical testing**: Test with both 1B and 7B+ models to catch scaling issues
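
To show requirements 2 and 3 together, here is a short, hypothetical function in the expected
shape: complete type hints plus a Google-style docstring. The function is invented for this example
and is not part of the codebase.

```python
def estimate_output_size(param_count: int, bits_per_weight: float) -> float:
    """Estimate the size of a quantised GGUF file in gigabytes.

    Args:
        param_count: Number of model parameters, e.g. 7_000_000_000 for a 7B model.
        bits_per_weight: Average bits per weight for the chosen quantisation profile.

    Returns:
        Approximate file size in gigabytes, ignoring metadata overhead.
    """
    return param_count * bits_per_weight / 8 / 1_000_000_000
```
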
## Development Workflow
### Setting Up Development Environment

The project uses `uv` for dependency management: it is Rust-fast, manages Python versions
automatically, and resolves dependencies up front. Development dependencies include ruff, type
stubs, and optional PyTorch for BFloat16 handling.

```bash
# Clone the repository - uses Forgejo (GitLab-like) hosting
git clone https://git.tomfos.tr/tom/llm-gguf-tools.git
cd llm-gguf-tools

# Install all dependencies including dev tools
# This installs llama-cpp-python with CUDA support if available
uv sync --all-groups
```
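
After syncing, a quick import check confirms that the llama-cpp-python binding is usable; this is a
minimal sketch rather than a script shipped with the repository.

```python
"""Verify that llama-cpp-python installed correctly after `uv sync --all-groups`."""

import llama_cpp

# The version string reflects whatever wheel or source build uv resolved;
# a successful import also confirms the bundled llama.cpp library loaded.
print("llama-cpp-python version:", llama_cpp.__version__)
```
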
### Code Style

Code style reduces cognitive load by letting reviewers focus on logic rather than layout. UK
English spelling maintains consistency with upstream llama.cpp, and the 100-character line limit
balances descriptive names with readability.

Core conventions:

- **PEP 8 compliance**: Ruff automatically catches mutable defaults and unused imports
- **UK English**: "Optimise" not "optimize", matching upstream llama.cpp
- **Line length**: 100 characters maximum, except URLs or unbreakable paths
- **Type annotations**: Complete hints for public functions act as documentation that can't go stale
- **Import ordering**: Standard library, then third-party, then local; ruff handles this automatically

### Testing

Formal tests are not yet implemented: quantisation "correctness" depends on complex interactions
between model architecture, quantisation strategy, and downstream usage, and benchmark performance
doesn't guarantee production success.

Current validation approach:

- **End-to-end testing**: Qwen 0.5B for quick iteration, Llama 3.2 1B for architecture compatibility
- **Output validation**: GGUF must load in llama.cpp and degrade gracefully, not produce
  gibberish (see the sketch after this list)
- **Error handling**: Test corrupted SafeTensors, missing configs, insufficient disk space
- **Logger consistency**: Verify colour coding across terminals, progress bars with piped output
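
The output validation step can be approximated with a short load test. This is a sketch using
llama-cpp-python's high-level API; the model path and prompt are placeholder values.

```python
"""Smoke-test a freshly quantised GGUF by loading it and generating a few tokens."""

from llama_cpp import Llama

# Placeholder path - point this at the quantised output being validated.
llm = Llama(model_path="./output/model-Q4_K_M.gguf", n_ctx=512, verbose=False)

# A coherent (if short) completion suggests the quantisation didn't destroy the
# model; gibberish or a load failure means the variant needs investigation.
result = llm("The capital of France is", max_tokens=8)
print(result["choices"][0]["text"])
```
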
### Debugging

Debug logging turns the pipeline from a black box into a glass box, revealing where failures occur.
Colour coding highlights stages: blue for info, yellow for warnings, red for errors, green for
success. The visual hierarchy enables efficient scanning of long logs.

```bash
# Enable comprehensive debug output
DEBUG=true uv run direct_safetensors_to_gguf.py ./model  # Tensor mapping details
DEBUG=true uv run quantise.py <model_url>                # Memory usage tracking
```
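
For context, the colour-coded, DEBUG-aware behaviour could be wired up roughly as below; this is a
hedged sketch of the idea, not the actual contents of `helpers/logger.py`.

```python
"""Sketch of a colour-coded logger gated on the DEBUG environment variable."""

import logging
import os

# ANSI colours per level: blue (info), yellow (warnings), red (errors).
# Green "success" messages would need a custom level in the real helper.
_COLOURS = {
    logging.DEBUG: "\033[36m",    # cyan for debug detail
    logging.INFO: "\033[34m",     # blue
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}
_RESET = "\033[0m"


class ColourFormatter(logging.Formatter):
    """Wrap each record's level name in the colour for that level."""

    def format(self, record: logging.LogRecord) -> str:
        colour = _COLOURS.get(record.levelno, "")
        record.levelname = f"{colour}{record.levelname}{_RESET}"
        return super().format(record)


def make_logger(name: str = "llm-gguf-tools") -> logging.Logger:
    """Build a logger whose verbosity follows the DEBUG environment variable."""
    level = logging.DEBUG if os.environ.get("DEBUG", "").lower() == "true" else logging.INFO
    handler = logging.StreamHandler()
    handler.setFormatter(ColourFormatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```
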

Debug output reveals:

- **Download progress**: Bytes transferred, retries, connection issues
- **Conversion pipeline**: SafeTensors→GGUF steps, tensor mappings, dimension changes
- **Quantisation decisions**: Layer bit depths, importance matrix effects on weight selection
- **Memory usage**: Peak consumption for predicting larger model requirements
- **File operations**: Read/write/temp patterns for disk usage analysis
- **Error context**: Stack traces with local variables at failure points