# direct_safetensors_to_gguf.py - Direct SafeTensors Conversion

Direct SafeTensors to GGUF converter for unsupported architectures.

## Overview

This tool converts SafeTensors models directly to GGUF format without requiring specific architecture support in llama.cpp. It's particularly useful for experimental models, custom architectures, or when llama.cpp's standard conversion tools don't recognise your model architecture.

## Features
- **Architecture-agnostic**: Works with unsupported model architectures
- **Automatic mapping**: Intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: Handles BF16 tensors with PyTorch (optional)
- **Vision models**: Supports models with vision components
- **Tokeniser preservation**: Extracts and includes tokeniser metadata
- **Fallback mechanisms**: Provides sensible defaults for unknown architectures

## Usage

### Basic Usage
```bash
# Convert a local SafeTensors model
uv run direct_safetensors_to_gguf.py /path/to/model/directory
```

### Command Line Options
```bash
# Specify output file
uv run direct_safetensors_to_gguf.py /path/to/model -o output.gguf

# Force specific architecture mapping
uv run direct_safetensors_to_gguf.py /path/to/model --force-arch qwen2

# Convert with custom output path
uv run direct_safetensors_to_gguf.py ./my-model --output ./converted/my-model.gguf
```

## Supported Input Formats
The tool automatically detects and handles:

1. **Single file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: Any `*.safetensors` files in the directory
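
In practice, this detection amounts to a sorted glob over the model directory. The sketch below is illustrative only; the helper name is not part of the script:

```python
from pathlib import Path


def find_safetensors_files(model_dir: str | Path) -> list[Path]:
    """Illustrative helper: locate single-file or sharded SafeTensors models."""
    model_dir = Path(model_dir)
    # A sorted glob keeps shards (model-00001-of-00005.safetensors, ...) in order
    files = sorted(model_dir.glob("*.safetensors"))
    if not files:
        raise FileNotFoundError(f"No safetensor files found in {model_dir}")
    return files
```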
## Architecture Mapping

The tool includes built-in mappings for several architectures:

- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)

You can override these with the `--force-arch` parameter.
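
Conceptually, the lookup plus fallback behaves like the sketch below. The dictionary mirrors the list above; the function name and signature are illustrative rather than the script's actual API:

```python
# Built-in architecture mappings (mirrors the list above)
ARCH_MAP = {
    "DotsOCRForCausalLM": "qwen2",
    "GptOssForCausalLM": "llama",
}


def resolve_architecture(config: dict, force_arch: str | None = None) -> str:
    """Pick the GGUF architecture name, honouring --force-arch first."""
    if force_arch:
        return force_arch
    # config.json stores the source architecture under "architectures"
    source_arch = (config.get("architectures") or ["unknown"])[0]
    # Unknown architectures fall back to "llama"
    return ARCH_MAP.get(source_arch, "llama")
```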
## Tensor Name Mapping

The converter automatically maps common tensor patterns:

| Original Pattern | GGUF Name |
|------------------|-----------|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |
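
Applied in code, these rules amount to a short list of regular-expression substitutions. The sketch below is an approximation; the real mapping table may cover additional cases, and the `weight`/`bias` suffix handling here is an assumption:

```python
import re

# Ordered (pattern, replacement) pairs mirroring the table above
TENSOR_RULES = [
    (r"^model\.embed_tokens\.weight$", "token_embd.weight"),
    (r"^model\.norm\.weight$", "output_norm.weight"),
    (r"^lm_head\.weight$", "output.weight"),
    (r"^(?:model\.)?layers\.(\d+)\.self_attn\.q_proj\.(weight|bias)$", r"blk.\1.attn_q.\2"),
    (r"^(?:model\.)?layers\.(\d+)\.self_attn\.k_proj\.(weight|bias)$", r"blk.\1.attn_k.\2"),
    (r"^(?:model\.)?layers\.(\d+)\.self_attn\.v_proj\.(weight|bias)$", r"blk.\1.attn_v.\2"),
    (r"^(?:model\.)?layers\.(\d+)\.mlp\.gate_proj\.weight$", r"blk.\1.ffn_gate.weight"),
    (r"^(?:model\.)?layers\.(\d+)\.mlp\.up_proj\.weight$", r"blk.\1.ffn_up.weight"),
    (r"^(?:model\.)?layers\.(\d+)\.mlp\.down_proj\.weight$", r"blk.\1.ffn_down.weight"),
]


def map_tensor_name(name: str) -> str:
    """Translate a SafeTensors tensor name to its GGUF equivalent."""
    for pattern, replacement in TENSOR_RULES:
        new_name, count = re.subn(pattern, replacement, name)
        if count:
            return new_name
    return name  # leave unrecognised tensors untouched
```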
## Configuration Requirements

The model directory must contain:

- **config.json**: Model configuration file (required)
- **\*.safetensors**: One or more SafeTensors files (required)
- **tokenizer_config.json**: Tokeniser configuration (optional)
- **tokenizer.json**: Tokeniser data (optional)
## Output Format

The tool produces a single GGUF file containing:

- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)
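
To sanity-check a converted file, you can read it back with the `gguf` Python package, assuming it is installed in the environment (for example via `uv add gguf`):

```python
from gguf import GGUFReader

reader = GGUFReader("./my-model/my-model-f32.gguf")

# Metadata keys written during conversion (architecture, tokeniser, etc.)
print(f"{len(reader.fields)} metadata fields")
for key in list(reader.fields)[:10]:
    print(" ", key)

# Tensor names and shapes after renaming
for tensor in reader.tensors[:10]:
    print(f"{tensor.name}: {list(tensor.shape)}")
```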
## Error Handling

| Error | Message | Solution |
|-------|---------|----------|
| Missing config.json | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |
## Technical Details

### Parameter Inference

The tool infers GGUF parameters from the model configuration:

- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to attention heads)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → layer normalisation epsilon (default: 1e-5)
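
In effect, the inference is a series of `dict.get` lookups over `config.json` with the defaults listed above. A simplified sketch follows; the function name and output keys are illustrative:

```python
import json
from pathlib import Path


def infer_params(model_dir: str | Path) -> dict:
    """Read config.json and apply the fallback defaults listed above."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    n_head = config.get("num_attention_heads", 32)
    return {
        "vocab_size": config.get("vocab_size", 32000),
        "context_length": config.get("max_position_embeddings", 2048),
        "embedding_length": config.get("hidden_size", 4096),
        "block_count": config.get("num_hidden_layers", 32),
        "head_count": n_head,
        # KV head count defaults to the full attention head count
        "head_count_kv": config.get("num_key_value_heads", n_head),
        "rope_freq_base": config.get("rope_theta", 10000.0),
        "layer_norm_rms_eps": config.get("rms_norm_eps", 1e-5),
    }
```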
### Vision Model Support

For models with vision components, the tool extracts:

- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters
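
Vision parameters usually live in a nested `vision_config` block of `config.json`. The sketch below shows what such extraction could look like; the exact key names vary between model families, so treat them as examples rather than the script's precise behaviour:

```python
def infer_vision_params(config: dict) -> dict | None:
    """Pull vision tower settings out of config.json, if present."""
    vision = config.get("vision_config")
    if vision is None:
        return None
    return {
        "embedding_length": vision.get("hidden_size"),
        "block_count": vision.get("num_hidden_layers"),
        "head_count": vision.get("num_attention_heads"),
        "feed_forward_length": vision.get("intermediate_size"),
        "patch_size": vision.get("patch_size"),
        "spatial_merge_size": vision.get("spatial_merge_size"),
    }
```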
## Limitations

- **F32 only**: Currently outputs only full precision (F32) models
- **Architecture guessing**: May require manual architecture specification
- **Tokeniser compatibility**: Uses llama tokeniser as default fallback
- **Memory usage**: Requires loading full tensors into memory

## Examples

### Converting a custom model
```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model

# Convert to GGUF
uv run direct_safetensors_to_gguf.py ./my-model

# Output will be at ./my-model/my-model-f32.gguf
```

### Converting with specific architecture
```bash
# For a Qwen2-based model
uv run direct_safetensors_to_gguf.py ./qwen-model --force-arch qwen2
```

### Batch conversion
```bash
# Convert multiple models
for model in ./models/*; do
    uv run direct_safetensors_to_gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```