# direct_safetensors_to_gguf.py - Direct SafeTensors Conversion

Direct SafeTensors to GGUF converter for unsupported architectures.

## Overview

This tool converts SafeTensors models directly to GGUF format without requiring specific architecture support in llama.cpp. It's particularly useful for experimental models, custom architectures, or when llama.cpp's standard conversion tools don't recognise your model architecture.

## Features

- **Architecture-agnostic**: Works with unsupported model architectures
- **Automatic mapping**: Intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: Handles BF16 tensors with PyTorch (optional)
- **Vision models**: Supports models with vision components
- **Tokeniser preservation**: Extracts and includes tokeniser metadata
- **Fallback mechanisms**: Provides sensible defaults for unknown architectures

## Usage

### Basic Usage

```bash
# Convert a local SafeTensors model
uv run direct_safetensors_to_gguf.py /path/to/model/directory
```

### Command Line Options

```bash
# Specify output file
uv run direct_safetensors_to_gguf.py /path/to/model -o output.gguf

# Force specific architecture mapping
uv run direct_safetensors_to_gguf.py /path/to/model --force-arch qwen2

# Convert with custom output path
uv run direct_safetensors_to_gguf.py ./my-model --output ./converted/my-model.gguf
```

## Supported Input Formats

The tool automatically detects and handles:

1. **Single file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: Any `*.safetensors` files in the directory

## Architecture Mapping

The tool includes built-in mappings for several architectures:

- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)

You can override these with the `--force-arch` parameter.
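Conceptually, the architecture resolution is a small lookup with a fallback. The sketch below is illustrative only: the dictionary mirrors the mappings listed above, while the function name and the `--force-arch` handling are assumptions rather than the tool's actual source.

```python
# Illustrative sketch of the architecture fallback logic described above.
# ARCH_MAP mirrors the documented mappings; resolve_arch and its signature
# are assumptions for illustration, not the tool's real implementation.
ARCH_MAP = {
    "DotsOCRForCausalLM": "qwen2",
    "GptOssForCausalLM": "llama",
}


def resolve_arch(config_arch: str, force_arch: str | None = None) -> str:
    """Pick the GGUF architecture name for a Hugging Face architecture string."""
    if force_arch:  # --force-arch always wins
        return force_arch
    if config_arch in ARCH_MAP:
        return ARCH_MAP[config_arch]
    # Unknown architectures fall back to llama, with a warning
    print(f"Warning: Unknown architecture {config_arch}, using llama as fallback")
    return "llama"
```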
## Tensor Name Mapping

The converter automatically maps common tensor patterns:

| Original Pattern | GGUF Name |
|-----------------|-----------|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |

## Configuration Requirements

The model directory must contain:

- **config.json**: Model configuration file (required)
- **\*.safetensors**: One or more SafeTensors files (required)
- **tokenizer_config.json**: Tokeniser configuration (optional)
- **tokenizer.json**: Tokeniser data (optional)

## Output Format

The tool produces a single GGUF file containing:

- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)

## Error Handling

| Error | Message | Solution |
|-------|---------|----------|
| Missing config.json | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |

## Technical Details

### Parameter Inference

The tool infers GGUF parameters from the model configuration:

- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to attention heads)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → RMS normalisation epsilon (default: 1e-5)

### Vision Model Support

For models with vision components, the tool extracts:

- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters

## Limitations

- **F32 only**: Currently outputs only full precision (F32) models
- **Architecture guessing**: May require manual architecture specification
- **Tokeniser compatibility**: Uses llama tokeniser as default fallback
- **Memory usage**: Requires loading full tensors into memory

## Examples

### Converting a custom model

```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model

# Convert to GGUF
uv run direct_safetensors_to_gguf.py ./my-model

# Output will be at ./my-model/my-model-f32.gguf
```

### Converting with specific architecture

```bash
# For a Qwen2-based model
uv run direct_safetensors_to_gguf.py ./qwen-model --force-arch qwen2
```

### Batch conversion

```bash
# Convert multiple models
for model in ./models/*; do
  uv run direct_safetensors_to_gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```
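### Inspecting the output

After a conversion it can help to sanity-check the resulting file. The snippet below is a minimal sketch using the `gguf` Python package (`uv add gguf`); the path is just the example output from above, and the exact metadata keys present will depend on the converted model.

```python
from gguf import GGUFReader

# Open the converted file and list its metadata keys and first few tensors
reader = GGUFReader("./my-model/my-model-f32.gguf")

print(f"{len(reader.tensors)} tensors")
for key in list(reader.fields)[:10]:
    print("metadata:", key)
for tensor in reader.tensors[:5]:
    print("tensor:", tensor.name, list(tensor.shape))
```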