
# direct_safetensors_to_gguf.py - Direct SafeTensors Conversion

Direct SafeTensors to GGUF converter for unsupported architectures.

## Overview

This tool converts SafeTensors models directly to GGUF format without requiring specific architecture support in llama.cpp. It's particularly useful for experimental models, custom architectures, or when llama.cpp's standard conversion tools don't recognise your model architecture.

## Features

- **Architecture-agnostic**: works with model architectures that llama.cpp does not support
- **Automatic mapping**: intelligently maps tensor names to GGUF conventions
- **BFloat16 support**: handles BF16 tensors when PyTorch is installed (optional)
- **Vision models**: supports models with vision components
- **Tokeniser preservation**: extracts and includes tokeniser metadata
- **Fallback mechanisms**: provides sensible defaults for unknown architectures

## Usage

### Basic Usage

```bash
# Convert a local SafeTensors model
uv run direct_safetensors_to_gguf.py /path/to/model/directory
```

### Command Line Options

```bash
# Specify output file
uv run direct_safetensors_to_gguf.py /path/to/model -o output.gguf

# Force specific architecture mapping
uv run direct_safetensors_to_gguf.py /path/to/model --force-arch qwen2

# Convert with custom output path
uv run direct_safetensors_to_gguf.py ./my-model --output ./converted/my-model.gguf
```

## Supported Input Formats

The tool automatically detects and handles the following layouts (a detection sketch follows the list):

1. **Single file models**: `model.safetensors`
2. **Sharded models**: `model-00001-of-00005.safetensors`, etc.
3. **Custom names**: any `*.safetensors` files in the directory
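
A minimal sketch of this detection order, assuming a hypothetical `find_safetensors_files` helper (the script's actual internals may differ):

```python
from pathlib import Path


def find_safetensors_files(model_dir: Path) -> list[Path]:
    """Locate SafeTensors files, preferring canonical layouts.

    Hypothetical helper illustrating the detection order described
    above; the real script may implement this differently.
    """
    single = model_dir / "model.safetensors"
    if single.exists():
        return [single]

    # Sharded models: sort so shard 00001 loads before 00002, etc.
    shards = sorted(model_dir.glob("model-*-of-*.safetensors"))
    if shards:
        return shards

    # Fall back to any *.safetensors files in the directory
    return sorted(model_dir.glob("*.safetensors"))
```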

## Architecture Mapping

The tool includes built-in mappings for several architectures:

- `DotsOCRForCausalLM` → `qwen2`
- `GptOssForCausalLM` → `llama`
- Unknown architectures → `llama` (fallback)

You can override these with the `--force-arch` parameter.
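
A sketch of how such a lookup could be structured; the dictionary and `resolve_arch` helper below are illustrative, not the script's actual internals:

```python
# Known source architectures and the GGUF architectures they map to
ARCH_MAP = {
    "DotsOCRForCausalLM": "qwen2",
    "GptOssForCausalLM": "llama",
}


def resolve_arch(config_arch: str, force_arch: str | None = None) -> str:
    """Map a config.json architecture name to a GGUF architecture."""
    if force_arch:  # --force-arch overrides the built-in table
        return force_arch
    # Unknown architectures fall back to llama
    return ARCH_MAP.get(config_arch, "llama")
```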

## Tensor Name Mapping

The converter automatically maps common tensor patterns; a mapping sketch follows the table:

| Original Pattern | GGUF Name |
|---|---|
| `model.embed_tokens.weight` | `token_embd.weight` |
| `model.norm.weight` | `output_norm.weight` |
| `lm_head.weight` | `output.weight` |
| `layers.N.self_attn.q_proj` | `blk.N.attn_q` |
| `layers.N.self_attn.k_proj` | `blk.N.attn_k` |
| `layers.N.self_attn.v_proj` | `blk.N.attn_v` |
| `layers.N.mlp.gate_proj` | `blk.N.ffn_gate` |
| `layers.N.mlp.up_proj` | `blk.N.ffn_up` |
| `layers.N.mlp.down_proj` | `blk.N.ffn_down` |
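
A sketch of how this renaming could work; the rules transcribe the table above, and `map_tensor_name` is a hypothetical helper rather than the script's real function:

```python
import re

# Whole-tensor renames
_GLOBAL_RULES = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

# Layer-indexed patterns: N is captured and substituted into the GGUF name
_LAYER_RULES = [
    (re.compile(r"layers\.(\d+)\.self_attn\.q_proj"), r"blk.\1.attn_q"),
    (re.compile(r"layers\.(\d+)\.self_attn\.k_proj"), r"blk.\1.attn_k"),
    (re.compile(r"layers\.(\d+)\.self_attn\.v_proj"), r"blk.\1.attn_v"),
    (re.compile(r"layers\.(\d+)\.mlp\.gate_proj"), r"blk.\1.ffn_gate"),
    (re.compile(r"layers\.(\d+)\.mlp\.up_proj"), r"blk.\1.ffn_up"),
    (re.compile(r"layers\.(\d+)\.mlp\.down_proj"), r"blk.\1.ffn_down"),
]


def map_tensor_name(name: str) -> str:
    """Translate a SafeTensors tensor name to its GGUF equivalent."""
    if name in _GLOBAL_RULES:
        return _GLOBAL_RULES[name]
    # The layer patterns omit the "model." prefix some checkpoints use
    stripped = name.removeprefix("model.")
    for pattern, replacement in _LAYER_RULES:
        new_name, count = pattern.subn(replacement, stripped)
        if count:
            return new_name  # e.g. blk.0.attn_q.weight
    return name  # unmapped tensors keep their original name
```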

## Configuration Requirements

The model directory must contain (a pre-flight check is sketched after this list):

- `config.json`: model configuration file (required)
- `*.safetensors`: one or more SafeTensors files (required)
- `tokenizer_config.json`: tokeniser configuration (optional)
- `tokenizer.json`: tokeniser data (optional)
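
A minimal pre-flight check along these lines (hypothetical `check_model_dir` helper; the messages mirror the errors listed under Error Handling below):

```python
from pathlib import Path


def check_model_dir(model_dir: Path) -> None:
    """Fail early if the required input files are missing."""
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"Config file not found: {model_dir / 'config.json'}")
    if not list(model_dir.glob("*.safetensors")):
        raise FileNotFoundError(f"No safetensor files found in {model_dir}")
    # tokenizer_config.json and tokenizer.json are optional: conversion
    # proceeds without them, but tokeniser metadata will be incomplete
```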

## Output Format

The tool produces a single GGUF file containing (an inspection sketch follows the list):

- All model weights in F32 format
- Model architecture metadata
- Tokeniser configuration (if available)
- Special token IDs (BOS, EOS, UNK, PAD)
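
One quick way to inspect the result is the `gguf` Python package published alongside llama.cpp (a sketch, assuming `uv add gguf` and the output path from the first example below):

```python
from gguf import GGUFReader

reader = GGUFReader("./my-model/my-model-f32.gguf")

# Metadata fields: architecture, context length, tokeniser data, ...
for key in reader.fields:
    print(key)

# Tensor names and shapes, all stored as F32
for tensor in reader.tensors:
    print(tensor.name, tensor.shape)
```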

## Error Handling

| Error | Message | Solution |
|---|---|---|
| Missing `config.json` | `FileNotFoundError: Config file not found` | Ensure the model directory contains a valid `config.json` file |
| No SafeTensors files | `FileNotFoundError: No safetensor files found` | Check that the directory contains `.safetensors` files |
| BFloat16 without PyTorch | `Warning: PyTorch not available, BFloat16 models may not convert properly` | Install PyTorch for BF16 support: `uv add torch` |
| Unknown architecture | `Warning: Unknown architecture X, using llama as fallback` | Use `--force-arch` to specify a known compatible architecture |

## Technical Details

### Parameter Inference

The tool infers GGUF parameters from the model configuration, falling back to defaults when a field is absent (a sketch follows the list):

- `vocab_size` → vocabulary size (default: 32000)
- `max_position_embeddings` → context length (default: 2048)
- `hidden_size` → embedding dimension (default: 4096)
- `num_hidden_layers` → number of transformer blocks (default: 32)
- `num_attention_heads` → attention head count (default: 32)
- `num_key_value_heads` → KV head count (defaults to the attention head count)
- `rope_theta` → RoPE frequency base (default: 10000.0)
- `rms_norm_eps` → RMS normalisation epsilon (default: 1e-5)
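
A sketch of this inference, assuming Hugging Face-style `config.json` keys and a hypothetical `infer_params` helper:

```python
import json
from pathlib import Path


def infer_params(model_dir: Path) -> dict:
    """Read config.json, applying the defaults listed above."""
    config = json.loads((model_dir / "config.json").read_text())
    n_head = config.get("num_attention_heads", 32)
    return {
        "vocab_size": config.get("vocab_size", 32000),
        "context_length": config.get("max_position_embeddings", 2048),
        "embedding_length": config.get("hidden_size", 4096),
        "block_count": config.get("num_hidden_layers", 32),
        "head_count": n_head,
        # KV head count defaults to the attention head count (no GQA)
        "head_count_kv": config.get("num_key_value_heads", n_head),
        "rope_theta": config.get("rope_theta", 10000.0),
        "rms_norm_eps": config.get("rms_norm_eps", 1e-5),
    }
```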

### Vision Model Support

For models with vision components, the tool extracts the following parameters (see the sketch after this list):

- Vision embedding dimensions
- Vision transformer block count
- Vision attention heads
- Vision feed-forward dimensions
- Patch size and spatial merge parameters
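
A sketch of this extraction; the nested `vision_config` block and its key names are assumptions based on common Hugging Face vision configs, so actual models may differ:

```python
def infer_vision_params(config: dict) -> dict | None:
    """Pull vision parameters from a nested vision_config block, if any."""
    vision = config.get("vision_config")
    if vision is None:
        return None  # text-only model
    return {
        "embedding_length": vision.get("hidden_size"),
        "block_count": vision.get("num_hidden_layers"),
        "head_count": vision.get("num_attention_heads"),
        "feed_forward_length": vision.get("intermediate_size"),
        "patch_size": vision.get("patch_size"),
        "spatial_merge_size": vision.get("spatial_merge_size"),
    }
```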

## Limitations

- **F32 only**: currently outputs only full-precision (F32) models
- **Architecture guessing**: may require manual architecture specification
- **Tokeniser compatibility**: uses the llama tokeniser as the default fallback
- **Memory usage**: requires loading full tensors into memory

## Examples

### Converting a custom model

```bash
# Download a model first
git clone https://huggingface.co/my-org/my-model ./my-model

# Convert to GGUF
uv run direct_safetensors_to_gguf.py ./my-model

# Output will be at ./my-model/my-model-f32.gguf
```

### Converting with a specific architecture

```bash
# For a Qwen2-based model
uv run direct_safetensors_to_gguf.py ./qwen-model --force-arch qwen2
```

### Batch conversion

```bash
# Convert multiple models
for model in ./models/*; do
    uv run direct_safetensors_to_gguf.py "$model" -o "./gguf/$(basename "$model").gguf"
done
```