Pure Go implementation of SNAC (Multi-Scale Neural Audio Codec) decoder https://github.com/tcpipuk/snac-go
Tom Foster e1c9d5b1e9
feat(decoder): complete architectural correction and ground truth verification
Discovered and fixed fundamental architecture mismatch in SNAC 24kHz decoder.
Initial implementation was based on incomplete understanding - corrected to
match actual pretrained model structure.

Architectural corrections:
- VQ levels: 4 → 3 (strides [8, 4, 2])
- DecoderDim: 1536 → 1024
- DecoderRates: [5,5,3,3] → [8,8,4,2]
- Removed LocalMHA (24kHz model has no attention)
- DecoderBlock: Rewrote to Snake → ConvTranspose → NoiseBlock → 3× ResidualUnit
- Initial layers: Single Conv → Depthwise (groups=768) + Pointwise
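
As a rough sanity check on the corrected DecoderRates, the sketch below multiplies the rates together to get the number of audio samples produced per latent frame. It assumes each decoder block upsamples by exactly its rate and that the output is 24kHz; `totalUpsampling` is a hypothetical helper, not a function from this repository.

```go
package main

import "fmt"

// totalUpsampling multiplies the per-block upsampling rates together.
// Hypothetical helper for illustration; not part of snac-go.
func totalUpsampling(rates []int) int {
	total := 1
	for _, r := range rates {
		total *= r
	}
	return total
}

func main() {
	decoderRates := []int{8, 8, 4, 2} // corrected DecoderRates from the list above
	const sampleRate = 24000          // assumed output rate of the 24kHz model

	hop := totalUpsampling(decoderRates)
	frameMs := 1000 * float64(hop) / sampleRate
	fmt.Printf("samples per latent frame: %d (%.2f ms)\n", hop, frameMs)
	// prints "samples per latent frame: 512 (21.33 ms)"
}
```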

New components:
- NoiseBlock - scaled random noise injection with deterministic seeding
- ResidualUnit - dilated depthwise convolutions
- Extended Conv1d with groups and dilation support

Testing:
- 151 tests passing (91 unit + 60 new including ground truth)
- 6 ground truth test cases verified against Python SNAC
- MSE errors: 1e-7 to 1e-5 (float32 precision match)
- Production-ready decoder verified with actual pretrained weights

Updated README.md to reflect correct architecture and verification status.
2025-10-11 02:31:35 +01:00
.forgejo/workflows feat(project): add infrastructure for SNAC decoder 2025-10-10 09:08:41 +01:00
docs docs(readme): tidy README and move architecture details to dedicated guide 2025-10-10 17:36:16 +01:00
examples/simple feat(decoder): implement comprehensive test suite and weight loading infrastructure 2025-10-10 22:13:24 +01:00
scripts feat(decoder): complete architectural correction and ground truth verification 2025-10-11 02:31:35 +01:00
snac feat(decoder): complete architectural correction and ground truth verification 2025-10-11 02:31:35 +01:00
.gitignore feat(project): add infrastructure for SNAC decoder 2025-10-10 09:08:41 +01:00
.markdownlint.yaml feat(project): add infrastructure for SNAC decoder 2025-10-10 09:08:41 +01:00
.pre-commit-config.yaml feat(decoder): implement complete SNAC audio decoder 2025-10-10 13:13:31 +01:00
go.mod feat(decoder): implement comprehensive test suite and weight loading infrastructure 2025-10-10 22:13:24 +01:00
go.sum feat(decoder): implement comprehensive test suite and weight loading infrastructure 2025-10-10 22:13:24 +01:00
LICENSE feat(project): add infrastructure for SNAC decoder 2025-10-10 09:08:41 +01:00
README.md feat(decoder): complete architectural correction and ground truth verification 2025-10-11 02:31:35 +01:00

SNAC-Go

Pure Go implementation of the SNAC neural audio codec decoder for text-to-speech applications.

Overview

SNAC-Go decodes multi-scale neural audio codes into high-quality audio waveforms. It integrates with llama-go to enable complete Go-based text-to-speech using models like Orpheus TTS, with no Python runtime dependencies.

Features

  • Pure Go implementation - No CGO, Python, or PyTorch dependencies
  • SNAC 24kHz model - Speech-optimised decoder (19.8M parameters)
  • Hierarchical decoding - Multi-scale vector quantisation with 3 codebook levels
  • Production-ready - Ground-truth verified against upstream Python SNAC (MSE < 1e-5)
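
To illustrate what "3 codebook levels" can mean for input data, here is a minimal sketch that checks a token layout before decoding. It assumes each finer level carries twice as many codes as the level above it (a 1:2:4 ratio, inferred from the halving strides [8, 4, 2] in the commit notes); that ratio and the `validateLevels` helper are assumptions for illustration, not part of the snac-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// validateLevels checks that the coarsest level is non-empty and that each
// finer level holds exactly twice as many codes as the previous one.
// The 1:2:4 ratio is an assumption, not something snac-go is known to enforce.
func validateLevels(tokens [3][]int) error {
	if len(tokens[0]) == 0 {
		return errors.New("coarsest level is empty")
	}
	for i := 1; i < len(tokens); i++ {
		want := 2 * len(tokens[i-1])
		if len(tokens[i]) != want {
			return fmt.Errorf("level %d has %d codes, want %d", i, len(tokens[i]), want)
		}
	}
	return nil
}

func main() {
	// Made-up codes, purely to show the shape.
	tokens := [3][]int{
		{12},
		{7, 301},
		{45, 9, 1024, 77},
	}
	if err := validateLevels(tokens); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("token layout looks consistent")
}
```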

Installation

go get github.com/tcpipuk/snac-go

Quick start

package main

import (
    "log"

    "github.com/tcpipuk/snac-go/snac"
)

func main() {
    // Load the SNAC decoder with the 24kHz model
    decoder, err := snac.NewDecoder("hubertsiuzdak/snac_24khz")
    if err != nil {
        log.Fatal(err)
    }

    // tokens holds the 3 hierarchical codebook levels.
    // Placeholder: fill it with real codes (e.g. from a TTS model) before decoding.
    var tokens [3][]int

    // Decode SNAC tokens to audio
    audio, err := decoder.Decode(tokens)
    if err != nil {
        log.Fatal(err)
    }

    // Write to a WAV file at 24kHz.
    // writeWAV is not defined in this snippet; supply your own helper.
    if err := writeWAV("output.wav", audio, 24000); err != nil {
        log.Fatal(err)
    }
}

Why Go?

The Python SNAC decoder is roughly 474 lines, whilst this implementation is nearly 4,700. The goal is pure Go text-to-speech (llama-go runs Orpheus, snac-go decodes the audio), but in Python, PyTorch does the heavy lifting; here, convolutions and attention have to be implemented by hand using Gorgonia. GoMLX would shrink the codebase significantly, but it is pre-v1.0 and its API is still unstable. The current approach is pragmatic: a reliable pure Go decoder now, ready to migrate once GoMLX stabilises.
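
To give a sense of what implementing a convolution by hand involves, here is a minimal sketch of a 1-D convolution with `groups` and `dilation` support, written over plain Go slices. It is not the repository's code and deliberately avoids Gorgonia; it assumes stride 1, no padding, no bias, and PyTorch-style weight layout `[outCh][inCh/groups][kernel]`.

```go
package main

import "fmt"

// conv1d computes a grouped, dilated 1-D cross-correlation (the operation
// PyTorch calls Conv1d) with stride 1, no padding and no bias.
// input is [inCh][length]; weight is [outCh][inCh/groups][kernel].
// Illustrative sketch only; channel counts must be divisible by groups.
func conv1d(input [][]float32, weight [][][]float32, groups, dilation int) [][]float32 {
	inPerGroup := len(input) / groups
	outPerGroup := len(weight) / groups
	kernel := len(weight[0][0])
	outLen := len(input[0]) - dilation*(kernel-1)
	if outLen <= 0 {
		return nil // input too short for this kernel and dilation
	}

	output := make([][]float32, len(weight))
	for oc := range output {
		output[oc] = make([]float32, outLen)
		group := oc / outPerGroup // which slice of input channels this filter sees
		for ic := 0; ic < inPerGroup; ic++ {
			src := input[group*inPerGroup+ic]
			for t := 0; t < outLen; t++ {
				var sum float32
				for k := 0; k < kernel; k++ {
					sum += weight[oc][ic][k] * src[t+k*dilation]
				}
				output[oc][t] += sum
			}
		}
	}
	return output
}

func main() {
	// Depthwise case: groups equals the channel count, so each channel
	// is filtered independently by its own kernel.
	input := [][]float32{
		{1, 2, 3, 4, 5},
		{10, 20, 30, 40, 50},
	}
	weight := [][][]float32{
		{{1, 1}},  // channel 0: x[t] + x[t+2]
		{{1, -1}}, // channel 1: x[t] - x[t+2]
	}
	fmt.Println(conv1d(input, weight, 2, 2)) // prints "[[4 6 8] [-20 -20 -20]]"
}
```

A tensor library performs the same arithmetic with batching, padding and optimised kernels, which is a large part of the line-count difference described above.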

Documentation

See the Architecture Guide for technical details on how the SNAC decoder works.

Status

  • Architecture implementation complete (~4700 lines pure Go)
  • Weight loading and conversion tooling ready
  • Ground truth verification complete (151 tests passing, MSE < 1e-5)
  • Production-ready decoder verified against Python SNAC
  • API subject to change before v1.0
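
The MSE threshold quoted above can be checked with a comparison along the following lines. This is a sketch of the general technique, not the repository's test code: `mse` is a hypothetical helper, and the sample values are made up rather than taken from the actual ground truth cases.

```go
package main

import (
	"errors"
	"fmt"
)

// mse returns the mean squared error between two equal-length sample slices.
// Accumulation is done in float64 to limit rounding error in the sum.
func mse(got, want []float32) (float64, error) {
	if len(got) != len(want) {
		return 0, errors.New("length mismatch")
	}
	if len(got) == 0 {
		return 0, errors.New("empty input")
	}
	var sum float64
	for i := range got {
		d := float64(got[i]) - float64(want[i])
		sum += d * d
	}
	return sum / float64(len(got)), nil
}

func main() {
	// Made-up values standing in for reference and decoded audio.
	reference := []float32{0.10, -0.20, 0.30, -0.40}
	decoded := []float32{0.1001, -0.2001, 0.2999, -0.4001}

	m, err := mse(decoded, reference)
	if err != nil {
		fmt.Println("comparison failed:", err)
		return
	}
	const tolerance = 1e-5 // threshold quoted in the status list
	fmt.Printf("MSE = %.3g, within tolerance: %t\n", m, m < tolerance)
}
```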

Licence

Licensed under the Apache License 2.0.