Latest commit: Discovered and fixed fundamental architecture mismatch in SNAC 24kHz decoder. Initial implementation was based on incomplete understanding - corrected to match actual pretrained model structure.

Architectural corrections:
- VQ levels: 4 → 3 (strides [8, 4, 2])
- DecoderDim: 1536 → 1024
- DecoderRates: [5,5,3,3] → [8,8,4,2]
- Removed LocalMHA (24kHz model has no attention)
- DecoderBlock: Rewrote to Snake → ConvTranspose → NoiseBlock → 3× ResidualUnit
- Initial layers: Single Conv → Depthwise (groups=768) + Pointwise

New components:
- NoiseBlock - scaled random noise injection with deterministic seeding
- ResidualUnit - dilated depthwise convolutions
- Extended Conv1d with groups and dilation support

Testing:
- 151 tests passing (91 unit + 60 new including ground truth)
- 6 ground truth test cases verified against Python SNAC
- MSE errors: 1e-7 to 1e-5 (float32 precision match)
- Production-ready decoder verified with actual pretrained weights

Updated README.md to reflect correct architecture and verification status.
SNAC-Go
Pure Go implementation of the SNAC neural audio codec decoder for text-to-speech applications.
Overview
SNAC-Go decodes multi-scale neural audio codes into high-quality audio waveforms. It integrates with llama-go to enable complete Go-based text-to-speech using models like Orpheus TTS, with no Python runtime dependencies.
Features
- Pure Go implementation - No CGO, Python, or PyTorch dependencies
- SNAC 24kHz model - Speech-optimised decoder (19.8M parameters)
- Hierarchical decoding - Multi-scale vector quantisation with 3 codebook levels
- Production-ready - Ground-truth verified against upstream Python SNAC (MSE < 1e-5)
Installation
go get github.com/tcpipuk/snac-go
Quick start
package main

import (
	"log"

	"github.com/tcpipuk/snac-go/snac"
)

func main() {
	// Load the SNAC decoder with the 24kHz model.
	decoder, err := snac.NewDecoder("hubertsiuzdak/snac_24khz")
	if err != nil {
		log.Fatal(err)
	}

	// tokens is [3][]int representing 3 hierarchical codebook levels.
	// Placeholder: fill it with the codes produced by your TTS model.
	var tokens [3][]int

	// Decode SNAC tokens to audio.
	audio, err := decoder.Decode(tokens)
	if err != nil {
		log.Fatal(err)
	}

	// Write to a WAV file. writeWAV is a helper you supply yourself.
	if err := writeWAV("output.wav", audio, 24000); err != nil {
		log.Fatal(err)
	}
}
Why Go?
The Python SNAC decoder is roughly 474 lines, whilst this implementation is nearly 4,700. The goal is a pure Go TTS stack (llama-go runs Orpheus, snac-go decodes the audio), but in Python it is PyTorch that does the heavy lifting. Here, operations such as convolutions and attention have to be implemented manually using Gorgonia. GoMLX would shrink the codebase significantly, but it is pre-v1.0 and its API is still unstable. The current approach is pragmatic: a reliable pure Go decoder now, ready to migrate once GoMLX stabilises.
Documentation
See the Architecture Guide for technical details on how the SNAC decoder works.
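As a small worked example of the decoder's upsampling, the sketch below uses the decoder rates [8, 8, 4, 2] listed in the commit notes and the 24kHz sample rate. It assumes the total upsampling factor is the product of the per-block rates, which is the usual behaviour of stacked strided transposed convolutions but is not confirmed here for this implementation. The function name `upsampleFactor` is hypothetical.

```go
package main

import "fmt"

// upsampleFactor multiplies the per-block upsampling rates together.
// Assumption: each decoder block upsamples by exactly its rate.
func upsampleFactor(rates []int) int {
	factor := 1
	for _, r := range rates {
		factor *= r
	}
	return factor
}

func main() {
	rates := []int{8, 8, 4, 2} // decoder rates from the commit notes
	const sampleRate = 24000   // Hz

	factor := upsampleFactor(rates)
	fmt.Printf("samples per latent frame: %d\n", factor)
	fmt.Printf("latent frames per second: %.3f\n", float64(sampleRate)/float64(factor))
}
```

Under that assumption, each latent frame expands to 8 × 8 × 4 × 2 = 512 audio samples, so one second of 24kHz audio corresponds to 24000 / 512 = 46.875 latent frames.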
Status
- ✅ Architecture implementation complete (~4700 lines pure Go)
- ✅ Weight loading and conversion tooling ready
- ✅ Ground truth verification complete (151 tests passing, MSE < 1e-5)
- ✅ Production-ready decoder verified against Python SNAC
- ⏳ API subject to change before v1.0
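The MSE figures above compare decoder output against reference audio from Python SNAC. A minimal sketch of such a comparison follows. It is not the project's actual test code: the function name `mse`, the sample values, and the `[]float32` sample type are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// mse returns the mean squared error between two equal-length sample slices.
// Differences are accumulated in float64 to limit rounding error.
func mse(got, want []float32) (float64, error) {
	if len(got) == 0 || len(got) != len(want) {
		return 0, errors.New("mse: slices must be non-empty and of equal length")
	}
	var sum float64
	for i := range got {
		d := float64(got[i]) - float64(want[i])
		sum += d * d
	}
	return sum / float64(len(got)), nil
}

func main() {
	// Hypothetical samples standing in for Go output and a Python reference.
	got := []float32{0.5, -0.25, 0.125}
	want := []float32{0.5, -0.25, 0.126}

	m, err := mse(got, want)
	if err != nil {
		log.Fatal(err)
	}
	const tolerance = 1e-5 // threshold quoted in this README
	fmt.Printf("MSE = %.3g, within tolerance: %v\n", m, m < tolerance)
}
```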
Licence
Licensed under the Apache License 2.0.