GitHub - justinlime/Fatterbox: Open API and Wyoming wrapper around Chatterbox

Overview

Fatterbox is built on rsxdalv's optimized Chatterbox implementation, exposing both Wyoming protocol and OpenAPI endpoints with streaming support for minimal time-to-first-word latency. The streaming architecture splits text into sentence chunks and generates audio progressively, allowing playback to begin before the entire text is synthesized.
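
The sentence-chunked streaming described above can be illustrated with a minimal sketch. This is not Fatterbox's actual code: split_sentences, stream_speech, and fake_synthesize are hypothetical names, the regex split is a simplification, and the synthesizer is a placeholder for the real model call.

```python
import re
from typing import Callable, Iterator


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_speech(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio for each sentence as soon as it has been generated."""
    for sentence in split_sentences(text):
        # The caller can start playback of this chunk while later ones are pending.
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    # Hypothetical stand-in for the model call; returns placeholder bytes.
    return sentence.encode("utf-8")


if __name__ == "__main__":
    for chunk in stream_speech("Hello there. How are you? Fine!", fake_synthesize):
        print(len(chunk), "bytes ready")
```

Text with no sentence punctuation comes back as one chunk in this sketch, so nothing can be played until the whole input has been synthesized.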

Requirements

Docker with NVIDIA GPU support (install nvidia-container-toolkit)
NVIDIA GPU with CUDA capability
Voice reference files (.wav format)

Quick Start

Prepare voice files: Place .wav files in a voices directory. Each file becomes a voice (e.g., Jake.wav → voice name "Jake").
Pull the prebuilt image (or build your own with docker build -t fatterbox .):

docker pull docker.io/justinlime/fatterbox:v0.1.0

Run the container:

docker run --gpus all \
  -v ./voices:/chatter/voices \
  -p 10200:10200 \
  -p 8000:8000 \
  docker.io/justinlime/fatterbox:v0.1.0
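
The file-to-voice naming from the first step (Jake.wav becomes voice "Jake") can be sketched as follows. discover_voices is a hypothetical helper for checking a voices directory before mounting it; how Fatterbox itself scans the directory (case sensitivity, subfolders) is not covered here.

```python
from pathlib import Path


def discover_voices(voices_dir: str) -> dict[str, Path]:
    """Map voice names to .wav files, using the file name without extension."""
    return {p.stem: p for p in sorted(Path(voices_dir).glob("*.wav"))}


if __name__ == "__main__":
    for name, path in discover_voices("./voices").items():
        print(f"{name} -> {path}")
```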

Servers

Two servers run simultaneously:

Wyoming protocol: tcp://0.0.0.0:10200 (Home Assistant integration)
OpenAPI REST: http://0.0.0.0:8000 (OpenAI-compatible)

Configuration

Configure via environment variables (all prefixed with FATTERBOX_):

Server Configuration

FATTERBOX_WYOMING_HOST=0.0.0.0
FATTERBOX_WYOMING_PORT=10200
FATTERBOX_OPENAPI_HOST=0.0.0.0
FATTERBOX_OPENAPI_PORT=8000
FATTERBOX_VOICES_DIR=./voices
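
As an illustration of the prefix convention, the sketch below resolves the server settings from the environment and falls back to the defaults listed above. load_server_config and SERVER_DEFAULTS are hypothetical names, and this is not how Fatterbox necessarily reads its configuration.

```python
import os
from typing import Mapping

PREFIX = "FATTERBOX_"

# Defaults copied from the Server Configuration list.
SERVER_DEFAULTS = {
    "WYOMING_HOST": "0.0.0.0",
    "WYOMING_PORT": "10200",
    "OPENAPI_HOST": "0.0.0.0",
    "OPENAPI_PORT": "8000",
    "VOICES_DIR": "./voices",
}


def load_server_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return server settings, letting FATTERBOX_* variables override defaults."""
    return {key: env.get(PREFIX + key, default) for key, default in SERVER_DEFAULTS.items()}


if __name__ == "__main__":
    print(load_server_config())
```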

Model Configuration

FATTERBOX_DEVICE=cuda              # cuda or cpu
FATTERBOX_DTYPE=bf16               # float32, fp16, bf16 (bf16 recommended)
FATTERBOX_BACKEND=cudagraphs-manual # fastest option

Minimum VRAM Required:

FP32: ~4.5 GB
FP16/BF16: ~3.5 GB (recommended)

Estimates are based on my tests with an RTX 3090. You might see memory spikes when generating long sentences without punctuation. BF16 offers the best balance of speed and memory efficiency on modern GPUs.

Generation Parameters

FATTERBOX_EXAGGERATION=0.5      # Emotional expressiveness (0.0-2.0)
FATTERBOX_CFG_WEIGHT=0.5        # Voice adherence (0.0-1.0)
FATTERBOX_TEMPERATURE=0.8       # Randomness (0.05-5.0)
FATTERBOX_SEED=0                # Random seed (0=random)
FATTERBOX_TOP_P=1.0             # Nucleus sampling (0.0-1.0)
FATTERBOX_MIN_P=0.0             # Min probability (0.0-1.0)
FATTERBOX_MAX_NEW_TOKENS=4096   # Max audio tokens (~25 per second)
FATTERBOX_N_TIMESTEPS=10        # Diffusion steps
FATTERBOX_FLOW_CFG_SCALE=1.0    # Mel decoder CFG scale
FATTERBOX_DEBUG=false           # Enable debug logging
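
A small sketch for checking values before passing them to the container. The ranges and the ~25 tokens-per-second figure are taken from the comments above; check_range and max_audio_seconds are hypothetical helpers, and whether Fatterbox itself rejects or clamps out-of-range values is not stated here.

```python
# Ranges copied from the comments in the Generation Parameters list.
RANGES = {
    "EXAGGERATION": (0.0, 2.0),
    "CFG_WEIGHT": (0.0, 1.0),
    "TEMPERATURE": (0.05, 5.0),
    "TOP_P": (0.0, 1.0),
    "MIN_P": (0.0, 1.0),
}

TOKENS_PER_SECOND = 25  # approximate rate given for FATTERBOX_MAX_NEW_TOKENS


def check_range(name: str, value: float) -> float:
    """Return value unchanged, or raise if it is outside the documented range."""
    low, high = RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"FATTERBOX_{name}={value} is outside {low}-{high}")
    return value


def max_audio_seconds(max_new_tokens: int) -> float:
    """Approximate upper bound on audio length for a given token budget."""
    return max_new_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    check_range("EXAGGERATION", 0.7)
    print(f"{max_audio_seconds(4096):.2f} seconds")
```

At roughly 25 tokens per second, the default of 4096 tokens works out to about 164 seconds of audio per generation.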

Example with custom settings:

docker run --gpus all \
  -v ./voices:/chatter/voices \
  -p 10200:10200 \
  -p 8000:8000 \
  -e FATTERBOX_DTYPE=bf16 \
  -e FATTERBOX_EXAGGERATION=0.7 \
  -e FATTERBOX_CFG_WEIGHT=0.4 \
  docker.io/justinlime/fatterbox:v0.1.0  # or "fatterbox" if you built the image locally

API Usage

OpenAPI Endpoint

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test.",
    "voice": "Jake"
  }' \
  --output speech.wav
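
The same request can be made from Python with only the standard library. This is a sketch mirroring the curl call above: build_speech_request and save_speech are hypothetical helper names, and the base URL assumes the default port mapping.

```python
import json
import urllib.request


def build_speech_request(text: str, voice: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /v1/audio/speech with the same JSON body as curl."""
    body = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, voice: str, out_path: str = "speech.wav",
                base_url: str = "http://localhost:8000") -> None:
    """Send the request and write the response to disk as it arrives."""
    req = build_speech_request(text, voice, base_url)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(8192):
            out.write(chunk)


# With the container running:
# save_speech("Hello, this is a test.", "Jake")
```

Reading the response in small chunks rather than all at once lets the file grow while the server is still generating later sentences.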

List Available Voices

curl http://localhost:8000/v1/voices

Wyoming Protocol

Compatible with Home Assistant's Wyoming protocol. Configure in Home Assistant using:

Host: <docker-host-ip>
Port: 10200
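
Before adding the integration in Home Assistant, it can help to confirm that the Wyoming port is reachable from another machine. The sketch below only checks that a TCP connection can be opened; it does not speak the Wyoming protocol. port_is_open is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int = 10200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Replace localhost with the Docker host's IP when testing remotely.
    print(port_is_open("localhost"))
```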

Performance Tips

Use bf16 dtype (recommended) for best balance of speed and VRAM efficiency
RTX 30xx/40xx GPUs have native BF16 support for optimal performance
Use cudagraphs-manual backend (default) for fastest generation
Raise EXAGGERATION and lower CFG_WEIGHT for more expressive speech
