
How to Run Chatterbox TTS Locally on Mac

A practical guide to running Chatterbox Turbo, the original Chatterbox model, and Chatterbox Multilingual locally on a Mac with Python, Apple Silicon MPS acceleration, and CPU fallback.

Updated on Jun 01, 2026. 12 min read.

Chatterbox TTS is an open-source text-to-speech model family from Resemble AI. It can generate speech locally, clone a voice from a reference recording, and save the result as a WAV file without sending your text or reference audio to a hosted TTS API.

This guide shows how to run all three main Chatterbox variants on macOS:

Variant | Best for | Languages | Size
Chatterbox Turbo | Most Mac users, faster inference, voice agents, and narration | English | 350M parameters
Original Chatterbox | English speech with exaggeration and cfg_weight controls | English | 500M parameters
Chatterbox Multilingual | Speech and voice cloning across supported languages | 23 languages | 500M parameters

Start with Chatterbox Turbo unless you specifically need the original model’s CFG tuning or multilingual output. Turbo is the smallest variant, and its speech-token-to-mel decoder runs in one generation step instead of 10.

For a deeper architectural comparison, see Chatterbox vs Qwen3-TTS.

System Requirements

Use a Mac with:

  • macOS
  • Python 3.10 or newer; Python 3.11 recommended
  • 8 GB of memory or more recommended
  • Internet access during setup and the first model download
  • A clean 6-10 second WAV reference recording if you want voice cloning

Chatterbox can run locally on Apple Silicon Macs, including MacBook Air and MacBook Pro laptops as well as Mac mini, iMac, Mac Studio, and Mac Pro desktops. The setup applies to M1, M2, M3, M4, and M5-series systems. The device helper in Step 3 selects PyTorch's mps device when it is available, which lets supported operations run on the Mac GPU.

Intel Macs and any environment where torch.backends.mps.is_available() returns False use CPU fallback. CPU inference works, but it is slower. Memory usage and generation time depend on the selected model, prompt length, and Mac hardware.
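Before installing anything, you can check these requirements with the standard library. This is a minimal sketch. The function name check_environment and the keys it returns are illustrative and not part of Chatterbox.

```python
import platform
import sys


def check_environment(min_python=(3, 10)):
    """Report whether this machine matches the requirements listed above."""
    is_macos = platform.system() == "Darwin"  # macOS reports "Darwin"
    return {
        # Python 3.10 or newer is required; 3.11 is the recommended version.
        "python_ok": sys.version_info[:2] >= min_python,
        "is_macos": is_macos,
        # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
        "is_apple_silicon": is_macos and platform.machine() == "arm64",
    }


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

A Python build running under Rosetta reports x86_64 even on an Apple Silicon Mac, so a False value for is_apple_silicon can also mean the interpreter itself is an Intel build.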

Step 1: Create a Python Environment

Open Terminal and create a dedicated project folder:

mkdir chatterbox-mac
cd chatterbox-mac
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel

Using a virtual environment keeps Chatterbox dependencies separate from your system Python installation.
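If you are unsure whether the environment is active, a short check from Python can confirm it. The sketch below relies on the standard behavior of venv, where sys.prefix and sys.base_prefix differ inside an environment. The function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path should point inside the .venv folder of your project.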

If python3.11 is not installed, install Python 3.11 first. One option is Homebrew:

brew install python@3.11

Step 2: Install Chatterbox

Install the published package:

pip install chatterbox-tts

The official repository also supports installing from source. This is useful if you need to inspect the examples, modify dependencies, or work with the current repository version:

git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .

The Chatterbox repository states that its pinned dependencies were developed and tested with Python 3.11 on Debian 11. macOS can require extra troubleshooting, especially when native dependencies need to build locally.

Step 3: Add the Mac Device Helper

The current Chatterbox package handles non-CUDA checkpoint loading internally. Add a small helper that selects mps when available and falls back to cpu.

Create a file named mac_device.py:

import torch

# Prefer the Apple Silicon GPU through PyTorch's MPS backend; otherwise use the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

Each script below imports this helper before loading a model.

To confirm which device your Mac will use:

python -c "from mac_device import device; print(device)"

On a supported Apple Silicon setup, the output should be:

mps

If the output is cpu, the examples still run, but generation will usually take longer.
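To put a number on the difference, one option is to time a generation call and compare it with the length of the audio it produced. The helpers below are a generic sketch. The names timed and real_time_factor are illustrative, and nothing here is specific to Chatterbox.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(elapsed_seconds, num_samples, sample_rate):
    """Seconds of compute per second of audio; below 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds
```

With a loaded model, a call such as timed(model.generate, text, audio_prompt_path="reference.wav") returns the waveform and the elapsed time. Passing wav.shape[-1] and model.sr to real_time_factor assumes that the last tensor dimension holds the samples. Skip the first run when comparing devices, because it includes model download and initialization.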

Option 1: Run Chatterbox Turbo

Turbo is the best starting point for most Mac users. It is the smallest model in the family and supports paralinguistic tags such as [laugh], [chuckle], and [cough].

Turbo requires a reference recording longer than 5 seconds. A clean 6-10 second WAV clip is a practical starting point for voice cloning. Place a file such as reference.wav in your project folder.
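You can check the clip length before loading the model. The sketch below uses the standard wave module, which reads uncompressed PCM WAV files only. The helper names are illustrative, and the 5-second threshold comes from the requirement above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference(path, min_seconds=5.0):
    """Return (ok, duration); ok is True when the clip is longer than min_seconds."""
    duration = wav_duration_seconds(path)
    return duration > min_seconds, duration
```

For example, ok, duration = check_reference("reference.wav") lets a script stop early with a clear message instead of failing inside the model.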

Create run_turbo.py:

import torchaudio as ta

from mac_device import device
from chatterbox.tts_turbo import ChatterboxTurboTTS

model = ChatterboxTurboTTS.from_pretrained(device=device)

text = (
    "This audio was generated locally on my Mac. "
    "The model is ready for a longer narration test [chuckle]."
)

wav = model.generate(
    text,
    audio_prompt_path="reference.wav",
)

ta.save("turbo-output.wav", wav, model.sr)
print(f"Saved turbo-output.wav using {device}")

Run it:

python run_turbo.py

The first run downloads the model files. Later runs can reuse the local cache.
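To see how much disk space the downloaded models take, you can measure the cache directory. This sketch assumes the files land in the default Hugging Face hub cache, which is ~/.cache/huggingface/hub unless HF_HOME or HF_HUB_CACHE points elsewhere. Verify the location on your machine before relying on it.

```python
import os
from pathlib import Path


def hub_cache_dir():
    """Resolve the Hugging Face hub cache directory from its environment variables."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def directory_size_mb(root):
    """Sum regular-file sizes under root, skipping symlinks to avoid double counting."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total / (1024 * 1024)


cache = hub_cache_dir()
if cache.exists():
    print(f"{cache}: {directory_size_mb(cache):.1f} MB")
else:
    print(f"{cache} does not exist yet; nothing has been downloaded")
```

The total covers every model in that cache, not only Chatterbox.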

Option 2: Run the Original Chatterbox Model

Use the original English model when you want to tune emotional intensity and conditioning behavior. The two important parameters are:

  • exaggeration: controls emotional intensity. The default is 0.5.
  • cfg_weight: controls how strongly the output follows the conditioning audio. The default is 0.5.

Create run_original.py:

import torchaudio as ta

from mac_device import device
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device=device)

text = "This is the original Chatterbox model running locally on a Mac."

wav = model.generate(
    text,
    audio_prompt_path="reference.wav",
    exaggeration=0.5,
    cfg_weight=0.5,
)

ta.save("original-output.wav", wav, model.sr)
print(f"Saved original-output.wav using {device}")

Run it:

python run_original.py

The original model can also generate speech without audio_prompt_path. Add a reference recording when you want to clone a different voice.

For a more dramatic result, try increasing exaggeration to 0.7 or higher and lowering cfg_weight toward 0.3. Higher exaggeration values can make speech faster or less stable, so change one parameter at a time.
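One way to follow the one-parameter-at-a-time advice is to generate the list of settings up front. In the sketch below, the helper name one_at_a_time and the candidate values 0.9 and 0.4 are illustrative choices and not recommendations from the Chatterbox project.

```python
def one_at_a_time(baseline, candidates):
    """Yield settings that differ from baseline in exactly one parameter."""
    for name, values in candidates.items():
        for value in values:
            if value != baseline[name]:
                yield {**baseline, name: value}


baseline = {"exaggeration": 0.5, "cfg_weight": 0.5}
candidates = {"exaggeration": [0.5, 0.7, 0.9], "cfg_weight": [0.5, 0.4, 0.3]}

for settings in one_at_a_time(baseline, candidates):
    print(settings)
```

You can pass each dictionary to the call from run_original.py as model.generate(text, audio_prompt_path="reference.wav", **settings). Saving each result under a different file name makes the outputs easy to compare by ear.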

Option 3: Run Chatterbox Multilingual

Use Chatterbox Multilingual when you need a supported language other than English or want to test cross-lingual voice cloning.

Create run_multilingual.py:

import torchaudio as ta

from mac_device import device
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

model = ChatterboxMultilingualTTS.from_pretrained(device=device)

text = "Bonjour, ceci est un test de synthèse vocale exécuté localement sur un Mac."

wav = model.generate(
    text,
    language_id="fr",
    audio_prompt_path="reference.wav",
)

ta.save("multilingual-output.wav", wav, model.sr)
print(f"Saved multilingual-output.wav using {device}")

Run it:

python run_multilingual.py

The multilingual model supports these 23 language IDs:

Language | ID | Language | ID | Language | ID
Arabic | ar | Danish | da | German | de
Greek | el | English | en | Spanish | es
Finnish | fi | French | fr | Hebrew | he
Hindi | hi | Italian | it | Japanese | ja
Korean | ko | Malay | ms | Dutch | nl
Norwegian | no | Polish | pl | Portuguese | pt
Russian | ru | Swedish | sv | Swahili | sw
Turkish | tr | Chinese | zh | |

Match the reference clip to the target language when possible. A reference recording in a different language can introduce accent artifacts.
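If language_id comes from user input or a config file, a small check against the 23 IDs listed above catches typos before the model loads. The constant and function names below are illustrative. The package may perform its own validation as well.

```python
SUPPORTED_LANGUAGE_IDS = frozenset({
    "ar", "da", "de", "el", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "ms", "nl", "no", "pl", "pt", "ru", "sv", "sw", "tr", "zh",
})


def normalize_language_id(language_id):
    """Return a lowercase, stripped ID or raise ValueError if it is not listed."""
    normalized = language_id.strip().lower()
    if normalized not in SUPPORTED_LANGUAGE_IDS:
        raise ValueError(
            f"unsupported language_id {language_id!r}; "
            f"expected one of {sorted(SUPPORTED_LANGUAGE_IDS)}"
        )
    return normalized
```

For example, normalize_language_id(" FR ") returns "fr", while an unknown ID such as "xx" raises ValueError.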

Alternative: Use an ONNX Export

The Python package is the main path for running the full Chatterbox model family locally on a Mac. There are also ONNX exports for portable inference with ONNX Runtime.

The original and multilingual ONNX exports are maintained by the ONNX community. The Turbo ONNX export is published under Resemble AI’s Hugging Face account. Follow the instructions on the selected model card because dependencies, quantized files, and runtime support differ between exports.

Use CPU Fallback Manually

MPS is useful on Apple Silicon, but some PyTorch operations or dependency combinations may fail on a particular macOS setup. To force CPU inference while troubleshooting, replace the line that defines device in mac_device.py with:

device = "cpu"

CPU fallback is also the expected path on an Intel Mac.
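If you switch devices often, an environment variable can save you from editing the file each time. The sketch below is one possible approach. CHATTERBOX_DEVICE is a hypothetical variable name invented for this example, and neither PyTorch nor Chatterbox reads it.

```python
import os


def resolve_device(mps_available, override=None):
    """Pick "mps" or "cpu", letting an explicit override win."""
    if override:
        choice = override.strip().lower()
        if choice not in {"mps", "cpu"}:
            raise ValueError(f"unsupported device override: {override!r}")
        if choice == "mps" and not mps_available:
            # Requesting mps on a machine without it would fail at load time.
            return "cpu"
        return choice
    return "mps" if mps_available else "cpu"


# CHATTERBOX_DEVICE is a hypothetical variable name chosen for this sketch.
override = os.environ.get("CHATTERBOX_DEVICE")
print("override:", override)
```

In mac_device.py, the device line would become device = resolve_device(torch.backends.mps.is_available(), override). Running CHATTERBOX_DEVICE=cpu python run_turbo.py then forces CPU for that single run.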

Troubleshooting

python3.11: command not found

Install Python 3.11 and create a new virtual environment:

brew install python@3.11
python3.11 -m venv .venv
source .venv/bin/activate

A checkpoint tries to load on CUDA

Upgrade to the current chatterbox-tts package or install the current source checkout. Current versions handle non-CUDA checkpoint loading internally:

pip install --upgrade chatterbox-tts
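To confirm which versions are installed in the active environment, you can query the package metadata from Python. This sketch uses the standard importlib.metadata module. The helper name installed_version is illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("chatterbox-tts:", installed_version("chatterbox-tts"))
print("torch:", installed_version("torch"))
```

A result of None means the package is not installed in the interpreter that ran the script, which often points to an inactive virtual environment.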

An MPS operation fails

Force CPU fallback in mac_device.py:

device = "cpu"

This is slower, but it helps distinguish a Metal backend limitation from a general installation issue.
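A middle ground is the PYTORCH_ENABLE_MPS_FALLBACK environment variable. With it set, PyTorch runs operations that the MPS backend does not implement on the CPU, while the rest stays on the GPU. It only addresses unimplemented operations, so it may not resolve every MPS failure. The sketch below sets it from Python. Placing it at the very top of mac_device.py is an assumption about where it fits in this guide's layout.

```python
import os

# Must run before the first "import torch" anywhere in the process.
# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

print("PYTORCH_ENABLE_MPS_FALLBACK =", os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

You can also set the variable for a single run from the shell, for example PYTORCH_ENABLE_MPS_FALLBACK=1 python run_turbo.py.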

Installation fails while building a dependency

First confirm that you are using Python 3.11 inside a clean virtual environment:

python --version
which python
python -m pip install --upgrade pip setuptools wheel

Then retry:

pip install chatterbox-tts

If the published package still fails to resolve or build, install the current source checkout and inspect the dependency error:

git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .

Do not install random dependency versions globally to work around a project-specific error. Keep changes inside the virtual environment.

The first generation takes longer than expected

The first run downloads model files and initializes the model. Run a short sentence first, then test longer text after the initial setup completes.
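When you move on to longer text, one common approach is to split it on sentence boundaries and generate one chunk at a time. The splitter below is a rough sketch. The name chunk_sentences is illustrative, and the default max_chars=300 is arbitrary and not a documented Chatterbox limit.

```python
import re


def chunk_sentences(text, max_chars=300):
    """Group whole sentences into chunks of at most max_chars where possible."""
    # Split after ., !, or ? followed by whitespace. This heuristic also
    # splits after abbreviations such as "Dr.", so review the chunks.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than max_chars stays whole as its own chunk. You can pass each chunk to model.generate in turn. Joining the resulting waveforms into one file is left out here because it depends on the tensor layout the model returns.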

Memory usage grows during repeated generations

Use short prompts while testing and restart the Python process between batches if memory pressure becomes noticeable. If you are building a long-running service, monitor memory usage on your target Mac before treating the setup as production-ready.
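One lightweight way to watch for growth is to log the peak resident memory of the process after each generation. This sketch uses the standard resource module, which is available on macOS and Linux. The function name peak_rss_mb is illustrative.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak memory so far: {peak_rss_mb():.0f} MB")
```

Because this is a peak value, it can show growth but never a decrease, and memory held by the Metal backend may not be fully reflected. Treat it as a rough signal and confirm with Activity Monitor on the target Mac.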

Generated Audio Is Watermarked

Chatterbox applies PerTh watermarking to generated audio. The watermark is designed to be imperceptible while making it possible to detect that a clip was generated by the model.

Which Variant Should You Use?

Use Chatterbox Turbo first. It is the practical default for local Mac experiments and English voice cloning.

Use the original Chatterbox model when you specifically want exaggeration and cfg_weight tuning.

Use Chatterbox Multilingual when language coverage matters more than the smallest model size.

All three variants share the same basic setup. You do not need three separate Python environments unless you intentionally want to isolate different dependency experiments.

A Simpler Mac Workflow

Running Chatterbox from Python is useful when you want to study the model, change parameters directly, or integrate speech generation into your own code.

If you want a desktop workflow instead of managing Python environments and model scripts, Spokio is a native Mac text-to-speech app powered by Chatterbox Turbo. It runs locally on Apple Silicon and Intel Macs, supports English voice generation and local voice cloning, and does not upload your text, audio, or voice samples to a cloud TTS service.
