Tags: fish audio s2 pro, mac, apple silicon, mlx, gguf, voice cloning, local tts

How to Run Fish Audio S2 Pro Locally on Mac

Learn the practical ways to run Fish Audio S2 Pro locally on a Mac, including MLX on Apple Silicon, native Swift integration, experimental GGUF and Metal ports, and the official CUDA self-hosting route.

Updated on Jun 01, 2026 | 14 min read

Fish Audio S2 Pro is a large open-weight text-to-speech model for expressive speech, voice cloning, and inline style control. It can follow instructions inside your text, including tags such as [whisper], [excited], and [pause].

The official deployment path is designed primarily for Linux systems with high-memory GPUs. Running S2 Pro on a Mac is possible because community projects have added Apple Silicon and GGUF ports. These ports are useful, but they should not be mistaken for first-party macOS support.

This guide covers the practical Mac options. For model architecture, licensing, and a feature overview, read the Fish Audio S2 Pro technical guide.

Choose a Local Setup

| Method | Best for | Apple Silicon | Intel Mac | Support level |
| --- | --- | --- | --- | --- |
| mlx-speech with the AppAutomaton bundle | The most reproducible Mac setup | Yes | No | Community |
| mlx-audio | A general Apple Silicon audio toolkit | Yes | No | Community |
| mlx-audio-swift | Native macOS app integration | Yes | No | Community |
| s2.cpp with GGUF weights | Experimental C++ and Metal testing | Yes | CPU testing may be possible | Community alpha |
| Official Fish Speech repository | Linux server deployment, WebUI, and API serving | Not the main target | Not the main target | Official |

For most Mac users, start with mlx-speech. It has a documented S2 Pro script and a self-contained 8-bit bundle that includes MLX codec assets.

System Requirements

Fish Audio S2 Pro is much larger than lightweight local TTS models. A modern Apple Silicon Mac with generous unified memory is the practical starting point.

Recommended hardware:

  • A MacBook Pro, Mac mini, iMac, Mac Studio, or Mac Pro with an Apple Silicon chip.
  • An M1, M2, M3, M4, or M5-series chip.
  • At least 32 GB of unified memory for a smoother experience with quantized ports.
  • More unified memory if you want to experiment with larger weights or run other applications at the same time.
  • Several gigabytes of free storage for model downloads and generated audio.

A MacBook Air can work for experimentation if it has enough memory, but sustained generation may be slower because it has no active cooling. MLX runs only on Apple Silicon, so the MLX routes are unavailable on Intel Macs. If you have an Intel Mac, the experimental CPU route in s2.cpp is the relevant option, and you should expect slower output.

The official Fish Audio documentation recommends at least 24 GB of GPU memory for inference. That recommendation refers to the upstream server workflow, not to the community quantized MLX ports covered below.
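
Before picking a route, it can help to confirm the chip architecture and installed memory from Terminal. The following is a minimal sketch and is not part of any project covered in this guide: the function names bytes_to_gib and check_mac_hardware are made up for illustration. It relies on uname -m and the macOS sysctl key hw.memsize, and it skips the memory line on systems where that key is unavailable.

```shell
# Convert a byte count to whole GiB using integer division.
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Print the CPU architecture and, where available, the installed memory.
check_mac_hardware() {
  local arch mem_bytes
  arch="$(uname -m)"
  if [ "$arch" = "arm64" ]; then
    echo "Architecture: arm64 (the MLX routes apply on an Apple Silicon Mac)"
  else
    echo "Architecture: $arch (the MLX routes do not apply)"
  fi
  # hw.memsize is a macOS sysctl key; skip the line if it cannot be read.
  if mem_bytes="$(sysctl -n hw.memsize 2>/dev/null)" && [ -n "$mem_bytes" ]; then
    echo "Memory: $(bytes_to_gib "$mem_bytes") GiB"
  fi
}

check_mac_hardware
```

A Terminal session running under Rosetta reports x86_64 even on Apple Silicon, so run the check from a native shell.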

Option 1: Use mlx-speech on Apple Silicon

mlx-speech provides an MLX-native S2 Pro generation script for Apple Silicon Macs. The matching AppAutomaton Fish Audio S2 Pro 8-bit MLX bundle contains the quantized model weights and bundled codec assets needed for local waveform generation.

This is the recommended Mac route because the model card documents the exact download and generation commands.

1. Install the Project

Open Terminal and run:

git clone https://github.com/appautomaton/mlx-speech.git
cd mlx-speech

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

2. Download the 8-bit Model

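Install the Hugging Face download tool and download the bundle. Recent huggingface_hub releases also provide this command as hf download (the form used later in this guide), so switch to it if huggingface-cli prints a deprecation warning or is not found: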

pip install "huggingface_hub[hf_xet]"

huggingface-cli download \
  --local-dir fishaudio-s2-pro-8bit-mlx \
  appautomaton/fishaudio-s2-pro-8bit-mlx

The first download is large. Let it complete before generating audio.

3. Generate Speech

Run the included script:

python scripts/generate/fish_s2_pro.py \
  --text "Hello from Fish Audio S2 Pro on this Mac." \
  --model-dir ./fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro.wav

The generated audio will be written to outputs/fish_s2_pro.wav.
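
To render several lines in one go, the same command can be wrapped in a small loop. This is a sketch under assumptions: generate_batch and the DRY_RUN switch are hypothetical helpers written for this guide, not part of mlx-speech, and only the --text, --model-dir, and --output flags shown above are used. Run it from the mlx-speech directory with the virtual environment active.

```shell
# Generate one WAV file per non-empty line of a text file.
# Set DRY_RUN=1 to print the planned output paths without running the model.
generate_batch() {
  local input_file="$1"
  local out_dir="${2:-outputs}"
  local n=0 line out
  mkdir -p "$out_dir"
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue          # skip blank lines
    n=$((n + 1))
    out="$out_dir/line_$(printf '%03d' "$n").wav"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would write $out"
    else
      # Redirect stdin so the script cannot consume the remaining input lines.
      python scripts/generate/fish_s2_pro.py \
        --text "$line" \
        --model-dir ./fishaudio-s2-pro-8bit-mlx \
        --output "$out" < /dev/null
    fi
  done < "$input_file"
}
```

For example, DRY_RUN=1 generate_batch lines.txt previews the file names, and generate_batch lines.txt runs the generation. Each line starts a fresh Python process, so the model is loaded again every time; this trades speed for simplicity.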

4. Clone a Voice

Provide a clean reference recording and its exact transcript:

python scripts/generate/fish_s2_pro.py \
  --text "This sentence uses a locally cloned voice." \
  --reference-audio /path/to/reference.wav \
  --reference-text "Exact transcript of the reference recording." \
  --model-dir ./fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro_clone.wav

Use a clear recording with minimal background noise. The transcript should match the spoken words closely.
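
Because a missing file or an empty transcript only shows up after the model has loaded, a short preflight check can save time. The wrapper below is a sketch: clone_voice is a hypothetical helper, and keeping the transcript in a separate text file is an assumption made for convenience. It passes only the flags already shown in this section.

```shell
# Check the reference inputs, then call the generation script.
# Arguments: reference audio, transcript file, text to speak, output path.
clone_voice() {
  local ref_audio="$1" ref_text_file="$2" text="$3" out="$4"
  if [ ! -f "$ref_audio" ]; then
    echo "Reference audio not found: $ref_audio" >&2
    return 1
  fi
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$ref_text_file" ]; then
    echo "Transcript file missing or empty: $ref_text_file" >&2
    return 1
  fi
  python scripts/generate/fish_s2_pro.py \
    --text "$text" \
    --reference-audio "$ref_audio" \
    --reference-text "$(cat "$ref_text_file")" \
    --model-dir ./fishaudio-s2-pro-8bit-mlx \
    --output "$out"
}
```

A call might look like clone_voice reference.wav reference.txt "This sentence uses a locally cloned voice." outputs/clone.wav.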

5. Add Inline Style Controls

S2 Pro can interpret inline instructions directly inside the text:

python scripts/generate/fish_s2_pro.py \
  --text "Now Bobby, [clearing throat] I need to talk to you. [whisper] This stays between us. [chuckle] Just kidding." \
  --reference-audio /path/to/reference.wav \
  --reference-text "Exact transcript of the reference recording." \
  --model-dir ./fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro_emotion.wav

Common examples include [whisper], [chuckle], [laugh], [excited], [sad], and [pause]. The official model also supports more descriptive natural-language instructions.

Option 2: Use mlx-audio

mlx-audio is a broader MLX audio toolkit for Apple Silicon. It supports multiple TTS, speech-to-text, and speech-to-speech architectures, including Fish Audio S2 Pro.

Install the released package:

python3 -m venv .venv
source .venv/bin/activate
pip install mlx-audio

Or, if you use uv, install the current command-line tool from GitHub:

uv tool install --force \
  git+https://github.com/Blaizzy/mlx-audio.git \
  --prerelease=allow

The S2 Pro support in mlx-audio is newer than its general CLI, so check the current repository examples before choosing a converted model and generation flags. Community quantizations include mlx-community/fish-audio-s2-pro-8bit.

Use this route when you already use mlx-audio for other local speech models or want its wider toolkit. Use the mlx-speech route above when you want a documented standalone S2 Pro command today.

Option 3: Integrate S2 Pro into a Native Mac App

mlx-audio-swift is a Swift package for MLX-powered speech models on Apple platforms. Fish Audio S2 Pro support has been added for developers building native macOS experiences.

This route is relevant when you want to call S2 Pro from a Swift application instead of running a Python script. Review the repository’s current package and model instructions before integrating it, because the Swift implementation is still evolving alongside the Python MLX port.

Option 4: Experiment with s2.cpp and GGUF

s2.cpp is a community-built C++ inference engine for Fish Audio S2 Pro based on GGML. It supports quantized GGUF weights, voice cloning, and style tags without a Python runtime.

This project is alpha software. Expect rough edges and breaking changes. Its README documents a Metal backend but also lists macOS as untested. It is appropriate for local experiments, not production deployments.

Build with Metal

Clone the repository with its submodules and enable Metal:

git clone --recurse-submodules https://github.com/rodrigomatta/s2.cpp.git
cd s2.cpp

cmake -B build -DCMAKE_BUILD_TYPE=Release -DS2_METAL=ON
cmake --build build --parallel $(sysctl -n hw.ncpu)

Download a compatible model from the S2 Pro GGUF model card. Available quantizations range from smaller Q2 and Q3 files to larger Q8 and F16 weights. Start with a quantized file if your Mac has limited unified memory.
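
As a rough guide when comparing quantizations, raw weight size scales with parameter count times bits per weight, divided by eight. The sketch below shows only that arithmetic. The helper name estimate_weights_mb is hypothetical, and the 1000 million parameters is a round placeholder, not S2 Pro's parameter count. GGUF formats also store per-block scale data, so real files are somewhat larger; treat the file sizes on the model card as authoritative.

```shell
# Rough weight size in MB: parameters (in millions) x bits per weight / 8.
estimate_weights_mb() {
  echo $(( $1 * $2 / 8 ))
}

# Placeholder model with 1000 million parameters at three bit widths.
for bits in 4 8 16; do
  echo "${bits}-bit: about $(estimate_weights_mb 1000 "$bits") MB"
done
```

Halving the bits per weight roughly halves the memory needed for the weights, which is why a lower quantization is the first thing to try on a Mac with limited unified memory.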

Follow the current s2.cpp README for the latest executable name, model path, and voice cloning flags. This project is changing quickly, so copying old CLI arguments from a third-party post is likely to fail.

Check the License Before Publishing or Shipping

Fish Audio S2 Pro weights use the Fish Audio Research License. Research and non-commercial use are allowed under its terms. Commercial use requires a separate license from Fish Audio.

Review the current Fish Audio S2 Pro license before distributing a model bundle, publishing a commercial integration, or shipping a paid product.

Option 5: Use the Official Fish Speech Repository

The official Fish Speech route is useful if you want the upstream command-line pipeline, Gradio WebUI, modern WebUI, API server, or Docker deployment. It is primarily a Linux and WSL server workflow, not the preferred path for running S2 Pro on a MacBook.

The official installation guide lists Linux and WSL as the target systems and recommends 24 GB of GPU memory for inference. It also provides a CPU-only install option, but CPU inference for a model this large will be slow.

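The official setup uses Python 3.12. Quote the extras specifier, because zsh (the default macOS shell) otherwise treats the square brackets as a glob pattern and fails with a "no matches found" error:

git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech

conda create -n fish-speech python=3.12
conda activate fish-speech
pip install -e ".[cpu]"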

Download the model weights:

hf download fishaudio/s2-pro --local-dir checkpoints/s2-pro

Start the maintained Gradio WebUI:

python tools/run_webui.py

Do not add --compile on macOS. The official installation guide warns that the compile option is not supported on macOS unless you install Triton manually.

For a server deployment, follow the official API server guide. For higher-throughput serving, review the SGLang-Omni Fish Audio S2 Pro example and the vLLM-Omni Fish Speech S2 Pro guide.

Troubleshooting

The First Generation Takes a Long Time

The first run may download model files and initialize the runtime. Wait for the model download to complete before diagnosing inference speed.

macOS Runs Out of Memory

Close memory-heavy applications and use a quantized model. Fish Audio S2 Pro is not a lightweight TTS model. A smaller local model may be a better match for Macs with limited unified memory.

Voice Cloning Sounds Inaccurate

Use clean reference audio and provide an exact transcript. Avoid music, background noise, overlapping speakers, and long pauses.

A Community Command Stops Working

Check the project’s current README and use a matching model bundle. The MLX, Swift, and GGUF ports are moving quickly, and model layouts or CLI flags may change between releases.
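
One way to guard against this is to note the commit of a checkout that works, so you can return to it after an update breaks something. The helper below is a small sketch: current_commit is a hypothetical name, and it relies only on standard git commands.

```shell
# Print the short commit hash of a cloned project directory.
current_commit() {
  git -C "$1" rev-parse --short HEAD
}
```

For example, current_commit mlx-speech prints the hash of a working setup. To restore it later, run git -C mlx-speech checkout followed by that hash. Keep the downloaded model bundle that matched that commit as well.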

Intel Mac Performance Is Too Slow

MLX requires Apple Silicon. On an Intel Mac, use s2.cpp only as an experimental CPU route or switch to a smaller model.

Which Method Should You Use?

Use mlx-speech with the AppAutomaton 8-bit bundle if you want the clearest path to local Fish Audio S2 Pro generation on an Apple Silicon Mac.

Use mlx-audio if you want one toolkit for multiple MLX speech models. Use mlx-audio-swift for native app integration. Try s2.cpp if you specifically want to explore GGUF quantization, C++, or Metal.

Use the official Fish Speech repository for server-style deployments, WebUI testing, or API hosting when you have suitable hardware.

If S2 Pro is too large for your Mac, try the Kokoro Mac guide for a lightweight option or the Chatterbox Mac guide for a smaller voice cloning setup. You can also use Spokio when you want text-to-speech without managing local model files.
