Fish Audio S2 Pro
is a large open-weight text-to-speech model for expressive speech, voice cloning,
and inline style control. It can follow instructions inside your text, including
tags such as [whisper], [excited], and [pause].
The official deployment path is designed primarily for Linux systems with high-memory GPUs. Running S2 Pro on a Mac is possible because community projects have added Apple Silicon and GGUF ports. These ports are useful, but they should not be mistaken for first-party macOS support.
This guide covers the practical Mac options. For model architecture, licensing, and a feature overview, read the Fish Audio S2 Pro technical guide.
Choose a Local Setup
| Method | Best for | Apple Silicon | Intel Mac | Support level |
|---|---|---|---|---|
| mlx-speech with the AppAutomaton bundle | The most reproducible Mac setup | Yes | No | Community |
| mlx-audio | A general Apple Silicon audio toolkit | Yes | No | Community |
| mlx-audio-swift | Native macOS app integration | Yes | No | Community |
| s2.cpp with GGUF weights | Experimental C++ and Metal testing | Yes | CPU testing may be possible | Community alpha |
| Official Fish Speech repository | Linux server deployment, WebUI, and API serving | Not the main target | Not the main target | Official |
For most Mac users, start with mlx-speech. It has a documented S2 Pro script
and a self-contained 8-bit bundle that includes MLX codec assets.
System Requirements
Fish Audio S2 Pro is much larger than lightweight local TTS models. A modern Apple Silicon Mac with generous unified memory is the practical starting point.
Recommended hardware:
- A MacBook Pro, Mac mini, iMac, Mac Studio, or Mac Pro with an Apple Silicon chip.
- An M1, M2, M3, M4, or M5-series chip.
- At least 32 GB of unified memory for a smoother experience with quantized ports.
- More unified memory if you want to experiment with larger weights or run other applications at the same time.
- Several gigabytes of free storage for model downloads and generated audio.
A MacBook Air can work for experimentation if it has enough memory, but sustained
generation may be slower because it does not have active cooling. Apple’s MLX
framework does not run on Intel Macs. If you have an Intel Mac, the experimental
CPU route in s2.cpp is the relevant option, and you should expect slower output.
The official Fish Audio documentation recommends at least 24 GB of GPU memory for inference. That recommendation refers to the upstream server workflow, not to the community quantized MLX ports covered below.
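Before downloading anything, it can help to confirm which route your machine fits. The following is a minimal sketch, not part of any of the projects in this guide: `classify_mac` and `total_memory_bytes` are hypothetical helper names, and the 32 GB threshold simply mirrors the recommendation above. Note that a Python interpreter running under Rosetta reports `x86_64` even on Apple Silicon, so run it with a native arm64 Python.

```python
import os
import platform

MIN_UNIFIED_MEMORY_GB = 32  # mirrors this guide's suggestion for quantized ports


def classify_mac(machine: str, memory_bytes: int) -> str:
    """Map a CPU architecture and RAM size to one of the routes in this guide."""
    memory_gb = memory_bytes / 1024**3
    if machine != "arm64":
        return "no MLX: only the experimental s2.cpp CPU route applies"
    if memory_gb < MIN_UNIFIED_MEMORY_GB:
        return f"Apple Silicon with {memory_gb:.0f} GB: expect memory pressure"
    return f"Apple Silicon with {memory_gb:.0f} GB: MLX ports are a reasonable fit"


def total_memory_bytes() -> int:
    """Physical memory reported by POSIX sysconf (works on macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    print(classify_mac(platform.machine(), total_memory_bytes()))
```

The result is only a rough guide: how much memory a given port actually needs depends on the quantization and on what else is running.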
Option 1: Use mlx-speech on Apple Silicon
mlx-speech provides an MLX-native S2 Pro generation script for Apple Silicon Macs. The matching AppAutomaton Fish Audio S2 Pro 8-bit MLX bundle contains the quantized model weights and bundled codec assets needed for local waveform generation.
This is the recommended Mac route because the model card documents the exact download and generation commands.
1. Install the Project
Open Terminal and run:
git clone https://github.com/appautomaton/mlx-speech.git
cd mlx-speech
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
2. Download the 8-bit Model
Install the Hugging Face download tool and download the bundle:
pip install "huggingface_hub[hf_xet]"
huggingface-cli download \
--local-dir fishaudio-s2-pro-8bit-mlx \
appautomaton/fishaudio-s2-pro-8bit-mlx
The first download is large. Let it complete before generating audio.
3. Generate Speech
Run the included script:
python scripts/generate/fish_s2_pro.py \
--text "Hello from Fish Audio S2 Pro on this Mac." \
--model-dir ./fishaudio-s2-pro-8bit-mlx \
--output outputs/fish_s2_pro.wav
The generated audio will be written to outputs/fish_s2_pro.wav.
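If you want to render several lines of text, one option is to wrap the command above in a small Python driver. This is a sketch under assumptions: it uses only the three flags shown in this guide (`--text`, `--model-dir`, `--output`), the helper names `build_command` and `generate_all` are hypothetical, and it must be run from the mlx-speech checkout with the virtual environment active.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "scripts/generate/fish_s2_pro.py"  # script path used in this guide
MODEL_DIR = "./fishaudio-s2-pro-8bit-mlx"   # download directory used in this guide


def build_command(text: str, output: Path, model_dir: str = MODEL_DIR) -> list[str]:
    """Return the argv list for one generation, using only the flags shown above."""
    return [
        sys.executable, SCRIPT,
        "--text", text,
        "--model-dir", model_dir,
        "--output", str(output),
    ]


def generate_all(lines: list[str], out_dir: Path) -> list[Path]:
    """Run the script once per line; raises on the first failing command."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, text in enumerate(lines, start=1):
        output = out_dir / f"line_{index:03d}.wav"
        subprocess.run(build_command(text, output), check=True)
        outputs.append(output)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with text that contains quotes or brackets. The trade-off is that each call starts a new process and reloads the model, so this suits short batches rather than long scripts.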
4. Clone a Voice
Provide a clean reference recording and its exact transcript:
python scripts/generate/fish_s2_pro.py \
--text "This sentence uses a locally cloned voice." \
--reference-audio /path/to/reference.wav \
--reference-text "Exact transcript of the reference recording." \
--model-dir ./fishaudio-s2-pro-8bit-mlx \
--output outputs/fish_s2_pro_clone.wav
Use a clear recording with minimal background noise. The transcript should match the spoken words closely.
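A quick sanity check on the reference file can catch obvious problems before you spend time on generation. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper names and the duration bounds are assumptions chosen for illustration; this guide and the model card excerpted here do not specify a required reference length.

```python
import wave
from pathlib import Path

# Illustrative bounds only; not taken from the S2 Pro documentation.
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def describe_wav(path: Path) -> dict:
    """Read basic properties of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def reference_warnings(info: dict, transcript: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing obvious is wrong."""
    warnings = []
    if info["channels"] != 1:
        warnings.append("not mono: check for a second speaker or background music")
    if not MIN_SECONDS <= info["seconds"] <= MAX_SECONDS:
        warnings.append(f"duration {info['seconds']:.1f}s is outside the assumed range")
    if not transcript.strip():
        warnings.append("transcript is empty")
    return warnings
```

These checks cannot judge noise or transcript accuracy, so listening to the clip is still necessary.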
5. Add Inline Style Controls
S2 Pro can interpret inline instructions directly inside the text:
python scripts/generate/fish_s2_pro.py \
--text "Now Bobby, [clearing throat] I need to talk to you. [whisper] This stays between us. [chuckle] Just kidding." \
--reference-audio /path/to/reference.wav \
--reference-text "Exact transcript of the reference recording." \
--model-dir ./fishaudio-s2-pro-8bit-mlx \
--output outputs/fish_s2_pro_emotion.wav
Common examples include [whisper], [chuckle], [laugh], [excited],
[sad], and [pause]. The official model also supports more descriptive
natural-language instructions.
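When a script contains many tags, it is easy to leave a bracket unclosed. The following sketch lists the bracketed tags in a line and flags stray brackets before the text is sent to the model. It is plain string processing written for this guide, not part of S2 Pro or any port, and the function names are hypothetical.

```python
import re

# Matches one [tag] with no nested brackets inside it.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(text: str) -> tuple[list[str], str]:
    """Return the inline tags in order, plus the text with the tags removed."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    plain = re.sub(r"\s{2,}", " ", plain).strip()
    return tags, plain


def has_stray_bracket(text: str) -> bool:
    """True if a '[' or ']' remains after all well-formed tags are removed."""
    _, plain = split_tags(text)
    return "[" in plain or "]" in plain
```

For the example sentence above, `split_tags` returns the tags `clearing throat`, `whisper`, and `chuckle`, and `has_stray_bracket` returns `False`. A line such as `"[whisper This stays between us."` would be flagged.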
Option 2: Use mlx-audio
mlx-audio is a broader MLX audio toolkit for Apple Silicon. It supports multiple TTS, speech-to-text, and speech-to-speech architectures, including Fish Audio S2 Pro.
Install the released package:
python3 -m venv .venv
source .venv/bin/activate
pip install mlx-audio
Or install its current command-line tool from GitHub:
uv tool install --force \
git+https://github.com/Blaizzy/mlx-audio.git \
--prerelease=allow
S2 Pro support in mlx-audio is newer than its general CLI, so check the
current repository examples before choosing a model conversion or generation
flags. Community quantizations include
mlx-community/fish-audio-s2-pro-8bit.
Use this route when you already use mlx-audio for other local speech models or
want its wider toolkit. Use the mlx-speech route above when you want a
documented standalone S2 Pro command today.
Option 3: Integrate S2 Pro into a Native Mac App
mlx-audio-swift is a Swift package for MLX-powered speech models on Apple platforms. Fish Audio S2 Pro support has been added for developers building native macOS experiences.
This route is relevant when you want to call S2 Pro from a Swift application instead of running a Python script. Review the repository’s current package and model instructions before integrating it, because the Swift implementation is still evolving alongside the Python MLX port.
Option 4: Experiment with s2.cpp and GGUF
s2.cpp is a community-built C++ inference engine for Fish Audio S2 Pro based on GGML. It supports quantized GGUF weights, voice cloning, and style tags without a Python runtime.
This project is alpha software. Expect rough edges and breaking changes. Its README documents a Metal backend but also lists macOS as untested. It is appropriate for local experiments, not production deployments.
Build with Metal
Clone the repository with its submodules and enable Metal:
git clone --recurse-submodules https://github.com/rodrigomatta/s2.cpp.git
cd s2.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release -DS2_METAL=ON
cmake --build build --parallel $(sysctl -n hw.ncpu)
Download a compatible model from the S2 Pro GGUF model card. Available quantizations range from smaller Q2 and Q3 files to larger Q8 and F16 weights. Start with a quantized file if your Mac has limited unified memory.
Follow the current s2.cpp README for the latest executable name, model path,
and voice cloning flags. This project is changing quickly, so copying old CLI
arguments from a third-party post is likely to fail.
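Because the weight files are large, an interrupted download or a saved error page is a common cause of confusing load failures. A GGUF file begins with the four ASCII bytes `GGUF` followed by a little-endian 32-bit format version, so a few lines of Python can confirm that a file at least has the right header. This is a minimal sketch; `read_gguf_version` is a hypothetical helper, and it does not validate the tensors or confirm compatibility with s2.cpp.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path: Path) -> int:
    """Return the GGUF format version, or raise ValueError if the header is wrong."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A `ValueError` here usually means the download should be repeated. A valid header does not prove the file is complete, so also compare the file size against the one listed on the model card.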
Check the License Before Publishing or Shipping
Fish Audio S2 Pro weights use the Fish Audio Research License. Research and non-commercial use are allowed under its terms. Commercial use requires a separate license from Fish Audio.
Review the current Fish Audio S2 Pro license before distributing a model bundle, publishing a commercial integration, or shipping a paid product.
Option 5: Use the Official Fish Speech Repository
The official Fish Speech route is useful if you want the upstream command-line pipeline, Gradio WebUI, modern WebUI, API server, or Docker deployment. It is primarily a Linux and WSL server workflow, not the preferred path for running S2 Pro on a MacBook.
The official installation guide lists Linux and WSL as the target systems and recommends 24 GB of GPU memory for inference. It also provides a CPU-only install option, but CPU inference for a model this large will be slow.
The official setup uses Python 3.12:
git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
conda create -n fish-speech python=3.12
conda activate fish-speech
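pip install -e ".[cpu]"
Download the model weights:
hf download fishaudio/s2-pro --local-dir checkpoints/s2-pro
Start the maintained Gradio WebUI:
python tools/run_webui.py
The quotes around ".[cpu]" matter on macOS: zsh, the default shell, treats unquoted square brackets as a glob pattern and stops with a "no matches found" error.
Do not add --compile on macOS. The official installation guide warns that the
compile option is not supported on macOS unless you install Triton manually.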
For a server deployment, follow the official API server guide. For higher-throughput serving, review the SGLang-Omni Fish Audio S2 Pro example and the vLLM-Omni Fish Speech S2 Pro guide.
Troubleshooting
The First Generation Takes a Long Time
The first run may download model files and initialize the runtime. Wait for the model download to complete before diagnosing inference speed.
macOS Runs Out of Memory
Close memory-heavy applications and use a quantized model. Fish Audio S2 Pro is not a lightweight TTS model. A smaller local model may be a better match for Macs with limited unified memory.
Voice Cloning Sounds Inaccurate
Use clean reference audio and provide an exact transcript. Avoid music, background noise, overlapping speakers, and long pauses.
A Community Command Stops Working
Check the project’s current README and use a matching model bundle. The MLX, Swift, and GGUF ports are moving quickly, and model layouts or CLI flags may change between releases.
Intel Mac Performance Is Too Slow
MLX requires Apple Silicon. On an Intel Mac, use s2.cpp only as an experimental
CPU route or switch to a smaller model.
Which Method Should You Use?
Use mlx-speech with the AppAutomaton 8-bit bundle if you want the clearest
path to local Fish Audio S2 Pro generation on an Apple Silicon Mac.
Use mlx-audio if you want one toolkit for multiple MLX speech models. Use
mlx-audio-swift for native app integration. Try s2.cpp if you specifically
want to explore GGUF quantization, C++, or Metal.
Use the official Fish Speech repository for server-style deployments, WebUI testing, or API hosting when you have suitable hardware.
If S2 Pro is too large for your Mac, try the Kokoro Mac guide for a lightweight option or the Chatterbox Mac guide for a smaller voice cloning setup. You can also use Spokio when you want text-to-speech without managing local model files.
