Qwen3-TTS is an open-source text-to-speech model family from the Qwen team. It supports multilingual generation, preset voices, short-reference voice cloning, and voice creation from text descriptions.
There is more than one way to run Qwen3-TTS locally on a Mac. The right setup depends on your hardware and what you want to build. On an Apple Silicon Mac, the most practical starting point is an MLX runtime. The official Python package is useful when you want to follow the upstream project directly, and a native Swift path is available for macOS app development.
This guide focuses on setup and runtime choices. For model architecture, tokenizer details, and reported latency, read the Qwen3-TTS technical guide. For a feature comparison with another local model, see Chatterbox vs Qwen3-TTS.
Ways to Run Qwen3-TTS Locally on Mac
| Method | Best for | Intel Mac | Apple Silicon Mac |
|---|---|---|---|
| mlx-audio | Recommended Mac setup, CLI generation, Python integration, and a local API | No | Yes |
| Official qwen-tts Python package with PyTorch | Following upstream examples and evaluating the official implementation | Possible but slow | Check upstream device support |
| Official local Web UI | Trying CustomVoice, VoiceDesign, and voice cloning in a browser | Possible but slow | Check upstream device support |
| Community Apple Silicon manager | Guided terminal setup for several MLX model variants | No | Yes |
| mlx-audio-swift | Native macOS application development | No | Yes |
If you have a MacBook Air, MacBook Pro, or desktop Mac with an M-series chip, start with mlx-audio. The official repository’s accelerated examples focus on NVIDIA CUDA environments, whereas MLX is designed specifically for Apple Silicon.
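If you are not sure which kind of Mac you have, a short check like the following can tell you whether the MLX path is open to you. It is a minimal sketch using only the standard library; the function name `is_apple_silicon` is illustrative. One caveat: a Python interpreter running under Rosetta reports `x86_64` even on an M-series machine, so run the check with a native arm64 Python.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when the interpreter reports macOS on an arm64 CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon():
        print("Apple Silicon detected: the MLX path is available.")
    else:
        print("No Apple Silicon detected: MLX is not an option here.")
```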
Choose a Qwen3-TTS Model
Qwen3-TTS is a model family rather than a single file. Choose the variant based on the type of speech you need:
| Model variant | Best for |
|---|---|
| Qwen3-TTS-12Hz-0.6B-CustomVoice | Lower-memory generation with preset voices |
| Qwen3-TTS-12Hz-0.6B-Base | Lower-memory voice cloning |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | Higher-quality preset voices with instruction control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | Creating a voice from a text description |
| Qwen3-TTS-12Hz-1.7B-Base | Higher-quality short-reference voice cloning |
The 0.6B models are sensible starting points when memory use matters. The
1.7B models are larger but expose the full range of Qwen3-TTS workflows.
Quantized MLX conversions can reduce memory pressure further.
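The selection logic can be written down as a small lookup. This is only a sketch: the variant names are the ones listed in this guide, while the task keywords (`preset`, `clone`, `design`) and the `choose_variant` helper are hypothetical names introduced for illustration.

```python
# Variant names are taken from this guide; the task keywords are illustrative.
VARIANTS = {
    ("preset", True): "Qwen3-TTS-12Hz-0.6B-CustomVoice",
    ("preset", False): "Qwen3-TTS-12Hz-1.7B-CustomVoice",
    ("clone", True): "Qwen3-TTS-12Hz-0.6B-Base",
    ("clone", False): "Qwen3-TTS-12Hz-1.7B-Base",
    # VoiceDesign is listed only at 1.7B, so both memory settings map to it.
    ("design", True): "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    ("design", False): "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
}


def choose_variant(task, low_memory=False):
    """Return the variant name for a task keyword and a memory preference."""
    try:
        return VARIANTS[(task, low_memory)]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


if __name__ == "__main__":
    print(choose_variant("clone", low_memory=True))
```

Note that asking for a low-memory VoiceDesign model still returns the 1.7B variant, because that is the only VoiceDesign size in the list.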
System Requirements
For the recommended MLX path, use an Apple Silicon Mac with an M1, M2, M3, M4, or M5-series chip. That includes MacBook Air and MacBook Pro laptops, plus Mac mini, iMac, Mac Studio, and Mac Pro desktops with Apple Silicon.
You will need:
- macOS
- Python 3.12 for the most predictable official-package setup
- Several GB of free disk space for dependencies, caches, and model files
- At least 16 GB of unified memory for a comfortable starting point
Smaller and quantized models may run with less memory. Larger models, longer text, and concurrent requests need more. Intel Macs cannot use MLX and are a poor fit for Qwen3-TTS compared with smaller local models.
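Before installing anything, you can compare your machine against the list above with a short preflight script. This is a sketch under stated assumptions: the 10 GB disk threshold is a placeholder for "several GB" rather than a documented requirement, and the helper names are illustrative. It reads installed memory with `sysctl -n hw.memsize` on macOS and falls back to `os.sysconf` elsewhere.

```python
import os
import shutil
import subprocess
import sys


def total_memory_gb():
    """Return installed memory in GiB, or None if it cannot be determined."""
    try:
        if sys.platform == "darwin":
            # On macOS, `sysctl -n hw.memsize` prints installed memory in bytes.
            out = subprocess.run(
                ["sysctl", "-n", "hw.memsize"],
                capture_output=True, text=True, check=True,
            ).stdout
            return int(out.strip()) / 1024**3
        # Fallback for other Unix-like systems.
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    except (OSError, ValueError, AttributeError, subprocess.CalledProcessError):
        return None


def preflight(path=".", min_free_gb=10.0, min_memory_gb=16.0):
    """Summarize Python version, free disk space, and memory for this machine."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    memory_gb = total_memory_gb()
    return {
        "python_3_12": sys.version_info[:2] == (3, 12),
        "free_disk_gb": round(free_gb, 1),
        # min_free_gb is a placeholder threshold, not an official figure.
        "enough_disk": free_gb >= min_free_gb,
        "memory_gb": None if memory_gb is None else round(memory_gb, 1),
        "enough_memory": memory_gb is not None and memory_gb >= min_memory_gb,
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```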
Option 1: MLX on Apple Silicon
MLX is Apple’s machine learning framework for Apple Silicon. The community mlx-audio project supports Qwen3-TTS model conversions, command-line generation, streaming output, joined output files, Python integration, and an OpenAI-compatible local API server.
Create a virtual environment and install mlx-audio:
mkdir qwen3-tts-mac
cd qwen3-tts-mac
python3.12 -m venv .venv
source .venv/bin/activate
pip install mlx-audio

Generate audio with an 8-bit Base model:
mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
  --text "Hello from Qwen3-TTS running locally on this Mac." \
  --voice Chelsie \
  --lang_code English

The first run downloads the model conversion from mlx-community on Hugging Face. Later runs can reuse the cached files.
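If you want to drive the same command from a script, one conservative approach is to call the CLI through `subprocess` rather than rely on a Python API that may change. The sketch below uses only the flags shown in this guide; `build_generate_command` and `generate` are hypothetical helper names, and `generate` assumes the `mlx_audio.tts.generate` command is on your `PATH` inside the activated virtual environment.

```python
import shlex
import subprocess


def build_generate_command(model, text, voice, lang_code):
    """Build the argument list for the mlx_audio.tts.generate command."""
    return [
        "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--voice", voice,
        "--lang_code", lang_code,
    ]


def generate(model, text, voice="Chelsie", lang_code="English"):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_generate_command(model, text, voice, lang_code), check=True)


if __name__ == "__main__":
    cmd = build_generate_command(
        "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit",
        "Hello from Qwen3-TTS running locally on this Mac.",
        "Chelsie",
        "English",
    )
    # Print a copy-pasteable shell line instead of running the model here.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.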
Use a Smaller Model
If memory pressure is high, start with a 0.6B conversion:
mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-8bit \
  --text "This is a smaller Qwen3-TTS model running locally." \
  --voice Chelsie \
  --lang_code English

Check the current mlx-audio documentation and available mlx-community model cards before choosing a conversion. Quantized models trade some precision for lower memory use.
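To get a feel for that trade-off, you can estimate weight storage from the parameter count and the bit width. The sketch below is back-of-the-envelope arithmetic, not a measurement: it treats the nominal size in the model name (0.6B, 1.7B) as the parameter count and ignores quantization scales, any additional model components, and runtime buffers, so real memory use will be higher.

```python
def estimate_weight_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes) for a given bit width."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for name, params in (("0.6B", 0.6), ("1.7B", 1.7)):
        for bits in (16, 8):
            size = estimate_weight_gb(params, bits)
            print(f"{name} at {bits}-bit: about {size:.1f} GB of weights")
```

Halving the bit width halves the estimate, which is why an 8-bit conversion is a reasonable first step when memory is tight.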
Stream or Join Generated Audio
mlx-audio exposes command options for streaming and joined output:
mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
  --text "Generate this longer passage as one joined audio file." \
  --voice Chelsie \
  --lang_code English \
  --join_audio \
  --output_path ./output

Use the current mlx-audio README for its streaming flags and Python API examples because those interfaces can evolve.
Run a Local MLX API Server
You can expose an OpenAI-compatible speech endpoint on your Mac:
mlx_audio.server --host 127.0.0.1 --port 8000

In another terminal, send a request:
curl -X POST http://127.0.0.1:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit","input":"Hello from the local Qwen3-TTS API.","voice":"Chelsie"}' \
  --output qwen3-tts-api.wav

This is useful when a local application already expects an OpenAI-compatible speech API.
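The same request can be sent from Python with only the standard library. This sketch mirrors the curl call above: the endpoint path and JSON fields are the ones shown in this guide, while the helper names and the 300-second timeout are illustrative assumptions.

```python
import json
import urllib.request


def build_speech_request(text, model, voice, base_url="http://127.0.0.1:8000"):
    """Build a POST request for the local /v1/audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, voice, out_path,
               base_url="http://127.0.0.1:8000", timeout=300):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_speech_request(text, model, voice, base_url)
    # The timeout is a generous placeholder; the first call may load the model.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, a call such as `synthesize("Hello from the local Qwen3-TTS API.", "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit", "Chelsie", "qwen3-tts-api.wav")` should produce the same file as the curl example.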
Option 2: Official Python Package
The official repository provides the qwen-tts Python package. Use this route
when you want the upstream implementation and its direct model APIs.
Create a Python 3.12 environment:
mkdir qwen3-tts-official
cd qwen3-tts-official
python3.12 -m venv .venv
source .venv/bin/activate
pip install -U qwen-tts

The official documentation recommends FlashAttention 2 for lower memory use and faster generation in supported environments. Its accelerated examples use CUDA. FlashAttention installation instructions for NVIDIA GPUs do not apply to MLX on a Mac.
Generate With a Preset Voice
The CustomVoice model provides named speakers and optional style instructions:
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the CustomVoice checkpoint on the CPU.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cpu",
)

# Generate speech with a named speaker and an optional style instruction.
wavs, sample_rate = model.generate_custom_voice(
    text="Hello from Qwen3-TTS running locally.",
    language="English",
    speaker="Ryan",
    instruct="Speak clearly and calmly.",
)

# Save the first generated waveform as a WAV file.
sf.write("qwen3-custom-voice.wav", wavs[0], sample_rate)

device_map="cpu" is a conservative local example for macOS. It can be slow. The official repository documents CUDA acceleration rather than this CPU configuration. For Apple Silicon acceleration, prefer the MLX path.
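To put a number on "slow", you can time a generation call and compare it with the length of the audio it produced. The sketch below computes a real-time factor (generation time divided by audio duration), where values above 1 mean generation is slower than playback. The helper names are illustrative, and the usage note assumes `wavs[0]` is a one-dimensional array of samples, as the `sf.write` call above suggests.

```python
import time


def real_time_factor(num_samples, sample_rate, elapsed_seconds):
    """Return generation time divided by the duration of the generated audio."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("audio must have a positive duration")
    return elapsed_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, wrap the call as `(wavs, sample_rate), elapsed = timed(model.generate_custom_voice, text=..., language=..., speaker=...)` and then evaluate `real_time_factor(len(wavs[0]), sample_rate, elapsed)`.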
Create a Voice From a Description
The VoiceDesign variant creates a voice from a text description:
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the VoiceDesign checkpoint on the CPU.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    device_map="cpu",
)

# Describe the voice you want in the instruct argument.
wavs, sample_rate = model.generate_voice_design(
    text="The library closes in fifteen minutes.",
    language="English",
    instruct="A warm, calm narrator with a measured pace.",
)

# Save the first generated waveform as a WAV file.
sf.write("qwen3-voice-design.wav", wavs[0], sample_rate)

Clone a Voice From Reference Audio
The Base model supports short-reference voice cloning:
import soundfile as sf
from qwen_tts import Qwen3TTSModel

# Load the Base checkpoint on the CPU.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    device_map="cpu",
)

# Provide the reference recording and its exact transcript.
wavs, sample_rate = model.generate_voice_clone(
    text="This line uses a locally cloned voice.",
    language="English",
    ref_audio="./reference.wav",
    ref_text="This is the exact transcript of the reference recording.",
)

# Save the first generated waveform as a WAV file.
sf.write("qwen3-voice-clone.wav", wavs[0], sample_rate)

Use a clean reference recording and provide an accurate transcript. Only clone a voice when you have permission to use it.
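A quick sanity check on the reference file can catch obvious problems before you spend time on generation. The sketch below reads basic properties with the standard-library `wave` module, which handles uncompressed PCM WAV files only and raises `wave.Error` on other encodings. The duration bounds are hypothetical placeholders, not limits documented by Qwen3-TTS, and the check cannot judge noise or echo.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def check_reference(path, min_seconds=3.0, max_seconds=30.0):
    """Return a list of warnings; the bounds are illustrative placeholders."""
    info = describe_wav(path)
    problems = []
    if info["duration_seconds"] < min_seconds:
        problems.append(f"clip is shorter than {min_seconds} s")
    if info["duration_seconds"] > max_seconds:
        problems.append(f"clip is longer than {max_seconds} s")
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
        print(check_reference(sys.argv[1]) or "no obvious problems")
```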
Option 3: Official Local Web UI
The official Python package installs a qwen-tts-demo command for launching a
local browser interface. Use it when you want to test models interactively
without writing a script.
Start a CustomVoice demo:
qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --ip 127.0.0.1 \
  --port 8000

Start a VoiceDesign demo:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --ip 127.0.0.1 \
  --port 8000

Start a Base model demo for voice cloning:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --ip 127.0.0.1 \
  --port 8000

Open http://127.0.0.1:8000 in your browser. All three commands bind port 8000, so run one demo at a time or give each its own port. If you expose the demo to another device, follow the upstream HTTPS instructions. Browser microphone access can require HTTPS when the page is not running on localhost.
The Web UI uses the official Python runtime, so it does not turn a CUDA-focused
workflow into an MLX workflow. Treat it as an option for compatible Python
environments rather than a guaranteed accelerated Mac setup. For Apple Silicon
acceleration, use mlx-audio.
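Because the demo commands and the MLX API server example in this guide all default to port 8000, it can help to confirm the port is free before launching. The following is a minimal sketch using the standard `socket` module; the helper names are illustrative, and another process could still claim the port between the check and the launch.

```python
import socket


def port_is_free(host="127.0.0.1", port=8000):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(host="127.0.0.1", start=8000, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(host, port):
            return port
    raise RuntimeError(f"no free port found from {start} to {start + attempts - 1}")


if __name__ == "__main__":
    print("Use --port", first_free_port())
```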
Option 4: Community Apple Silicon Manager
For a guided terminal interface, review qwen3-tts-apple-silicon. This community project packages MLX-based CustomVoice, VoiceDesign, and voice cloning workflows behind a menu-driven script for M-series Macs.
It is convenient for experimenting with multiple variants, but it is not the official Qwen repository. Check its current setup instructions, model download links, and requirements before installing it.
Option 5: Native Swift Integration
For a native macOS application, review mlx-audio-swift. It brings MLX audio
models into Swift applications on Apple Silicon.
This path is for application developers rather than command-line users. Check the package’s current Qwen3-TTS model support and integration instructions before choosing a conversion for your app.
Which Qwen3-TTS Setup Should You Choose?
Use mlx-audio if you have an Apple Silicon Mac and want the most practical
local setup.
Use the official Python package if you need the upstream APIs for CustomVoice, VoiceDesign, or voice cloning and can accept slower CPU inference on macOS.
Use the official Web UI when you want an interactive browser interface for the same upstream Python models.
Use qwen3-tts-apple-silicon when you want a community-maintained terminal
menu for exploring several MLX variants.
Use mlx-audio-swift when Qwen3-TTS needs to ship inside a native Mac app.
If you have an Intel Mac, consider a smaller local model such as Kokoro TTS instead.
This guide does not recommend an ONNX path because the official repository does not document a maintained Qwen3-TTS ONNX workflow. Verify the current implementation and model support before adopting any community ONNX conversion.
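The recommendations in this section can be condensed into a small decision helper. This is a sketch for illustration only: the tool names come from this guide, while the goal keywords (`cli`, `api`, `menu`, `app`, `upstream`, `webui`) and the function name are hypothetical.

```python
# Tool names come from this guide; the goal keywords are hypothetical labels.
MLX_SETUPS = {
    "cli": "mlx-audio",
    "api": "mlx-audio",
    "menu": "qwen3-tts-apple-silicon",
    "app": "mlx-audio-swift",
}
UPSTREAM_SETUPS = {
    "upstream": "qwen-tts",
    "webui": "qwen-tts-demo",
}


def recommend_setup(apple_silicon, goal="cli"):
    """Map hardware and a goal keyword to the setup this guide suggests."""
    if goal in UPSTREAM_SETUPS:
        # The official package does not depend on MLX; expect slow CPU inference.
        return UPSTREAM_SETUPS[goal]
    if goal not in MLX_SETUPS:
        raise ValueError(f"unknown goal: {goal!r}")
    if not apple_silicon:
        # MLX requires Apple Silicon, so Intel Macs fall back to a smaller model.
        return "a smaller local model such as Kokoro TTS"
    return MLX_SETUPS[goal]


if __name__ == "__main__":
    print(recommend_setup(apple_silicon=True, goal="api"))
```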
Troubleshooting
The First Run Takes a Long Time
Model files are downloaded during the first run. A Qwen3-TTS conversion is much larger than a lightweight model such as Kokoro, so allow time and disk space for the initial download.
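To see whether a model has already been downloaded, and how much disk space it takes, you can inspect the Hugging Face cache. The sketch below assumes the runtime uses the default Hugging Face Hub cache layout (`models--<org>--<name>` under `~/.cache/huggingface/hub`, or the location set by `HF_HUB_CACHE` or `HF_HOME`). The helper names are illustrative, and the path resolution is simplified.

```python
import os
from pathlib import Path


def hub_cache_dir():
    """Resolve the Hugging Face Hub cache directory (simplified)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_model_dir(repo_id, cache_dir=None):
    """Return the expected cache folder for a repo such as 'org/name'."""
    cache_dir = Path(cache_dir) if cache_dir else hub_cache_dir()
    return cache_dir / ("models--" + repo_id.replace("/", "--"))


def cached_size_gb(repo_id, cache_dir=None):
    """Return the size of cached files in GB, or 0.0 if nothing is cached."""
    model_dir = cached_model_dir(repo_id, cache_dir)
    if not model_dir.is_dir():
        return 0.0
    # Skip symlinks so files linked from snapshot folders are not counted twice.
    total = sum(
        p.stat().st_size
        for p in model_dir.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1e9


if __name__ == "__main__":
    repo = "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit"
    print(cached_model_dir(repo), f"{cached_size_gb(repo):.2f} GB")
```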
macOS Runs Out of Memory
Start with a 0.6B model or an 8-bit MLX conversion. Close memory-intensive
applications before generation. Larger models and longer text need more unified
memory.
MLX Does Not Work on an Intel Mac
MLX requires Apple Silicon. Intel Mac users can evaluate the official Python package on CPU, but generation may be slow.
FlashAttention Installation Fails
FlashAttention is intended for supported GPU environments and is not required for the MLX setup. Do not apply CUDA installation instructions to an Apple Silicon MLX environment.
Voice Cloning Sounds Wrong
Use a clean reference clip without background music, overlapping speakers, or heavy room echo. Supply the exact reference transcript. Avoid cloning voices without permission.
The Web UI Cannot Use the Microphone
Use 127.0.0.1 for a local-only demo. If you access the interface from another
device, configure HTTPS as described in the official repository.
Run Qwen3-TTS Privately on Your Mac
Qwen3-TTS is a larger and more capable model family than many lightweight local TTS projects. On a modern Mac, start with an MLX conversion and choose the smallest model that supports your workflow. Move to the official package when you need its direct APIs or Web UI.
If you want a native Mac TTS workflow without maintaining Python environments, try Spokio.
