
How to Run Qwen3-TTS Locally on Mac

Learn how to run Qwen3-TTS locally on a Mac with MLX, the official Python package, the local Web UI, and native Swift integration. Compare model sizes for voice cloning, preset voices, and description-based voice design.

Updated on Jun 01, 2026 | 14 min read

Qwen3-TTS is an open-source text-to-speech model family from the Qwen team. It supports multilingual generation, preset voices, short-reference voice cloning, and voice creation from text descriptions.

There is more than one way to run Qwen3-TTS locally on a Mac. The right setup depends on your hardware and what you want to build. On an Apple Silicon Mac, the most practical starting point is an MLX runtime. The official Python package is useful when you want to follow the upstream project directly, and a native Swift path is available for macOS app development.

This guide focuses on setup and runtime choices. For model architecture, tokenizer details, and reported latency, read the Qwen3-TTS technical guide. For a feature comparison with another local model, see Chatterbox vs Qwen3-TTS.

Ways to Run Qwen3-TTS Locally on Mac

| Method | Best for | Intel Mac | Apple Silicon Mac |
| --- | --- | --- | --- |
| mlx-audio | Recommended Mac setup, CLI generation, Python integration, and a local API | No | Yes |
| Official qwen-tts Python package with PyTorch | Following upstream examples and evaluating the official implementation | Possible but slow | CPU works but is slow; check upstream for acceleration |
| Official local Web UI | Trying CustomVoice, VoiceDesign, and voice cloning in a browser | Possible but slow | CPU works but is slow; check upstream for acceleration |
| Community Apple Silicon manager | Guided terminal setup for several MLX model variants | No | Yes |
| mlx-audio-swift | Native macOS application development | No | Yes |

If you have a MacBook Air, MacBook Pro, or desktop Mac with an M-series chip, start with mlx-audio. The official repository's accelerated examples target NVIDIA CUDA environments, while MLX is designed specifically for Apple Silicon.

Choose a Qwen3-TTS Model

Qwen3-TTS is a model family rather than a single file. Choose the variant based on the type of speech you need:

| Model variant | Best for |
| --- | --- |
| Qwen3-TTS-12Hz-0.6B-CustomVoice | Lower-memory generation with preset voices |
| Qwen3-TTS-12Hz-0.6B-Base | Lower-memory voice cloning |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | Higher-quality preset voices with instruction control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | Creating a voice from a text description |
| Qwen3-TTS-12Hz-1.7B-Base | Higher-quality short-reference voice cloning |

The 0.6B models are sensible starting points when memory use matters. The 1.7B models are larger but expose the full range of Qwen3-TTS workflows. Quantized MLX conversions can reduce memory pressure further.
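The table above can be read as a small lookup: pick the workflow, then fall back to the 0.6B variant when memory is tight. The sketch below encodes that; choose_variant is a hypothetical helper, and the 16 GB cutoff is an assumption borrowed from this guide's "comfortable starting point", not an official threshold.

```python
# Hypothetical helper: maps a workflow and a memory budget to a Qwen3-TTS variant
# from the model table. The 16 GB cutoff is an illustrative assumption.
VARIANTS = {
    # workflow: (smaller variant, larger variant)
    "preset": ("Qwen3-TTS-12Hz-0.6B-CustomVoice", "Qwen3-TTS-12Hz-1.7B-CustomVoice"),
    "clone": ("Qwen3-TTS-12Hz-0.6B-Base", "Qwen3-TTS-12Hz-1.7B-Base"),
    # VoiceDesign is only listed at 1.7B, so there is no smaller fallback.
    "design": (None, "Qwen3-TTS-12Hz-1.7B-VoiceDesign"),
}


def choose_variant(workflow: str, memory_gb: float, low_memory_cutoff_gb: float = 16.0) -> str:
    """Return a variant name for "preset", "clone", or "design"."""
    if workflow not in VARIANTS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    small, large = VARIANTS[workflow]
    # Prefer the 0.6B model when memory is tight and a 0.6B variant exists.
    if memory_gb < low_memory_cutoff_gb and small is not None:
        return small
    return large


print(choose_variant("clone", memory_gb=8))   # Qwen3-TTS-12Hz-0.6B-Base
print(choose_variant("design", memory_gb=8))  # Qwen3-TTS-12Hz-1.7B-VoiceDesign
```

For an MLX run, you would still pick a matching mlx-community conversion (for example, an 8-bit one) of the chosen variant.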

System Requirements

For the recommended MLX path, use an Apple Silicon Mac with an M1, M2, M3, M4, or M5-series chip. That includes MacBook Air and MacBook Pro laptops, plus Mac mini, iMac, Mac Studio, and Mac Pro desktops with Apple Silicon.

You will need:

  • macOS
  • Python 3.12 for the most predictable official-package setup
  • Several GB of free disk space for dependencies, caches, and model files
  • At least 16 GB of unified memory for a comfortable starting point

Smaller and quantized models may run with less memory. Larger models, longer text, and concurrent requests need more. Intel Macs cannot use MLX and are a poor fit for Qwen3-TTS compared with smaller local models.
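A quick way to check these requirements before installing anything is to ask Python which platform it runs on. This is a minimal sketch: check_mlx_requirements and mac_memory_bytes are hypothetical helper names, the 16 GB default mirrors the guidance above, and reading memory with `sysctl -n hw.memsize` only works on macOS.

```python
import platform
import subprocess
from typing import List


def check_mlx_requirements(system: str, machine: str, memory_bytes: int,
                           min_memory_gb: float = 16.0) -> List[str]:
    """Return warnings for the MLX path; an empty list means no obvious blockers."""
    warnings = []
    if system != "Darwin":
        warnings.append("This is not macOS.")
    if machine != "arm64":
        # Intel Macs report "x86_64", and so does an Intel Python running under Rosetta.
        warnings.append("Python is not running natively on Apple Silicon (arm64), so MLX will not work.")
    memory_gb = memory_bytes / 1024**3
    if memory_gb < min_memory_gb:
        warnings.append(f"{memory_gb:.0f} GB of memory: start with a 0.6B or quantized model.")
    return warnings


def mac_memory_bytes() -> int:
    """Total physical memory in bytes, read with macOS's `sysctl -n hw.memsize`."""
    result = subprocess.run(["sysctl", "-n", "hw.memsize"],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def current_mac_warnings() -> List[str]:
    """Run the checks against the machine this script is running on (macOS only)."""
    return check_mlx_requirements(platform.system(), platform.machine(), mac_memory_bytes())
```

On a Mac, call current_mac_warnings() from the same Python you plan to use for the virtual environment, since an Intel Python under Rosetta reports x86_64 even on Apple Silicon hardware.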

Option 1: MLX on Apple Silicon

MLX is Apple’s machine learning framework for Apple Silicon. The community mlx-audio project supports Qwen3-TTS model conversions, command-line generation, streaming output, joined output files, Python integration, and an OpenAI-compatible local API server.

Create a virtual environment and install mlx-audio:

mkdir qwen3-tts-mac
cd qwen3-tts-mac
python3.12 -m venv .venv
source .venv/bin/activate
pip install mlx-audio

Generate audio with an 8-bit Base model:

mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
  --text "Hello from Qwen3-TTS running locally on this Mac." \
  --voice Chelsie \
  --lang_code English

The first run downloads the model conversion from mlx-community on Hugging Face. Later runs can reuse the cached files.
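If you want to drive the same command from a Python script, one option is to build the argument list yourself and run it with subprocess. This sketch uses only the flags shown in this guide; build_generate_command and run_generate are hypothetical helper names, and it assumes the mlx_audio.tts.generate entry point from the activated virtual environment is on PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_generate_command(model: str, text: str, voice: str, lang_code: str = "English",
                           join_audio: bool = False,
                           output_path: Optional[str] = None) -> List[str]:
    """Return an argv list for mlx_audio.tts.generate using this guide's flags."""
    cmd = [
        "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--voice", voice,
        "--lang_code", lang_code,
    ]
    if join_audio:
        cmd.append("--join_audio")
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


def run_generate(cmd: List[str]) -> None:
    """Run the command, failing early if the CLI is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found; activate the .venv with mlx-audio first")
    # check=True raises CalledProcessError if generation exits with an error.
    subprocess.run(cmd, check=True)
```

For example, run_generate(build_generate_command("mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit", "It's generated from Python.", "Chelsie")). Passing a list instead of a shell string avoids quoting problems with apostrophes in the text. mlx-audio also has its own Python API; check its README before relying on either approach.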

Use a Smaller Model

If memory pressure is high, start with a 0.6B conversion:

mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-8bit \
  --text "This is a smaller Qwen3-TTS model running locally." \
  --voice Chelsie \
  --lang_code English

Check the current mlx-audio documentation and available mlx-community model cards before choosing a conversion. Quantized models trade some precision for lower memory use.

Stream or Join Generated Audio

mlx-audio exposes command options for streaming output and for joining generated segments into a single file. This example uses --join_audio:

mlx_audio.tts.generate \
  --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
  --text "Generate this longer passage as one joined audio file." \
  --voice Chelsie \
  --lang_code English \
  --join_audio \
  --output_path ./output

Use the current mlx-audio README for its streaming flags and Python API examples because those interfaces can evolve.
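If you generate passages as separate files (for example, one call per paragraph) and want a single file afterwards, Python's standard wave module can concatenate them when they share a format. This is a sketch independent of mlx-audio's --join_audio implementation; join_wavs is a hypothetical helper, and it assumes integer PCM WAV inputs, since the wave module does not read floating-point WAV files.

```python
import wave
from typing import List


def join_wavs(paths: List[str], out_path: str) -> int:
    """Concatenate PCM WAV files with identical formats; return total frames written."""
    if not paths:
        raise ValueError("need at least one input file")
    with wave.open(paths[0], "rb") as first:
        expected = (first.getnchannels(), first.getsampwidth(), first.getframerate())
    total = 0
    with wave.open(out_path, "wb") as out:
        out.setnchannels(expected[0])
        out.setsampwidth(expected[1])
        out.setframerate(expected[2])
        for path in paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                # Mixing channels, sample widths, or rates would produce garbled audio.
                if fmt != expected:
                    raise ValueError(f"{path} has a different audio format: {fmt}")
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total += n
    return total
```

A plain concatenation can leave audible seams between clips; inserting a short run of silent frames between files is a simple variation if that matters.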

Run a Local MLX API Server

You can expose an OpenAI-compatible speech endpoint on your Mac:

mlx_audio.server --host 127.0.0.1 --port 8000

In another terminal, send a request:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit","input":"Hello from the local Qwen3-TTS API.","voice":"Chelsie"}' \
  --output qwen3-tts-api.wav

This is useful when a local application already expects an OpenAI-compatible speech API.
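The same request can be sent from Python with only the standard library. This sketch mirrors the curl example: the endpoint path, model ID, and JSON fields come from it, while build_speech_request and synthesize are hypothetical names, and it assumes the server is listening on 127.0.0.1:8000 as started above.

```python
import json
import urllib.request

# Assumes `mlx_audio.server --host 127.0.0.1 --port 8000` is running.
BASE_URL = "http://127.0.0.1:8000"


def build_speech_request(text: str, voice: str = "Chelsie",
                         model: str = "mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, timeout: float = 300.0, **kwargs) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    # A long timeout leaves room for a first request that triggers a model download.
    with urllib.request.urlopen(build_speech_request(text, **kwargs), timeout=timeout) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

For example, synthesize("Hello from Python.", "qwen3-tts-api.wav"). Because the endpoint is OpenAI-compatible, an existing OpenAI client pointed at the local base URL may also work; verify against the current mlx-audio server documentation.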

Option 2: Official Python Package

The official repository provides the qwen-tts Python package. Use this route when you want the upstream implementation and its direct model APIs.

Create a Python 3.12 environment:

mkdir qwen3-tts-official
cd qwen3-tts-official
python3.12 -m venv .venv
source .venv/bin/activate
pip install -U qwen-tts

The official documentation recommends FlashAttention 2 for lower memory use and faster generation in supported environments. Its accelerated examples use CUDA, and the FlashAttention installation instructions target NVIDIA GPUs, so they do not apply on a Mac.

Generate With a Preset Voice

The CustomVoice model provides named speakers and optional style instructions:

import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cpu",
)

wavs, sample_rate = model.generate_custom_voice(
    text="Hello from Qwen3-TTS running locally.",
    language="English",
    speaker="Ryan",
    instruct="Speak clearly and calmly.",
)

sf.write("qwen3-custom-voice.wav", wavs[0], sample_rate)

The device_map="cpu" setting is a conservative choice for macOS, but generation can be slow. The official repository documents CUDA acceleration rather than this CPU configuration, so prefer the MLX path when you want Apple Silicon acceleration.
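If the same script needs to run both on a Mac and on an NVIDIA machine, you can choose the loading arguments at runtime. This is a sketch under assumptions: load_kwargs and detect are hypothetical helpers, "cuda:0" and attn_implementation="flash_attention_2" are assumed to match the upstream CUDA examples, and you should confirm those keyword names against the current qwen-tts README.

```python
import importlib.util


def load_kwargs(cuda_available: bool, flash_attn_installed: bool) -> dict:
    """Pick from_pretrained keyword arguments for the current machine."""
    if cuda_available:
        kwargs = {"device_map": "cuda:0"}
        # FlashAttention 2 only applies to supported NVIDIA GPUs.
        if flash_attn_installed:
            kwargs["attn_implementation"] = "flash_attention_2"
        return kwargs
    # On a Mac, fall back to the conservative CPU setting used in this guide.
    return {"device_map": "cpu"}


def detect() -> dict:
    """Inspect the environment; torch is imported lazily so this file loads without it."""
    flash = importlib.util.find_spec("flash_attn") is not None
    try:
        import torch
        cuda = torch.cuda.is_available()
    except ImportError:
        cuda = False
    return load_kwargs(cuda, flash)
```

Usage would look like Qwen3TTSModel.from_pretrained("Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", **detect()). On a Mac this resolves to the CPU configuration, so it does not make the official package faster there.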

Create a Voice From a Description

The VoiceDesign variant creates a voice from a text description:

import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    device_map="cpu",
)

wavs, sample_rate = model.generate_voice_design(
    text="The library closes in fifteen minutes.",
    language="English",
    instruct="A warm, calm narrator with a measured pace.",
)

sf.write("qwen3-voice-design.wav", wavs[0], sample_rate)
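Because a voice description is just a string, it is easy to compare several candidates for the same line. The sketch reuses the generate_voice_design call from the example; design_variants, its file-naming scheme, and any extra descriptions you pass are hypothetical. The writer is a parameter, so you can pass sf.write in practice and a stub in tests.

```python
from typing import Callable, Dict, List


def design_variants(model, text: str, descriptions: Dict[str, str],
                    write: Callable[[str, object, int], None],
                    language: str = "English") -> List[str]:
    """Generate one line with several voice descriptions and return the file paths."""
    paths = []
    for name, instruct in descriptions.items():
        # Same call as the VoiceDesign example; only the description changes.
        wavs, sample_rate = model.generate_voice_design(
            text=text,
            language=language,
            instruct=instruct,
        )
        path = f"qwen3-voice-design-{name}.wav"
        write(path, wavs[0], sample_rate)
        paths.append(path)
    return paths
```

For example: design_variants(model, "The library closes in fifteen minutes.", {"warm": "A warm, calm narrator with a measured pace.", "brisk": "A brisk, upbeat announcer."}, sf.write). Listening to the results side by side makes it easier to refine the wording of a description.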

Clone a Voice From Reference Audio

The Base model supports short-reference voice cloning:

import soundfile as sf
from qwen_tts import Qwen3TTSModel

model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    device_map="cpu",
)

wavs, sample_rate = model.generate_voice_clone(
    text="This line uses a locally cloned voice.",
    language="English",
    ref_audio="./reference.wav",
    ref_text="This is the exact transcript of the reference recording.",
)

sf.write("qwen3-voice-clone.wav", wavs[0], sample_rate)

Use a clean reference recording and provide an accurate transcript. Only clone a voice when you have permission to use it.
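A few of these checks can be automated before you spend time on generation. This sketch uses the standard wave module, so it assumes a PCM WAV reference; check_reference is a hypothetical helper, and the 3 to 30 second window is an illustrative assumption rather than an official limit. It cannot judge noise, echo, or overlapping speakers, which still need a listen.

```python
import wave
from typing import List


def check_reference(path: str, transcript: str,
                    min_seconds: float = 3.0, max_seconds: float = 30.0) -> List[str]:
    """Return problems with a reference clip; an empty list means it passed these checks."""
    problems = []
    if not transcript.strip():
        problems.append("Transcript is empty; supply the exact words spoken in the clip.")
    with wave.open(path, "rb") as clip:
        seconds = clip.getnframes() / clip.getframerate()
    if seconds < min_seconds:
        problems.append(f"Clip is {seconds:.1f}s long; it may be too short to capture the voice.")
    if seconds > max_seconds:
        problems.append(f"Clip is {seconds:.1f}s long; consider trimming it to one clean passage.")
    return problems
```

For example, check_reference("./reference.wav", "This is the exact transcript of the reference recording.") returns an empty list when the clip passes.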

Option 3: Official Local Web UI

The official Python package installs a qwen-tts-demo command for launching a local browser interface. Use it when you want to test models interactively without writing a script.

Start a CustomVoice demo:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \
  --ip 127.0.0.1 \
  --port 8000

Start a VoiceDesign demo:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --ip 127.0.0.1 \
  --port 8000

Start a Base model demo for voice cloning:

qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base \
  --ip 127.0.0.1 \
  --port 8000

Open http://127.0.0.1:8000 in your browser. If you expose the demo to another device, follow the upstream HTTPS instructions. Browser microphone access can require HTTPS when the page is not running on localhost.
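Both the mlx_audio.server example and the qwen-tts-demo examples use port 8000, so running them at the same time will collide. This sketch finds a free local port with the standard socket module; port_is_free and first_free_port are hypothetical helpers. The check is only a snapshot, since another process could take the port before you launch the demo.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 8000, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first free port in [start, start + attempts), capped at 65535."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```

Pass the result to the demo with --port, and open the matching http://127.0.0.1:PORT address in your browser.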

The Web UI uses the official Python runtime, so it does not turn a CUDA-focused workflow into an MLX workflow. Treat it as an option for compatible Python environments rather than a guaranteed accelerated Mac setup. For Apple Silicon acceleration, use mlx-audio.

Option 4: Community Apple Silicon Manager

For a guided terminal interface, review qwen3-tts-apple-silicon. This community project packages MLX-based CustomVoice, VoiceDesign, and voice cloning workflows behind a menu-driven script for M-series Macs.

It is convenient for experimenting with multiple variants, but it is not the official Qwen repository. Check its current setup instructions, model download links, and requirements before installing it.

Option 5: Native Swift Integration

For a native macOS application, review mlx-audio-swift. It brings MLX audio models into Swift applications on Apple Silicon.

This path is for application developers rather than command-line users. Check the package’s current Qwen3-TTS model support and integration instructions before choosing a conversion for your app.

Which Qwen3-TTS Setup Should You Choose?

Use mlx-audio if you have an Apple Silicon Mac and want the most practical local setup.

Use the official Python package if you need the upstream APIs for CustomVoice, VoiceDesign, or voice cloning and can accept slower CPU inference on macOS.

Use the official Web UI when you want an interactive browser interface for the same upstream Python models.

Use qwen3-tts-apple-silicon when you want a community-maintained terminal menu for exploring several MLX variants.

Use mlx-audio-swift when Qwen3-TTS needs to ship inside a native Mac app.

If you have an Intel Mac, consider a smaller local model such as Kokoro TTS instead.
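The recommendations above can be summarized as a small decision function. This is an illustrative sketch: choose_setup is a hypothetical helper, and the order in which the preferences are checked is an assumption, since a real project may need more than one of these setups.

```python
def choose_setup(apple_silicon: bool, native_app: bool = False,
                 want_browser_ui: bool = False, need_upstream_api: bool = False,
                 want_guided_menu: bool = False) -> str:
    """Map the guide's recommendations to a single suggested setup."""
    if not apple_silicon:
        # MLX-based options require Apple Silicon.
        return "official qwen-tts on CPU (slow), or a smaller model such as Kokoro TTS"
    if native_app:
        return "mlx-audio-swift"
    if want_browser_ui:
        return "official Web UI (qwen-tts-demo)"
    if need_upstream_api:
        return "official qwen-tts Python package"
    if want_guided_menu:
        return "qwen3-tts-apple-silicon"
    # Default recommendation for Apple Silicon Macs.
    return "mlx-audio"
```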

This guide does not recommend an ONNX path because the official repository does not document a maintained Qwen3-TTS ONNX workflow. Before adopting a community ONNX conversion, verify its current implementation and model support.

Troubleshooting

The First Run Takes a Long Time

Model files are downloaded during the first run. A Qwen3-TTS conversion is much larger than a lightweight model such as Kokoro, so allow time and disk space for the initial download.
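Before the first run, you can check free disk space and see how much the model cache already holds. This sketch assumes mlx-audio downloads through the standard Hugging Face cache; hf_cache_dir approximates the default location (honoring HF_HUB_CACHE and HF_HOME but not every override), and all function names are hypothetical. Symlinks are skipped because the Hugging Face cache links snapshot files to shared blobs, so following them would double-count.

```python
import os
import shutil
from pathlib import Path


def hf_cache_dir() -> Path:
    """Approximate the Hugging Face hub cache location."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def dir_size_bytes(path: Path) -> int:
    """Total size of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def free_gb(path: str = ".") -> float:
    """Free space on the volume containing path, in GB."""
    return shutil.disk_usage(path).free / 1024**3
```

For example, print(f"{free_gb():.1f} GB free, cache uses {dir_size_bytes(hf_cache_dir()) / 1024**3:.1f} GB").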

macOS Runs Out of Memory

Start with a 0.6B model or an 8-bit MLX conversion. Close memory-intensive applications before generation. Larger models and longer text need more unified memory.
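Since longer text needs more memory, another option is to split a long passage into sentence-sized chunks, generate each chunk separately, and join the results. This sketch shows the splitting step only; chunk_text and the 300-character default are hypothetical, and the regular expression assumes Western sentence punctuation followed by whitespace, so it will not split Chinese or Japanese text.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 300) -> List[str]:
    """Split text at sentence boundaries into chunks of at most max_chars where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # A single sentence longer than max_chars is kept whole, not cut mid-word.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```

Separate generations can differ slightly in pacing and intonation, so listen to the joins and prefer chunk boundaries at paragraph breaks when you can.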

MLX Does Not Work on an Intel Mac

MLX requires Apple Silicon. Intel Mac users can evaluate the official Python package on CPU, but generation may be slow.

FlashAttention Installation Fails

FlashAttention is intended for supported GPU environments and is not required for the MLX setup. Do not apply CUDA installation instructions to an Apple Silicon MLX environment.

Voice Cloning Sounds Wrong

Use a clean reference clip without background music, overlapping speakers, or heavy room echo. Supply the exact reference transcript. Avoid cloning voices without permission.

The Web UI Cannot Use the Microphone

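Use 127.0.0.1 for a local-only demo. Browsers treat localhost and 127.0.0.1 as secure origins, so microphone access works there without HTTPS. If you access the interface from another device, configure HTTPS as described in the official repository.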

Run Qwen3-TTS Privately on Your Mac

Qwen3-TTS is a larger and more capable model family than many lightweight local TTS projects. On a modern Mac, start with an MLX conversion and choose the smallest model that supports your workflow. Move to the official package when you need its direct APIs or Web UI.

If you want a native Mac TTS workflow without maintaining Python environments, try Spokio.
