kokoro ttsmacapple siliconmlxonnxlocal ttsopen source tts

How to Run Kokoro TTS Locally on Mac

Learn the practical ways to run Kokoro TTS locally on Mac, including Python with PyTorch, ONNX Runtime, MLX on Apple Silicon, JavaScript, native Swift integrations, and ready-made desktop apps.

Updated on Jun 01, 202614 min read

Kokoro TTS is an open-weight text-to-speech model with 82 million parameters. It is small enough to run locally on many Macs, but there is more than one way to use it.

The best setup depends on your Mac and what you are building. The official package is the easiest place to start. Other runtimes are useful for portable inference, Apple Silicon optimization, local web apps, and native Mac apps.

This guide compares the practical options and shows the shortest path for each one. For model architecture, voices, and language details, read the Kokoro technical guide. For a complete voiceover workflow, see the Kokoro YouTube voiceover guide.

Ways to Run Kokoro TTS Locally on Mac

Method Best for Intel Mac Apple Silicon Mac
Official Python package with PyTorch Learning Kokoro and writing Python scripts Yes Yes
ONNX Runtime Portable CPU inference and quantized model choices Yes Yes
MLX with mlx-audio or kokoro-mlx Apple-focused inference and local APIs No Yes
JavaScript with kokoro-js Browser, Node.js, and Electron apps Yes Yes
Swift packages Native macOS app development Depends on runtime Yes
Ready-made desktop apps Using Kokoro without writing code Check the app Check the app

If you only want one recommendation, start with the official Python package. Use ONNX when portability matters. Use MLX when you have an M-series Mac and want an Apple Silicon-focused runtime.

System Requirements

Kokoro can run locally on Intel Macs and Apple Silicon Macs. That includes MacBook Air and MacBook Pro laptops, plus Mac mini, iMac, Mac Studio, and Mac Pro desktops. Apple Silicon models with M1, M2, M3, M4, or M5-series chips can also use MLX-based options. Intel Mac users should choose Python with PyTorch or ONNX Runtime instead.

For the Python examples, install:

  • macOS
  • Python 3.10, 3.11, or 3.12 for the official package
  • Homebrew
  • Several GB of free disk space for dependencies, caches, and model files

The model is lightweight, but generation speed depends on your chip, memory, runtime, text length, and output settings.

Option 1: Official Python Package With PyTorch

The official kokoro package is the clearest way to understand the model. It uses PyTorch and downloads the model files when needed.

Create a project and virtual environment:

The official pipeline uses eSpeak NG for English fallback cases and some non-English languages.

mkdir kokoro-mac
cd kokoro-mac
python3 -m venv .venv
source .venv/bin/activate
pip install "kokoro>=0.9.4" soundfile
brew install espeak-ng

Create run_kokoro.py:

from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code="a")
generator = pipeline(
    "Kokoro is running locally on this Mac.",
    voice="af_heart",
)

for index, (_, _, audio) in enumerate(generator):
    sf.write(f"kokoro-{index}.wav", audio, 24000)

Run it:

python run_kokoro.py

The script writes one or more WAV files in the current folder. The a language code selects American English. The official repository also documents British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese pipelines.

Apple Silicon MPS Fallback

On Apple Silicon, the official repository recommends enabling the PyTorch MPS fallback if your script hits a CPU-versus-MPS tensor error:

PYTORCH_ENABLE_MPS_FALLBACK=1 python run_kokoro.py

This lets unsupported operations fall back to the CPU while the rest of the pipeline uses Apple’s Metal backend.

Option 2: ONNX Runtime

ONNX Runtime is useful when you want a portable inference format, CPU-friendly deployment, or quantized model choices. This is a community-supported route rather than the official Hexgrad package.

One practical wrapper is kokoro-onnx. Create a separate environment:

python3 -m venv .venv-onnx
source .venv-onnx/bin/activate
pip install kokoro-onnx soundfile

Download a compatible ONNX model file and voice file by following the wrapper’s current README. Then create run_kokoro_onnx.py:

from kokoro_onnx import Kokoro
import soundfile as sf

kokoro = Kokoro("kokoro-v1.0.onnx", "voices-v1.0.bin")
samples, sample_rate = kokoro.create(
    "Kokoro ONNX is running locally on this Mac.",
    voice="af_sarah",
    speed=1.0,
    lang="en-us",
)

sf.write("kokoro-onnx.wav", samples, sample_rate)

Run it:

python run_kokoro_onnx.py

The Kokoro-82M-v1.0-ONNX model card lists current ONNX variants, including quantized files. Smaller quantized models can reduce memory use, with a possible quality tradeoff.

Option 3: MLX on Apple Silicon

MLX is Apple’s machine learning framework for Apple Silicon. Choose this path for an M1, M2, M3, M4, or M5-series MacBook, Mac mini, iMac, Mac Studio, or Mac Pro. It is not an Intel Mac runtime.

The most complete MLX option is mlx-audio. Kokoro support in mlx-audio also uses Misaki for text processing. The toolkit supports quantized Kokoro models, a Python API, a CLI, and an OpenAI-compatible local API server.

Install it:

python3 -m venv .venv-mlx
source .venv-mlx/bin/activate
pip install mlx-audio misaki

Generate audio from the command line:

mlx_audio.tts.generate \
  --model mlx-community/Kokoro-82M-bf16 \
  --text "Kokoro MLX is running locally on Apple Silicon." \
  --voice af_heart

For lower memory use, replace the model with mlx-community/Kokoro-82M-8bit or mlx-community/Kokoro-82M-4bit. mlx-audio also provides a Python API when you need to integrate generation into an application.

Run an MLX Local API

mlx-audio can also expose an OpenAI-compatible local endpoint:

mlx_audio.server --host 127.0.0.1 --port 8000

In another terminal:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"mlx-community/Kokoro-82M-bf16","input":"Hello from Kokoro on this Mac.","voice":"af_heart"}' \
  --output speech.wav

For a smaller dedicated wrapper, also review kokoro-mlx. Use mlx-audio when you want a broader toolkit or a local API server.

Option 4: JavaScript and Browser Apps

kokoro-js is the JavaScript route for local browser, Node.js, and Electron projects. It uses ONNX models and can run with browser-oriented runtimes such as WebGPU or WASM, depending on the environment.

Install the package:

npm install kokoro-js

Because browser runtime support and package APIs can evolve, use the current kokoro-js package examples when wiring it into a project. You can also test the model in the Kokoro Web demo before building your own local interface.

Choose JavaScript when the speech generator should live inside a web app, desktop web wrapper, or front-end prototype.

Option 5: Native Swift Integrations

Native macOS developers can integrate Kokoro through community Swift ports. These options are useful for packaged Mac apps, but they require more application code than the Python examples.

Package Runtime Notes
kokoro-ios MLX Swift Supports macOS 15 or newer and requires you to bundle model and voice files
SwiftKokoroONNX ONNX Useful when an ONNX-based Swift package fits your app
swift-coreml-kokoro Core ML Community Core ML option for Apple-platform deployment

These are community projects, so check each repository’s current requirements and releases before choosing one for production work.

Option 6: Ready-Made Desktop Apps

If you want to generate speech without writing code, community desktop apps can package Kokoro behind a graphical interface. For example, Freekoko is a Mac app with a local interface and HTTP API.

Desktop apps are convenient for testing voices and generating occasional audio. For repeatable automation or product integration, use Python, ONNX, MLX, JavaScript, or Swift directly.

Which Kokoro Setup Should You Choose?

Use the official Python package if you are learning Kokoro, writing a script, or following the YouTube voiceover workflow.

Use ONNX Runtime if you want portable inference, CPU-friendly deployment, or quantized model files on either Intel or Apple Silicon Macs.

Use mlx-audio if you have an Apple Silicon Mac and want an MLX-native Python API, lower-memory quantized options, or a local OpenAI-compatible endpoint.

Use kokoro-js for browser, Electron, and JavaScript application workflows.

Use a Swift port when Kokoro needs to ship inside a native macOS app.

Troubleshooting

espeak-ng Is Missing

Install it with Homebrew:

brew install espeak-ng

The official pipeline uses espeak-ng for English fallback cases and some non-English languages.

PyTorch Reports a CPU and MPS Tensor Error

Use the fallback environment variable documented by the official repository:

PYTORCH_ENABLE_MPS_FALLBACK=1 python run_kokoro.py

MLX Does Not Work on an Intel Mac

MLX is designed for Apple Silicon. Use the official Python package or ONNX Runtime on Intel Macs.

The First Run Takes Longer

Most setups download model files the first time you use them. Later runs can reuse the cached files.

Long Text Produces Multiple Files

Kokoro pipelines commonly split longer text into segments. Keep the numbered WAV files, join them after generation, or split your input into paragraphs before synthesis.

Run Kokoro Privately on Your Mac

Kokoro is small enough to make local TTS practical across several Mac workflows. Start with Python, then move to ONNX, MLX, JavaScript, or Swift when your deployment needs become clearer.

If you want a hosted workflow without maintaining local runtimes, try Spokio.

More from the blog