Most on-device TTS involves a tradeoff: smaller footprint means lower quality. Neuphonic’s NeuTTS models challenge that assumption.
NeuTTS Nano and NeuTTS Air are designed for on-device inference. They target useful speech quality while running on constrained hardware such as Raspberry Pi-class devices, phones, and lower-memory laptops.
Here is what makes them worth knowing about.
The two models
| | NeuTTS Nano | NeuTTS Air |
|---|---|---|
| Parameters | ~120M | ~552M |
| CPU speed | Check current benchmarks | Check current benchmarks |
| Real-time factor | Hardware-dependent | Hardware-dependent |
| Voice cloning | Zero-shot workflows | Zero-shot workflows |
| Languages | English, Spanish, German, French, Urdu, Japanese, Korean, Chinese, Portuguese | English |
| Format | GGUF / GGML | GGUF / GGML |
| GPU needed | No | No (GPU recommended for production) |
NeuTTS Nano is the compact option. At roughly 120M parameters, it is designed for fast CPU-oriented generation, though real speed depends on hardware, quantization, runtime, and text length.
NeuTTS Air trades some speed for quality. It is the model to evaluate when Nano’s quality is not enough but GPU budget is limited.
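Because speed depends so heavily on the setup, it helps to measure real-time factor (RTF) on the target device rather than rely on published numbers. The sketch below uses the common convention RTF = synthesis time / audio duration, so values below 1.0 mean faster than real time. The `synthesize` callable is a hypothetical stand-in for whatever inference call your runtime exposes; it is not a NeuTTS API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical: text -> mono samples
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and convert the result to an RTF value."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# 2 s of compute for 8 s of audio -> 0.25 (four times faster than real time)
print(real_time_factor(2.0, 8.0))
```

Running `measure_rtf` several times with short and long inputs, and reporting the median, gives a more honest picture than a single call, since the first call often includes model loading and warm-up.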
Why model size matters for on-device TTS
Many larger TTS models — including Qwen3-TTS, Chatterbox-family models, and Orpheus — are commonly evaluated with GPU-oriented workflows. That can make them harder to deploy for:
- Mobile apps that cannot assume a GPU
- Edge devices like Raspberry Pi or IoT hardware
- Low-power environments where a GPU draws too much power
- Applications where the TTS model shares compute and memory with other workloads
Neuphonic takes a different approach. The models are distributed in GGUF/GGML-style formats for CPU-oriented inference, so they can target any device with enough RAM to load them.
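As a rough guide to what "fits in RAM" means, the weight footprint can be estimated from the parameter count and the number of bits stored per weight. The sketch below uses the approximate parameter counts from the table above and treats the bit widths as assumptions. It is a lower bound: real quantized files store extra scale data per block, and the runtime needs additional memory for activations, caches, and the audio codec.

```python
def weight_memory_mb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


# Approximate parameter counts taken from the comparison table.
MODELS = {"NeuTTS Nano": 120e6, "NeuTTS Air": 552e6}

for name, n_params in MODELS.items():
    for bits in (16, 8, 4):  # assumed bit widths, for illustration only
        size = weight_memory_mb(n_params, bits)
        print(f"{name} at {bits}-bit: ~{size:.0f} MB of weights")
```

At an assumed 4 bits per weight, this gives about 60 MB for Nano and about 276 MB for Air, which is why quantization matters so much on devices with 1 to 4 GB of RAM.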
Voice cloning without fine-tuning
Both models are described as supporting zero-shot voice cloning from short reference audio. Neuphonic calls this “infinite cloning”; for production use, review the current license, consent requirements, and product limits.
The key product idea is that cloning can happen on-device. For applications that need personalized voices at scale — think language learning apps, audiobook generators, or accessibility tools — this is a meaningful capability if quality and licensing fit the use case.
Running on constrained hardware
Neuphonic positions NeuTTS Nano for constrained hardware such as:
- Raspberry Pi-class devices
- M-series MacBook Air-class laptops
- Recent iPhones
- Recent Android phones
This level of portability can open TTS use cases that are difficult with heavier GPU-first models: offline navigation voice, on-device accessibility tools, battery-powered assistants, and privacy-sensitive workflows.
The watermarking angle
Neuphonic describes PerTh watermarking for generated speech provenance. This is increasingly relevant as voice cloning becomes more accessible and policymakers scrutinize synthetic voice disclosure.
As with any watermarking claim, developers should review current documentation and test how the watermark behaves under compression, editing, and distribution.
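One way to structure that testing is a small harness that runs a detector on the original audio and on several degraded copies. The sketch below is generic: `detect` is a hypothetical callable that you would wrap around whatever detector the watermarking library provides, and the three transforms are simple stand-ins for real-world processing, not a model of any specific codec.

```python
from typing import Callable, Dict, List

Audio = List[float]  # mono samples in the range [-1.0, 1.0]


def robustness_report(
    audio: Audio,
    detect: Callable[[Audio], bool],  # hypothetical detector wrapper
    transforms: Dict[str, Callable[[Audio], Audio]],
) -> Dict[str, bool]:
    """Run `detect` on the original audio and on each transformed copy."""
    report = {"original": detect(audio)}
    for name, transform in transforms.items():
        report[name] = detect(transform(list(audio)))
    return report


# Simple lossy edits that stand in for volume changes, trimming, and re-encoding.
def halve_gain(audio: Audio) -> Audio:
    return [0.5 * s for s in audio]


def trim_first_half(audio: Audio) -> Audio:
    return audio[len(audio) // 2:]


def quantize_8bit(audio: Audio) -> Audio:
    return [round(s * 127) / 127 for s in audio]


TRANSFORMS = {
    "halve_gain": halve_gain,
    "trim_first_half": trim_first_half,
    "quantize_8bit": quantize_8bit,
}
```

In a real evaluation, the transform list would also include round trips through the codecs and platforms your audio actually passes through, such as MP3 or AAC encoding at the bitrates you ship.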
Where Neuphonic fits in the TTS ecosystem
Neuphonic appears focused on a different part of the spectrum than larger GPU-oriented TTS models: devices where GPU access is limited or nonexistent.
For developers building mobile apps, edge AI products, or battery-constrained systems, NeuTTS Nano is worth evaluating for quality-per-watt tradeoffs.
Where Spokio fits
Spokio runs on Mac, including Apple Silicon and Intel Macs, so Raspberry Pi-class deployment is not its core use case. However, the trend Neuphonic represents matters: the TTS ecosystem is splitting into heavier high-quality workflows and smaller edge-oriented workflows.
For Mac users who want local TTS today, Spokio is powered by Chatterbox Turbo, supports local voice cloning and batch export, exports MP3, WAV, AIFF, and M4A, and does not upload text, audio, or voice samples to cloud services.
