Published Jun 02, 2026

G2P (Grapheme-to-Phoneme)

G2P is the component that converts written text into a sequence of phonemes, the smallest units of sound that distinguish one word from another in a language. In a conventional TTS pipeline it runs early, before any audio is generated, and determines how accurately words are pronounced.

Why It Matters

English has deeply irregular spelling. “Through,” “though,” “tough,” and “thought” all use the same “ough” pattern but produce four different pronunciations. A G2P system must handle these exceptions alongside regular rules.

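Beyond ordinary words, a G2P front end also has to expand non-standard text into speakable words before it can assign phonemes: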

Numbers — “$5.50” → “five dollars and fifty cents”
Dates — “05/17/2026” → “May seventeenth, twenty twenty-six”
Abbreviations — “Dr.” → “doctor”, “St.” → “street” or “saint” depending on context
Acronyms — “NASA” → “NASA” (as a word) vs “FBI” → “F-B-I” (spelled out)
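The currency and acronym cases above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine does it: the function names, the 0 to 99 number range, and the small `SPOKEN_AS_WORD` set are all assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical list of acronyms that are read as words rather than spelled out.
SPOKEN_AS_WORD = {"NASA", "NATO"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99 (enough for this sketch)."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def expand_currency(text: str) -> str:
    """Replace dollar amounts such as $5.50 with their spoken form."""
    def repl(match: re.Match) -> str:
        dollars = int(match.group(1))
        cents = int(match.group(2) or 0)
        if dollars > 99:
            return match.group(0)  # out of range for this sketch: leave as-is
        parts = [f"{number_to_words(dollars)} dollar{'' if dollars == 1 else 's'}"]
        if cents:
            parts.append(f"{number_to_words(cents)} cent{'' if cents == 1 else 's'}")
        return " and ".join(parts)

    return re.sub(r"\$(\d+)(?:\.(\d{2}))?", repl, text)


def expand_acronym(token: str) -> str:
    """Spell out all-caps tokens letter by letter unless they are read as words."""
    if token.isalpha() and token.isupper() and token not in SPOKEN_AS_WORD:
        return "-".join(token)
    return token
```

With these definitions, `expand_currency("$5.50")` gives "five dollars and fifty cents", `expand_acronym("FBI")` gives "F-B-I", and `expand_acronym("NASA")` is left unchanged. Dates and context-dependent abbreviations such as "St." need more context than a single regular expression can see, so they are left out here.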

Common Approaches

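Rule-based G2P uses hand-written pronunciation rules. eSpeak NG is one of the most widely used open-source engines and supports more than 100 languages. Rules are fast and predictable, but they mispronounce words that break the patterns, and eSpeak NG's own synthesized voice can sound robotic.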
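As a rough sketch of the rule-based idea, the snippet below maps letters to ARPAbet-style symbols with a greedy longest-match pass. The rule table is a toy invented for this example and is far smaller and cruder than what a real engine ships; it is not taken from eSpeak NG.

```python
from typing import Dict, List

# Toy grapheme-to-phoneme rules (hypothetical, for illustration only).
RULES: Dict[str, str] = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "e": "EH", "i": "IH", "o": "AA", "u": "AH",
    "b": "B", "c": "K", "d": "D", "f": "F", "g": "G", "h": "HH",
    "k": "K", "l": "L", "m": "M", "n": "N", "p": "P", "r": "R",
    "s": "S", "t": "T", "v": "V", "w": "W", "z": "Z",
}


def rule_g2p(word: str) -> List[str]:
    """Convert a word to phonemes, preferring two-letter rules over one-letter ones."""
    word = word.lower()
    phonemes: List[str] = []
    i = 0
    while i < len(word):
        for length in (2, 1):
            chunk = word[i:i + length]
            if chunk in RULES:
                phonemes.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule for this character: skip it
    return phonemes
```

Regular words come out reasonably: `rule_g2p("ship")` returns `["SH", "IH", "P"]`. An irregular word shows the limit of the approach: `rule_g2p("tough")` returns `["T", "AA", "AH", "G", "HH"]`, which is nothing like how the word is actually said.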

Dictionary-based G2P looks up words in a pronunciation lexicon (such as the CMU Pronouncing Dictionary, which covers more than 134,000 English words) and falls back to rules for unknown words. It is more accurate than rules alone for common vocabulary.
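The lookup-then-fallback pattern can be sketched as follows. The four-entry lexicon is written by hand in ARPAbet-style symbols with stress digits omitted, and the `spell_out` fallback is only a placeholder assumed for the example; a real system would load a full lexicon and fall back to proper letter-to-sound rules.

```python
from typing import Callable, Dict, List

# Tiny hand-written lexicon (illustrative; stress markers omitted).
LEXICON: Dict[str, List[str]] = {
    "through": ["TH", "R", "UW"],
    "though": ["DH", "OW"],
    "tough": ["T", "AH", "F"],
    "thought": ["TH", "AO", "T"],
}


def spell_out(word: str) -> List[str]:
    """Placeholder fallback: return the letters themselves, upper-cased."""
    return [ch.upper() for ch in word if ch.isalpha()]


def lookup_g2p(
    word: str,
    lexicon: Dict[str, List[str]] = LEXICON,
    fallback: Callable[[str], List[str]] = spell_out,
) -> List[str]:
    """Return the lexicon pronunciation if known, otherwise use the fallback."""
    key = word.lower().strip(".,!?;:\"'")
    if key in lexicon:
        return list(lexicon[key])  # copy so callers cannot mutate the lexicon
    return fallback(key)
```

Here `lookup_g2p("Tough,")` returns `["T", "AH", "F"]` from the lexicon, while an unknown token such as `"xyz"` goes through the fallback. Passing the fallback in as a parameter keeps the lookup independent of whichever rule engine handles out-of-vocabulary words.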

Neural G2P uses sequence-to-sequence models trained on pronunciation data. It is usually the most accurate option for languages with complex spelling, but it requires training data and compute. Modern TTS models often bake G2P directly into the end-to-end network.

In Practice

G2P quality directly affects whether a TTS model sounds like it knows how to read. A strong G2P backend handles heteronyms, words that are spelled alike but pronounced differently (“read” vs “read”, “lead” vs “lead”), as well as proper nouns and domain-specific terminology, without manual intervention.
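One simple way to picture heteronym handling is a table keyed by both the spelling and a sense label. The sketch below assumes the label is supplied by some upstream step such as a part-of-speech tagger; the labels, the table, and the `pronounce` function are hypothetical and exist only for this example.

```python
from typing import Dict, List, Optional, Tuple

# (spelling, sense label) -> ARPAbet-style phonemes, stress markers omitted.
HETERONYMS: Dict[Tuple[str, str], List[str]] = {
    ("read", "present"): ["R", "IY", "D"],
    ("read", "past"): ["R", "EH", "D"],
    ("lead", "verb"): ["L", "IY", "D"],
    ("lead", "metal"): ["L", "EH", "D"],
}


def pronounce(word: str, sense: str) -> Optional[List[str]]:
    """Look up a heteronym by spelling and sense; return None if it is not listed."""
    return HETERONYMS.get((word.lower(), sense))
```

For example, `pronounce("read", "past")` returns `["R", "EH", "D"]` and `pronounce("read", "present")` returns `["R", "IY", "D"]`. The hard part in practice is choosing the sense label from context, which this sketch deliberately leaves to the caller.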

Try Spokio for Mac.

Offline text-to-speech for Mac. Local voice cloning, batch export, and no cloud uploads for your text, audio, or voice samples.

macOS 15.6+ | Apple Silicon & Intel | English only

hi@spokio.pro
