Published Jun 02, 2026

SSML (Speech Synthesis Markup Language)

SSML is an XML-based markup language, standardized by the W3C, that controls how a text-to-speech (TTS) engine renders text. It gives authors fine-grained control over pronunciation, pacing, emphasis, and other aspects of delivery that plain text cannot express.
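
Because SSML is XML, a document sits inside a <speak> root element, and characters such as & and < in the source text must be escaped. A minimal sketch in Python, assuming an engine that accepts a bare <speak> root (some engines also expect version and namespace attributes); the helper name to_ssml is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a <speak> root, escaping XML special characters."""
    return f"<speak>{escape(text)}</speak>"


ssml = to_ssml("Q&A starts at 5 < 6 pm")

# Parsing is a cheap well-formedness check: ParseError is raised on bad markup.
root = ET.fromstring(ssml)
print(ssml)
```

Parsing the result before sending it to an engine catches unescaped characters and unclosed tags early, which are the most common causes of a rejected request.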

Common Tags

<break>: inserts a pause of a specified duration.

<break time="500ms"/>

<emphasis>: stresses a word or phrase.

That was <emphasis level="strong">not</emphasis> the plan.

<prosody>: adjusts rate, pitch, and volume.

<prosody rate="slow" pitch="+2st" volume="soft">She whispered softly.</prosody>

<say-as>: tells the engine how to interpret specific content, such as numbers or spelled-out characters.

<say-as interpret-as="cardinal">123</say-as>   <!-- one hundred twenty-three -->
<say-as interpret-as="ordinal">1</say-as>      <!-- first -->
<say-as interpret-as="characters">AI</say-as>  <!-- A-I, not "eye" -->

<phoneme>: forces a specific pronunciation, written in a phonetic alphabet such as IPA.

<phoneme alphabet="ipa" ph="ˈnjuːkliər">nuclear</phoneme>
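
When the same terms recur across a script, the <phoneme> tags can be applied from a small pronunciation lexicon instead of by hand. The following is a rough sketch in Python, not a production tool: the LEXICON dictionary and the apply_lexicon helper are hypothetical names, the lexicon keys are assumed to be lowercase, and matching is limited to whole words.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon mapping lowercase terms to IPA strings.
LEXICON = {"nuclear": "ˈnjuːkliər"}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Return SSML in which every lexicon term is wrapped in <phoneme>."""
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"
    # Match whole words only, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in lexicon) + r")\b",
        re.IGNORECASE,
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))  # text before the term
        word = match.group(1)
        ipa = lexicon[word.lower()]
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        )
        pos = match.end()
    parts.append(escape(text[pos:]))  # remaining text after the last term
    return "<speak>" + "".join(parts) + "</speak>"


print(apply_lexicon("The nuclear plant & lab", LEXICON))
```

Keeping pronunciations in one dictionary means a correction is made once and applied everywhere the term appears.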

Why SSML Matters

Plain-text TTS has to guess pronunciation, pacing, and emphasis. SSML lets you override those guesses. For audiobooks, it can assign character voices and control chapter pacing. For voiceovers, it ensures brand names and technical terms are pronounced correctly. For accessibility, it adjusts reading speed and phrasing.

Limitations

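Not all TTS engines support the same SSML tags. Some ignore <phoneme> or accept only a limited range of <prosody> values, and the values accepted by the interpret-as attribute of <say-as> vary between engines. Always listen to the rendered output on your target engine rather than assuming compliance.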
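
One way to cope with uneven support is to remove tags an engine does not handle while keeping the text inside them, so the engine falls back to its own guess. A hedged sketch in Python: the strip_unsupported helper and the set of supported tag names are hypothetical, since real support lists have to come from each engine's documentation, and the code assumes SSML without XML namespaces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _render(elem: ET.Element, supported: set[str]) -> str:
    """Serialize elem, unwrapping any tag that is not in `supported`."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += _render(child, supported)
        inner += escape(child.tail or "")
    if elem.tag not in supported:
        return inner  # drop the tag itself, keep its text content
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    if inner:
        return f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"
    return f"<{elem.tag}{attrs}/>"


def strip_unsupported(ssml: str, supported: set[str]) -> str:
    """Remove unsupported tags from an SSML string; <speak> is always kept."""
    return _render(ET.fromstring(ssml), supported | {"speak"})


ssml = (
    '<speak>Say <phoneme alphabet="ipa" ph="ˈnjuːkliər">nuclear</phoneme>'
    '<break time="500ms"/> now.</speak>'
)
# Assumed example: an engine that handles <break> and <prosody> only.
print(strip_unsupported(ssml, {"break", "prosody"}))
```

This does not replace listening to the output. It only ensures the engine never receives markup it is known to ignore or reject.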

Try Spokio for Mac.

Offline text-to-speech for Mac. Local voice cloning, batch export, and no cloud uploads for your text, audio, or voice samples.

macOS 15.6+ | Apple Silicon & Intel | English only

hi@spokio.pro
