---
name: tts-dialogue-writing
description: |
  Write natural, human-like dialogue scripts optimized for AI text-to-speech generation.
  Use when: (1) Creating podcast scripts, audiobook chapters, YouTube video scripts,
  documentary narration, or any spoken content, (2) Rewriting existing scripts to sound
  more natural when spoken aloud, (3) Adding emotion tags for TTS voice synthesis,
  (4) Converting word counts to speech duration (150 WPM standard), (5) Structuring
  multi-speaker dialogue that flows naturally.
version: "1.4.2"
---

# TTS Dialogue Writing

Create natural, engaging spoken content optimized for AI voice synthesis.

## Core Principles

### 1. Write for the Ear, Not the Eye

Written text and spoken dialogue are fundamentally different:

| Written Text | Spoken Dialogue |
|--------------|-----------------|
| Long sentences with subordinate clauses | Short, punchy sentences |
| Formal transitions ("Furthermore," "In conclusion") | Natural bridges ("So here's the thing," "And that's when") |
| Dense information | Breathing room between ideas |
| Passive voice acceptable | Active voice preferred |
| Abstract concepts | Concrete examples and stories |

### 2. Speech Duration Formula

**Standard TTS rate: 150 words per minute**

| Target Duration | Word Count |
|-----------------|------------|
| 5 minutes | 750 words |
| 10 minutes | 1,500 words |
| 15 minutes | 2,250 words |
| 20 minutes | 3,000 words |
| 30 minutes | 4,500 words |
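
The table rows follow from one multiplication: word count = minutes × 150. A minimal sketch of that conversion in both directions is shown here. The function names are hypothetical, and stripping bracketed tags before counting is an assumption about what the engine actually speaks.

```python
import re

WORDS_PER_MINUTE = 150  # standard TTS rate used in this guide

# Assumption: bracketed tags such as [Host]: or [laugh] are not spoken aloud.
BRACKET_TAG = re.compile(r"\[[^\]]*\]:?")


def words_for_duration(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Return the target word count for a script of the given length."""
    return round(minutes * wpm)


def duration_for_words(word_count: int, wpm: int = WORDS_PER_MINUTE) -> str:
    """Return the estimated speech duration as an M:SS string."""
    total_seconds = round(word_count * 60 / wpm)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def count_spoken_words(script: str) -> int:
    """Count whitespace-separated words after removing bracketed tags."""
    return len(BRACKET_TAG.sub(" ", script).split())


print(words_for_duration(10))    # 1500
print(duration_for_words(1000))  # 6:40
```

Pauses from punctuation and paragraph breaks add time on top of this estimate, so treat the result as a lower bound for pause-heavy scripts.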

### 3. Speaker Tag Rules

**Minimize redundant tags.** Only add a speaker tag when the speaker changes:

```
BAD (redundant tags):
[Host]: Welcome to the show!
[Host]: Today we're discussing productivity.
[Host]: I'm really excited about this topic.

GOOD (speaker continues without re-tagging):
[Host]: Welcome to the show! Today we're discussing productivity. I'm really excited about this topic.
```

**Tag changes only when needed:**
```
[Host]: Let me ask you about your morning routine.

[Guest]: Oh, that's a great question. I actually don't have one! I know that sounds crazy, but hear me out.

I tried the whole 5 AM club thing for about six months. Meditation, journaling, cold showers.

[Host]: And it didn't work for you?

[Guest]: It made me miserable. Here's what I realized...
```
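
The tagging rule can be checked mechanically. The following is a rough sketch, assuming the `[Name]:` format used in the examples above; `find_redundant_tags` is a hypothetical helper, not part of any TTS tool.

```python
import re

# Matches a speaker tag such as "[Host]:" at the start of a line.
SPEAKER_TAG = re.compile(r"^\[([^\]]+)\]:")


def find_redundant_tags(script: str) -> list[int]:
    """Return 1-based line numbers whose speaker tag repeats the previous speaker."""
    redundant = []
    current_speaker = None
    for line_number, line in enumerate(script.splitlines(), start=1):
        match = SPEAKER_TAG.match(line.strip())
        if not match:
            continue  # untagged lines belong to whoever spoke last
        speaker = match.group(1)
        if speaker == current_speaker:
            redundant.append(line_number)
        current_speaker = speaker
    return redundant


sample = "[Host]: Welcome to the show!\n[Host]: Today we're discussing productivity."
print(find_redundant_tags(sample))  # [2]
```

The sketch only reports line numbers and leaves the script untouched, so paragraph breaks used for pauses are preserved.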

### 4. Emotion Tags: Strategic, Not Systematic

Emotion tags should enhance moments, not annotate every line.

**BAD (over-tagged, mechanical):**
```
[Host]: [excited] Welcome back everyone!
[Host]: [curious] Today we have a special guest.
[Host]: [happy] I'm so glad you could join us.
[Guest]: [grateful] Thanks for having me.
[Guest]: [nervous] I'm a little nervous actually.
```

**GOOD (strategic emotional moments):**
```
[Host]: Welcome back, everyone! Today we have a special guest, someone whose work has literally changed how I think about productivity. I'm so glad you could join us.

[Guest]: Thanks for having me. [nervous] I'm a little nervous, actually. I don't usually do podcasts.

[Host]: [tender] Don't worry, we'll take it easy. Just a conversation between friends.
```

**When to use emotion tags:**
- Emotional shifts within a character's speech
- Moments that would be misread without context
- Dramatic emphasis in storytelling
- Subtle tonal cues (whisper, sarcasm, etc.)

**Available emotions:** natural, happy, sad, angry, fearful, surprised, disgusted, excited, calm, serious, tender, contemplative, mysterious, playful, sarcastic, nervous, confident, exhausted, hopeful, melancholic, dramatic, urgent, relieved, nostalgic, curious, whisper, shout, laugh, sigh, gasp, chuckle, hmm, clearing_throat, yawn
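
One way to keep tagging strategic is to audit a draft before synthesis. The sketch below is an illustration under assumptions: `audit_emotion_tags` is a hypothetical helper, the set copies the list above, and the ratio it reports is only a rough signal, since this guide sets no numeric threshold.

```python
import re

AVAILABLE_EMOTIONS = {
    "natural", "happy", "sad", "angry", "fearful", "surprised", "disgusted",
    "excited", "calm", "serious", "tender", "contemplative", "mysterious",
    "playful", "sarcastic", "nervous", "confident", "exhausted", "hopeful",
    "melancholic", "dramatic", "urgent", "relieved", "nostalgic", "curious",
    "whisper", "shout", "laugh", "sigh", "gasp", "chuckle", "hmm",
    "clearing_throat", "yawn",
}

SPEAKER_TAG = re.compile(r"^\[[^\]]+\]:\s*")  # leading "[Host]: "
BRACKET_TAG = re.compile(r"\[([^\]]+)\]")     # any remaining "[tag]"


def audit_emotion_tags(script: str) -> dict:
    """Report tags outside the list and the share of lines carrying a tag."""
    unknown = []
    tagged_lines = 0
    spoken_lines = 0
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue
        spoken_lines += 1
        body = SPEAKER_TAG.sub("", line)  # drop the speaker tag first
        tags = BRACKET_TAG.findall(body)
        if tags:
            tagged_lines += 1
        unknown.extend(tag for tag in tags if tag not in AVAILABLE_EMOTIONS)
    ratio = tagged_lines / spoken_lines if spoken_lines else 0.0
    return {"unknown": unknown, "tagged_ratio": ratio}
```

A `tagged_ratio` close to 1.0 suggests the over-tagged pattern shown earlier. Any tag not in the set is listed under `unknown`.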

### 5. Dialogue Flow Techniques

**Interruptions and overlaps (use sparingly):**
```
[Guest]: The thing about morning routines is...

[Host]: They're completely personal, right?

[Guest]: Exactly! Everyone's different.
```

**Trailing thoughts:**
```
[Guest]: I've been thinking about this a lot lately... like, why do we even need routines at all?
```

**Reactions integrated into speech:**
```
[Host]: [laugh] Okay, I love that. So you're telling me the secret to productivity is... having no routine?
```

**Natural fillers (use sparingly for authenticity):**
```
[Guest]: So, here's the thing about habits. They're, I mean, they're kind of overrated, honestly.
```

### 6. Pacing and Rhythm

**Vary sentence length:**
```
Short sentences create urgency. They punch. They grab attention.

But then you need longer sentences that allow the listener to breathe, to process what they've just heard, to let the ideas settle before the next point arrives.

Then short again.
```
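
Rhythm can be roughly quantified by measuring how much sentence lengths vary. This is a minimal sketch with hypothetical helper names. The sentence splitter is naive (it breaks after `.`, `!`, or `?` followed by whitespace, so an ellipsis also counts as a boundary), and no particular spread is prescribed by this guide.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence, using a naive splitter."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(sentence.split()) for sentence in sentences if sentence]


def rhythm_report(text: str) -> dict:
    """Summarize sentence-length variety; a spread near zero hints at monotony."""
    lengths = sentence_lengths(text)
    if not lengths:
        return {"lengths": [], "mean": 0.0, "spread": 0.0}
    return {
        "lengths": lengths,
        "mean": statistics.mean(lengths),
        "spread": statistics.pstdev(lengths),  # population standard deviation
    }


print(sentence_lengths("Short sentences create urgency. They punch. They grab attention."))
# [4, 2, 3]
```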

**Use paragraph breaks for pauses:**
In TTS, paragraph breaks create natural pauses. Use them to:
- Signal topic shifts
- Create dramatic effect
- Allow processing time after important points

**Punctuation-based pacing (no SSML needed):**
TTS engines interpret punctuation marks as timing cues. Use them deliberately:

| Punctuation | Pause Duration | Best For |
|-------------|---------------|----------|
| Ellipsis `...` | ~0.4-0.6s | Reflective pauses, trailing thoughts, suspense |
| Period `.` | ~0.3-0.5s | Clean sentence stops, declarative weight |
| Comma `,` | ~0.1-0.2s | Breathing room, list pacing, clause separation |
| Semicolon `;` | ~0.2-0.4s | Mid-weight pause, connecting related ideas |

```
SLOW, REFLECTIVE:
[Narrator]: She looked at the letter... then at the door... then back at the letter.

URGENT, STACCATO:
[Reporter]: The building collapsed. Three exits blocked. One left.
```
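
The pause ranges in the table can be turned into a rough timing estimate. The sketch below assumes the midpoint of each range (for example, 0.5s for an ellipsis). Those midpoints and the helper name are illustrative choices, not engine guarantees.

```python
# Midpoints of the ranges in the table above (assumed, not measured).
PAUSE_SECONDS = {
    "...": 0.5,
    ".": 0.4,
    ",": 0.15,
    ";": 0.3,
}


def estimate_pause_seconds(text: str) -> float:
    """Estimate the total pause time contributed by punctuation."""
    ellipses = text.count("...")
    # Remove ellipses so their dots are not counted again as periods.
    remainder = text.replace("...", " ")
    total = ellipses * PAUSE_SECONDS["..."]
    for mark in (".", ",", ";"):
        total += remainder.count(mark) * PAUSE_SECONDS[mark]
    return round(total, 2)


print(estimate_pause_seconds("She looked at the letter... then at the door... then back at the letter."))
# 1.4
print(estimate_pause_seconds("The building collapsed. Three exits blocked. One left."))
# 1.2
```

The counter treats every dot as a sentence stop, so decimals such as "3.5" would inflate the estimate.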

**Paralinguistic tags** (expressive engine only):
Nine tags that add human texture to speech. Use sparingly for key moments:
`[clear throat]`, `[sigh]`, `[shush]`, `[cough]`, `[groan]`, `[sniff]`, `[gasp]`, `[chuckle]`, `[laugh]`

```
[Host]: [laugh] Okay, that's a hot take. But seriously...
[Guest]: [sigh] I know, I know. But hear me out.
```

**Chunk boundaries:**
Each new `[Speaker]:` line starts a separate audio chunk, with a crossfade gap between consecutive chunks. Use speaker changes strategically to control pacing in multi-speaker content: rapid back-and-forth creates energy, while longer monologues build depth.
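
To preview where chunk boundaries would fall, a script can be split on its speaker lines. This is a sketch of the behavior described above, not the engine's actual implementation. `split_into_chunks` is a hypothetical helper, and treating untagged paragraphs as part of the preceding chunk is an assumption.

```python
import re

# Matches "[Speaker]: text" and captures the speaker name and the text.
SPEAKER_LINE = re.compile(r"^\[([^\]]+)\]:\s*(.*)$")


def split_into_chunks(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) chunks, one per tagged line."""
    chunks: list[list[str]] = []
    for line in script.splitlines():
        stripped = line.strip()
        match = SPEAKER_LINE.match(stripped)
        if match:
            chunks.append([match.group(1), match.group(2)])
        elif stripped and chunks:
            # Untagged paragraph: assumed to stay in the current chunk.
            chunks[-1][1] += "\n\n" + stripped
    return [(speaker, text) for speaker, text in chunks]


script = "[Host]: Let me ask.\n\n[Guest]: Great question.\n\nI tried it.\n\n[Host]: And?"
for speaker, text in split_into_chunks(script):
    print(speaker, len(text.split()))
```

Printing the word count per chunk gives a quick view of whether a passage is rapid back-and-forth or a long monologue.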

### 7. Content Structure by Format

**Podcast/Interview:**
- Open with energy and a hook
- Build rapport through natural conversation
- Include disagreement and genuine curiosity
- End with an actionable takeaway

**Audiobook (Fiction):**
- Show, don't tell, through dialogue and action
- Vary narrator tone with scene mood
- Use emotion tags for dramatic moments
- Create distinct character voices through word choice, not just tags

**Audiobook (Non-fiction):**
- Open chapters with a compelling hook
- Use stories and examples liberally
- Direct address to listener ("Here's what this means for you...")
- Summarize key points naturally, not mechanically

**YouTube/Tutorial:**
- Hook in first 15 seconds
- Promise value immediately
- Use conversational, energetic tone
- Include "pattern interrupts" every 2-3 minutes
- Clear calls-to-action

**Documentary:**
- Establish atmosphere through description
- Use expert voices for credibility
- Personal stories for emotional connection
- Let silence (paragraph breaks) create weight

## Anti-Patterns to Avoid

1. **Mechanical tagging:** Not every line needs a speaker tag or an emotion tag
2. **Over-explanation:** Trust the listener to follow along
3. **Written-style transitions:** Avoid "In conclusion," "Furthermore," "As mentioned previously"
4. **Uniform sentence length:** Creates monotonous rhythm
5. **Abstract without concrete:** Always ground concepts in examples
6. **Announcing emotions:** Writing "I'm so excited!" instead of showing excitement through word choice
7. **Perfect grammar in casual speech:** Real people use fragments, contractions, trailing thoughts

## Quality Checklist

Before finalizing any script:

- [ ] Read aloud. Does it flow naturally?
- [ ] Word count matches target duration (150 WPM)
- [ ] Speaker tags only at speaker changes
- [ ] Emotion tags are strategic, not systematic
- [ ] Variety in sentence length and rhythm
- [ ] Concrete examples ground abstract concepts
- [ ] Opening hooks attention immediately
- [ ] Ending provides closure or call-to-action
