feat(helpers): add TextBuilder class for TTS pronunciation and pause controls by lukeocodes · Pull Request #660 · deepgram/deepgram-python-sdk

lukeocodes · 2026-02-15T22:13:59Z

Summary

Add TextBuilder — a fluent builder class for constructing TTS text with inline pronunciation (IPA) and pause controls. Includes SSML-to-Deepgram conversion for migrating from other TTS providers.

Rebased from #646 onto current main to resolve conflicts.

Usage

from deepgram import DeepgramClient
from deepgram.helpers import TextBuilder

text = (
    TextBuilder()
    .text("Take ")
    .pronunciation("azathioprine", "ˌæzəˈθaɪəpriːn")
    .pause(500)
    .text(" twice daily.")
    .build()
)

client = DeepgramClient()
response = client.speak.v1.audio.generate(text=text, model="aura-2-asteria-en")

SSML Migration

from deepgram.helpers import ssml_to_deepgram

ssml = '<speak>Take <phoneme alphabet="ipa" ph="ˌæzəˈθaɪəpriːn">azathioprine</phoneme> daily.</speak>'
deepgram_text = ssml_to_deepgram(ssml)

What's Included

src/deepgram/helpers/text_builder.py — TextBuilder class, add_pronunciation(), ssml_to_deepgram(), validate_ipa(), validate_pause()
tests/custom/test_text_builder.py — 44 tests covering all features and edge cases
examples/22-text-builder-demo.py — interactive demo (no API key needed)
examples/23-text-builder-helper.py — REST API integration examples
examples/24-text-builder-streaming.py — WebSocket streaming examples

Design Decision

Imports come from deepgram.helpers (not deepgram) so the auto-generated __init__.py doesn't need modification. Fern regeneration won't break anything.

Test plan

44 tests pass (pytest tests/custom/test_text_builder.py)
mypy src/ clean (708 files, 0 errors)
No changes to auto-generated files

…controls

- Add TextBuilder fluent builder class with text(), pronunciation(), pause(), from_ssml(), and build() methods
- Add standalone utility functions: add_pronunciation(), ssml_to_deepgram(), validate_ipa(), validate_pause()
- Implement comprehensive input validation and API limit enforcement (500 pronunciations, 50 pauses, 2000 chars)
- Support SSML parsing and conversion (phoneme and break tags)
- Include proper JSON escaping and error handling

- Export TextBuilder, add_pronunciation, ssml_to_deepgram, validate_ipa, and validate_pause from deepgram package
- Add to __all__ and _dynamic_imports for lazy loading
- Enable usage: from deepgram import TextBuilder

- Add 50+ test cases covering all TextBuilder functionality
- Test basic text, pronunciation, pause, and SSML conversion
- Test validation functions and error handling
- Test API limit enforcement (pronunciations, pauses, characters)
- Test standalone functions (add_pronunciation, ssml_to_deepgram)
- Include integration tests with real-world examples

- Add 25-text-builder-demo.py: interactive demonstration of all features (no API key required)
- Add 25-text-builder-helper.py: live TTS generation examples with API integration
- Include examples for basic usage, SSML migration, standalone functions, and real-world scenarios
- Cover medical prescriptions, pharmacy instructions, and scientific terminology use cases

- Add TextBuilder demo and helper examples to Text-to-Speech section
- Include both interactive demo (no API key) and live TTS generation examples

Renumber all examples to group by feature area, with each section starting at multiples of 10:

- 01-09: Authentication
- 10-19: Transcription (Listen)
- 20-29: Text-to-Speech (Speak) - including new TextBuilder streaming example
- 30-39: Voice Agent
- 40-49: Text Intelligence (Read)
- 50-59: Management API
- 60-69: On-Premises
- 70-79: Configuration & Advanced

This makes the examples easier to navigate and leaves room for future additions in each section.

…error

- Fix ssml_to_deepgram to handle SSML fragments (not just complete documents)
- Fix validate_pause to check integer type before increment validation
- Fix test case to use correct case-sensitive word matching

…gram

Revert changes to the auto-generated __init__.py so Fern regeneration won't overwrite the TextBuilder exports. Users import from deepgram.helpers instead.

Copilot

Pull request overview

This pull request adds a TextBuilder helper class and related utilities for constructing TTS (Text-to-Speech) text with inline pronunciation (IPA) and pause controls for the Deepgram API. It includes SSML-to-Deepgram conversion functionality to help users migrate from other TTS providers. This is a rebased version of PR #646 onto the current main branch.

Changes:

Adds TextBuilder fluent builder class with pronunciation and pause controls
Provides SSML-to-Deepgram conversion for migration from other TTS providers
Includes comprehensive test suite (44 tests) covering all features and edge cases
Adds three example scripts demonstrating interactive demo, REST API integration, and WebSocket streaming
Updates example file numbering scheme to organize examples by feature area (10s, 20s, 30s, etc.)

Reviewed changes

Copilot reviewed 9 out of 29 changed files in this pull request and generated 14 comments.

File	Description
`src/deepgram/helpers/text_builder.py`	Core implementation: TextBuilder class and helper functions (pronunciation, pause, SSML conversion, validation)
`src/deepgram/helpers/__init__.py`	Public exports for helper module
`src/deepgram/helpers/README.md`	Documentation for TextBuilder usage and API
`tests/custom/test_text_builder.py`	Comprehensive test suite with 44 tests covering all functionality
`examples/22-text-builder-demo.py`	Interactive demonstration (no API key required)
`examples/23-text-builder-helper.py`	REST API integration examples
`examples/24-text-builder-streaming.py`	WebSocket streaming examples
`examples/README.md`	Updated documentation with new numbering scheme
`.fernignore`	Added helpers directory to prevent auto-generation
`examples/10-71*.py`	Renumbered existing examples to fit new organization scheme

Copilot · 2026-02-15T22:19:49Z

examples/23-text-builder-helper.py

+        TextBuilder()
+        .text("Prescription for ")
+        .pronunciation("lisinopril", "laɪˈsɪnəprɪl")
+        .pause(300)


Invalid pause duration used. The pause duration must be between 500-5000ms in 100ms increments, but 300ms is below the minimum of 500ms. This code will raise a ValueError at runtime.

Suggested change

-        .pause(300)
+        .pause(500)
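
The rule quoted in this comment (500-5000 ms, in 100 ms steps) can be sketched as a small check. This is a minimal illustration, not the SDK's `validate_pause()`: the name `check_pause_ms`, the constants, the error messages, and the `(is_valid, error_message)` return shape are all assumptions made for the example.

```python
from typing import Tuple

# Limits as stated in the review comment; the constant names are hypothetical.
MIN_PAUSE_MS = 500
MAX_PAUSE_MS = 5000
PAUSE_STEP_MS = 100


def check_pause_ms(duration_ms: object) -> Tuple[bool, str]:
    """Hypothetical stand-in for validate_pause(); returns (is_valid, error_message)."""
    # Check the type first so the modulo below never sees a non-integer.
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(duration_ms, bool) or not isinstance(duration_ms, int):
        return False, "pause duration must be an integer number of milliseconds"
    if not MIN_PAUSE_MS <= duration_ms <= MAX_PAUSE_MS:
        return False, f"pause duration must be between {MIN_PAUSE_MS} and {MAX_PAUSE_MS} ms"
    if duration_ms % PAUSE_STEP_MS != 0:
        return False, f"pause duration must be a multiple of {PAUSE_STEP_MS} ms"
    return True, ""
```

Under these assumptions, `check_pause_ms(300)` fails the range check, which is the case this comment flags, while `check_pause_ms(500)` passes.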

Copilot · 2026-02-15T22:19:49Z

src/deepgram/helpers/text_builder.py

+        pronunciation_pattern = r'\{"word":\s*"[^"]*",\s*"pronounce":\s*"[^"]*"\}'
+        pronunciations = re.findall(pronunciation_pattern, text)
+        self._pronunciation_count += len(pronunciations)
+
+        # Count pauses
+        pause_pattern = r"\{pause:\d+\}"
+        pauses = re.findall(pause_pattern, text)
+        self._pause_count += len(pauses)
+
+        # Character count (approximate - remove control syntax)
+        clean_text = re.sub(pronunciation_pattern, "", text)
+        clean_text = re.sub(pause_pattern, "", clean_text)
+        self._char_count += len(clean_text)


Inconsistent character counting between pronunciation() and from_ssml(). The pronunciation() method counts the word length (line 80), but _update_counts_from_text() removes pronunciation JSON entirely without counting the word (line 152). This means from_ssml() will undercount characters compared to using pronunciation() directly. Both should extract and count only the word from the pronunciation JSON for consistency.

Suggested change

-        pronunciation_pattern = r'\{"word":\s*"[^"]*",\s*"pronounce":\s*"[^"]*"\}'
-        pronunciations = re.findall(pronunciation_pattern, text)
-        self._pronunciation_count += len(pronunciations)
-
-        # Count pauses
-        pause_pattern = r"\{pause:\d+\}"
-        pauses = re.findall(pause_pattern, text)
-        self._pause_count += len(pauses)
-
-        # Character count (approximate - remove control syntax)
-        clean_text = re.sub(pronunciation_pattern, "", text)
-        clean_text = re.sub(pause_pattern, "", clean_text)
-        self._char_count += len(clean_text)
+        # Capture the word so we can count its characters consistently with pronunciation()
+        pronunciation_pattern = r'\{"word":\s*"([^"]*)",\s*"pronounce":\s*"[^"]*"\}'
+        pronunciations = re.findall(pronunciation_pattern, text)
+        self._pronunciation_count += len(pronunciations)
+        word_char_count = sum(len(word) for word in pronunciations)
+
+        # Count pauses
+        pause_pattern = r"\{pause:\d+\}"
+        pauses = re.findall(pause_pattern, text)
+        self._pause_count += len(pauses)
+
+        # Character count (approximate - remove control syntax, but keep word characters)
+        clean_text = re.sub(pronunciation_pattern, "", text)
+        clean_text = re.sub(pause_pattern, "", clean_text)
+        self._char_count += len(clean_text) + word_char_count
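
To see the effect of the capture group outside the class, the counting logic can be written as a standalone function. This is only a sketch: `count_controls` is a hypothetical name, it returns the three counts instead of updating instance attributes, and it reuses the two regular expressions from the suggestion.

```python
import re
from typing import Tuple

# Same patterns as the suggestion; the first captures the spoken word.
PRONUNCIATION_RE = re.compile(r'\{"word":\s*"([^"]*)",\s*"pronounce":\s*"[^"]*"\}')
PAUSE_RE = re.compile(r"\{pause:\d+\}")


def count_controls(text: str) -> Tuple[int, int, int]:
    """Return (pronunciation_count, pause_count, char_count) for built text."""
    # With a single capture group, findall() returns the captured words.
    words = PRONUNCIATION_RE.findall(text)
    pauses = PAUSE_RE.findall(text)
    # Strip the control syntax, then add back the characters of each word.
    plain = PAUSE_RE.sub("", PRONUNCIATION_RE.sub("", text))
    char_count = len(plain) + sum(len(word) for word in words)
    return len(words), len(pauses), char_count
```

For a text made of "Take ", one pronunciation entry for "azathioprine", one `{pause:500}`, and " twice daily.", the character count is 5 + 12 + 13 = 30. Without the capture group the word would be dropped and the count would be 18.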

Copilot · 2026-02-15T22:19:50Z

src/deepgram/helpers/text_builder.py

+
+    def replace_phoneme(match):
+        ipa = match.group(1)
+        word = match.group(2)


The replace_phoneme function in ssml_to_deepgram does not validate IPA strings before creating pronunciation JSON. This is inconsistent with the pronunciation() method and add_pronunciation() function which both validate IPA. Invalid IPA from SSML could bypass validation and create malformed JSON. Consider adding validate_ipa() call here for consistency and safety.

Suggested change

-        word = match.group(2)
+        word = match.group(2)
+        # Validate IPA for consistency with pronunciation() and add_pronunciation()
+        is_valid, error_msg = validate_ipa(ipa)
+        if not is_valid:
+            raise ValueError(error_msg)
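
A minimal sketch of the validate-then-serialize step, kept separate from the SDK. The `(is_valid, error_msg)` tuple follows the suggestion above. The validator `check_ipa_not_blank` and the helper `pronunciation_json` are hypothetical stand-ins, since the real `validate_ipa()` rules are not shown in this review. The `{"word": ..., "pronounce": ...}` shape is inferred from the regular expression in `text_builder.py`.

```python
import json
from typing import Callable, Tuple

Validator = Callable[[str], Tuple[bool, str]]


def check_ipa_not_blank(ipa: str) -> Tuple[bool, str]:
    """Hypothetical stand-in for validate_ipa(); only rejects blank input."""
    if not ipa.strip():
        return False, "IPA string must not be empty"
    return True, ""


def pronunciation_json(word: str, ipa: str, validate: Validator = check_ipa_not_blank) -> str:
    """Validate the IPA string, then serialize the pronunciation entry."""
    is_valid, error_msg = validate(ipa)
    if not is_valid:
        raise ValueError(error_msg)
    # json.dumps escapes quotes and backslashes, so the result stays well-formed;
    # ensure_ascii=False keeps the IPA characters readable.
    return json.dumps({"word": word, "pronounce": ipa}, ensure_ascii=False)
```

Serializing through `json.dumps` rather than string formatting means a stray quote in the SSML content cannot break the JSON, whatever the validator accepts.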

Copilot · 2026-02-15T22:19:50Z

src/deepgram/helpers/text_builder.py

+    phoneme_pattern = r'<phoneme\s+alphabet=["\']ipa["\']\s+ph=["\'](.*?)["\']\s*>(.*?)</phoneme>'
+
+    def replace_phoneme(match):
+        ipa = match.group(1)
+        word = match.group(2)


The phoneme regex pattern requires attributes in a specific order (alphabet before ph). XML/SSML attributes can appear in any order, so this pattern may fail to match valid SSML like <phoneme ph="..." alphabet="ipa">word</phoneme>. Consider making the regex more flexible to handle attributes in any order.

Suggested change

-    phoneme_pattern = r'<phoneme\s+alphabet=["\']ipa["\']\s+ph=["\'](.*?)["\']\s*>(.*?)</phoneme>'
-
-    def replace_phoneme(match):
-        ipa = match.group(1)
-        word = match.group(2)
+    # Match <phoneme> with alphabet="ipa" and ph="..." in any attribute order
+    phoneme_pattern = (
+        r'<phoneme'
+        r'(?=[^>]*\balphabet=["\']ipa["\'])'
+        r'(?=[^>]*\bph=["\'](?P<ipa>[^"\']+)["\'])'
+        r'[^>]*>'
+        r'(?P<word>.*?)'
+        r'</phoneme>'
+    )
+
+    def replace_phoneme(match):
+        ipa = match.group("ipa")
+        word = match.group("word")
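
The lookahead-based pattern can be exercised on its own to confirm that it accepts both attribute orders. The pattern below is the one from the suggestion. The wrapper `find_phonemes` is a hypothetical helper added for the example and is not part of the SDK.

```python
import re
from typing import List, Tuple

# Each lookahead scans the opening tag independently, so attribute order
# does not matter. Groups captured inside a lookahead are kept by Python's re.
PHONEME_RE = re.compile(
    r'<phoneme'
    r'(?=[^>]*\balphabet=["\']ipa["\'])'
    r'(?=[^>]*\bph=["\'](?P<ipa>[^"\']+)["\'])'
    r'[^>]*>'
    r'(?P<word>.*?)'
    r'</phoneme>'
)


def find_phonemes(ssml: str) -> List[Tuple[str, str]]:
    """Return (word, ipa) pairs for every IPA <phoneme> element in the input."""
    return [(m.group("word"), m.group("ipa")) for m in PHONEME_RE.finditer(ssml)]
```

The `\b` before `ph=` keeps the second lookahead from matching the "ph" inside "alphabet". Elements with a different alphabet value are skipped because the first lookahead fails.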

Copilot · 2026-02-15T22:19:50Z

examples/23-text-builder-helper.py

+    if api_key:
+        client = DeepgramClient(api_key=api_key)
+
+        response = client.speak.v1.generate(


Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.

Copilot · 2026-02-15T22:19:51Z

src/deepgram/helpers/README.md

+
+# Use with Deepgram TTS
+client = DeepgramClient(api_key="YOUR_API_KEY")
+response = client.speak.v1.generate(text, model="aura-asteria-en")


Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.

Suggested change

-response = client.speak.v1.generate(text, model="aura-asteria-en")
+response = client.speak.v1.audio.generate(text, model="aura-asteria-en")

Copilot · 2026-02-15T22:19:51Z

examples/23-text-builder-helper.py

+        client = DeepgramClient(api_key=api_key)
+
+        # Generate speech with custom pronunciations
+        response = client.speak.v1.generate(


Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.

Copilot · 2026-02-15T22:19:51Z

examples/23-text-builder-helper.py

+    if api_key:
+        client = DeepgramClient(api_key=api_key)
+
+        response = client.speak.v1.generate(


Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.

Copilot · 2026-02-15T22:19:51Z

src/deepgram/helpers/text_builder.py

+Provides helper methods for constructing TTS text with pronunciation, pause,
+and speed controls for Deepgram's Text-to-Speech API.


The docstring mentions "speed controls" but no speed control functionality is implemented in this module. Only pronunciation and pause controls are available. Remove "and speed controls" from the docstring to accurately reflect the current functionality.

Suggested change

-Provides helper methods for constructing TTS text with pronunciation, pause,
-and speed controls for Deepgram's Text-to-Speech API.
+Provides helper methods for constructing TTS text with pronunciation and pause
+controls for Deepgram's Text-to-Speech API.

Copilot · 2026-02-15T22:19:52Z

src/deepgram/helpers/README.md

+
+### Documentation
+
+See [TextBuilder-Guide.md](../../../docs/TextBuilder-Guide.md) for comprehensive documentation.


The referenced documentation file docs/TextBuilder-Guide.md does not exist in the repository. Either add the comprehensive documentation file or remove this reference.

Suggested change

-See [TextBuilder-Guide.md](../../../docs/TextBuilder-Guide.md) for comprehensive documentation.
+Comprehensive TextBuilder documentation will be added to the main Deepgram SDK documentation.

🤖 Release candidate for v6.0.0

---

## [6.0.0-rc.1](v5.3.2...v6.0.0-rc.1) (2026-02-16)

### ⚠ BREAKING CHANGES

* v6 — fully generated SDK with latest APIs and WebSocket support (#640)

### Features

* **helpers:** add TextBuilder class for TTS pronunciation and pause controls (#660)
* **sagemaker:** add SageMaker transport for running Deepgram on AWS SageMaker endpoints (#659)
* v6 — fully generated SDK with latest APIs and WebSocket support (#640)
* **websockets:** add custom WebSocket transport support (#658)

---

### Files changed

| File | Change |
|------|--------|
| `pyproject.toml` | `6.0.0-beta.4` → `6.0.0-rc.1` |
| `src/deepgram/core/client_wrapper.py` | User-Agent and SDK version → `6.0.0-rc.1` |
| `.github/.release-please-manifest.json` | `5.3.2` → `6.0.0-rc.1` |
| `CHANGELOG.md` | Add 6.0.0-rc.1 entry |

lukeocodes added 9 commits February 15, 2026 22:10

feat(sdk): export TextBuilder helpers from main deepgram package

744808c

- Export TextBuilder, add_pronunciation, ssml_to_deepgram, validate_ipa, and validate_pause from deepgram package
- Add to __all__ and _dynamic_imports for lazy loading
- Enable usage: from deepgram import TextBuilder

docs(examples): update README with TextBuilder examples

970e248

- Add TextBuilder demo and helper examples to Text-to-Speech section
- Include both interactive demo (no API key) and live TTS generation examples

fix(helpers): remove unused variable to resolve mypy type annotation …

11b2d49

…error

fix(helpers): improve SSML parsing and validation

d1d9aa3

- Fix ssml_to_deepgram to handle SSML fragments (not just complete documents)
- Fix validate_pause to check integer type before increment validation
- Fix test case to use correct case-sensitive word matching

refactor(helpers): import TextBuilder from deepgram.helpers, not deep…

476a6ea

…gram

Revert changes to the auto-generated __init__.py so Fern regeneration won't overwrite the TextBuilder exports. Users import from deepgram.helpers instead.

Copilot AI review requested due to automatic review settings February 15, 2026 22:14

Copilot started reviewing on behalf of lukeocodes February 15, 2026 22:14 View session

lukeocodes mentioned this pull request Feb 15, 2026

feat(helpers): add helper class for tts pronunciation and pause controls #646

Closed

lukeocodes merged commit 4324120 into main Feb 15, 2026
18 checks passed

lukeocodes deleted the lo/tts-helpers-v2 branch February 15, 2026 22:18

Copilot AI reviewed Feb 15, 2026

View reviewed changes

github-actions bot mentioned this pull request Feb 15, 2026

chore(main): release 5.3.2 #657

Closed

lukeocodes mentioned this pull request Feb 16, 2026

chore(main): release 6.0.0-rc.1 #661

Merged

lukeocodes merged 9 commits into main from lo/tts-helpers-v2


