feat(helpers): add TextBuilder class for TTS pronunciation and pause controls #660
lukeocodes merged 9 commits into main
…controls
- Add TextBuilder fluent builder class with text(), pronunciation(), pause(), from_ssml(), and build() methods
- Add standalone utility functions: add_pronunciation(), ssml_to_deepgram(), validate_ipa(), validate_pause()
- Implement comprehensive input validation and API limit enforcement (500 pronunciations, 50 pauses, 2000 chars)
- Support SSML parsing and conversion (phoneme and break tags)
- Include proper JSON escaping and error handling

- Export TextBuilder, add_pronunciation, ssml_to_deepgram, validate_ipa, and validate_pause from deepgram package
- Add to __all__ and _dynamic_imports for lazy loading
- Enable usage: from deepgram import TextBuilder

- Add 50+ test cases covering all TextBuilder functionality
- Test basic text, pronunciation, pause, and SSML conversion
- Test validation functions and error handling
- Test API limit enforcement (pronunciations, pauses, characters)
- Test standalone functions (add_pronunciation, ssml_to_deepgram)
- Include integration tests with real-world examples

- Add 25-text-builder-demo.py: interactive demonstration of all features (no API key required)
- Add 25-text-builder-helper.py: live TTS generation examples with API integration
- Include examples for basic usage, SSML migration, standalone functions, and real-world scenarios
- Cover medical prescriptions, pharmacy instructions, and scientific terminology use cases

- Add TextBuilder demo and helper examples to Text-to-Speech section
- Include both interactive demo (no API key) and live TTS generation examples

Renumber all examples to group by feature area, with each section starting at multiples of 10:
- 01-09: Authentication
- 10-19: Transcription (Listen)
- 20-29: Text-to-Speech (Speak), including new TextBuilder streaming example
- 30-39: Voice Agent
- 40-49: Text Intelligence (Read)
- 50-59: Management API
- 60-69: On-Premises
- 70-79: Configuration & Advanced

This makes the examples easier to navigate and leaves room for future additions in each section.

- Fix ssml_to_deepgram to handle SSML fragments (not just complete documents)
- Fix validate_pause to check integer type before increment validation
- Fix test case to use correct case-sensitive word matching

…gram
Revert changes to the auto-generated __init__.py so Fern regeneration won't overwrite the TextBuilder exports. Users import from deepgram.helpers instead.
Pull request overview
This pull request adds a TextBuilder helper class and related utilities for constructing TTS (Text-to-Speech) text with inline pronunciation (IPA) and pause controls for the Deepgram API. It includes SSML-to-Deepgram conversion functionality to help users migrate from other TTS providers. This is a rebased version of PR #646 onto the current main branch.
Changes:
- Adds `TextBuilder` fluent builder class with pronunciation and pause controls
- Provides SSML-to-Deepgram conversion for migration from other TTS providers
- Includes comprehensive test suite (44 tests) covering all features and edge cases
- Adds three example scripts demonstrating interactive demo, REST API integration, and WebSocket streaming
- Updates example file numbering scheme to organize examples by feature area (10s, 20s, 30s, etc.)
Reviewed changes
Copilot reviewed 9 out of 29 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| `src/deepgram/helpers/text_builder.py` | Core implementation: TextBuilder class and helper functions (pronunciation, pause, SSML conversion, validation) |
| `src/deepgram/helpers/__init__.py` | Public exports for helper module |
| `src/deepgram/helpers/README.md` | Documentation for TextBuilder usage and API |
| `tests/custom/test_text_builder.py` | Comprehensive test suite with 44 tests covering all functionality |
| `examples/22-text-builder-demo.py` | Interactive demonstration (no API key required) |
| `examples/23-text-builder-helper.py` | REST API integration examples |
| `examples/24-text-builder-streaming.py` | WebSocket streaming examples |
| `examples/README.md` | Updated documentation with new numbering scheme |
| `.fernignore` | Added helpers directory to prevent auto-generation |
| `examples/10-71*.py` | Renumbered existing examples to fit new organization scheme |
    TextBuilder()
    .text("Prescription for ")
    .pronunciation("lisinopril", "laɪˈsɪnəprɪl")
    .pause(300)

Invalid pause duration used. The pause duration must be between 500-5000ms in 100ms increments, but 300ms is below the minimum of 500ms. This code will raise a ValueError at runtime.

Suggested change:

    - .pause(300)
    + .pause(500)
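The rule quoted in this comment (500-5000 ms, 100 ms increments) can be sketched as a small checker. This is a minimal illustration, not the SDK's `validate_pause()`: the name `check_pause`, the `(is_valid, error_message)` return shape (modeled on how `validate_ipa()` is called elsewhere in this review), the message wording, and the rejection of `bool` are all assumptions.

```python
from typing import Tuple

MIN_PAUSE_MS = 500
MAX_PAUSE_MS = 5000
PAUSE_STEP_MS = 100


def check_pause(duration_ms: object) -> Tuple[bool, str]:
    """Hypothetical checker: return (is_valid, error_message)."""
    # Check the type first so the range and modulo tests only ever see ints.
    # bool is a subclass of int, so it is rejected explicitly (an assumption).
    if isinstance(duration_ms, bool) or not isinstance(duration_ms, int):
        return False, "Pause duration must be an integer number of milliseconds"
    if not MIN_PAUSE_MS <= duration_ms <= MAX_PAUSE_MS:
        return False, f"Pause duration must be between {MIN_PAUSE_MS} and {MAX_PAUSE_MS} ms"
    if duration_ms % PAUSE_STEP_MS != 0:
        return False, f"Pause duration must be a multiple of {PAUSE_STEP_MS} ms"
    return True, ""


print(check_pause(300))  # rejected: below the 500 ms minimum
print(check_pause(500))  # accepted
```

Checking the type before the increment follows the order described in the commit that fixed `validate_pause`; a float such as `500.0` is reported as a type problem rather than slipping through the modulo test.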
    pronunciation_pattern = r'\{"word":\s*"[^"]*",\s*"pronounce":\s*"[^"]*"\}'
    pronunciations = re.findall(pronunciation_pattern, text)
    self._pronunciation_count += len(pronunciations)

    # Count pauses
    pause_pattern = r"\{pause:\d+\}"
    pauses = re.findall(pause_pattern, text)
    self._pause_count += len(pauses)

    # Character count (approximate - remove control syntax)
    clean_text = re.sub(pronunciation_pattern, "", text)
    clean_text = re.sub(pause_pattern, "", clean_text)
    self._char_count += len(clean_text)

Inconsistent character counting between pronunciation() and from_ssml(). The pronunciation() method counts the word length (line 80), but _update_counts_from_text() removes pronunciation JSON entirely without counting the word (line 152). This means from_ssml() will undercount characters compared to using pronunciation() directly. Both should extract and count only the word from the pronunciation JSON for consistency.

Suggested change:

    - pronunciation_pattern = r'\{"word":\s*"[^"]*",\s*"pronounce":\s*"[^"]*"\}'
    - pronunciations = re.findall(pronunciation_pattern, text)
    - self._pronunciation_count += len(pronunciations)
    - # Count pauses
    - pause_pattern = r"\{pause:\d+\}"
    - pauses = re.findall(pause_pattern, text)
    - self._pause_count += len(pauses)
    - # Character count (approximate - remove control syntax)
    - clean_text = re.sub(pronunciation_pattern, "", text)
    - clean_text = re.sub(pause_pattern, "", clean_text)
    - self._char_count += len(clean_text)
    + # Capture the word so we can count its characters consistently with pronunciation()
    + pronunciation_pattern = r'\{"word":\s*"([^"]*)",\s*"pronounce":\s*"[^"]*"\}'
    + pronunciations = re.findall(pronunciation_pattern, text)
    + self._pronunciation_count += len(pronunciations)
    + word_char_count = sum(len(word) for word in pronunciations)
    + # Count pauses
    + pause_pattern = r"\{pause:\d+\}"
    + pauses = re.findall(pause_pattern, text)
    + self._pause_count += len(pauses)
    + # Character count (approximate - remove control syntax, but keep word characters)
    + clean_text = re.sub(pronunciation_pattern, "", text)
    + clean_text = re.sub(pause_pattern, "", clean_text)
    + self._char_count += len(clean_text) + word_char_count
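To see the effect of the suggested counting, here is a standalone sketch that applies the same two regexes to a sample string. `count_controls` is a hypothetical free function written for illustration; in the PR the logic lives in a method that updates instance counters.

```python
import re
from typing import Tuple

# Same patterns as in the suggestion; the capture group grabs the spoken word.
PRONUNCIATION_PATTERN = r'\{"word":\s*"([^"]*)",\s*"pronounce":\s*"[^"]*"\}'
PAUSE_PATTERN = r"\{pause:\d+\}"


def count_controls(text: str) -> Tuple[int, int, int]:
    """Return (pronunciation_count, pause_count, char_count) for built text."""
    words = re.findall(PRONUNCIATION_PATTERN, text)  # list of captured words
    pauses = re.findall(PAUSE_PATTERN, text)
    # Strip the control syntax, then add back only the spoken words.
    plain = re.sub(PRONUNCIATION_PATTERN, "", text)
    plain = re.sub(PAUSE_PATTERN, "", plain)
    char_count = len(plain) + sum(len(word) for word in words)
    return len(words), len(pauses), char_count


sample = 'Take {"word": "lisinopril", "pronounce": "laɪˈsɪnəprɪl"} {pause:500} daily.'
print(count_controls(sample))  # → (1, 1, 23)
```

The 23 characters are the 13 characters of plain text left after stripping the controls plus the 10 characters of "lisinopril"; without the capture group the same input would be counted as 13.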
    def replace_phoneme(match):
        ipa = match.group(1)
        word = match.group(2)

The replace_phoneme function in ssml_to_deepgram does not validate IPA strings before creating pronunciation JSON. This is inconsistent with the pronunciation() method and add_pronunciation() function which both validate IPA. Invalid IPA from SSML could bypass validation and create malformed JSON. Consider adding validate_ipa() call here for consistency and safety.

Suggested change:

    -     word = match.group(2)
    +     word = match.group(2)
    +     # Validate IPA for consistency with pronunciation() and add_pronunciation()
    +     is_valid, error_msg = validate_ipa(ipa)
    +     if not is_valid:
    +         raise ValueError(error_msg)
    phoneme_pattern = r'<phoneme\s+alphabet=["\']ipa["\']\s+ph=["\'](.*?)["\']\s*>(.*?)</phoneme>'

    def replace_phoneme(match):
        ipa = match.group(1)
        word = match.group(2)

The phoneme regex pattern requires attributes in a specific order (alphabet before ph). XML/SSML attributes can appear in any order, so this pattern may fail to match valid SSML like `<phoneme ph="..." alphabet="ipa">word</phoneme>`. Consider making the regex more flexible to handle attributes in any order.

Suggested change:

    - phoneme_pattern = r'<phoneme\s+alphabet=["\']ipa["\']\s+ph=["\'](.*?)["\']\s*>(.*?)</phoneme>'
    - def replace_phoneme(match):
    -     ipa = match.group(1)
    -     word = match.group(2)
    + # Match <phoneme> with alphabet="ipa" and ph="..." in any attribute order
    + phoneme_pattern = (
    +     r'<phoneme'
    +     r'(?=[^>]*\balphabet=["\']ipa["\'])'
    +     r'(?=[^>]*\bph=["\'](?P<ipa>[^"\']+)["\'])'
    +     r'[^>]*>'
    +     r'(?P<word>.*?)'
    +     r'</phoneme>'
    + )
    + def replace_phoneme(match):
    +     ipa = match.group("ipa")
    +     word = match.group("word")
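The lookahead-based pattern from the suggestion can be exercised on both attribute orders with a short script. This is a sketch under assumptions: `convert_phonemes` is a hypothetical helper, and the output shape (`{"word": ..., "pronounce": ...}` produced with `json.dumps`) is inferred from the pronunciation regex quoted earlier in this review, not taken from `ssml_to_deepgram()` itself. IPA validation is left out here.

```python
import json
import re

# Two lookaheads check for alphabet="ipa" and ph="..." anywhere inside the
# opening tag, so the attributes may appear in either order.
PHONEME_PATTERN = re.compile(
    r'<phoneme'
    r'(?=[^>]*\balphabet=["\']ipa["\'])'
    r'(?=[^>]*\bph=["\'](?P<ipa>[^"\']+)["\'])'
    r'[^>]*>'
    r'(?P<word>.*?)'
    r'</phoneme>'
)


def convert_phonemes(ssml: str) -> str:
    """Replace each IPA <phoneme> element with inline pronunciation JSON."""

    def replace_phoneme(match: re.Match) -> str:
        # json.dumps handles quoting and escaping; ensure_ascii=False keeps
        # the IPA symbols readable instead of \uXXXX escapes.
        return json.dumps(
            {"word": match.group("word"), "pronounce": match.group("ipa")},
            ensure_ascii=False,
        )

    return PHONEME_PATTERN.sub(replace_phoneme, ssml)


alphabet_first = convert_phonemes('<phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>')
ph_first = convert_phonemes('<phoneme ph="təˈmeɪtoʊ" alphabet="ipa">tomato</phoneme>')
print(alphabet_first)
print(alphabet_first == ph_first)
```

Named groups captured inside a lookahead remain available on the match object, which is what lets the `ph` value be extracted without fixing its position in the tag.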
    if api_key:
        client = DeepgramClient(api_key=api_key)

        response = client.speak.v1.generate(

Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.
    # Use with Deepgram TTS
    client = DeepgramClient(api_key="YOUR_API_KEY")
    response = client.speak.v1.generate(text, model="aura-asteria-en")

Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.

Suggested change:

    - response = client.speak.v1.generate(text, model="aura-asteria-en")
    + response = client.speak.v1.audio.generate(text, model="aura-asteria-en")
    client = DeepgramClient(api_key=api_key)

    # Generate speech with custom pronunciations
    response = client.speak.v1.generate(

Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.
    if api_key:
        client = DeepgramClient(api_key=api_key)

        response = client.speak.v1.generate(

Incorrect API method. The correct method is client.speak.v1.audio.generate(), not client.speak.v1.generate(). Based on example 20-text-to-speech-single.py line 21, the audio namespace is required.
    Provides helper methods for constructing TTS text with pronunciation, pause,
    and speed controls for Deepgram's Text-to-Speech API.

The docstring mentions "speed controls" but no speed control functionality is implemented in this module. Only pronunciation and pause controls are available. Remove "and speed controls" from the docstring to accurately reflect the current functionality.

Suggested change:

    - Provides helper methods for constructing TTS text with pronunciation, pause,
    - and speed controls for Deepgram's Text-to-Speech API.
    + Provides helper methods for constructing TTS text with pronunciation and pause
    + controls for Deepgram's Text-to-Speech API.
    ### Documentation

    See [TextBuilder-Guide.md](../../../docs/TextBuilder-Guide.md) for comprehensive documentation.

The referenced documentation file docs/TextBuilder-Guide.md does not exist in the repository. Either add the comprehensive documentation file or remove this reference.

Suggested change:

    - See [TextBuilder-Guide.md](../../../docs/TextBuilder-Guide.md) for comprehensive documentation.
    + Comprehensive TextBuilder documentation will be added to the main Deepgram SDK documentation.
🤖 Release candidate for v6.0.0

---

## [6.0.0-rc.1](v5.3.2...v6.0.0-rc.1) (2026-02-16)

### ⚠ BREAKING CHANGES

* v6 - fully generated SDK with latest APIs and WebSocket support (#640)

### Features

* **helpers:** add TextBuilder class for TTS pronunciation and pause controls (#660)
* **sagemaker:** add SageMaker transport for running Deepgram on AWS SageMaker endpoints (#659)
* v6 - fully generated SDK with latest APIs and WebSocket support (#640)
* **websockets:** add custom WebSocket transport support (#658)

---

### Files changed

| File | Change |
|------|--------|
| `pyproject.toml` | `6.0.0-beta.4` → `6.0.0-rc.1` |
| `src/deepgram/core/client_wrapper.py` | User-Agent and SDK version → `6.0.0-rc.1` |
| `.github/.release-please-manifest.json` | `5.3.2` → `6.0.0-rc.1` |
| `CHANGELOG.md` | Add 6.0.0-rc.1 entry |
Summary
Add `TextBuilder`, a fluent builder class for constructing TTS text with inline pronunciation (IPA) and pause controls. Includes SSML-to-Deepgram conversion for migrating from other TTS providers.
Rebased from #646 onto current main to resolve conflicts.
Usage
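A minimal sketch of the fluent style described in the summary, assuming the inline control syntax matched by the regexes in the review (`{"word": ..., "pronounce": ...}` and `{pause:N}`) and the limits named in the commit messages (500 pronunciations, 50 pauses, 2000 characters). `MiniTextBuilder` is a hypothetical stand-in written so the example runs without the SDK; it is not the class shipped in `deepgram.helpers`, and where the real builder enforces its limits or how it handles spacing is not shown here.

```python
import json
from typing import List

MAX_PRONUNCIATIONS = 500
MAX_PAUSES = 50
MAX_CHARS = 2000


class MiniTextBuilder:
    """Hypothetical stand-in for illustration; not the SDK's TextBuilder."""

    def __init__(self) -> None:
        self._parts: List[str] = []
        self._pronunciations = 0
        self._pauses = 0
        self._chars = 0

    def text(self, value: str) -> "MiniTextBuilder":
        self._chars += len(value)
        self._parts.append(value)
        return self

    def pronunciation(self, word: str, ipa: str) -> "MiniTextBuilder":
        self._pronunciations += 1
        self._chars += len(word)  # only the spoken word counts as text
        # json.dumps takes care of quoting and escaping.
        self._parts.append(json.dumps({"word": word, "pronounce": ipa}, ensure_ascii=False))
        return self

    def pause(self, duration_ms: int) -> "MiniTextBuilder":
        if duration_ms < 500 or duration_ms > 5000 or duration_ms % 100 != 0:
            raise ValueError("pause must be 500-5000 ms in 100 ms increments")
        self._pauses += 1
        self._parts.append(f"{{pause:{duration_ms}}}")
        return self

    def build(self) -> str:
        # Enforcing limits at build time is an assumption of this sketch.
        if self._pronunciations > MAX_PRONUNCIATIONS:
            raise ValueError("too many pronunciations")
        if self._pauses > MAX_PAUSES:
            raise ValueError("too many pauses")
        if self._chars > MAX_CHARS:
            raise ValueError("text too long")
        return "".join(self._parts)


text = (
    MiniTextBuilder()
    .text("Prescription for ")
    .pronunciation("lisinopril", "laɪˈsɪnəprɪl")
    .pause(500)
    .text(" 10 mg daily.")
    .build()
)
print(text)
```

With the real helper the import would be `from deepgram.helpers import TextBuilder`, using the `text()`, `pronunciation()`, `pause()`, and `build()` methods listed in the commit messages.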
SSML Migration
What's Included
- `src/deepgram/helpers/text_builder.py`: `TextBuilder` class, `add_pronunciation()`, `ssml_to_deepgram()`, `validate_ipa()`, `validate_pause()`
- `tests/custom/test_text_builder.py`: 44 tests covering all features and edge cases
- `examples/22-text-builder-demo.py`: interactive demo (no API key needed)
- `examples/23-text-builder-helper.py`: REST API integration examples
- `examples/24-text-builder-streaming.py`: WebSocket streaming examples
Design Decision
Imports come from `deepgram.helpers` (not `deepgram`) so the auto-generated `__init__.py` doesn't need modification. Fern regeneration won't break anything.
Test plan
- `pytest tests/custom/test_text_builder.py`
- `mypy src/` clean (708 files, 0 errors)