1- # Podcast Audio Generator
1+ # Podcast Generation Scripts
22
3- Converts course MDX/Markdown content into podcast-style conversational audio using Google's Gemini API.
3+ Converts course MDX/Markdown content into podcast-style conversational audio using a two-stage pipeline:
4+ 1. **Script Generation**: Claude Haiku 4.5 generates engaging dialog (via Claude Code CLI)
5+ 2. **Audio Synthesis**: Gemini 2.5 Flash TTS converts scripts to audio
46
57## Features
68
79- **Two-speaker dialogue**: Converts technical documentation into natural conversations between Alex (instructor) and Sam (senior engineer)
8- - **Gemini 2.5 Flash**: Uses latest models for dialogue generation and TTS
9- - **Multi-speaker TTS**: Natural voice synthesis with distinct speaker voices
10- - **Automatic processing**: Scans all course content and generates audio files
11- - **Manifest generation**: Creates JSON manifest mapping docs to audio URLs
10+ - **Optimized for senior engineers**: Professional, engaging, argument-driven content based on educational podcast best practices
11+ - **Separated concerns**: Generate scripts first, then audio - allows manual editing and version control
12+ - **Multi-speaker TTS**: Natural voice synthesis with distinct speaker voices (Kore/Charon)
13+ - **Automatic processing**: Scans all course content and processes systematically
14+ - **Dual manifests**: Scripts manifest + audio manifest for tracking
1215
1316## Prerequisites
1417
1518- Node.js 20+
16- - Google Gemini API key
19+ - **Claude Code CLI** installed and authenticated (`npm install -g @anthropic-ai/claude-code`)
20+ - **Google Gemini API key** for TTS
1721- Course content in ` website/docs/ ` directory
1822
1923## Setup
2024
21- 1 . ** Get API Key** : Obtain a Gemini API key from [ Google AI Studio] ( https://aistudio.google.com/ )
25+ ### 1. Install Claude Code CLI
26+ ``` bash
27+ npm install -g @anthropic-ai/claude-code
28+ claude # Follow authentication prompts
29+ ```
2230
23- 2 . ** Set Environment Variable ** :
24- ``` bash
25- export GOOGLE_API_KEY="your-api-key-here"
26- # OR
27- export GEMINI_API_KEY="your-api-key-here"
28- # OR
29- export GCP_API_KEY="your-api-key-here"
30- ```
31+ ### 2. Set Gemini API Key
32+ ``` bash
33+ export GOOGLE_API_KEY="your-api-key-here"
34+ # OR
35+ export GEMINI_API_KEY="your-api-key-here"
36+ # OR
37+ export GCP_API_KEY="your-api-key-here"
38+ ```
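
As a rough illustration of how a script could pick up whichever of the three variables is set, here is a minimal sketch. The helper name `resolveGeminiApiKey` and the lookup order are assumptions for illustration; the README only states that any one of the three variables works.

```javascript
// Hypothetical helper (not taken from the scripts): returns the first
// Gemini API key found in the environment.
// Assumption: variables are checked in the order the README lists them.
function resolveGeminiApiKey(env = process.env) {
  const names = ["GOOGLE_API_KEY", "GEMINI_API_KEY", "GCP_API_KEY"];
  for (const name of names) {
    const value = env[name];
    // Treat empty or whitespace-only values as unset.
    if (value && value.trim() !== "") {
      return { name, value: value.trim() };
    }
  }
  throw new Error(`No API key found. Set one of: ${names.join(", ")}`);
}
```

Returning the variable name alongside the value makes it easy to log which variable was used without printing the key itself.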
3139
32- 3 . ** Install Dependencies** (already done if you ran setup):
33- ``` bash
34- cd scripts
35- npm install
36- ```
40+ ### 3. Install Dependencies
41+ ``` bash
42+ cd scripts
43+ npm install
44+ ```
3745
3846## Usage
3947
40- ### From website directory:
48+ ### Complete Pipeline (Scripts + Audio)
4149``` bash
42- cd website
50+ cd scripts
4351npm run generate-podcast
4452```
4553
46- ### From scripts directory:
54+ This runs both stages sequentially.
55+
56+ ### Stage 1: Generate Scripts Only
4757``` bash
4858cd scripts
49- npm run generate-podcast
59+ npm run generate-podcast-scripts
5060```
5161
52- ### Direct execution:
62+ **Output:** Markdown scripts in `scripts/output/podcasts/`
63+ - Version-controllable
64+ - Manually editable
65+ - Contains frontmatter with metadata
66+
67+ ### Stage 2: Generate Audio from Scripts
5368``` bash
5469cd scripts
55- node generate-podcast.js
70+ npm run generate-podcast-audio
5671```
5772
58- ## Output
73+ **Output:** WAV files in `website/static/audio/`
74+ - Reads saved scripts
75+ - Multi-speaker synthesis
76+ - Updates audio manifest
5977
60- - ** Audio files** : ` website/static/audio/[lesson-path]/[filename].wav `
61- - ** Manifest** : ` website/static/audio/manifest.json `
78+ ### Legacy Monolithic Script
79+ ``` bash
80+ cd scripts
81+ npm run generate-podcast-legacy
82+ ```
83+
84+ Runs the original single-stage script, which generates the dialog inline and does not save it as a script file.
85+
86+ ## Output Structure
6287
63- ### Manifest Structure
88+ ### Script Files
89+ **Location:** `scripts/output/podcasts/`
90+
91+ **Structure:**
92+ ```
93+ output/podcasts/
94+ ├── manifest.json
95+ ├── intro.md
96+ ├── understanding-the-tools/
97+ │ ├── lesson-1-intro.md
98+ │ └── lesson-2-understanding-agents.md
99+ └── methodology/
100+ └── lesson-3-high-level-methodology.md
101+ ```
102+
103+ ** Script Format:**
104+ ``` markdown
105+ ---
106+ source: understanding-the-tools/lesson-1-intro.md
107+ speakers:
108+ - name: Alex
109+ role: Instructor
110+ voice: Kore
111+ - name: Sam
112+ role: Senior Engineer
113+ voice: Charon
114+ generatedAt: 2025-11-01T12:34:56.789Z
115+ model: claude-haiku-4.5
116+ tokenCount: 5234
117+ ---
118+
119+ Alex: Let's dive into AI coding agents...
120+
121+ Sam: I've been using them for a few months now...
122+ ```
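
To make the script format concrete, the following sketch splits a script file into flat frontmatter fields and speaker turns. It is an illustration under stated assumptions, not the parser used by `generate-podcast-audio.js`: the function name `parseScript` is hypothetical, nested frontmatter such as the `speakers` list is deliberately skipped, and a real implementation would more likely use a YAML library.

```javascript
// Hypothetical parser for the script format shown above.
function parseScript(text) {
  // Frontmatter is the block between the first two "---" lines.
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  if (!match) throw new Error("Missing frontmatter block");
  const [, frontmatter, body] = match;

  // Collect only flat top-level "key: value" pairs; indented or
  // value-less lines (e.g. the speakers list) are ignored.
  const meta = {};
  for (const line of frontmatter.split(/\r?\n/)) {
    const kv = line.match(/^([A-Za-z]\w*):\s*(\S.*)$/);
    if (kv) meta[kv[1]] = kv[2].trim();
  }

  // Each turn starts with "Alex:" or "Sam:"; other non-empty lines
  // are treated as continuations of the previous turn.
  const turns = [];
  for (const line of body.split(/\r?\n/)) {
    const turn = line.match(/^(Alex|Sam):\s*(.*)$/);
    if (turn) {
      turns.push({ speaker: turn[1], text: turn[2] });
    } else if (line.trim() !== "" && turns.length > 0) {
      turns[turns.length - 1].text += " " + line.trim();
    }
  }
  return { meta, turns };
}
```

Note that all values in `meta` are strings here, so a field such as `tokenCount` would need `Number(...)` before any numeric comparison.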
64123
124+ ### Audio Files
125+ **Location:** `website/static/audio/`
126+
127+ **Structure:** Mirrors script directory structure
128+
129+ **Manifest:** `website/static/audio/manifest.json`
65130``` json
66131{
67132 "understanding-the-tools/lesson-1-intro.md": {
68133 "audioUrl": "/audio/understanding-the-tools/lesson-1-intro.wav",
69134 "size": 1234567,
70135 "format": "audio/wav",
71- "generatedAt": "2025-10-29T12:34:56.789Z"
136+ "tokenCount": 5234,
137+ "generatedAt": "2025-11-01T12:34:56.789Z",
138+ "scriptSource": "understanding-the-tools/lesson-1-intro.md"
72139 }
73140}
74141```
75142
76143## Processing Pipeline
77144
145+ ### Script Generation (generate-podcast-script.js)
781461 . ** Content Discovery** : Scans ` website/docs/ ` for .md/.mdx files
791472 . ** Content Parsing** : Strips frontmatter, JSX, code blocks
80- 3. **Dialogue Generation**: Uses Gemini 2.5 Flash to create conversational script
81- 4. **Audio Synthesis**: Uses Gemini 2.5 Flash TTS with multi-speaker config
82- 5. **File Output**: Saves WAV files and updates manifest
148+ 3. **Prompt Engineering**: Builds optimized prompt for Haiku 4.5
149+ 4. **Dialog Generation**: Calls Claude Code CLI in headless mode
150+ 5. **Script Output**: Saves markdown with frontmatter to `output/podcasts/`
151+ 6. **Manifest Update**: Updates script manifest
152+
153+ ### Audio Synthesis (generate-podcast-audio.js)
154+ 1. **Script Discovery**: Scans `output/podcasts/` for markdown files
155+ 2. **Script Parsing**: Extracts frontmatter and dialog
156+ 3. **Token Validation**: Ensures dialog fits TTS limits
157+ 4. **Audio Synthesis**: Calls Gemini 2.5 Flash TTS with multi-speaker config
158+ 5. **WAV Creation**: Adds proper headers to PCM data
159+ 6. **Audio Output**: Saves to `website/static/audio/`
160+ 7. **Manifest Update**: Updates audio manifest
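
The "WAV Creation" step above can be sketched as prepending a standard 44-byte RIFF/WAVE header to the raw PCM bytes. The function name `pcmToWav` is hypothetical, and the defaults (24 kHz, mono, 16-bit little-endian PCM) are an assumption about the TTS output that should be checked against the actual response format.

```javascript
// Hypothetical helper: wraps raw PCM samples in a 44-byte RIFF/WAVE header.
// Assumed input format: 24 kHz, mono, 16-bit signed little-endian PCM.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);

  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);  // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);              // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);      // size of the sample data

  return Buffer.concat([header, pcm]);
}
```

If the header parameters do not match the real sample rate or bit depth, the file will still open but play at the wrong speed or as noise, which is one way the "corrupted/unplayable" symptom can arise.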
83161
84162## Configuration
85163
86164### Models
87- - **Dialogue**: `gemini-2.5-flash` (text generation)
165+ - **Dialog Generation**: Claude Haiku 4.5 (via Claude Code CLI)
88166- **TTS**: `gemini-2.5-flash-preview-tts` (audio synthesis)
89167
90168### Speakers
91- - **Alex**: "Kore" voice (firm, professional)
92- - **Sam**: "Charon" voice (neutral, professional)
169+ - **Alex**: "Kore" voice (firm, professional instructor)
170+ - **Sam**: "Charon" voice (neutral, professional engineer)
93171
94- ### Rate Limiting
95- - 2-second delay between files to avoid API rate limits
96- - Sequential processing (not parallel)
172+ ### Processing
173+ - **Script concurrency**: 3 files at a time (Claude CLI calls)
174+ - **Audio concurrency**: 3 files at a time (API rate limits)
175+ - **Token limits**: 6,000-7,500 tokens per dialog (TTS API constraint)
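
A concurrency cap of 3 like the one listed above can be implemented with a small worker pool. The sketch below is one common way to do it and is not necessarily how the scripts do it; the name `mapWithConcurrency` is hypothetical.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, and returns results in the original order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}
```

With this shape, lowering the limit to 1 gives strictly sequential processing, which is a simple first response to rate limit errors. A rejected `worker` call rejects the whole batch, so per-file error handling would need to live inside `worker`.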
97176
98177## Cost Estimation
99178
100- Using Gemini 2.5 Flash pricing:
101- - ** Text generation** : $0.50 per 1M tokens
179+ ### Script Generation (Claude Haiku 4.5)
180+ - **Input**: ~$1.00 per 1M tokens
181+ - **Output**: ~$5.00 per 1M tokens
182+ - ** Estimated per lesson** : ~ $0.01-0.05 (depends on content length)
183+
184+ ### Audio Synthesis (Gemini 2.5 Flash TTS)
102185- ** Audio output** : $10.00 per 1M tokens
186+ - ** Estimated per lesson** : ~ $0.05-0.10 (6k-7k tokens avg)
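
The audio estimate follows from multiplying the token count by the per-token price. The sketch below reproduces that arithmetic; the function name is hypothetical, and it assumes, as the estimate above does, that billed audio output tokens are roughly the 6k-7k figure quoted per lesson, which may not hold in practice.

```javascript
// Price taken from the list above: $10.00 per 1M audio output tokens.
const AUDIO_PRICE_PER_MILLION_TOKENS = 10.0;

// Hypothetical helper: estimated USD cost for `lessons` lessons of
// `tokensPerLesson` billed tokens each.
function estimateAudioCost(tokensPerLesson, lessons = 1) {
  return (tokensPerLesson * lessons * AUDIO_PRICE_PER_MILLION_TOKENS) / 1_000_000;
}

console.log(estimateAudioCost(6000).toFixed(2));     // prints "0.06" (one lesson)
console.log(estimateAudioCost(7000, 12).toFixed(2)); // prints "0.84" (12 lessons)
```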
103187
104- Estimated cost for full course (12 lessons):
105- - ** ~ $0.50-1.50** total (assuming ~ 200KB content)
188+ ### Full Course (12 lessons)
189+ - **Script generation**: ~$0.12-0.60 total
190+ - **Audio synthesis**: ~$0.60-1.20 total
191+ - **Combined**: ~$0.72-1.80 total
192+
193+ ** Benefits of split pipeline:**
194+ - Regenerate audio without re-prompting LLM (saves script gen costs)
195+ - Edit scripts manually before audio synthesis (reduces audio regeneration)
106196
107197## Utility Scripts
108198
@@ -118,14 +208,26 @@ This creates `.bak` backups and adds proper RIFF/WAV headers to headerless PCM f
118208
119209## Troubleshooting
120210
121- ### "No API key found"
211+ ### "No API key found" (Gemini)
122212Set one of the environment variables: `GOOGLE_API_KEY`, `GEMINI_API_KEY`, or `GCP_API_KEY`
123213
214+ ### "Failed to spawn Claude CLI"
215+ 1. Ensure Claude Code CLI is installed: `npm install -g @anthropic-ai/claude-code`
216+ 2. Verify it's in PATH: `which claude` (should return a path)
217+ 3. Authenticate: Run `claude` and follow the prompts
218+ 4. Check permissions: The script passes the `--dangerously-skip-permissions` flag
219+
124220### "Module not found"
125221Run ` npm install ` in the scripts directory
126222
223+ ### "No script files found"
224+ Run `npm run generate-podcast-scripts` before generating audio
225+
226+ ### Dialog exceeds token limit
227+ The script was generated with too much content. Regenerate with stricter constraints or manually edit the script file to reduce length.
228+
127229### Corrupted/unplayable WAV files
128- Gemini API returns raw PCM data without WAV headers. The script now automatically adds headers, but if you have old files, run :
230+ The audio generation script automatically adds proper WAV headers. If you have old files from the legacy script, repair them with:
129231``` bash
130232node fix-wav-files.js
131233```
@@ -134,20 +236,58 @@ node fix-wav-files.js
134236The Gemini 2.5 Flash TTS model is in preview and may have some background noise in long generations (known issue as of October 2025)
135237
136238### Rate limit errors
137- Increase the delay between files in the script (currently 2000ms)
239+ - Script generation: Reduce concurrency in `generate-podcast-script.js` (currently 3)
240+ - Audio generation: Reduce concurrency in `generate-podcast-audio.js` (currently 3)
138241
139242## Development
140243
141244### Test with single file
142- Modify the script to process only one file for testing:
143245
246+ ** Script generation:**
144247``` javascript
145- // In main(), after finding files:
248+ // In generate-podcast-script.js main(), after finding files:
146249const files = findMarkdownFiles(DOCS_DIR).slice(0, 1); // Test first file only
147250```
148251
149- ### Skip CLAUDE.md
150- The script automatically skips ` CLAUDE.md ` files (project instructions)
252+ ** Audio generation:**
253+ ``` javascript
254+ // In generate-podcast-audio.js main(), after finding files:
255+ const files = findScriptFiles(SCRIPT_INPUT_DIR).slice(0, 1); // Test first file only
256+ ```
257+
258+ ### Manual testing workflow
259+ ``` bash
260+ # 1. Generate single script
261+ cd scripts
262+ # Edit generate-podcast-script.js to slice(0, 1)
263+ npm run generate-podcast-scripts
264+
265+ # 2. Review output
266+ cat output/podcasts/intro.md
267+
268+ # 3. Generate audio from that script
269+ # Edit generate-podcast-audio.js to slice(0, 1)
270+ npm run generate-podcast-audio
271+
272+ # 4. Test audio playback
273+ open ../website/static/audio/intro.wav
274+ ```
275+
276+ ### Automatic exclusions
277+ - Scripts automatically skip `CLAUDE.md` files (project instructions)
278+ - Only files with a `.md` or `.mdx` extension are processed
279+
280+ ### Version control considerations
281+ **Scripts (`output/podcasts/`)**: Consider version-controlling these for:
282+ - Manual editing capability
283+ - Tracking prompt quality improvements
284+ - Rollback if regeneration produces worse results
285+
286+ **Audio files (`website/static/audio/`)**: Typically excluded from git due to size:
287+ ``` gitignore
288+ # Add to .gitignore if needed
289+ website/static/audio/**/*.wav
290+ ```
151291
152292## Related Documentation
153293