Commit 690d324

committed
Updated podcasts generation
1 parent 4af7912 commit 690d324

20 files changed, +3467 -53 lines changed

scripts/README.md

Lines changed: 192 additions & 52 deletions
@@ -1,108 +1,198 @@
1-
# Podcast Audio Generator
1+
# Podcast Generation Scripts
22

3-
Converts course MDX/Markdown content into podcast-style conversational audio using Google's Gemini API.
3+
Converts course MDX/Markdown content into podcast-style conversational audio using a two-stage pipeline:
4+
1. **Script Generation**: Claude Haiku 4.5 generates engaging dialog (via Claude Code CLI)
5+
2. **Audio Synthesis**: Gemini 2.5 Flash TTS converts scripts to audio
46

57
## Features
68

79
- **Two-speaker dialogue**: Converts technical documentation into natural conversations between Alex (instructor) and Sam (senior engineer)
8-
- **Gemini 2.5 Flash**: Uses latest models for dialogue generation and TTS
9-
- **Multi-speaker TTS**: Natural voice synthesis with distinct speaker voices
10-
- **Automatic processing**: Scans all course content and generates audio files
11-
- **Manifest generation**: Creates JSON manifest mapping docs to audio URLs
10+
- **Optimized for senior engineers**: Professional, engaging, argument-driven content based on educational podcast best practices
11+
- **Separated concerns**: Generate scripts first, then audio - allows manual editing and version control
12+
- **Multi-speaker TTS**: Natural voice synthesis with distinct speaker voices (Kore/Charon)
13+
- **Automatic processing**: Scans all course content and processes systematically
14+
- **Dual manifests**: Scripts manifest + audio manifest for tracking
1215

1316
## Prerequisites
1417

1518
- Node.js 20+
16-
- Google Gemini API key
19+
- **Claude Code CLI** installed and authenticated (`npm install -g @anthropic-ai/claude-code`)
20+
- **Google Gemini API key** for TTS
1721
- Course content in `website/docs/` directory
1822

1923
## Setup
2024

21-
1. **Get API Key**: Obtain a Gemini API key from [Google AI Studio](https://aistudio.google.com/)
25+
### 1. Install Claude Code CLI
26+
```bash
27+
npm install -g @anthropic-ai/claude-code
28+
claude # Follow authentication prompts
29+
```
2230

23-
2. **Set Environment Variable**:
24-
```bash
25-
export GOOGLE_API_KEY="your-api-key-here"
26-
# OR
27-
export GEMINI_API_KEY="your-api-key-here"
28-
# OR
29-
export GCP_API_KEY="your-api-key-here"
30-
```
31+
### 2. Set Gemini API Key
32+
```bash
33+
export GOOGLE_API_KEY="your-api-key-here"
34+
# OR
35+
export GEMINI_API_KEY="your-api-key-here"
36+
# OR
37+
export GCP_API_KEY="your-api-key-here"
38+
```
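
As a minimal sketch of how a script could pick up whichever of the three variables is set: the helper name `resolveApiKey` and the precedence order (first match wins, in the order listed above) are assumptions for illustration, not taken from the actual scripts.

```javascript
// Hypothetical helper: returns the first non-empty key among the three
// environment variables named in this README, or throws a readable error.
const KEY_NAMES = ['GOOGLE_API_KEY', 'GEMINI_API_KEY', 'GCP_API_KEY'];

function resolveApiKey(env = process.env) {
  for (const name of KEY_NAMES) {
    const value = (env[name] || '').trim();
    if (value) return { name, value };
  }
  throw new Error(`No API key found. Set one of: ${KEY_NAMES.join(', ')}`);
}
```

Passing `env` as a parameter keeps the lookup testable without touching the real process environment.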
3139

32-
3. **Install Dependencies** (already done if you ran setup):
33-
```bash
34-
cd scripts
35-
npm install
36-
```
40+
### 3. Install Dependencies
41+
```bash
42+
cd scripts
43+
npm install
44+
```
3745

3846
## Usage
3947

40-
### From website directory:
48+
### Complete Pipeline (Scripts + Audio)
4149
```bash
42-
cd website
50+
cd scripts
4351
npm run generate-podcast
4452
```
4553

46-
### From scripts directory:
54+
This runs both stages sequentially.
55+
56+
### Stage 1: Generate Scripts Only
4757
```bash
4858
cd scripts
49-
npm run generate-podcast
59+
npm run generate-podcast-scripts
5060
```
5161

52-
### Direct execution:
62+
**Output:** Markdown scripts in `scripts/output/podcasts/`
63+
- Version-controllable
64+
- Manually editable
65+
- Contains frontmatter with metadata
66+
67+
### Stage 2: Generate Audio from Scripts
5368
```bash
5469
cd scripts
55-
node generate-podcast.js
70+
npm run generate-podcast-audio
5671
```
5772

58-
## Output
73+
**Output:** WAV files in `website/static/audio/`
74+
- Reads saved scripts
75+
- Multi-speaker synthesis
76+
- Updates audio manifest
5977

60-
- **Audio files**: `website/static/audio/[lesson-path]/[filename].wav`
61-
- **Manifest**: `website/static/audio/manifest.json`
78+
### Legacy Monolithic Script
79+
```bash
80+
cd scripts
81+
npm run generate-podcast-legacy
82+
```
83+
84+
Runs the original single-stage script, which generates dialog inline without saving a script file.
85+
86+
## Output Structure
6287

63-
### Manifest Structure
88+
### Script Files
89+
**Location:** `scripts/output/podcasts/`
90+
91+
**Structure:**
92+
```
93+
output/podcasts/
94+
├── manifest.json
95+
├── intro.md
96+
├── understanding-the-tools/
97+
│ ├── lesson-1-intro.md
98+
│ └── lesson-2-understanding-agents.md
99+
└── methodology/
100+
└── lesson-3-high-level-methodology.md
101+
```
102+
103+
**Script Format:**
104+
```markdown
105+
---
106+
source: understanding-the-tools/lesson-1-intro.md
107+
speakers:
108+
- name: Alex
109+
role: Instructor
110+
voice: Kore
111+
- name: Sam
112+
role: Senior Engineer
113+
voice: Charon
114+
generatedAt: 2025-11-01T12:34:56.789Z
115+
model: claude-haiku-4.5
116+
tokenCount: 5234
117+
---
118+
119+
Alex: Let's dive into AI coding agents...
120+
121+
Sam: I've been using them for a few months now...
122+
```
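
A script file in this format can be split into frontmatter and speaker turns with a few lines of Node. The sketch below is illustrative only: `parseScript` is a hypothetical name, the frontmatter is returned as a raw string rather than parsed YAML, and any line starting with `Name:` is assumed to open a new turn.

```javascript
// Hypothetical parser for the script format shown above.
function parseScript(text) {
  // Frontmatter is the block between the first two `---` lines.
  const match = text.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  const frontmatter = match ? match[1] : '';
  const body = match ? match[2] : text;

  const turns = [];
  for (const line of body.split(/\r?\n/)) {
    const turn = line.match(/^([A-Z][\w-]*):\s*(.*)$/);
    if (turn) {
      turns.push({ speaker: turn[1], text: turn[2].trim() });
    } else if (line.trim() && turns.length > 0) {
      // Continuation line: append it to the previous speaker's turn.
      turns[turns.length - 1].text += ' ' + line.trim();
    }
  }
  return { frontmatter, turns };
}
```

A real implementation would likely hand the frontmatter to a YAML library and restrict speaker names to those declared under `speakers`.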
64123

124+
### Audio Files
125+
**Location:** `website/static/audio/`
126+
127+
**Structure:** Mirrors script directory structure
128+
129+
**Manifest:** `website/static/audio/manifest.json`
65130
```json
66131
{
67132
"understanding-the-tools/lesson-1-intro.md": {
68133
"audioUrl": "/audio/understanding-the-tools/lesson-1-intro.wav",
69134
"size": 1234567,
70135
"format": "audio/wav",
71-
"generatedAt": "2025-10-29T12:34:56.789Z"
136+
"tokenCount": 5234,
137+
"generatedAt": "2025-11-01T12:34:56.789Z",
138+
"scriptSource": "understanding-the-tools/lesson-1-intro.md"
72139
}
73140
}
74141
```
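
An entry of this shape could be assembled as follows. This is a sketch under assumptions: `buildManifestEntry` is a hypothetical name, and the rule that the audio URL is the script path with its extension swapped for `.wav` under `/audio` is inferred from the example above.

```javascript
const path = require('node:path');

// Hypothetical helper: builds one audio-manifest entry for a script file.
function buildManifestEntry(scriptSource, sizeBytes, tokenCount, now = new Date()) {
  // Assumption: the WAV path mirrors the script path with a .wav extension.
  const wavPath = scriptSource.replace(/\.mdx?$/, '.wav');
  return {
    audioUrl: path.posix.join('/audio', wavPath),
    size: sizeBytes,
    format: 'audio/wav',
    tokenCount,
    generatedAt: now.toISOString(),
    scriptSource,
  };
}
```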
75142

76143
## Processing Pipeline
77144

145+
### Script Generation (generate-podcast-script.js)
78146
1. **Content Discovery**: Scans `website/docs/` for .md/.mdx files
79147
2. **Content Parsing**: Strips frontmatter, JSX, code blocks
80-
3. **Dialogue Generation**: Uses Gemini 2.5 Flash to create conversational script
81-
4. **Audio Synthesis**: Uses Gemini 2.5 Flash TTS with multi-speaker config
82-
5. **File Output**: Saves WAV files and updates manifest
148+
3. **Prompt Engineering**: Builds optimized prompt for Haiku 4.5
149+
4. **Dialog Generation**: Calls Claude Code CLI in headless mode
150+
5. **Script Output**: Saves markdown with frontmatter to `output/podcasts/`
151+
6. **Manifest Update**: Updates script manifest
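
The content-parsing step (step 2) can be approximated with a handful of regular expressions. The sketch below is not the script's actual implementation: `stripForPrompt` is a hypothetical name, and the patterns only cover simple frontmatter, fenced code, MDX `import`/`export` lines, and capitalized JSX tags.

```javascript
// Build the fence marker programmatically so this example can itself
// live inside a fenced code block.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '[\\s\\S]*?' + FENCE, 'g');

// Hypothetical helper: reduces MDX/Markdown to prose suitable for a prompt.
function stripForPrompt(source) {
  return source
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '') // leading frontmatter
    .replace(FENCED_BLOCK, '')                     // fenced code blocks
    .replace(/^(import|export)\s.*$/gm, '')        // MDX import/export lines
    .replace(/<\/?[A-Z][\w.]*[^>]*>/g, '')         // JSX component tags
    .replace(/\n{3,}/g, '\n\n')                    // collapse blank-line runs
    .trim();
}
```

Text nested inside JSX components is kept here; attribute values containing `>` would need a real MDX parser.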
152+
153+
### Audio Synthesis (generate-podcast-audio.js)
154+
1. **Script Discovery**: Scans `output/podcasts/` for markdown files
155+
2. **Script Parsing**: Extracts frontmatter and dialog
156+
3. **Token Validation**: Ensures dialog fits TTS limits
157+
4. **Audio Synthesis**: Calls Gemini 2.5 Flash TTS with multi-speaker config
158+
5. **WAV Creation**: Adds proper headers to PCM data
159+
6. **Audio Output**: Saves to `website/static/audio/`
160+
7. **Manifest Update**: Updates audio manifest
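
Step 5 (WAV creation) amounts to prepending a 44-byte RIFF header to the raw PCM bytes. The following is a minimal sketch, not the script's code: `pcmToWav` is a hypothetical name, and the defaults (24 kHz, mono, 16-bit) are an assumption about the TTS output format that should be checked against the API response.

```javascript
// Hypothetical helper: wraps raw little-endian PCM in a canonical WAV header.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);     // size of the PCM payload

  return Buffer.concat([header, pcm]);
}
```

If the header fields do not match the actual PCM format, players will open the file but play it at the wrong speed or as noise.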
83161

84162
## Configuration
85163

86164
### Models
87-
- **Dialogue**: `gemini-2.5-flash` (text generation)
165+
- **Dialog Generation**: Claude Haiku 4.5 (via Claude Code CLI)
88166
- **TTS**: `gemini-2.5-flash-preview-tts` (audio synthesis)
89167

90168
### Speakers
91-
- **Alex**: "Kore" voice (firm, professional)
92-
- **Sam**: "Charon" voice (neutral, professional)
169+
- **Alex**: "Kore" voice (firm, professional instructor)
170+
- **Sam**: "Charon" voice (neutral, professional engineer)
93171

94-
### Rate Limiting
95-
- 2-second delay between files to avoid API rate limits
96-
- Sequential processing (not parallel)
172+
### Processing
173+
- **Script concurrency**: 3 files at a time (Claude CLI calls)
174+
- **Audio concurrency**: 3 files at a time (API rate limits)
175+
- **Token limits**: 6,000-7,500 tokens per dialog (TTS API constraint)
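
Processing "3 files at a time" can be done with a small worker-pool loop. This is a sketch of one common approach, not necessarily how the scripts implement it; `runWithConcurrency` and its parameters are hypothetical names.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, returning results in input order.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane() {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes; each pulls items until none remain.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

Lowering the `limit` argument is the equivalent of reducing concurrency when rate-limit errors appear. A rejected worker call rejects the whole run in this sketch.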
97176

98177
## Cost Estimation
99178

100-
Using Gemini 2.5 Flash pricing:
101-
- **Text generation**: $0.50 per 1M tokens
179+
### Script Generation (Claude Haiku 4.5)
180+
- **Input**: ~$1.00 per 1M tokens
181+
- **Output**: ~$5.00 per 1M tokens
182+
- **Estimated per lesson**: ~$0.01-0.05 (depends on content length)
183+
184+
### Audio Synthesis (Gemini 2.5 Flash TTS)
102185
- **Audio output**: $10.00 per 1M tokens
186+
- **Estimated per lesson**: ~$0.05-0.10 (6k-7k tokens avg)
103187

104-
Estimated cost for full course (12 lessons):
105-
- **~$0.50-1.50** total (assuming ~200KB content)
188+
### Full Course (12 lessons)
189+
- **Script generation**: ~$0.12-0.60 total (12 × $0.01-0.05)
190+
- **Audio synthesis**: ~$0.60-1.20 total (12 × $0.05-0.10)
191+
- **Combined**: ~$0.72-1.80 total
192+
193+
**Benefits of split pipeline:**
194+
- Regenerate audio without re-prompting LLM (saves script gen costs)
195+
- Edit scripts manually before audio synthesis (reduces audio regeneration)
106196

107197
## Utility Scripts
108198

@@ -118,14 +208,26 @@ This creates `.bak` backups and adds proper RIFF/WAV headers to headerless PCM f
118208

119209
## Troubleshooting
120210

121-
### "No API key found"
211+
### "No API key found" (Gemini)
122212
Set one of the environment variables: `GOOGLE_API_KEY`, `GEMINI_API_KEY`, or `GCP_API_KEY`
123213

214+
### "Failed to spawn Claude CLI"
215+
1. Ensure Claude Code CLI is installed: `npm install -g @anthropic-ai/claude-code`
216+
2. Verify it's in PATH: `which claude` (should return a path)
217+
3. Authenticate: Run `claude` and follow prompts
218+
4. Check permissions: the script passes `--dangerously-skip-permissions`, which bypasses Claude Code's permission prompts, so run it only in a trusted working directory
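
For orientation, a headless CLI call from Node might look like the sketch below. It is an assumption-laden illustration, not the script's code: `buildClaudeArgs` and `runClaude` are hypothetical names, and the `haiku` model alias and the choice to pass the prompt as an argument rather than on stdin are guesses.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical: argument list for a non-interactive (print mode) call.
function buildClaudeArgs(prompt, model = 'haiku') {
  return ['--print', prompt, '--model', model, '--dangerously-skip-permissions'];
}

// Hypothetical: resolves with stdout, rejects on spawn failure or non-zero exit.
function runClaude(prompt, { bin = 'claude' } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, buildClaudeArgs(prompt), { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    // 'error' fires when the binary is missing from PATH (ENOENT).
    child.on('error', (err) => reject(new Error(`Failed to spawn Claude CLI: ${err.message}`)));
    child.on('close', (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`claude exited with code ${code}: ${stderr}`));
    });
  });
}
```

A spawn `error` event usually points at steps 1 and 2 above, while a non-zero exit code with output on stderr more often indicates an authentication problem.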
219+
124220
### "Module not found"
125221
Run `npm install` in the scripts directory
126222

223+
### "No script files found"
224+
Run `npm run generate-podcast-scripts` first before generating audio
225+
226+
### Dialog exceeds token limit
227+
The generated script is longer than the TTS limit (6,000-7,500 tokens per dialog). Regenerate it with stricter length constraints, or edit the script file by hand to shorten it.
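
Before re-running audio synthesis, a rough length check can tell whether an edited script is likely to fit. The sketch below uses a characters-divided-by-four heuristic, which is an assumption and not the tokenizer the TTS API uses; `estimateTokens` and `checkDialogLength` are hypothetical names.

```javascript
const MAX_DIALOG_TOKENS = 7500; // upper bound quoted in this README

// Rough heuristic (assumption): about 4 characters per token for English.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical check: reports the estimate and how far over the limit it is.
function checkDialogLength(dialog, maxTokens = MAX_DIALOG_TOKENS) {
  const tokens = estimateTokens(dialog);
  return { tokens, ok: tokens <= maxTokens, overBy: Math.max(0, tokens - maxTokens) };
}
```

Treat the result as a guide only; the `tokenCount` recorded in the script frontmatter is the better reference when it is available.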
228+
127229
### Corrupted/unplayable WAV files
128-
Gemini API returns raw PCM data without WAV headers. The script now automatically adds headers, but if you have old files, run:
230+
The audio generation script automatically adds proper WAV headers. If you have old files from the legacy script:
129231
```bash
130232
node fix-wav-files.js
131233
```
@@ -134,20 +236,58 @@ node fix-wav-files.js
134236
The Gemini 2.5 Flash TTS model is in preview and may produce background noise in long generations (a known issue as of October 2025).
135237

136238
### Rate limit errors
137-
Increase the delay between files in the script (currently 2000ms)
239+
- Script generation: Reduce concurrency in `generate-podcast-script.js` (currently 3)
240+
- Audio generation: Reduce concurrency in `generate-podcast-audio.js` (currently 3)
138241

139242
## Development
140243

141244
### Test with single file
142-
Modify the script to process only one file for testing:
143245

246+
**Script generation:**
144247
```javascript
145-
// In main(), after finding files:
248+
// In generate-podcast-script.js main(), after finding files:
146249
const files = findMarkdownFiles(DOCS_DIR).slice(0, 1); // Test first file only
147250
```
148251

149-
### Skip CLAUDE.md
150-
The script automatically skips `CLAUDE.md` files (project instructions)
252+
**Audio generation:**
253+
```javascript
254+
// In generate-podcast-audio.js main(), after finding files:
255+
const files = findScriptFiles(SCRIPT_INPUT_DIR).slice(0, 1); // Test first file only
256+
```
257+
258+
### Manual testing workflow
259+
```bash
260+
# 1. Generate single script
261+
cd scripts
262+
# Edit generate-podcast-script.js to slice(0, 1)
263+
npm run generate-podcast-scripts
264+
265+
# 2. Review output
266+
cat output/podcasts/intro.md
267+
268+
# 3. Generate audio from that script
269+
# Edit generate-podcast-audio.js to slice(0, 1)
270+
npm run generate-podcast-audio
271+
272+
# 4. Test audio playback
273+
open ../website/static/audio/intro.wav  # macOS; on Linux use xdg-open
274+
```
275+
276+
### Automatic exclusions
277+
- Scripts automatically skip `CLAUDE.md` files (project instructions)
278+
- Only files with a `.md` or `.mdx` extension are processed
279+
280+
### Version control considerations
281+
**Scripts (`output/podcasts/`)**: Consider version-controlling these for:
282+
- Manual editing capability
283+
- Tracking prompt quality improvements
284+
- Rollback if regeneration produces worse results
285+
286+
**Audio files (`website/static/audio/`)**: Typically excluded from git due to size:
287+
```gitignore
288+
# Add to .gitignore if needed
289+
website/static/audio/**/*.wav
290+
```
151291

152292
## Related Documentation
153293
