Commit b47c048

Added podcast version
1 parent a18d148 commit b47c048

35 files changed: +1395 -27 lines changed

scripts/README.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Podcast Audio Generator

Converts course MDX/Markdown content into podcast-style conversational audio using Google's Gemini API.

## Features

- **Two-speaker dialogue**: Converts technical documentation into natural conversations between Alex (instructor) and Sam (senior engineer)
- **Gemini 2.5 Flash**: Uses `gemini-2.5-flash` for dialogue generation and `gemini-2.5-flash-preview-tts` for speech synthesis
- **Multi-speaker TTS**: Natural voice synthesis with a distinct voice for each speaker
- **Automatic processing**: Scans all course content and generates audio files
- **Manifest generation**: Creates a JSON manifest mapping docs to audio URLs

## Prerequisites

- Node.js 20+
- Google Gemini API key
- Course content in the `website/docs/` directory

## Setup

1. **Get API Key**: Obtain a Gemini API key from [Google AI Studio](https://aistudio.google.com/)

2. **Set Environment Variable** (any one of the following is sufficient):
   ```bash
   export GOOGLE_API_KEY="your-api-key-here"
   # OR
   export GEMINI_API_KEY="your-api-key-here"
   # OR
   export GCP_API_KEY="your-api-key-here"
   ```

3. **Install Dependencies** (already done if you ran setup):
   ```bash
   cd scripts
   npm install
   ```

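A script that accepts any of the three variables from step 2 could resolve the key roughly as sketched below. The helper name `resolveApiKey` and the precedence order are assumptions for illustration; `generate-podcast.js` may check the variables differently.

```javascript
// Hypothetical helper: returns the first non-empty key among the three
// environment variables listed in step 2. The precedence order is assumed.
const API_KEY_VARS = ['GOOGLE_API_KEY', 'GEMINI_API_KEY', 'GCP_API_KEY'];

function resolveApiKey(env = process.env) {
  for (const name of API_KEY_VARS) {
    const value = env[name];
    if (typeof value === 'string' && value.trim() !== '') {
      return value.trim();
    }
  }
  // Mirrors the "No API key found" message covered under Troubleshooting
  throw new Error(`No API key found. Set one of: ${API_KEY_VARS.join(', ')}`);
}
```

Passing the environment in as a parameter keeps the lookup easy to exercise without touching the real `process.env`.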
## Usage

### From website directory:
```bash
cd website
npm run generate-podcast
```

### From scripts directory:
```bash
cd scripts
npm run generate-podcast
```

### Direct execution:
```bash
cd scripts
node generate-podcast.js
```

## Output

- **Audio files**: `website/static/audio/[lesson-path]/[filename].wav`
- **Manifest**: `website/static/audio/manifest.json`

### Manifest Structure

```json
{
  "understanding-the-tools/lesson-1-intro.md": {
    "audioUrl": "/audio/understanding-the-tools/lesson-1-intro.wav",
    "size": 1234567,
    "format": "audio/wav",
    "generatedAt": "2025-10-29T12:34:56.789Z"
  }
}
```

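An entry of this shape could be assembled as in the sketch below. `buildManifestEntry` is a hypothetical helper, not a function confirmed to exist in `generate-podcast.js`, and it assumes `size` is the WAV file size in bytes.

```javascript
// Hypothetical helper: builds one manifest entry for a generated WAV file.
// docPath is relative to website/docs/, as in the example above.
function buildManifestEntry(docPath, sizeInBytes, generatedAt = new Date()) {
  // Swap the .md/.mdx extension for .wav to derive the public audio URL
  const audioPath = docPath.replace(/\.mdx?$/, '.wav');
  return {
    [docPath]: {
      audioUrl: `/audio/${audioPath}`,
      size: sizeInBytes, // assumed to be bytes
      format: 'audio/wav',
      generatedAt: generatedAt.toISOString(),
    },
  };
}

const entry = buildManifestEntry(
  'understanding-the-tools/lesson-1-intro.md',
  1234567,
  new Date('2025-10-29T12:34:56.789Z')
);
console.log(JSON.stringify(entry, null, 2));
```

Merging the returned object into the existing manifest (for example with `Object.assign`) would update a single document without rewriting the other entries.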
## Processing Pipeline

1. **Content Discovery**: Scans `website/docs/` for `.md`/`.mdx` files
2. **Content Parsing**: Strips frontmatter, JSX, and code blocks
3. **Dialogue Generation**: Uses Gemini 2.5 Flash to create a conversational script
4. **Audio Synthesis**: Uses Gemini 2.5 Flash TTS with a multi-speaker config
5. **File Output**: Saves WAV files and updates the manifest

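Step 2 (Content Parsing) can be approximated with a handful of regular expressions, as in the sketch below. The function name `stripForDialogue` and every pattern in it are illustrative assumptions; the actual script may parse content differently, and these regexes will not handle every MDX construct (for example, JSX attributes that contain `>`).

```javascript
// Build the fence marker programmatically so this example does not
// contain a literal code fence of its own.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '[\\s\\S]*?' + FENCE, 'g');

// Rough sketch: reduce an MDX/Markdown document to plain prose.
function stripForDialogue(source) {
  return source
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, '') // leading YAML frontmatter
    .replace(FENCED_BLOCK, '')                      // fenced code blocks
    .replace(/^(import|export)\s.*$/gm, '')         // MDX import/export lines
    .replace(/<\/?[A-Z][^>]*>/g, '')                // JSX component tags (inner text is kept)
    .replace(/\n{3,}/g, '\n\n')                     // collapse leftover blank lines
    .trim();
}
```

Keeping the inner text of JSX components is a design choice: callouts and admonitions often carry prose worth narrating, while the tags themselves are noise for a dialogue prompt.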
## Configuration

### Models
- **Dialogue**: `gemini-2.5-flash` (text generation)
- **TTS**: `gemini-2.5-flash-preview-tts` (audio synthesis)

### Speakers
- **Alex**: "Kore" voice (firm, professional)
- **Sam**: "Puck" voice (upbeat, curious)

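The speaker-to-voice mapping above would typically be passed to the TTS model as a multi-speaker speech config. The sketch below builds such an object; the field names follow the Gemini speech-generation documentation as understood at the time of writing and should be checked against the current docs, and `buildSpeechConfig` is a hypothetical helper name.

```javascript
// Speaker-to-voice mapping taken from this README
const SPEAKERS = [
  { speaker: 'Alex', voiceName: 'Kore' },
  { speaker: 'Sam', voiceName: 'Puck' },
];

// Builds the generation config for a two-speaker TTS request.
// Field names are assumed from the Gemini speech-generation docs.
function buildSpeechConfig(speakers = SPEAKERS) {
  return {
    responseModalities: ['AUDIO'],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: speakers.map(({ speaker, voiceName }) => ({
          speaker,
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        })),
      },
    },
  };
}

console.log(JSON.stringify(buildSpeechConfig(), null, 2));
```

For the voices to be assigned correctly, the speaker names in the config presumably need to match the labels used in the generated script (for example lines beginning with `Alex:` and `Sam:`).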
### Rate Limiting
- 2-second delay between files to avoid API rate limits
- Sequential processing (not parallel)

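Sequential processing with a fixed pause can be sketched as follows. `processSequentially` and its `processFile` callback are hypothetical names; only the 2000 ms delay and the one-at-a-time behavior come from this README.

```javascript
const DELAY_MS = 2000; // delay between files, as stated above

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Processes files one at a time, pausing between them (but not after the last).
// `processFile` is a hypothetical async callback supplied by the caller.
async function processSequentially(files, processFile, delayMs = DELAY_MS) {
  const results = [];
  for (let i = 0; i < files.length; i++) {
    results.push(await processFile(files[i]));
    if (i < files.length - 1) {
      await sleep(delayMs);
    }
  }
  return results;
}
```

Awaiting each file inside the loop, rather than using `Promise.all`, is what keeps requests from overlapping; raising `delayMs` is the simplest response to rate limit errors.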
## Cost Estimation

Using Gemini 2.5 Flash pricing (USD; see the pricing page under Related Documentation for current rates):
- **Text generation**: $0.50 per 1M tokens
- **Audio output**: $10.00 per 1M tokens

Estimated cost for the full course (12 lessons):
- **~$0.50-1.50** total (assuming ~200KB of content)

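The arithmetic behind such an estimate is a per-million-token rate applied to each token count. The sketch below uses the two rates listed above; the token counts in the usage line are made-up inputs for illustration, not measurements from this course.

```javascript
// Rates copied from this README (USD per 1M tokens); they may be out of date.
const TEXT_RATE_PER_MILLION = 0.5;
const AUDIO_RATE_PER_MILLION = 10.0;

// Returns the estimated cost in USD for the given token counts.
function estimateCostUsd({ textTokens, audioTokens }) {
  const textCost = (textTokens / 1_000_000) * TEXT_RATE_PER_MILLION;
  const audioCost = (audioTokens / 1_000_000) * AUDIO_RATE_PER_MILLION;
  return textCost + audioCost;
}

// Hypothetical token counts, for illustration only
console.log(estimateCostUsd({ textTokens: 200_000, audioTokens: 100_000 }).toFixed(2)); // prints "1.10"
```

Because audio output is priced 20 times higher per token than text in these figures, the audio token count dominates the total.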
## Utility Scripts

### fix-wav-files.js
Repairs corrupted WAV files (adds missing headers to raw PCM data):

```bash
cd scripts
node fix-wav-files.js
```

This creates `.bak` backups and adds proper RIFF/WAV headers to headerless PCM files.

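Because the repair prepends a fixed 44-byte header and assumes 24 kHz, 16-bit mono PCM (the values hard-coded in `fix-wav-files.js`), a repaired file's size implies its playback length. The sketch below is a hypothetical sanity check built on that arithmetic, not part of the utility itself.

```javascript
// Constants mirror the values hard-coded in fix-wav-files.js
const WAV_HEADER_BYTES = 44;
const SAMPLE_RATE = 24000; // Hz
const CHANNELS = 1; // mono
const BYTES_PER_SAMPLE = 2; // 16-bit samples

// Hypothetical helper: playback length implied by a repaired file's size.
function wavDurationSeconds(fileSizeBytes) {
  const pcmBytes = Math.max(0, fileSizeBytes - WAV_HEADER_BYTES);
  return pcmBytes / (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE);
}

console.log(wavDurationSeconds(44 + 48000)); // 48000 PCM bytes is one second of audio
```

If the computed duration is far from what the lesson length suggests, the file probably was not raw 24 kHz mono PCM to begin with, and the `.bak` backup should be kept.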
## Troubleshooting

### "No API key found"
Set one of the environment variables: `GOOGLE_API_KEY`, `GEMINI_API_KEY`, or `GCP_API_KEY`.

### "Module not found"
Run `npm install` in the `scripts` directory.

### Corrupted/unplayable WAV files
The Gemini API returns raw PCM data without WAV headers. The script now adds headers automatically, but if you have files generated by an older version, run:
```bash
node fix-wav-files.js
```

### Audio quality issues
The Gemini 2.5 Flash TTS model is in preview and may produce background noise in long generations (known issue as of October 2025).

### Rate limit errors
Increase the delay between files in the script (currently 2000ms).

## Development

### Test with single file
Modify the script to process only one file for testing:

```javascript
// In main(), after finding files:
const files = findMarkdownFiles(DOCS_DIR).slice(0, 1); // Test first file only
```

### Skip CLAUDE.md
The script automatically skips `CLAUDE.md` files (project instructions).

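The discovery and skip rules can be expressed as a single predicate, sketched below. `shouldProcess` is a hypothetical name; the script's own `findMarkdownFiles` may apply these rules in a different way.

```javascript
// Hypothetical filter: accept .md/.mdx files, skip any file named CLAUDE.md.
function shouldProcess(filePath) {
  // Take the last path segment, handling both / and \ separators
  const fileName = filePath.split(/[\\/]/).pop();
  if (fileName === 'CLAUDE.md') return false;
  return /\.mdx?$/.test(fileName);
}

console.log(shouldProcess('docs/understanding-the-tools/lesson-1-intro.md')); // true
console.log(shouldProcess('docs/CLAUDE.md')); // false
```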
## Related Documentation

- [Gemini API - Speech Generation](https://ai.google.dev/gemini-api/docs/speech-generation)
- [Gemini API - Models](https://ai.google.dev/gemini-api/docs/models)
- [Gemini API - Pricing](https://ai.google.dev/gemini-api/docs/pricing)

scripts/fix-wav-files.js

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
#!/usr/bin/env node

/**
 * Fix existing WAV files by adding proper headers
 * Converts raw PCM data to valid WAV format
 */

import { readdirSync, readFileSync, writeFileSync, statSync, renameSync } from 'fs';
import { join, dirname } from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

const AUDIO_OUTPUT_DIR = join(__dirname, '../website/static/audio');

function createWavHeader(pcmDataLength) {
  const header = Buffer.alloc(44);

  // RIFF chunk descriptor
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + pcmDataLength, 4); // file size minus the 8-byte RIFF preamble
  header.write('WAVE', 8);

  // fmt subchunk
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16); // fmt subchunk size (16 for PCM)
  header.writeUInt16LE(1, 20); // audio format: 1 = uncompressed PCM
  header.writeUInt16LE(1, 22); // channels: 1 (mono)
  header.writeUInt32LE(24000, 24); // sample rate: 24 kHz
  header.writeUInt32LE(24000 * 1 * 2, 28); // byte rate = sample rate * channels * bytes per sample
  header.writeUInt16LE(1 * 2, 32); // block align = channels * bytes per sample
  header.writeUInt16LE(16, 34); // bits per sample

  // data subchunk
  header.write('data', 36);
  header.writeUInt32LE(pcmDataLength, 40); // size of the PCM payload in bytes

  return header;
}

function findWavFiles(dir) {
  const files = [];

  function traverse(currentDir) {
    const items = readdirSync(currentDir);

    for (const item of items) {
      const fullPath = join(currentDir, item);
      const stat = statSync(fullPath);

      if (stat.isDirectory()) {
        traverse(fullPath);
      } else if (item.endsWith('.wav')) {
        files.push(fullPath);
      }
    }
  }

  traverse(dir);
  return files;
}

function isValidWav(buffer) {
  if (buffer.length < 12) return false;
  // A WAV file has 'RIFF' at bytes 0-3 and 'WAVE' at bytes 8-11
  return (
    buffer.toString('ascii', 0, 4) === 'RIFF' &&
    buffer.toString('ascii', 8, 12) === 'WAVE'
  );
}

async function fixWavFile(filePath) {
  console.log(`\n📄 ${filePath}`);

  const buffer = readFileSync(filePath);

  // Check if already valid
  if (isValidWav(buffer)) {
    console.log(' ✅ Already valid WAV file - skipping');
    return { status: 'skipped', path: filePath };
  }

  console.log(' 🔧 Adding WAV header...');

  // Buffer is raw PCM - add WAV header
  const wavHeader = createWavHeader(buffer.length);
  const wavBuffer = Buffer.concat([wavHeader, buffer]);

  // Backup original
  const backupPath = filePath + '.bak';
  renameSync(filePath, backupPath);

  // Write fixed file
  writeFileSync(filePath, wavBuffer);

  // Verify
  const verifyBuffer = readFileSync(filePath);
  if (isValidWav(verifyBuffer)) {
    console.log(' ✅ Fixed successfully');
    console.log(` Original: ${(buffer.length / 1024 / 1024).toFixed(2)} MB`);
    console.log(` Fixed: ${(wavBuffer.length / 1024 / 1024).toFixed(2)} MB`);
    console.log(` Backup: ${backupPath}`);
    return { status: 'fixed', path: filePath, backup: backupPath };
  } else {
    console.log(' ❌ Fix failed - restoring backup');
    renameSync(backupPath, filePath);
    return { status: 'failed', path: filePath };
  }
}

async function main() {
  console.log('🔧 WAV File Repair Utility\n');
  console.log(`📂 Audio directory: ${AUDIO_OUTPUT_DIR}\n`);

  const files = findWavFiles(AUDIO_OUTPUT_DIR);
  console.log(`Found ${files.length} WAV files\n`);

  if (files.length === 0) {
    console.log('No WAV files to process.');
    return;
  }

  console.log('='.repeat(60));

  const results = { fixed: 0, skipped: 0, failed: 0 };

  for (const file of files) {
    try {
      const result = await fixWavFile(file);
      results[result.status]++;
    } catch (error) {
      console.error(` ❌ Error: ${error.message}`);
      results.failed++;
    }
  }

  console.log('\n' + '='.repeat(60));
  console.log('\n📊 Summary:');
  console.log(` ✅ Fixed: ${results.fixed}`);
  console.log(` ⏭️ Skipped (already valid): ${results.skipped}`);
  console.log(` ❌ Failed: ${results.failed}`);

  if (results.fixed > 0) {
    console.log('\n💡 Tip: .bak files can be deleted after verifying audio playback');
  }

  console.log('\n✨ Done!\n');
}

main().catch(error => {
  console.error('\n💥 Fatal error:', error);
  process.exit(1);
});
