Aletheia Video GeneratorAletheia Video Generator
Aletheia Video Generator
Turns ElevenLabs audiobook output (audio + SRT) into a faceless video with AI-generated contextual visuals.
Setup
pip install replicate anthropic moviepy srt
Set your API keys:
export REPLICATE_API_TOKEN="r8_your_key_here"
export ANTHROPIC_API_KEY="sk-ant-your_key_here"
Workflow
Step 1: Generate prompts only (review before spending on video generation)
python generate_video.py \
--srt reeks01.srt \
--audio reeks01.mp3 \
--prompts-only \
--save-prompts prompts.json
This parses your SRT, groups cues into paragraphs, and uses Claude to generate a cinematic video prompt per paragraph. The output prompts.json looks like:
[
{
"type": "paragraph",
"start": 0.0,
"end": 17.0,
"duration": 17.0,
"text": "Sessie 1: Wat bedoel ons met God...",
"prompt": "Slow dolly through ancient cathedral nave, golden light streaming through stained glass"
},
{
"type": "separator",
"start": 38.0,
"end": 40.0
}
]
Edit the prompts in this file if any don’t feel right, then proceed to step 2.
Step 2: Generate video using your reviewed prompts
python generate_video.py \
--srt reeks01.srt \
--audio reeks01.mp3 \
--prompts-file prompts.json \
--output reeks01_video.mp4
This calls Replicate Wan 2.1 for each paragraph, then stitches the clips with the audio into a final MP4.
One-shot (skip review)
python generate_video.py \
--srt reeks01.srt \
--audio reeks01.mp3 \
--output reeks01_video.mp4 \
--save-prompts prompts.json
Generates prompts, saves them for reference, generates clips, and stitches — all in one go.
Options
| Flag | Default | Description |
|---|---|---|
--srt |
required | Path to SRT file from ElevenLabs |
--audio |
required | Path to audio file (MP3/WAV) |
--output |
output.mp4 |
Output video path |
--cues-per-clip |
4 |
Number of SRT cues grouped per video clip |
--prompts-file |
— | Load prompts from JSON (skips Claude) |
--save-prompts |
— | Save prompts to JSON for review |
--prompts-only |
— | Only generate prompts, don’t make video |
--video-model |
wavespeedai/wan-2.1-t2v-480p |
Replicate model to use |
Tuning
--cues-per-clip controls visual pacing. With the default of 4, you get a new visual roughly every 10–15 seconds. Set it lower (2–3) for faster cuts, higher (6–8) for longer held shots.
Video model alternatives on Replicate:
wavespeedai/wan-2.1-t2v-480p— fastest, cheapest (~$0.02/clip)wan-video/wan-2.1-t2v-720p— higher resolution, slower (~$0.08/clip)
Cost estimate
For 8 sessions (~2,400 words, ~18 min audio):
- ~40–50 paragraphs at 4 cues each
- Claude prompt generation: ~$0.10
- Replicate Wan 480p: ~$0.02 × 45 clips = ~$0.90
- Total: ~$1.00
After generation
Drop the final MP4 into ElevenLabs Studio to:
- Add captions (Studio auto-transcribes)
- Add background music
- Fine-tune any cuts
© Attie Retief, 2025