AGENTS.md — VideoMakerBot Development Guide
Project Overview
VideoMakerBot — Automated short-form video creator from social media content.
Status: Production-ready, actively maintained (v3.4.0)
Language: Python 3.10+
Platforms: Reddit (original), Threads (NEW), X/Twitter (planned)
Core Mission
Transforms social media threads (post + comments/replies) into complete short-form videos with:
- AI-generated speech (7+ TTS providers)
- UI screenshots (Playwright)
- Background video/audio overlays
- FFmpeg composition & output
Architecture at a Glance
main.py (CLI)
↓ [platform factory]
├─→ reddit/subreddit.py [PRAW API]
└─→ platforms/threads/fetcher.py [Graph API]
↓ [standard data dict]
├─→ TTS/engine_wrapper.py [7+ providers]
├─→ video_creation/screenshot_downloader.py (Reddit)
│ or platforms/threads/screenshot.py (Threads)
├─→ video_creation/background.py
└─→ video_creation/final_video.py [FFmpeg]
↓
results/{category}/{video.mp4}
Key Design: Platform Abstraction via Factory Pattern
Why: Single codebase supports multiple platforms without tight coupling.
How: platforms/__init__.py exports:
- get_content_object(POST_ID=None): routes to the right fetcher
- get_screenshot_fn(): routes to the right screenshotter
Result: Adding X/Twitter requires only: new module + config section + two elif branches.
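To make the routing concrete, here is a minimal sketch of a factory of this kind. It is not the project's actual platforms/__init__.py: the stub fetchers, the `_FETCHERS` registry, and the explicit `platform` parameter are assumptions for illustration (the real factory presumably reads the platform from the config and uses if/elif branches, as described above).

```python
from typing import Callable, Dict, Optional


# Stub fetchers standing in for reddit/subreddit.py and
# platforms/threads/fetcher.py. Their bodies are hypothetical.
def _fetch_reddit(POST_ID: Optional[str] = None) -> dict:
    return {"thread_id": POST_ID or "abc123", "thread_category": "AskReddit", "comments": []}


def _fetch_threads(POST_ID: Optional[str] = None) -> dict:
    return {"thread_id": POST_ID or "987654", "thread_category": "threads", "comments": []}


# A dict registry is used here instead of an if/elif chain;
# both give the same routing behaviour.
_FETCHERS: Dict[str, Callable[[Optional[str]], dict]] = {
    "reddit": _fetch_reddit,
    "threads": _fetch_threads,
}


def get_content_object(POST_ID: Optional[str] = None, platform: str = "reddit") -> dict:
    """Route to the fetcher registered for `platform`."""
    try:
        fetcher = _FETCHERS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform!r}") from None
    return fetcher(POST_ID)
```

With this shape, supporting another platform means registering one more fetcher; callers such as main.py keep calling `get_content_object()` unchanged.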
Data Contract: The "content_object" Dict
All fetchers return this shape (defined in platforms/__init__.py):
{
# Unique identifiers
"thread_id": str, # Used for temp folder: assets/temp/{id}/
"thread_category": str, # "reddit", "threads", etc. → output folder
# Content
"thread_title": str, # TTS as title + output filename
"thread_url": str, # Playwright navigates here for screenshot
"is_nsfw": bool, # Content filter flag
# Replies/Comments (mutually exclusive with thread_post)
"comments": [
{
"comment_body": str, # TTS per reply
"comment_url": str, # Playwright navigates here
"comment_id": str, # CSS selector ID or unique identifier
}
],
# OR Story mode:
"thread_post": str | list, # Long-form text (no comments)
}
Why: Loose coupling—TTS, backgrounds, and video composition don't need platform-specific logic.
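The contract above can be written down as types plus a small check that a new fetcher can run in its tests. This is a sketch only: `ContentObject`, `Comment`, `REQUIRED_KEYS`, and `validate_content_object` are hypothetical names, not part of the codebase, and treating the five non-comment keys as required is an assumption.

```python
from typing import List, TypedDict, Union


class Comment(TypedDict):
    comment_body: str
    comment_url: str
    comment_id: str


class ContentObject(TypedDict, total=False):
    thread_id: str
    thread_category: str
    thread_title: str
    thread_url: str
    is_nsfw: bool
    comments: List[Comment]          # comment mode
    thread_post: Union[str, list]    # story mode


# Assumption: these keys are required in both modes.
REQUIRED_KEYS = ("thread_id", "thread_category", "thread_title", "thread_url", "is_nsfw")


def validate_content_object(obj: dict) -> List[str]:
    """Return a list of problems; an empty list means the dict fits the contract."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in obj]
    # "comments" and "thread_post" are mutually exclusive.
    if ("comments" in obj) == ("thread_post" in obj):
        problems.append("exactly one of 'comments' or 'thread_post' is required")
    for i, comment in enumerate(obj.get("comments", [])):
        for k in ("comment_body", "comment_url", "comment_id"):
            if k not in comment:
                problems.append(f"comments[{i}] missing key: {k}")
    return problems
```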
File Organization
VideoMakerBot/
├── platforms/ # Multi-platform abstraction
│ ├── __init__.py # Factory: get_content_object(), get_screenshot_fn()
│ └── threads/ # Threads (Meta) implementation
│ ├── fetcher.py # Graph API → content_object
│ └── screenshot.py # Playwright Threads screenshotter
│
├── reddit/ # Reddit implementation (kept as-is)
│ └── subreddit.py # PRAW API → content_object + thread_category
│
├── video_creation/
│ ├── final_video.py # FFmpeg composition (platform-aware folder naming)
│ ├── screenshot_downloader.py # Playwright Reddit UI capturer
│ ├── voices.py # TTS orchestrator (platform-agnostic)
│ ├── background.py # Video/audio downloader (platform-agnostic)
│ └── data/
│ ├── videos.json # Dedup tracker
│ ├── cookie-dark-mode.json # Reddit theme cookie
│ └── cookie-threads.json # Threads session cookie (auto-created)
│
├── TTS/ # Text-to-Speech
│ ├── engine_wrapper.py # Provider abstraction + post_lang fallback
│ ├── elevenlabs.py, aws_polly.py, etc. # 7+ provider implementations
│
├── utils/
│ ├── settings.py # Config loading + validation
│ ├── videos.py # check_done() + check_done_by_id()
│ ├── console.py # Rich terminal output
│ ├── .config.template.toml # Config schema (platform sections)
│ └── ... (id, voice, cleanup, etc.)
│
├── main.py # CLI entry (platform-routed via factory)
├── GUI.py # Flask web UI (localhost:4000 in host mode, 0.0.0.0 in Docker)
├── requirements.txt # Dependencies
└── AGENTS.md / AGENT.md # This file + agent guidelines
Configuration
File: utils/.config.template.toml (schema) → config.toml (user config)
Platform Selection
[settings]
platform = "reddit" # or "threads"
post_lang = "es-cr" # Optional: translation language (all platforms)
Reddit Config
[reddit.creds]
client_id = "..." # OAuth app
client_secret = "..."
username = "..."
password = "..."
2fa = false # Set to true if the account uses two-factor authentication
[reddit.thread]
subreddit = "AskReddit"
post_id = "" # Leave blank for auto-pick
max_comment_length = 500
min_comment_length = 1
min_comments = 20
blocked_words = "..."
Threads Config (NEW)
[threads.creds]
access_token = "EAABsbCS..." # Meta Graph API token (60-day expiry)
user_id = "12345678901234567"
username = "your_insta" # For Playwright login
password = "your_password"
[threads.thread]
post_id = "" # Leave blank for auto-pick
max_reply_length = 500
min_reply_length = 1
min_replies = 5
blocked_words = "..."
Generic Settings
[settings]
theme = "dark"
resolution_w = 1080
resolution_h = 1920
storymode = false
times_to_run = 1
[settings.tts]
voice_choice = "tiktok" # or "elevenlabs", "awspolly", "googletranslate", etc.
random_voice = true
silence_duration = 0.3
[settings.background]
background_video = "minecraft"
background_audio = "lofi"
background_audio_volume = 0.15
Development Guidelines
✅ DO:
- Use the platform factory in main.py
    from platforms import get_content_object, get_screenshot_fn
    reddit_object = get_content_object(POST_ID)
    screenshot_fn = get_screenshot_fn()
    screenshot_fn(reddit_object, number_of_comments)
- Return the standard content dict from all fetchers
    return {
        "thread_id": ...,
        "thread_category": ...,  # NEW: replaces hardcoded subreddit
        "comments": [...]
    }
- Use config fallback chains for cross-platform keys
    lang = (settings.config["settings"].get("post_lang")
            or settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))
- Read thread_category from the dict instead of the config
    # WRONG:
    subreddit = settings.config["reddit"]["thread"]["subreddit"]
    # RIGHT:
    platform = settings.config["settings"].get("platform", "reddit")
    if platform == "reddit":
        subreddit = settings.config["reddit"]["thread"]["subreddit"]
    else:
        subreddit = reddit_obj.get("thread_category", platform)
- Test both platforms after core pipeline changes
    # Test Reddit (must not regress)
    sed -i 's/platform = "threads"/platform = "reddit"/' config.toml
    python3 main.py
    # Test Threads
    sed -i 's/platform = "reddit"/platform = "threads"/' config.toml
    python3 main.py --post-id <threads-id>
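As an illustration of reading thread_category from the content dict, the following sketch derives the output folder without touching any platform-specific config. `resolve_output_dir` and the sanitising rule are hypothetical; the real logic lives in video_creation/final_video.py and may differ.

```python
from pathlib import Path


def resolve_output_dir(content_object: dict, results_root: str = "results") -> Path:
    """Build results/{category}/ from the content dict alone."""
    category = content_object.get("thread_category") or "unknown"
    # Assumption: keep only characters that are safe in a folder name.
    safe = "".join(ch for ch in category if ch.isalnum() or ch in "-_") or "unknown"
    return Path(results_root) / safe
```

A Reddit content dict with thread_category "AskReddit" maps to results/AskReddit, and a Threads dict maps to results/threads, with no branch on the platform name.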
❌ DON'T:
- Don't import platform modules directly in main.py/utils
    # WRONG:
    from reddit.subreddit import get_subreddit_threads
    # RIGHT:
    from platforms import get_content_object
- Don't hardcode platform names in generic modules
    # WRONG in final_video.py:
    subreddit = settings.config["reddit"]["thread"]["subreddit"]
    # RIGHT:
    subreddit = reddit_obj.get("thread_category", "unknown")
- Don't add platform-specific UI selectors outside platforms/{platform}/screenshot.py
  - Reddit selectors stay in video_creation/screenshot_downloader.py
  - Threads selectors stay in platforms/threads/screenshot.py
- Don't assume config keys exist without a fallback
    # WRONG:
    lang = settings.config["reddit"]["thread"]["post_lang"]
    # RIGHT:
    lang = settings.config.get("settings", {}).get("post_lang", "")
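The fallback rule can be wrapped in a small helper so that chained .get() calls are not repeated across modules. This is a sketch under assumptions: `get_nested` and `resolve_post_lang` are hypothetical names, and the config is assumed to be a plain nested dict as loaded from config.toml.

```python
from typing import Any


def get_nested(config: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts; return `default` as soon as a key is missing."""
    node: Any = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def resolve_post_lang(config: dict) -> str:
    """Prefer [settings].post_lang, then the legacy Reddit key, then empty."""
    return (
        get_nested(config, "settings", "post_lang")
        or get_nested(config, "reddit", "thread", "post_lang")
        or ""
    )
```

For example, `resolve_post_lang({"settings": {"post_lang": "es-cr"}})` returns "es-cr", while an empty config returns an empty string instead of raising KeyError.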
Platform-Specific Knowledge
Reddit
- API: PRAW (Python Reddit API Wrapper)
- Auth: OAuth app (client_id, secret) + username/password
- Screenshot: Playwright on reddit.com/new.reddit.com
  - Login form: input[name="username"], input[name="password"]
  - Post selector: [data-test-id="post-content"]
  - Comment selector: #t1_{comment_id}
- NSFW: submission.over_18
- Output folder: results/{subreddit}/
Threads
- API: Meta Graph API (v18.0+)
- Auth: User access token (60-day lifetime) via https://developers.facebook.com/
- Screenshot: Playwright on threads.net
  - Login form: input[autocomplete="username"], input[autocomplete="current-password"]
  - Post selector: article (universal, more stable than Reddit)
  - Cookies saved to: video_creation/data/cookie-threads.json
- NSFW: API doesn't provide; always False
- Output folder: results/threads/
Future: X/Twitter
Create: platforms/twitter/fetcher.py + platforms/twitter/screenshot.py + config section
Update: platforms/__init__.py with elif platform == "twitter" branches
Extending the Project
Adding a New TTS Provider
- Create TTS/my_provider.py with a class implementing the TTS interface
- Add config keys to [settings.tts] in .config.template.toml
- Update TTS/engine_wrapper.py to call your provider
- Test with settings.config["settings"]["tts"]["voice_choice"] = "my_provider"
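The steps above can be sketched as a provider class. The interface shown here (a `max_chars` attribute and a `run(text, filepath, random_voice)` method) is an assumption, not confirmed by this guide; check an existing provider in TTS/ for the real shape. The body writes silence to a WAV file as a placeholder for a real synthesis call.

```python
import wave


class MyProvider:
    """Hypothetical TTS provider; the method and attribute names are assumptions."""

    SAMPLE_RATE = 16000      # samples per second
    FRAMES_PER_CHAR = 800    # placeholder: 0.05 s of audio per character

    def __init__(self) -> None:
        self.max_chars = 300  # assumed per-request limit

    def run(self, text: str, filepath: str, random_voice: bool = False) -> None:
        """Write audio for `text` to `filepath` (silence stands in for speech)."""
        if len(text) > self.max_chars:
            raise ValueError(f"text exceeds {self.max_chars} characters")
        n_frames = self.FRAMES_PER_CHAR * len(text)
        with wave.open(filepath, "wb") as wav:
            wav.setnchannels(1)                  # mono
            wav.setsampwidth(2)                  # 16-bit samples
            wav.setframerate(self.SAMPLE_RATE)
            wav.writeframes(b"\x00\x00" * n_frames)
```

A real provider would replace the silence with a call to its speech API and write whatever audio format the rest of the pipeline expects.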
Adding a New Platform (e.g., X/Twitter)
- Create fetcher: platforms/twitter/fetcher.py
  - Implement get_twitter_content(POST_ID=None) returning the standard dict
- Create screenshotter: platforms/twitter/screenshot.py
  - Implement get_screenshots_of_twitter_posts(content_object, screenshot_num)
- Update config: Add [twitter.creds] and [twitter.thread] sections
- Update factory: Add elif platform == "twitter" in platforms/__init__.py
- Update CLI helper: Add a case to _get_platform_post_id() in main.py
- Test: Verify Reddit mode still works, then test Twitter mode end-to-end
Zero changes needed to: TTS, backgrounds, video composition, or utils.
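A fetcher for the planned platform might look like the sketch below. Only the function name `get_twitter_content` and the returned keys come from this guide; `_fetch_raw_thread`, its canned data, and the URL layout are placeholders, since no X/Twitter client exists in the project yet.

```python
from typing import Optional


def _fetch_raw_thread(post_id: Optional[str]) -> dict:
    """Placeholder for a real API call; returns canned data for illustration."""
    return {
        "id": post_id or "1234567890",
        "author": "example_user",
        "text": "Example post text",
        "replies": [
            {"id": "1234567891", "author": "replier", "text": "First reply"},
        ],
    }


def get_twitter_content(POST_ID: Optional[str] = None) -> dict:
    """Map the raw thread onto the standard content_object dict."""
    raw = _fetch_raw_thread(POST_ID)
    return {
        "thread_id": raw["id"],
        "thread_category": "twitter",  # becomes results/twitter/
        "thread_title": raw["text"],
        "thread_url": f"https://x.com/{raw['author']}/status/{raw['id']}",
        "is_nsfw": False,  # assumption: no NSFW signal available
        "comments": [
            {
                "comment_body": reply["text"],
                "comment_url": f"https://x.com/{reply['author']}/status/{reply['id']}",
                "comment_id": reply["id"],
            }
            for reply in raw["replies"]
        ],
    }
```

Because the return value follows the same contract as the Reddit and Threads fetchers, the downstream TTS and composition steps need no changes.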
Debugging Tips
"No matching distribution found for yt-dlp==2026.3.17"
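→ The pinned version does not exist on PyPI. yt-dlp uses date-based versions, which PyPI lists without leading zeros (YYYY.M.D). Pin a published release such as 2025.10.14 (latest stable at the time of writing).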
"Threads API: Invalid or expired access_token"
→ Meta tokens expire every 60 days. Refresh at https://developers.facebook.com/tools/explorer/
Playwright timeout on Threads screenshot
→ Login cookies corrupted or expired. Delete video_creation/data/cookie-threads.json to force fresh login next run.
"No eligible Threads posts found"
→ Configure [threads.thread].min_replies = 5 (or lower). Ensure your Threads account has public posts with replies.
Video dedup not working
→ Check video_creation/data/videos.json is writable. Ensure check_done_by_id() is called before fetching content.
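For the dedup tip, the following sketch shows the kind of check that has to run before fetching. It is a hypothetical stand-in for check_done_by_id(), not the code in utils/videos.py: the function names, the explicit path argument, and the assumed file layout (a JSON list of objects with an "id" key) are all illustrative.

```python
import json
from pathlib import Path


def is_already_done(thread_id: str, tracker_path: str) -> bool:
    """Return True if `thread_id` is recorded in the tracker file."""
    path = Path(tracker_path)
    if not path.exists():
        return False
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # treat a corrupt tracker as empty
    return any(e.get("id") == thread_id for e in entries if isinstance(e, dict))


def mark_done(thread_id: str, tracker_path: str) -> None:
    """Append `thread_id` to the tracker, creating the file if needed."""
    path = Path(tracker_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append({"id": thread_id})
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
```

If the tracker file is not writable, `mark_done` raises and the same post is picked again on the next run, which matches the symptom described above.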
Testing Checklist
- Reddit mode: platform = "reddit" produces video to results/{subreddit}/
- Threads mode: platform = "threads" produces video to results/threads/
- Video dedup: Running same post_id twice skips second run
- Translation: post_lang = "es" translates filenames
- TTS providers: Test with different voice_choice values
- Background selection: Custom background video/audio works
- Story mode: storymode=true only uses thread_post, not comments
- Error handling: Invalid credentials show clear messages
Key Files to Know
| File | Purpose |
|---|---|
| main.py | CLI entry; orchestrates pipeline via factory |
| platforms/__init__.py | Factory dispatch for multi-platform support |
| platforms/threads/fetcher.py | Threads Graph API client |
| platforms/threads/screenshot.py | Threads.net Playwright screenshotter |
| video_creation/final_video.py | FFmpeg composition; platform-aware output naming |
| TTS/engine_wrapper.py | TTS provider abstraction; post_lang fallback |
| utils/settings.py | Config loading & validation |
| utils/videos.py | Video dedup tracking |
| utils/.config.template.toml | Config schema |
| requirements.txt | Dependencies |
Useful Commands
# Install dependencies
pip install -r requirements.txt
# Run CLI
python3 main.py
# Run with specific post
python3 main.py <post_id>
# Run Flask GUI
python3 GUI.py
# Check syntax
python3 -m py_compile main.py platforms/threads/fetcher.py
# Format code
black main.py platforms/ utils/
# Lint
pylint main.py
Docker Workflow
- Use docker compose build to build the shared image for both CLI and GUI.
- Use docker compose up gui to run the Flask app on port 4000.
- Use docker compose run --rm cli to run the video generator in a container.
- The repo root is bind-mounted in Compose, so config.toml, results/, assets/temp/, video_creation/data/videos.json, and utils/backgrounds.json should persist across runs.
- The GUI must bind to 0.0.0.0 in Docker; do not switch it back to localhost for container use.
When You Get Stuck
- "What does this module do?" → Check imports in main.py or docstrings
- "How do I add support for platform X?" → See "Adding a New Platform" section above
- "Why is my config not being read?" → Check utils/settings.py:check_toml() and .config.template.toml schema
- "Why isn't my TTS provider being called?" → Check TTS/engine_wrapper.py:make_voice() and config voice_choice
- "How do I debug the Playwright screenshot?" → Uncomment page.pause() in screenshot downloader, run headful browser
Good luck! 🚀