AGENTS.md — VideoMakerBot Development Guide

Project Overview

VideoMakerBot — Automated short-form video creator from social media content.

Status: Production-ready, actively maintained (v3.4.0)
Language: Python 3.10+
Platforms: Reddit (original), Threads (NEW), X/Twitter (planned)

Core Mission

Transforms social media threads (post + comments/replies) into complete short-form videos with:

AI-generated speech (7+ TTS providers)
UI screenshots (Playwright)
Background video/audio overlays
FFmpeg composition & output

Architecture at a Glance

main.py (CLI)
    ↓ [platform factory]
    ├─→ reddit/subreddit.py [PRAW API]
    └─→ platforms/threads/fetcher.py [Graph API]
        ↓ [standard data dict]
        ├─→ TTS/engine_wrapper.py [7+ providers]
        ├─→ screenshot_downloader.py (Reddit)
        │   or platforms/threads/screenshot.py (Threads)
        ├─→ video_creation/background.py
        └─→ video_creation/final_video.py [FFmpeg]
            ↓
            results/{category}/{video.mp4}

Key Design: Platform Abstraction via Factory Pattern

Why: Single codebase supports multiple platforms without tight coupling.

How: platforms/__init__.py exports:

get_content_object(POST_ID=None): routes to the right fetcher
get_screenshot_fn(): routes to the right screenshotter

Result: Adding X/Twitter requires only a new module, a config section, and two elif branches in the factory (plus one case in main.py's _get_platform_post_id() helper).
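A minimal sketch of that dispatch, assuming each platform module exposes a single fetch function. The real get_content_object(POST_ID=None) reads the platform from the global settings.config. Here the config is passed in explicitly, and the two fetchers (fetch_reddit, fetch_threads) are hypothetical stubs, so the example runs standalone.

```python
from typing import Any, Optional

ContentObject = dict[str, Any]


# Hypothetical stand-ins for reddit/subreddit.py and platforms/threads/fetcher.py.
def fetch_reddit(post_id: Optional[str] = None) -> ContentObject:
    return {"thread_id": post_id or "abc123", "thread_category": "AskReddit", "comments": []}


def fetch_threads(post_id: Optional[str] = None) -> ContentObject:
    return {"thread_id": post_id or "987654", "thread_category": "threads", "comments": []}


def get_content_object(config: dict, post_id: Optional[str] = None) -> ContentObject:
    """Route to the fetcher for the configured platform (defaults to Reddit)."""
    platform = config.get("settings", {}).get("platform", "reddit")
    if platform == "reddit":
        return fetch_reddit(post_id)
    elif platform == "threads":
        return fetch_threads(post_id)
    # A new platform (e.g. "twitter") gets its own elif branch here.
    raise ValueError(f"Unsupported platform: {platform!r}")
```

get_screenshot_fn() would follow the same if/elif shape, but it returns the function itself instead of calling it. This lets main.py call it later with the content object.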

Data Contract: The "content_object" Dict

All fetchers return this shape (defined in platforms/__init__.py):

{
    # Unique identifiers
    "thread_id":       str,           # Used for temp folder: assets/temp/{id}/
    "thread_category": str,           # "reddit", "threads", etc. → output folder

    # Content
    "thread_title":    str,           # TTS as title + output filename
    "thread_url":      str,           # Playwright navigates here for screenshot
    "is_nsfw":         bool,          # Content filter flag

    # Replies/Comments (mutually exclusive with thread_post)
    "comments": [
        {
            "comment_body": str,      # TTS per reply
            "comment_url":  str,      # Playwright navigates here
            "comment_id":   str,      # CSS selector ID or unique identifier
        }
    ],

    # OR Story mode:
    "thread_post":     str | list,    # Long-form text (no comments)
}

Why: Loose coupling—TTS, backgrounds, and video composition don't need platform-specific logic.
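One way to keep fetchers honest is to check their output against this contract before it reaches TTS. The sketch below is hypothetical: there is no validate_content_object in the source. It checks the required keys, the comment entries, and the comments vs. thread_post exclusivity. It assumes that an empty comments list counts as "no comments", which is the story-mode case.

```python
from typing import Any

# Top-level keys every fetcher must return, with their expected types.
REQUIRED_KEYS = {
    "thread_id": str,
    "thread_category": str,
    "thread_title": str,
    "thread_url": str,
    "is_nsfw": bool,
}
COMMENT_KEYS = ("comment_body", "comment_url", "comment_id")


def validate_content_object(obj: dict[str, Any]) -> list[str]:
    """Return a list of contract violations (an empty list means the dict is valid)."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if not isinstance(obj.get(key), expected):
            errors.append(f"{key}: expected {expected.__name__}")
    comments = obj.get("comments") or []
    post = obj.get("thread_post")
    # Assumption: an empty comments list counts as "no comments" (story mode).
    if bool(comments) == bool(post):
        errors.append("exactly one of comments / thread_post must be non-empty")
    for i, comment in enumerate(comments):
        for key in COMMENT_KEYS:
            if not isinstance(comment.get(key), str):
                errors.append(f"comments[{i}].{key}: expected str")
    return errors
```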

File Organization

VideoMakerBot/
├── platforms/                      # Multi-platform abstraction
│   ├── __init__.py                # Factory: get_content_object(), get_screenshot_fn()
│   └── threads/                   # Threads (Meta) implementation
│       ├── fetcher.py             # Graph API → content_object
│       └── screenshot.py          # Playwright Threads screenshotter
│
├── reddit/                        # Reddit implementation (kept as-is)
│   └── subreddit.py              # PRAW API → content_object + thread_category
│
├── video_creation/
│   ├── final_video.py            # FFmpeg composition (platform-aware folder naming)
│   ├── screenshot_downloader.py  # Playwright Reddit UI capturer
│   ├── voices.py                 # TTS orchestrator (platform-agnostic)
│   ├── background.py             # Video/audio downloader (platform-agnostic)
│   └── data/
│       ├── videos.json           # Dedup tracker
│       ├── cookie-dark-mode.json # Reddit theme cookie
│       └── cookie-threads.json   # Threads session cookie (auto-created)
│
├── TTS/                          # Text-to-Speech
│   ├── engine_wrapper.py         # Provider abstraction + post_lang fallback
│   └── elevenlabs.py, aws_polly.py, etc. # 7+ provider implementations
│
├── utils/
│   ├── settings.py               # Config loading + validation
│   ├── videos.py                 # check_done() + check_done_by_id()
│   ├── console.py                # Rich terminal output
│   ├── .config.template.toml     # Config schema (platform sections)
│   └── ... (id, voice, cleanup, etc.)
│
├── main.py                       # CLI entry (platform-routed via factory)
├── GUI.py                        # Flask web UI (localhost:4000 in host mode, 0.0.0.0 in Docker)
├── requirements.txt              # Dependencies
└── AGENTS.md / AGENT.md          # This file + agent guidelines

Configuration

File: utils/.config.template.toml (schema) → config.toml (user config)

Platform Selection

[settings]
platform = "reddit"     # or "threads"
post_lang = "es-cr"     # Optional: translation language (all platforms)

Reddit Config

[reddit.creds]
client_id = "..."       # OAuth app
client_secret = "..."
username = "..."
password = "..."
2fa = false            # true if the account uses two-factor auth

[reddit.thread]
subreddit = "AskReddit"
post_id = ""            # Leave blank for auto-pick
max_comment_length = 500
min_comment_length = 1
min_comments = 20
blocked_words = "..."

Threads Config (NEW)

[threads.creds]
access_token = "EAABsbCS..."  # Meta Graph API token (60-day expiry)
user_id = "12345678901234567"
username = "your_insta"       # For Playwright login
password = "your_password"

[threads.thread]
post_id = ""            # Leave blank for auto-pick
max_reply_length = 500
min_reply_length = 1
min_replies = 5
blocked_words = "..."
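The length, count, and blocked-word settings above act as a filter on candidate replies. The sketch below is minimal and makes one assumption: blocked_words is a comma-separated list matched case-insensitively as substrings. The guide only shows a placeholder, so check how the fetchers actually parse it. filter_replies and is_eligible are hypothetical names. It uses the Threads key names; the Reddit section uses max_comment_length, min_comment_length, and min_comments instead.

```python
def parse_blocked_words(raw: str) -> list[str]:
    """Split a comma-separated blocked_words string into lowercase terms."""
    return [w.strip().lower() for w in raw.split(",") if w.strip()]


def filter_replies(replies: list[str], thread_cfg: dict) -> list[str]:
    """Keep replies whose length is within bounds and that contain no blocked word."""
    min_len = thread_cfg.get("min_reply_length", 1)
    max_len = thread_cfg.get("max_reply_length", 500)
    blocked = parse_blocked_words(thread_cfg.get("blocked_words", ""))
    kept = []
    for text in replies:
        if not min_len <= len(text) <= max_len:
            continue
        lowered = text.lower()
        if any(word in lowered for word in blocked):
            continue
        kept.append(text)
    return kept


def is_eligible(replies: list[str], thread_cfg: dict) -> bool:
    """A post qualifies only if enough replies survive filtering."""
    return len(filter_replies(replies, thread_cfg)) >= thread_cfg.get("min_replies", 5)
```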

Generic Settings

[settings]
theme = "dark"
resolution_w = 1080
resolution_h = 1920
storymode = false
times_to_run = 1

[settings.tts]
voice_choice = "tiktok"     # or "elevenlabs", "awspolly", "googletranslate", etc.
random_voice = true
silence_duration = 0.3

[settings.background]
background_video = "minecraft"
background_audio = "lofi"
background_audio_volume = 0.15
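TOML does not allow the same table header to appear twice. So in the real config.toml, every [settings] key shown above (platform, post_lang, theme, and so on) belongs in a single [settings] table. The sketch below reads the platform choice with a safe default, using the standard-library tomllib (Python 3.11+). On Python 3.10, the third-party tomli package offers the same API. This is not utils/settings.py, which also validates values against the template.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib

SUPPORTED_PLATFORMS = {"reddit", "threads"}

# Platform selection and generic settings merged into one [settings] table.
EXAMPLE = """
[settings]
platform = "threads"
post_lang = "es-cr"
resolution_w = 1080
resolution_h = 1920

[settings.tts]
voice_choice = "tiktok"
silence_duration = 0.3
"""


def load_platform(toml_text: str) -> str:
    """Parse config text and return the selected platform (default: reddit)."""
    config = tomllib.loads(toml_text)
    platform = config.get("settings", {}).get("platform", "reddit")
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"platform must be one of {sorted(SUPPORTED_PLATFORMS)}, got {platform!r}")
    return platform
```

To read the file itself, open it in binary mode: `with open("config.toml", "rb") as f: config = tomllib.load(f)`.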

Development Guidelines

✅ DO:

Use platform factory in main.py

from platforms import get_content_object, get_screenshot_fn
reddit_object = get_content_object(POST_ID)
screenshot_fn = get_screenshot_fn()
screenshot_fn(reddit_object, number_of_comments)

Return standard content dict from all fetchers

return {
    "thread_id": ...,
    "thread_category": ...,  # NEW: replaces hardcoded subreddit
    "comments": [...]
}

Use config fallback chains for cross-platform keys

lang = (settings.config["settings"].get("post_lang") or
        settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))

Read thread_category from dict instead of config

# WRONG:
subreddit = settings.config["reddit"]["thread"]["subreddit"]

# RIGHT:
platform = settings.config["settings"].get("platform", "reddit")
if platform == "reddit":
    subreddit = settings.config["reddit"]["thread"]["subreddit"]
else:
    subreddit = reddit_obj.get("thread_category", platform)

Test both platforms after core pipeline changes

# Test Reddit (must not regress)
sed -i 's/platform = "threads"/platform = "reddit"/' config.toml
python3 main.py

# Test Threads
sed -i 's/platform = "reddit"/platform = "threads"/' config.toml
python3 main.py --post-id <threads-id>
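`sed -i` without a suffix argument is GNU-specific; BSD/macOS sed expects `sed -i ''` instead. A small Python helper avoids that difference. This is a hypothetical convenience script, not part of the repo. It assumes `platform = "..."` starts its own line, as in the config examples, and it rewrites only the first match.

```python
import re
from pathlib import Path

# Matches the first `platform = "..."` assignment at the start of a line.
PLATFORM_LINE = re.compile(r'^(\s*platform\s*=\s*)"[^"]*"', re.MULTILINE)


def set_platform(config_path: Path, platform: str) -> None:
    """Rewrite the platform = "..." line in config.toml in place, keeping any trailing comment."""
    text = config_path.read_text(encoding="utf-8")
    new_text, count = PLATFORM_LINE.subn(rf'\g<1>"{platform}"', text, count=1)
    if count == 0:
        raise ValueError(f'no platform = "..." line found in {config_path}')
    config_path.write_text(new_text, encoding="utf-8")
```

Usage: `set_platform(Path("config.toml"), "reddit")`, then `python3 main.py`.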

❌ DON'T:

Don't import platform modules directly in main.py/utils

# WRONG: from reddit.subreddit import get_subreddit_threads
# RIGHT: from platforms import get_content_object

Don't hardcode platform names in generic modules

# WRONG in final_video.py:
subreddit = settings.config["reddit"]["thread"]["subreddit"]

# RIGHT:
subreddit = reddit_obj.get("thread_category", "unknown")

Don't add platform-specific UI selectors outside that platform's screenshot module
- Reddit selectors stay in video_creation/screenshot_downloader.py
- Threads selectors stay in platforms/threads/screenshot.py

Don't assume config keys exist without fallback

# WRONG: lang = settings.config["reddit"]["thread"]["post_lang"]
# RIGHT: lang = settings.config.get("settings", {}).get("post_lang", "")
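The fallback and category rules above can be folded into small pure functions. This is a minimal sketch. cfg_get, resolve_post_lang, and resolve_category are hypothetical helper names, and the config is passed in as a plain dict instead of being read from settings.config.

```python
from typing import Any


def cfg_get(config: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested config tables, returning default if any level is missing."""
    node: Any = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def resolve_post_lang(config: dict) -> str:
    """Prefer [settings].post_lang, then fall back to [reddit.thread].post_lang."""
    return (cfg_get(config, "settings", "post_lang")
            or cfg_get(config, "reddit", "thread", "post_lang", default=""))


def resolve_category(config: dict, content_object: dict) -> str:
    """Reddit keeps its subreddit folder; other platforms use thread_category."""
    platform = cfg_get(config, "settings", "platform", default="reddit")
    if platform == "reddit":
        return cfg_get(config, "reddit", "thread", "subreddit", default="unknown")
    return content_object.get("thread_category", platform)
```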

Platform-Specific Knowledge

Reddit

API: PRAW (Python Reddit API Wrapper)
Auth: OAuth app (client_id, secret) + username/password
Screenshot: Playwright on reddit.com (or new.reddit.com)
- Login form: input[name="username"], input[name="password"]
- Post selector: [data-test-id="post-content"]
- Comment selector: #t1_{comment_id}
NSFW: submission.over_18
Output folder: results/{subreddit}/

Threads

API: Meta Graph API (v18.0+)
Auth: User access token (60-day lifetime) via https://developers.facebook.com/
Screenshot: Playwright on threads.net
- Login form: input[autocomplete="username"], input[autocomplete="current-password"]
- Post selector: article (universal, more stable than Reddit)
- Cookies saved to: video_creation/data/cookie-threads.json
NSFW: The API doesn't provide an NSFW flag, so is_nsfw is always False
Output folder: results/threads/

Future: X/Twitter

Create: platforms/twitter/fetcher.py + platforms/twitter/screenshot.py + config section
Update: platforms/__init__.py with elif platform == "twitter" branches

Extending the Project

Adding a New TTS Provider

Create TTS/my_provider.py with a class implementing the TTS interface
Add config keys to [settings.tts] in .config.template.toml
Update TTS/engine_wrapper.py to call your provider
Test with settings.config["settings"]["tts"]["voice_choice"] = "my_provider"
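The sketch below illustrates step 1. It assumes providers follow a small duck-typed shape: a max_chars attribute, plus a run(text, filepath, random_voice=False) method that writes an audio file. Before copying it, check an existing provider such as TTS/elevenlabs.py for the exact interface that engine_wrapper.py expects. To stay runnable without an API key, this hypothetical SilentTTS writes silence sized to the text instead of calling a real service. It writes WAV data, so give it a .wav path.

```python
import wave


class SilentTTS:
    """Hypothetical provider: writes a silent mono WAV instead of real speech."""

    max_chars = 5000          # longest text the wrapper should send in one call
    sample_rate = 22050       # samples per second
    seconds_per_char = 0.06   # rough speaking-rate stand-in

    def run(self, text: str, filepath: str, random_voice: bool = False) -> None:
        """Render `text` to `filepath`; random_voice is accepted but unused here."""
        if len(text) > self.max_chars:
            raise ValueError(f"text exceeds max_chars ({self.max_chars})")
        n_frames = int(len(text) * self.seconds_per_char * self.sample_rate)
        with wave.open(filepath, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * n_frames)
```

Step 3, wiring it into engine_wrapper.py, depends on how that module maps voice_choice names to classes, so this sketch leaves it out.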

Adding a New Platform (e.g., X/Twitter)

Create fetcher: platforms/twitter/fetcher.py
- Implement get_twitter_content(POST_ID=None) returning standard dict
Create screenshotter: platforms/twitter/screenshot.py
- Implement get_screenshots_of_twitter_posts(content_object, screenshot_num)
Update config: Add [twitter.creds] and [twitter.thread] sections
Update factory: Add elif platform == "twitter" in platforms/__init__.py
Update CLI helper: Add case to _get_platform_post_id() in main.py
Test: Verify Reddit mode still works, test Twitter mode end-to-end

Zero changes needed to: TTS, backgrounds, video composition, or utils.
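Below is a skeleton for the first two steps, using the function names listed above. Everything inside is a placeholder. The hard-coded post and reply stand in for a real X API call, the possibly_sensitive field is an assumed input, and build_content_object is a hypothetical helper that maps raw fields onto the standard dict.

```python
from typing import Any, Optional


def build_content_object(post: dict[str, Any], replies: list[dict[str, Any]]) -> dict[str, Any]:
    """Map raw post/reply fields onto the shared content_object contract."""
    return {
        "thread_id": str(post["id"]),
        "thread_category": "twitter",  # output lands in results/twitter/
        "thread_title": post["text"],
        "thread_url": post["url"],
        # Assumed field name for the sensitivity flag; adjust to the real API.
        "is_nsfw": bool(post.get("possibly_sensitive", False)),
        "comments": [
            {"comment_body": r["text"], "comment_url": r["url"], "comment_id": str(r["id"])}
            for r in replies
        ],
    }


def get_twitter_content(POST_ID: Optional[str] = None) -> dict[str, Any]:
    """Placeholder fetcher: replace the literals below with a real API call."""
    post_id = POST_ID or "1"
    post = {"id": post_id, "text": "Example post", "url": f"https://x.com/i/status/{post_id}"}
    replies = [{"id": "2", "text": "Example reply", "url": "https://x.com/i/status/2"}]
    return build_content_object(post, replies)


def get_screenshots_of_twitter_posts(content_object: dict[str, Any], screenshot_num: int) -> None:
    """Placeholder screenshotter: should capture the post plus `screenshot_num` replies."""
    raise NotImplementedError("Playwright capture for X/Twitter not written yet")
```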

Debugging Tips

"No matching distribution found for yt-dlp==2026.3.17"

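→ pip can't find that release. yt-dlp uses date-based versions (YYYY.M.DD on PyPI, no leading zeros), so pin a date that has actually been published, e.g. 2025.10.14 (latest stable when this guide was written).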

"Threads API: Invalid or expired access_token"

→ Meta tokens expire every 60 days. Refresh at https://developers.facebook.com/tools/explorer/

Playwright timeout on Threads screenshot

→ Login cookies corrupted or expired. Delete video_creation/data/cookie-threads.json to force fresh login next run.

"No eligible Threads posts found"

→ Lower [threads.thread].min_replies (5 in the example config) so more posts qualify, and make sure your Threads account has public posts with replies.

Video dedup not working

→ Check video_creation/data/videos.json is writable. Ensure check_done_by_id() is called before fetching content.

Testing Checklist

Reddit mode: platform = "reddit" produces video to results/{subreddit}/
Threads mode: platform = "threads" produces video to results/threads/
Video dedup: Running same post_id twice skips second run
Translation: post_lang = "es" translates filenames
TTS providers: Test with different voice_choice values
Background selection: Custom background video/audio works
Story mode: storymode=true only uses thread_post, not comments
Error handling: Invalid credentials show clear messages

Key Files to Know

File	Purpose
`main.py`	CLI entry; orchestrates pipeline via factory
`platforms/__init__.py`	Factory dispatch for multi-platform support
`platforms/threads/fetcher.py`	Threads Graph API client
`platforms/threads/screenshot.py`	Threads.net Playwright screenshotter
`video_creation/final_video.py`	FFmpeg composition; platform-aware output naming
`TTS/engine_wrapper.py`	TTS provider abstraction; post_lang fallback
`utils/settings.py`	Config loading & validation
`utils/videos.py`	Video dedup tracking
`utils/.config.template.toml`	Config schema
`requirements.txt`	Dependencies

Useful Commands

# Install dependencies
pip install -r requirements.txt

# Run CLI
python3 main.py

# Run with specific post
python3 main.py <post_id>

# Run Flask GUI
python3 GUI.py

# Check syntax
python3 -m py_compile main.py platforms/threads/fetcher.py

# Format code
black main.py platforms/ utils/

# Lint
pylint main.py

Docker Workflow

Use docker compose build to build the shared image for both CLI and GUI.
Use docker compose up gui to run the Flask app on port 4000.
Use docker compose run --rm cli to run the video generator in a container.
The repo root is bind-mounted in Compose, so config.toml, results/, assets/temp/, video_creation/data/videos.json, and utils/backgrounds.json should persist across runs.
The GUI must bind to 0.0.0.0 in Docker; do not switch it back to localhost for container use.
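One way to keep a single GUI.py working in both modes is to pick the bind address at startup. This sketch detects Docker in one of two ways: a hypothetical VMB_IN_DOCKER environment variable set in the Compose file, or the /.dockerenv marker file that Docker usually creates in containers. The actual detection in GUI.py may differ.

```python
import os
from pathlib import Path
from typing import Mapping

PORT = 4000


def choose_host(env: Mapping[str, str] = os.environ, dockerenv: Path = Path("/.dockerenv")) -> str:
    """Bind to all interfaces inside Docker, loopback only on the host."""
    in_docker = env.get("VMB_IN_DOCKER") == "1" or dockerenv.exists()
    return "0.0.0.0" if in_docker else "localhost"


# In GUI.py this would feed Flask's dev server, e.g.:
# app.run(host=choose_host(), port=PORT)
```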

When You Get Stuck

"What does this module do?" → Check imports in main.py or docstrings
"How do I add support for platform X?" → See "Adding a New Platform" section above
"Why is my config not being read?" → Check utils/settings.py:check_toml() and .config.template.toml schema
"Why isn't my TTS provider being called?" → Check TTS/engine_wrapper.py:make_voice() and config voice_choice
"How do I debug the Playwright screenshot?" → Uncomment page.pause() in the screenshot downloader and run the browser headful (headless=False)

Good luck! 🚀