AGENTS.md — VideoMakerBot Development Guide

Project Overview

VideoMakerBot — Automated short-form video creator from social media content.

Status: Production-ready, actively maintained (v3.4.0)
Language: Python 3.10+
Platforms: Reddit (original), Threads (NEW), X/Twitter (planned)

Core Mission

Transforms social media threads (post + comments/replies) into complete short-form videos with:

  • AI-generated speech (7+ TTS providers)
  • UI screenshots (Playwright)
  • Background video/audio overlays
  • FFmpeg composition & output

Architecture at a Glance

main.py (CLI)
    ↓ [platform factory]
    ├─→ reddit/subreddit.py [PRAW API]
    └─→ platforms/threads/fetcher.py [Graph API]
        ↓ [standard data dict]
        ├─→ TTS/engine_wrapper.py [7+ providers]
        ├─→ screenshot_downloader.py (Reddit)
        │   or platforms/threads/screenshot.py (Threads)
        ├─→ video_creation/background.py
        └─→ video_creation/final_video.py [FFmpeg]
            ↓
            results/{category}/{video.mp4}

Key Design: Platform Abstraction via Factory Pattern

Why: Single codebase supports multiple platforms without tight coupling.

How: platforms/__init__.py exports:

  • get_content_object(POST_ID=None) — routes to right fetcher
  • get_screenshot_fn() — routes to right screenshotter

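Result: Adding X/Twitter requires only a new module, a config section, and two elif branches in the factory (plus a case in the main.py post-ID helper, as listed under "Adding a New Platform").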
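
The dispatch described above can be sketched roughly as follows. This is an illustration, not the project's actual platforms/__init__.py: the stub fetchers, the registry dict (the source describes elif branches), and passing the config in as a parameter are all assumptions made to keep the example self-contained.

```python
from typing import Callable, Optional


# Hypothetical stand-ins for the real fetchers (reddit/subreddit.py and
# platforms/threads/fetcher.py); they only return a minimal dict here.
def _fetch_reddit(post_id: Optional[str] = None) -> dict:
    return {"thread_id": post_id or "auto", "thread_category": "reddit", "comments": []}


def _fetch_threads(post_id: Optional[str] = None) -> dict:
    return {"thread_id": post_id or "auto", "thread_category": "threads", "comments": []}


# Illustrative registry: platform name -> fetcher callable.
_FETCHERS: dict[str, Callable[[Optional[str]], dict]] = {
    "reddit": _fetch_reddit,
    "threads": _fetch_threads,
}


def get_content_object(config: dict, POST_ID: Optional[str] = None) -> dict:
    """Route to the fetcher selected by [settings].platform (default: reddit)."""
    platform = config.get("settings", {}).get("platform", "reddit")
    try:
        fetcher = _FETCHERS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform!r}") from None
    return fetcher(POST_ID)
```

With this shape, supporting another platform means registering one more callable; callers keep using get_content_object() unchanged.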


Data Contract: The "content_object" Dict

All fetchers return this shape (defined in platforms/__init__.py):

{
    # Unique identifiers
    "thread_id":       str,           # Used for temp folder: assets/temp/{id}/
    "thread_category": str,           # "reddit", "threads", etc. → output folder

    # Content
    "thread_title":    str,           # TTS as title + output filename
    "thread_url":      str,           # Playwright navigates here for screenshot
    "is_nsfw":         bool,          # Content filter flag

    # Replies/Comments (mutually exclusive with thread_post)
    "comments": [
        {
            "comment_body": str,      # TTS per reply
            "comment_url":  str,      # Playwright navigates here
            "comment_id":   str,      # CSS selector ID or unique identifier
        }
    ],

    # OR Story mode:
    "thread_post":     str | list,    # Long-form text (no comments)
}

Why: Loose coupling—TTS, backgrounds, and video composition don't need platform-specific logic.
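
A fetcher's output could be checked against this contract with a small validator along the following lines. The function name, the error strings, and the strict reading of "mutually exclusive" (exactly one of comments or thread_post must be present) are assumptions for illustration; the project may be more lenient.

```python
# Top-level fields every fetcher is expected to return, with their types.
REQUIRED_FIELDS = {
    "thread_id": str,
    "thread_category": str,
    "thread_title": str,
    "thread_url": str,
    "is_nsfw": bool,
}
COMMENT_FIELDS = ("comment_body", "comment_url", "comment_id")


def validate_content_object(obj: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict fits the contract."""
    problems: list[str] = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in obj:
            problems.append(f"missing key: {key}")
        elif not isinstance(obj[key], expected):
            problems.append(f"{key} should be {expected.__name__}")

    has_comments, has_post = "comments" in obj, "thread_post" in obj
    if has_comments == has_post:  # both present, or neither
        problems.append("expected exactly one of 'comments' or 'thread_post'")

    if has_comments:
        comments = obj["comments"]
        if not isinstance(comments, list):
            problems.append("comments should be list")
        else:
            for i, comment in enumerate(comments):
                for key in COMMENT_FIELDS:
                    if not isinstance(comment, dict) or not isinstance(comment.get(key), str):
                        problems.append(f"comments[{i}].{key} should be str")

    if has_post and not isinstance(obj["thread_post"], (str, list)):
        problems.append("thread_post should be str or list")
    return problems
```

Calling it at the end of a new fetcher during development is a cheap way to catch a missing key before the TTS or FFmpeg stages fail on it.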


File Organization

VideoMakerBot/
├── platforms/                      # Multi-platform abstraction
│   ├── __init__.py                # Factory: get_content_object(), get_screenshot_fn()
│   └── threads/                   # Threads (Meta) implementation
│       ├── fetcher.py             # Graph API → content_object
│       └── screenshot.py          # Playwright Threads screenshotter
│
├── reddit/                        # Reddit implementation (kept as-is)
│   └── subreddit.py              # PRAW API → content_object + thread_category
│
├── video_creation/
│   ├── final_video.py            # FFmpeg composition (platform-aware folder naming)
│   ├── screenshot_downloader.py  # Playwright Reddit UI capturer
│   ├── voices.py                 # TTS orchestrator (platform-agnostic)
│   ├── background.py             # Video/audio downloader (platform-agnostic)
│   └── data/
│       ├── videos.json           # Dedup tracker
│       ├── cookie-dark-mode.json # Reddit theme cookie
│       └── cookie-threads.json   # Threads session cookie (auto-created)
│
├── TTS/                          # Text-to-Speech
│   ├── engine_wrapper.py         # Provider abstraction + post_lang fallback
│   ├── elevenlabs.py, aws_polly.py, etc. # 7+ provider implementations
│
├── utils/
│   ├── settings.py               # Config loading + validation
│   ├── videos.py                 # check_done() + check_done_by_id()
│   ├── console.py                # Rich terminal output
│   ├── .config.template.toml     # Config schema (platform sections)
│   └── ... (id, voice, cleanup, etc.)
│
├── main.py                       # CLI entry (platform-routed via factory)
├── GUI.py                        # Flask web UI (localhost:4000 in host mode, 0.0.0.0 in Docker)
├── requirements.txt              # Dependencies
└── AGENTS.md / AGENT.md          # This file + agent guidelines

Configuration

File: utils/.config.template.toml (schema) → config.toml (user config)

Platform Selection

[settings]
platform = "reddit"     # or "threads"
post_lang = "es-cr"     # Optional: translation language (all platforms)

Reddit Config

[reddit.creds]
client_id = "..."       # OAuth app
client_secret = "..."
username = "..."
password = "..."
2fa = false             # or true

[reddit.thread]
subreddit = "AskReddit"
post_id = ""            # Leave blank for auto-pick
max_comment_length = 500
min_comment_length = 1
min_comments = 20
blocked_words = "..."

Threads Config (NEW)

[threads.creds]
access_token = "EAABsbCS..."  # Meta Graph API token (60-day expiry)
user_id = "12345678901234567"
username = "your_insta"       # For Playwright login
password = "your_password"

[threads.thread]
post_id = ""            # Leave blank for auto-pick
max_reply_length = 500
min_reply_length = 1
min_replies = 5
blocked_words = "..."

Generic Settings

[settings]
theme = "dark"
resolution_w = 1080
resolution_h = 1920
storymode = false
times_to_run = 1

[settings.tts]
voice_choice = "tiktok"     # or "elevenlabs", "awspolly", "googletranslate", etc.
random_voice = true
silence_duration = 0.3

[settings.background]
background_video = "minecraft"
background_audio = "lofi"
background_audio_volume = 0.15

Development Guidelines

DO:

  1. Use platform factory in main.py

    from platforms import get_content_object, get_screenshot_fn
    reddit_object = get_content_object(POST_ID)
    screenshot_fn = get_screenshot_fn()
    screenshot_fn(reddit_object, number_of_comments)
    
  2. Return standard content dict from all fetchers

    return {
        "thread_id": ...,
        "thread_category": ...,  # NEW: replaces hardcoded subreddit
        "comments": [...]
    }
    
  3. Use config fallback chains for cross-platform keys

    lang = (settings.config["settings"].get("post_lang") or
            settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))
    
  4. Read thread_category from dict instead of config

    # WRONG:
    subreddit = settings.config["reddit"]["thread"]["subreddit"]
    
    # RIGHT:
    platform = settings.config["settings"].get("platform", "reddit")
    if platform == "reddit":
        subreddit = settings.config["reddit"]["thread"]["subreddit"]
    else:
        subreddit = reddit_obj.get("thread_category", platform)
    
  5. Test both platforms after core pipeline changes

    # Test Reddit (must not regress)
    sed -i 's/platform = "threads"/platform = "reddit"/' config.toml
    python3 main.py
    
    # Test Threads
    sed -i 's/platform = "reddit"/platform = "threads"/' config.toml
    python3 main.py --post-id <threads-id>
    

DON'T:

  1. Don't import platform modules directly in main.py/utils

    # WRONG: from reddit.subreddit import get_subreddit_threads
    # RIGHT: from platforms import get_content_object
    
  2. Don't hardcode platform names in generic modules

    # WRONG in final_video.py:
    subreddit = settings.config["reddit"]["thread"]["subreddit"]
    
    # RIGHT:
    subreddit = reddit_obj.get("thread_category", "unknown")
    
  3. Don't add platform-specific UI selectors outside platforms/{platform}/screenshot.py

    • Reddit selectors stay in video_creation/screenshot_downloader.py
    • Threads selectors stay in platforms/threads/screenshot.py

  4. Don't assume config keys exist without fallback

    # WRONG: lang = settings.config["reddit"]["thread"]["post_lang"]
    # RIGHT: lang = settings.config.get("settings", {}).get("post_lang", "")
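
The fallback rules above can be wrapped in a small helper so that chained .get() calls do not have to be repeated everywhere. This is a sketch only: get_nested and resolve_post_lang are hypothetical names, not functions that exist in utils/settings.py.

```python
from typing import Any


def get_nested(config: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` as soon as a key is missing."""
    node: Any = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def resolve_post_lang(config: dict) -> str:
    """Prefer [settings].post_lang, then fall back to [reddit.thread].post_lang."""
    return (get_nested(config, "settings", "post_lang", default="")
            or get_nested(config, "reddit", "thread", "post_lang", default=""))
```

For example, resolve_post_lang({}) returns an empty string instead of raising KeyError, which is the behavior the rule asks for.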
    

Platform-Specific Knowledge

Reddit

  • API: PRAW (Python Reddit API Wrapper)
  • Auth: OAuth app (client_id, secret) + username/password
  • Screenshot: Playwright on reddit.com/new.reddit.com
    • Login form: input[name="username"], input[name="password"]
    • Post selector: [data-test-id="post-content"]
    • Comment selector: #t1_{comment_id}
  • NSFW: submission.over_18
  • Output folder: results/{subreddit}/

Threads

  • API: Meta Graph API (v18.0+)
  • Auth: User access token (60-day lifetime) via https://developers.facebook.com/
  • Screenshot: Playwright on threads.net
    • Login form: input[autocomplete="username"], input[autocomplete="current-password"]
    • Post selector: article (universal, more stable than Reddit)
    • Cookies saved to: video_creation/data/cookie-threads.json
  • NSFW: API doesn't provide; always False
  • Output folder: results/threads/

Future: X/Twitter

Create: platforms/twitter/fetcher.py + platforms/twitter/screenshot.py + config section
Update: platforms/__init__.py with elif platform == "twitter" branches


Extending the Project

Adding a New TTS Provider

  1. Create TTS/my_provider.py with a class implementing the TTS interface
  2. Add config keys to [settings.tts] in .config.template.toml
  3. Update TTS/engine_wrapper.py to call your provider
  4. Test with settings.config["settings"]["tts"]["voice_choice"] = "my_provider"
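
Step 1 might look something like the sketch below. The source does not spell out the TTS interface, so the shape used here (a max_chars attribute and a run(text, filepath) method) is an assumption; check the existing providers in TTS/ for the real contract. SilentTTS is a hypothetical placeholder that writes silence rather than speech, which keeps the example runnable without any external service.

```python
import wave


class SilentTTS:
    """Hypothetical provider that writes silence instead of speech."""

    def __init__(self, sample_rate: int = 16000):
        self.sample_rate = sample_rate
        self.max_chars = 5000  # assumed per-call character limit

    def run(self, text: str, filepath: str) -> None:
        """Write a mono 16-bit WAV whose duration scales with the text length."""
        if len(text) > self.max_chars:
            raise ValueError("text exceeds max_chars")
        seconds = max(0.1, 0.05 * len(text))
        n_frames = int(seconds * self.sample_rate)
        with wave.open(filepath, "wb") as wav:
            wav.setnchannels(1)        # mono
            wav.setsampwidth(2)        # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * n_frames)
```

A stub like this can also be handy for exercising the rest of the pipeline without spending provider credits.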

Adding a New Platform (e.g., X/Twitter)

  1. Create fetcher: platforms/twitter/fetcher.py
    • Implement get_twitter_content(POST_ID=None) returning standard dict
  2. Create screenshotter: platforms/twitter/screenshot.py
    • Implement get_screenshots_of_twitter_posts(content_object, screenshot_num)
  3. Update config: Add [twitter.creds] and [twitter.thread] sections
  4. Update factory: Add elif platform == "twitter" in platforms/__init__.py
  5. Update CLI helper: Add case to _get_platform_post_id() in main.py
  6. Test: Verify Reddit mode still works, test Twitter mode end-to-end

Zero changes needed to: TTS, backgrounds, video composition, or utils.
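
As a rough sketch of step 1, the core of a new fetcher is mapping whatever the platform API returns onto the standard dict and applying the reply filters from config. The raw field names ("id", "text", "url") and the function name build_content_object are hypothetical, and the actual API call is left out entirely.

```python
def build_content_object(post: dict, replies: list[dict], *, category: str,
                         min_len: int = 1, max_len: int = 500,
                         blocked_words: tuple[str, ...] = ()) -> dict:
    """Convert already-fetched post/reply dicts into the standard content dict."""

    def allowed(text: str) -> bool:
        # Mirrors the min/max reply length and blocked_words settings.
        lowered = text.lower()
        return (min_len <= len(text) <= max_len
                and not any(word.lower() in lowered for word in blocked_words))

    return {
        "thread_id": post["id"],
        "thread_category": category,
        "thread_title": post["text"],
        "thread_url": post["url"],
        "is_nsfw": False,  # assumption: no NSFW signal available, as with Threads
        "comments": [
            {"comment_body": r["text"], "comment_url": r["url"], "comment_id": r["id"]}
            for r in replies
            if allowed(r["text"])
        ],
    }
```

Because the result already matches the data contract, the downstream TTS and video stages need no platform-specific handling.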


Debugging Tips

"No matching distribution found for yt-dlp==2026.3.17"

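→ pip cannot find that release. yt-dlp uses date-based versions (YYYY.M.D on PyPI, no leading zeros), so the pin must match a version that was actually published. Use 2025.10.14 (latest stable at the time of writing).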

"Threads API: Invalid or expired access_token"

→ Meta tokens expire every 60 days. Refresh at https://developers.facebook.com/tools/explorer/

Playwright timeout on Threads screenshot

→ Login cookies corrupted or expired. Delete video_creation/data/cookie-threads.json to force fresh login next run.

"No eligible Threads posts found"

→ Set min_replies in [threads.thread] to 5 or lower, and make sure your Threads account has public posts with replies.

Video dedup not working

→ Check video_creation/data/videos.json is writable. Ensure check_done_by_id() is called before fetching content.


Testing Checklist

  • Reddit mode: platform = "reddit" produces video to results/{subreddit}/
  • Threads mode: platform = "threads" produces video to results/threads/
  • Video dedup: Running same post_id twice skips second run
  • Translation: post_lang = "es" translates filenames
  • TTS providers: Test with different voice_choice values
  • Background selection: Custom background video/audio works
  • Story mode: storymode=true only uses thread_post, not comments
  • Error handling: Invalid credentials show clear messages

Key Files to Know

File                               Purpose
main.py                            CLI entry; orchestrates pipeline via factory
platforms/__init__.py              Factory dispatch for multi-platform support
platforms/threads/fetcher.py       Threads Graph API client
platforms/threads/screenshot.py    Threads.net Playwright screenshotter
video_creation/final_video.py      FFmpeg composition; platform-aware output naming
TTS/engine_wrapper.py              TTS provider abstraction; post_lang fallback
utils/settings.py                  Config loading & validation
utils/videos.py                    Video dedup tracking
utils/.config.template.toml        Config schema
requirements.txt                   Dependencies

Useful Commands

# Install dependencies
pip install -r requirements.txt

# Run CLI
python3 main.py

# Run with specific post
python3 main.py <post_id>

# Run Flask GUI
python3 GUI.py

# Check syntax
python3 -m py_compile main.py platforms/threads/fetcher.py

# Format code
black main.py platforms/ utils/

# Lint
pylint main.py

Docker Workflow

  • Use docker compose build to build the shared image for both CLI and GUI.
  • Use docker compose up gui to run the Flask app on port 4000.
  • Use docker compose run --rm cli to run the video generator in a container.
  • The repo root is bind-mounted in Compose, so config.toml, results/, assets/temp/, video_creation/data/videos.json, and utils/backgrounds.json should persist across runs.
  • The GUI must bind to 0.0.0.0 in Docker; do not switch it back to localhost for container use.

When You Get Stuck

  1. "What does this module do?" → Check imports in main.py or docstrings
  2. "How do I add support for platform X?" → See "Adding a New Platform" section above
  3. "Why is my config not being read?" → Check utils/settings.py:check_toml() and .config.template.toml schema
  4. "Why isn't my TTS provider being called?" → Check TTS/engine_wrapper.py:make_voice() and config voice_choice
  5. "How do I debug the Playwright screenshot?" → Uncomment page.pause() in the screenshot downloader and run the browser in headful mode

Good luck! 🚀