AGENT.md — Guidance for Agents & AI Working on VideoMakerBot

This document guides agents, bots, and AI assistants on how to work effectively with the VideoMakerBot codebase.

Quick Start for Agents

Core Principle

VideoMakerBot uses a platform-agnostic factory pattern. Always respect the abstraction:

Don't import platform-specific modules (reddit/, threads/) directly
Always use platforms/__init__.py factory functions
Keep platform-specific logic in platforms/{platform}/
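
To make the rule concrete, here is a minimal sketch of what factory dispatch can look like. It is not the project's actual code: the stub fetchers, the `_FETCHERS` registry, and the explicit `platform` argument are assumptions made so the example runs on its own (the real factory reads the platform from config and dispatches with if/elif branches and lazy imports).

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the real fetchers, so this sketch is runnable.
def _fetch_reddit(post_id: Optional[str] = None) -> dict:
    return {"thread_id": post_id or "abc123", "thread_category": "reddit", "comments": []}

def _fetch_threads(post_id: Optional[str] = None) -> dict:
    return {"thread_id": post_id or "xyz789", "thread_category": "threads", "comments": []}

# Assumed registry: platform name -> fetcher function.
_FETCHERS: dict[str, Callable[[Optional[str]], dict]] = {
    "reddit": _fetch_reddit,
    "threads": _fetch_threads,
}

def get_content_object(platform: str, post_id: Optional[str] = None) -> dict:
    """Return the content_object for `platform`, or fail with a clear message."""
    try:
        fetcher = _FETCHERS[platform]
    except KeyError:
        supported = ", ".join(sorted(_FETCHERS))
        raise ValueError(
            f"Unsupported platform {platform!r}; expected one of: {supported}"
        ) from None
    return fetcher(post_id)
```

Generic modules only ever call `get_content_object(...)`; nothing outside the factory needs to know which fetcher ran.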

The "Do This" Checklist

✅ Read existing CLAUDE.md for architecture context
✅ Use factory: from platforms import get_content_object, get_screenshot_fn
✅ Return standard content_object dict from all fetchers
✅ Test both Reddit and Threads modes before declaring completion
✅ Use config fallback chains for cross-platform keys
✅ Document platform-specific logic in docstrings
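
One way to check the "standard content_object dict" item is a small shape check run against each fetcher's output. The sketch below assumes the required keys are the ones this guide's fetcher example returns (`thread_id`, `thread_title`, `thread_url`, `comments`) and treats `thread_category` as optional; the helper name `missing_content_keys` is hypothetical.

```python
# Keys assumed to be mandatory for every platform's content_object.
REQUIRED_KEYS = ("thread_id", "thread_title", "thread_url", "comments")

def missing_content_keys(content_object: dict) -> list[str]:
    """Return the required keys that are absent, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in content_object]
```

An empty list means the dict has the expected shape; anything else names exactly what the fetcher forgot to return.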

The "Don't Do This" List

❌ Import reddit.subreddit directly in main.py or generic modules
❌ Hardcode subreddit/platform names in core video pipeline
❌ Add platform-specific selectors outside platforms/{platform}/
❌ Assume config keys exist without .get() and fallbacks
❌ Modify screenshot_downloader.py for non-Reddit platforms

Understanding the Codebase Structure

Entry Point

main.py — Single CLI entry point using platform factory

Calls get_content_object(POST_ID) from factory
Calls get_screenshot_fn() from factory
Everything else is platform-agnostic

Platform Layer (`platforms/`)

__init__.py — Factory dispatch functions (add new platforms here)
threads/fetcher.py — Threads Graph API client (returns standard dict)
threads/screenshot.py — Threads.net Playwright screenshotter

Legacy Platform (`reddit/`)

subreddit.py — PRAW API client (returns standard dict)
No changes needed; called via factory

Video Pipeline (`video_creation/`)

final_video.py — FFmpeg composition (platform-aware output folder only)
screenshot_downloader.py — Reddit Playwright screenshotter (not called for Threads)
voices.py — TTS orchestration (platform-agnostic)
background.py — Video/audio download (platform-agnostic)

TTS Layer (`TTS/`)

engine_wrapper.py — Provider abstraction (handles post_lang fallback)
*.py — Individual provider implementations (elevenlabs, aws_polly, etc.)

Config & Utils (`utils/`)

settings.py — TOML config loading & validation
videos.py — Dedup tracking (check_done() + check_done_by_id())
.config.template.toml — Config schema with [settings], [reddit.*], [threads.*], [ai]

How to Approach Common Tasks

Adding a New Social Platform (e.g., X/Twitter)

Steps:

Create platforms/twitter/fetcher.py:

def get_twitter_content(POST_ID=None) -> dict:
    """Fetch post + replies, return standard content_object."""
    # Implement API fetching logic here
    return {
        "thread_id": ...,
        "thread_category": "twitter",  # NEW: generic field for output folder
        "thread_title": ...,
        "thread_url": ...,
        "comments": [...]
    }

Create platforms/twitter/screenshot.py:

def get_screenshots_of_twitter_posts(content_object: dict, screenshot_num: int):
    """Use Playwright to screenshot X/Twitter posts."""
    # Implement Playwright logic here

Update platforms/__init__.py (the get_content_object() branch is shown; get_screenshot_fn() needs a matching branch):

elif platform == "twitter":
    from platforms.twitter.fetcher import get_twitter_content
    return get_twitter_content(POST_ID)

Add config section to utils/.config.template.toml:

[twitter.creds]
api_key = { ... }
api_secret = { ... }

[twitter.thread]
post_id = { ... }

Update main.py helper:

elif platform == "twitter":
    return config.get("twitter", {}).get("thread", {}).get("post_id", "")

Zero changes needed to: TTS, backgrounds, video composition, or any utils code other than the config template.

Verification:

# Test Reddit (regression check)
sed -i 's/platform = "twitter"/platform = "reddit"/' config.toml
python3 main.py
# Verify results/{subreddit}/ output

# Test Twitter
sed -i 's/platform = "reddit"/platform = "twitter"/' config.toml
python3 main.py --post-id <twitter-id>
# Verify results/twitter/ output

Modifying the Video Pipeline

Scenario: You need to change FFmpeg composition or add a new processing step.

Approach:

Check which data the modified code consumes (content_object dict)
Verify it works with both Reddit and Threads content structures
If platform-specific: move logic to platforms/{platform}/
If generic: keep in video_creation/
Test both modes before merging

Example: Adding video filters

# In final_video.py (generic, works for all platforms)
def apply_filter(video_stream, filter_type, **filter_args):
    # Assumes video_stream is an ffmpeg-python stream; no platform-specific logic here
    return video_stream.filter(filter_type, **filter_args)

# Test:
# - Reddit mode produces filtered video
# - Threads mode produces filtered video

Fixing a Bug in Config Handling

Scenario: post_lang is not being applied correctly.

Debug Path:

Check utils/settings.py — how is config loaded?

Check TTS/engine_wrapper.py:182 — uses fallback chain:

lang = (settings.config["settings"].get("post_lang") or
        settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))

Check video_creation/final_video.py:78 — same fallback logic
If still broken: verify utils/.config.template.toml has the key defined
Test both platforms with post_lang = "es" in config
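
When debugging, it can help to pull the fallback chain into a function you can exercise without running the whole pipeline. This is a sketch under the assumption that the config is a plain nested dict; `resolve_post_lang` is a hypothetical helper name, not a function in the codebase.

```python
def resolve_post_lang(config: dict) -> str:
    """Prefer [settings] post_lang, then [reddit.thread] post_lang, then ""."""
    settings_lang = (config.get("settings") or {}).get("post_lang")
    reddit_thread = (config.get("reddit") or {}).get("thread") or {}
    reddit_lang = reddit_thread.get("post_lang")
    # `or` skips both missing keys and empty strings, like the original chain.
    return settings_lang or reddit_lang or ""
```

Note that an empty string in [settings] falls through to the Reddit value, which is easy to overlook when a template ships `post_lang = ""` by default.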

Adding Support for a New TTS Provider

Scenario: User wants Whisper TTS support.

Steps:

Create TTS/whisper_tts.py:

class WhisperTTS:
    def make_voice(self, text):
        # Call Whisper API
        return audio_bytes

Update TTS/engine_wrapper.py:make_voice():

elif voice_choice == "whisper":
    from TTS.whisper_tts import WhisperTTS
    return WhisperTTS().make_voice(text)

Add config to utils/.config.template.toml:

[settings.tts]
whisper_api_key = { optional = true, ... }

Test:

# In config.toml:
voice_choice = "whisper"
# Run: python3 main.py
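
To test the dispatch path without an API key, a stand-in provider can be handy. The sketch below assumes providers expose `make_voice(text)` as in the steps above; `SilentTTS`, the `PROVIDERS` registry, and this standalone `make_voice` function are hypothetical and only mirror the shape of the wrapper.

```python
from typing import Protocol

class VoiceProvider(Protocol):
    def make_voice(self, text: str) -> bytes: ...

class SilentTTS:
    """Hypothetical test double: returns one zero byte per input character."""

    def make_voice(self, text: str) -> bytes:
        return bytes(len(text))

# Assumed registry of provider classes keyed by the voice_choice config value.
PROVIDERS: dict[str, type] = {"silent": SilentTTS}

def make_voice(voice_choice: str, text: str) -> bytes:
    """Instantiate the chosen provider and synthesize `text`."""
    provider_cls = PROVIDERS.get(voice_choice.lower())
    if provider_cls is None:
        raise ValueError(f"Unknown voice_choice {voice_choice!r}")
    return provider_cls().make_voice(text)
```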

Common Pitfalls & How to Avoid Them

Pitfall 1: Platform-Specific Code in Generic Modules

Problem:

# BAD: In video_creation/final_video.py
subreddit = settings.config["reddit"]["thread"]["subreddit"]

Will break when platform = "threads" (no reddit.thread.subreddit).

Solution:

# GOOD:
platform = settings.config["settings"].get("platform", "reddit")
if platform == "reddit":
    category = settings.config["reddit"]["thread"]["subreddit"]
else:
    category = reddit_obj.get("thread_category", platform)
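
The same logic can be wrapped in a helper that takes the config and the content object as arguments, which makes it testable for every platform. A minimal sketch, assuming both are plain dicts; `resolve_category` is a hypothetical name.

```python
def resolve_category(config: dict, content_object: dict) -> str:
    """Pick the output folder name without assuming Reddit config exists."""
    platform = (config.get("settings") or {}).get("platform", "reddit")
    if platform == "reddit":
        thread = (config.get("reddit") or {}).get("thread") or {}
        subreddit = thread.get("subreddit")
        if subreddit:
            return subreddit
    # Non-Reddit platforms, or Reddit without a configured subreddit.
    return content_object.get("thread_category") or platform
```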

Pitfall 2: Hardcoding Selectors in Platform-Agnostic Code

Problem:

# BAD: In video_creation/voices.py
element = page.locator(f"#t1_{comment_id}")  # Reddit-only selector!

Will fail when running Threads mode (different DOM).

Solution:

Keep all Playwright logic in platforms/{platform}/screenshot.py
Never hardcode selectors in generic modules

Pitfall 3: Forgetting to Test Both Modes

Problem: You change final_video.py, test with Reddit, declare done. Threads mode breaks because you didn't test it.

Solution:

# Test both before committing:
sed -i 's/platform = "threads"/platform = "reddit"/' config.toml
python3 main.py
# Check results/{subreddit}/

sed -i 's/platform = "reddit"/platform = "threads"/' config.toml
python3 main.py --post-id <id>
# Check results/threads/
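
The sed commands only work when the current value matches the pattern exactly. As an alternative, here is a small sketch that sets the platform regardless of its current value. It assumes the config contains one `platform = "..."` line that is the first such line in the file; `set_platform` is a hypothetical helper.

```python
import re

# Matches a line such as: platform = "reddit"
_PLATFORM_LINE = re.compile(r'^([ \t]*platform[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)

def set_platform(toml_text: str, platform: str) -> str:
    """Return toml_text with the first platform assignment set to `platform`."""
    new_text, count = _PLATFORM_LINE.subn(
        lambda match: f'{match.group(1)}"{platform}"', toml_text, count=1
    )
    if count == 0:
        raise ValueError('no platform = "..." line found in config text')
    return new_text
```

Reading config.toml with `pathlib.Path.read_text()`, passing the text through `set_platform`, and writing it back gives the same effect as the sed calls.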

Pitfall 4: Assuming Config Keys Exist

Problem:

# BAD:
lang = settings.config["reddit"]["thread"]["post_lang"]

Will crash if key doesn't exist.

Solution:

# GOOD:
lang = (settings.config["settings"].get("post_lang") or
        settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))
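
If chained `.get()` calls start to pile up, a small lookup helper keeps them readable. This is a sketch, not something the codebase is known to provide; `dig` is a hypothetical name.

```python
from typing import Any

def dig(mapping: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts by key, returning `default` if any level is missing."""
    current: Any = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

For example, `dig(config, "reddit", "thread", "post_lang", default="")` returns the empty string when any of the three levels is absent.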

Code Review Checklist for Agents

Before marking work complete, verify:

No platform imports in main.py — Uses factory only
Standard content_object dict — All fetchers return same shape
Platform-specific logic isolated — Only in platforms/{platform}/
Config fallback chains — No hardcoded section names in generic code
Both modes tested — Reddit AND Threads produce correct output
Docstrings updated — New functions document platform assumptions
Error messages clear — Include platform name + actionable guidance
Video dedup works — No duplicate videos created
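
For the "error messages clear" item, one option is an exception type that forces every message to carry the platform name and a next step. `PlatformError` is a hypothetical class sketched for illustration; the codebase may handle errors differently.

```python
class PlatformError(RuntimeError):
    """Hypothetical error that always names the platform and a next step."""

    def __init__(self, platform: str, problem: str, hint: str) -> None:
        self.platform = platform
        super().__init__(f"[{platform}] {problem}. Next step: {hint}")
```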

Understanding Data Flow

Happy Path: Fetch → TTS → Screenshot → Compose → Output

1. main.py:main()
   └─→ platforms/__init__.py:get_content_object()
       └─→ platforms/threads/fetcher.py:get_threads_content()
           └─→ Returns: {thread_id, thread_title, comments, ...}

2. video_creation/voices.py:save_text_to_mp3()
   └─→ TTS/engine_wrapper.py:process_text()
       └─→ TTS/engine_wrapper.py:make_voice()
           └─→ TTS/{provider}.py: {elevenlabs,tiktok,etc}
               └─→ Returns: audio_length, comment_count

3. platforms/__init__.py:get_screenshot_fn()
   └─→ platforms/threads/screenshot.py:get_screenshots_of_threads_posts()
       └─→ Uses Playwright on threads.net
           └─→ Saves: assets/temp/{thread_id}/png/{title,comment_0,etc}.png

4. video_creation/background.py
   └─→ download_background_video() & download_background_audio()
       └─→ Uses yt-dlp to fetch YouTube videos/audio
           └─→ Saves to: assets/temp/{thread_id}/{video,audio}

5. video_creation/final_video.py:make_final_video()
   └─→ Uses FFmpeg to compose everything
       └─→ Reads: audio files, screenshot PNGs, background video
           └─→ Writes: results/{thread_category}/{filename}.mp4

6. utils/videos.py:save_data()
   └─→ Records video in videos.json for dedup
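
Step 6 records each finished video so later runs can skip it. Below is a minimal sketch of an ID-based lookup, assuming videos.json holds a JSON array of objects with an "id" field. Both that layout and the helper name `already_rendered` are assumptions for illustration, not the signature of check_done_by_id().

```python
import json

def already_rendered(videos_json: str, thread_id: str) -> bool:
    """Return True if `thread_id` appears in the recorded videos."""
    # An empty or whitespace-only file is treated as "nothing rendered yet".
    videos = json.loads(videos_json) if videos_json.strip() else []
    return any(entry.get("id") == thread_id for entry in videos)
```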

Config Flow

config.toml (user settings)
    ↓
utils/settings.py:check_toml()
    └─→ Validates against .config.template.toml schema
        └─→ Returns: settings.config (dict)

            Used by:
            ├─ main.py (platform selection)
            ├─ reddit/ (subreddit, etc.)
            ├─ platforms/threads/ (Graph API token, etc.)
            ├─ TTS/engine_wrapper.py (post_lang fallback)
            ├─ video_creation/ (theme, resolution, etc.)
            └─ utils/videos.py (dedup behavior)
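
The validation step can be pictured as a recursive walk over the template. The sketch below is a simplification under stated assumptions: a template leaf is a dict of plain values such as `{"optional": True}`, a section is a non-empty dict whose values are all dicts, and keys are required unless marked optional. The real check_toml() may behave differently, and the `access_token` key in the usage example is made up.

```python
def missing_required(template: dict, config: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of required template keys absent from config."""
    missing: list[str] = []
    for key, spec in template.items():
        path = f"{prefix}{key}"
        is_section = bool(spec) and all(isinstance(v, dict) for v in spec.values())
        if is_section:
            child = config.get(key)
            child = child if isinstance(child, dict) else {}
            missing += missing_required(spec, child, f"{path}.")
        elif key not in config and not spec.get("optional", False):
            missing.append(path)
    return missing

# Usage with a made-up template and a partial user config.
template = {
    "settings": {
        "platform": {"optional": False},
        "tts": {"whisper_api_key": {"optional": True}},
    },
    "threads": {"creds": {"access_token": {"optional": False}}},
}
config = {"settings": {"platform": "threads"}}
print(missing_required(template, config))
```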

Deployment Notes

Python Version

Minimum: 3.10
Tested: 3.10, 3.11, 3.12
Reason: F-strings, type hints, modern async patterns
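
A startup guard can turn an unsupported interpreter into an immediate, readable failure. This is a minimal sketch; the helper name `python_is_supported` is hypothetical.

```python
import sys

MIN_PYTHON = (3, 10)

def python_is_supported(version_info=sys.version_info) -> bool:
    """Compare the (major, minor) pair against the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Calling it early in main.py and exiting with a message that names the minimum version avoids confusing syntax errors later on.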

Critical Dependencies

reddit platform: praw 7.8.1 (requires Reddit OAuth app)
threads platform: requests (for Graph API calls)
screenshots: playwright 1.49.1 (requires browser installation: playwright install)
video: moviepy 2.2.1, ffmpeg-python 0.2.0 (requires FFmpeg system binary)
tts: varies per provider (elevenlabs, aws_polly, openai, etc.)
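
Because FFmpeg is a system binary and not a pip package, it is worth checking for it before a long run. The sketch below uses the standard library's `shutil.which`; the helper name `missing_binaries` and the injectable `which` parameter (there to make the check testable) are assumptions.

```python
import shutil
from typing import Callable, Iterable, Optional

def missing_binaries(
    names: Iterable[str] = ("ffmpeg",),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if which(name) is None]
```

An empty result means every listed binary was found; otherwise the list can go straight into an error message.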

Versions That Caused Issues

yt-dlp==2026.3.17 — Doesn't exist (use 2025.10.14 or latest stable)
playwright without browser install — Will crash on first screenshot

When to Escalate

Escalate to User if:

User needs new platform support (only they know requirements)
Config changes affect backward compatibility
Performance targets or trade-offs are unclear (only the user knows acceptable limits)
Security concern (token handling, credential storage, etc.)

Safe to Implement as Agent:

Bug fixes within existing architecture
Adding new TTS providers
Extending config options for existing platforms
Performance optimizations that leave output unchanged (caching, parallelization)
New filter/processing features that work platform-agnostically
Documentation & refactoring

Final Guidance

Golden Rule: The factory pattern is your friend. When in doubt, check if your change breaks the abstraction. If it does, rethink it.

Test Obsessively: Always run both Reddit and Threads modes. The codebase is designed for multi-platform support, and it's easy to break one platform while fixing another.

Document Platform Assumptions: If your code works differently for Reddit vs Threads, say so explicitly in docstrings and comments.

Ask Yourself: "Would this work for X/Twitter?" If not, it probably belongs in platforms/{platform}/, not in generic code.

Good luck, and happy contributing! 🎥
