CLAUDE.md — VideoMakerBot Development Guide
Project Overview
VideoMakerBot — Automated short-form video creator from social media content.
Status: Production (v3.4.0)
Language: Python 3.14+ (host + Docker)
Runtime: Docker only — CLI, GUI, test go through docker compose. Never python on host.
Platforms: Reddit (PRAW API), Threads (Graph API + Web Scraping)
Core Mission
Transforms social media threads (post + comments/replies) into short-form videos:
- AI-generated speech (7+ TTS providers)
- UI screenshots (Playwright, headless Chromium in image)
- Background video/audio overlays
- FFmpeg composition & output (Linux ffmpeg, full filter set + drawtext)
- Optional YouTube upload
- Web UI (Tailwind CSS + DaisyUI + Lucide + vanilla ES6) on localhost:4000
Architecture at a Glance
main.py (CLI)
↓ [platform factory]
├─→ reddit/subreddit.py [PRAW API]
└─→ platforms/threads/
├─→ fetcher.py [Graph API — your own posts]
├─→ scraper.py [Web scraping — trending For You feed]
└─→ auth.py [Shared Playwright login + cookies]
↓ [standard data dict]
├─→ TTS/engine_wrapper.py [7+ providers, auto-fallback]
├─→ screenshot_downloader.py (Reddit)
│ or platforms/threads/screenshot.py (Threads)
├─→ video_creation/background.py [local or yt-dlp]
├─→ video_creation/youtube_uploader.py [optional auto-upload]
└─→ video_creation/final_video.py [FFmpeg with libx264; exports get_output_path()]
↓
results/{category}/{video.mp4}
Data Contract: The "content_object" Dict
All fetchers return this shape:
{
"thread_id": str, # Used for temp folder: assets/temp/{id}/
"thread_category": str, # "reddit", "threads" → output folder
"thread_title": str, # TTS + output filename (clean, no metadata)
"thread_url": str, # Playwright navigates here for screenshot
"is_nsfw": bool,
"comments": [
{
"comment_body": str, # TTS per reply (clean body text)
"comment_url": str, # Playwright navigates here
"comment_id": str, # Unique identifier (URL-based for scraper)
}
],
"thread_post": str | list, # Story mode (no comments)
}
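The shape above can be written down as typed dictionaries, which lets a fetcher be checked before its output reaches the pipeline. This is a minimal sketch: the names `Comment`, `ContentObject`, `REQUIRED_KEYS`, and `missing_keys` are hypothetical, and treating the four `thread_*` keys as the required set is an assumption, not something the project enforces.

```python
from typing import TypedDict, Union


class Comment(TypedDict):
    comment_body: str  # clean body text used for TTS
    comment_url: str   # page Playwright opens for the screenshot
    comment_id: str    # unique identifier


class ContentObject(TypedDict, total=False):
    thread_id: str
    thread_category: str
    thread_title: str
    thread_url: str
    is_nsfw: bool
    comments: list[Comment]
    thread_post: Union[str, list]  # story mode (no comments)


# Assumption: these are the keys every downstream stage needs.
REQUIRED_KEYS = ("thread_id", "thread_category", "thread_title", "thread_url")


def missing_keys(obj: dict) -> list[str]:
    """Return the required top-level keys that are absent from obj."""
    return [key for key in REQUIRED_KEYS if key not in obj]
```

A fetcher test could assert `missing_keys(result) == []` for both the Reddit and the Threads code paths.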
File Organization
VideoMakerBot/
├── platforms/
│ ├── __init__.py # Factory: get_content_object(), get_screenshot_fn()
│ └── threads/
│ ├── auth.py # Shared Playwright login + cookie management
│ ├── fetcher.py # Graph API → content_object (your own posts)
│ ├── scraper.py # Web scraping → content_object (trending feed)
│ └── screenshot.py # Playwright Threads screenshotter (div-based)
│
├── reddit/
│ └── subreddit.py # PRAW API → content_object
│
├── video_creation/
│ ├── final_video.py # FFmpeg composition (libx264, no drawtext on macOS)
│ ├── background.py # Video/audio downloader (local files or yt-dlp)
│ ├── screenshot_downloader.py # Playwright Reddit UI capturer
│ ├── voices.py # TTS orchestrator
│ └── youtube_uploader.py # YouTube OAuth2 upload (post-render hook)
│
├── TTS/
│ ├── engine_wrapper.py # Provider abstraction + TikTok→pyttsx3 fallback
│ ├── TikTok.py # TikTok TTS (hardened error handling)
│ └── ... # 7+ provider implementations
│
├── utils/
│ ├── settings.py # Config loading + interactive validation
│ ├── videos.py # check_done() + check_done_by_id()
│ ├── console.py # Rich terminal output
│ ├── .config.template.toml # Config schema
│ ├── background_videos.json # Background video manifest
│ ├── background_audios.json # Background audio manifest
│ └── ...
│
├── GUI/ # Flask templates (Tailwind + DaisyUI + Lucide)
│ ├── layout.html # Base layout (no jQuery, no Bootstrap)
│ ├── index.html # Video Library (3 buttons: source / download / copy link)
│ ├── backgrounds.html # Background Manager (videos catalog)
│ ├── settings.html # Config editor (validated against template)
│ └── create.html # Render progress page
│
├── tests/
│ └── test_gui_utils.py # pytest regression for add/delete background
│
├── main.py # CLI entry (platform-routed via factory)
├── GUI.py # Flask web UI; `/video/<id>` serves files with sanitized headers
├── Dockerfile # python:3.14-slim-bookworm + ffmpeg + playwright + pytest
├── docker-compose.yml # Services: gui, cli, test
├── docker-entrypoint.sh # Runs `utils.docker_bootstrap` then exec's the command
├── requirements.txt
└── CLAUDE.md
Configuration
Threads (full config)
[settings]
platform = "threads"
[threads]
discovery_method = "scrape" # "api" (Graph API, own posts) or "scrape" (trending feed)
[threads.creds]
username = "your_insta" # For Playwright login (always needed)
password = "your_password"
access_token = "" # Only for discovery_method="api"
user_id = "" # Only for discovery_method="api"
[threads.thread]
post_id = "" # Specific post ID; blank = auto-pick from feed
max_reply_length = 500
min_reply_length = 1
min_replies = 5 # Minimum replies for post eligibility
min_engagement = 0 # Minimum likes+reposts for viral filter (0=disabled, 10000=viral)
blocked_words = ""
[settings.tts]
voice_choice = "googletranslate" # Best for macOS: no API key, fast, free
# voice_choice = "tiktok" # Needs tiktok_sessionid; auto-falls back to pyttsx3
# voice_choice = "OpenAI" # Needs openai_api_key
[settings.background]
background_video = "minecraft"
background_audio = "lofi"
background_audio_volume = 0.15
Reddit (reference)
[settings]
platform = "reddit"
[reddit.creds]
client_id = "..."
client_secret = "..."
username = "..."
password = "..."
2fa = false
2fa_secret = "" # TOTP base32 secret for auto-2FA
[reddit.thread]
subreddit = "AskReddit"
min_comments = 20
YouTube upload
[youtube]
enabled = false # Set true to auto-upload after render
privacy = "public" # or "private", "unlisted"
client_secret_path = "" # Path to youtube_client_secret.json
Platform-Specific Knowledge
Threads — Web Scraping (discovery_method = "scrape")
DOM Structure:
- Threads.net uses a div-based card layout: NO `<article>` elements
- Feed posts: `a[href*="/post/"]` links inside `<div>` cards (class contains `x1a2a7pz`)
- Post pages: same structure; main post link appears first, replies follow
- Screenshots: Use `a[href*="/post/"]` → ancestor div card, NOT `page.locator("article")`
Card Text Format (used by _parse_card_text()):
Line 0: username
Line 1: timestamp (e.g., "14h", "1d")
Line 2..N: post body text
Last 1-4: engagement metrics (likes, replies, reposts, quotes)
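The card layout above can be split with a few lines of Python. The sketch below is not the project's `_parse_card_text()`; `parse_card_text` and `_METRIC_RE` are hypothetical names, and the rule "a trailing line is a metric if it looks like `266`, `1K`, or `2.5M`" is an assumption about the feed text.

```python
import re

# Assumed metric format: digits, optional decimal part, optional K/M/B suffix.
_METRIC_RE = re.compile(r"^\d+(?:[.,]\d+)?[KMB]?$", re.IGNORECASE)


def parse_card_text(text: str) -> dict:
    """Split raw card text into username, timestamp, body, and metric tokens."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("card text needs at least a username and a timestamp")
    username, timestamp, rest = lines[0], lines[1], lines[2:]

    metrics: list[str] = []
    # Peel at most four metric-looking lines off the end of the card.
    while rest and len(metrics) < 4 and _METRIC_RE.match(rest[-1]):
        metrics.insert(0, rest.pop())

    return {
        "username": username,
        "timestamp": timestamp,
        "body": "\n".join(rest),
        "metrics": metrics,
    }
```

A post body whose last line is itself a bare number would be misread as a metric under this heuristic, so a real parser may need a stricter signal.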
Engagement Parsing:
- Numbers can be plain ("266") or abbreviated ("1K", "2.5M")
- `likes` = first trailing number, `replies` = second, `reposts` = third
- `min_engagement` filters by `likes + reposts` total
- Posts sorted by engagement descending before selection
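The parsing and filtering rules above can be sketched as follows. `parse_count`, `engagement`, and `select_posts` are hypothetical names, and each post is assumed to carry a `"metrics"` list ordered likes, replies, reposts; the scraper's real data layout may differ.

```python
_SUFFIX = {"K": 1_000, "M": 1_000_000}


def parse_count(token: str) -> int:
    """Convert "266", "1K", or "2.5M" into an integer."""
    token = token.strip().upper().replace(",", "")
    multiplier = 1
    if token and token[-1] in _SUFFIX:
        multiplier = _SUFFIX[token[-1]]
        token = token[:-1]
    return int(round(float(token) * multiplier))


def engagement(metrics: list[str]) -> int:
    """Return likes + reposts; metrics are ordered likes, replies, reposts."""
    counts = [parse_count(m) for m in metrics]
    likes = counts[0] if len(counts) > 0 else 0
    reposts = counts[2] if len(counts) > 2 else 0
    return likes + reposts


def select_posts(posts: list[dict], min_engagement: int = 0) -> list[dict]:
    """Drop posts below the threshold and sort the rest, highest first."""
    eligible = [p for p in posts if engagement(p["metrics"]) >= min_engagement]
    return sorted(eligible, key=lambda p: engagement(p["metrics"]), reverse=True)
```

With `min_engagement = 0` every post passes the filter, which matches the "0=disabled" note in the config.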
Login Flow:
- Threads uses Instagram auth (`threads.net/login`)
- Selectors: `input[autocomplete="username"]`, `input[autocomplete="current-password"]`
- Button: `get_by_role("button", name="Log in", exact=True).first`
- After click: `page.wait_for_url("https://www.threads.net/", timeout=15000)` (event-wait, not fixed delay)
- Cookies cached at `video_creation/data/cookie-threads.json`
- Login logic shared via `platforms/threads/auth.py`
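Put together, the login steps above might look like the sketch below. It is not the contents of `platforms/threads/auth.py`: the function name `login_and_cache_cookies` is hypothetical, the full login URL `https://www.threads.net/login` is an assumption based on the redirect target, and `page` is expected to be a Playwright sync-API `Page` supplied by the caller.

```python
import json
from pathlib import Path

COOKIE_PATH = Path("video_creation/data/cookie-threads.json")


def login_and_cache_cookies(page, username: str, password: str,
                            cookie_path: Path = COOKIE_PATH) -> None:
    """Log in through the Threads login page and save the session cookies."""
    page.goto("https://www.threads.net/login")  # assumed full URL
    page.locator('input[autocomplete="username"]').fill(username)
    page.locator('input[autocomplete="current-password"]').fill(password)

    # exact=True skips buttons whose label merely contains "Log in";
    # .first picks one when several exact matches exist.
    page.get_by_role("button", name="Log in", exact=True).first.click()

    # Wait for the redirect event instead of sleeping for a fixed delay.
    page.wait_for_url("https://www.threads.net/", timeout=15000)

    cookie_path.parent.mkdir(parents=True, exist_ok=True)
    cookie_path.write_text(json.dumps(page.context.cookies()), encoding="utf-8")
```

Because the function only touches the `page` object it is given, it can be exercised in tests with a small fake instead of a real browser.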
API Limitation:
- Graph API v1.0 only accesses YOUR OWN posts — no trending/discovery
- Scraping bypasses this — no API token needed
Threads — Graph API (discovery_method = "api")
- Auth: Bearer token, 60-day expiry
- Only accesses authenticated user's own threads + replies
- Use when you have your own content with replies
Reddit
- API: PRAW (Python Reddit API Wrapper)
- Post discovery: `subreddit.hot(limit=25)` → `get_subreddit_undone()` → fallback to `top(day/hour/month/week/year/all)`
- Screenshot: Playwright on new.reddit.com
- 2FA: Auto-TOTP via `pyotp` when `2fa_secret` is configured in config.toml
Development Guidelines
✅ DO:
- Run everything through Docker: `docker compose up gui`, `docker compose run --rm cli`, `docker compose run --rm test`
- Use platform factory; never import platform modules directly
- Return standard content_object from all fetchers
- Use clean body text for TTS; parse out username/timestamp metadata
- Default to `googletranslate` TTS for headless containers: no API key, fast, free
- Use `libx264` encoder; `h264_nvenc` is NVIDIA-only and not available in the slim image
- Test both Threads discovery methods: `api` and `scrape`
- Bind-mount preserves state: edits to `config.toml`, `results/`, `assets/temp/`, `video_creation/data/`, and `utils/background_*.json` catalogs persist across container runs
- GUI must bind to `0.0.0.0` in Docker (enforced via `GUI_HOST=0.0.0.0` env)
- Use `/video/<id>` to serve renders: the route looks up the file by id in `videos.json`, sanitizes the `Content-Disposition` filename, and avoids 404s from literal newlines in titles
❌ DON'T:
- Don't run `python GUI.py` or `python main.py` on the host; Docker is the only supported path
- Don't use `<article>` selectors on Threads.net; DOM is div-based
- Don't hardcode `h264_nvenc`; use `libx264` for cross-platform compatibility
- Don't import platform modules directly in main.py/utils
- Don't assume config keys exist without `.get()` fallback
- Don't reintroduce jQuery, Bootstrap, or ClipboardJS; UI is vanilla ES6 + Tailwind + DaisyUI + Lucide
- Don't write to `utils/backgrounds.json` (legacy empty file). Use `utils/background_videos.json` and `utils/background_audios.json`
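For the `.get()` rule, a small helper keeps nested lookups from raising `KeyError` when a section is absent. This is only an illustration: `get_setting` is a hypothetical name and the project may simply chain `.get()` calls inline.

```python
_MISSING = object()


def get_setting(config: dict, path: str, default=None):
    """Walk a dotted path through nested dicts; return default on any miss."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict):
            return default
        node = node.get(key, _MISSING)
        if node is _MISSING:
            return default
    return node
```

For example, `get_setting(config, "threads.thread.min_engagement", 0)` yields `0` when either the `[threads.thread]` table or the key itself is missing.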
🔒 Security (hardened May 2026)
- No `eval()`: use `{"int": int, "float": float, "bool": bool, "str": str}` dict dispatch. `utils/settings.py` has module-level `_TYPE_COERCION`.
- No `os.system()`: use `subprocess.run([...])` with argument lists. No shell interpretation.
- No `shell=True`: removed from all `subprocess.run()` and `Popen()` calls.
- No bare `except:`: catch specific exception types. Bare excepts swallow `KeyboardInterrupt` and `SystemExit`.
- Redact secrets before printing: `main.py` error handler deep-copies config and masks all credential fields. `GUI.py` redacts API keys/passwords from settings page data. Sensitive fields show as `********`.
- CSRF protection: `GUI.py` `@app.before_request` checks `Origin` header on all mutating requests.
- Security headers: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY` on every response.
- Flask secret key: `FLASK_SECRET_KEY` env var, fallback `os.urandom(32)` per startup.
- Docker non-root: container runs as `appuser`, not root.
- Path traversal: `/video/<id>` uses `Path.resolve().relative_to()` guard; `add_background()` sanitizes citation with `re.sub(r"[./\\\\]", "_", citation)`.
- No hardcoded credentials in source: all secrets from `config.toml` (gitignored). Rotate passwords regularly.
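The path-traversal guard and the citation sanitizer from the list above can be sketched with the standard library alone. `safe_resolve` and `sanitize_citation` are hypothetical names; the regular expression is copied from the bullet above, and the rest is an assumption about how such a guard is typically written.

```python
import re
from pathlib import Path


def safe_resolve(base_dir: Path, relative_name: str) -> Path:
    """Resolve relative_name under base_dir, rejecting paths that escape it."""
    base = base_dir.resolve()
    candidate = (base / relative_name).resolve()
    # relative_to() raises ValueError when candidate lies outside base.
    candidate.relative_to(base)
    return candidate


def sanitize_citation(citation: str) -> str:
    """Replace dots, slashes, and backslashes so the value cannot form a path."""
    return re.sub(r"[./\\\\]", "_", citation)
```

A route handler would catch the `ValueError` and answer with a 404 rather than letting the exception surface.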
Web UI (Flask, served by gui service)
- Stack: Tailwind CSS, DaisyUI, Lucide Icons, vanilla ES6 (no jQuery, Bootstrap, ClipboardJS)
- Routes:
  - `/`: Video Library; cards show source-post link, download, copy-link buttons
  - `/video/<id>`: serves rendered mp4 by id (lookup via `videos.json`); path-traversal guard, sanitized `Content-Disposition`
  - `/backgrounds`: Background Manager UI
  - `/backgrounds.json`: serves `utils/background_videos.json` (videos catalog)
  - `/background/add`, `/background/delete`: POST; mutate both `utils/background_videos.json` and `settings.background.background_video.options` in `utils/.config.template.toml`
  - `/settings`: config editor; loads from `config.toml`, validates against `utils/.config.template.toml`, persists via `utils/gui_utils.modify_settings` (preserves comments/formatting via `tomlkit`)
- HTML escaping: `h()` helper in `index.html` escapes `& " < >` for user-controlled strings in attributes
Key Files to Know
| File | Purpose |
|---|---|
| `main.py` | CLI entry; pipeline orchestration via factory |
| `platforms/__init__.py` | Factory dispatch (platform + discovery_method) |
| `platforms/threads/scraper.py` | Web scraping fetcher with engagement parsing |
| `platforms/threads/auth.py` | Shared Playwright login + cookie management |
| `platforms/threads/fetcher.py` | Graph API client (own posts only) |
| `platforms/threads/screenshot.py` | Div-based Threads screenshotter |
| `video_creation/final_video.py` | FFmpeg composition (libx264, platform-aware output); exports `get_output_path()` |
| `video_creation/background.py` | Background downloader (local files + yt-dlp); prefers already-downloaded videos |
| `video_creation/youtube_uploader.py` | OAuth2 YouTube upload |
| `TTS/engine_wrapper.py` | TTS provider abstraction + TikTok→pyttsx3 fallback; single-pass ffmpeg concat |
| `TTS/TikTok.py` | Hardened TikTok TTS with graceful error handling |
| `reddit/subreddit.py` | PRAW Reddit fetcher with auto-2FA; retry-depth limit (50) |
| `utils/settings.py` | Config loading + interactive validation; uses `_TYPE_COERCION` dict (no eval) |
| `utils/videos.py` | Video dedup tracking (`check_done`, `check_done_by_id`, `save_data` with truncate) |
| `utils/.config.template.toml` | Config schema (drives Settings page validation) |
| `utils/background_videos.json` | Background video manifest (served at `/backgrounds.json`) |
| `utils/background_audios.json` | Background audio manifest |
| `utils/gui_utils.py` | `add_background`, `delete_background`, `modify_settings`, `get_checks` (no eval) |
| `GUI.py` | Flask app: `/`, `/video/<id>`, `/backgrounds`, `/settings`, `/create`; CSRF + security headers |
| `Dockerfile` | python:3.14-slim-bookworm + ffmpeg + Playwright Chromium + pytest; runs as `appuser` |
| `docker-compose.yml` | Three services: gui (port 4000), cli, test |
| `tests/test_gui_utils.py` | Pytest regression for Background Manager round-trip |
Debugging Tips
FFmpeg "Unknown encoder 'h264_nvenc'"
→ Use libx264. Find-and-replace h264_nvenc → libx264 in video_creation/final_video.py. Slim image doesn't ship NVIDIA encoders.
yt-dlp "Requested format is not available"
→ Bump pinned version in requirements.txt and rebuild (docker compose build). Prefer best[height<=1080] over bestvideo in video_creation/background.py — many videos lack video-only streams.
Threads screenshots fail ("Main post article not found")
→ Threads.net uses div cards, not <article>. Use a[href*="/post/"] → ancestor div approach.
Config validator EOFError in non-interactive mode
→ check_toml() prompts for ALL platform sections regardless of platform setting. Fill all required fields, edit through /settings, or pre-populate config.toml before docker compose run cli.
Playwright timeout on Threads login
→ Cookies corrupted. Delete video_creation/data/cookie-threads.json for fresh login (file is bind-mounted, host delete clears container too). Confirm selectors: button uses exact=True for multiple "Log in" buttons.
No viral posts found
→ Lower min_engagement in config. Most Threads feed posts have <100 likes — 10000 filters almost everything.
Background Manager grid is empty
→ /backgrounds.json must serve utils/background_videos.json (split catalog), not legacy utils/backgrounds.json (empty {}). Verify in GUI.py:backgrounds_json.
/video/<id> returns 404
→ Route looks up entry in video_creation/data/videos.json by id, resolves file under results/<thread_category>/<filename>.mp4. Confirm both JSON entry and file exist; file may have been pruned.
JS "Unexpected end of input" on Library page
→ User-controlled strings in HTML attributes must go through h() helper in index.html. Avoid inline onclick= with ${JSON.stringify(...)}.
Stale image after editing requirements.txt or Dockerfile
→ docker compose build to rebuild. Code-only changes don't need rebuild — repo root is bind-mounted to /app.
Python bytecode caching in long-running GUI container
→ GUI caches imported modules in sys.modules. After editing pipeline code, restart GUI (docker compose restart gui) or trigger pipeline run which calls importlib.reload() on pipeline modules.
Reddit image template appearing in Threads videos
→ Verify platform in config.toml is "threads" (not "reddit"). The if platform == "reddit" guard in final_video.py blocks Reddit template. Restart GUI container to flush Python bytecode cache.
Background video download fails (yt-dlp HTTP 403)
→ get_background_config() prefers already-downloaded videos. Set background_video in config.toml to a downloaded video name (check assets/backgrounds/video/). If empty, randomly picks from downloaded videos first.
TTS output has wrong number of audio clips
→ engine_wrapper.run() returns idx + 1 (count, not last index). If getting one fewer clip than expected, check return value consumers — treat as count.
videos.json corruption (trailing garbage after save)
→ Fixed: save_data() calls raw_vids.truncate() after json.dump(). Delete video_creation/data/videos.json if existing file is corrupted.
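The failure mode is easy to reproduce: rewriting a file opened in `r+` mode with shorter JSON leaves the tail of the old content behind unless the file is truncated. The sketch below shows the pattern with a hypothetical `save_entries` helper; it is not the project's `save_data()`.

```python
import json


def save_entries(path: str, entries: list) -> None:
    """Overwrite an existing JSON file in place without leaving stale bytes."""
    with open(path, "r+", encoding="utf-8") as fh:
        fh.seek(0)                # rewind to the start of the file
        json.dump(entries, fh)    # write the new, possibly shorter, payload
        fh.truncate()             # cut off whatever the old payload left behind
```

Without the `truncate()` call, a later `json.load()` on the same file fails because of the leftover characters after the closing bracket.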
Infinite recursion in Reddit post discovery
→ Fixed: get_subreddit_threads() has retry-depth limit of 50. If hit, subreddit may have no undone posts — try different subreddit or clear videos.json.
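A depth limit of this kind can be sketched as below. `find_undone` and `fetch_batch` are hypothetical stand-ins, not the signature of `get_subreddit_threads()`; only the limit of 50 comes from the note above.

```python
from typing import Callable, Iterable, Optional

MAX_DEPTH = 50  # retry-depth limit quoted in the note above


def find_undone(fetch_batch: Callable[[int], Iterable[str]],
                done_ids: set,
                depth: int = 0,
                max_depth: int = MAX_DEPTH) -> Optional[str]:
    """Return the first post id not yet rendered, or None once the limit is hit."""
    if depth >= max_depth:
        return None  # give up instead of recursing forever
    for post_id in fetch_batch(depth):
        if post_id not in done_ids:
            return post_id
    # Every candidate in this batch was already done; try the next batch.
    return find_undone(fetch_batch, done_ids, depth + 1, max_depth)
```

Returning `None` gives the caller a clear signal to report "no undone posts" rather than crashing with a `RecursionError`.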
Useful Commands (Docker-only)
# Build (or rebuild after Dockerfile / requirements.txt changes)
docker compose build
# Run the GUI (foreground)
docker compose up gui
# → http://localhost:4000
# Run the GUI in background
docker compose up -d gui
docker compose logs -f gui
docker compose down
# Run CLI pipeline (one-off, removed on exit)
docker compose run --rm cli
docker compose run --rm cli python main.py <post_id>
# Run test suite
docker compose run --rm test
# Shell in fresh container for ad-hoc commands
docker compose run --rm --entrypoint /bin/bash gui
# inside: python -m py_compile main.py platforms/threads/scraper.py
# List rendered Threads videos in the running GUI container
docker compose exec gui ls /app/results/threads/
Anything needing `pip install`, `playwright install`, or `apt-get` belongs in `Dockerfile` + `docker compose build`, never on host.
Recent Changes (May 2026 Security Hardening)
eval() removal: eval(checks["type"])(value) replaced with {"int": int, "float": float, "bool": bool, "str": str} dict dispatch in utils/settings.py, utils/console.py, utils/gui_utils.py.
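The dict dispatch described above might look like the following sketch. The mapping is the one quoted in the text; the `coerce` wrapper and its error message are hypothetical.

```python
_TYPE_COERCION = {"int": int, "float": float, "bool": bool, "str": str}


def coerce(value, type_name: str):
    """Convert value using a whitelisted constructor instead of eval()."""
    try:
        caster = _TYPE_COERCION[type_name]
    except KeyError:
        # Anything outside the whitelist is rejected, never executed.
        raise ValueError(f"unsupported type name: {type_name!r}") from None
    return caster(value)
```

One caveat with a plain `bool` entry: `bool("false")` is `True` in Python because any non-empty string is truthy, so string inputs for boolean settings may need explicit parsing before they reach this table.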
os.system() removal: TTS/engine_wrapper.py:split_post uses subprocess.run([...]) with argument lists. utils/posttextparser.py spacy download uses subprocess.run([sys.executable, "-m", "spacy", ...]).
shell=True removal: All subprocess.run(..., shell=True) and Popen(..., shell=True) replaced with argument lists in main.py and utils/ffmpeg_install.py.
Credential leak prevention: main.py error handler deep-copies config and redacts all secrets. GUI.py masks sensitive keys as ******** in settings page data.
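A deep-copy-and-mask step of the kind described can be sketched as follows. `redact` is a hypothetical name and the `SENSITIVE_MARKERS` list is an assumption; the project's actual set of credential fields may differ.

```python
import copy

# Assumption: key names containing any of these substrings hold secrets.
SENSITIVE_MARKERS = ("password", "secret", "token", "api_key", "sessionid")


def redact(config: dict) -> dict:
    """Return a deep copy of config with secret-looking values masked."""
    redacted = copy.deepcopy(config)  # never mutate the live config

    def _walk(node: dict) -> None:
        for key, value in node.items():
            if isinstance(value, dict):
                _walk(value)
            elif any(marker in key.lower() for marker in SENSITIVE_MARKERS):
                node[key] = "********"

    _walk(redacted)
    return redacted
```

Printing `redact(config)` in an error handler keeps the structure visible for debugging while leaving the original dictionary untouched.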
CSRF + security headers: GUI.py checks Origin header on POST/PUT/DELETE. X-Content-Type-Options, X-Frame-Options headers added.
Docker hardening: Container runs as appuser (non-root). Digest pinning + pip version comments added.
Bug fixes (18 total):
- Config overwrite crash (config=None after empty file write)
- Playwright TimeoutError (wrong exception class caught)
- Lambda closure (loop variable captured by reference)
- Redundant ffmpeg runs (concat now single-pass)
- Audio IndexError on empty TTS output
- Hardcoded NSFW post selector (now generic role-based)
- JSON truncation bug in save_data (missing truncate())
- Infinite recursion in Reddit post discovery (retry limit 50)
- Silent exception swallowing in scraper search
- exit() → sys.exit() in subreddit.py
- Dead macOS branch (os.name == "mac" → sys.platform == "darwin")
- Wrong upstream repo in version check (now configurable + resilient)
- Duplicate path logic (get_output_path() shared between main.py and final_video.py)
- Catastrophic backtracking URL regex (now atomic https?://\S+)
- Fixed 6s login delay (now wait_for_url event-wait)
- 6 bare except: clauses → specific exception types
- Temp file leak in ProgressFfmpeg (cleanup in `__exit__`)
- Flask secret key hardcoded → env var + urandom fallback
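The lambda closure item in the list above refers to a common Python pitfall, illustrated generically below. The function names are hypothetical and the snippet is not taken from the project's code.

```python
def make_callbacks_buggy(names):
    """Each lambda looks up `name` when called, so all see the final value."""
    return [lambda: name for name in names]


def make_callbacks_fixed(names):
    """A default argument binds the current value at definition time."""
    return [lambda name=name: name for name in names]
```

Calling every callback from the buggy version returns the last element each time, while the fixed version returns the elements in order.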