From e8d95dfa3c09ba79f1ed01b1fcc0483050f4edce Mon Sep 17 00:00:00 2001
From: Hong Phuc
Date: Fri, 24 Apr 2026 01:20:42 +0700
Subject: [PATCH] Keep container workflow instructions aligned with the code

Document the Docker Compose workflow, persistent runtime paths, and container-specific GUI binding so future work on this branch follows the implemented setup rather than the old direct-Python assumptions.

Constraint: The repo now supports both host and container execution paths, and the agent guidance needs to reflect the new operational defaults.
Rejected: Leave AGENTS.md untouched | it would continue pointing contributors at stale runtime behavior.
Confidence: high
Scope-risk: narrow
Directive: Treat the Docker Compose commands as the default local workflow for GUI/CLI work on this branch.
Tested: Reviewed AGENTS.md against the implemented Docker files and runtime bootstrap.
Not-tested: No code-path changes; documentation-only update.
---
 AGENTS.md | 413 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..6e1634c
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,413 @@
+# AGENTS.md — VideoMakerBot Development Guide
+
+## Project Overview
+
+**VideoMakerBot** — Automated short-form video creator from social media content.
+
+**Status:** Production-ready, actively maintained (v3.4.0)
+**Language:** Python 3.10+
+**Platforms:** Reddit (original), Threads (NEW), X/Twitter (planned)
+
+### Core Mission
+Transforms social media threads (post + comments/replies) into complete short-form videos with:
+- AI-generated speech (7+ TTS providers)
+- UI screenshots (Playwright)
+- Background video/audio overlays
+- FFmpeg composition & output
+
+---
+
+## Architecture at a Glance
+
+```
+main.py (CLI)
+  ↓ [platform factory]
+  ├─→ reddit/subreddit.py [PRAW API]
+  └─→ platforms/threads/fetcher.py [Graph API]
+  ↓ [standard data dict]
+  ├─→ TTS/engine_wrapper.py [7+ providers]
+  ├─→ screenshot_downloader.py (Reddit)
+  │   or platforms/threads/screenshot.py (Threads)
+  ├─→ video_creation/background.py
+  └─→ video_creation/final_video.py [FFmpeg]
+  ↓
+  results/{category}/{video.mp4}
+```
+
+### Key Design: Platform Abstraction via Factory Pattern
+
+**Why:** Single codebase supports multiple platforms without tight coupling.
+
+**How:** `platforms/__init__.py` exports:
+- `get_content_object(POST_ID=None)` — routes to right fetcher
+- `get_screenshot_fn()` — routes to right screenshotter
+
+**Result:** Adding X/Twitter requires only: new module + config section + two `elif` branches.
+
+---
+
+## Data Contract: The "content_object" Dict
+
+All fetchers return this shape (defined in `platforms/__init__.py`):
+
+```python
+{
+    # Unique identifiers
+    "thread_id": str,            # Used for temp folder: assets/temp/{id}/
+    "thread_category": str,      # "reddit", "threads", etc. → output folder
+
+    # Content
+    "thread_title": str,         # TTS as title + output filename
+    "thread_url": str,           # Playwright navigates here for screenshot
+    "is_nsfw": bool,             # Content filter flag
+
+    # Replies/Comments (mutually exclusive with thread_post)
+    "comments": [
+        {
+            "comment_body": str, # TTS per reply
+            "comment_url": str,  # Playwright navigates here
+            "comment_id": str,   # CSS selector ID or unique identifier
+        }
+    ],
+
+    # OR Story mode:
+    "thread_post": str | list,   # Long-form text (no comments)
+}
+```
+
+**Why:** Loose coupling—TTS, backgrounds, and video composition don't need platform-specific logic.
+
+---
+
+## File Organization
+
+```
+VideoMakerBot/
+├── platforms/                     # Multi-platform abstraction
+│   ├── __init__.py                # Factory: get_content_object(), get_screenshot_fn()
+│   └── threads/                   # Threads (Meta) implementation
+│       ├── fetcher.py             # Graph API → content_object
+│       └── screenshot.py          # Playwright Threads screenshotter
+│
+├── reddit/                        # Reddit implementation (kept as-is)
+│   └── subreddit.py               # PRAW API → content_object + thread_category
+│
+├── video_creation/
+│   ├── final_video.py             # FFmpeg composition (platform-aware folder naming)
+│   ├── screenshot_downloader.py   # Playwright Reddit UI capturer
+│   ├── voices.py                  # TTS orchestrator (platform-agnostic)
+│   ├── background.py              # Video/audio downloader (platform-agnostic)
+│   └── data/
+│       ├── videos.json            # Dedup tracker
+│       ├── cookie-dark-mode.json  # Reddit theme cookie
+│       └── cookie-threads.json    # Threads session cookie (auto-created)
+│
+├── TTS/                           # Text-to-Speech
+│   ├── engine_wrapper.py          # Provider abstraction + post_lang fallback
+│   ├── elevenlabs.py, aws_polly.py, etc.  # 7+ provider implementations
+│
+├── utils/
+│   ├── settings.py                # Config loading + validation
+│   ├── videos.py                  # check_done() + check_done_by_id()
+│   ├── console.py                 # Rich terminal output
+│   ├── .config.template.toml      # Config schema (platform sections)
+│   └── ... (id, voice, cleanup, etc.)
+│
+├── main.py                        # CLI entry (platform-routed via factory)
+├── GUI.py                         # Flask web UI (localhost:4000 in host mode, 0.0.0.0 in Docker)
+├── requirements.txt               # Dependencies
+└── AGENTS.md / AGENT.md           # This file + agent guidelines
+```
+
+---
+
+## Configuration
+
+**File:** `utils/.config.template.toml` (schema) → `config.toml` (user config)
+
+### Platform Selection
+```toml
+[settings]
+platform = "reddit"    # or "threads"
+post_lang = "es-cr"    # Optional: translation language (all platforms)
+```
+
+### Reddit Config
+```toml
+[reddit.creds]
+client_id = "..."      # OAuth app
+client_secret = "..."
+username = "..."
+password = "..."
+2fa = false            # Set to true if the account uses two-factor authentication
+
+[reddit.thread]
+subreddit = "AskReddit"
+post_id = ""           # Leave blank for auto-pick
+max_comment_length = 500
+min_comment_length = 1
+min_comments = 20
+blocked_words = "..."
+```
+
+### Threads Config (NEW)
+```toml
+[threads.creds]
+access_token = "EAABsbCS..."   # Meta Graph API token (60-day expiry)
+user_id = "12345678901234567"
+username = "your_insta"        # For Playwright login
+password = "your_password"
+
+[threads.thread]
+post_id = ""                   # Leave blank for auto-pick
+max_reply_length = 500
+min_reply_length = 1
+min_replies = 5
+blocked_words = "..."
+```
+
+### Generic Settings
+```toml
+[settings]
+theme = "dark"
+resolution_w = 1080
+resolution_h = 1920
+storymode = false
+times_to_run = 1
+
+[settings.tts]
+voice_choice = "tiktok"    # or "elevenlabs", "awspolly", "googletranslate", etc.
+random_voice = true
+silence_duration = 0.3
+
+[settings.background]
+background_video = "minecraft"
+background_audio = "lofi"
+background_audio_volume = 0.15
+```
+
+---
+
+## Development Guidelines
+
+### ✅ DO:
+
+1. **Use platform factory in main.py**
+   ```python
+   from platforms import get_content_object, get_screenshot_fn
+   reddit_object = get_content_object(POST_ID)
+   screenshot_fn = get_screenshot_fn()
+   screenshot_fn(reddit_object, number_of_comments)
+   ```
+
+2. **Return standard content dict** from all fetchers
+   ```python
+   return {
+       "thread_id": ...,
+       "thread_category": ...,  # NEW: replaces hardcoded subreddit
+       "comments": [...]
+   }
+   ```
+
+3. **Use config fallback chains** for cross-platform keys
+   ```python
+   lang = (settings.config["settings"].get("post_lang") or
+           settings.config.get("reddit", {}).get("thread", {}).get("post_lang", ""))
+   ```
+
+4. **Read thread_category from dict** instead of config
+   ```python
+   # WRONG:
+   subreddit = settings.config["reddit"]["thread"]["subreddit"]
+
+   # RIGHT:
+   platform = settings.config["settings"].get("platform", "reddit")
+   if platform == "reddit":
+       subreddit = settings.config["reddit"]["thread"]["subreddit"]
+   else:
+       subreddit = reddit_obj.get("thread_category", platform)
+   ```
+
+5. **Test both platforms** after core pipeline changes
+   ```bash
+   # Test Reddit (must not regress)
+   sed -i 's/platform = "threads"/platform = "reddit"/' config.toml
+   python3 main.py
+
+   # Test Threads
+   sed -i 's/platform = "reddit"/platform = "threads"/' config.toml
+   python3 main.py --post-id
+   ```
+
+### ❌ DON'T:
+
+1. **Don't import platform modules directly** in main.py/utils
+   ```python
+   # WRONG: from reddit.subreddit import get_subreddit_threads
+   # RIGHT: from platforms import get_content_object
+   ```
+
+2. **Don't hardcode platform names** in generic modules
+   ```python
+   # WRONG in final_video.py:
+   subreddit = settings.config["reddit"]["thread"]["subreddit"]
+
+   # RIGHT:
+   subreddit = reddit_obj.get("thread_category", "unknown")
+   ```
+
+3. **Don't add platform-specific UI selectors** outside `platforms/{platform}/screenshot.py`
+   - Reddit selectors stay in `video_creation/screenshot_downloader.py`
+   - Threads selectors stay in `platforms/threads/screenshot.py`
+
+4. **Don't assume config keys exist** without fallback
+   ```python
+   # WRONG: lang = settings.config["reddit"]["thread"]["post_lang"]
+   # RIGHT: lang = settings.config.get("settings", {}).get("post_lang", "")
+   ```
+
+---
+
+## Platform-Specific Knowledge
+
+### Reddit
+- **API:** PRAW (Python Reddit API Wrapper)
+- **Auth:** OAuth app (client_id, secret) + username/password
+- **Screenshot:** Playwright on reddit.com/new.reddit.com
+  - Login form: `input[name="username"]`, `input[name="password"]`
+  - Post selector: `[data-test-id="post-content"]`
+  - Comment selector: `#t1_{comment_id}`
+- **NSFW:** `submission.over_18`
+- **Output folder:** `results/{subreddit}/`
+
+### Threads
+- **API:** Meta Graph API (v18.0+)
+- **Auth:** User access token (60-day lifetime) via https://developers.facebook.com/
+- **Screenshot:** Playwright on threads.net
+  - Login form: `input[autocomplete="username"]`, `input[autocomplete="current-password"]`
+  - Post selector: `article` (universal, more stable than Reddit)
+  - Cookies saved to: `video_creation/data/cookie-threads.json`
+- **NSFW:** API doesn't provide; always False
+- **Output folder:** `results/threads/`
+
+### Future: X/Twitter
+Create: `platforms/twitter/fetcher.py` + `platforms/twitter/screenshot.py` + config section
+Update: `platforms/__init__.py` with `elif platform == "twitter"` branches
+
+---
+
+## Extending the Project
+
+### Adding a New TTS Provider
+1. Create `TTS/my_provider.py` with a class implementing the TTS interface
+2. Add config keys to `[settings.tts]` in `.config.template.toml`
+3. Update `TTS/engine_wrapper.py` to call your provider
+4. Test with `settings.config["settings"]["tts"]["voice_choice"] = "my_provider"`
+
+### Adding a New Platform (e.g., X/Twitter)
+1. **Create fetcher:** `platforms/twitter/fetcher.py`
+   - Implement `get_twitter_content(POST_ID=None)` returning standard dict
+2. **Create screenshotter:** `platforms/twitter/screenshot.py`
+   - Implement `get_screenshots_of_twitter_posts(content_object, screenshot_num)`
+3. **Update config:** Add `[twitter.creds]` and `[twitter.thread]` sections
+4. **Update factory:** Add `elif platform == "twitter"` in `platforms/__init__.py`
+5. **Update CLI helper:** Add case to `_get_platform_post_id()` in `main.py`
+6. **Test:** Verify Reddit mode still works, test Twitter mode end-to-end
+
+**Zero changes needed to:** TTS, backgrounds, video composition, or utils.
+
+---
+
+## Debugging Tips
+
+### "No matching distribution found for yt-dlp==2026.3.17"
+→ yt-dlp uses date versioning (YYYY.M.DD, no leading zeros). Use `2025.10.14` (latest stable).
+
+### "Threads API: Invalid or expired access_token"
+→ Meta tokens expire every 60 days. Refresh at https://developers.facebook.com/tools/explorer/
+
+### Playwright timeout on Threads screenshot
+→ Login cookies corrupted or expired. Delete `video_creation/data/cookie-threads.json` to force fresh login next run.
+
+### "No eligible Threads posts found"
+→ Configure `[threads.thread].min_replies = 5` (or lower). Ensure your Threads account has public posts with replies.
+
+### Video dedup not working
+→ Check `video_creation/data/videos.json` is writable. Ensure `check_done_by_id()` is called before fetching content.
+
+---
+
+## Testing Checklist
+
+- [ ] Reddit mode: `platform = "reddit"` produces video to `results/{subreddit}/`
+- [ ] Threads mode: `platform = "threads"` produces video to `results/threads/`
+- [ ] Video dedup: Running same post_id twice skips second run
+- [ ] Translation: `post_lang = "es"` translates filenames
+- [ ] TTS providers: Test with different voice_choice values
+- [ ] Background selection: Custom background video/audio works
+- [ ] Story mode: storymode=true only uses thread_post, not comments
+- [ ] Error handling: Invalid credentials show clear messages
+
+---
+
+## Key Files to Know
+
+| File | Purpose |
+|------|---------|
+| `main.py` | CLI entry; orchestrates pipeline via factory |
+| `platforms/__init__.py` | Factory dispatch for multi-platform support |
+| `platforms/threads/fetcher.py` | Threads Graph API client |
+| `platforms/threads/screenshot.py` | Threads.net Playwright screenshotter |
+| `video_creation/final_video.py` | FFmpeg composition; platform-aware output naming |
+| `TTS/engine_wrapper.py` | TTS provider abstraction; post_lang fallback |
+| `utils/settings.py` | Config loading & validation |
+| `utils/videos.py` | Video dedup tracking |
+| `utils/.config.template.toml` | Config schema |
+| `requirements.txt` | Dependencies |
+
+---
+
+## Useful Commands
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run CLI
+python3 main.py
+
+# Run with specific post
+python3 main.py
+
+# Run Flask GUI
+python3 GUI.py
+
+# Check syntax
+python3 -m py_compile main.py platforms/threads/fetcher.py
+
+# Format code
+black main.py platforms/ utils/
+
+# Lint
+pylint main.py
+```
+
+## Docker Workflow
+
+- Use `docker compose build` to build the shared image for both CLI and GUI.
+- Use `docker compose up gui` to run the Flask app on port `4000`.
+- Use `docker compose run --rm cli` to run the video generator in a container.
+- The repo root is bind-mounted in Compose, so `config.toml`, `results/`, `assets/temp/`, `video_creation/data/videos.json`, and `utils/backgrounds.json` should persist across runs.
+- The GUI must bind to `0.0.0.0` in Docker; do not switch it back to `localhost` for container use.
+
+---
+
+## When You Get Stuck
+
+1. **"What does this module do?"** → Check imports in `main.py` or docstrings
+2. **"How do I add support for platform X?"** → See "Adding a New Platform" section above
+3. **"Why is my config not being read?"** → Check `utils/settings.py:check_toml()` and `.config.template.toml` schema
+4. **"Why isn't my TTS provider being called?"** → Check `TTS/engine_wrapper.py:make_voice()` and config `voice_choice`
+5. **"How do I debug the Playwright screenshot?"** → Uncomment `page.pause()` in screenshot downloader, run headful browser
+
+Good luck! 🚀
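
Reviewer note, not part of the patch: the factory dispatch and data contract described in the guide can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the repository's code. `get_content_object_sketch`, `validate_content_object`, `FETCHERS`, and the stub fetchers are hypothetical names, and the config is passed in as a plain dict, whereas the guide's `get_content_object(POST_ID=None)` takes only a post ID. A dict lookup stands in for the `elif` branches the guide mentions.

```python
from typing import Any, Callable, Dict, Optional

ContentObject = Dict[str, Any]
Fetcher = Callable[[Optional[str]], ContentObject]

# Keys taken from the guide's data contract section.
REQUIRED_KEYS = ("thread_id", "thread_category", "thread_title", "thread_url", "is_nsfw")
COMMENT_KEYS = ("comment_body", "comment_url", "comment_id")


def validate_content_object(obj: ContentObject) -> ContentObject:
    """Hypothetical helper: check a dict against the documented contract."""
    missing = [key for key in REQUIRED_KEYS if key not in obj]
    if missing:
        raise ValueError(f"content_object is missing keys: {missing}")
    # The guide describes "comments" and "thread_post" as mutually exclusive.
    if ("comments" in obj) == ("thread_post" in obj):
        raise ValueError("expected exactly one of 'comments' or 'thread_post'")
    for comment in obj.get("comments", []):
        absent = [key for key in COMMENT_KEYS if key not in comment]
        if absent:
            raise ValueError(f"comment is missing keys: {absent}")
    return obj


def _stub_fetcher(category: str) -> Fetcher:
    """Build a stand-in fetcher; real fetchers would call PRAW or the Graph API."""
    def fetch(post_id: Optional[str] = None) -> ContentObject:
        thread_id = post_id or "example"
        return {
            "thread_id": thread_id,
            "thread_category": category,
            "thread_title": f"Example {category} thread",
            "thread_url": f"https://example.invalid/{category}/{thread_id}",
            "is_nsfw": False,
            "comments": [
                {
                    "comment_body": "First reply",
                    "comment_url": f"https://example.invalid/{category}/{thread_id}/c1",
                    "comment_id": "c1",
                }
            ],
        }
    return fetch


# Assumption: one registry entry per platform named in the guide.
FETCHERS: Dict[str, Fetcher] = {
    "reddit": _stub_fetcher("reddit"),
    "threads": _stub_fetcher("threads"),
}


def get_content_object_sketch(
    config: Dict[str, Any], post_id: Optional[str] = None
) -> ContentObject:
    """Route to the fetcher for settings.platform, defaulting to "reddit"."""
    platform = config.get("settings", {}).get("platform", "reddit")
    fetcher = FETCHERS.get(platform)
    if fetcher is None:
        raise ValueError(f"unsupported platform: {platform!r}")
    return validate_content_object(fetcher(post_id))


if __name__ == "__main__":
    content = get_content_object_sketch({"settings": {"platform": "threads"}}, "abc123")
    print(content["thread_category"], content["thread_id"])
```

Validating at the factory boundary is one way to keep the downstream TTS and video steps free of platform checks; under this sketch, a new platform would only need one more registry entry.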