feat(manual): Implement TTS processing and video building for manual pipeline

- Added  to handle text-to-speech for screenshots, generating MP3 files and updating post objects with audio paths and durations.
- Introduced  to assemble videos from screenshots and TTS audio, including background video and audio management.
- Created  as the entry point for the manual pipeline, supporting commands to initialize posts, render videos, and list post statuses.
- Updated background audio and video configurations in JSON files, removing outdated entries and adding new options.
- Adjusted file permissions for several utility scripts to ensure proper execution.
pull/2558/head
MinhVu2711 2 months ago
parent 569f25098a
commit 2301f9c3b4

.gitignore vendored

@ -246,3 +246,8 @@ video_creation/data/envvars.txt
config.toml
*.exe
.agents
# Manual pipeline
manual_posts/
manual_results/

@ -0,0 +1,235 @@
# 🧠 Brainstorming: Manual Screenshot → Video Pipeline
> **Context**: The Reddit API cannot be used. A new workflow is needed that lets the user capture screenshots by hand from **Reddit, Threads (Meta), X (Twitter)**, after which the system builds the video automatically.
>
> **Status**: ✅ **IMPLEMENTED**. Phase 1 is complete.
---
## 1. Core Problem Analysis
### Where does the current flow depend on the Reddit API?
| Step | Depends on API? | Details |
|------|:---:|----------|
| Fetch thread + comments | ✅ **YES** | `reddit/subreddit.py`: PRAW login, fetch post, filter comments |
| **Text for TTS** | ✅ **YES** | `TTS/engine_wrapper.py`: reads text from `reddit_object["comments"]` |
| **Screenshot** | ✅ **YES** | `screenshot_downloader.py`: Playwright logs in to Reddit, navigates, captures |
| Background video/audio | ❌ NO | `background.py`: uses only YouTube, unrelated to Reddit |
| Final video assembly | ❌ NO | `final_video.py`: uses only FFmpeg, but needs the `reddit_obj` dict |
| Video tracking | ❌ NO | `videos.json`: stores metadata only |
**Conclusion**: The **first 3 steps** (fetch → TTS text → screenshot) must be replaced entirely by a manual flow.
---
## 2. Chosen Approach: **.mp3 preferred, .txt fallback**
The user supplies **audio files (.mp3) directly** together with screenshots. TTS is only a fallback for the case where just a `.txt` file is present.
```
User takes screenshots + supplies .mp3 audio → Video
(or .txt fallback → TTS → Video)
```
### Audio priority:
```
.mp3 present? ──YES──▶ Use the .mp3 directly (skip TTS)
NO
.txt present? ──YES──▶ TTS generates .mp3 from the text (fallback)
NO
⚠ SKIP (screenshot has no audio)
```
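The priority chain above can be sketched as a small lookup. This is a minimal illustration, not the code in `manual/tts_processor.py`; the function name `resolve_audio_source` and its return labels are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_audio_source(post_dir: Path, stem: str) -> Tuple[str, Optional[Path]]:
    """Return ("mp3", path), ("tts", path) or ("skip", None) for one screenshot stem."""
    mp3 = post_dir / f"{stem}.mp3"
    txt = post_dir / f"{stem}.txt"
    if mp3.is_file():
        return "mp3", mp3  # pre-recorded audio wins, TTS is skipped
    if txt.is_file():
        return "tts", txt  # fallback: the text would be sent to a TTS engine
    return "skip", None    # the screenshot has no audio source at all
```

Calling `resolve_audio_source(Path("manual_posts/post_001"), "0_title")` would report which branch applies to the title screenshot.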
---
## 3. Implemented Structure
### 3.1 Directories
```
RedditVideoMakerBot/
├── main.py # Old flow (kept as is, not modified)
├── manual_main.py # 🆕 Entry point for the new flow
├── manual/ # 🆕 New-flow module (fully separate)
│   ├── __init__.py # Module docstring
│   ├── scanner.py # Scans folders, validates (.png + .mp3 + .txt)
│   ├── tts_processor.py # Audio processor (.mp3 preferred, TTS fallback)
│   └── video_builder.py # FFmpeg pipeline (libx264 CPU)
├── manual_posts/ # 🆕 Input directory
│   └── post_001/
│       ├── meta.json # (optional) metadata
│       ├── 0_title.png # Screenshot of the post
│       ├── 0_title.mp3 # Audio (pre-recorded)
│       ├── 1_comment.png # Screenshot of a comment
│       └── 1_comment.mp3 # Audio for the comment
├── manual_results/ # 🆕 Output directory
│   └── post_001.mp4
├── reddit/ # Old flow (kept as is)
├── TTS/ # Shared: TTS engines reused as fallback
├── video_creation/ # Old flow (kept as is)
└── utils/ # Shared: common utilities
```
### 3.2 File Naming Rules
```
<index>_<type>.<ext>
```
| Pattern | Meaning | Required? |
|---------|----------|-----------|
| `0_title.png` | Screenshot of the main post | ✅ Required |
| `0_title.mp3` | Pre-recorded audio | ✅ (or .txt) |
| `0_title.txt` | Text for the TTS fallback | Fallback |
| `1_comment.png` | Screenshot of comment 1 | Optional |
| `1_comment.mp3` | Audio for comment 1 | ✅ (or .txt) |
| `meta.json` | Metadata | Optional |
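A short sketch of how such names can be split into their parts. The regular expression is the `FILE_PATTERN` defined in `manual/scanner.py` in this commit; the helper `parse_name` is a hypothetical name used only for illustration.

```python
import re
from typing import Optional, Tuple

# Same pattern as PostScanner.FILE_PATTERN in manual/scanner.py
FILE_PATTERN = re.compile(r"^(\d+)_(title|comment)\.(png|jpg|jpeg|mp3|txt)$", re.IGNORECASE)


def parse_name(filename: str) -> Optional[Tuple[int, str, str]]:
    """Split '<index>_<type>.<ext>' into (index, type, ext); None if it does not match."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    index, kind, ext = match.groups()
    return int(index), kind.lower(), ext.lower()
```

Files such as `meta.json` do not match the pattern and return `None`, so they have to be handled separately.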
### 3.3 `post_object` — Data Structure
```python
post_object = {
    "post_id": "post_001",
    "platform": "reddit",  # reddit | threads | x | other
    "title": "What's the most...",
    "author": "u/example_user",
    "url": "https://...",
    "post_dir": "manual_posts/post_001",
    "screenshots": [
        {
            "index": 0,
            "type": "title",
            "image_path": "manual_posts/post_001/0_title.png",
            "text": "",  # From the .txt file (if present)
            "audio_path": "manual_posts/post_001/0_title.mp3",  # From the .mp3 file
            "audio_duration": 3.5,  # Measured after processing
        },
        {
            "index": 1,
            "type": "comment",
            "image_path": "manual_posts/post_001/1_comment.png",
            "text": "",
            "audio_path": "manual_posts/post_001/1_comment.mp3",
            "audio_duration": 5.2,
        },
    ],
    "total_duration": 8.7,
    "output_path": "manual_results/post_001.mp4",
}
```
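In the example above, `total_duration` equals the sum of the two `audio_duration` values (3.5 + 5.2). A minimal sketch of that relationship follows, assuming no extra silence is added between clips; the helper name `total_duration` is hypothetical and the real builder may compute it differently.

```python
def total_duration(post_object: dict) -> float:
    """Sum the audio_duration of every screenshot entry, rounded to 0.1 s."""
    durations = (shot.get("audio_duration", 0.0) for shot in post_object["screenshots"])
    return round(sum(durations), 1)
```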
### 3.4 Processing Flow
```mermaid
flowchart TD
A["manual_main.py"] --> B{"Command?"}
B -->|"render"| G["Scan manual_posts/"]
G --> H["Validate: image + audio/text present?"]
H --> I["Build post_object from files"]
I --> J{"Has .mp3?"}
J -->|"YES"| K["Use .mp3 directly"]
J -->|"NO, has .txt"| L["TTS: text → .mp3"]
K --> M["Random pick background video + audio"]
L --> M
M --> N["FFmpeg: combine images + audio + background"]
N --> O["Output → manual_results/"]
B -->|"render --all"| P["Loop over all folders"]
P --> G
B -->|"init"| Q["Create folder + meta.json"]
B -->|"list"| R["List posts + status"]
```
---
## 4. Old Flow vs New Flow
| Aspect | Old Flow (`main.py`) | New Flow (`manual_main.py`) |
|--------|---------------------|---------------------------|
| **Data source** | Reddit API (PRAW) | Manual screenshots + audio files |
| **Screenshot** | Playwright auto-capture | Captured by the user |
| **Audio source** | TTS from comment text | **User-supplied .mp3** (or .txt → TTS) |
| **Platform** | Reddit only | Reddit + Threads + X + any |
| **TTS engines** | Required | Optional (only a fallback for .txt) |
| **Background** | Hardcoded YouTube list | **Random from local folder** (YouTube fallback) |
| **Encoder** | `h264_nvenc` (GPU) | `libx264` (CPU) |
| **Config** | `config.toml` (template-based) | `config.toml` `[manual]` section + built-in defaults |
| **Output** | `results/<subreddit>/` | `manual_results/` |
| **Tracking** | `videos.json` | `videos.json` (shared) |
---
## 5. Config
```toml
[manual]
input_dir = "manual_posts"
output_dir = "manual_results"
encoder = "libx264"
resolution_w = 1080
resolution_h = 1920
opacity = 0.9
background_video = "random" # "random" or a specific name (e.g. "minecraft")
background_audio = "random" # "random" or a specific name (e.g. "lofi")
background_video_dir = "assets/backgrounds/video" # Folder with local background videos
background_audio_dir = "assets/backgrounds/audio" # Folder with local background music
background_audio_volume = 0.15
max_video_length = 120
```
**Note**: The `[manual]` config section is optional. If it is missing, the built-in defaults are used.
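One way to read "optional section with built-in defaults" is a dictionary overlay. The sketch below assumes the parsed `config.toml` is available as a plain dict; `MANUAL_DEFAULTS` and `manual_settings` are hypothetical names, and the values simply mirror the example above.

```python
# Built-in defaults mirroring the [manual] example above (illustrative only).
MANUAL_DEFAULTS = {
    "input_dir": "manual_posts",
    "output_dir": "manual_results",
    "encoder": "libx264",
    "resolution_w": 1080,
    "resolution_h": 1920,
    "opacity": 0.9,
    "background_video": "random",
    "background_audio": "random",
    "background_audio_volume": 0.15,
    "max_video_length": 120,
}


def manual_settings(config: dict) -> dict:
    """Overlay the optional [manual] table on top of the built-in defaults."""
    return {**MANUAL_DEFAULTS, **config.get("manual", {})}
```

Keys present in `[manual]` win; anything missing falls back to the default value.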
### Background: Random from a local folder
Drop background video/audio files into these folders and the system picks one at random on each render:
```
assets/backgrounds/video/ ← Put .mp4/.mkv/.webm/.avi/.mov files here
assets/backgrounds/audio/ ← Put .mp3/.wav/.ogg/.m4a/.flac files here
```
- **Local files present** → one is chosen at random
- **No local files** → falls back to downloading from YouTube (the old list)
The TTS fallback uses the settings from `[settings.tts]` (default: GoogleTranslate, no API key needed).
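The local-folder selection can be sketched as follows. This is an assumption-laden illustration: `pick_local_background` is a hypothetical helper, and matching a specific name against the file stem is a guess at how names such as "minecraft" map to files. Returning `None` signals that the caller should use the YouTube fallback.

```python
import random
from pathlib import Path
from typing import Optional

VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".avi", ".mov"}


def pick_local_background(folder: str, choice: str = "random", exts=VIDEO_EXTS) -> Optional[Path]:
    """Return a background file from `folder`, or None so the caller can fall back."""
    root = Path(folder)
    if not root.is_dir():
        return None
    files = sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() in exts)
    if not files:
        return None
    if choice != "random":
        # Assumption: a specific name refers to the file stem, e.g. "minecraft.mp4".
        named = [p for p in files if p.stem == choice]
        return named[0] if named else None
    return random.choice(files)
```

The same function works for audio by passing a set of audio extensions as `exts`.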
---
## 6. Decisions Log
| Question | Decision |
|----------|------------|
| Audio source | **.mp3 preferred**, .txt falls back to TTS |
| Background | **Random from local folder**, YouTube fallback |
| Encoder | `libx264` (CPU), since no NVIDIA GPU is available |
| Config | `[manual]` section in `config.toml` |
| Thumbnail | Skipped |
| Video tracking | Shared `videos.json` file |
| OCR (Phase 2) | EN + VI, using EasyOCR |
---
## 7. Phases
| Phase | Status | Description |
|-------|:---:|--------|
| **Phase 1: Core** | ✅ Done | .png + .mp3 → Video (+ .txt TTS fallback) |
| Phase 2: OCR | ⏳ Planned | Auto-read text from screenshots (EN + VI) |
| Phase 3: GUI | ⏳ Planned | Flask web interface for the manual flow |
---
> 📝 **Summary**: The `manual/` module is fully separate. Main input: screenshots (.png) + audio (.mp3). TTS is only a fallback when .txt is used. Background functions are reused from the old code. Videos are written to `manual_results/`. Platform-agnostic.

@ -0,0 +1,139 @@
# 📖 Manual Pipeline User Guide
> **Summary**: Create videos from hand-captured screenshots (Reddit, Threads, X) without any API.
---
## 🚀 Quick Start (3 steps)
### Step 1: Create a folder for a new post
```bash
cd /home/minhvu/projects/RedditVideoMakerBot
python manual_main.py init my_first_post --platform reddit
```
Result:
```
manual_posts/my_first_post/
├── meta.json ← (optional) metadata
├── 0_title.txt ← Edit the TTS text here
└── 1_comment.txt ← Edit the comment text here
```
### Step 2: Add screenshots + text
1. **Take a screenshot** of the post → save it as `0_title.png`
2. **Take screenshots** of the comments → save them as `1_comment.png`, `2_comment.png`, ...
3. **Edit the matching `.txt` files** and enter the text the bot will read aloud
```
manual_posts/my_first_post/
├── meta.json
├── 0_title.png ← Screenshot of the post
├── 0_title.txt ← "What's the most underrated life hack?"
├── 1_comment.png ← Screenshot of comment 1
├── 1_comment.txt ← "I always put my phone on airplane mode..."
├── 2_comment.png ← Screenshot of comment 2
└── 2_comment.txt ← "Using a binder clip as a phone stand..."
```
> [!IMPORTANT]
> Every `.png` file **must** have a `.txt` file with the same index. Index `0` is always the title/main post.
### Step 3: Render the video
```bash
python manual_main.py render my_first_post
```
The video is saved to: `manual_results/my_first_post.mp4`
---
## 📋 All Commands
| Command | Description |
|---------|--------|
| `python manual_main.py init <post_id>` | Create a new folder with template files |
| `python manual_main.py init <post_id> --platform threads` | Create a folder for a Threads post |
| `python manual_main.py render <post_id>` | Render one post into a video |
| `python manual_main.py render --all` | Render all posts that have not been rendered yet |
| `python manual_main.py render <post_id> --force` | Re-render (even if already rendered) |
| `python manual_main.py list` | List all posts with their status |
---
## 📁 File Naming Rules
```
<index>_<type>.<ext>
```
| File | Meaning |
|------|----------|
| `0_title.png` | Screenshot of the main post (required) |
| `0_title.txt` | TTS text for the post (required) |
| `1_comment.png` | Screenshot of comment 1 |
| `1_comment.txt` | TTS text for comment 1 |
| `N_comment.png/txt` | The N-th comment |
| `meta.json` | Metadata (optional) |
> [!TIP]
> `.txt` files support comment lines starting with `#`; these lines are skipped by TTS.
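The `#` comment rule can be illustrated with a small filter. This is a sketch only: `read_tts_text` is a hypothetical name, and joining the remaining lines with single spaces is an assumption about how the text is passed to TTS.

```python
def read_tts_text(raw: str) -> str:
    """Drop blank lines and lines starting with '#', then join the rest for TTS."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(stripped)
    return " ".join(kept)
```

A file containing `# speaker note` followed by the actual sentence would therefore yield only the sentence.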
---
## ⚙️ Configuration
Add a `[manual]` section to `config.toml` (or leave it out and the bot uses the defaults):
```toml
[manual]
input_dir = "manual_posts" # Input directory
output_dir = "manual_results" # Output directory
encoder = "libx264" # CPU encoder (or h264_nvenc if a GPU is available)
resolution_w = 1080 # Video width
resolution_h = 1920 # Video height (1080x1920 = portrait)
opacity = 0.9 # Opacity of the screenshot overlay
background_video = "minecraft" # Background video
background_audio = "lofi" # Background audio
background_audio_volume = 0.15 # Background audio volume (0 = off)
max_video_length = 120 # Maximum video length (seconds)
```
The TTS engine is taken from the `[settings.tts]` section in `config.toml`. The default is **GoogleTranslate** (no API key needed).
---
## 🏗️ Module Architecture
```
manual/
├── __init__.py # Module docstring
├── scanner.py # Scans folders, validates, builds post_object
├── tts_processor.py # TTS: text → MP3 (reuses TTS/ engines)
└── video_builder.py # FFmpeg: screenshots + audio → video
manual_main.py # CLI entry point (init, render, list)
```
**Fully separate** from the old flow (`main.py`). No file of the old flow is modified.
### Files created/modified
| File | Action | Description |
|------|--------|--------|
| [manual/__init__.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/__init__.py) | 🆕 Created | Module init |
| [manual/scanner.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/scanner.py) | 🆕 Created | Folder scanner & validator |
| [manual/tts_processor.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/tts_processor.py) | 🆕 Created | TTS processor |
| [manual/video_builder.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/video_builder.py) | 🆕 Created | Video assembler |
| [manual_main.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual_main.py) | 🆕 Created | CLI entry point |
| [.gitignore](file:///home/minhvu/projects/RedditVideoMakerBot/.gitignore) | ✏️ Updated | Added `manual_posts/`, `manual_results/` |
---
## ⚠️ Notes
1. **FFmpeg** must already be installed on the system
2. The **background video** is downloaded automatically from YouTube the first time (requires internet)
3. `config.toml` may be empty; the bot then uses the built-in defaults (GoogleTranslate TTS)
4. The default encoder is `libx264` (CPU), which suits machines without an NVIDIA GPU

@ -0,0 +1,483 @@
# 📦 RedditVideoMakerBot — Project Init Documentation
> **Version**: 3.4.0
> **Original author**: Lewis Menelaws & [TMRRW](https://tmrrwinc.ca)
> **License**: GPL + Roboto Fonts (Apache 2.0)
> **Python**: 3.10 / 3.11 / 3.12
---
## 1. Overview
RedditVideoMakerBot is a tool that automates the creation of short videos (TikTok/YouTube Shorts/Instagram Reels) from Reddit posts. The bot:
1. **Fetches a post** from a subreddit (via the Reddit API / PRAW)
2. **Converts text to speech** (TTS, 7 different engines)
3. **Takes screenshots** of the post/comments with Playwright
4. **Downloads and cuts background video/audio** from YouTube
5. **Assembles everything** into a finished video with FFmpeg
Final result: an `.mp4` file in the `results/<subreddit>/` directory.
---
## 2. Directory Structure
```
RedditVideoMakerBot/
├── main.py # 🚀 Main entry point
├── GUI.py # 🖥️ Web GUI (Flask, port 4000)
├── config.toml # ⚙️ Configuration file (user-generated)
├── ptt.py # 🔊 Helper script that lists system voices
├── requirements.txt # 📦 Python dependencies
├── Dockerfile # 🐳 Docker support (python:3.10-slim)
├── build.sh / run.sh / run.bat # 📜 Quick-run scripts
├── install.sh # 📜 Auto-installer (Linux/macOS)
├── reddit/ # 📡 Module that fetches data from Reddit
│   └── subreddit.py # Logs in to Reddit, fetches threads & comments
├── TTS/ # 🗣️ Text-to-Speech module (7 engines)
│   ├── engine_wrapper.py # TTSEngine: common wrapper for all TTS engines
│   ├── TikTok.py # TikTok TTS API
│   ├── aws_polly.py # AWS Polly (boto3)
│   ├── elevenlabs.py # ElevenLabs API
│   ├── openai_tts.py # OpenAI TTS API
│   ├── GTTS.py # Google Translate TTS (gTTS)
│   ├── pyttsx.py # pyttsx3 (offline, system voices)
│   └── streamlabs_polly.py # Streamlabs Polly
├── video_creation/ # 🎬 Video creation module
│   ├── voices.py # Orchestrator: picks the TTS provider and runs it
│   ├── screenshot_downloader.py # Takes Reddit screenshots with Playwright
│   ├── background.py # Downloads & cuts background video/audio (yt-dlp)
│   ├── final_video.py # Assembles everything into a video (FFmpeg pipeline)
│   └── data/ # Cookie files + videos.json (tracking)
│       ├── cookie-dark-mode.json
│       ├── cookie-light-mode.json
│       └── videos.json
├── utils/ # 🛠️ Utilities
│   ├── settings.py # Reads/validates config.toml against the template
│   ├── .config.template.toml # Config template (defines all fields)
│   ├── console.py # Rich console helpers (print_step, handle_input...)
│   ├── ai_methods.py # AI similarity sorting (sentence-transformers)
│   ├── subreddit.py # Logic for picking an unprocessed post + filters
│   ├── voice.py # sanitize_text(), rate limit, sleep_until()
│   ├── videos.py # check_done(), save_data(): tracking
│   ├── cleanup.py # Deletes temp files
│   ├── ffmpeg_install.py # Installs FFmpeg automatically if missing
│   ├── imagenarator.py # Renders images for storymode method 1
│   ├── thumbnail.py # Creates the video thumbnail
│   ├── fonts.py # Font size helpers
│   ├── id.py # extract_id(): sanitizes the reddit thread ID
│   ├── posttextparser.py # Splits post text into paragraphs
│   ├── playwright.py # Helper that clears cookies
│   ├── version.py # Checks for a new version on GitHub
│   ├── gui_utils.py # Utils for the Flask GUI
│   ├── background_videos.json # List of background videos (YouTube URLs)
│   └── background_audios.json # List of background audios (YouTube URLs)
├── GUI/ # 🌐 Flask Templates (HTML)
│   ├── layout.html # Base template
│   ├── index.html # Home page: list of created videos
│   ├── settings.html # Settings page
│   ├── backgrounds.html # Background management
│   └── voices/ # Voice sample files
├── fonts/ # 🔤 Roboto font files
│   ├── Roboto-Regular.ttf
│   ├── Roboto-Bold.ttf
│   ├── Roboto-Medium.ttf
│   ├── Roboto-Black.ttf
│   └── LICENSE.txt
├── assets/ # 🎨 Static assets
│   ├── title_template.png # Image template for the fancy thumbnail
│   └── backgrounds/ # Downloaded background files (video/audio)
├── results/ # 📁 Output videos (auto-created)
│   └── <subreddit>/
│       ├── <video>.mp4
│       ├── OnlyTTS/ # Videos without background audio
│       └── thumbnails/ # Generated thumbnails
└── threads/ # 📂 (Unused/placeholder)
```
---
## 3. Processing Pipeline (Main Flow)
```mermaid
flowchart TD
A["main.py — Entry Point"] --> B["1. get_subreddit_threads()"]
B --> C["2. save_text_to_mp3()"]
C --> D["3. get_screenshots_of_reddit_posts()"]
D --> E["4. download/chop backgrounds"]
E --> F["5. make_final_video()"]
F --> G["results/subreddit/video.mp4"]
B -.-> B1["reddit/subreddit.py"]
B1 -.-> B2["PRAW — Reddit API"]
B1 -.-> B3["utils/subreddit.py — filter logic"]
B1 -.-> B4["utils/ai_methods.py — similarity sort"]
C -.-> C1["video_creation/voices.py"]
C1 -.-> C2["TTS/engine_wrapper.py"]
C2 -.-> C3["7 TTS Engines"]
D -.-> D1["video_creation/screenshot_downloader.py"]
D1 -.-> D2["Playwright — Headless Chrome"]
E -.-> E1["video_creation/background.py"]
E1 -.-> E2["yt-dlp — YouTube download"]
F -.-> F1["video_creation/final_video.py"]
F1 -.-> F2["FFmpeg — Video assembly"]
```
### Step 1: Fetch the Reddit Thread (`reddit/subreddit.py`)
- Logs in to Reddit through **PRAW** (client_id, client_secret, username, password)
- Supports **2FA** (code entered manually)
- Selects a post in one of these ways:
  - **Specific post ID** (from config; multiple IDs separated by `+` are supported)
  - **AI Similarity**: uses `sentence-transformers/all-MiniLM-L6-v2` to compare similarity against keywords
  - **Random** from `subreddit.hot(limit=25)`
- **Filters** (in `utils/subreddit.py`):
  - Skip posts already processed (checked against `videos.json`)
  - Skip NSFW (if `allow_nsfw = false`)
  - Skip pinned posts
  - Skip posts containing **blocked words**
  - Skip posts with fewer than `min_comments` comments
  - Storymode: check the `selftext` length
- Collects comments (filtered by `min/max_comment_length`; deleted/removed/stickied comments are skipped)
- **Output**: Dict containing `thread_url`, `thread_title`, `thread_id`, `is_nsfw`, and `comments[]` or `thread_post`
### Step 2: Text-to-Speech (`video_creation/voices.py` + `TTS/`)
**7 TTS Providers** with different `max_chars` limits:
| Provider | Class | Max Chars | API Key Required | Notes |
|----------|-------|-----------|------------------|-------|
| **TikTok** | `TikTok` | 200 | Session ID | Uses the unofficial TikTok API |
| **Google Translate** | `GTTS` | 5,000 | No | Uses the gTTS library |
| **AWS Polly** | `AWSPolly` | 3,000 | AWS Profile | Neural engine, 15 voices |
| **Streamlabs Polly** | `StreamlabsPolly` | 550 | No | Free Polly wrapper |
| **ElevenLabs** | `elevenlabs` | 2,500 | API Key | Multilingual v1 model |
| **OpenAI** | `OpenAITTS` | 4,096 | API Key | tts-1, tts-1-hd, gpt-4o-mini-tts |
| **pyttsx3** | `pyttsx` | 5,000 | No | Offline, system voices |
**TTSEngine wrapper** (`TTS/engine_wrapper.py`):
- Takes the reddit object → creates an MP3 for the title and for each comment
- Automatically **splits** text longer than `max_chars` into several parts and joins them with FFmpeg concat
- Adds **silence** between parts (`silence_duration`, default 0.3s)
- Sanitizes text: removes URLs and special characters, replaces `+` → "plus", `&` → "and"
- Supports **translation** into another language (via the `translators` library)
- Sums the total audio `length` → used as the video length
- **Max video length**: default 50 seconds (hardcoded `DEFAULT_MAX_LENGTH`)
- **Output**: MP3 files in `assets/temp/<thread_id>/mp3/`
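The `max_chars` splitting step can be pictured with a greedy word-boundary splitter. The exact rule used by `TTS/engine_wrapper.py` is not shown here, so treat this as a hypothetical sketch (`split_text` is an illustrative name), not the wrapper's actual algorithm.

```python
from typing import List


def split_text(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole words into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate  # a single over-long word is kept whole
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the provider separately and the resulting clips concatenated.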
### Step 3: Screenshot Reddit Posts (`video_creation/screenshot_downloader.py`)
- Uses **Playwright** (Chromium headless)
- **Logs in** to Reddit (username/password)
- Opens the thread URL on `new.reddit.com`
- Supports **Dark/Light/Transparent** themes (loads the matching cookies)
- Captures screenshots:
  - **Title** → `assets/temp/<id>/png/title.png`
  - **Comments** → `assets/temp/<id>/png/comment_<i>.png`
  - **Story content** → `assets/temp/<id>/png/story_content.png`
- Supports **zoom** (browser scale)
- Supports **translating** the text before capture
- Handles the NSFW warning popup
- **Storymode method 1**: instead of a screenshot, uses `imagemaker()` to render images from text with PIL
### Step 4: Background Video/Audio (`video_creation/background.py`)
**Background Videos** (10 options):
| Name | Source | Credit |
|------|--------|--------|
| minecraft | YouTube parkour | bbswitzer |
| minecraft-2 | YouTube | Itslpsn |
| gta | GTA stunt race | Achy Gaming |
| motor-gta | Bike parkour GTA | Achy Gaming |
| rocket-league | Rocket League | Orbital Gameplay |
| csgo-surf | CSGO Surf | Aki |
| cluster-truck | Cluster Truck | No Copyright Gameplay |
| multiversus | MultiVersus | MKIceAndFire |
| fall-guys | Fall Guys | Throneful |
| steep | Steep | joel |
**Background Audios** (3 options): `lofi`, `lofi-2`, `chill-summer`
- Downloaded with **yt-dlp** (first time only, cached in `assets/backgrounds/`)
- **Randomly cuts** a video/audio segment as long as the video length
- Output: `assets/temp/<id>/background.mp4` + `background.mp3`
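The random cut can be sketched as choosing a start offset so that the clip still fits inside the source. The helper below is hypothetical (`random_segment` is not a function from `background.py`) and only illustrates the arithmetic.

```python
import random
from typing import Optional, Tuple


def random_segment(
    source_length: float, clip_length: float, rng: Optional[random.Random] = None
) -> Tuple[float, float]:
    """Pick a random (start, end) window of clip_length seconds inside the source."""
    if clip_length > source_length:
        raise ValueError("background is shorter than the requested clip")
    rng = rng or random.Random()
    start = rng.uniform(0.0, source_length - clip_length)
    return start, start + clip_length
```

Passing a seeded `random.Random` makes the choice reproducible, which helps when debugging renders.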
### Step 5: Final Video (`video_creation/final_video.py`)
- **Concat** all audio clips → `assets/temp/<id>/audio.mp3`
- **Merge** background audio (configurable volume, default 0.15)
- **Prepare background**: crop the background video to the `W/H` ratio (default 1080x1920, portrait)
- **Create fancy thumbnail**: takes `title_template.png`, stretches the middle section, draws the title text on it
- **Overlay** screenshots onto the background video, timed by the audio clips
  - Each screenshot is shown for the time span of its audio clip
  - Supports `opacity` (default 0.9)
- **Draw credit text** in the bottom-right corner
- **Render** with FFmpeg:
  - Codec: `h264_nvenc` (NVIDIA GPU acceleration)
  - Video bitrate: 20Mbps
  - Audio bitrate: 192kbps
  - Threads: `multiprocessing.cpu_count()`
- **Optional**: Render an additional "OnlyTTS" version (without background audio)
- **Save metadata** to `videos.json`
- **Cleanup** temp files
- **Output**: `results/<subreddit>/<normalized_title>.mp4`
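The overlay timing described above amounts to turning per-clip audio durations into consecutive display windows. A minimal sketch, assuming clips play back to back with no gaps; `overlay_windows` is a hypothetical helper, not a function in `final_video.py`.

```python
from typing import List, Tuple


def overlay_windows(durations: List[float]) -> List[Tuple[float, float]]:
    """Turn per-clip audio durations into consecutive (start, end) display windows."""
    windows: List[Tuple[float, float]] = []
    cursor = 0.0
    for duration in durations:
        windows.append((cursor, cursor + duration))
        cursor += duration
    return windows
```

Each window could then drive the time range in which the corresponding screenshot is overlaid on the background.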
---
## 4. Configuration (`config.toml`)
The configuration is validated automatically against the template `utils/.config.template.toml`. On first run, or when a field is missing, the bot prompts the user for it.
### `[reddit.creds]`: Reddit login credentials
| Key | Type | Required | Description |
|-----|------|----------|-------|
| `client_id` | string | ✅ | Reddit App ID (12-30 chars) |
| `client_secret` | string | ✅ | Reddit App Secret (20-40 chars) |
| `username` | string | ✅ | Reddit username (3-20 chars) |
| `password` | string | ✅ | Reddit password |
| `2fa` | bool | ❌ | Enable 2FA? Default: `false` |
### `[reddit.thread]`: Post settings
| Key | Type | Default | Description |
|-----|------|---------|-------|
| `subreddit` | string | n/a | Subreddit name (supports `+` for multiple subs) |
| `post_id` | string | `""` | Specific post ID (supports `+` for multiple IDs) |
| `random` | bool | `false` | Random thread? |
| `max_comment_length` | int | `500` | Max characters per comment |
| `min_comment_length` | int | `1` | Min characters per comment |
| `post_lang` | string | `""` | Translation language (e.g. `vi`, `es`, `ja`) |
| `min_comments` | int | `20` | Min number of comments on the post |
| `blocked_words` | string | `""` | Comma-separated blocked words |
### `[ai]`: AI Similarity
| Key | Type | Default | Description |
|-----|------|---------|-------|
| `ai_similarity_enabled` | bool | `false` | Enable sorting by similarity |
| `ai_similarity_keywords` | string | n/a | Comma-separated keywords |
### `[settings]`: General settings
| Key | Type | Default | Description |
|-----|------|---------|-------|
| `allow_nsfw` | bool | `false` | Allow NSFW? |
| `theme` | string | `"dark"` | `dark` / `light` / `transparent` |
| `times_to_run` | int | `1` | Number of consecutive runs |
| `opacity` | float | `0.9` | Opacity of overlaid comments (0-1) |
| `storymode` | bool | `false` | Read only the title + post content |
| `storymodemethod` | int | `1` | `0`: 1 fixed image, `1`: fancy images |
| `storymode_max_length` | int | `1000` | Max characters for storymode |
| `resolution_w` | int | `1080` | Video width (pixels) |
| `resolution_h` | int | `1920` | Video height (pixels) |
| `zoom` | float | `1` | Browser zoom level (0.1-2.0) |
| `channel_name` | string | `"Reddit Tales"` | Channel name shown on the thumbnail |
### `[settings.background]`: Background
| Key | Type | Default | Description |
|-----|------|---------|-------|
| `background_video` | string | `"minecraft"` | Background video |
| `background_audio` | string | `"lofi"` | Background audio |
| `background_audio_volume` | float | `0.15` | Background audio volume (0 = off) |
| `enable_extra_audio` | bool | `false` | Also render a version without background audio |
| `background_thumbnail` | bool | `false` | Create a thumbnail? |
| `background_thumbnail_font_*` | n/a | n/a | Font family/size/color for the thumbnail |
### `[settings.tts]`: Text-to-Speech
| Key | Type | Default | Description |
|-----|------|---------|-------|
| `voice_choice` | string | `"tiktok"` | TTS provider |
| `random_voice` | bool | `true` | Random voice for each comment |
| `silence_duration` | float | `0.3` | Silence between TTS clips (seconds) |
| `no_emojis` | bool | `false` | Remove emojis? |
| `tiktok_voice` | string | `"en_us_001"` | Voice for TikTok TTS |
| `tiktok_sessionid` | string | n/a | TikTok session ID |
| `elevenlabs_voice_name` | string | `"Bella"` | Voice for ElevenLabs |
| `elevenlabs_api_key` | string | n/a | ElevenLabs API Key |
| `aws_polly_voice` | string | `"Matthew"` | Voice for AWS Polly |
| `streamlabs_polly_voice` | string | `"Matthew"` | Voice for Streamlabs |
| `openai_api_url` | string | `"https://api.openai.com/v1/"` | OpenAI API endpoint |
| `openai_api_key` | string | n/a | OpenAI API Key |
| `openai_voice_name` | string | `"alloy"` | Voice for OpenAI TTS |
| `openai_model` | string | `"tts-1"` | OpenAI TTS model |
| `python_voice` | string | `"1"` | System voice index |
| `py_voice_num` | string | `"2"` | Number of system voices |
---
## 5. Dependencies (`requirements.txt`)
| Package | Version | Role |
|---------|---------|---------|
| `praw` | 7.8.1 | Reddit API wrapper |
| `playwright` | 1.49.1 | Browser automation (screenshot) |
| `moviepy` | 2.2.1 | Video/audio clip processing |
| `ffmpeg-python` | 0.2.0 | FFmpeg pipeline builder |
| `yt-dlp` | 2025.10.22 | YouTube video/audio downloader |
| `gTTS` | 2.5.4 | Google Translate TTS |
| `pyttsx3` | 2.98 | Offline system TTS |
| `elevenlabs` | 1.57.0 | ElevenLabs TTS SDK |
| `boto3` / `botocore` | 1.36.8 | AWS Polly TTS |
| `requests` | 2.32.3 | HTTP requests (TikTok/Streamlabs API) |
| `rich` | 13.9.4 | Terminal formatting (progress bars, panels) |
| `toml` / `tomlkit` | 0.10.2 / 0.13.2 | Config file parsing |
| `translators` | 5.9.9 | Multi-language translation |
| `Pillow` (PIL) | — | Image processing (thumbnails, storymode) |
| `clean-text` | 0.6.0 | Text cleaning (emoji removal) |
| `unidecode` | 1.4.0 | Unicode → ASCII |
| `spacy` | 3.8.7 | NLP (text processing) |
| `torch` | 2.7.0 | PyTorch (AI similarity) |
| `transformers` | 4.52.4 | HuggingFace transformers (sentence-transformers) |
| `Flask` | 3.1.1 | Web GUI |
---
## 6. Two Operating Modes
### Mode 1: Comment Mode (default)
- Fetches the **top comments** from the Reddit thread
- Converts each comment into its own MP3
- Takes a screenshot of each comment
- The video shows the comments one after another
### Mode 2: Story Mode (`storymode = true`)
- Reads only the **title + selftext** of the post
- Two methods:
  - **Method 0**: Screenshot of the entire post content → 1 fixed image
  - **Method 1**: Parse the text into paragraphs → render each as a separate image with PIL → fancy effect
---
## 7. Web GUI (`GUI.py`)
- Framework: **Flask** (port 4000)
- Routes:
  - `/`: List of created videos (from `videos.json`)
  - `/settings`: Form for editing `config.toml`
  - `/backgrounds`: Manage background videos
  - `/background/add`: Add a new background
  - `/background/delete`: Delete a background
  - `/results/<path>`: Serve video files
  - `/voices/<path>`: Serve voice samples
- Opens the browser automatically on start
---
## 8. Important Technical Notes
### ⚠️ FFmpeg Encoder
- The code uses **`h264_nvenc`** (NVIDIA GPU encoder), which requires an NVIDIA GPU
- Without a GPU, it must be changed to `libx264`
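A possible way to choose between the two encoders is to inspect the list printed by `ffmpeg -encoders`. The sketch below is hypothetical (`pick_encoder` does not exist in the project) and takes the command output as a string so it can be tested without FFmpeg. Note the limitation: an encoder being listed only means the FFmpeg build supports it, not that a usable GPU is present.

```python
def pick_encoder(encoders_output: str, preferred: str = "h264_nvenc", fallback: str = "libx264") -> str:
    """Return `preferred` if it is listed in `ffmpeg -encoders` output, else `fallback`."""
    names = set()
    for line in encoders_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])  # second column holds the encoder name
    return preferred if preferred in names else fallback
```

The input text could come from `subprocess.run(["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, text=True).stdout`.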
### ⚠️ Cleanup Bug
- `utils/cleanup.py` uses the path `../assets/temp/{reddit_id}/` (a relative path containing `..`), which can fail depending on the working directory
### ⚠️ Security Concerns
- `utils/settings.py` calls `eval()` twice (lines 33, 81); marked `fixme` but not yet fixed
- `utils/console.py` also uses `eval()` (line 105)
### ⚠️ Hardcoded Values
- `DEFAULT_MAX_LENGTH = 50` (seconds) in `TTS/engine_wrapper.py`
- The NSFW button selector is hardcoded with a specific post ID (`#t3_12hmbug`) in screenshot_downloader
- The username position on `title_template.png` is hardcoded at `(205, 825)`
### ⚠️ Video Tracking
- Created videos are recorded in `video_creation/data/videos.json`
- Each entry: `{subreddit, id, time, background_credit, reddit_title, filename}`
- The bot skips posts already in the list (unless forced through the `post_id` config)
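The skip check can be sketched as a lookup on the `id` field of each entry. This assumes `videos.json` holds a JSON list of objects shaped as described above; `is_done` is a hypothetical name and not the project's `check_done()` implementation.

```python
import json
from pathlib import Path


def is_done(videos_json: Path, post_id: str) -> bool:
    """Return True if an entry with this id is already recorded in videos.json."""
    if not videos_json.is_file():
        return False
    text = videos_json.read_text(encoding="utf-8").strip()
    entries = json.loads(text) if text else []
    return any(entry.get("id") == post_id for entry in entries)
```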
### ⚠️ AI Similarity Feature
- Uses the `sentence-transformers/all-MiniLM-L6-v2` model
- The model is downloaded on first run (~80MB)
- Cosine similarity between thread titles+content and the user's keywords
- Enabled with `ai_similarity_enabled = true`
---
## 9. How to Run
```bash
# 1. Clone & setup
git clone https://github.com/elebumm/RedditVideoMakerBot.git
cd RedditVideoMakerBot
python -m venv ./venv
source ./venv/bin/activate # Linux/macOS
# .\venv\Scripts\activate # Windows
# 2. Install dependencies
pip install -r requirements.txt
python -m playwright install
python -m playwright install-deps
# 3. Run the bot (CLI)
python main.py
# 4. Or run the GUI
python GUI.py
```
### Docker:
```bash
docker build -t reddit-video-bot .
docker run reddit-video-bot
```
---
## 10. Module Dependency Diagram
```mermaid
graph LR
main["main.py"] --> reddit["reddit/subreddit.py"]
main --> voices["video_creation/voices.py"]
main --> screenshots["video_creation/screenshot_downloader.py"]
main --> background["video_creation/background.py"]
main --> final["video_creation/final_video.py"]
reddit --> praw["praw"]
reddit --> ai["utils/ai_methods.py"]
reddit --> sub_utils["utils/subreddit.py"]
voices --> engine["TTS/engine_wrapper.py"]
engine --> tiktok["TTS/TikTok.py"]
engine --> gtts["TTS/GTTS.py"]
engine --> aws["TTS/aws_polly.py"]
engine --> eleven["TTS/elevenlabs.py"]
engine --> openai["TTS/openai_tts.py"]
engine --> pyttsx["TTS/pyttsx.py"]
engine --> streamlabs["TTS/streamlabs_polly.py"]
screenshots --> playwright["playwright"]
screenshots --> imagenarator["utils/imagenarator.py"]
background --> ytdlp["yt-dlp"]
background --> moviepy["moviepy"]
final --> ffmpeg["ffmpeg-python"]
final --> pil["PIL/Pillow"]
ai --> torch["torch + transformers"]
subgraph "Shared Utils"
settings["utils/settings.py"]
console["utils/console.py"]
voice_util["utils/voice.py"]
video_util["utils/videos.py"]
cleanup["utils/cleanup.py"]
end
```
---
> 📝 **Document generated**: 2026-04-20 | Based on an analysis of the project's entire source code.

@ -13,7 +13,7 @@ class GTTS:
     def run(self, text, filepath, random_voice: bool = False):
         tts = gTTS(
             text=text,
-            lang=settings.config["reddit"]["thread"]["post_lang"] or "en",
+            lang=settings.config["reddit"]["thread"]["post_lang"] or "vi",
             slow=False,
         )
         tts.save(filepath)

@ -0,0 +1,15 @@
"""
Manual Screenshot Video Pipeline
This module provides an alternative workflow that creates videos
from manually captured screenshots and text files, without requiring
any social media API access.
Supported platforms: Reddit, Threads (Meta), X (Twitter), or any other.
Usage:
    python manual_main.py init <post_id>     # Create folder structure
    python manual_main.py render <post_id>   # Render one post
    python manual_main.py render --all       # Render all unrendered posts
    python manual_main.py list               # List all posts with status
"""

@ -0,0 +1,318 @@
"""
Scanner module for the manual pipeline.
Scans manual_posts/ directories for screenshots (.png), audio files (.mp3),
and optional text files (.txt). Builds a unified post_object for processing.
Folder convention:
manual_posts/
my_post_001/
meta.json (optional - metadata)
0_title.png (required - screenshot of post title)
0_title.mp3 (preferred - pre-recorded audio)
0_title.txt (fallback - text for TTS if no .mp3)
1_comment.png (optional - comment screenshots)
1_comment.mp3 (preferred - pre-recorded audio)
1_comment.txt (fallback - text for TTS if no .mp3)
...
Priority: .mp3 > .txt (if both exist, .mp3 is used and TTS is skipped).
"""
import json
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple
from utils.console import print_step, print_substep
class PostScanner:
"""Scans manual_posts/ directory, validates structure, builds post_object."""
# Regex pattern: <number>_<type>.<ext> where ext is png/jpg/jpeg/mp3/txt
FILE_PATTERN = re.compile(r"^(\d+)_(title|comment)\.(png|jpg|jpeg|mp3|txt)$", re.IGNORECASE)
def __init__(self, input_dir: str = "manual_posts"):
self.input_dir = Path(input_dir)
def scan_all(self) -> List[dict]:
"""Scan all post folders in the input directory.
Returns:
List of post_object dicts, sorted by folder name
"""
if not self.input_dir.exists():
print_substep(f"Input directory '{self.input_dir}' does not exist.", style="red")
return []
posts = []
for post_dir in sorted(self.input_dir.iterdir()):
if post_dir.is_dir() and not post_dir.name.startswith("."):
post_obj = self.scan_one(post_dir.name)
if post_obj is not None:
posts.append(post_obj)
return posts
def scan_one(self, post_id: str) -> Optional[dict]:
"""Scan a single post folder and build post_object.
Args:
post_id: Name of the folder inside manual_posts/
Returns:
post_object dict or None if invalid
"""
post_dir = self.input_dir / post_id
if not post_dir.exists():
print_substep(f"Post directory '{post_dir}' does not exist.", style="red")
return None
is_valid, errors = self.validate(post_dir)
if not is_valid:
print_substep(f"Validation failed for '{post_id}':", style="red")
for err in errors:
print_substep(f"{err}", style="red")
return None
return self._build_post_object(post_dir)
def validate(self, post_dir: Path) -> Tuple[bool, List[str]]:
"""Validate a post folder structure.
Checks:
- At least 1 image file exists
- Title image (0_title.png) exists
- Each image has a corresponding .mp3 or .txt file
- Files follow naming convention
Returns:
(is_valid, list_of_errors)
"""
errors = []
# Gather all matching files
images, audios, texts = self._categorize_files(post_dir)
# Check: at least 1 image
if not images:
errors.append("No image files found. Need at least 0_title.png")
return False, errors
# Check: title image exists (index 0)
if 0 not in images:
errors.append("Missing title image: 0_title.png (must start with '0_')")
# Check: each image has a corresponding .mp3 or .txt file
for idx in sorted(images.keys()):
if idx not in audios and idx not in texts:
errors.append(
f"Missing audio/text for image #{idx}: "
f"provide '{images[idx].stem}.mp3' (or .txt as fallback)"
)
# Check: text files (used as TTS fallback) are not empty
for idx, txt_path in texts.items():
if idx not in audios: # Only check .txt if no .mp3 exists
content = txt_path.read_text(encoding="utf-8").strip()
if not content:
errors.append(f"Text file is empty (and no .mp3 provided): {txt_path.name}")
return len(errors) == 0, errors
def list_status(self) -> List[dict]:
"""List all posts with their status.
Returns:
List of dicts with keys: post_id, num_images, num_audios, num_texts, status
"""
if not self.input_dir.exists():
return []
results = []
for post_dir in sorted(self.input_dir.iterdir()):
if not post_dir.is_dir() or post_dir.name.startswith("."):
continue
images, audios, texts = self._categorize_files(post_dir)
is_valid, errors = self.validate(post_dir)
# Determine status
if not images:
status = "empty"
elif not is_valid:
status = "incomplete"
else:
status = "ready"
results.append(
{
"post_id": post_dir.name,
"num_images": len(images),
"num_audios": len(audios),
"num_texts": len(texts),
"status": status,
"errors": errors,
}
)
return results
def _categorize_files(self, post_dir: Path) -> Tuple[Dict[int, Path], Dict[int, Path], Dict[int, Path]]:
"""Categorize files in a post directory into images, audios, and texts.
Returns:
(images_dict, audios_dict, texts_dict) where key is the index number
"""
images = {} # {0: Path("0_title.png"), ...}
audios = {} # {0: Path("0_title.mp3"), ...}
texts = {} # {0: Path("0_title.txt"), ...}
for f in post_dir.iterdir():
match = self.FILE_PATTERN.match(f.name)
if match:
idx = int(match.group(1))
ext = match.group(3).lower()
if ext in ("png", "jpg", "jpeg"):
images[idx] = f
elif ext == "mp3":
audios[idx] = f
elif ext == "txt":
texts[idx] = f
return images, audios, texts
def _build_post_object(self, post_dir: Path) -> dict:
"""Build the unified post_object from a validated post directory.
Returns:
dict with structure:
{
"post_id": str,
"platform": str,
"title": str,
"author": str,
"url": str,
"post_dir": str,
"screenshots": [
{
"index": int,
"type": "title" | "comment",
"image_path": str,
"text": str,
"audio_path": None,
"audio_duration": None,
},
...
],
"total_duration": 0,
"output_path": None,
}
"""
post_id = post_dir.name
# Read optional meta.json
meta = self._read_meta(post_dir)
# Categorize files
images, audios, texts = self._categorize_files(post_dir)
# Build screenshots list (sorted by index)
screenshots = []
for idx in sorted(images.keys()):
img_path = images[idx]
# Determine type from filename
match = self.FILE_PATTERN.match(img_path.name)
entry_type = match.group(2).lower() if match else "comment"
# Audio: prefer .mp3, fallback to .txt for TTS
audio_path = str(audios[idx]) if idx in audios else None
text_content = ""
if idx in texts:
text_content = texts[idx].read_text(encoding="utf-8").strip()
screenshots.append(
{
"index": idx,
"type": entry_type,
"image_path": str(img_path),
"text": text_content,
"audio_path": audio_path, # Pre-filled if .mp3 exists
"audio_duration": None,
}
)
# Use title text, meta title, or folder name
title = ""
if screenshots and screenshots[0]["text"]:
title = screenshots[0]["text"][:100]
elif meta.get("title"):
title = meta["title"]
else:
title = post_id
return {
"post_id": post_id,
"platform": meta.get("platform", "other"),
"title": title,
"author": meta.get("author", ""),
"url": meta.get("url", ""),
"post_dir": str(post_dir),
"screenshots": screenshots,
"total_duration": 0,
"output_path": None,
}
def _read_meta(self, post_dir: Path) -> dict:
"""Read meta.json if it exists, return empty dict otherwise."""
meta_path = post_dir / "meta.json"
if meta_path.exists():
try:
with open(meta_path, "r", encoding="utf-8") as f:
return json.load(f)
except (json.JSONDecodeError, IOError) as e:
print_substep(f"Warning: Could not read meta.json: {e}", style="yellow")
return {}
def create_post_folder(input_dir: str, post_id: str, platform: str = "reddit") -> Path:
"""Create a new post folder with template files.
Args:
input_dir: Base directory for manual posts
post_id: Name for the new post folder
platform: Source platform (reddit, threads, x, other)
Returns:
Path to the created folder
"""
post_dir = Path(input_dir) / post_id
post_dir.mkdir(parents=True, exist_ok=True)
# Create meta.json template
meta = {
"platform": platform,
"post_id": post_id,
"title": "",
"author": "",
"url": "",
"created_at": "",
"tags": [],
"notes": "",
}
meta_path = post_dir / "meta.json"
if not meta_path.exists():
with open(meta_path, "w", encoding="utf-8") as f:
json.dump(meta, f, indent=4, ensure_ascii=False)
print_step(f"Created post folder: {post_dir}")
print_substep("Next steps:", style="bold cyan")
print_substep(" 1. Add screenshots: 0_title.png, 1_comment.png, ...")
print_substep(" 2. Add audio files: 0_title.mp3, 1_comment.mp3, ...")
print_substep(" (Or use .txt files instead — TTS will generate audio)")
print_substep(" 3. (Optional) Edit meta.json with post details")
print_substep(f" 4. Run: python manual_main.py render {post_id}")
return post_dir
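
The folder convention and the `.mp3 > .txt` priority used by `PostScanner` can be tried in isolation. The sketch below reuses the same filename pattern but is otherwise an illustration: `pick_audio_source` and `demo` are hypothetical helpers, not part of the scanner module.

```python
import re
import tempfile
from pathlib import Path

# Same naming convention as PostScanner.FILE_PATTERN: <index>_<title|comment>.<ext>
FILE_PATTERN = re.compile(
    r"^(\d+)_(title|comment)\.(png|jpg|jpeg|mp3|txt)$", re.IGNORECASE
)


def pick_audio_source(post_dir: Path) -> dict:
    """Map each image index to 'mp3', 'txt', or 'missing' (hypothetical helper)."""
    images, audios, texts = set(), set(), set()
    for f in post_dir.iterdir():
        match = FILE_PATTERN.match(f.name)
        if not match:
            continue  # files outside the convention (e.g. meta.json) are ignored
        idx = int(match.group(1))
        ext = match.group(3).lower()
        if ext in ("png", "jpg", "jpeg"):
            images.add(idx)
        elif ext == "mp3":
            audios.add(idx)
        else:
            texts.add(idx)
    # .mp3 wins over .txt when both exist for the same index
    return {
        idx: "mp3" if idx in audios else "txt" if idx in texts else "missing"
        for idx in sorted(images)
    }


def demo() -> dict:
    """Build a throwaway post folder and classify its entries."""
    with tempfile.TemporaryDirectory() as tmp:
        post_dir = Path(tmp)
        names = (
            "meta.json", "0_title.png", "0_title.mp3", "0_title.txt",
            "1_comment.png", "1_comment.txt", "2_comment.jpg",
        )
        for name in names:
            (post_dir / name).write_text("x", encoding="utf-8")
        return pick_audio_source(post_dir)


if __name__ == "__main__":
    print(demo())
```

An index reported as `missing` corresponds to the "Missing audio/text for image" validation error raised by `validate()`.
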

@ -0,0 +1,277 @@
"""
TTS Processor for the manual pipeline.
Takes a post_object (built by scanner.py), generates MP3 audio files
for each screenshot's text using the existing TTS engines, and updates
the post_object with audio paths and durations.
Reuses the TTS engines from the TTS/ module, so no code is duplicated.
"""
import re
from pathlib import Path
from typing import Tuple
from moviepy import AudioFileClip
from utils import settings
from utils.console import print_step, print_substep
from utils.voice import sanitize_text
class ManualTTSProcessor:
"""Processes text-to-speech for manual pipeline posts."""
def __init__(self, post_object: dict, max_length: int = 120):
"""
Args:
post_object: Post data from scanner.py
max_length: Maximum total audio length in seconds (default: 120s = 2 min)
"""
self.post = post_object
self.post_id = post_object["post_id"]
self.max_length = max_length
self.mp3_dir = Path(f"assets/temp/{self.post_id}/mp3")
self.tts_module = None
def process(self) -> dict:
"""Process audio for all screenshots.
For each screenshot:
- If .mp3 already provided (audio_path set by scanner): skip TTS, just measure duration
- If only .txt provided: run TTS to generate .mp3
- If neither: skip
Returns:
Updated post_object with audio_path and audio_duration filled in
"""
self.mp3_dir.mkdir(parents=True, exist_ok=True)
print_step("🔊 Processing audio files...")
total_duration = 0
processed_count = 0
tts_needed = False
for screenshot in self.post["screenshots"]:
idx = screenshot["index"]
# Case 1: .mp3 already provided — just measure duration
if screenshot.get("audio_path"):
try:
clip = AudioFileClip(screenshot["audio_path"])
duration = clip.duration
clip.close()
except Exception as e:
print_substep(f" ✗ Failed to read audio #{idx}: {e}", style="red")
duration = 0
screenshot["audio_duration"] = duration
total_duration += duration
processed_count += 1
print_substep(
f" ✓ #{idx}: {duration:.1f}s (pre-recorded .mp3)",
style="green",
)
continue
# Case 2: Only .txt provided — need TTS
text = screenshot.get("text", "").strip()
if not text:
print_substep(
f" ⚠ Screenshot #{idx} has no audio or text, skipping.",
style="yellow",
)
continue
# Initialize TTS engine only when needed (lazy)
if not tts_needed:
print_substep(" 📝 Some entries need TTS generation...")
self.tts_module = self._get_tts_engine()
tts_needed = True
mp3_path = str(self.mp3_dir / f"{idx}.mp3")
# Sanitize and process text
clean_text = self._process_text(text)
if not clean_text or clean_text.isspace():
print_substep(
f" ⚠ Screenshot #{idx} text is empty after sanitization, skipping.",
style="yellow",
)
continue
# Handle long text by splitting
if len(clean_text) > self.tts_module.max_chars:
self._generate_split_audio(clean_text, idx, mp3_path)
else:
self._generate_audio(clean_text, mp3_path)
# Measure duration
try:
clip = AudioFileClip(mp3_path)
duration = clip.duration
clip.close()
except Exception as e:
print_substep(f" ✗ Failed to read audio #{idx}: {e}", style="red")
duration = 0
# Update screenshot entry
screenshot["audio_path"] = mp3_path
screenshot["audio_duration"] = duration
total_duration += duration
processed_count += 1
print_substep(
f" ✓ #{idx}: {duration:.1f}s (TTS generated, {len(clean_text)} chars)",
style="green",
)
# Check max length
if total_duration > self.max_length and processed_count > 1:
print_substep(
f" ⚠ Total duration ({total_duration:.1f}s) exceeds max ({self.max_length}s). "
f"Stopping at {processed_count} clips.",
style="yellow",
)
break
self.post["total_duration"] = total_duration
print_substep(
f"{processed_count} audio clips ready, total: {total_duration:.1f}s",
style="bold green",
)
return self.post
def _get_tts_engine(self):
"""Initialize the TTS engine based on config.
Reuses the TTS engines from video_creation/voices.py
"""
from TTS.GTTS import GTTS
from TTS.TikTok import TikTok
from TTS.aws_polly import AWSPolly
from TTS.elevenlabs import elevenlabs
from TTS.openai_tts import OpenAITTS
from TTS.pyttsx import pyttsx
from TTS.streamlabs_polly import StreamlabsPolly
providers = {
"googletranslate": GTTS,
"awspolly": AWSPolly,
"streamlabspolly": StreamlabsPolly,
"tiktok": TikTok,
"pyttsx": pyttsx,
"elevenlabs": elevenlabs,
"openai": OpenAITTS,
}
voice_choice = settings.config["settings"]["tts"]["voice_choice"]
engine_class = providers.get(str(voice_choice).lower())
if engine_class is None:
print_substep(
f"Unknown TTS provider: {voice_choice}. Falling back to GoogleTranslate.",
style="yellow",
)
engine_class = GTTS
print_substep(f"Using TTS engine: {engine_class.__name__}")
return engine_class()
def _generate_audio(self, text: str, filepath: str):
"""Generate a single audio file from text."""
try:
random_voice = settings.config["settings"]["tts"].get("random_voice", False)
if str(settings.config["settings"]["tts"]["voice_choice"]).lower() == "googletranslate":
# GTTS.run() accepts random_voice but never uses it, so it is omitted here
self.tts_module.run(text, filepath=filepath)
else:
self.tts_module.run(text, filepath=filepath, random_voice=random_voice)
except Exception as e:
print_substep(f" ✗ TTS generation failed: {e}", style="red")
raise
def _generate_split_audio(self, text: str, idx: int, final_path: str):
"""Split long text and concat into one audio file.
For texts longer than the TTS engine's max_chars limit.
"""
import os
# Split text into chunks at sentence boundaries
max_chars = self.tts_module.max_chars
chunks = [
x.group().strip()
for x in re.finditer(
r" *(((.|\n){0," + str(max_chars) + r"})(\.|.$))", text
)
]
if not chunks:
chunks = [text[:max_chars]]
part_files = []
for part_idx, chunk in enumerate(chunks):
if not chunk or chunk.isspace():
continue
part_path = str(self.mp3_dir / f"{idx}-{part_idx}.part.mp3")
self._generate_audio(chunk, part_path)
part_files.append(part_path)
if not part_files:
return
# Concat using ffmpeg
list_path = str(self.mp3_dir / f"{idx}_list.txt")
with open(list_path, "w") as f:
for part in part_files:
f.write(f"file '{Path(part).name}'\n")
os.system(
f"ffmpeg -f concat -y -hide_banner -loglevel panic -safe 0 "
f"-i {list_path} -c copy {final_path}"
)
# Cleanup part files
for part in part_files:
try:
os.unlink(part)
except OSError:
pass
try:
os.unlink(list_path)
except OSError:
pass
def _process_text(self, text: str) -> str:
"""Clean and sanitize text for TTS.
- Removes lines starting with # (comments in txt files)
- Sanitizes using existing sanitize_text()
"""
# Remove comment lines (lines starting with #)
lines = text.split("\n")
lines = [line for line in lines if not line.strip().startswith("#")]
text = " ".join(lines).strip()
# Remove URLs
regex_urls = r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
text = re.sub(regex_urls, " ", text)
# Replace newlines with periods for natural speech
text = text.replace("\n", ". ")
# Add period at end if missing
if text and text[-1] not in ".!?":
text += "."
# Clean repeated dots
text = re.sub(r"\.{2,}", ".", text)
text = re.sub(r"\.\s*\.", ".", text)
# Use existing sanitize_text for final cleanup
text = sanitize_text(text)
return text
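
Splitting long text so that every piece fits within an engine's `max_chars` limit can also be done without the regex used in `_generate_split_audio`. The sketch below swaps in a different technique, greedy sentence packing, purely for illustration; `split_for_tts` is a hypothetical name and not part of this commit.

```python
import re
from typing import List


def split_for_tts(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # A "sentence" is a run of non-terminators followed by optional . ! ?
    sentences = re.findall(r"[^.!?]+[.!?]*", text)
    chunks: List[str] = []
    current = ""
    for sentence in (s.strip() for s in sentences):
        if not sentence:
            continue
        # Hard-split a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_tts("One. Two three. Four five six.", 16):
        print(repr(chunk))
```

Each chunk would then be passed to the TTS engine separately and the resulting parts concatenated, as the processor does with the ffmpeg concat step.
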

@ -0,0 +1,479 @@
"""
Video Builder for the manual pipeline.
Takes a post_object (with TTS audio already generated), downloads/chops
background video and audio, overlays screenshots onto the background
with correct timing, and renders the final video.
Reuses background download functions from video_creation/background.py.
Uses libx264 encoder (CPU-based) by default.
"""
import math
import multiprocessing
import os
import re
import tempfile
import threading
import time
from pathlib import Path
from typing import Dict, Tuple
import ffmpeg
from moviepy import AudioFileClip, VideoFileClip
from rich.console import Console
from utils import settings
from utils.console import print_step, print_substep
console = Console()
class ProgressFfmpeg(threading.Thread):
"""Thread to monitor FFmpeg progress during rendering."""
def __init__(self, vid_duration_seconds, progress_update_callback):
threading.Thread.__init__(self, name="ProgressFfmpeg")
self.stop_event = threading.Event()
self.output_file = tempfile.NamedTemporaryFile(mode="w+", delete=False)
self.vid_duration_seconds = vid_duration_seconds
self.progress_update_callback = progress_update_callback
def run(self):
while not self.stop_event.is_set():
latest_progress = self.get_latest_ms_progress()
if latest_progress is not None:
completed_percent = latest_progress / self.vid_duration_seconds
self.progress_update_callback(completed_percent)
time.sleep(1)
def get_latest_ms_progress(self):
lines = self.output_file.readlines()
if lines:
for line in lines:
if "out_time_ms" in line:
out_time_ms_str = line.split("=")[1].strip()
if out_time_ms_str.isnumeric():
return float(out_time_ms_str) / 1000000.0
return None
def stop(self):
self.stop_event.set()
def __enter__(self):
self.start()
return self
def __exit__(self, *args, **kwargs):
self.stop()
class ManualVideoBuilder:
"""Builds the final video from screenshots + TTS audio + background."""
def __init__(self, post_object: dict, manual_config: dict):
"""
Args:
post_object: Post data with audio already generated (from tts_processor)
manual_config: Manual-specific config dict
"""
self.post = post_object
self.post_id = post_object["post_id"]
self.config = manual_config
self.temp_dir = Path(f"assets/temp/{self.post_id}")
# Video settings
self.W = int(self.config.get("resolution_w", settings.config["settings"].get("resolution_w", 1080)))
self.H = int(self.config.get("resolution_h", settings.config["settings"].get("resolution_h", 1920)))
self.opacity = float(self.config.get("opacity", settings.config["settings"].get("opacity", 0.9)))
self.encoder = self.config.get("encoder", "libx264")
# Background settings
self.bg_video_name = self.config.get(
"background_video",
settings.config["settings"]["background"].get("background_video", "random"),
)
self.bg_audio_name = self.config.get(
"background_audio",
settings.config["settings"]["background"].get("background_audio", "random"),
)
self.bg_audio_volume = float(
self.config.get(
"background_audio_volume",
settings.config["settings"]["background"].get("background_audio_volume", 0.15),
)
)
# Local background directories (user drops files here)
self.bg_video_dir = Path(self.config.get("background_video_dir", "assets/backgrounds/video"))
self.bg_audio_dir = Path(self.config.get("background_audio_dir", "assets/backgrounds/audio"))
# Output settings
self.output_dir = Path(self.config.get("output_dir", "manual_results"))
def build(self) -> str:
"""Build the final video.
Pipeline:
1. Filter screenshots that have audio
2. Download background video & audio (cached)
3. Chop background to match video length
4. Prepare background (crop to aspect ratio)
5. Concat all audio clips into the final audio track
6. Mix with background audio
7. Overlay screenshots onto background with timing
8. Render final video
Returns:
Path to the output video file
"""
# Filter screenshots with audio
clips = [s for s in self.post["screenshots"] if s.get("audio_path") and s.get("audio_duration")]
if not clips:
print_substep("No audio clips found. Cannot create video.", style="red")
return ""
total_duration = sum(s["audio_duration"] for s in clips)
video_length = math.ceil(total_duration)
console.log(f"[bold green] Video will be: {video_length} seconds long ({len(clips)} clips)")
# Ensure temp directory exists
self.temp_dir.mkdir(parents=True, exist_ok=True)
# Step 1: Download backgrounds
print_step("📥 Downloading backgrounds (if needed)...")
bg_config = self._get_background_config()
self._download_backgrounds(bg_config)
# Step 2: Chop backgrounds to video length
print_step("✂️ Chopping backgrounds to video length...")
self._chop_backgrounds(bg_config, video_length)
# Step 3: Prepare background (crop to aspect ratio)
print_step("🎬 Preparing background...")
bg_path = self._prepare_background()
background_clip = ffmpeg.input(bg_path)
# Step 4: Concat audio clips
print_step("🔊 Building audio track...")
audio_inputs = [ffmpeg.input(s["audio_path"]) for s in clips]
audio_concat = ffmpeg.concat(*audio_inputs, a=1, v=0)
audio_path = str(self.temp_dir / "audio.mp3")
ffmpeg.output(
audio_concat, audio_path, **{"b:a": "192k"}
).overwrite_output().run(quiet=True)
# Step 5: Merge with background audio
audio = ffmpeg.input(audio_path)
final_audio = self._merge_background_audio(audio)
# Step 6: Overlay screenshots
print_step("🖼️ Overlaying screenshots...")
screenshot_width = int((self.W * 45) // 100)
current_time = 0
for s in clips:
img_input = ffmpeg.input(s["image_path"])["v"].filter("scale", screenshot_width, -1)
img_overlay = img_input.filter("colorchannelmixer", aa=self.opacity)
background_clip = background_clip.overlay(
img_overlay,
enable=f"between(t,{current_time},{current_time + s['audio_duration']})",
x="(main_w-overlay_w)/2",
y="(main_h-overlay_h)/2",
)
current_time += s["audio_duration"]
# Scale to final resolution
background_clip = background_clip.filter("scale", self.W, self.H)
# Step 7: Render
print_step("🎥 Rendering the video...")
self.output_dir.mkdir(parents=True, exist_ok=True)
# Normalize filename
filename = self._normalize_filename(self.post.get("title", self.post_id))
output_path = str(self.output_dir / f"{filename}.mp4")
# Prevent path too long
if len(output_path) > 251:
output_path = output_path[:247] + ".mp4"
from tqdm import tqdm
pbar = tqdm(total=100, desc="Progress: ", bar_format="{l_bar}{bar}", unit=" %")
def on_update(progress):
status = round(progress * 100, 2)
old_percentage = pbar.n
pbar.update(status - old_percentage)
with ProgressFfmpeg(video_length, on_update) as progress:
try:
ffmpeg.output(
background_clip,
final_audio,
output_path,
f="mp4",
**{
"c:v": self.encoder,
"b:v": "20M",
"b:a": "192k",
"threads": multiprocessing.cpu_count(),
},
).overwrite_output().global_args(
"-progress", progress.output_file.name
).run(
quiet=True,
overwrite_output=True,
capture_stdout=False,
capture_stderr=False,
)
except ffmpeg.Error as e:
print_substep(f"FFmpeg error: {e.stderr.decode('utf8') if e.stderr else str(e)}", style="red")
pbar.close()
return ""
old_percentage = pbar.n
pbar.update(100 - old_percentage)
pbar.close()
# Save to tracking (shared videos.json)
self._save_tracking(bg_config, output_path)
# Cleanup temp files
print_step("🗑️ Removing temporary files...")
self._cleanup()
self.post["output_path"] = output_path
print_step(f"✅ Done! Video saved to: {output_path}")
return output_path
def _scan_local_files(self, directory: Path, extensions: tuple) -> list:
"""Scan a directory for files matching given extensions.
Returns:
List of Path objects, sorted by name
"""
if not directory.exists():
return []
files = []
for f in directory.iterdir():
if f.is_file() and f.suffix.lower() in extensions:
files.append(f)
return sorted(files)
def _get_background_config(self) -> dict:
"""Get background video & audio — local random or YouTube fallback.
Priority:
1. Scan local directories for video/audio files
2. If config is 'random' or local files exist: pick random from local
3. If config is a specific name AND no local files: use YouTube download
Returns:
dict with 'video_path', 'audio_path', 'video_credit', 'audio_credit'
"""
import random
result = {
"video_path": None,
"audio_path": None,
"video_credit": "unknown",
"audio_credit": "unknown",
"_youtube_video": None, # YouTube config tuple (for download if needed)
"_youtube_audio": None,
}
# --- Video background ---
video_exts = (".mp4", ".mkv", ".webm", ".avi", ".mov")
local_videos = self._scan_local_files(self.bg_video_dir, video_exts)
if local_videos:
# Pick random from local files
chosen = random.choice(local_videos)
result["video_path"] = str(chosen)
result["video_credit"] = chosen.stem
print_substep(f"🎬 Background video: {chosen.name} (random from {len(local_videos)} files)")
else:
# Fallback: YouTube download via background_options
try:
from video_creation.background import background_options
video_name = self.bg_video_name
if video_name == "random" or video_name not in background_options["video"]:
video_name = random.choice(list(background_options["video"].keys()))
result["_youtube_video"] = background_options["video"][video_name]
print_substep(f"🎬 Background video: {video_name} (YouTube)")
except Exception as e:
print_substep(f"⚠ Could not load YouTube backgrounds: {e}", style="yellow")
# --- Audio background ---
if self.bg_audio_volume > 0:
audio_exts = (".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac")
local_audios = self._scan_local_files(self.bg_audio_dir, audio_exts)
if local_audios:
chosen = random.choice(local_audios)
result["audio_path"] = str(chosen)
result["audio_credit"] = chosen.stem
print_substep(f"🎵 Background audio: {chosen.name} (random from {len(local_audios)} files)")
else:
try:
from video_creation.background import background_options
audio_name = self.bg_audio_name
if audio_name == "random" or audio_name not in background_options["audio"]:
audio_name = random.choice(list(background_options["audio"].keys()))
result["_youtube_audio"] = background_options["audio"][audio_name]
print_substep(f"🎵 Background audio: {audio_name} (YouTube)")
except Exception as e:
print_substep(f"⚠ Could not load YouTube audio backgrounds: {e}", style="yellow")
return result
def _download_backgrounds(self, bg_config: dict):
"""Download YouTube backgrounds only if no local files were found."""
if bg_config.get("_youtube_video"):
from video_creation.background import download_background_video
download_background_video(bg_config["_youtube_video"])
# Set video_path to the downloaded file
yt_cfg = bg_config["_youtube_video"]
bg_config["video_path"] = f"assets/backgrounds/video/{yt_cfg[2]}-{yt_cfg[1]}"
bg_config["video_credit"] = yt_cfg[2]
if bg_config.get("_youtube_audio"):
from video_creation.background import download_background_audio
download_background_audio(bg_config["_youtube_audio"])
yt_cfg = bg_config["_youtube_audio"]
bg_config["audio_path"] = f"assets/backgrounds/audio/{yt_cfg[2]}-{yt_cfg[1]}"
bg_config["audio_credit"] = yt_cfg[2]
def _chop_backgrounds(self, bg_config: dict, video_length: int):
"""Chop background video and audio to match the video length."""
from video_creation.background import get_start_and_end_times
# Chop background audio
if self.bg_audio_volume > 0 and bg_config.get("audio_path"):
audio_file = bg_config["audio_path"]
if Path(audio_file).exists():
background_audio = AudioFileClip(audio_file)
start_a, end_a = get_start_and_end_times(video_length, background_audio.duration)
chopped = background_audio.subclipped(start_a, end_a)
chopped.write_audiofile(str(self.temp_dir / "background.mp3"))
background_audio.close()
chopped.close()
# Chop background video
video_file = bg_config.get("video_path")
if video_file and Path(video_file).exists():
with VideoFileClip(video_file) as video:
start_v, end_v = get_start_and_end_times(video_length, video.duration)
chopped = video.subclipped(start_v, end_v)
chopped.write_videofile(str(self.temp_dir / "background.mp4"))
else:
print_substep("⚠ No background video file found!", style="red")
raise FileNotFoundError(f"Background video not found: {video_file}")
def _prepare_background(self) -> str:
"""Crop background video to correct aspect ratio (W:H).
Returns:
Path to the cropped background video
"""
output_path = str(self.temp_dir / "background_noaudio.mp4")
try:
(
ffmpeg.input(str(self.temp_dir / "background.mp4"))
.filter("crop", f"ih*({self.W}/{self.H})", "ih")
.output(
output_path,
an=None,
**{
"c:v": self.encoder,
"b:v": "20M",
"threads": multiprocessing.cpu_count(),
},
)
.overwrite_output()
.run(quiet=True)
)
except ffmpeg.Error as e:
print_substep(f"Background prepare error: {e}", style="red")
raise
return output_path
def _merge_background_audio(self, tts_audio):
"""Merge TTS audio with background audio.
Args:
tts_audio: FFmpeg audio input of the TTS track
Returns:
Merged audio stream or original if background audio disabled
"""
if self.bg_audio_volume == 0:
return tts_audio
bg_audio_path = self.temp_dir / "background.mp3"
if not bg_audio_path.exists():
return tts_audio
bg_audio = ffmpeg.input(str(bg_audio_path)).filter("volume", self.bg_audio_volume)
merged = ffmpeg.filter([tts_audio, bg_audio], "amix", duration="longest")
return merged
def _normalize_filename(self, name: str) -> str:
"""Normalize a string to be safe for filenames."""
# Remove problematic characters
name = re.sub(r'[?\\"%*:|<>]', "", name)
name = re.sub(r"[/]", " ", name)
name = name.strip()
if not name:
name = self.post_id
# Limit length
return name[:100]
def _save_tracking(self, bg_config: dict, output_path: str):
"""Save rendered video info to shared videos.json.
Handles missing file gracefully (creates it if needed).
Does NOT import from utils.videos to avoid praw dependency.
"""
import json
import time as t
videos_path = Path("./video_creation/data/videos.json")
videos_path.parent.mkdir(parents=True, exist_ok=True)
# Load existing data or start fresh
done_vids = []
if videos_path.exists():
try:
with open(videos_path, "r", encoding="utf-8") as f:
done_vids = json.load(f)
except (json.JSONDecodeError, IOError):
done_vids = []
# Skip if already recorded
if self.post_id in [v.get("id") for v in done_vids]:
return
payload = {
"subreddit": self.post.get("platform", "manual"),
"id": self.post_id,
"time": str(int(t.time())),
"background_credit": bg_config.get("video_credit", "unknown"),
"reddit_title": self.post.get("title", ""),
"filename": Path(output_path).name,
}
done_vids.append(payload)
with open(videos_path, "w", encoding="utf-8") as f:
json.dump(done_vids, f, ensure_ascii=False, indent=4)
def _cleanup(self):
"""Remove temporary files for this post."""
temp_path = f"assets/temp/{self.post_id}/"
if Path(temp_path).exists():
import shutil
shutil.rmtree(temp_path)
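
The overlay timing in `build()` is a running sum over the clip durations: each screenshot is enabled from the end of the previous clip until the end of its own audio. A minimal sketch of that bookkeeping, with no FFmpeg involved, might look like this (`overlay_windows` is a hypothetical helper, not part of the builder):

```python
from typing import List, Tuple


def overlay_windows(durations: List[float]) -> List[Tuple[float, float, str]]:
    """Return (start, end, enable_expr) for each clip, laid out back to back."""
    windows = []
    current_time = 0.0
    for duration in durations:
        end = current_time + duration
        # Same expression shape as the enable= argument passed to overlay()
        windows.append((current_time, end, f"between(t,{current_time},{end})"))
        current_time = end
    return windows


if __name__ == "__main__":
    for start, end, expr in overlay_windows([2.5, 4.0]):
        print(f"{start:>5.1f} -> {end:>5.1f}  {expr}")
```

FFmpeg's `between(x, min, max)` is inclusive at both ends, so two adjacent windows share the boundary instant; in practice this is at most a single frame where both overlays are enabled.
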

@ -0,0 +1,438 @@
#!/usr/bin/env python
"""
Manual Screenshot Video Pipeline Entry Point
Create videos from manually captured screenshots and text files,
without requiring any social media API access.
Supports screenshots from: Reddit, Threads (Meta), X (Twitter), or any platform.
Usage:
python manual_main.py init <post_id> [--platform reddit|threads|x|other]
python manual_main.py render <post_id>
python manual_main.py render --all
python manual_main.py list
"""
import argparse
import json
import sys
from os.path import exists
from pathlib import Path
import toml
from utils import settings
from utils.console import print_markdown, print_step, print_substep
from utils.ffmpeg_install import ffmpeg_install
from manual.scanner import PostScanner
from manual.tts_processor import ManualTTSProcessor
from manual.video_builder import ManualVideoBuilder
__VERSION__ = "1.0.0"
# ────────────────────────────────────────────────────────────────
# Configuration
# ────────────────────────────────────────────────────────────────
# Default config for manual pipeline (used when [manual] section not in config.toml)
MANUAL_DEFAULTS = {
"input_dir": "manual_posts",
"output_dir": "manual_results",
"encoder": "libx264",
"resolution_w": 1080,
"resolution_h": 1920,
"opacity": 0.9,
"background_video": "random",
"background_audio": "random",
"background_video_dir": "assets/backgrounds/video",
"background_audio_dir": "assets/backgrounds/audio",
"background_audio_volume": 0.1,
"max_video_length": 120,
}
# Full default settings.config that TTS engines and shared modules expect.
# This ensures the manual flow works even if config.toml is empty or missing sections.
_BASE_SETTINGS_DEFAULTS = {
"reddit": {
"creds": {
"client_id": "",
"client_secret": "",
"username": "",
"password": "",
"2fa": False,
},
"thread": {
"subreddit": "",
"post_id": "",
"max_comment_length": 500,
"min_comment_length": 1,
"post_lang": "vi",
"min_comments": 20,
"blocked_words": "",
},
},
"ai": {
"ai_similarity_enabled": False,
"ai_similarity_keywords": "",
},
"settings": {
"allow_nsfw": False,
"theme": "dark",
"times_to_run": 1,
"opacity": 0.9,
"storymode": False,
"storymodemethod": 1,
"storymode_max_length": 1000,
"resolution_w": 1080,
"resolution_h": 1920,
"zoom": 1,
"channel_name": "Reddit Tales",
"background": {
"background_video": "minecraft",
"background_audio": "lofi",
"background_audio_volume": 0.1,
"enable_extra_audio": False,
"background_thumbnail": False,
"background_thumbnail_font_family": "arial",
"background_thumbnail_font_size": 96,
"background_thumbnail_font_color": "255,255,255",
},
"tts": {
"voice_choice": "googletranslate",
"random_voice": False,
"elevenlabs_voice_name": "Bella",
"elevenlabs_api_key": "",
"aws_polly_voice": "Matthew",
"streamlabs_polly_voice": "Matthew",
"tiktok_voice": "en_us_001",
"tiktok_sessionid": "",
"python_voice": "1",
"py_voice_num": "2",
"silence_duration": 0.3,
"no_emojis": False,
"openai_api_url": "https://api.openai.com/v1/",
"openai_api_key": "",
"openai_voice_name": "alloy",
"openai_model": "tts-1",
},
},
}
def _deep_merge(base: dict, override: dict) -> dict:
"""Deep merge two dicts. Values in 'override' take priority."""
result = base.copy()
for key, value in override.items():
if key in result and isinstance(result[key], dict) and isinstance(value, dict):
result[key] = _deep_merge(result[key], value)
else:
result[key] = value
return result
def load_config() -> dict:
"""Load config and set up settings.config for TTS engines and backgrounds.
Strategy:
1. Start with full default config (so TTS engines always have what they need)
2. If config.toml exists and has content, deep-merge on top of defaults
3. Extract [manual] section for manual-specific settings
4. Set settings.config globally so shared modules (TTS, background, etc.) work
Returns:
dict: Manual-specific config merged with defaults
"""
# Start with complete defaults
config = _deep_merge({}, _BASE_SETTINGS_DEFAULTS)
# Try to load config.toml and merge on top
config_path = Path("config.toml")
if config_path.exists():
try:
file_config = toml.load(str(config_path))
if file_config: # Not empty
config = _deep_merge(config, file_config)
print_substep("Loaded config from config.toml", style="dim")
except Exception as e:
print_substep(f"Warning: Could not parse config.toml: {e}", style="yellow")
else:
print_substep(
"config.toml not found — using built-in defaults. "
"TTS will use GoogleTranslate (no API key needed).",
style="yellow",
)
# Set global settings.config so TTS engines and shared modules work
settings.config = config
# Build manual-specific config: defaults + [manual] section from config.toml
manual_config = {**MANUAL_DEFAULTS}
if "manual" in config:
manual_config.update(config["manual"])
return manual_config
# ────────────────────────────────────────────────────────────────
# Commands
# ────────────────────────────────────────────────────────────────
def cmd_init(args, manual_config):
"""Create a new post folder with template files."""
from manual.scanner import create_post_folder
post_id = args.post_id
platform = getattr(args, "platform", "reddit")
input_dir = manual_config["input_dir"]
post_dir = create_post_folder(input_dir, post_id, platform)
print_markdown(f"### Post folder created: `{post_dir}`")
def cmd_render(args, manual_config):
"""Render one or all posts into videos."""
scanner = PostScanner(input_dir=manual_config["input_dir"])
if args.all:
# Render all ready posts
posts = scanner.scan_all()
if not posts:
print_substep("No valid posts found in the input directory.", style="red")
return
# Filter out already rendered
posts_to_render = []
for post in posts:
if _is_already_done(post["post_id"]):
print_substep(f"{post['post_id']} — already rendered, skipping", style="blue")
else:
posts_to_render.append(post)
if not posts_to_render:
print_substep("All posts have already been rendered!", style="green")
return
print_step(f"📋 Rendering {len(posts_to_render)} posts...")
for i, post in enumerate(posts_to_render):
print_markdown(
f"### [{i+1}/{len(posts_to_render)}] Rendering: {post['post_id']}"
)
_render_single(post, manual_config)
else:
# Render single post
if not args.post_id:
print_substep("Please specify a post_id or use --all", style="red")
return
post = scanner.scan_one(args.post_id)
if post is None:
return # Error already printed by scanner
if _is_already_done(post["post_id"]) and not args.force:
print_substep(
f"Post '{post['post_id']}' already rendered. Use --force to re-render.",
style="yellow",
)
return
_render_single(post, manual_config)
def _render_single(post_object: dict, manual_config: dict):
"""Render a single post into a video.
Pipeline:
1. TTS: Convert text into MP3 audio files
2. Video: Assemble screenshots + audio + background into an MP4
"""
post_id = post_object["post_id"]
print_step(f"🚀 Starting render for: {post_id}")
# Step 1: TTS
max_length = manual_config.get("max_video_length", 120)
tts = ManualTTSProcessor(post_object, max_length=max_length)
post_object = tts.process()
# Check if we have audio
clips_with_audio = [s for s in post_object["screenshots"] if s.get("audio_path")]
if not clips_with_audio:
print_substep("No audio generated. Check text files.", style="red")
return
# Step 2: Video build
builder = ManualVideoBuilder(post_object, manual_config)
output_path = builder.build()
if output_path:
print_markdown(f"### ✅ Video saved: `{output_path}`")
else:
print_substep("Video rendering failed.", style="red")
def cmd_list(args, manual_config):
"""List all posts and their status."""
scanner = PostScanner(input_dir=manual_config["input_dir"])
statuses = scanner.list_status()
if not statuses:
print_substep(
f"No posts found in '{manual_config['input_dir']}/'. "
f"Run 'python manual_main.py init <post_id>' to create one.",
style="yellow",
)
return
# Status emoji map
status_icons = {
"ready": "",
"incomplete": "⚠️",
"empty": "",
}
print_step("📋 Manual Posts Status")
print()
for s in statuses:
icon = status_icons.get(s["status"], "")
rendered = "🎬" if _is_already_done(s["post_id"]) else " "
print_substep(
f" {icon} {rendered} {s['post_id']:30s} "
f"| {s['num_images']} 🖼️ {s.get('num_audios', 0)} 🎵 {s['num_texts']} 📝 "
f"| {s['status']}",
style="bold" if s["status"] == "ready" else "",
)
if s["errors"]:
for err in s["errors"]:
print_substep(f"{err}", style="red")
print()
ready_count = sum(1 for s in statuses if s["status"] == "ready")
rendered_count = sum(1 for s in statuses if _is_already_done(s["post_id"]))
print_substep(
f" Total: {len(statuses)} posts | "
f"{ready_count} ready | "
f"{rendered_count} rendered",
style="bold cyan",
)
def _is_already_done(post_id: str) -> bool:
"""Check if a post has already been rendered (shared videos.json)."""
videos_path = "./video_creation/data/videos.json"
if not exists(videos_path):
return False
try:
with open(videos_path, "r", encoding="utf-8") as f:
done_videos = json.load(f)
return any(v.get("id") == post_id for v in done_videos)
except (json.JSONDecodeError, IOError):
return False
# ────────────────────────────────────────────────────────────────
# CLI
# ────────────────────────────────────────────────────────────────
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
prog="manual_main.py",
description="Manual Screenshot → Video Pipeline. "
"Create videos from screenshots captured from Reddit, Threads, X, or any platform.",
)
parser.add_argument(
"--version", action="version", version=f"%(prog)s {__VERSION__}"
)
subparsers = parser.add_subparsers(dest="command", help="Available commands")
# init command
init_parser = subparsers.add_parser("init", help="Create a new post folder with template files")
init_parser.add_argument("post_id", type=str, help="Name/ID for the post folder")
init_parser.add_argument(
"--platform",
type=str,
default="reddit",
choices=["reddit", "threads", "x", "other"],
help="Source platform (default: reddit)",
)
# render command
render_parser = subparsers.add_parser("render", help="Render post(s) into video(s)")
render_parser.add_argument(
"post_id", type=str, nargs="?", default=None, help="Post ID to render"
)
render_parser.add_argument(
"--all", action="store_true", help="Render all unrendered posts"
)
render_parser.add_argument(
"--force", action="store_true", help="Re-render even if already done"
)
# list command
subparsers.add_parser("list", help="List all posts and their status")
return parser
def main():
print(
"""
Manual Screenshot → Video Pipeline v1.0.0
Supports: Reddit, Threads, X, or any platform
"""
)
parser = build_parser()
args = parser.parse_args()
if not args.command:
parser.print_help()
sys.exit(1)
# Check Python version
if sys.version_info.major != 3 or sys.version_info.minor not in [10, 11, 12]:
print("This program requires Python 3.10, 3.11, or 3.12.")
sys.exit(1)
# Check FFmpeg
ffmpeg_install()
# Load config
manual_config = load_config()
# Create input directory if it doesn't exist
input_dir = Path(manual_config["input_dir"])
input_dir.mkdir(parents=True, exist_ok=True)
# Dispatch command
commands = {
"init": cmd_init,
"render": cmd_render,
"list": cmd_list,
}
cmd_func = commands.get(args.command)
if cmd_func:
try:
cmd_func(args, manual_config)
except KeyboardInterrupt:
print("\nInterrupted by user.")
sys.exit(0)
except Exception as e:
print_substep(f"Error: {e}", style="red")
raise
else:
parser.print_help()
if __name__ == "__main__":
main()
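
The layering described in the `load_config` docstring (built-in defaults, then `config.toml`, then the `[manual]` section) can be tried in isolation. The following is a minimal sketch, not the code from this commit: `deep_merge`, `DEFAULTS`, and `file_config` are illustrative names, the defaults are a small subset shaped like the ones in the file, and an in-memory dict stands in for whatever `toml.load("config.toml")` would return. Unlike `_deep_merge` above, which starts from a shallow `base.copy()`, this version deep-copies so the defaults can never be mutated through the merged result.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge two dicts; values in `override` take priority."""
    result = copy.deepcopy(base)  # deep copy: the defaults stay untouched
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical subset of the built-in defaults (illustration only).
DEFAULTS = {
    "settings": {"opacity": 0.9, "tts": {"voice_choice": "googletranslate"}},
}
MANUAL_DEFAULTS = {"input_dir": "manual_posts", "max_video_length": 120}

# Stand-in for the parsed config.toml (assumed content).
file_config = {
    "settings": {"tts": {"voice_choice": "tiktok"}},
    "manual": {"max_video_length": 60},
}

# Step 1-2: defaults first, file values merged on top.
config = deep_merge(DEFAULTS, file_config)
# Step 3: manual defaults overridden by the [manual] section, if present.
manual_config = {**MANUAL_DEFAULTS, **config.get("manual", {})}

print(config["settings"]["tts"]["voice_choice"])
print(config["settings"]["opacity"])
print(manual_config)
```

Keys that the file does not mention (here `opacity` and `input_dir`) keep their default values, which is what lets the manual flow run with an empty or partial `config.toml`.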

@ -1,18 +1,18 @@
{
"__comment": "Supported Backgrounds Audio. Can add/remove background audio here...",
"lofi": [
"https://www.youtube.com/watch?v=LTphVIore3A",
"https://www.youtube.com/watch?v=Q7HjxOAU5Kc",
"lofi.mp3",
"Super Lofi World"
"Breaking Copyright"
],
"lofi-2":[
"https://www.youtube.com/watch?v=BEXL80LS0-I",
"https://www.youtube.com/watch?v=cTMOQiY0axo",
"lofi-2.mp3",
"stompsPlaylist"
"Breaking Copyright"
],
"chill-summer":[
"https://www.youtube.com/watch?v=EZE8JagnBI8",
"chill-summer.mp3",
"Mellow Vibes Radio"
"lofi-3":[
"https://www.youtube.com/watch?v=4sFVeqvJu-0",
"lofi-3.mp3",
"Chill - Copyright Free Music"
]
}

@ -1,17 +1,5 @@
{
"__comment": "Supported Backgrounds. Can add/remove background video here...",
"motor-gta": [
"https://www.youtube.com/watch?v=vw5L4xCPy9Q",
"bike-parkour-gta.mp4",
"Achy Gaming",
"center"
],
"rocket-league": [
"https://www.youtube.com/watch?v=2X9QGY__0II",
"rocket_league.mp4",
"Orbital Gameplay",
"center"
],
"minecraft": [
"https://www.youtube.com/watch?v=n_Dv4JMiwK8",
"parkour.mp4",
@ -24,40 +12,16 @@
"Achy Gaming",
"center"
],
"csgo-surf": [
"https://www.youtube.com/watch?v=E-8JlyO59Io",
"csgo-surf.mp4",
"Aki",
"center"
],
"cluster-truck": [
"https://www.youtube.com/watch?v=uVKxtdMgJVU",
"cluster_truck.mp4",
"No Copyright Gameplay",
"center"
],
"minecraft-2": [
"https://www.youtube.com/watch?v=Pt5_GSKIWQM",
"minecraft-2.mp4",
"Itslpsn",
"center"
],
"multiversus": [
"https://www.youtube.com/watch?v=66oK1Mktz6g",
"multiversus.mp4",
"MKIceAndFire",
"center"
],
"fall-guys": [
"https://www.youtube.com/watch?v=oGSsgACIc6Q",
"fall-guys.mp4",
"Throneful",
"center"
],
"steep": [
"https://www.youtube.com/watch?v=EnGiQrWBrko",
"steep.mp4",
"joel",
"roblox": [
"https://www.youtube.com/watch?v=TnYDtDiuXzw",
"roblox.mp4",
"Dope Gameplays",
"center"
]
}
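
Each background entry in the JSON above is a list of `[url, filename, credit, position]` keyed by a short name, with `__comment` as a non-entry key. The sketch below shows one way such a file could be read and a `"random"` choice resolved. It is an assumption about usage, not the logic in `background.py` or `ManualVideoBuilder`: `load_backgrounds` and `pick_background` are hypothetical helpers, the fallback to a random entry for unknown names is an illustrative choice, and the JSON is inlined with only the `roblox` entry from the diff.

```python
import json
import random

# Inline stand-in for the background video JSON (one entry copied from the diff).
BACKGROUNDS_JSON = """
{
  "__comment": "Supported Backgrounds. Can add/remove background video here...",
  "roblox": [
    "https://www.youtube.com/watch?v=TnYDtDiuXzw",
    "roblox.mp4",
    "Dope Gameplays",
    "center"
  ]
}
"""


def load_backgrounds(raw: str) -> dict:
    """Parse the JSON and drop helper keys such as "__comment"."""
    data = json.loads(raw)
    return {name: entry for name, entry in data.items() if not name.startswith("__")}


def pick_background(options: dict, choice: str = "random") -> dict:
    """Resolve a configured name to one entry.

    Assumption: "random" or an unknown name falls back to a random entry.
    """
    if choice == "random" or choice not in options:
        choice = random.choice(sorted(options))
    url, filename, credit, position = options[choice]
    return {
        "name": choice,
        "url": url,
        "filename": filename,
        "credit": credit,
        "position": position,
    }


backgrounds = load_backgrounds(BACKGROUNDS_JSON)
print(pick_background(backgrounds, "random"))
```

Audio entries in the other JSON file carry three fields instead of four, so the unpacking line would need adjusting for them.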
