feat(manual): Implement TTS processing and video building for manual pipeline

- Added a module to handle text-to-speech for screenshots, generating MP3 files and updating post objects with audio paths and durations.
- Introduced a module to assemble videos from screenshots and TTS audio, including background video and audio management.
- Created a script as the entry point for the manual pipeline, supporting commands to initialize posts, render videos, and list post statuses.
- Updated background audio and video configurations in JSON files, removing outdated entries and adding new options.
- Adjusted file permissions for several utility scripts to ensure proper execution.
3 months ago · 2301f9c3b4
parent 569f25098a
commit 2301f9c3b4
17 changed files with 2402 additions and 49 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -246,3 +246,8 @@ video_creation/data/envvars.txt

 config.toml
 *.exe
+.agents
+
+# Manual pipeline
+manual_posts/
+manual_results/
--- a/BRAINSTORM_manual_screenshot_flow.md
+++ b/BRAINSTORM_manual_screenshot_flow.md
@@ -0,0 +1,235 @@
+# 🧠 Brainstorming: Manual Screenshot → Video Pipeline
+
+> **Context**: The Reddit API cannot be used. A new workflow is needed that lets the user capture screenshots by hand from **Reddit, Threads (Meta), X (Twitter)**, after which the system builds the video automatically.
+> 
+> **Status**: ✅ **IMPLEMENTED**. Phase 1 is complete.
+
+---
+
+## 1. Core Problem Analysis
+
+### Where does the current flow depend on the Reddit API?
+
+| Step | Depends on API? | Details |
+|------|:---:|----------|
+| Fetch thread + comments | ✅ **YES** | `reddit/subreddit.py`: PRAW login, fetch post, filter comments |
+| **Text for TTS** | ✅ **YES** | `TTS/engine_wrapper.py`: reads text from `reddit_object["comments"]` |
+| **Screenshot** | ✅ **YES** | `screenshot_downloader.py`: Playwright logs in to Reddit, navigates, captures |
+| Background video/audio | ❌ NO | `background.py`: uses YouTube only, unrelated to Reddit |
+| Final video assembly | ❌ NO | `final_video.py`: uses FFmpeg only, but requires the `reddit_obj` dict |
+| Video tracking | ❌ NO | `videos.json`: stores metadata only |
+
+**Conclusion**: The **first 3 steps** (fetch → TTS text → screenshot) must be replaced entirely by a manual flow.
+
+---
+
+## 2. Chosen Approach: **.mp3 preferred, .txt fallback**
+
+The user supplies **audio files (.mp3) directly** plus screenshots. TTS is only a fallback when just a `.txt` file is present.
+
+```
+User takes screenshots + supplies .mp3 audio → Video
+                         (or .txt fallback → TTS → Video)
+```
+
+### Audio priority:
+
+```
+Has .mp3?  ──YES──▶  Use the .mp3 directly (skip TTS)
+    │
+    NO
+    │
+Has .txt?  ──YES──▶  TTS generates .mp3 from the text (fallback)
+    │
+    NO
+    │
+    ▼
+  ⚠ SKIP (screenshot has no audio)
+```
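The priority chain above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `manual/tts_processor.py`; the function name `resolve_audio_source` and its return labels are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_audio_source(image_path: Path) -> Tuple[str, Optional[Path]]:
    """Pick the audio source for one screenshot (hypothetical helper).

    Priority: a sibling .mp3 wins, then a sibling .txt (TTS fallback),
    otherwise the screenshot is skipped.
    """
    mp3 = image_path.with_suffix(".mp3")
    txt = image_path.with_suffix(".txt")
    if mp3.is_file():
        return ("mp3", mp3)   # pre-recorded audio: TTS is bypassed
    if txt.is_file():
        return ("tts", txt)   # text only: speech must be generated
    return ("skip", None)     # no audio source at all
```

Deciding from the file system alone keeps the check cheap and lets a validator report skipped screenshots before any TTS call is made.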
+
+---
+
+## 3. Implemented Structure
+
+### 3.1 Directories
+
+```
+RedditVideoMakerBot/
+├── main.py                          # Old flow (unchanged, not modified)
+├── manual_main.py                   # 🆕 Entry point for the new flow
+│
+├── manual/                          # 🆕 New-flow module (fully separate)
+│   ├── __init__.py                  # Module docstring
+│   ├── scanner.py                   # Scans folders, validates (.png + .mp3 + .txt)
+│   ├── tts_processor.py             # Audio processor (.mp3 preferred, TTS fallback)
+│   └── video_builder.py             # FFmpeg pipeline (libx264 CPU)
+│
+├── manual_posts/                    # 🆕 Input directory
+│   └── post_001/
+│       ├── meta.json                # (optional) metadata
+│       ├── 0_title.png              # Screenshot of the post
+│       ├── 0_title.mp3              # Audio (pre-recorded)
+│       ├── 1_comment.png            # Screenshot of a comment
+│       └── 1_comment.mp3            # Audio for the comment
+│
+├── manual_results/                  # 🆕 Output directory
+│   └── post_001.mp4
+│
+├── reddit/                          # Old flow (unchanged)
+├── TTS/                             # Shared: TTS engines (fallback)
+├── video_creation/                  # Old flow (unchanged)
+└── utils/                           # Shared: common utilities
+```
+
+### 3.2 File Naming Rules
+
+```
+<index>_<type>.<ext>
+```
+
+| Pattern | Meaning | Required? |
+|---------|----------|-----------|
+| `0_title.png` | Screenshot of the main post | ✅ Required |
+| `0_title.mp3` | Pre-recorded audio | ✅ (or .txt) |
+| `0_title.txt` | TTS fallback text | Fallback |
+| `1_comment.png` | Screenshot of comment 1 | Optional |
+| `1_comment.mp3` | Audio for comment 1 | ✅ (or .txt) |
+| `meta.json` | Metadata | Optional |
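As an illustration of the naming rule, the sketch below parses `<index>_<type>.<ext>` names with a regular expression. The pattern and the `parse_asset_name` helper are assumptions made for this example; the actual validation in `manual/scanner.py` may differ.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: digits, underscore, a word, then one of the three extensions.
_NAME_RE = re.compile(r"^(\d+)_([A-Za-z]+)\.(png|mp3|txt)$")


class ParsedName(NamedTuple):
    index: int   # ordering key, 0 is the title
    kind: str    # e.g. "title" or "comment"
    ext: str     # "png", "mp3" or "txt"


def parse_asset_name(filename: str) -> Optional[ParsedName]:
    """Return the parsed parts, or None for files outside the convention."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return ParsedName(int(match.group(1)), match.group(2), match.group(3))
```

Files such as `meta.json` simply return `None`, so a scanner could ignore them without special cases.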
+
+### 3.3 `post_object` — Data Structure
+
+```python
+post_object = {
+    "post_id": "post_001",
+    "platform": "reddit",              # reddit | threads | x | other
+    "title": "What's the most...",
+    "author": "u/example_user",
+    "url": "https://...",
+    "post_dir": "manual_posts/post_001",
+    
+    "screenshots": [
+        {
+            "index": 0,
+            "type": "title",
+            "image_path": "manual_posts/post_001/0_title.png",
+            "text": "",                         # From .txt (if present)
+            "audio_path": "manual_posts/post_001/0_title.mp3",  # From .mp3
+            "audio_duration": 3.5,              # Measured after processing
+        },
+        {
+            "index": 1,
+            "type": "comment",
+            "image_path": "manual_posts/post_001/1_comment.png",
+            "text": "",
+            "audio_path": "manual_posts/post_001/1_comment.mp3",
+            "audio_duration": 5.2,
+        },
+    ],
+    
+    "total_duration": 8.7,
+    "output_path": "manual_results/post_001.mp4",
+}
+```
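In the example above, `total_duration` equals the sum of the per-screenshot `audio_duration` values (3.5 + 5.2). A minimal sketch of that relationship follows, assuming the sum is all there is to it; the helper name `total_duration` is hypothetical.

```python
def total_duration(post_object: dict) -> float:
    """Sum the audio durations of all screenshots (assumed definition)."""
    durations = (shot.get("audio_duration", 0.0) for shot in post_object["screenshots"])
    # Round to centiseconds so float noise does not leak into metadata.
    return round(sum(durations), 2)
```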
+
+### 3.4 Processing Flow
+
+```mermaid
+flowchart TD
+    A["manual_main.py"] --> B{"Command?"}
+    
+    B -->|"render"| G["Scan manual_posts/"]
+    G --> H["Validate: image + audio/text present?"]
+    H --> I["Build post_object from files"]
+    I --> J{".mp3 present?"}
+    J -->|"YES"| K["Use .mp3 directly"]
+    J -->|"NO, .txt present"| L["TTS: text → .mp3"]
+    K --> M["Random pick background video + audio"]
+    L --> M
+    M --> N["FFmpeg: combine images + audio + background"]
+    N --> O["Output → manual_results/"]
+    
+    B -->|"render --all"| P["Loop over all folders"]
+    P --> G
+    
+    B -->|"init"| Q["Create folder + meta.json"]
+    B -->|"list"| R["List posts + status"]
+```
+
+---
+
+## 4. Old Flow vs New Flow
+
+| Aspect | Old Flow (`main.py`) | New Flow (`manual_main.py`) |
+|--------|---------------------|---------------------------|
+| **Data source** | Reddit API (PRAW) | Manual screenshots + audio files |
+| **Screenshot** | Playwright auto-capture | Captured by the user |
+| **Audio source** | TTS from comment text | **User-supplied .mp3** (or .txt → TTS) |
+| **Platform** | Reddit only | Reddit + Threads + X + any |
+| **TTS engines** | Required | Optional (fallback for .txt only) |
+| **Background** | Hardcoded YouTube list | **Random from local folder** (YouTube fallback) |
+| **Encoder** | `h264_nvenc` (GPU) | `libx264` (CPU) |
+| **Config** | `config.toml` (template-based) | `config.toml` `[manual]` section + built-in defaults |
+| **Output** | `results/<subreddit>/` | `manual_results/` |
+| **Tracking** | `videos.json` | `videos.json` (shared) |
+
+---
+
+## 5. Config
+
+```toml
+[manual]
+input_dir = "manual_posts"
+output_dir = "manual_results"
+encoder = "libx264"
+resolution_w = 1080
+resolution_h = 1920
+opacity = 0.9
+background_video = "random"                        # "random" or a specific name (e.g. "minecraft")
+background_audio = "random"                        # "random" or a specific name (e.g. "lofi")
+background_video_dir = "assets/backgrounds/video"  # Directory of local background videos
+background_audio_dir = "assets/backgrounds/audio"  # Directory of local background music
+background_audio_volume = 0.15
+max_video_length = 120
+```
+
+**Note**: The `[manual]` config section is optional. If it is absent, built-in defaults are used.
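One way to implement "optional section with built-in defaults" is to overlay the user's `[manual]` table on a defaults dictionary. The sketch below assumes the config has already been parsed into a plain `dict`; `MANUAL_DEFAULTS` mirrors the values listed above, and the function name `manual_settings` is hypothetical.

```python
from typing import Any, Dict

# Defaults copied from the [manual] example in this section.
MANUAL_DEFAULTS: Dict[str, Any] = {
    "input_dir": "manual_posts",
    "output_dir": "manual_results",
    "encoder": "libx264",
    "resolution_w": 1080,
    "resolution_h": 1920,
    "opacity": 0.9,
    "background_video": "random",
    "background_audio": "random",
    "background_video_dir": "assets/backgrounds/video",
    "background_audio_dir": "assets/backgrounds/audio",
    "background_audio_volume": 0.15,
    "max_video_length": 120,
}


def manual_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults, overridden by whatever [manual] provides."""
    merged = dict(MANUAL_DEFAULTS)            # copy so the defaults stay untouched
    merged.update(config.get("manual") or {})  # a missing or empty section is fine
    return merged
```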
+
+### Background: Random from local folder
+
+Drop background video/audio files into these directories and the system picks one at random on each render:
+```
+assets/backgrounds/video/   ← Put .mp4/.mkv/.webm/.avi/.mov files here
+assets/backgrounds/audio/   ← Put .mp3/.wav/.ogg/.m4a/.flac files here
+```
+- **Local files present** → pick one at random
+- **No local files** → fall back to downloading from YouTube (the old list)
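A minimal sketch of the local pick, assuming the extension lists shown above. Returning `None` signals that the caller should take the YouTube fallback; the helper name `pick_local_background` is hypothetical.

```python
import random
from pathlib import Path
from typing import Optional, Set

VIDEO_EXTS = {".mp4", ".mkv", ".webm", ".avi", ".mov"}
AUDIO_EXTS = {".mp3", ".wav", ".ogg", ".m4a", ".flac"}


def pick_local_background(directory: str, extensions: Set[str]) -> Optional[Path]:
    """Pick a random matching file, or None when the folder has none."""
    folder = Path(directory)
    if not folder.is_dir():
        return None
    candidates = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    return random.choice(candidates) if candidates else None
```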
+
+The TTS fallback uses the settings from `[settings.tts]` (default: GoogleTranslate, no API key required).
+
+---
+
+## 6. Decisions Log
+
+| Question | Decision |
+|----------|------------|
+| Audio source | **.mp3 preferred**, .txt falls back to TTS |
+| Background | **Random from local folder**, YouTube fallback |
+| Encoder | `libx264` (CPU), since no NVIDIA GPU is available |
+| Config | `[manual]` section in `config.toml` |
+| Thumbnail | Skipped |
+| Video tracking | Shared `videos.json` file |
+| OCR (Phase 2) | EN + VI, using EasyOCR |
+
+---
+
+## 7. Phases
+
+| Phase | Status | Description |
+|-------|:---:|--------|
+| **Phase 1: Core** | ✅ Done | .png + .mp3 → Video (+ .txt TTS fallback) |
+| Phase 2: OCR | ⏳ Planned | Auto-read text from screenshots (EN + VI) |
+| Phase 3: GUI | ⏳ Planned | Flask web interface for the manual flow |
+
+---
+
+> 📝 **Summary**: The `manual/` module is fully separate. Primary input: screenshots (.png) + audio (.mp3). TTS is only a fallback when .txt is used. Background functions are reused from the old code. Videos are written to `manual_results/`. Platform-agnostic.
--- a/MANUAL_PIPELINE_GUIDE.md
+++ b/MANUAL_PIPELINE_GUIDE.md
@@ -0,0 +1,139 @@
+# 📖 Manual Pipeline User Guide
+
+> **Summary**: Create videos from hand-captured screenshots (Reddit, Threads, X) without any API.
+
+---
+
+## 🚀 Quick Start (3 steps)
+
+### Step 1: Create a folder for the new post
+```bash
+cd /home/minhvu/projects/RedditVideoMakerBot
+python manual_main.py init my_first_post --platform reddit
+```
+
+Result:
+```
+manual_posts/my_first_post/
+├── meta.json           ← (optional) metadata
+├── 0_title.txt         ← Edit the TTS text here
+└── 1_comment.txt       ← Edit the comment text here
+```
+
+### Step 2: Add screenshots + text
+
+1. **Take a screenshot** of the post → save it as `0_title.png`
+2. **Take screenshots** of the comments → save them as `1_comment.png`, `2_comment.png`, ...
+3. **Edit the matching `.txt` file**: enter the text the bot will read aloud
+
+```
+manual_posts/my_first_post/
+├── meta.json
+├── 0_title.png         ← Screenshot of the post
+├── 0_title.txt         ← "What's the most underrated life hack?"
+├── 1_comment.png       ← Screenshot comment 1
+├── 1_comment.txt       ← "I always put my phone on airplane mode..."
+├── 2_comment.png       ← Screenshot comment 2
+└── 2_comment.txt       ← "Using a binder clip as a phone stand..."
+```
+
+> [!IMPORTANT]
+> Every `.png` file **must** have a `.txt` file with the same index. Index `0` is always the title/main post.
+
+### Step 3: Render video
+```bash
+python manual_main.py render my_first_post
+```
+
+The video is saved to: `manual_results/my_first_post.mp4`
+
+---
+
+## 📋 All Commands
+
+| Command | Description |
+|---------|--------|
+| `python manual_main.py init <post_id>` | Create a new folder with template files |
+| `python manual_main.py init <post_id> --platform threads` | Create a folder for a Threads post |
+| `python manual_main.py render <post_id>` | Render one post into a video |
+| `python manual_main.py render --all` | Render all posts that have not been rendered yet |
+| `python manual_main.py render <post_id> --force` | Re-render (even if already rendered) |
+| `python manual_main.py list` | List all posts + their status |
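The command table maps naturally onto `argparse` subcommands. The sketch below is one possible layout, not the actual contents of `manual_main.py`; the `reddit` default for `--platform` and the `render_all` attribute name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with init / render / list subcommands (illustrative)."""
    parser = argparse.ArgumentParser(prog="manual_main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="create a new post folder with template files")
    init.add_argument("post_id")
    init.add_argument("--platform", default="reddit")  # assumed default

    render = sub.add_parser("render", help="render one post, or every pending post")
    render.add_argument("post_id", nargs="?")  # optional so `render --all` works
    render.add_argument("--all", action="store_true", dest="render_all")
    render.add_argument("--force", action="store_true")

    sub.add_parser("list", help="list posts and their status")
    return parser
```

For example, `build_parser().parse_args(["render", "--all"])` yields a namespace with `post_id=None` and `render_all=True`, which the caller can use to loop over every folder.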
+
+---
+
+## 📁 File Naming Rules
+
+```
+<index>_<type>.<ext>
+```
+
+| File | Meaning |
+|------|----------|
+| `0_title.png` | Screenshot of the main post (required) |
+| `0_title.txt` | TTS text for the post (required) |
+| `1_comment.png` | Screenshot of comment 1 |
+| `1_comment.txt` | TTS text for comment 1 |
+| `N_comment.png/txt` | The Nth comment |
+| `meta.json` | Metadata (optional) |
+
+> [!TIP]  
+> `.txt` files support comment lines starting with `#`; these lines are skipped during TTS.
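A short sketch of how `#` comment lines could be dropped before the text reaches a TTS engine. It assumes that leading whitespace before `#` is tolerated and that the remaining lines are joined with spaces; both are choices made for this example, and `tts_text` is a hypothetical name.

```python
def tts_text(raw: str) -> str:
    """Drop blank lines and '#' comment lines, then join what is left."""
    kept = [
        line.strip()
        for line in raw.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return " ".join(kept)
```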
+
+---
+
+## ⚙️ Configuration
+
+Add a `[manual]` section to `config.toml` (or leave it out, in which case the bot uses the defaults):
+
+```toml
+[manual]
+input_dir = "manual_posts"        # Input directory
+output_dir = "manual_results"     # Output directory
+encoder = "libx264"               # CPU encoder (or h264_nvenc if a GPU is available)
+resolution_w = 1080               # Video width
+resolution_h = 1920               # Video height (1080x1920 = portrait)
+opacity = 0.9                     # Opacity of the screenshot overlay
+background_video = "minecraft"    # Background video
+background_audio = "lofi"         # Background audio
+background_audio_volume = 0.15    # Background audio volume (0 = off)
+max_video_length = 120            # Max video length (seconds)
+```
+
+The TTS engine is taken from the `[settings.tts]` section of `config.toml`. The default is **GoogleTranslate** (no API key required).
+
+---
+
+## 🏗️ Module Architecture
+
+```
+manual/
+├── __init__.py          # Module docstring
+├── scanner.py           # Scans folders, validates, builds post_object
+├── tts_processor.py     # TTS: text → MP3 (reuse TTS/ engines)
+└── video_builder.py     # FFmpeg: screenshots + audio → video
+
+manual_main.py           # CLI entry point (init, render, list)
+```
+
+**Fully separate** from the old flow (`main.py`). No file belonging to the old flow is modified.
+
+### Files created/modified
+
+| File | Action | Description |
+|------|--------|--------|
+| [manual/__init__.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/__init__.py) | 🆕 Created | Module init |
+| [manual/scanner.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/scanner.py) | 🆕 Created | Folder scanner & validator |
+| [manual/tts_processor.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/tts_processor.py) | 🆕 Created | TTS processor |
+| [manual/video_builder.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual/video_builder.py) | 🆕 Created | Video assembler |
+| [manual_main.py](file:///home/minhvu/projects/RedditVideoMakerBot/manual_main.py) | 🆕 Created | CLI entry point |
+| [.gitignore](file:///home/minhvu/projects/RedditVideoMakerBot/.gitignore) | ✏️ Updated | Added `manual_posts/`, `manual_results/` |
+
+---
+
+## ⚠️ Notes
+
+1. **FFmpeg** must already be installed on the system
+2. **Background video** is downloaded automatically from YouTube on first use (internet required)
+3. `config.toml` may be empty; the bot then uses built-in defaults (GoogleTranslate TTS)
+4. The default encoder is `libx264` (CPU), which suits machines without an NVIDIA GPU
--- a/PROJECT_INIT.md
+++ b/PROJECT_INIT.md
@@ -0,0 +1,483 @@
+# 📦 RedditVideoMakerBot — Project Init Documentation
+
+> **Version**: 3.4.0  
+> **Original authors**: Lewis Menelaws & [TMRRW](https://tmrrwinc.ca)  
+> **License**: GPL + Roboto Fonts (Apache 2.0)  
+> **Python**: 3.10 / 3.11 / 3.12  
+
+---
+
+## 1. Overview
+
+RedditVideoMakerBot is a tool that automates the creation of short videos (TikTok/YouTube Shorts/Instagram Reels) from Reddit posts. The bot will:
+
+1. **Fetch a post** from a subreddit (via the Reddit API / PRAW)
+2. **Convert text to speech** (TTS, 7 different engines)
+3. **Capture screenshots** of the post/comments with Playwright
+4. **Download & cut background video/audio** from YouTube
+5. **Assemble everything** into a finished video with FFmpeg
+
+Final result: an `.mp4` file in the `results/<subreddit>/` directory.
+
+---
+
+## 2. Directory Structure
+
+```
+RedditVideoMakerBot/
+├── main.py                         # 🚀 Main entry point
+├── GUI.py                          # 🖥️ Web GUI (Flask, port 4000)
+├── config.toml                     # ⚙️ Configuration file (user-generated)
+├── ptt.py                          # 🔊 Helper script that lists system voices
+├── requirements.txt                # 📦 Python dependencies
+├── Dockerfile                      # 🐳 Docker support (python:3.10-slim)
+├── build.sh / run.sh / run.bat     # 📜 Quick-run scripts
+├── install.sh                      # 📜 Auto-installer (Linux/macOS)
+│
+├── reddit/                         # 📡 Module that fetches data from Reddit
+│   └── subreddit.py                #    Logs in to Reddit, fetches threads & comments
+│
+├── TTS/                            # 🗣️ Text-to-Speech module (7 engines)
+│   ├── engine_wrapper.py           #    TTSEngine: common wrapper for all TTS engines
+│   ├── TikTok.py                   #    TikTok TTS API
+│   ├── aws_polly.py                #    AWS Polly (boto3)
+│   ├── elevenlabs.py               #    ElevenLabs API
+│   ├── openai_tts.py               #    OpenAI TTS API
+│   ├── GTTS.py                     #    Google Translate TTS (gTTS)
+│   ├── pyttsx.py                   #    pyttsx3 (offline, system voices)
+│   └── streamlabs_polly.py         #    Streamlabs Polly
+│
+├── video_creation/                 # 🎬 Video creation module
+│   ├── voices.py                   #    Orchestrator: picks the TTS provider & runs it
+│   ├── screenshot_downloader.py    #    Captures Reddit screenshots with Playwright
+│   ├── background.py               #    Downloads & cuts background video/audio (yt-dlp)
+│   ├── final_video.py              #    Assembles everything into a video (FFmpeg pipeline)
+│   └── data/                       #    Cookie files + videos.json (tracking)
+│       ├── cookie-dark-mode.json
+│       ├── cookie-light-mode.json
+│       └── videos.json
+│
+├── utils/                          # 🛠️ Utilities
+│   ├── settings.py                 #    Reads/validates config.toml against the template
+│   ├── .config.template.toml       #    Config template (defines all fields)
+│   ├── console.py                  #    Rich console helpers (print_step, handle_input...)
+│   ├── ai_methods.py               #    AI similarity sorting (sentence-transformers)
+│   ├── subreddit.py                #    Logic for picking unprocessed posts + filters
+│   ├── voice.py                    #    sanitize_text(), rate limit, sleep_until()
+│   ├── videos.py                   #    check_done(), save_data(): tracking
+│   ├── cleanup.py                  #    Deletes temp files
+│   ├── ffmpeg_install.py           #    Installs FFmpeg automatically if it is missing
+│   ├── imagenarator.py             #    Renders images for storymode method 1
+│   ├── thumbnail.py                #    Creates the video thumbnail
+│   ├── fonts.py                    #    Font size helpers
+│   ├── id.py                       #    extract_id(): sanitize reddit thread ID
+│   ├── posttextparser.py           #    Splits post text into segments
+│   ├── playwright.py               #    Helper that clears cookies
+│   ├── version.py                  #    Checks for a new version on GitHub
+│   ├── gui_utils.py                #    Utilities for the Flask GUI
+│   ├── background_videos.json      #    List of background videos (YouTube URLs)
+│   └── background_audios.json      #    List of background audios (YouTube URLs)
+│
+├── GUI/                            # 🌐 Flask Templates (HTML)
+│   ├── layout.html                 #    Base template
+│   ├── index.html                  #    Home page: list of created videos
+│   ├── settings.html               #    Settings page
+│   ├── backgrounds.html            #    Background management
+│   └── voices/                     #    Voice sample files
+│
+├── fonts/                          # 🔤 Roboto font files
+│   ├── Roboto-Regular.ttf
+│   ├── Roboto-Bold.ttf
+│   ├── Roboto-Medium.ttf
+│   ├── Roboto-Black.ttf
+│   └── LICENSE.txt
+│
+├── assets/                         # 🎨 Static assets
+│   ├── title_template.png          #    Image template for the fancy thumbnail
+│   └── backgrounds/                #    Downloaded background files (video/audio)
+│
+├── results/                        # 📁 Output videos (auto-created)
+│   └── <subreddit>/
+│       ├── <video>.mp4
+│       ├── OnlyTTS/                #    Videos without background audio
+│       └── thumbnails/             #    Generated thumbnails
+│
+└── threads/                        # 📂 (Unused/placeholder)
+```
+
+---
+
+## 3. Processing Pipeline (Main Flow)
+
+```mermaid
+flowchart TD
+    A["main.py — Entry Point"] --> B["1. get_subreddit_threads()"]
+    B --> C["2. save_text_to_mp3()"]
+    C --> D["3. get_screenshots_of_reddit_posts()"]
+    D --> E["4. download/chop backgrounds"]
+    E --> F["5. make_final_video()"]
+    F --> G["results/subreddit/video.mp4"]
+
+    B -.-> B1["reddit/subreddit.py"]
+    B1 -.-> B2["PRAW — Reddit API"]
+    B1 -.-> B3["utils/subreddit.py — filter logic"]
+    B1 -.-> B4["utils/ai_methods.py — similarity sort"]
+
+    C -.-> C1["video_creation/voices.py"]
+    C1 -.-> C2["TTS/engine_wrapper.py"]
+    C2 -.-> C3["7 TTS Engines"]
+
+    D -.-> D1["video_creation/screenshot_downloader.py"]
+    D1 -.-> D2["Playwright — Headless Chrome"]
+
+    E -.-> E1["video_creation/background.py"]
+    E1 -.-> E2["yt-dlp — YouTube download"]
+
+    F -.-> F1["video_creation/final_video.py"]
+    F1 -.-> F2["FFmpeg — Video assembly"]
+```
+
+### Step 1: Fetch the Reddit Thread (`reddit/subreddit.py`)
+
+- Logs in to Reddit via **PRAW** (client_id, client_secret, username, password)
+- Supports **2FA** (code entered manually)
+- Selects a post in one of these ways:
+  - **Specific post ID** (from config; multiple IDs separated by `+` are supported)
+  - **AI Similarity**: uses `sentence-transformers/all-MiniLM-L6-v2` to compare similarity against keywords
+  - **Random** from `subreddit.hot(limit=25)`
+- **Filters** (in `utils/subreddit.py`):
+  - Skip posts already processed (checked against `videos.json`)
+  - Skip NSFW (if `allow_nsfw = false`)
+  - Skip pinned posts
+  - Skip posts containing **blocked words**
+  - Skip posts with fewer than `min_comments` comments
+  - Storymode: checks the `selftext` length
+- Collects comments (filtered by `min/max_comment_length`, skipping deleted/removed/stickied ones)
+- **Output**: Dict containing `thread_url`, `thread_title`, `thread_id`, `is_nsfw`, `comments[]` or `thread_post`
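The filter list above can be read as a single predicate over a post. The sketch below works on plain dictionaries rather than PRAW objects; the keys (`id`, `over_18`, `stickied`, `num_comments`, `title`) and the function name `should_skip` are assumptions for illustration and do not reproduce `utils/subreddit.py`.

```python
from typing import Iterable, Set


def should_skip(
    post: dict,
    done_ids: Set[str],
    allow_nsfw: bool,
    min_comments: int,
    blocked_words: Iterable[str],
) -> bool:
    """Return True when any of the listed filters rejects the post."""
    if post["id"] in done_ids:                       # already made into a video
        return True
    if post.get("over_18") and not allow_nsfw:       # NSFW not allowed
        return True
    if post.get("stickied"):                         # pinned post
        return True
    if post.get("num_comments", 0) < min_comments:   # too few comments
        return True
    title = post.get("title", "").lower()
    return any(w.strip().lower() in title for w in blocked_words if w.strip())
```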
+
+### Step 2: Text-to-Speech (`video_creation/voices.py` + `TTS/`)
+
+**7 TTS providers** with different `max_chars` limits:
+
+| Provider | Class | Max Chars | API Key Required | Notes |
+|----------|-------|-----------|------------------|-------|
+| **TikTok** | `TikTok` | 200 | Session ID | Uses TikTok's unofficial API |
+| **Google Translate** | `GTTS` | 5,000 | No | Uses the gTTS library |
+| **AWS Polly** | `AWSPolly` | 3,000 | AWS Profile | Neural engine, 15 voices |
+| **Streamlabs Polly** | `StreamlabsPolly` | 550 | No | Free Polly wrapper |
+| **ElevenLabs** | `elevenlabs` | 2,500 | API Key | Multilingual v1 model |
+| **OpenAI** | `OpenAITTS` | 4,096 | API Key | tts-1, tts-1-hd, gpt-4o-mini-tts |
+| **pyttsx3** | `pyttsx` | 5,000 | No | Offline, system voices |
+
+**TTSEngine wrapper** (`TTS/engine_wrapper.py`):
+- Takes the reddit object → creates an MP3 for the title + each comment
+- Automatically **splits** text longer than `max_chars` into multiple parts and joins them with FFmpeg concat
+- Adds **silence** between parts (`silence_duration`, default 0.3s)
+- Sanitizes text: removes URLs and special characters, replaces `+` → "plus", `&` → "and"
+- Supports **translation** into other languages (via the `translators` library)
+- Sums the total audio `length` → used as the video length
+- **Max video length**: default 50 seconds (hardcoded `DEFAULT_MAX_LENGTH`)
+- **Output**: MP3 files in `assets/temp/<thread_id>/mp3/`
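To make the `max_chars` splitting concrete, here is a word-boundary splitter. It is a simplified stand-in and does not reproduce the strategy in `TTS/engine_wrapper.py`, which may split on punctuation instead; `split_for_tts` is a hypothetical name.

```python
from typing import List


def split_for_tts(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, breaking between words."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be hard-cut.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)   # current is non-empty here
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the provider separately, with the resulting clips concatenated and `silence_duration` inserted between them.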
+
+### Step 3: Screenshot Reddit Posts (`video_creation/screenshot_downloader.py`)
+
+- Uses **Playwright** (Chromium headless)
+- **Logs in** to Reddit (username/password)
+- Opens the thread URL on `new.reddit.com`
+- Supports **Dark/Light/Transparent** themes (loads the matching cookies)
+- Captures screenshots:
+  - **Title** → `assets/temp/<id>/png/title.png`
+  - **Comments** → `assets/temp/<id>/png/comment_<i>.png`
+  - **Story content** → `assets/temp/<id>/png/story_content.png`
+- Supports **zoom** (browser scaling)
+- Supports **translating** text before capture
+- Handles the NSFW warning popup
+- **Storymode method 1**: instead of screenshots, uses `imagemaker()` to render images from text with PIL
+
+### Step 4: Background Video/Audio (`video_creation/background.py`)
+
+**Background Videos** (10 options):
+| Name | Source | Credit |
+|------|--------|--------|
+| minecraft | YouTube parkour | bbswitzer |
+| minecraft-2 | YouTube | Itslpsn |
+| gta | GTA stunt race | Achy Gaming |
+| motor-gta | Bike parkour GTA | Achy Gaming |
+| rocket-league | Rocket League | Orbital Gameplay |
+| csgo-surf | CSGO Surf | Aki |
+| cluster-truck | Cluster Truck | No Copyright Gameplay |
+| multiversus | MultiVersus | MKIceAndFire |
+| fall-guys | Fall Guys | Throneful |
+| steep | Steep | joel |
+
+**Background Audios** (3 options): `lofi`, `lofi-2`, `chill-summer`
+
+- Downloaded with **yt-dlp** (first time only; cached in `assets/backgrounds/`)
+- **Cuts a random segment** of the video/audio whose length equals the video length
+- Output: `assets/temp/<id>/background.mp4` and `background.mp3`
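The random cut comes down to choosing a start time so that the clip still fits inside the source. A minimal sketch under that assumption follows; the real `background.py` may add margins or use integer seconds, and `random_segment` is a hypothetical name.

```python
import random
from typing import Optional, Tuple


def random_segment(
    source_length: float,
    clip_length: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (start, end) in seconds for a clip that fits inside the source."""
    if clip_length > source_length:
        raise ValueError("clip is longer than the source")
    rng = rng or random.Random()
    start = rng.uniform(0.0, source_length - clip_length)
    return start, start + clip_length
```

Passing a seeded `random.Random` makes the cut reproducible, which helps when debugging a render.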
+
+### Step 5: Final Video (`video_creation/final_video.py`)
+
+- **Concatenates** all audio clips → `assets/temp/<id>/audio.mp3`
+- **Merges** background audio (volume configurable, default 0.15)
+- **Prepares the background**: crops the background video to the `W/H` aspect ratio (default 1080x1920, portrait)
+- **Creates the fancy thumbnail**: takes `title_template.png`, stretches the middle section, and draws the title text on it
+- **Overlays** screenshots on the background video, timed to the audio clips
+  - Each screenshot is shown for the interval of its own audio clip
+  - Supports `opacity` (default 0.9)
+- **Draws credit text** in the bottom-right corner
+- **Renders** with FFmpeg:
+  - Codec: `h264_nvenc` (NVIDIA GPU acceleration)
+  - Video bitrate: 20Mbps
+  - Audio bitrate: 192kbps
+  - Threads: `multiprocessing.cpu_count()`
+- **Optional**: Also renders an "OnlyTTS" version (no background audio)
+- **Saves metadata** to `videos.json`
+- **Cleans up** temp files
+- **Output**: `results/<subreddit>/<normalized_title>.mp4`
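The overlay timing described above follows from a running sum of clip durations: each screenshot starts where the previous audio clip ends. The sketch below computes those intervals; `overlay_intervals` is a hypothetical helper, not a function from `final_video.py`.

```python
from typing import List, Tuple


def overlay_intervals(durations: List[float]) -> List[Tuple[float, float]]:
    """Map clip durations to consecutive (start, end) display intervals."""
    intervals: List[Tuple[float, float]] = []
    start = 0.0
    for duration in durations:
        intervals.append((start, start + duration))
        start += duration   # the next screenshot begins where this one ends
    return intervals
```

Each pair could then drive an FFmpeg timeline expression on the overlay filter, for example `enable='between(t,START,END)'`.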
+
+---
+
+## 4. Configuration (`config.toml`)
+
+The configuration is validated automatically against the template `utils/.config.template.toml`. On first run, or when a field is missing, the bot prompts the user to enter it.
+
+### `[reddit.creds]`: Reddit login credentials
+| Key | Type | Required | Description |
+|-----|------|----------|-------|
+| `client_id` | string | ✅ | Reddit App ID (12-30 chars) |
+| `client_secret` | string | ✅ | Reddit App Secret (20-40 chars) |
+| `username` | string | ✅ | Reddit username (3-20 chars) |
+| `password` | string | ✅ | Reddit password |
+| `2fa` | bool | ❌ | Enable 2FA? Default: `false` |
+
+### `[reddit.thread]`: Post settings
+| Key | Type | Default | Description |
+|-----|------|---------|-------|
+| `subreddit` | string | - | Subreddit name (supports `+` for multiple subreddits) |
+| `post_id` | string | `""` | Specific post ID (supports `+` for multiple IDs) |
+| `random` | bool | `false` | Random thread? |
+| `max_comment_length` | int | `500` | Max characters per comment |
+| `min_comment_length` | int | `1` | Min characters per comment |
+| `post_lang` | string | `""` | Translation language (e.g. `vi`, `es`, `ja`) |
+| `min_comments` | int | `20` | Min number of comments on the post |
+| `blocked_words` | string | `""` | Comma-separated blocked words |
+
+### `[ai]`: AI Similarity
+| Key | Type | Default | Description |
+|-----|------|---------|-------|
+| `ai_similarity_enabled` | bool | `false` | Enable sorting by similarity |
+| `ai_similarity_keywords` | string | - | Comma-separated keywords |
+
+### `[settings]`: General settings
+| Key | Type | Default | Description |
+|-----|------|---------|-------|
+| `allow_nsfw` | bool | `false` | Allow NSFW? |
+| `theme` | string | `"dark"` | `dark` / `light` / `transparent` |
+| `times_to_run` | int | `1` | Number of consecutive runs |
+| `opacity` | float | `0.9` | Opacity of overlaid comments (0-1) |
+| `storymode` | bool | `false` | Read only the title + post content |
+| `storymodemethod` | int | `1` | `0`: one static image, `1`: fancy images |
+| `storymode_max_length` | int | `1000` | Max characters for storymode |
+| `resolution_w` | int | `1080` | Video width (pixels) |
+| `resolution_h` | int | `1920` | Video height (pixels) |
+| `zoom` | float | `1` | Browser zoom level (0.1-2.0) |
+| `channel_name` | string | `"Reddit Tales"` | Channel name shown on the thumbnail |
+
+### `[settings.background]`: Background
+| Key | Type | Default | Description |
+|-----|------|---------|-------|
+| `background_video` | string | `"minecraft"` | Background video |
+| `background_audio` | string | `"lofi"` | Background audio |
+| `background_audio_volume` | float | `0.15` | Background audio volume (0 = off) |
+| `enable_extra_audio` | bool | `false` | Also render a version without background audio |
+| `background_thumbnail` | bool | `false` | Create a thumbnail? |
+| `background_thumbnail_font_*` | - | - | Font family/size/color for the thumbnail |
+
+### `[settings.tts]`: Text-to-Speech
+| Key | Type | Default | Description |
+|-----|------|---------|-------|
+| `voice_choice` | string | `"tiktok"` | TTS provider |
+| `random_voice` | bool | `true` | Random voice for each comment |
+| `silence_duration` | float | `0.3` | Silence between TTS clips (seconds) |
+| `no_emojis` | bool | `false` | Remove emojis? |
+| `tiktok_voice` | string | `"en_us_001"` | Voice for TikTok TTS |
+| `tiktok_sessionid` | string | - | TikTok session ID |
+| `elevenlabs_voice_name` | string | `"Bella"` | Voice for ElevenLabs |
+| `elevenlabs_api_key` | string | - | ElevenLabs API Key |
+| `aws_polly_voice` | string | `"Matthew"` | Voice for AWS Polly |
+| `streamlabs_polly_voice` | string | `"Matthew"` | Voice for Streamlabs |
+| `openai_api_url` | string | `"https://api.openai.com/v1/"` | OpenAI API endpoint |
+| `openai_api_key` | string | - | OpenAI API Key |
+| `openai_voice_name` | string | `"alloy"` | Voice for OpenAI TTS |
+| `openai_model` | string | `"tts-1"` | OpenAI TTS model |
+| `python_voice` | string | `"1"` | System voice index |
+| `py_voice_num` | string | `"2"` | Number of system voices |
+
+---
+
+## 5. Dependencies (`requirements.txt`)
+
+| Package | Version | Role |
+|---------|---------|---------|
+| `praw` | 7.8.1 | Reddit API wrapper |
+| `playwright` | 1.49.1 | Browser automation (screenshot) |
+| `moviepy` | 2.2.1 | Video/audio clip processing |
+| `ffmpeg-python` | 0.2.0 | FFmpeg pipeline builder |
+| `yt-dlp` | 2025.10.22 | YouTube video/audio downloader |
+| `gTTS` | 2.5.4 | Google Translate TTS |
+| `pyttsx3` | 2.98 | Offline system TTS |
+| `elevenlabs` | 1.57.0 | ElevenLabs TTS SDK |
+| `boto3` / `botocore` | 1.36.8 | AWS Polly TTS |
+| `requests` | 2.32.3 | HTTP requests (TikTok/Streamlabs API) |
+| `rich` | 13.9.4 | Terminal formatting (progress bars, panels) |
+| `toml` / `tomlkit` | 0.10.2 / 0.13.2 | Config file parsing |
+| `translators` | 5.9.9 | Multi-language translation |
+| `Pillow` (PIL) | — | Image processing (thumbnails, storymode) |
+| `clean-text` | 0.6.0 | Text cleaning (emoji removal) |
+| `unidecode` | 1.4.0 | Unicode → ASCII |
+| `spacy` | 3.8.7 | NLP (text processing) |
+| `torch` | 2.7.0 | PyTorch (AI similarity) |
+| `transformers` | 4.52.4 | HuggingFace transformers (sentence-transformers) |
+| `Flask` | 3.1.1 | Web GUI |
+
+---
+
+## 6. Two Operating Modes
+
+### Mode 1: Comment Mode (default)
+- Fetches the **top comments** from the Reddit thread
+- Converts each comment into its own MP3
+- Captures a screenshot of each comment
+- The video shows the comments one after another
+
+### Mode 2: Story Mode (`storymode = true`)
+- Reads only the post's **title + selftext**
+- Two methods:
+  - **Method 0**: Screenshot of the whole post content → one static image
+  - **Method 1**: Parses the text into segments → renders each as its own image with PIL → fancy effect
+
+---
+
+## 7. GUI Web (`GUI.py`)
+
+- Framework: **Flask** (port 4000)
+- Routes:
+  - `/`: List of created videos (from `videos.json`)
+  - `/settings`: Form for editing `config.toml`
+  - `/backgrounds`: Manage background videos
+  - `/background/add`: Add a new background
+  - `/background/delete`: Delete a background
+  - `/results/<path>`: Serve video files
+  - `/voices/<path>`: Serve voice samples
+- Opens the browser automatically on start
+
+---
+
+## 8. Lưu Ý Kỹ Thuật Quan Trọng
+
+### ⚠️ FFmpeg Encoder
+- Code sử dụng **`h264_nvenc`** (NVIDIA GPU encoder) — yêu cầu có GPU NVIDIA
+- Nếu không có GPU, cần sửa thành `libx264`
+
+### ⚠️ Cleanup Bug
+- `utils/cleanup.py` sử dụng path `../assets/temp/{reddit_id}/` (relative path có `..`) — có thể gây lỗi tùy working directory
+
+### ⚠️ Security Concerns
+- `utils/settings.py` calls `eval()` twice (lines 33 and 81); both are marked `fixme` but not yet fixed
+- `utils/console.py` also uses `eval()` (line 105)
+
+### ⚠️ Hardcoded Values
+- `DEFAULT_MAX_LENGTH = 50` (seconds) in `TTS/engine_wrapper.py`
+- The NSFW button selector is hardcoded with a specific post ID (`#t3_12hmbug`) in screenshot_downloader
+- The username position in `title_template.png` is hardcoded at `(205, 825)`
+
+### ⚠️ Video Tracking
+- Created videos are recorded in `video_creation/data/videos.json`
+- Each entry: `{subreddit, id, time, background_credit, reddit_title, filename}`
+- The bot skips posts already in the list (unless forced via the `post_id` config)
+
+### ⚠️ AI Similarity Feature
+- Uses the `sentence-transformers/all-MiniLM-L6-v2` model
+- Downloads the model on first run (~80MB)
+- Computes cosine similarity between thread titles+content and user keywords
+- Enabled with `ai_similarity_enabled = true`
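For readers unfamiliar with the metric, here is a small pure-Python illustration of cosine similarity. It is not the implementation in `utils/ai_methods.py`; the function name and the toy vectors are assumptions for illustration, standing in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for a thread embedding and a keyword embedding.
thread_vec = [1.0, 0.0, 1.0]
keyword_vec = [1.0, 0.0, 0.0]
score = cosine_similarity(thread_vec, keyword_vec)
```

A score near 1 means the two texts point in nearly the same direction in embedding space; a score near 0 means they are unrelated.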
+
+---
+
+## 9. How to Run
+
+```bash
+# 1. Clone & setup
+git clone https://github.com/elebumm/RedditVideoMakerBot.git
+cd RedditVideoMakerBot
+python -m venv ./venv
+source ./venv/bin/activate  # Linux/macOS
+# .\venv\Scripts\activate   # Windows
+
+# 2. Install dependencies
+pip install -r requirements.txt
+python -m playwright install
+python -m playwright install-deps
+
+# 3. Run the bot (CLI)
+python main.py
+
+# 4. Or run the GUI
+python GUI.py
+```
+
+### Docker:
+```bash
+docker build -t reddit-video-bot .
+docker run reddit-video-bot
+```
+
+---
+
+## 10. Module Dependency Diagram
+
+```mermaid
+graph LR
+    main["main.py"] --> reddit["reddit/subreddit.py"]
+    main --> voices["video_creation/voices.py"]
+    main --> screenshots["video_creation/screenshot_downloader.py"]
+    main --> background["video_creation/background.py"]
+    main --> final["video_creation/final_video.py"]
+
+    reddit --> praw["praw"]
+    reddit --> ai["utils/ai_methods.py"]
+    reddit --> sub_utils["utils/subreddit.py"]
+
+    voices --> engine["TTS/engine_wrapper.py"]
+    engine --> tiktok["TTS/TikTok.py"]
+    engine --> gtts["TTS/GTTS.py"]
+    engine --> aws["TTS/aws_polly.py"]
+    engine --> eleven["TTS/elevenlabs.py"]
+    engine --> openai["TTS/openai_tts.py"]
+    engine --> pyttsx["TTS/pyttsx.py"]
+    engine --> streamlabs["TTS/streamlabs_polly.py"]
+
+    screenshots --> playwright["playwright"]
+    screenshots --> imagenarator["utils/imagenarator.py"]
+
+    background --> ytdlp["yt-dlp"]
+    background --> moviepy["moviepy"]
+
+    final --> ffmpeg["ffmpeg-python"]
+    final --> pil["PIL/Pillow"]
+
+    ai --> torch["torch + transformers"]
+
+    subgraph "Shared Utils"
+        settings["utils/settings.py"]
+        console["utils/console.py"]
+        voice_util["utils/voice.py"]
+        video_util["utils/videos.py"]
+        cleanup["utils/cleanup.py"]
+    end
+```
+
+---
+
+> 📝 **Document generated**: 2026-04-20 | Based on an analysis of the project's entire source code.
--- a/TTS/GTTS.py
+++ b/TTS/GTTS.py
@@ -13,7 +13,7 @@ class GTTS:
    def run(self, text, filepath, random_voice: bool = False):
        tts = gTTS(
            text=text,
-            lang=settings.config["reddit"]["thread"]["post_lang"] or "en",
+            lang=settings.config["reddit"]["thread"]["post_lang"] or "vi",
            slow=False,
        )
        tts.save(filepath)
--- a/build.sh
+++ b/build.sh
--- a/main.py
+++ b/main.py
--- a/manual/__init__.py
+++ b/manual/__init__.py
@@ -0,0 +1,15 @@
+"""
+Manual Screenshot → Video Pipeline
+
+This module provides an alternative workflow that creates videos
+from manually captured screenshots and text files, without requiring
+any social media API access.
+
+Supported platforms: Reddit, Threads (Meta), X (Twitter), or any other.
+
+Usage:
+    python manual_main.py init <post_id>        # Create folder structure
+    python manual_main.py render <post_id>      # Render one post
+    python manual_main.py render --all          # Render all unrendered posts
+    python manual_main.py list                  # List all posts with status
+"""
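The usage lines in this docstring imply a three-command CLI. Since `manual_main.py` itself is not shown in this part of the diff, the following is only a guess at its argument handling, written with `argparse`; `build_parser` and the `render_all` attribute name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for: init <post_id> | render <post_id> | render --all | list."""
    parser = argparse.ArgumentParser(prog="manual_main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    init_cmd = sub.add_parser("init", help="Create folder structure")
    init_cmd.add_argument("post_id")

    render_cmd = sub.add_parser("render", help="Render one post or all unrendered posts")
    # post_id is optional so that `render --all` is accepted without it.
    render_cmd.add_argument("post_id", nargs="?")
    render_cmd.add_argument("--all", action="store_true", dest="render_all")

    sub.add_parser("list", help="List all posts with status")
    return parser
```

With this shape, a caller would still need to reject `render` when neither a `post_id` nor `--all` is given.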
--- a/manual/scanner.py
+++ b/manual/scanner.py
@@ -0,0 +1,318 @@
+"""
+Scanner module for the manual pipeline.
+
+Scans manual_posts/ directories for screenshots (.png), audio files (.mp3),
+and optional text files (.txt). Builds a unified post_object for processing.
+
+Folder convention:
+    manual_posts/
+    └── my_post_001/
+        ├── meta.json           (optional - metadata)
+        ├── 0_title.png         (required - screenshot of post title)
+        ├── 0_title.mp3         (preferred - pre-recorded audio)
+        ├── 0_title.txt         (fallback - text for TTS if no .mp3)
+        ├── 1_comment.png       (optional - comment screenshots)
+        ├── 1_comment.mp3       (preferred - pre-recorded audio)
+        ├── 1_comment.txt       (fallback - text for TTS if no .mp3)
+        └── ...
+
+Priority: .mp3 > .txt (if both exist, .mp3 is used and TTS is skipped).
+"""
+
+import json
+import re
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+from utils.console import print_step, print_substep
+
+
+class PostScanner:
+    """Scans manual_posts/ directory, validates structure, builds post_object."""
+
+    # Regex pattern: <number>_<type>.<ext> where ext is png/jpg/jpeg/mp3/txt
+    FILE_PATTERN = re.compile(r"^(\d+)_(title|comment)\.(png|jpg|jpeg|mp3|txt)$", re.IGNORECASE)
+
+    def __init__(self, input_dir: str = "manual_posts"):
+        self.input_dir = Path(input_dir)
+
+    def scan_all(self) -> List[dict]:
+        """Scan all post folders in the input directory.
+
+        Returns:
+            List of post_object dicts, sorted by folder name
+        """
+        if not self.input_dir.exists():
+            print_substep(f"Input directory '{self.input_dir}' does not exist.", style="red")
+            return []
+
+        posts = []
+        for post_dir in sorted(self.input_dir.iterdir()):
+            if post_dir.is_dir() and not post_dir.name.startswith("."):
+                post_obj = self.scan_one(post_dir.name)
+                if post_obj is not None:
+                    posts.append(post_obj)
+
+        return posts
+
+    def scan_one(self, post_id: str) -> Optional[dict]:
+        """Scan a single post folder and build post_object.
+
+        Args:
+            post_id: Name of the folder inside manual_posts/
+
+        Returns:
+            post_object dict or None if invalid
+        """
+        post_dir = self.input_dir / post_id
+
+        if not post_dir.exists():
+            print_substep(f"Post directory '{post_dir}' does not exist.", style="red")
+            return None
+
+        is_valid, errors = self.validate(post_dir)
+        if not is_valid:
+            print_substep(f"Validation failed for '{post_id}':", style="red")
+            for err in errors:
+                print_substep(f"  ✗ {err}", style="red")
+            return None
+
+        return self._build_post_object(post_dir)
+
+    def validate(self, post_dir: Path) -> Tuple[bool, List[str]]:
+        """Validate a post folder structure.
+
+        Checks:
+        - At least 1 image file exists
+        - Title image (0_title.png) exists
+        - Each image has a corresponding .mp3 or .txt file
+        - Files follow naming convention
+
+        Returns:
+            (is_valid, list_of_errors)
+        """
+        errors = []
+
+        # Gather all matching files
+        images, audios, texts = self._categorize_files(post_dir)
+
+        # Check: at least 1 image
+        if not images:
+            errors.append("No image files found. Need at least 0_title.png")
+            return False, errors
+
+        # Check: title image exists (index 0)
+        if 0 not in images:
+            errors.append("Missing title image: 0_title.png (must start with '0_')")
+
+        # Check: each image has a corresponding .mp3 or .txt file
+        for idx in sorted(images.keys()):
+            if idx not in audios and idx not in texts:
+                errors.append(
+                    f"Missing audio/text for image #{idx}: "
+                    f"provide '{images[idx].stem}.mp3' (or .txt as fallback)"
+                )
+
+        # Check: text files (used as TTS fallback) are not empty
+        for idx, txt_path in texts.items():
+            if idx not in audios:  # Only check .txt if no .mp3 exists
+                content = txt_path.read_text(encoding="utf-8").strip()
+                if not content:
+                    errors.append(f"Text file is empty (and no .mp3 provided): {txt_path.name}")
+
+        return len(errors) == 0, errors
+
+    def list_status(self) -> List[dict]:
+        """List all posts with their status.
+
+        Returns:
+            List of dicts with keys: post_id, num_images, num_audios, num_texts, status, errors
+        """
+        if not self.input_dir.exists():
+            return []
+
+        results = []
+        for post_dir in sorted(self.input_dir.iterdir()):
+            if not post_dir.is_dir() or post_dir.name.startswith("."):
+                continue
+
+            images, audios, texts = self._categorize_files(post_dir)
+            is_valid, errors = self.validate(post_dir)
+
+            # Determine status
+            if not images:
+                status = "empty"
+            elif not is_valid:
+                status = "incomplete"
+            else:
+                status = "ready"
+
+            results.append(
+                {
+                    "post_id": post_dir.name,
+                    "num_images": len(images),
+                    "num_audios": len(audios),
+                    "num_texts": len(texts),
+                    "status": status,
+                    "errors": errors,
+                }
+            )
+
+        return results
+
+    def _categorize_files(self, post_dir: Path) -> Tuple[Dict[int, Path], Dict[int, Path], Dict[int, Path]]:
+        """Categorize files in a post directory into images, audios, and texts.
+
+        Returns:
+            (images_dict, audios_dict, texts_dict) where key is the index number
+        """
+        images = {}  # {0: Path("0_title.png"), ...}
+        audios = {}  # {0: Path("0_title.mp3"), ...}
+        texts = {}   # {0: Path("0_title.txt"), ...}
+
+        for f in post_dir.iterdir():
+            match = self.FILE_PATTERN.match(f.name)
+            if match:
+                idx = int(match.group(1))
+                ext = match.group(3).lower()
+                if ext in ("png", "jpg", "jpeg"):
+                    images[idx] = f
+                elif ext == "mp3":
+                    audios[idx] = f
+                elif ext == "txt":
+                    texts[idx] = f
+
+        return images, audios, texts
+
+    def _build_post_object(self, post_dir: Path) -> dict:
+        """Build the unified post_object from a validated post directory.
+
+        Returns:
+            dict with structure:
+            {
+                "post_id": str,
+                "platform": str,
+                "title": str,
+                "author": str,
+                "url": str,
+                "post_dir": str,
+                "screenshots": [
+                    {
+                        "index": int,
+                        "type": "title" | "comment",
+                        "image_path": str,
+                        "text": str,
+                        "audio_path": None,
+                        "audio_duration": None,
+                    },
+                    ...
+                ],
+                "total_duration": 0,
+                "output_path": None,
+            }
+        """
+        post_id = post_dir.name
+
+        # Read optional meta.json
+        meta = self._read_meta(post_dir)
+
+        # Categorize files
+        images, audios, texts = self._categorize_files(post_dir)
+
+        # Build screenshots list (sorted by index)
+        screenshots = []
+        for idx in sorted(images.keys()):
+            img_path = images[idx]
+            # Determine type from filename
+            match = self.FILE_PATTERN.match(img_path.name)
+            entry_type = match.group(2).lower() if match else "comment"
+
+            # Audio: prefer .mp3, fallback to .txt for TTS
+            audio_path = str(audios[idx]) if idx in audios else None
+            text_content = ""
+            if idx in texts:
+                text_content = texts[idx].read_text(encoding="utf-8").strip()
+
+            screenshots.append(
+                {
+                    "index": idx,
+                    "type": entry_type,
+                    "image_path": str(img_path),
+                    "text": text_content,
+                    "audio_path": audio_path,    # Pre-filled if .mp3 exists
+                    "audio_duration": None,
+                }
+            )
+
+        # Use title text, meta title, or folder name
+        title = ""
+        if screenshots and screenshots[0]["text"]:
+            title = screenshots[0]["text"][:100]
+        elif meta.get("title"):
+            title = meta["title"]
+        else:
+            title = post_id
+
+        return {
+            "post_id": post_id,
+            "platform": meta.get("platform", "other"),
+            "title": title,
+            "author": meta.get("author", ""),
+            "url": meta.get("url", ""),
+            "post_dir": str(post_dir),
+            "screenshots": screenshots,
+            "total_duration": 0,
+            "output_path": None,
+        }
+
+    def _read_meta(self, post_dir: Path) -> dict:
+        """Read meta.json if it exists, return empty dict otherwise."""
+        meta_path = post_dir / "meta.json"
+        if meta_path.exists():
+            try:
+                with open(meta_path, "r", encoding="utf-8") as f:
+                    return json.load(f)
+            except (json.JSONDecodeError, IOError) as e:
+                print_substep(f"Warning: Could not read meta.json: {e}", style="yellow")
+        return {}
+
+
+def create_post_folder(input_dir: str, post_id: str, platform: str = "reddit") -> Path:
+    """Create a new post folder with template files.
+
+    Args:
+        input_dir: Base directory for manual posts
+        post_id: Name for the new post folder
+        platform: Source platform (reddit, threads, x, other)
+
+    Returns:
+        Path to the created folder
+    """
+    post_dir = Path(input_dir) / post_id
+    post_dir.mkdir(parents=True, exist_ok=True)
+
+    # Create meta.json template
+    meta = {
+        "platform": platform,
+        "post_id": post_id,
+        "title": "",
+        "author": "",
+        "url": "",
+        "created_at": "",
+        "tags": [],
+        "notes": "",
+    }
+    meta_path = post_dir / "meta.json"
+    if not meta_path.exists():
+        with open(meta_path, "w", encoding="utf-8") as f:
+            json.dump(meta, f, indent=4, ensure_ascii=False)
+
+    print_step(f"Created post folder: {post_dir}")
+    print_substep("Next steps:", style="bold cyan")
+    print_substep("  1. Add screenshots: 0_title.png, 1_comment.png, ...")
+    print_substep("  2. Add audio files: 0_title.mp3, 1_comment.mp3, ...")
+    print_substep("     (Or use .txt files instead — TTS will generate audio)")
+    print_substep("  3. (Optional) Edit meta.json with post details")
+    print_substep(f"  4. Run: python manual_main.py render {post_id}")
+
+    return post_dir
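To make the naming convention and the `.mp3 > .txt` priority concrete, here is a standalone sketch that applies the same regular expression as `PostScanner.FILE_PATTERN` to a plain list of filenames. The function `audio_source_by_index` is hypothetical and works on names only, so it runs without creating any folders.

```python
import re
from typing import Dict, Iterable, Optional

# Same pattern as PostScanner.FILE_PATTERN in manual/scanner.py.
FILE_PATTERN = re.compile(r"^(\d+)_(title|comment)\.(png|jpg|jpeg|mp3|txt)$", re.IGNORECASE)


def audio_source_by_index(filenames: Iterable[str]) -> Dict[int, Optional[str]]:
    """Map each image index to "mp3", "txt", or None (no audio source found)."""
    images, audios, texts = set(), set(), set()
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if not match:
            continue  # meta.json and stray files are ignored
        idx, ext = int(match.group(1)), match.group(3).lower()
        if ext in ("png", "jpg", "jpeg"):
            images.add(idx)
        elif ext == "mp3":
            audios.add(idx)
        else:
            texts.add(idx)

    result: Dict[int, Optional[str]] = {}
    for idx in sorted(images):
        if idx in audios:
            result[idx] = "mp3"  # pre-recorded audio wins, TTS is skipped
        elif idx in texts:
            result[idx] = "txt"  # fall back to TTS from the text file
        else:
            result[idx] = None   # validate() would report this as an error
    return result
```

Because the pattern is case-insensitive, a file such as `1_comment.PNG` is still recognized as an image.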
--- a/manual/tts_processor.py
+++ b/manual/tts_processor.py
@@ -0,0 +1,277 @@
+"""
+TTS Processor for the manual pipeline.
+
+Takes a post_object (built by scanner.py), generates MP3 audio files
+for each screenshot's text using the existing TTS engines, and updates
+the post_object with audio paths and durations.
+
+Reuses TTS engines from TTS/ module — no code duplication.
+"""
+
+import re
+from pathlib import Path
+from typing import Tuple
+
+from moviepy import AudioFileClip
+
+from utils import settings
+from utils.console import print_step, print_substep
+from utils.voice import sanitize_text
+
+
+class ManualTTSProcessor:
+    """Processes text-to-speech for manual pipeline posts."""
+
+    def __init__(self, post_object: dict, max_length: int = 120):
+        """
+        Args:
+            post_object: Post data from scanner.py
+            max_length: Maximum total audio length in seconds (default: 120s = 2 min)
+        """
+        self.post = post_object
+        self.post_id = post_object["post_id"]
+        self.max_length = max_length
+        self.mp3_dir = Path(f"assets/temp/{self.post_id}/mp3")
+        self.tts_module = None
+
+    def process(self) -> dict:
+        """Process audio for all screenshots.
+
+        For each screenshot:
+        - If .mp3 already provided (audio_path set by scanner) → skip TTS, just measure duration
+        - If only .txt provided → run TTS to generate .mp3
+        - If neither → skip
+
+        Returns:
+            Updated post_object with audio_path and audio_duration filled in
+        """
+        self.mp3_dir.mkdir(parents=True, exist_ok=True)
+        print_step("🔊 Processing audio files...")
+
+        total_duration = 0
+        processed_count = 0
+        tts_needed = False
+
+        for screenshot in self.post["screenshots"]:
+            idx = screenshot["index"]
+
+            # Case 1: .mp3 already provided — just measure duration
+            if screenshot.get("audio_path"):
+                try:
+                    clip = AudioFileClip(screenshot["audio_path"])
+                    duration = clip.duration
+                    clip.close()
+                except Exception as e:
+                    print_substep(f"  ✗ Failed to read audio #{idx}: {e}", style="red")
+                    duration = 0
+
+                screenshot["audio_duration"] = duration
+                total_duration += duration
+                processed_count += 1
+                print_substep(
+                    f"  ✓ #{idx} → {duration:.1f}s (pre-recorded .mp3)",
+                    style="green",
+                )
+                continue
+
+            # Case 2: Only .txt provided — need TTS
+            text = screenshot.get("text", "").strip()
+            if not text:
+                print_substep(
+                    f"  ⚠ Screenshot #{idx} has no audio or text, skipping.",
+                    style="yellow",
+                )
+                continue
+
+            # Initialize TTS engine only when needed (lazy)
+            if not tts_needed:
+                print_substep("  📝 Some entries need TTS generation...")
+                self.tts_module = self._get_tts_engine()
+                tts_needed = True
+
+            mp3_path = str(self.mp3_dir / f"{idx}.mp3")
+
+            # Sanitize and process text
+            clean_text = self._process_text(text)
+            if not clean_text or clean_text.isspace():
+                print_substep(
+                    f"  ⚠ Screenshot #{idx} text is empty after sanitization, skipping.",
+                    style="yellow",
+                )
+                continue
+
+            # Handle long text by splitting
+            if len(clean_text) > self.tts_module.max_chars:
+                self._generate_split_audio(clean_text, idx, mp3_path)
+            else:
+                self._generate_audio(clean_text, mp3_path)
+
+            # Measure duration
+            try:
+                clip = AudioFileClip(mp3_path)
+                duration = clip.duration
+                clip.close()
+            except Exception as e:
+                print_substep(f"  ✗ Failed to read audio #{idx}: {e}", style="red")
+                duration = 0
+
+            # Update screenshot entry
+            screenshot["audio_path"] = mp3_path
+            screenshot["audio_duration"] = duration
+            total_duration += duration
+            processed_count += 1
+
+            print_substep(
+                f"  ✓ #{idx} → {duration:.1f}s (TTS generated, {len(clean_text)} chars)",
+                style="green",
+            )
+
+            # Check max length
+            if total_duration > self.max_length and processed_count > 1:
+                print_substep(
+                    f"  ⚠ Total duration ({total_duration:.1f}s) exceeds max ({self.max_length}s). "
+                    f"Stopping at {processed_count} clips.",
+                    style="yellow",
+                )
+                break
+
+        self.post["total_duration"] = total_duration
+        print_substep(
+            f"✅ {processed_count} audio clips ready, total: {total_duration:.1f}s",
+            style="bold green",
+        )
+
+        return self.post
+
+    def _get_tts_engine(self):
+        """Initialize the TTS engine based on config.
+
+        Reuses the TTS engines from video_creation/voices.py
+        """
+        from TTS.GTTS import GTTS
+        from TTS.TikTok import TikTok
+        from TTS.aws_polly import AWSPolly
+        from TTS.elevenlabs import elevenlabs
+        from TTS.openai_tts import OpenAITTS
+        from TTS.pyttsx import pyttsx
+        from TTS.streamlabs_polly import StreamlabsPolly
+
+        providers = {
+            "googletranslate": GTTS,
+            "awspolly": AWSPolly,
+            "streamlabspolly": StreamlabsPolly,
+            "tiktok": TikTok,
+            "pyttsx": pyttsx,
+            "elevenlabs": elevenlabs,
+            "openai": OpenAITTS,
+        }
+
+        voice_choice = settings.config["settings"]["tts"]["voice_choice"]
+        engine_class = providers.get(str(voice_choice).lower())
+
+        if engine_class is None:
+            print_substep(
+                f"Unknown TTS provider: {voice_choice}. Falling back to GoogleTranslate.",
+                style="yellow",
+            )
+            engine_class = GTTS
+
+        print_substep(f"Using TTS engine: {engine_class.__name__}")
+        return engine_class()
+
+    def _generate_audio(self, text: str, filepath: str):
+        """Generate a single audio file from text."""
+        try:
+            random_voice = settings.config["settings"]["tts"].get("random_voice", False)
+
+            if str(settings.config["settings"]["tts"]["voice_choice"]).lower() == "googletranslate":
+                # GTTS.run() accepts random_voice but ignores it, so it is omitted here
+                self.tts_module.run(text, filepath=filepath)
+            else:
+                self.tts_module.run(text, filepath=filepath, random_voice=random_voice)
+        except Exception as e:
+            print_substep(f"  ✗ TTS generation failed: {e}", style="red")
+            raise
+
+    def _generate_split_audio(self, text: str, idx: int, final_path: str):
+        """Split long text and concat into one audio file.
+
+        For texts longer than the TTS engine's max_chars limit.
+        """
+        import os
+
+        # Split text into chunks at sentence boundaries
+        max_chars = self.tts_module.max_chars
+        chunks = [
+            x.group().strip()
+            for x in re.finditer(
+                r" *(((.|\n){0," + str(max_chars) + r"})(\.|.$))", text
+            )
+        ]
+
+        if not chunks:
+            chunks = [text[:max_chars]]
+
+        part_files = []
+        for part_idx, chunk in enumerate(chunks):
+            if not chunk or chunk.isspace():
+                continue
+            part_path = str(self.mp3_dir / f"{idx}-{part_idx}.part.mp3")
+            self._generate_audio(chunk, part_path)
+            part_files.append(part_path)
+
+        if not part_files:
+            return
+
+        # Concat using ffmpeg
+        list_path = str(self.mp3_dir / f"{idx}_list.txt")
+        with open(list_path, "w") as f:
+            for part in part_files:
+                f.write(f"file '{Path(part).name}'\n")
+
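+        import subprocess  # list-form argv avoids shell quoting problems with paths
+        subprocess.run(
+            ["ffmpeg", "-f", "concat", "-y", "-hide_banner", "-loglevel", "panic", "-safe", "0", "-i", list_path, "-c", "copy", final_path]
+        )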
+
+        # Cleanup part files
+        for part in part_files:
+            try:
+                os.unlink(part)
+            except OSError:
+                pass
+        try:
+            os.unlink(list_path)
+        except OSError:
+            pass
+
+    def _process_text(self, text: str) -> str:
+        """Clean and sanitize text for TTS.
+
+        - Removes lines starting with # (comments in txt files)
+        - Sanitizes using existing sanitize_text()
+        """
+        # Remove comment lines (lines starting with #)
+        lines = text.split("\n")
+        lines = [line for line in lines if not line.strip().startswith("#")]
+        text = "\n".join(lines).strip()  # keep line breaks; they become ". " below
+
+        # Remove URLs
+        regex_urls = r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
+        text = re.sub(regex_urls, " ", text)
+
+        # Replace newlines with periods for natural speech
+        text = text.replace("\n", ". ")
+
+        # Add period at end if missing
+        if text and text[-1] not in ".!?":
+            text += "."
+
+        # Clean repeated dots
+        text = re.sub(r"\.{2,}", ".", text)
+        text = re.sub(r"\.\s*\.", ".", text)
+
+        # Use existing sanitize_text for final cleanup
+        text = sanitize_text(text)
+
+        return text
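The splitting step in `_generate_split_audio` relies on a single regular expression. As an alternative way to see the idea, the sketch below packs whole sentences greedily into chunks no longer than `max_chars`. It is a different technique from the regex in the diff, and `split_for_tts` is a hypothetical name, not a function in this repository.

```python
import re
from typing import List


def split_for_tts(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # A sentence is a run of non-terminators followed by optional . ! ? marks.
    sentences = re.findall(r"[^.!?]+[.!?]*", text)
    chunks: List[str] = []
    current = ""
    for sentence in (s.strip() for s in sentences):
        if not sentence:
            continue
        # Hard-split a single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:].strip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the engine's `run()` and the parts concatenated, as the method above does with ffmpeg.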
--- a/manual/video_builder.py
+++ b/manual/video_builder.py
@@ -0,0 +1,479 @@
+"""
+Video Builder for the manual pipeline.
+
+Takes a post_object (with TTS audio already generated), downloads/chops
+background video and audio, overlays screenshots onto the background
+with correct timing, and renders the final video.
+
+Reuses background download functions from video_creation/background.py.
+Uses libx264 encoder (CPU-based) by default.
+"""
+
+import math
+import multiprocessing
+import os
+import re
+import tempfile
+import threading
+import time
+from pathlib import Path
+from typing import Dict, Tuple
+
+import ffmpeg
+from moviepy import AudioFileClip, VideoFileClip
+from rich.console import Console
+
+from utils import settings
+from utils.console import print_step, print_substep
+
+console = Console()
+
+
+class ProgressFfmpeg(threading.Thread):
+    """Thread to monitor FFmpeg progress during rendering."""
+
+    def __init__(self, vid_duration_seconds, progress_update_callback):
+        threading.Thread.__init__(self, name="ProgressFfmpeg")
+        self.stop_event = threading.Event()
+        self.output_file = tempfile.NamedTemporaryFile(mode="w+", delete=False)
+        self.vid_duration_seconds = vid_duration_seconds
+        self.progress_update_callback = progress_update_callback
+
+    def run(self):
+        while not self.stop_event.is_set():
+            latest_progress = self.get_latest_ms_progress()
+            if latest_progress is not None:
+                completed_percent = latest_progress / self.vid_duration_seconds
+                self.progress_update_callback(completed_percent)
+            time.sleep(1)
+
+    def get_latest_ms_progress(self):
+        lines = self.output_file.readlines()
+        # Newest first; ffmpeg's out_time_ms is actually in microseconds.
+        for line in reversed(lines):
+            if "out_time_ms" in line:
+                out_time_ms_str = line.split("=")[1].strip()
+                if out_time_ms_str.isnumeric():
+                    return float(out_time_ms_str) / 1000000.0
+        return None
+
+    def stop(self):
+        self.stop_event.set()
+
+    def __enter__(self):
+        self.start()
+        return self
+
+    def __exit__(self, *args, **kwargs):
+        self.stop()
+
+
+class ManualVideoBuilder:
+    """Builds the final video from screenshots + TTS audio + background."""
+
+    def __init__(self, post_object: dict, manual_config: dict):
+        """
+        Args:
+            post_object: Post data with audio already generated (from tts_processor)
+            manual_config: Manual-specific config dict
+        """
+        self.post = post_object
+        self.post_id = post_object["post_id"]
+        self.config = manual_config
+        self.temp_dir = Path(f"assets/temp/{self.post_id}")
+
+        # Video settings
+        self.W = int(self.config.get("resolution_w", settings.config["settings"].get("resolution_w", 1080)))
+        self.H = int(self.config.get("resolution_h", settings.config["settings"].get("resolution_h", 1920)))
+        self.opacity = float(self.config.get("opacity", settings.config["settings"].get("opacity", 0.9)))
+        self.encoder = self.config.get("encoder", "libx264")
+
+        # Background settings
+        self.bg_video_name = self.config.get(
+            "background_video",
+            settings.config["settings"]["background"].get("background_video", "random"),
+        )
+        self.bg_audio_name = self.config.get(
+            "background_audio",
+            settings.config["settings"]["background"].get("background_audio", "random"),
+        )
+        self.bg_audio_volume = float(
+            self.config.get(
+                "background_audio_volume",
+                settings.config["settings"]["background"].get("background_audio_volume", 0.15),
+            )
+        )
+
+        # Local background directories (user drops files here)
+        self.bg_video_dir = Path(self.config.get("background_video_dir", "assets/backgrounds/video"))
+        self.bg_audio_dir = Path(self.config.get("background_audio_dir", "assets/backgrounds/audio"))
+
+        # Output settings
+        self.output_dir = Path(self.config.get("output_dir", "manual_results"))
+
+    def build(self) -> str:
+        """Build the final video.
+
+        Pipeline:
+        1. Filter screenshots that have audio
+        2. Download background video & audio (cached)
+        3. Chop background to match video length
+        4. Prepare background (crop to aspect ratio)
+        5. Concat all audio clips → final audio track
+        6. Mix with background audio
+        7. Overlay screenshots onto background with timing
+        8. Render final video
+
+        Returns:
+            Path to the output video file
+        """
+        # Filter screenshots with audio
+        clips = [s for s in self.post["screenshots"] if s.get("audio_path") and s.get("audio_duration")]
+        if not clips:
+            print_substep("No audio clips found. Cannot create video.", style="red")
+            return ""
+
+        total_duration = sum(s["audio_duration"] for s in clips)
+        video_length = math.ceil(total_duration)
+
+        console.log(f"[bold green] Video will be: {video_length} seconds long ({len(clips)} clips)")
+
+        # Ensure temp directory exists
+        self.temp_dir.mkdir(parents=True, exist_ok=True)
+
+        # Step 1: Download backgrounds
+        print_step("📥 Downloading backgrounds (if needed)...")
+        bg_config = self._get_background_config()
+        self._download_backgrounds(bg_config)
+
+        # Step 2: Chop backgrounds to video length
+        print_step("✂️ Chopping backgrounds to video length...")
+        self._chop_backgrounds(bg_config, video_length)
+
+        # Step 3: Prepare background (crop to aspect ratio)
+        print_step("🎬 Preparing background...")
+        bg_path = self._prepare_background()
+        background_clip = ffmpeg.input(bg_path)
+
+        # Step 4: Concat audio clips
+        print_step("🔊 Building audio track...")
+        audio_inputs = [ffmpeg.input(s["audio_path"]) for s in clips]
+        audio_concat = ffmpeg.concat(*audio_inputs, a=1, v=0)
+        audio_path = str(self.temp_dir / "audio.mp3")
+        ffmpeg.output(
+            audio_concat, audio_path, **{"b:a": "192k"}
+        ).overwrite_output().run(quiet=True)
+
+        # Step 5: Merge with background audio
+        audio = ffmpeg.input(audio_path)
+        final_audio = self._merge_background_audio(audio)
+
+        # Step 6: Overlay screenshots
+        print_step("🖼️ Overlaying screenshots...")
+        screenshot_width = int((self.W * 45) // 100)
+        current_time = 0
+
+        for s in clips:
+            img_input = ffmpeg.input(s["image_path"])["v"].filter("scale", screenshot_width, -1)
+            img_overlay = img_input.filter("colorchannelmixer", aa=self.opacity)
+
+            background_clip = background_clip.overlay(
+                img_overlay,
+                enable=f"between(t,{current_time},{current_time + s['audio_duration']})",
+                x="(main_w-overlay_w)/2",
+                y="(main_h-overlay_h)/2",
+            )
+            current_time += s["audio_duration"]
+
+        # Scale to final resolution
+        background_clip = background_clip.filter("scale", self.W, self.H)
+
+        # Step 7: Render
+        print_step("🎥 Rendering the video...")
+        self.output_dir.mkdir(parents=True, exist_ok=True)
+
+        # Normalize filename
+        filename = self._normalize_filename(self.post.get("title", self.post_id))
+        output_path = str(self.output_dir / f"{filename}.mp4")
+        # Prevent path too long
+        if len(output_path) > 251:
+            output_path = output_path[:247] + ".mp4"
+
+        from tqdm import tqdm
+
+        pbar = tqdm(total=100, desc="Progress: ", bar_format="{l_bar}{bar}", unit=" %")
+
+        def on_update(progress):
+            status = round(progress * 100, 2)
+            old_percentage = pbar.n
+            pbar.update(status - old_percentage)
+
+        with ProgressFfmpeg(video_length, on_update) as progress:
+            try:
+                ffmpeg.output(
+                    background_clip,
+                    final_audio,
+                    output_path,
+                    f="mp4",
+                    **{
+                        "c:v": self.encoder,
+                        "b:v": "20M",
+                        "b:a": "192k",
+                        "threads": multiprocessing.cpu_count(),
+                    },
+                ).overwrite_output().global_args(
+                    "-progress", progress.output_file.name
+                ).run(
+                    quiet=True,
+                    overwrite_output=True,
+                    capture_stdout=False,
+                    capture_stderr=False,
+                )
+            except ffmpeg.Error as e:
+                print_substep(f"FFmpeg error: {e.stderr.decode('utf8') if e.stderr else str(e)}", style="red")
+                pbar.close()
+                return ""
+
+        old_percentage = pbar.n
+        pbar.update(100 - old_percentage)
+        pbar.close()
+
+        # Save to tracking (shared videos.json)
+        self._save_tracking(bg_config, output_path)
+
+        # Cleanup temp files
+        print_step("🗑️ Removing temporary files...")
+        self._cleanup()
+
+        self.post["output_path"] = output_path
+        print_step(f"✅ Done! Video saved to: {output_path}")
+
+        return output_path
+
+    def _scan_local_files(self, directory: Path, extensions: tuple) -> list:
+        """Scan a directory for files matching given extensions.
+
+        Returns:
+            List of Path objects, sorted by name
+        """
+        if not directory.exists():
+            return []
+        files = []
+        for f in directory.iterdir():
+            if f.is_file() and f.suffix.lower() in extensions:
+                files.append(f)
+        return sorted(files)
+
+    def _get_background_config(self) -> dict:
+        """Get background video & audio — local random or YouTube fallback.
+
+        Priority:
+        1. Scan local directories for video/audio files
+        2. If local files exist → pick one at random (the configured name is ignored)
+        3. If no local files → YouTube download; 'random' or an unknown name picks a random entry
+
+        Returns:
+            dict with 'video_path', 'audio_path', 'video_credit', 'audio_credit'
+        """
+        import random
+
+        result = {
+            "video_path": None,
+            "audio_path": None,
+            "video_credit": "unknown",
+            "audio_credit": "unknown",
+            "_youtube_video": None,  # YouTube config tuple (for download if needed)
+            "_youtube_audio": None,
+        }
+
+        # --- Video background ---
+        video_exts = (".mp4", ".mkv", ".webm", ".avi", ".mov")
+        local_videos = self._scan_local_files(self.bg_video_dir, video_exts)
+
+        if local_videos:
+            # Pick random from local files
+            chosen = random.choice(local_videos)
+            result["video_path"] = str(chosen)
+            result["video_credit"] = chosen.stem
+            print_substep(f"🎬 Background video: {chosen.name} (random from {len(local_videos)} files)")
+        else:
+            # Fallback: YouTube download via background_options
+            try:
+                from video_creation.background import background_options
+                video_name = self.bg_video_name
+                if video_name == "random" or video_name not in background_options["video"]:
+                    video_name = random.choice(list(background_options["video"].keys()))
+                result["_youtube_video"] = background_options["video"][video_name]
+                print_substep(f"🎬 Background video: {video_name} (YouTube)")
+            except Exception as e:
+                print_substep(f"⚠ Could not load YouTube backgrounds: {e}", style="yellow")
+
+        # --- Audio background ---
+        if self.bg_audio_volume > 0:
+            audio_exts = (".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac")
+            local_audios = self._scan_local_files(self.bg_audio_dir, audio_exts)
+
+            if local_audios:
+                chosen = random.choice(local_audios)
+                result["audio_path"] = str(chosen)
+                result["audio_credit"] = chosen.stem
+                print_substep(f"🎵 Background audio: {chosen.name} (random from {len(local_audios)} files)")
+            else:
+                try:
+                    from video_creation.background import background_options
+                    audio_name = self.bg_audio_name
+                    if audio_name == "random" or audio_name not in background_options["audio"]:
+                        audio_name = random.choice(list(background_options["audio"].keys()))
+                    result["_youtube_audio"] = background_options["audio"][audio_name]
+                    print_substep(f"🎵 Background audio: {audio_name} (YouTube)")
+                except Exception as e:
+                    print_substep(f"⚠ Could not load YouTube audio backgrounds: {e}", style="yellow")
+
+        return result
+
+    def _download_backgrounds(self, bg_config: dict):
+        """Download YouTube backgrounds only if no local files were found."""
+        if bg_config.get("_youtube_video"):
+            from video_creation.background import download_background_video
+            download_background_video(bg_config["_youtube_video"])
+            # Set video_path to the downloaded file
+            yt_cfg = bg_config["_youtube_video"]
+            bg_config["video_path"] = f"assets/backgrounds/video/{yt_cfg[2]}-{yt_cfg[1]}"
+            bg_config["video_credit"] = yt_cfg[2]
+
+        if bg_config.get("_youtube_audio"):
+            from video_creation.background import download_background_audio
+            download_background_audio(bg_config["_youtube_audio"])
+            yt_cfg = bg_config["_youtube_audio"]
+            bg_config["audio_path"] = f"assets/backgrounds/audio/{yt_cfg[2]}-{yt_cfg[1]}"
+            bg_config["audio_credit"] = yt_cfg[2]
+
+    def _chop_backgrounds(self, bg_config: dict, video_length: int):
+        """Chop background video and audio to match the video length."""
+        from video_creation.background import get_start_and_end_times
+
+        # Chop background audio
+        if self.bg_audio_volume > 0 and bg_config.get("audio_path"):
+            audio_file = bg_config["audio_path"]
+            if Path(audio_file).exists():
+                # Context manager releases the reader even if chopping or writing fails
+                with AudioFileClip(audio_file) as background_audio:
+                    start_a, end_a = get_start_and_end_times(video_length, background_audio.duration)
+                    chopped = background_audio.subclipped(start_a, end_a)
+                    chopped.write_audiofile(str(self.temp_dir / "background.mp3"))
+                    chopped.close()
+
+        # Chop background video
+        video_file = bg_config.get("video_path")
+        if video_file and Path(video_file).exists():
+            with VideoFileClip(video_file) as video:
+                start_v, end_v = get_start_and_end_times(video_length, video.duration)
+                chopped = video.subclipped(start_v, end_v)
+                chopped.write_videofile(str(self.temp_dir / "background.mp4"))
+        else:
+            print_substep("⚠ No background video file found!", style="red")
+            raise FileNotFoundError(f"Background video not found: {video_file}")
+
+    def _prepare_background(self) -> str:
+        """Crop background video to correct aspect ratio (W:H).
+
+        Returns:
+            Path to the cropped background video
+        """
+        output_path = str(self.temp_dir / "background_noaudio.mp4")
+        try:
+            (
+                ffmpeg.input(str(self.temp_dir / "background.mp4"))
+                .filter("crop", f"ih*({self.W}/{self.H})", "ih")
+                .output(
+                    output_path,
+                    an=None,
+                    **{
+                        "c:v": self.encoder,
+                        "b:v": "20M",
+                        "threads": multiprocessing.cpu_count(),
+                    },
+                )
+                .overwrite_output()
+                .run(quiet=True)
+            )
+        except ffmpeg.Error as e:
+            print_substep(f"Background prepare error: {e}", style="red")
+            raise
+        return output_path
+
+    def _merge_background_audio(self, tts_audio):
+        """Merge TTS audio with background audio.
+
+        Args:
+            tts_audio: FFmpeg audio input of the TTS track
+
+        Returns:
+            Merged audio stream or original if background audio disabled
+        """
+        if self.bg_audio_volume <= 0:  # same "disabled" test as the `> 0` checks elsewhere
+            return tts_audio
+
+        bg_audio_path = self.temp_dir / "background.mp3"
+        if not bg_audio_path.exists():
+            return tts_audio
+
+        bg_audio = ffmpeg.input(str(bg_audio_path)).filter("volume", self.bg_audio_volume)
+        merged = ffmpeg.filter([tts_audio, bg_audio], "amix", duration="longest")
+        return merged
+
+    def _normalize_filename(self, name: str) -> str:
+        """Normalize a string to be safe for filenames."""
+        # Remove problematic characters
+        name = re.sub(r'[?\\"%*:|<>]', "", name)
+        name = re.sub(r"[/]", " ", name)
+        name = name.strip()
+        if not name:
+            name = self.post_id
+        # Limit length; strip again so truncation cannot leave a trailing space
+        return name[:100].strip()
+
+    def _save_tracking(self, bg_config: dict, output_path: str):
+        """Save rendered video info to shared videos.json.
+
+        Handles missing file gracefully (creates it if needed).
+        Does NOT import from utils.videos to avoid praw dependency.
+        """
+        import json
+        import time as t
+
+        videos_path = Path("./video_creation/data/videos.json")
+        videos_path.parent.mkdir(parents=True, exist_ok=True)
+
+        # Load existing data or start fresh
+        done_vids = []
+        if videos_path.exists():
+            try:
+                with open(videos_path, "r", encoding="utf-8") as f:
+                    done_vids = json.load(f)
+            except (json.JSONDecodeError, IOError):
+                done_vids = []
+
+        # Skip if already recorded
+        if self.post_id in [v.get("id") for v in done_vids]:
+            return
+
+        payload = {
+            "subreddit": self.post.get("platform", "manual"),
+            "id": self.post_id,
+            "time": str(int(t.time())),
+            "background_credit": bg_config.get("video_credit", "unknown"),
+            "reddit_title": self.post.get("title", ""),
+            "filename": Path(output_path).name,
+        }
+        done_vids.append(payload)
+
+        with open(videos_path, "w", encoding="utf-8") as f:
+            json.dump(done_vids, f, ensure_ascii=False, indent=4)
+
+    def _cleanup(self):
+        """Remove temporary files for this post."""
+        # Remove the directory build() actually wrote to, not a re-derived path
+        if self.temp_dir.exists():
+            import shutil
+            shutil.rmtree(self.temp_dir, ignore_errors=True)
--- a/manual_main.py
+++ b/manual_main.py
@@ -0,0 +1,438 @@
+#!/usr/bin/env python
+"""
+Manual Screenshot → Video Pipeline — Entry Point
+
+Create videos from manually captured screenshots and text files,
+without requiring any social media API access.
+
+Supports screenshots from: Reddit, Threads (Meta), X (Twitter), or any platform.
+
+Usage:
+    python manual_main.py init <post_id> [--platform reddit|threads|x|other]
+    python manual_main.py render <post_id>
+    python manual_main.py render --all
+    python manual_main.py list
+"""
+
+import argparse
+import json
+import sys
+from os.path import exists
+from pathlib import Path
+
+import toml
+
+from utils import settings
+from utils.console import print_markdown, print_step, print_substep
+from utils.ffmpeg_install import ffmpeg_install
+from manual.scanner import PostScanner
+from manual.tts_processor import ManualTTSProcessor
+from manual.video_builder import ManualVideoBuilder
+
+__VERSION__ = "1.0.0"
+
+
+# ────────────────────────────────────────────────────────────────
+# Configuration
+# ────────────────────────────────────────────────────────────────
+
+# Default config for manual pipeline (used when [manual] section not in config.toml)
+MANUAL_DEFAULTS = {
+    "input_dir": "manual_posts",
+    "output_dir": "manual_results",
+    "encoder": "libx264",
+    "resolution_w": 1080,
+    "resolution_h": 1920,
+    "opacity": 0.9,
+    "background_video": "random",
+    "background_audio": "random",
+    "background_video_dir": "assets/backgrounds/video",
+    "background_audio_dir": "assets/backgrounds/audio",
+    "background_audio_volume": 0.1,
+    "max_video_length": 120,
+}
+
+# Full default settings.config that TTS engines and shared modules expect.
+# This ensures the manual flow works even if config.toml is empty or missing sections.
+_BASE_SETTINGS_DEFAULTS = {
+    "reddit": {
+        "creds": {
+            "client_id": "",
+            "client_secret": "",
+            "username": "",
+            "password": "",
+            "2fa": False,
+        },
+        "thread": {
+            "subreddit": "",
+            "post_id": "",
+            "max_comment_length": 500,
+            "min_comment_length": 1,
+            "post_lang": "vi",
+            "min_comments": 20,
+            "blocked_words": "",
+        },
+    },
+    "ai": {
+        "ai_similarity_enabled": False,
+        "ai_similarity_keywords": "",
+    },
+    "settings": {
+        "allow_nsfw": False,
+        "theme": "dark",
+        "times_to_run": 1,
+        "opacity": 0.9,
+        "storymode": False,
+        "storymodemethod": 1,
+        "storymode_max_length": 1000,
+        "resolution_w": 1080,
+        "resolution_h": 1920,
+        "zoom": 1,
+        "channel_name": "Reddit Tales",
+        "background": {
+            "background_video": "minecraft",
+            "background_audio": "lofi",
+            "background_audio_volume": 0.1,
+            "enable_extra_audio": False,
+            "background_thumbnail": False,
+            "background_thumbnail_font_family": "arial",
+            "background_thumbnail_font_size": 96,
+            "background_thumbnail_font_color": "255,255,255",
+        },
+        "tts": {
+            "voice_choice": "googletranslate",
+            "random_voice": False,
+            "elevenlabs_voice_name": "Bella",
+            "elevenlabs_api_key": "",
+            "aws_polly_voice": "Matthew",
+            "streamlabs_polly_voice": "Matthew",
+            "tiktok_voice": "en_us_001",
+            "tiktok_sessionid": "",
+            "python_voice": "1",
+            "py_voice_num": "2",
+            "silence_duration": 0.3,
+            "no_emojis": False,
+            "openai_api_url": "https://api.openai.com/v1/",
+            "openai_api_key": "",
+            "openai_voice_name": "alloy",
+            "openai_model": "tts-1",
+        },
+    },
+}
+
+
+def _deep_merge(base: dict, override: dict) -> dict:
+    """Deep merge two dicts into a new dict. Values in 'override' take priority."""
+    from copy import deepcopy  # copy nested values so the result never aliases the inputs
+
+    result = deepcopy(base)
+    for key, value in override.items():
+        both_dicts = isinstance(result.get(key), dict) and isinstance(value, dict)
+        result[key] = _deep_merge(result[key], value) if both_dicts else deepcopy(value)
+    return result
+
+
+def load_config() -> dict:
+    """Load config and set up settings.config for TTS engines and backgrounds.
+
+    Strategy:
+    1. Start with full default config (so TTS engines always have what they need)
+    2. If config.toml exists and has content, deep-merge on top of defaults
+    3. Extract [manual] section for manual-specific settings
+    4. Set settings.config globally so shared modules (TTS, background, etc.) work
+
+    Returns:
+        dict: Manual-specific config merged with defaults
+    """
+    # Start with complete defaults
+    config = _deep_merge({}, _BASE_SETTINGS_DEFAULTS)
+
+    # Try to load config.toml and merge on top
+    config_path = Path("config.toml")
+    if config_path.exists():
+        try:
+            file_config = toml.load(str(config_path))
+            if file_config:  # Not empty
+                config = _deep_merge(config, file_config)
+                print_substep("Loaded config from config.toml", style="dim")
+        except Exception as e:
+            print_substep(f"Warning: Could not parse config.toml: {e}", style="yellow")
+    else:
+        print_substep(
+            "config.toml not found — using built-in defaults. "
+            "TTS will use GoogleTranslate (no API key needed).",
+            style="yellow",
+        )
+
+    # Set global settings.config so TTS engines and shared modules work
+    settings.config = config
+
+    # Build manual-specific config: defaults + [manual] section from config.toml
+    manual_config = {**MANUAL_DEFAULTS}
+    if "manual" in config:
+        manual_config.update(config["manual"])
+
+    return manual_config
+
+
+# ────────────────────────────────────────────────────────────────
+# Commands
+# ────────────────────────────────────────────────────────────────
+
+
+def cmd_init(args, manual_config):
+    """Create a new post folder with template files."""
+    from manual.scanner import create_post_folder
+
+    post_id = args.post_id
+    platform = getattr(args, "platform", "reddit")
+
+    input_dir = manual_config["input_dir"]
+    post_dir = create_post_folder(input_dir, post_id, platform)
+
+    print_markdown(f"### Post folder created: `{post_dir}`")
+
+
+def cmd_render(args, manual_config):
+    """Render one or all posts into videos."""
+
+    scanner = PostScanner(input_dir=manual_config["input_dir"])
+
+    if args.all:
+        # Render all ready posts
+        posts = scanner.scan_all()
+        if not posts:
+            print_substep("No valid posts found in the input directory.", style="red")
+            return
+
+        # Filter out already rendered posts (unless --force is given)
+        posts_to_render = []
+        for post in posts:
+            if _is_already_done(post["post_id"]) and not args.force:
+                print_substep(f"  ⏭ {post['post_id']} — already rendered, skipping", style="blue")
+            else:
+                posts_to_render.append(post)
+
+        if not posts_to_render:
+            print_substep("All posts have already been rendered!", style="green")
+            return
+
+        print_step(f"📋 Rendering {len(posts_to_render)} posts...")
+        for i, post in enumerate(posts_to_render):
+            print_markdown(
+                f"### [{i+1}/{len(posts_to_render)}] Rendering: {post['post_id']}"
+            )
+            _render_single(post, manual_config)
+    else:
+        # Render single post
+        if not args.post_id:
+            print_substep("Please specify a post_id or use --all", style="red")
+            return
+
+        post = scanner.scan_one(args.post_id)
+        if post is None:
+            return  # Error already printed by scanner
+
+        if _is_already_done(post["post_id"]) and not args.force:
+            print_substep(
+                f"Post '{post['post_id']}' already rendered. Use --force to re-render.",
+                style="yellow",
+            )
+            return
+
+        _render_single(post, manual_config)
+
+
+def _render_single(post_object: dict, manual_config: dict):
+    """Render a single post into a video.
+
+    Pipeline:
+    1. TTS: Convert text → MP3 audio files
+    2. Video: Assemble screenshots + audio + background → MP4
+    """
+    post_id = post_object["post_id"]
+    print_step(f"🚀 Starting render for: {post_id}")
+
+    # Step 1: TTS
+    max_length = manual_config.get("max_video_length", 120)
+    tts = ManualTTSProcessor(post_object, max_length=max_length)
+    post_object = tts.process()
+
+    # Check if we have audio
+    clips_with_audio = [s for s in post_object["screenshots"] if s.get("audio_path")]
+    if not clips_with_audio:
+        print_substep("No audio generated. Check text files.", style="red")
+        return
+
+    # Step 2: Video build
+    builder = ManualVideoBuilder(post_object, manual_config)
+    output_path = builder.build()
+
+    if output_path:
+        print_markdown(f"### ✅ Video saved: `{output_path}`")
+    else:
+        print_substep("Video rendering failed.", style="red")
+
+
+def cmd_list(args, manual_config):
+    """List all posts and their status."""
+    from manual.scanner import PostScanner
+
+    scanner = PostScanner(input_dir=manual_config["input_dir"])
+    statuses = scanner.list_status()
+
+    if not statuses:
+        print_substep(
+            f"No posts found in '{manual_config['input_dir']}/'. "
+            f"Run 'python manual_main.py init <post_id>' to create one.",
+            style="yellow",
+        )
+        return
+
+    # Status emoji map
+    status_icons = {
+        "ready": "✅",
+        "incomplete": "⚠️",
+        "empty": "❌",
+    }
+
+    print_step("📋 Manual Posts Status")
+    print()
+
+    for s in statuses:
+        icon = status_icons.get(s["status"], "❓")
+        rendered = "🎬" if _is_already_done(s["post_id"]) else "  "
+        print_substep(
+            f"  {icon} {rendered} {s['post_id']:30s} "
+            f"| {s['num_images']} 🖼️  {s.get('num_audios', 0)} 🎵  {s['num_texts']} 📝 "
+            f"| {s['status']}",
+            style="bold" if s["status"] == "ready" else "",
+        )
+        if s["errors"]:
+            for err in s["errors"]:
+                print_substep(f"      ↳ {err}", style="red")
+
+    print()
+    ready_count = sum(1 for s in statuses if s["status"] == "ready")
+    rendered_count = sum(1 for s in statuses if _is_already_done(s["post_id"]))
+    print_substep(
+        f"  Total: {len(statuses)} posts | "
+        f"{ready_count} ready | "
+        f"{rendered_count} rendered",
+        style="bold cyan",
+    )
+
+
+def _is_already_done(post_id: str) -> bool:
+    """Check if a post has already been rendered (shared videos.json)."""
+    videos_path = "./video_creation/data/videos.json"
+    if not exists(videos_path):
+        return False
+    try:
+        with open(videos_path, "r", encoding="utf-8") as f:
+            done_videos = json.load(f)
+        return any(v.get("id") == post_id for v in done_videos)
+    except (json.JSONDecodeError, IOError):
+        return False
+
+
+# ────────────────────────────────────────────────────────────────
+# CLI
+# ────────────────────────────────────────────────────────────────
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="manual_main.py",
+        description="Manual Screenshot → Video Pipeline. "
+        "Create videos from screenshots captured from Reddit, Threads, X, or any platform.",
+    )
+    parser.add_argument(
+        "--version", action="version", version=f"%(prog)s {__VERSION__}"
+    )
+    subparsers = parser.add_subparsers(dest="command", help="Available commands")
+
+    # init command
+    init_parser = subparsers.add_parser("init", help="Create a new post folder with template files")
+    init_parser.add_argument("post_id", type=str, help="Name/ID for the post folder")
+    init_parser.add_argument(
+        "--platform",
+        type=str,
+        default="reddit",
+        choices=["reddit", "threads", "x", "other"],
+        help="Source platform (default: reddit)",
+    )
+
+    # render command
+    render_parser = subparsers.add_parser("render", help="Render post(s) into video(s)")
+    render_parser.add_argument(
+        "post_id", type=str, nargs="?", default=None, help="Post ID to render"
+    )
+    render_parser.add_argument(
+        "--all", action="store_true", help="Render all unrendered posts"
+    )
+    render_parser.add_argument(
+        "--force", action="store_true", help="Re-render even if already done"
+    )
+
+    # list command
+    subparsers.add_parser("list", help="List all posts and their status")
+
+    return parser
+
+
+def main():
+    print(
+        """
+╔══════════════════════════════════════════════════════════╗
+║       Manual Screenshot → Video Pipeline v1.0.0         ║
+║   Supports: Reddit • Threads • X • Any Platform         ║
+╚══════════════════════════════════════════════════════════╝
+"""
+    )
+
+    parser = build_parser()
+    args = parser.parse_args()
+
+    if not args.command:
+        parser.print_help()
+        sys.exit(1)
+
+    # Check Python version
+    if sys.version_info.major != 3 or sys.version_info.minor not in [10, 11, 12]:
+        print("This program requires Python 3.10, 3.11, or 3.12.")
+        sys.exit(1)
+
+    # Check FFmpeg
+    ffmpeg_install()
+
+    # Load config
+    manual_config = load_config()
+
+    # Create input directory if it doesn't exist
+    input_dir = Path(manual_config["input_dir"])
+    input_dir.mkdir(parents=True, exist_ok=True)
+
+    # Dispatch command
+    commands = {
+        "init": cmd_init,
+        "render": cmd_render,
+        "list": cmd_list,
+    }
+
+    cmd_func = commands.get(args.command)
+    if cmd_func:
+        try:
+            cmd_func(args, manual_config)
+        except KeyboardInterrupt:
+            print("\nInterrupted by user.")
+            sys.exit(130)  # conventional exit status for SIGINT (128 + 2)
+        except Exception as e:
+            print_substep(f"Error: {e}", style="red")
+            raise
+    else:
+        parser.print_help()
+
+
+if __name__ == "__main__":
+    main()
--- a/run.sh
+++ b/run.sh
--- a/utils/background_audios.json
+++ b/utils/background_audios.json
@@ -1,18 +1,18 @@
 {
    "__comment": "Supported Backgrounds Audio. Can add/remove background audio here...",
    "lofi": [
-        "https://www.youtube.com/watch?v=LTphVIore3A",
+        "https://www.youtube.com/watch?v=Q7HjxOAU5Kc",
        "lofi.mp3",
-        "Super Lofi World"
+        "Breaking Copyright"
    ],
    "lofi-2":[
-        "https://www.youtube.com/watch?v=BEXL80LS0-I",
+        "https://www.youtube.com/watch?v=cTMOQiY0axo",
        "lofi-2.mp3",
-        "stompsPlaylist"
+        "Breaking Copyright"
    ],
-    "chill-summer":[
-        "https://www.youtube.com/watch?v=EZE8JagnBI8",
-        "chill-summer.mp3",
-        "Mellow Vibes Radio"
+    "lofi-3":[
+        "https://www.youtube.com/watch?v=4sFVeqvJu-0",
+        "lofi-3.mp3",
+        "Chill - Copyright Free Music"
    ]
 }
--- a/utils/background_videos.json
+++ b/utils/background_videos.json
@@ -1,17 +1,5 @@
 {
    "__comment": "Supported Backgrounds. Can add/remove background video here...",
-    "motor-gta": [
-        "https://www.youtube.com/watch?v=vw5L4xCPy9Q",
-        "bike-parkour-gta.mp4",
-        "Achy Gaming",
-        "center"
-    ],
-    "rocket-league": [
-        "https://www.youtube.com/watch?v=2X9QGY__0II",
-        "rocket_league.mp4",
-        "Orbital Gameplay",
-        "center"
-    ],
    "minecraft": [
        "https://www.youtube.com/watch?v=n_Dv4JMiwK8",
        "parkour.mp4",
@@ -24,40 +12,16 @@
        "Achy Gaming",
        "center"
    ],
-    "csgo-surf": [
-        "https://www.youtube.com/watch?v=E-8JlyO59Io",
-        "csgo-surf.mp4",
-        "Aki",
-        "center"
-    ],
-    "cluster-truck": [
-        "https://www.youtube.com/watch?v=uVKxtdMgJVU",
-        "cluster_truck.mp4",
-        "No Copyright Gameplay",
-        "center"
-    ],
    "minecraft-2": [
        "https://www.youtube.com/watch?v=Pt5_GSKIWQM",
        "minecraft-2.mp4",
        "Itslpsn",
        "center"
    ],
-    "multiversus": [
-        "https://www.youtube.com/watch?v=66oK1Mktz6g",
-        "multiversus.mp4",
-        "MKIceAndFire",
-        "center"
-    ],
-    "fall-guys": [
-        "https://www.youtube.com/watch?v=oGSsgACIc6Q",
-        "fall-guys.mp4",
-        "Throneful",
-        "center"
-    ],
-    "steep": [
-        "https://www.youtube.com/watch?v=EnGiQrWBrko",
-        "steep.mp4",
-        "joel",
+    "roblox": [
+        "https://www.youtube.com/watch?v=TnYDtDiuXzw",
+        "roblox.mp4",
+        "Dope Gameplays",
        "center"
    ]
 }
--- a/utils/settings.py
+++ b/utils/settings.py
--- a/utils/videos.py
+++ b/utils/videos.py