RedditVideoMakerBot

Commit Graph

Author	SHA1	Message	Date
Abdessamad Haddouche	076b65f04c	feat: pro caption system with WhisperX word-level alignment	3 weeks ago
Abdessamad Haddouche	af0940045c	feat: sentiment-aware video pipeline with DeepSeek, metadata generation, and per-video folder structure	4 weeks ago

Abdessamad Haddouche

076b65f04c

feat: pro caption system with WhisperX word-level alignment

Core changes:
- utils/caption_renderer.py: new single-responsibility rendering engine
  - Three display modes: aligned, single, multi
  - 8-direction stroke technique for clean text outlines
  - Transparent PNG overlays (no more solid box)
- utils/whisper_aligner.py: WhisperX forced alignment module
  - Word-level timestamps from any TTS audio
  - Graceful fallback to single mode if unavailable
- utils/imagenarator.py: refactored as thin orchestrator
  - Delegates to caption_renderer
  - Saves timing_map.json for final_video sync
- utils/sentiment_map.py: added STYLE_MAP with display_mode per sentiment
- utils/sentiment.py: stores sentiment in settings for downstream use
- TTS/engine_wrapper.py: runs WhisperX after each TTS save
- video_creation/final_video.py: reads timing_map, handles absolute + fraction timing
- video_creation/screenshot_downloader.py: clean imagemaker call
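
The "8-direction stroke technique" listed above can be sketched roughly as follows. This is a minimal illustration, not the code in utils/caption_renderer.py: the function names and the `draw_text` callback are hypothetical, and the callback stands in for whatever drawing call the renderer actually uses (for example a thin wrapper around Pillow's `ImageDraw.text`).

```python
from typing import Callable, List, Tuple


def stroke_offsets(width: int) -> List[Tuple[int, int]]:
    """Return the 8 compass-direction offsets (N, NE, E, ...) scaled by width."""
    return [
        (dx * width, dy * width)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)  # skip the centre; that position is for the fill
    ]


def draw_outlined(
    draw_text: Callable[[int, int, str], None],  # hypothetical drawing callback
    x: int,
    y: int,
    fill: str = "white",
    stroke: str = "black",
    width: int = 2,
) -> None:
    """Draw the text 8 times in the stroke colour, then once in the fill colour."""
    for dx, dy in stroke_offsets(width):
        draw_text(x + dx, y + dy, stroke)
    # The fill pass goes last so it sits on top of the outline copies.
    draw_text(x, y, fill)
```

Drawing the outline copies onto a transparent RGBA canvas would give an outlined caption without a solid background box, which seems to be the intent of the "Transparent PNG overlays" bullet.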

Assets:
- fonts/: added Montserrat, Nunito, Oswald, Raleway, Lato, Anton font families

Dependencies:
- requirements.txt: updated with all current dependencies

Abdessamad Haddouche

af0940045c

feat: sentiment-aware video pipeline with DeepSeek, metadata generation, and per-video folder structure

SENTIMENT DETECTION (utils/sentiment.py)
- Integrate DeepSeek API using OpenAI-compatible SDK to classify each Reddit post
- Detect sentiment from post title + body (first 500 chars) into 8 labels:
  sad, happy, angry, mysterious, funny, dramatic, wholesome, scary
- Override in-memory config per post (background_video, background_audio, voice)
- Falls back to 'dramatic' label if DeepSeek API fails or is unavailable
- Can be enabled/disabled via config.toml [deepseek] enabled = true/false
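
A rough sketch of the classification-with-fallback behaviour described above. The `classify` callable is a hypothetical stand-in for the DeepSeek request, so no real API call is shown; falling back when the model returns an unknown label is also an assumption, since the commit message only mentions falling back when the API fails or is unavailable.

```python
from typing import Callable

LABELS = (
    "sad", "happy", "angry", "mysterious",
    "funny", "dramatic", "wholesome", "scary",
)
FALLBACK_LABEL = "dramatic"
BODY_LIMIT = 500  # only the first 500 characters of the body are sent


def build_input(title: str, body: str) -> str:
    """Combine the post title with the truncated body."""
    return f"{title}\n\n{body[:BODY_LIMIT]}"


def detect_sentiment(title: str, body: str, classify: Callable[[str], str]) -> str:
    """Return one of LABELS, or the fallback if classification fails."""
    try:
        raw = classify(build_input(title, body))
    except Exception:
        # API failure or unavailability: use the documented fallback label.
        return FALLBACK_LABEL
    label = raw.strip().lower()
    # Assumption: an unrecognised label is treated like a failure.
    return label if label in LABELS else FALLBACK_LABEL
```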

SENTIMENT MAPS (utils/sentiment_map.py)
- BACKGROUND_MAP: maps each sentiment to optimal background video + audio pair
- OPENAI_VOICE_MAP: maps each sentiment to best-fit OpenAI TTS voice
- ELEVENLABS_VOICE_MAP: maps each sentiment to best-fit ElevenLabs voice
  (fully mapped to real voices: Adam, George, Harry, Callum, Jessica, Brian, Laura, Matilda)
- All overrides are in-memory only — config.toml is never modified
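
To illustrate the in-memory override idea, here is a small sketch. The map entries, the flat config layout, and the `apply_overrides` name are all hypothetical; the real values live in utils/sentiment_map.py. This version returns a modified copy rather than mutating shared settings, which may differ from how the project does it.

```python
import copy

# Hypothetical entries for illustration only.
BACKGROUND_MAP = {
    "scary": ("dark-forest", "ambient-drone"),
    "dramatic": ("minecraft", "lofi"),
}
OPENAI_VOICE_MAP = {"scary": "onyx", "dramatic": "alloy"}


def apply_overrides(config: dict, sentiment: str) -> dict:
    """Return a copy of config with sentiment-specific values swapped in."""
    updated = copy.deepcopy(config)  # nothing is written back to config.toml
    video, audio = BACKGROUND_MAP[sentiment]
    updated["background_video"] = video
    updated["background_audio"] = audio
    updated["voice"] = OPENAI_VOICE_MAP[sentiment]
    return updated
```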

METADATA GENERATION (utils/sentiment.py)
- Single DeepSeek API call generates both sentiment + social media metadata
- Generates per-platform content:
  * YouTube: title (max 70 chars) + full description
  * TikTok: caption (max 150 chars) with hashtags
  * Instagram: caption with hashtags
  * Facebook: caption
  * Hashtags: list of relevant tags
- Falls back to basic title-based metadata if DeepSeek fails
- Saves metadata.json inside each video's output folder
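
The title-based fallback could look something like the sketch below. The dictionary schema, the `truncate` helper, and the way hashtags are appended are assumptions made for illustration; only the 70- and 150-character limits come from the commit message.

```python
from typing import Dict, List

YOUTUBE_TITLE_MAX = 70
TIKTOK_CAPTION_MAX = 150


def truncate(text: str, limit: int) -> str:
    """Shorten text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def fallback_metadata(title: str, hashtags: List[str]) -> Dict[str, object]:
    """Build basic per-platform metadata from the post title alone."""
    caption = " ".join([title, *hashtags])
    return {
        "youtube": {
            "title": truncate(title, YOUTUBE_TITLE_MAX),
            "description": title,
        },
        "tiktok": {"caption": truncate(caption, TIKTOK_CAPTION_MAX)},
        "instagram": {"caption": caption},
        "facebook": {"caption": title},
        "hashtags": list(hashtags),
    }
```

The resulting dictionary can be serialised with `json.dumps` and written as metadata.json in the video's output folder.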

RESULTS FOLDER RESTRUCTURE (video_creation/final_video.py)
- Changed output structure from results/{subreddit}/{filename}.mp4
- New structure: results/{actual_subreddit}/{thread_id}_{sentiment}/video.mp4
- Each video now has its own isolated folder containing:
  * video.mp4
  * metadata.json
  * thumbnail.png (if thumbnail generation is enabled)
  * OnlyTTS/video.mp4 (if enable_extra_audio is enabled)
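
The new layout can be expressed with `pathlib` as in this sketch. The function names and the `thumbnail` / `extra_audio` parameters are hypothetical; they mirror the conditions listed above rather than the actual code in video_creation/final_video.py.

```python
from pathlib import Path
from typing import Dict


def video_folder(results_root: str, subreddit: str, thread_id: str, sentiment: str) -> Path:
    """results/{actual_subreddit}/{thread_id}_{sentiment}/"""
    return Path(results_root) / subreddit / f"{thread_id}_{sentiment}"


def output_paths(folder: Path, thumbnail: bool = False, extra_audio: bool = False) -> Dict[str, Path]:
    """List the files expected inside one video's isolated folder."""
    paths = {
        "video": folder / "video.mp4",
        "metadata": folder / "metadata.json",
    }
    if thumbnail:
        paths["thumbnail"] = folder / "thumbnail.png"
    if extra_audio:
        paths["only_tts"] = folder / "OnlyTTS" / "video.mp4"
    return paths
```

For example, `video_folder("results", "tifu", "abc123", "funny")` points at `results/tifu/abc123_funny`, where `abc123` is a made-up thread id.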

SUBREDDIT TRACKING (reddit/subreddit.py)
- Added thread_subreddit field to reddit_object using submission.subreddit.display_name
- Posts from r/AmItheAsshole now save to results/AmItheAsshole/
- Posts from r/tifu now save to results/tifu/
- Posts from r/confession now save to results/confession/
- Previously all posts were grouped under the combined subreddit string

PIPELINE INTEGRATION (main.py)
- Added apply_sentiment_config() call between post fetching and video generation
- Sentiment detection runs before TTS and background selection
- Controlled by settings.config['deepseek']['enabled'] flag
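
The ordering described above, with sentiment detection gated by the config flag and run before video generation, might be sketched like this. `run_pipeline`, `apply_sentiment`, and `make_video` are hypothetical names; in the project the corresponding call is apply_sentiment_config() in main.py, whose signature is not shown in this commit message.

```python
from typing import Any, Callable, Dict


def run_pipeline(
    config: Dict[str, Any],
    post: Dict[str, Any],
    apply_sentiment: Callable[[Dict[str, Any]], None],
    make_video: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Run the optional sentiment step, then the rest of the pipeline."""
    # Skip sentiment detection entirely unless [deepseek] enabled = true.
    if config.get("deepseek", {}).get("enabled", False):
        apply_sentiment(post)
    # TTS and background selection happen after any overrides are applied.
    return make_video(post)
```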

CONFIG CHANGES (config.toml + utils/.config.template.toml)
- Added [deepseek] section with api_key and enabled fields
- elevenlabs_voice_name changed from optional=false to optional=true
- Prevents prompt appearing when ElevenLabs is not the selected TTS provider

2 Commits (71cbbacd60e428bd9ac9bf9332a387987e5f457a)