Core changes:
- utils/caption_renderer.py: new single-responsibility rendering engine
  * Three display modes: aligned, single, multi
  * 8-direction stroke technique for clean text outlines
  * Transparent PNG overlays (no more solid box)
- utils/whisper_aligner.py: WhisperX forced alignment module
  * Word-level timestamps from any TTS audio
  * Graceful fallback to single mode if WhisperX is unavailable
- utils/imagenarator.py: refactored as thin orchestrator
  * Delegates to caption_renderer
  * Saves timing_map.json for final_video sync
- utils/sentiment_map.py: added STYLE_MAP with display_mode per sentiment
- utils/sentiment.py: stores sentiment in settings for downstream use
- TTS/engine_wrapper.py: runs WhisperX after each TTS save
- video_creation/final_video.py: reads timing_map.json, handles absolute and fractional timing
- video_creation/screenshot_downloader.py: clean imagemaker call
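For reviewers unfamiliar with the 8-direction stroke technique mentioned for caption_renderer, here is a minimal sketch of the idea. It is not the code in utils/caption_renderer.py; `stroke_offsets`, `draw_outlined_text`, and the `draw_text` callback are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def stroke_offsets(width: int) -> List[Point]:
    """Return the 8 neighbouring offsets (every direction except the centre)."""
    return [
        (dx * width, dy * width)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


def draw_outlined_text(
    draw_text: Callable[[Point, str, str], None],
    x: int,
    y: int,
    text: str,
    fill: str,
    stroke_fill: str,
    width: int = 2,
) -> None:
    """Stamp the text 8 times in the outline colour, then once in the fill colour."""
    # Pass 1: outline copies, shifted in each of the 8 directions.
    for dx, dy in stroke_offsets(width):
        draw_text((x + dx, y + dy), text, stroke_fill)
    # Pass 2: the real text on top, which hides the overlapping centre.
    draw_text((x, y), text, fill)
```

With Pillow, `ImageDraw.Draw(img).text` accepts `(xy, text, fill)` in that order, so it could plausibly be passed as `draw_text` when drawing onto a transparent RGBA image.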
Assets:
- fonts/: added Montserrat, Nunito, Oswald, Raleway, Lato, Anton font families
Dependencies:
- requirements.txt: updated with all current dependencies
SENTIMENT DETECTION (utils/sentiment.py)
- Integrate DeepSeek API using OpenAI-compatible SDK to classify each Reddit post
- Classify sentiment from the post title + body (first 500 chars) into one of 8 labels:
  sad, happy, angry, mysterious, funny, dramatic, wholesome, scary
- Override in-memory config per post (background_video, background_audio, voice)
- Falls back to 'dramatic' label if DeepSeek API fails or is unavailable
- Can be enabled/disabled via config.toml [deepseek] enabled = true/false
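The classification rules above (500-char body limit, 8 labels, 'dramatic' fallback) can be sketched roughly as follows. The `classify` callable is a hypothetical stand-in for the DeepSeek request, and treating an unrecognised label as a failure is an assumption, not something confirmed by utils/sentiment.py.

```python
from typing import Callable, Optional

SENTIMENT_LABELS = (
    "sad", "happy", "angry", "mysterious",
    "funny", "dramatic", "wholesome", "scary",
)
FALLBACK_LABEL = "dramatic"
BODY_CHAR_LIMIT = 500


def build_classifier_input(title: str, body: str) -> str:
    """Combine the title with at most the first 500 characters of the body."""
    return f"{title}\n\n{body[:BODY_CHAR_LIMIT]}"


def classify_post(
    title: str,
    body: str,
    classify: Callable[[str], str],  # hypothetical wrapper around the API call
    enabled: bool = True,
) -> Optional[str]:
    """Return one of the 8 labels, 'dramatic' on failure, or None when disabled."""
    if not enabled:
        return None  # detection switched off: the caller leaves config untouched
    try:
        raw = classify(build_classifier_input(title, body))
    except Exception:
        return FALLBACK_LABEL  # API failed or is unavailable
    label = str(raw).strip().lower()
    # Assumption: anything outside the known labels also falls back.
    return label if label in SENTIMENT_LABELS else FALLBACK_LABEL
```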
SENTIMENT MAPS (utils/sentiment_map.py)
- BACKGROUND_MAP: maps each sentiment to optimal background video + audio pair
- OPENAI_VOICE_MAP: maps each sentiment to best-fit OpenAI TTS voice
- ELEVENLABS_VOICE_MAP: maps each sentiment to best-fit ElevenLabs voice
  (fully mapped to real voices: Adam, George, Harry, Callum, Jessica, Brian, Laura, Matilda)
- All overrides are in-memory only; config.toml is never modified
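A rough sketch of how such maps could drive an in-memory override. The map entries, the sentiment-to-voice pairings, and the config key layout (`settings.background`, `settings.tts.voice_choice`, and so on) are illustrative assumptions, not the values in utils/sentiment_map.py.

```python
# Placeholder entries: real maps cover all 8 sentiments.
BACKGROUND_MAP = {
    "scary": {"background_video": "placeholder-video", "background_audio": "placeholder-audio"},
}
OPENAI_VOICE_MAP = {"scary": "onyx"}        # illustrative pairing only
ELEVENLABS_VOICE_MAP = {"scary": "Callum"}  # illustrative pairing only

# Which config key and map apply for each TTS provider (assumed key names).
VOICE_MAPS = {
    "openai": ("openai_voice_name", OPENAI_VOICE_MAP),
    "elevenlabs": ("elevenlabs_voice_name", ELEVENLABS_VOICE_MAP),
}


def apply_overrides(config: dict, sentiment: str) -> dict:
    """Mutate the loaded config dict for the current post; nothing is written to disk."""
    config["settings"]["background"].update(BACKGROUND_MAP.get(sentiment, {}))
    tts = config["settings"]["tts"]
    provider = str(tts.get("voice_choice", "")).lower()
    if provider in VOICE_MAPS:
        key, voice_map = VOICE_MAPS[provider]
        if sentiment in voice_map:
            tts[key] = voice_map[sentiment]
    return config
```

Because only the loaded dict changes, the next run starts again from whatever config.toml contains.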
METADATA GENERATION (utils/sentiment.py)
- Single DeepSeek API call generates both sentiment + social media metadata
- Generates per-platform content:
  * YouTube: title (max 70 chars) + full description
  * TikTok: caption (max 150 chars) with hashtags
  * Instagram: caption with hashtags
  * Facebook: caption
  * Hashtags: list of relevant tags
- Falls back to basic title-based metadata if DeepSeek fails
- Saves metadata.json inside each video's output folder
RESULTS FOLDER RESTRUCTURE (video_creation/final_video.py)
- Changed output structure from results/{subreddit}/{filename}.mp4
  to results/{actual_subreddit}/{thread_id}_{sentiment}/video.mp4
- Each video now has its own isolated folder containing:
  * video.mp4
  * metadata.json
  * thumbnail.png (if thumbnail generation is enabled)
  * OnlyTTS/video.mp4 (if enable_extra_audio is enabled)
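As a quick illustration of the new layout, the per-video folder and its expected contents can be derived like this. `video_output_dir` and `planned_files` are hypothetical helper names, not functions in video_creation/final_video.py.

```python
from pathlib import Path
from typing import List


def video_output_dir(results_root: str, subreddit: str, thread_id: str, sentiment: str) -> Path:
    """results/{actual_subreddit}/{thread_id}_{sentiment}/"""
    return Path(results_root) / subreddit / f"{thread_id}_{sentiment}"


def planned_files(folder: Path, thumbnail: bool = False, extra_audio: bool = False) -> List[Path]:
    """List the files a finished video folder is expected to contain."""
    files = [folder / "video.mp4", folder / "metadata.json"]
    if thumbnail:
        files.append(folder / "thumbnail.png")
    if extra_audio:
        files.append(folder / "OnlyTTS" / "video.mp4")
    return files
```

For example, `video_output_dir("results", "tifu", "abc123", "funny")` gives `results/tifu/abc123_funny`.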
SUBREDDIT TRACKING (reddit/subreddit.py)
- Added thread_subreddit field to reddit_object using submission.subreddit.display_name
- Posts from r/AmItheAsshole now save to results/AmItheAsshole/
- Posts from r/tifu now save to results/tifu/
- Posts from r/confession now save to results/confession/
- Previously all posts were grouped under the combined subreddit string
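A small sketch of the tracking step. `attach_thread_subreddit` is a hypothetical helper name, and the "+"-joined configured string is an assumption about what "combined subreddit string" refers to; `submission.subreddit.display_name` is the PRAW attribute named above.

```python
from types import SimpleNamespace


def attach_thread_subreddit(reddit_object: dict, submission) -> dict:
    """Record the subreddit the post actually came from."""
    reddit_object["thread_subreddit"] = submission.subreddit.display_name
    return reddit_object


# Stand-in for a PRAW submission so the sketch runs without network access.
fake_submission = SimpleNamespace(subreddit=SimpleNamespace(display_name="tifu"))
configured = "AmItheAsshole+tifu+confession"  # assumed combined config value

reddit_object = attach_thread_subreddit({"thread_id": "abc123"}, fake_submission)
old_folder = f"results/{configured}"
new_folder = f"results/{reddit_object['thread_subreddit']}"
```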
PIPELINE INTEGRATION (main.py)
- Added apply_sentiment_config() call between post fetching and video generation
- Sentiment detection runs before TTS and background selection
- Controlled by settings.config['deepseek']['enabled'] flag
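The ordering described above can be sketched as follows. The three callables and the assumption that apply_sentiment_config() takes the fetched post as its argument are illustrative; the actual signatures in main.py may differ.

```python
from typing import Any, Callable


def run_pipeline(
    config: dict,
    fetch_post: Callable[[], Any],
    apply_sentiment_config: Callable[[Any], None],
    make_video: Callable[[Any], Any],
) -> Any:
    """Fetch a post, optionally apply sentiment overrides, then generate the video."""
    post = fetch_post()
    # Sentiment must run before TTS and background selection,
    # so it sits between fetching and video generation.
    if config.get("deepseek", {}).get("enabled", False):
        apply_sentiment_config(post)
    return make_video(post)
```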
CONFIG CHANGES (config.toml + utils/.config.template.toml)
- Added [deepseek] section with api_key and enabled fields
- elevenlabs_voice_name changed from optional=false to optional=true
  * Prevents the prompt from appearing when ElevenLabs is not the selected TTS provider
- Fix ElevenLabs voice lookup by name: use the voice_id instead of the name string
- Update model from eleven_multilingual_v1 to eleven_multilingual_v2 (free tier)
- Remove hardcoded voice options restriction in config template
- Update default voice to Sarah
- Enable ffmpeg verbose output for better error debugging