Captions
U-Gen automatically generates word-level synchronized captions for every video. Captions use karaoke-style highlighting, where each word lights up as it is spoken, a format commonly used on social media to boost engagement.
How it works
Transcription
After the audio stage, the voice-over is transcribed with word-level timestamps using OpenAI Whisper.
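As a rough illustration of what word-level timestamps look like, the sketch below flattens a transcription result into a simple word list. It assumes the result shape produced by the open-source openai-whisper package when transcribe is called with word_timestamps=True (segments containing a words list with word, start, and end). Whether U-Gen uses that package or a hosted API is not stated here, and the helper name flatten_words is hypothetical.

```python
from typing import TypedDict


class Word(TypedDict):
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def flatten_words(result: dict) -> list[Word]:
    """Collect per-word timings from a Whisper-style transcription result."""
    words: list[Word] = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            text = w["word"].strip()  # Whisper words usually carry a leading space
            if text:
                words.append(
                    {"text": text, "start": float(w["start"]), "end": float(w["end"])}
                )
    return words


# Hand-written sample in the assumed shape. A real result would come from
# something like: whisper.load_model("base").transcribe(path, word_timestamps=True)
sample = {
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.4},
                {"word": " world", "start": 0.5, "end": 0.9},
            ]
        }
    ]
}
words = flatten_words(sample)
```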
ASS subtitle generation
Word timestamps are converted into ASS (Advanced SubStation Alpha) subtitle format with karaoke fill (\kf) timing tags for per-word color transitions.
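The conversion described above can be sketched as follows. This is a minimal illustration, not U-Gen's actual generator: the function names and the input word shape (text, start, end in seconds) are assumptions. It relies on two properties of the ASS format: timestamps are written as H:MM:SS.cc, and karaoke tag durations are given in centiseconds.

```python
def ass_time_cs(cs: int) -> str:
    """Format a centisecond count as an ASS timestamp, H:MM:SS.cc."""
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def karaoke_line(words: list[dict], style: str = "Default") -> str:
    """Build one ASS Dialogue line with a karaoke fill tag per word."""
    start_cs = round(words[0]["start"] * 100)
    cursor = start_cs  # running position in centiseconds
    parts = []
    for w in words:
        w_start = max(round(w["start"] * 100), cursor)
        w_end = round(w["end"] * 100)
        if w_start > cursor:
            # Pause between words: an empty \k syllable consumes the gap.
            parts.append(f"{{\\k{w_start - cursor}}}")
        dur = max(w_end - w_start, 1)  # at least one centisecond per word
        parts.append(f"{{\\kf{dur}}}{w['text']} ")
        cursor = w_start + dur
    text = "".join(parts).rstrip()
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time_cs(start_cs)},{ass_time_cs(cursor)},{style},,0,0,0,,{text}"


line = karaoke_line(
    [
        {"text": "Hello", "start": 0.0, "end": 0.4},
        {"text": "world", "start": 0.5, "end": 0.9},
    ]
)
```

In a complete file, this line would sit under an [Events] section, with the referenced style defined under [V4+ Styles].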
Burn-in
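During the final render stage, captions are burned directly into the video using FFmpeg. Because the captions become part of the video frames, they display the same on every platform, regardless of whether the player supports subtitle tracks.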
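For readers who want to reproduce a burn-in locally, one common approach is FFmpeg's ass video filter, which requires an FFmpeg build with libass. The sketch below only assembles the command; the file names and the helper name are placeholders, and the exact options U-Gen passes to FFmpeg are not documented here.

```python
from pathlib import Path


def burn_in_command(video: str, subtitles: str, output: str) -> list[str]:
    """Assemble an FFmpeg command that renders an ASS file onto the video."""
    # Forward slashes keep the filter argument simple. Paths containing
    # characters such as ':' or ',' would need filtergraph escaping.
    vf = f"ass={Path(subtitles).as_posix()}"
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", video,
        "-vf", vf,       # video is re-encoded with the captions drawn in
        "-c:a", "copy",  # audio stream is copied unchanged
        output,
    ]


cmd = burn_in_command("in.mp4", "captions.ass", "out.mp4")
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.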
Caption presets
Choose a preset when creating a job. You can also use a custom style via your brand kit colors.
tiktok_karaoke: Bold white text with word-by-word color fill, the classic TikTok caption style.
tiktok_highlight: Similar to tiktok_karaoke, but with a highlight box behind each active word.
mrbeast_karaoke: Large, high-impact text inspired by MrBeast-style captions. Maximum visibility.
neon_karaoke: Neon-colored text with a glow effect, eye-catching for social-first content.
minimal_karaoke: Clean, understated captions with subtle word highlighting. Professional feel.
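When a custom style draws on brand kit colors, note that ASS styles do not store colors as web-style #RRGGBB. They use &HAABBGGRR, with the byte order reversed and an alpha byte where 00 means fully opaque. The helper below is a hypothetical sketch of that conversion and is not part of any U-Gen API.

```python
def hex_to_ass_colour(hex_rgb: str, alpha: int = 0) -> str:
    """Convert '#RRGGBB' to the ASS style colour form '&HAABBGGRR'."""
    value = hex_rgb.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex colour, got {hex_rgb!r}")
    # int(..., 16) raises ValueError on non-hex digits
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    # ASS reverses the channel order; alpha 0 is opaque, 255 is transparent
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"


brand_orange = hex_to_ass_colour("#FF8800")
```

In an ASS style, PrimaryColour is the color a word takes once the karaoke fill has passed it, and SecondaryColour is the color shown before the fill reaches it.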
Karaoke Effect
Captions use the ASS karaoke fill tag (\kf) for smooth per-word color transitions.
Language support
Captions support all 29 languages available in U-Gen. The system uses OpenAI Whisper for transcription, which automatically detects the language from the audio.
LTR languages
English, French, Spanish, German, etc.
RTL languages
Arabic, Hebrew: rendered with right-to-left layout.
CJK languages
Chinese, Japanese, Korean character rendering.
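To make the three groups above concrete, the sketch below classifies a caption string by script using Python's standard unicodedata module. It is only an illustration of how such a check could be written, not a description of how U-Gen selects layouts, and the function name is hypothetical.

```python
import unicodedata

# Prefixes of Unicode character names that indicate CJK scripts
CJK_NAME_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL")


def text_layout(text: str) -> str:
    """Return 'rtl', 'cjk', or 'ltr' based on the characters in the text."""
    # Bidi classes R (e.g. Hebrew) and AL (Arabic letters) mark right-to-left text
    if any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text):
        return "rtl"
    if any(unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES) for ch in text):
        return "cjk"
    return "ltr"


layouts = [text_layout(t) for t in ("Hello", "שלום", "مرحبا", "你好", "こんにちは", "안녕하세요")]
```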
Brand Colors