Captions

U-Gen automatically generates word-synchronized captions for every video. Captions use karaoke-style highlighting, where each word lights up as it is spoken, a format widely used to boost engagement on social media.

How it works

Transcription

After the audio stage, the voice-over is transcribed with word-level timestamps using OpenAI Whisper.
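As a rough sketch of what word-level timestamps look like, the snippet below flattens a result shaped like the one the open-source whisper Python package returns when called with word_timestamps=True (a "segments" list whose entries carry a "words" list). Whether U-Gen uses that package or a hosted API is not stated here, so the flatten_words helper and the sample data are illustrative assumptions.

```python
from typing import TypedDict


class Word(TypedDict):
    text: str
    start: float  # seconds from the start of the audio
    end: float


def flatten_words(result: dict) -> list[Word]:
    """Collect every word from a Whisper-style result into one flat list."""
    words: list[Word] = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            text = w["word"].strip()  # words usually arrive with a leading space
            if text:
                words.append(
                    {"text": text, "start": float(w["start"]), "end": float(w["end"])}
                )
    return words


# Hand-written sample in the assumed shape; not real transcription output.
sample = {
    "language": "en",
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world", "start": 0.7, "end": 1.2},
            ]
        }
    ],
}

words = flatten_words(sample)
```

A flat list like this is a convenient input for the subtitle step, since each entry already carries the text and the time window it is spoken in.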

ASS subtitle generation

Word timestamps are converted into ASS (Advanced SubStation Alpha) subtitle format with karaoke fill (\kf) timing tags for per-word color transitions.
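The conversion can be sketched as follows, assuming the words arrive as dictionaries with text, start, and end fields in seconds. ASS timestamps and \kf durations are both expressed in centiseconds. The helper names (ass_time_cs, karaoke_line) are hypothetical, the "Default" style name is a placeholder, and the silent \k tag used to skip pauses between words is one common approach rather than necessarily what U-Gen does.

```python
def ass_time_cs(cs: int) -> str:
    """Format a centisecond count as an ASS timestamp, H:MM:SS.cc."""
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_line(words: list[dict], style: str = "Default") -> str:
    """Build one Dialogue event whose text carries a \\kf tag per word."""
    if not words:
        raise ValueError("at least one word is required")
    start_cs = round(words[0]["start"] * 100)
    cursor = start_cs  # centisecond position already covered by tags
    parts = []
    for w in words:
        w_start = round(w["start"] * 100)
        w_end = round(w["end"] * 100)
        if w_start > cursor:
            # Pause before this word: an empty \k syllable consumes the gap.
            parts.append(f"{{\\k{w_start - cursor}}}")
            cursor = w_start
        # \kf sweeps the fill across the word over the given duration.
        parts.append(f"{{\\kf{max(w_end - cursor, 0)}}}{w['text']} ")
        cursor = max(w_end, cursor)
    text = "".join(parts).rstrip()
    return (
        f"Dialogue: 0,{ass_time_cs(start_cs)},{ass_time_cs(cursor)},"
        f"{style},,0,0,0,,{text}"
    )


line = karaoke_line(
    [
        {"text": "Hello", "start": 0.0, "end": 0.5},
        {"text": "world", "start": 0.7, "end": 1.2},
    ]
)
```

A complete .ass file also needs the [Script Info], [V4+ Styles], and [Events] headers, which are omitted here.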

Burn-in

During the final render stage, captions are burned directly into the video frames using FFmpeg. Because the text becomes part of the picture, captions look the same on every platform and do not depend on a player's subtitle support.
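A minimal sketch of a burn-in command, assuming an FFmpeg build with libass and its ass video filter. The file names and the burn_in_command and render helpers are placeholders, not U-Gen's actual render pipeline.

```python
import subprocess


def burn_in_command(video: str, subtitles: str, output: str) -> list[str]:
    """Build an FFmpeg argument list that renders an .ass file onto the video."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-i", video,
        "-vf", f"ass={subtitles}",  # draw the subtitles into each frame
        "-c:a", "copy",             # keep the audio stream untouched
        output,
    ]


def render(video: str, subtitles: str, output: str) -> None:
    """Run FFmpeg; raises CalledProcessError if the render fails."""
    subprocess.run(burn_in_command(video, subtitles, output), check=True)


cmd = burn_in_command("input.mp4", "captions.ass", "output.mp4")
```

Subtitle paths containing characters such as colons or backslashes need escaping inside the filter argument, which this sketch does not handle.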

Caption presets

Choose a preset when creating a job. You can also customize the highlight color with your brand kit colors.

tiktok_karaoke

Bold white text with word-by-word color fill — the classic TikTok caption style.

tiktok_highlight

Similar to tiktok_karaoke, but with a highlight box behind each active word.

mrbeast_karaoke

Large, high-impact text inspired by MrBeast-style captions. Maximum visibility.

neon_karaoke

Neon-colored text with glow effect — eye-catching for social-first content.

minimal_karaoke

Clean, understated captions with subtle word highlighting. Professional feel.

Karaoke effect

Each word transitions from a muted color to a highlighted color at the exact moment it's spoken. This uses ASS karaoke fill tags (\kf) for smooth per-word color transitions.
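To make the timing concrete, the sketch below models how far a \kf fill has swept across a word at a given playback time: 0.0 means the word is still fully in the muted color, 1.0 means it is fully highlighted. The fill_fraction helper is hypothetical and only models the behavior; the actual rendering is done by the subtitle renderer.

```python
def fill_fraction(word_start: float, word_end: float, t: float) -> float:
    """Return the highlighted share of a word (0.0 to 1.0) at time t, in seconds."""
    if word_end <= word_start:
        # Zero-length word: it switches instantly once its start is reached.
        return 1.0 if t >= word_start else 0.0
    progress = (t - word_start) / (word_end - word_start)
    return min(max(progress, 0.0), 1.0)  # clamp before the start and after the end


halfway = fill_fraction(1.0, 2.0, 1.5)
```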

Language support

Captions support all 29 languages available in U-Gen. The system uses OpenAI Whisper for transcription, which automatically detects the language from the audio.

LTR languages

English, French, Spanish, German, and other left-to-right languages.

RTL languages

Arabic and Hebrew, rendered with a right-to-left layout.

CJK languages

Chinese, Japanese, and Korean, with CJK character rendering.
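The three groups above could be told apart from a detected language code roughly like this. The two-letter codes match those Whisper reports for these languages, but the script_group helper and the idea of routing layout on this value are assumptions for illustration only.

```python
RTL_LANGUAGES = {"ar", "he"}        # Arabic, Hebrew
CJK_LANGUAGES = {"zh", "ja", "ko"}  # Chinese, Japanese, Korean


def script_group(language_code: str) -> str:
    """Classify a language code as 'rtl', 'cjk', or 'ltr' (the default)."""
    code = language_code.strip().lower()
    if code in RTL_LANGUAGES:
        return "rtl"
    if code in CJK_LANGUAGES:
        return "cjk"
    return "ltr"


groups = [script_group(code) for code in ("en", "ar", "ja")]
```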

Brand colors

If you have a brand kit configured, the system can use your primary brand color as the caption highlight color instead of the preset default.
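ASS styles store colors as &HAABBGGRR, with the byte order reversed compared with the usual #RRGGBB hex notation and 00 alpha meaning fully opaque. A brand color would therefore need a conversion along these lines before it can be written into a style. The hex_to_ass_colour helper and the example color are hypothetical.

```python
def hex_to_ass_colour(hex_colour: str, alpha: int = 0) -> str:
    """Convert '#RRGGBB' to the ASS '&HAABBGGRR' form (alpha 0 = opaque)."""
    value = hex_colour.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex colour, got {hex_colour!r}")
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be between 0 and 255")
    # int(..., 16) raises ValueError on non-hex characters.
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return f"&H{alpha:02X}{b:02X}{g:02X}{r:02X}"


highlight = hex_to_ass_colour("#FF8800")
```

In ASS karaoke, the filled portion of the text is drawn in the style's PrimaryColour and the not-yet-filled portion in its SecondaryColour, so a brand highlight color would typically go into PrimaryColour.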