Video Pipeline

Every U-Gen video passes through an 8-stage automated pipeline. Each stage is checkpointed, so a failed job can be retried from the stage that failed instead of starting over.

Pipeline overview

1. Prepare
2. AI Agents
3. Initialize
4. Generation
5. Concatenate
6. Audio + Music
7. Captions
8. Complete

Streaming Overlap

U-Gen doesn't wait for all keyframes before generating video. As soon as 2 consecutive keyframes pass QA, video segment generation starts and runs in parallel with the remaining keyframe generation. Because the two kinds of work overlap, total processing time is shorter than it would be if they ran one after the other.
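The trigger condition can be illustrated with a small sketch. This is not U-Gen's implementation; the function name and the list-of-booleans input are assumptions made for illustration. It returns the index of the keyframe whose QA pass completes the first run of 2 consecutive passes, which is the point at which segment generation could begin.

```python
from typing import Iterable, Optional


def overlap_start_index(qa_results: Iterable[bool], required: int = 2) -> Optional[int]:
    """Return the index of the keyframe that completes the first run of
    `required` consecutive QA passes, or None if no such run occurs.

    Hypothetical helper: `qa_results` holds one boolean per keyframe,
    in generation order (True means the keyframe passed QA).
    """
    streak = 0
    for index, passed in enumerate(qa_results):
        # A failed keyframe breaks the run, so counting starts again.
        streak = streak + 1 if passed else 0
        if streak >= required:
            return index
    return None


# Keyframe 1 fails QA, so the first qualifying run is keyframes 2 and 3.
print(overlap_start_index([True, False, True, True, True]))
```

In this example the call prints 3: video generation could start once the fourth keyframe (index 3) passes, while any later keyframes are still being generated.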

Stage details

Prepare

prepare

Validates inputs, reserves credits, and creates the job record.

The system checks your product image, validates all parameters, estimates credit cost, and reserves credits from your balance. If validation fails, no credits are consumed.
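The credit lifecycle described here (reserve at Prepare, consume at Complete) can be sketched as a small ledger. The class and method names are hypothetical, and the `release` step for returning reserved credits is an assumption that the documentation does not describe.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical ledger illustrating the reserve -> consume flow."""
    balance: int       # credits available to reserve
    reserved: int = 0  # credits held for in-flight jobs
    consumed: int = 0  # credits spent on finished jobs

    def reserve(self, amount: int) -> None:
        # Called only after validation succeeds, so a failed
        # validation never touches the balance.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self.reserved += amount

    def consume(self, amount: int) -> None:
        # Finalize at the Complete stage: reserved -> consumed.
        if amount > self.reserved:
            raise ValueError("cannot consume more than is reserved")
        self.reserved -= amount
        self.consumed += amount

    def release(self, amount: int) -> None:
        # Assumption: unused reservations return to the balance.
        if amount > self.reserved:
            raise ValueError("cannot release more than is reserved")
        self.reserved -= amount
        self.balance += amount


account = CreditAccount(balance=100)
account.reserve(30)   # Prepare stage
account.consume(30)   # Complete stage
print(account)
```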

AI Agents

agents

AI agents generate the video script and plan keyframe prompts.

A specialized AI agent analyzes your product image, persona, and settings to write a natural UGC-style script. It plans the visual keyframes — including camera angles, expressions, and product interaction shots.

Initialize

init

Generates the anchor keyframe — the visual foundation for the entire video.

The anchor keyframe establishes the persona’s appearance, lighting, and environment. All subsequent keyframes reference this image to maintain visual consistency throughout the video.

Generation

generation

Generates all video segments from keyframe pairs.

Each segment is created from a keyframe pair using Veo 3.1 or Kling 3.0. Segment generation uses streaming overlap: after 2 consecutive keyframes pass QA, it begins in parallel with generation of the remaining keyframes.
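As a rough illustration of keyframe pairing, the sketch below assumes that each pair consists of two consecutive keyframes, so N keyframes yield N - 1 segments. The documentation does not state how pairs are formed, and the function name and file names are hypothetical.

```python
from typing import List, Sequence, Tuple


def keyframe_pairs(keyframes: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each keyframe with the one that follows it.

    Assumption: a segment runs from one keyframe to the next, so the
    end keyframe of one segment is the start keyframe of the following one.
    """
    return list(zip(keyframes, keyframes[1:]))


# The first entry stands in for the anchor keyframe from the Initialize stage.
frames = ["anchor.png", "kf_1.png", "kf_2.png", "kf_3.png"]
for start, end in keyframe_pairs(frames):
    print(f"segment: {start} -> {end}")
```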

Concatenate

concat

Stitches all video segments into a single continuous video.

The FFmpeg microservice concatenates segments with smart trimming, handling frame alignment and transition smoothing between segments.
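For readers who want to reproduce a basic version of this step locally, the sketch below builds the input list and command line for FFmpeg's concat demuxer. It is a simplified stand-in and not the microservice's actual logic: stream copy (`-c copy`) joins segments as they are and performs no trimming or transition smoothing. The helper names and file names are hypothetical.

```python
from typing import Iterable, List


def concat_list(segment_paths: Iterable[str]) -> str:
    """Build the text of a concat-demuxer list file, one `file` line per segment."""
    lines = []
    for path in segment_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """Return an ffmpeg argument list that joins the listed segments without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_file,
        "-c", "copy",     # stream copy; segments must share codec settings
        output,
    ]


print(concat_list(["seg_000.mp4", "seg_001.mp4"]), end="")
print(" ".join(concat_command("segments.txt", "joined.mp4")))
```

Write the list text to `segments.txt` and pass the argument list to `subprocess.run` to execute it; that step is left out here so the sketch runs without FFmpeg installed.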

Audio + Music

audio

Applies voice-over and mixes background music.

The script is performed using ElevenLabs STS (Speech-to-Speech) with the selected voice. The audio is synced to the video timeline and mixed with optional background music from the library.

Captions

captions

Generates word-level synchronized captions.

Captions use the ASS subtitle format with karaoke-style word highlighting. The stage supports 29 languages, with automatic language detection and right-to-left (RTL) layout for Arabic and Hebrew.
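To show what word-level karaoke timing looks like in ASS, the sketch below turns a list of timed words into one `Dialogue` line using the `\k` override tag, whose value is a duration in centiseconds. The input shape, function names, and style name are assumptions for illustration; U-Gen's actual caption styling is not documented here.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc)."""
    cs = round(seconds * 100)
    hours, cs = divmod(cs, 360_000)
    minutes, cs = divmod(cs, 6_000)
    secs, cs = divmod(cs, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_dialogue(words: List[Word], style: str = "Default") -> str:
    """Build one ASS Dialogue line with a \\k tag in front of every word.

    Each word's \\k duration runs from its own start to the next word's
    start, so the highlight stays aligned even when words are separated
    by short pauses. The last word runs to its own end time.
    """
    if not words:
        raise ValueError("at least one word is required")
    # Convert boundaries to centiseconds first to avoid cumulative rounding drift.
    boundaries = [round(start * 100) for _, start, _ in words]
    boundaries.append(round(words[-1][2] * 100))
    parts = []
    for i, (text, _, _) in enumerate(words):
        duration = boundaries[i + 1] - boundaries[i]
        parts.append(f"{{\\k{duration}}}{text}")
    start, end = ass_time(words[0][1]), ass_time(words[-1][2])
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{' '.join(parts)}"


print(karaoke_dialogue([("Love", 0.0, 0.4), ("this", 0.4, 0.7), ("serum", 0.7, 1.5)]))
```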

Complete

complete

Final render, upload, and delivery.

The FFmpeg microservice performs the final render (burning in captions, mixing audio tracks), uploads the finished video to secure storage, finalizes credits (reserved → consumed), and triggers webhook notifications. The video is ready for download or social auto-posting.

Monitoring Progress

Track pipeline progress in real time via the current_stage and progress_percent fields on the job object. To be notified when a job completes or fails, you can also configure a webhook URL in your account settings.
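A polling loop over those two fields might look like the sketch below. Only `current_stage` and `progress_percent` come from this page; the `fetch_job` callable, the `status` field, and its `completed` and `failed` values are hypothetical placeholders, so check the job object reference for the real names.

```python
import time
from typing import Any, Callable, Dict, Optional

Job = Dict[str, Any]


def wait_for_job(
    fetch_job: Callable[[str], Job],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    on_update: Optional[Callable[[Job], None]] = None,
) -> Job:
    """Poll a job until it reaches a terminal state and return the final job object.

    `fetch_job` is a caller-supplied function that returns the job object
    as a dict (for example, a thin wrapper around an HTTP GET).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if on_update is not None:
            on_update(job)
        # Assumption: a `status` field marks terminal states.
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s} seconds")
        sleep(interval_s)


def print_progress(job: Job) -> None:
    print(f"{job['current_stage']}: {job['progress_percent']}%")
```

Passing `print_progress` as `on_update` logs one line per poll. A webhook avoids polling entirely and is usually the better choice for long-running jobs.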

Checkpointing and retries

Every stage is checkpointed. If a job fails at the audio stage, you can retry from that stage: the system skips the earlier stages and reuses their cached outputs. This saves both time and credits.

Retrying re-reserves credits from the restart stage onward. See the Retry Job API for programmatic retries.
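The effect of a retry on the stage list can be sketched as follows. The stage names are the lowercase labels shown under each stage heading on this page; treating them as the identifiers the system uses is an assumption, and `retry_plan` is a hypothetical helper.

```python
from typing import List, Tuple

# Stage labels in pipeline order, as listed in the stage details.
STAGES = ["prepare", "agents", "init", "generation", "concat", "audio", "captions", "complete"]


def retry_plan(failed_stage: str) -> Tuple[List[str], List[str]]:
    """Split the pipeline into (stages skipped, stages re-run) for a retry.

    Skipped stages reuse their checkpointed outputs; the failed stage and
    everything after it run again. Raises ValueError for an unknown stage.
    """
    index = STAGES.index(failed_stage)
    return STAGES[:index], STAGES[index:]


skipped, rerun = retry_plan("audio")
print("skipped:", skipped)
print("re-run: ", rerun)
```

For a failure at `audio`, the five earlier stages are skipped and only `audio`, `captions`, and `complete` run again, which is also the span for which credits are re-reserved.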

Video models

Veo 3.1 Fast

Great results at the best price. Ideal for social media ads and product showcases.

Tags: Fast, Cost-Efficient, HD

Aspect ratios: 9:16, 16:9, Auto

Veo 3.1 Quality

Ultra-realistic visuals for premium brands. When every detail matters.

Tags: Premium, Photorealistic, 4K

Aspect ratios: 9:16, 16:9, Auto

Kling 3.0 Standard

Flexible clip lengths with built-in sound effects. Perfect for dynamic content.

Tags: Flexible, Sound, 1:1

Aspect ratios: 9:16, 16:9, Auto, 1:1

Kling 3.0 Pro

Maximum detail with sound effects. For high-end campaigns that need to stand out.

Tags: Pro, Sound, Max Detail, 1:1

Aspect ratios: 9:16, 16:9, Auto, 1:1

See the Models API for full details and capabilities.
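The aspect-ratio support listed above can be captured in a lookup table for client-side validation before submitting a job. The sketch uses the display names from this page as keys; the identifiers that the API actually expects may differ, so treat the keys and the helper function as placeholders.

```python
from typing import Dict, FrozenSet

# Aspect ratios per model, copied from the model list on this page.
MODEL_ASPECT_RATIOS: Dict[str, FrozenSet[str]] = {
    "Veo 3.1 Fast": frozenset({"9:16", "16:9", "Auto"}),
    "Veo 3.1 Quality": frozenset({"9:16", "16:9", "Auto"}),
    "Kling 3.0 Standard": frozenset({"9:16", "16:9", "Auto", "1:1"}),
    "Kling 3.0 Pro": frozenset({"9:16", "16:9", "Auto", "1:1"}),
}


def supports_aspect_ratio(model: str, ratio: str) -> bool:
    """Return True if the model lists the given aspect ratio.

    Raises KeyError for a model name that is not in the table.
    """
    return ratio in MODEL_ASPECT_RATIOS[model]


print(supports_aspect_ratio("Kling 3.0 Pro", "1:1"))
print(supports_aspect_ratio("Veo 3.1 Fast", "1:1"))
```

Per the table, only the Kling 3.0 models list 1:1, so a square video request needs one of those two.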