Video Pipeline
Every U-Gen video passes through an 8-stage automated pipeline. Each stage is checkpointed, so if something fails the job can retry from that point without starting over.
Pipeline overview
The stages run in this order: prepare, agents, init, generation, concat, audio, captions, complete.
Stage details
Prepare
prepare: Validates inputs, reserves credits, and creates the job record.
The system checks your product image, validates all parameters, estimates credit cost, and reserves credits from your balance. If validation fails, no credits are consumed.
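The validate-then-reserve ordering described above can be sketched as follows. This is a minimal illustration, not the actual implementation: `Account`, `ValidationError`, `prepare_job`, and the `product_image` parameter key are hypothetical names, and treating an insufficient balance as a validation failure is an assumption.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Raised when job parameters fail validation (hypothetical type)."""


@dataclass
class Account:
    balance: int        # credits available to spend
    reserved: int = 0   # credits held for in-flight jobs


def prepare_job(account: Account, params: dict, estimated_cost: int) -> dict:
    """Validate first, then reserve, so a failed validation touches no credits."""
    if not params.get("product_image"):
        raise ValidationError("product_image is required")
    if estimated_cost > account.balance:
        # Assumption: an insufficient balance is reported like any other validation error.
        raise ValidationError("insufficient credits")
    # Only reached when every check has passed.
    account.balance -= estimated_cost
    account.reserved += estimated_cost
    return {"status": "prepared", "reserved_credits": estimated_cost}
```

Because every check runs before the balance is modified, a rejected request leaves both `balance` and `reserved` unchanged.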
AI Agents
agents: AI agents generate the video script and plan keyframe prompts.
A specialized AI agent analyzes your product image, persona, and settings to write a natural UGC-style script. It also plans the visual keyframes, including camera angles, expressions, and product interaction shots.
Initialize
init: Generates the anchor keyframe, the visual foundation for the entire video.
The anchor keyframe establishes the persona’s appearance, lighting, and environment. All subsequent keyframes reference this image to maintain visual consistency throughout the video.
Generation
generation: Generates all video segments from keyframe pairs.
Each segment is created from a keyframe pair using Veo 3.1 or Kling 3.0. Segment generation uses streaming overlap: as soon as 2 consecutive keyframes pass QA, video generation for that pair begins in parallel while the remaining keyframes are still being produced.
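One way to picture streaming overlap is a scheduler that releases a keyframe pair for video generation the moment both of its keyframes have passed QA. The sketch below is illustrative only: `ready_segments` and the `(keyframe_index, passed)` event shape are hypothetical, and it assumes a keyframe that fails QA is regenerated and reported again later.

```python
from typing import Iterable, Iterator, Set, Tuple


def ready_segments(qa_results: Iterable[Tuple[int, bool]]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) keyframe index pairs as soon as both have passed QA.

    qa_results is a stream of (keyframe_index, passed) events in the order
    that QA finishes, which need not be index order.
    """
    passed: Set[int] = set()
    emitted: Set[Tuple[int, int]] = set()
    for index, ok in qa_results:
        if not ok:
            continue  # assumption: the keyframe is retried and reported again
        passed.add(index)
        # A newly passed keyframe can complete the pair on either side of it.
        for pair in ((index - 1, index), (index, index + 1)):
            if pair[0] in passed and pair[1] in passed and pair not in emitted:
                emitted.add(pair)
                yield pair
```

With events for keyframes 0 and 1 passing, the pair `(0, 1)` is released immediately, before keyframes 2 and later have been checked.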
Concatenate
concat: Stitches all video segments into a single continuous video.
The FFmpeg microservice concatenates segments with smart trimming, handling frame alignment and transition smoothing between segments.
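For orientation, the simplest form of FFmpeg concatenation uses the concat demuxer, which reads a list file of `file '...'` entries. The sketch below only builds that list and the command line; it does not reproduce the trimming or transition smoothing described above, whose implementation is not documented here. The function names and segment file names are hypothetical.

```python
from typing import List


def concat_list(segments: List[str]) -> str:
    """Build the text of an FFmpeg concat-demuxer list file."""
    lines = []
    for path in segments:
        # Single quotes inside a quoted path are escaped as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """Return an argv that joins the listed segments without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]
```

Stream copy (`-c copy`) only works cleanly when all segments share the same codec parameters; a service that trims or smooths transitions would need to re-encode instead.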
Audio + Music
audio: Applies voice-over and mixes background music.
The script is performed using ElevenLabs STS (Speech-to-Speech) with the selected voice. The audio is synced to the video timeline and mixed with optional background music from the library.
Captions
captions: Generates word-level synchronized captions.
Captions are generated with karaoke-style word highlighting using the ASS subtitle format. The stage supports 29 languages, with automatic language detection and RTL layout for Arabic and Hebrew.
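In ASS, karaoke highlighting is expressed with `\k` override tags whose durations are given in centiseconds. The sketch below shows how word timings could be turned into one ASS `Dialogue` line; the `Word` tuple shape, the function names, and the `Default` style name are assumptions for illustration, not the pipeline's actual code.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def karaoke_line(words: List[Word], style: str = "Default") -> str:
    """Build one Dialogue event with a \\k tag per word."""
    start, end = words[0][1], words[-1][2]
    cursor = round(start * 100)  # position in centiseconds
    parts = []
    for text, w_start, w_end in words:
        start_cs, end_cs = round(w_start * 100), round(w_end * 100)
        chunk = ""
        if start_cs > cursor:
            # Emit an empty karaoke span to cover the pause before this word.
            chunk += f"{{\\k{start_cs - cursor}}}"
        chunk += f"{{\\k{end_cs - start_cs}}}{text}"
        parts.append(chunk)
        cursor = end_cs
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},"
            f"{style},,0,0,0,,{' '.join(parts)}")
```

For the words `("Hello", 1.0, 1.5)` and `("world", 1.7, 2.4)`, this produces `Dialogue: 0,0:00:01.00,0:00:02.40,Default,,0,0,0,,{\k50}Hello {\k20}{\k70}world`.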
Complete
complete: Final render, upload, and delivery.
The FFmpeg microservice performs the final render (burning in captions, mixing audio tracks), uploads the finished video to secure storage, finalizes credits (reserved → consumed), and triggers webhook notifications. The video is ready for download or social auto-posting.
Monitoring progress
Job progress is exposed through the current_stage and progress_percent fields on the job object. You can also configure a webhook URL in your account settings to receive notifications on job completion and failure.
Checkpointing and retries
Every stage is checkpointed. If a job fails at the audio stage, you can retry from that stage: the system skips the earlier stages and reuses their cached outputs. This saves both time and credits.
Retrying re-reserves credits from the restart stage onward. See the Retry Job API for programmatic retries.
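The retry behavior can be sketched as a runner that stores each stage's output and, on retry, skips every stage before the restart point that already has a cached result. This is a simplified model under stated assumptions: `run_pipeline`, the `handlers` mapping, and the in-memory `checkpoints` dict are hypothetical, and credit re-reservation is left out.

```python
from typing import Callable, Dict, List, Optional

STAGES = ["prepare", "agents", "init", "generation",
          "concat", "audio", "captions", "complete"]


def run_pipeline(handlers: Dict[str, Callable[[dict], object]],
                 checkpoints: dict,
                 retry_from: Optional[str] = None) -> List[str]:
    """Run stages in order and return the names of the stages executed.

    checkpoints maps stage name -> cached output and is updated in place,
    so outputs written before a failure survive for the next attempt.
    """
    start = STAGES.index(retry_from) if retry_from else 0
    executed = []
    for i, stage in enumerate(STAGES):
        if i < start and stage in checkpoints:
            continue  # reuse the cached output from the earlier attempt
        # A handler that raises leaves earlier checkpoints untouched.
        checkpoints[stage] = handlers[stage](checkpoints)
        executed.append(stage)
    return executed
```

If the first run fails inside the `audio` handler, calling `run_pipeline(handlers, checkpoints, retry_from="audio")` executes only `audio`, `captions`, and `complete`.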
Video models
Veo 3.1 Fast
Great results at the best price. Ideal for social media ads and product showcases.
Aspect ratios: 9:16, 16:9, Auto
Veo 3.1 Quality
Ultra-realistic visuals for premium brands. When every detail matters.
Aspect ratios: 9:16, 16:9, Auto
Kling 3.0 Standard
Flexible clip lengths with built-in sound effects. Perfect for dynamic content.
Aspect ratios: 9:16, 16:9, Auto, 1:1
Kling 3.0 Pro
Maximum detail with sound effects. For high-end campaigns that need to stand out.
Aspect ratios: 9:16, 16:9, Auto, 1:1
See the Models API for full details and capabilities.
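The aspect-ratio support listed above can be captured in a small lookup for client-side checks before a job is submitted. The keys below are the display names from this page; the identifiers the Models API actually uses are not shown here, so treat the keys and the `supports_ratio` helper as placeholders.

```python
# Aspect ratios per model, as listed on this page (keys are display names,
# not confirmed API identifiers).
SUPPORTED_ASPECT_RATIOS = {
    "Veo 3.1 Fast": {"9:16", "16:9", "Auto"},
    "Veo 3.1 Quality": {"9:16", "16:9", "Auto"},
    "Kling 3.0 Standard": {"9:16", "16:9", "Auto", "1:1"},
    "Kling 3.0 Pro": {"9:16", "16:9", "Auto", "1:1"},
}


def supports_ratio(model: str, ratio: str) -> bool:
    """Return True if the model lists the given aspect ratio."""
    return ratio in SUPPORTED_ASPECT_RATIOS.get(model, set())
```

For example, `supports_ratio("Kling 3.0 Pro", "1:1")` is true, while `supports_ratio("Veo 3.1 Fast", "1:1")` is false because only the Kling models list 1:1.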