Creator Canvas — Video
Storyboard a multi-scene AI video on a node graph. Add Scene nodes for b-roll, Avatar nodes for on-camera presenters, a Voiceover or Music sibling for audio, and a Result node that renders the final cut through the same Inngest pipeline Theo uses everywhere else. For the cross-mode overview see Creator Canvas.
Sibling production nodes
The Video tab gives you four production nodes that all flow into the same Result node. Scene nodes carry b-roll prompts; the three siblings below carry their own content + render lifecycle.
Avatar (Theo Anchor)
On-camera presenter — a lip-synced talking head that speaks a script you type. Each Avatar node carries its own script, avatar pick, and voice pick.
- Spokesperson and announcement shots
- Tutorial intros / outros with a human face
- Mixing presenter scenes with b-roll in the same deck
Voiceover
Narration track over b-roll. Pick a voice, type a script (or let Theo write one), preview the audio, and feed it into the Result node.
- Narrated explainers without an on-camera face
- Reusing a published podcast episode as the voiceover
- Pairing with multiple Scene nodes for a documentary cut
Music
Generated background score from Theo Score. Pick a preset or write a custom music prompt, then wire it into the Result node.
- Adding mood + pacing to a finished cut
- Brand-aware ambient tracks
- Replacing the default per-scene musical cues
Avatar node — Theo Anchor presenter
Drop an Avatar node from the toolbar to add a Theo Anchor presenter scene. Avatar nodes are siblings of Voiceover and Music — each carries its own script, avatar pick, voice pick, aspect ratio, motion prompt, expressiveness preset, and render lifecycle. Wire it into the Result node and it becomes one presenter scene in the final cut.
Script is required. The avatar reads your script verbatim. The visual prompt and reference image surfaces from the regular Scene node do not apply — use a Scene node when you want a b-roll shot instead.
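As a rough illustration of what an Avatar node carries, the sketch below models the fields listed above and the "script is required" rule. The class name, field names, and the `DEFAULT` sentinel are all hypothetical; nothing here reflects Theo's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT = "default"  # hypothetical sentinel for the stock presenter / voice


@dataclass
class AvatarNode:
    """Illustrative shape of an Avatar node (field names are assumptions)."""
    script: str
    avatar: str = DEFAULT
    voice: str = DEFAULT
    aspect_ratio: str = "16:9"
    motion_prompt: Optional[str] = None
    expressiveness: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the node can render."""
        problems = []
        # The avatar reads the script verbatim, so a blank script is an error.
        if not self.script.strip():
            problems.append("script is required")
        return problems
```

A node created with only a script keeps both picks on the default, for example `AvatarNode(script="Welcome to the launch.")`.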
Pick from the gallery. Click the presenter tile on the Avatar node to open a visual gallery. The picker lists every avatar your workspace has access to (filter chips narrow to Avatars, Talking photos, or All). Each tile shows a face preview and a Theo Anchor–compatible signal sourced from the upstream avatar catalog. When that signal is missing, the gallery stamps an amber “May not render” pill on the tile and raises a toast when you confirm the pick. You can still pick it (some avatars render successfully anyway), but if Theo Anchor returns an error at submission time the Avatar node surfaces a “try another presenter” hint. Leave the avatar or voice control on Default to fall back to the stock presenter or voice.
Browse voices visually. The voice tile opens a full gallery dialog with search across name, language, and gender, plus chip filters for language (long-tail languages collapse behind a +N more popover), gender, and an Emotive toggle that narrows to voices that support emotion tags. Each card has a play / pause button with an animated waveform — only one preview rolls at a time, and the footer context bar tells you which voice is playing or staged for selection. Hit Use this voice to commit your pick (Cancel leaves the node untouched).
Pick a backdrop. The Backdrop tile opens a popover with curated color presets, a custom color picker, and an image uploader. Drop a brand still (or pick from connected Asset nodes) to put the presenter in front of a custom backdrop. When both color and image are set, image wins.
Chain multiple presenters. Add several Avatar nodes for a multi-host deck. Each one picks its own avatar + voice, and the timeline strip orders them alongside Scene nodes when you drag the running order.
Fallback is automatic. If Theo Anchor is unavailable at render time, the chain falls back to a Theo Reel b-roll clip rendered from the same script so the deck still ships.
Scene engines (b-roll)
Every Scene node has an Engine pill row that controls how its b-roll clip renders. Theo Auto lets Theo's routing intelligence decide based on your prompt and uploaded image; the five brand-name engines lock the scene to a specific model. Avatar nodes do not use this pill: they always render through Theo Anchor.
Theo Auto
Recommended. Theo picks the best engine for each scene based on your prompt and uploaded reference image. Routes to Veo 3.1 for text-heavy content, Seedance 2.0 for illustrations, and LTX-2 for drafts.
- Mixed storyboards — different scenes get different engines
- When you’re not sure which engine fits
- Text-heavy reference images (screenshots, dashboards, charts)
Veo 3.1
Google’s flagship video engine. Best text rendering, dialogue, lip-sync, and physics in the catalog. Native audio. 4K capable.
- Screenshots, dashboards, charts, documents (best text fidelity)
- Dialogue, lip-sync, talking-head shots
- Calm, prompt-faithful narration shots
Sora 2
OpenAI Sora 2. Best-in-class physics simulation and longest single-clip durations (up to 20 seconds). Native audio + lip-sync.
- Natural human movement and physics-heavy shots
- Long takes up to 20 seconds
- Narrative continuity across a single shot
Seedance 2.0
ByteDance Seedance 2.0. Multi-shot stories from a single prompt with director-level @ mentions. Arena #2 ranked. 1080p + native audio.
- Illustrations, animation, and stylized motion
- Multi-shot sequences and camera transitions
- Director-style camera instructions baked into the prompt
LTX-2
Lightricks LTX-2. Open-weight 4K video at the cheapest premium tier ($0.06/sec at 1080p). Fast iteration without sacrificing resolution.
- Drafts and rapid iteration
- 4K renders on a budget
- Volume jobs where cost matters more than peak quality
Hailuo 02
MiniMax Hailuo 02. Best-in-class human characters with subtle micro-expressions and natural body language.
- Portraits and character-driven scenes
- Emotional shots with subtle facial movement
- Body language + acting beats
Resolution & quality
Each Scene node has a three-tier Draft / Standard / High picker that maps to the active engine’s native resolution menu. Draft is the fastest and cheapest option; High is the highest fidelity each engine ships. On engines where a tier isn’t available the option still stores your intent — switching to a different engine immediately surfaces the resolution you implicitly picked.
| Engine | Draft | Standard | High | Notes |
|---|---|---|---|---|
| Theo Auto | Router picks | Router picks | Router picks | The tier you choose is forwarded to whichever engine actually renders the scene. |
| Veo 3.1 | 720p | 720p | 1080p (HD) | |
| Sora 2 | n/a | 720p | n/a | Draft + High aren’t available on the standard Sora 2 endpoint; the Pro endpoint adds them. |
| Seedance 2.0 | 480p | 720p | 1080p (HD) | |
| LTX-2 | 1080p (HD) | 1440p (2K) | 2160p (4K) | High costs ~4× a Draft render; use it sparingly. |
| Hailuo 02 | 512P | 768P | n/a | High isn’t available on the standard Hailuo 02 endpoint; we surface 768P instead. |
Backwards compatibility. Persisted canvases that only carried the old Standard / High values continue to render exactly as before — the new Draft tier is opt-in for fresh picks.
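One way to picture how the tier "stores your intent" is to keep the tier on the scene and resolve it against the engine only when needed. The sketch below is a minimal illustration under that assumption: the `SceneNode` class and `RESOLUTIONS` lookup are hypothetical names, the values are copied from the table above, and the Hailuo 02 substitution of 768P for High is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Tier -> resolution per engine, copied from the table above.
# None marks a tier the standard endpoint does not offer.
# Theo Auto is omitted because the router picks the engine at render time.
RESOLUTIONS = {
    "Veo 3.1": {"draft": "720p", "standard": "720p", "high": "1080p"},
    "Sora 2": {"draft": None, "standard": "720p", "high": None},
    "Seedance 2.0": {"draft": "480p", "standard": "720p", "high": "1080p"},
    "LTX-2": {"draft": "1080p", "standard": "1440p", "high": "2160p"},
    "Hailuo 02": {"draft": "512P", "standard": "768P", "high": None},
}


@dataclass
class SceneNode:
    """Hypothetical scene state: the tier is stored independently of the engine."""
    engine: str
    tier: str = "standard"

    def resolution(self) -> Optional[str]:
        """Resolve the stored tier against the current engine's menu."""
        return RESOLUTIONS[self.engine][self.tier]
```

Because the tier lives on the scene rather than on the engine, a scene set to High on Sora 2 resolves to nothing there, yet switching its engine to LTX-2 resolves to 2160p without the user picking again.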
Reference images (image-to-video)
Every Scene node accepts an uploaded image as a render anchor. Drop a screenshot, dashboard, illustration, or photograph onto the dropzone and the renderer uses it as the first frame, so the motion stays faithful to your upload instead of inventing an unrelated visual.
Theo Auto routes based on your content. When you leave the engine on Theo Auto and attach a reference image, Theo routes deterministically based on what kind of image you uploaded:
- Screenshots, dashboards, charts, documents, photos → Veo 3.1. Best text rendering in the catalog (Vidguru benchmark: 5/5 vs Kling 2.6 1/5).
- Illustrations, animation frames, drawings → Seedance 2.0. Cinematic motion on stylized content; Arena #2 ranked.
- Prompt mentions “draft” / “quick” / “preview” → LTX-2. Cheapest premium tier for rapid iteration.
- Prompt mentions “multi-shot” / “sequence” / “scene transition” → Seedance 2.0. Multi-shot storytelling specialist.
We deliberately do not silently fall back to a different engine on a failure here. The previous fallback path (Kling 2.6 by default for Auto + image) produced renders that ignored uploads with text on them. If you want a different look, pick a specific engine on the pill row.
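The routing rules listed above can be sketched as a small deterministic function. This is only an illustration: the function name, the content-type labels, and the substring matching are hypothetical, and the source does not say which rule wins when several match, so checking prompt keywords before the image type is an assumption.

```python
from typing import Optional

# Hypothetical labels for the kind of image that was uploaded.
DOCUMENT_LIKE = {"screenshot", "dashboard", "chart", "document", "photo"}
STYLIZED = {"illustration", "animation_frame", "drawing"}

DRAFT_WORDS = ("draft", "quick", "preview")
MULTI_SHOT_WORDS = ("multi-shot", "sequence", "scene transition")


def route_auto(prompt: str, image_kind: Optional[str]) -> Optional[str]:
    """Pick an engine for a Theo Auto scene, or None when no rule matches."""
    text = prompt.lower()
    # Assumption: prompt keywords take precedence over the image type.
    if any(word in text for word in DRAFT_WORDS):
        return "LTX-2"
    if any(word in text for word in MULTI_SHOT_WORDS):
        return "Seedance 2.0"
    if image_kind in DOCUMENT_LIKE:
        return "Veo 3.1"
    if image_kind in STYLIZED:
        return "Seedance 2.0"
    # The source does not describe this case, so the sketch reports no match
    # instead of guessing at a default engine.
    return None
```

The same inputs always produce the same engine, which is the property the text describes as deterministic routing.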
Per-engine behaviour. Each engine treats the reference frame slightly differently:
- Veo 3.1 uses your image as the first frame and animates outward — highest fidelity to text and detail.
- Sora 2 anchors physically accurate motion on your image — great for natural movement and long takes.
- Seedance 2.0 drives multi-shot sequences from your image — best for illustrations, animation, and dynamic motion.
- LTX-2 renders fast, high-resolution iterations from your image — best for drafts and budget-friendly clips.
- Hailuo 02 brings people in your image to life with micro-expressions — best for portraits and character work.
The renderer matches your reference image to the scene’s aspect ratio (16:9 by default, 9:16 for vertical). When your image already matches the target within ±5% the bytes are preserved verbatim. When the aspect differs, the renderer center-crops to fit and tells you what happened: the “Anchored on your image” pill appends either “Cropped to match your aspect ratio” or “No crop needed — your aspect ratio already matched.”
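The aspect check and center-crop described above can be worked through with a little arithmetic. The sketch below assumes the ±5% tolerance is measured as the relative difference between the image's ratio and the target ratio; the source does not define it precisely, and the function name is hypothetical.

```python
from typing import Tuple


def center_crop_box(
    width: int, height: int, target_ratio: float, tolerance: float = 0.05
) -> Tuple[int, int, int, int, bool]:
    """Return (left, top, crop_width, crop_height, cropped) for a reference image."""
    ratio = width / height
    # Within tolerance: keep the image untouched.
    if abs(ratio - target_ratio) / target_ratio <= tolerance:
        return (0, 0, width, height, False)
    if ratio > target_ratio:
        # Too wide: keep the full height and trim both sides equally.
        crop_w = round(height * target_ratio)
        left = (width - crop_w) // 2
        return (left, 0, crop_w, height, True)
    # Too tall: keep the full width and trim top and bottom equally.
    crop_h = round(width / target_ratio)
    top = (height - crop_h) // 2
    return (0, top, width, crop_h, True)
```

For a square 1600×1600 upload on a 16:9 scene, this keeps the full width and a 900-pixel band from the middle. The final boolean is the piece of information the pill would need to choose between its two messages.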
Detected: … pill (beta). When the content-classifier feature is on for your workspace, Theo runs a quick vision pass on your upload and shows the detected content type (dashboard screenshot, photograph, illustration, etc.). It’s informational today — future iterations may use it to drive smarter routing.
If the anchor fails to load, the Scene node surfaces a clear “Anchor failed” pill with the reason — we never silently substitute a fabricated frame, so you always see why a render didn’t match your image.
Enabling Theo Anchor
The Avatar node is opt-in per workspace. When the underlying service isn't configured, the avatar / voice picker shows a locked hint pointing operators at the environment variable documented in the deployment guide. Setting that key on the server unlocks the node for every user without restarting the canvas.
Scene nodes (Theo Auto + every brand-name engine) and the Voiceover / Music nodes all keep working when Theo Anchor is unavailable — only the Avatar node is gated.
After a render lands, the Scene node surfaces a small badge telling you which engine actually shipped the clip:
- Theo Auto · rendered with Veo 3.1 when you left the engine on Theo Auto.
- Theo · rendered with Sora 2 when you explicitly picked the engine.
- On a rare fallback, an amber Theo · you picked Sora 2 — finished on Veo 3.1 pill shows up with a one-click Retry button so you can re-render on your original engine pick.
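The three badge cases above reduce to a comparison between the engine you picked and the engine that rendered. A minimal sketch, assuming a hypothetical `render_badge` helper in which `picked` is `None` when the scene was left on Theo Auto:

```python
from typing import Optional, Tuple


def render_badge(picked: Optional[str], rendered: str) -> Tuple[str, bool]:
    """Return (badge_kind, show_retry) for a finished scene render."""
    if picked is None:
        # Engine left on Theo Auto: report what the router chose.
        return ("auto", False)
    if picked == rendered:
        # Explicit pick that rendered as requested.
        return ("explicit", False)
    # Explicit pick that finished on another engine: amber pill plus Retry.
    return ("fallback", True)
```

Only the fallback case carries the Retry action, since it is the only one where the result differs from the original pick.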