Creator Canvas — Video

Storyboard a multi-scene AI video on a node graph. Add Scene nodes for b-roll, Avatar nodes for on-camera presenters, a Voiceover or Music node for audio, and a Result node that renders the final cut through the same Inngest pipeline Theo uses everywhere else. For the cross-mode overview, see Creator Canvas.

Sibling production nodes

The Video tab gives you four production nodes, all of which flow into the same Result node. Scene nodes carry b-roll prompts; the three siblings below each carry their own content and render lifecycle.

Avatar (Theo Anchor)

On-camera presenter — a lip-synced talking head that speaks a script you type. Each Avatar node carries its own script, avatar pick, and voice pick.

Spokesperson and announcement shots
Tutorial intros / outros with a human face
Mixing presenter scenes with b-roll in the same deck

Voiceover

Narration track over b-roll. Pick a voice, type a script (or let Theo write one), preview the audio, and feed it into the Result node.

Narrated explainers without an on-camera face
Reusing a published podcast episode as the voiceover
Pairing with multiple Scene nodes for a documentary cut

Music

Generated background score from Theo Score. Pick a preset or write a custom music prompt, then wire it into the Result node.

Adding mood + pacing to a finished cut
Brand-aware ambient tracks
Replacing the default per-scene musical cues

Avatar node — Theo Anchor presenter

Drop an Avatar node from the toolbar to add a Theo Anchor presenter scene. Avatar nodes are siblings of Voiceover and Music — each carries its own script, avatar pick, voice pick, aspect ratio, motion prompt, expressiveness preset, and render lifecycle. Wire it into the Result node and it becomes one presenter scene in the final cut.

Script is required. The avatar reads your script verbatim. The visual prompt and reference image fields from the regular Scene node do not apply here; use a Scene node when you want a b-roll shot instead.

Pick from the gallery. Click the presenter tile on the Avatar node to open a visual gallery. The picker lists every avatar your workspace has access to (filter chips narrow the list to Avatars, Talking photos, or All). Each tile shows a face preview and a Theo Anchor compatibility signal sourced from the upstream avatar catalog. When that signal is missing, the gallery stamps an amber “May not render” pill on the tile and raises a toast when you confirm the pick. You can still choose it (some avatars render successfully anyway), but if Theo Anchor returns an error at submission time, the Avatar node surfaces a “try another presenter” hint. Leave the presenter or voice control on Default to fall back to the stock presenter or voice.

Browse voices visually. The voice tile opens a full gallery dialog with search across name, language, and gender, plus chip filters for language (long-tail languages collapse behind a +N more popover), gender, and an Emotive toggle that narrows to voices that support emotion tags. Each card has a play / pause button with an animated waveform — only one preview rolls at a time, and the footer context bar tells you which voice is playing or staged for selection. Hit Use this voice to commit your pick (Cancel leaves the node untouched).
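To make the search and chip behaviour concrete, here is a minimal sketch of how the filters could narrow the voice list. It assumes every active filter must match (AND semantics) and that each voice exposes a name, language, gender, and emotive flag; `Voice` and `filter_voices` are hypothetical names for illustration, not Theo’s API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Voice:
    # Hypothetical fields mirroring what the gallery lets you filter on.
    name: str
    language: str
    gender: str
    emotive: bool  # True if the voice supports emotion tags


def filter_voices(
    voices: List[Voice],
    query: str = "",
    language: Optional[str] = None,
    gender: Optional[str] = None,
    emotive_only: bool = False,
) -> List[Voice]:
    """Keep voices that pass every active filter (assumed AND semantics)."""
    q = query.strip().lower()
    matches = []
    for v in voices:
        # Free-text search looks at name, language, and gender.
        if q and not any(q in field.lower() for field in (v.name, v.language, v.gender)):
            continue
        # Chip filters narrow by exact language / gender.
        if language is not None and v.language != language:
            continue
        if gender is not None and v.gender != gender:
            continue
        # The Emotive toggle keeps only voices that support emotion tags.
        if emotive_only and not v.emotive:
            continue
        matches.append(v)
    return matches


catalog = [
    Voice("Ava", "English", "Female", True),
    Voice("Bruno", "Portuguese", "Male", False),
    Voice("Chen", "Mandarin", "Male", True),
]
print([v.name for v in filter_voices(catalog, gender="Male", emotive_only=True)])  # ['Chen']
```

A plain substring search like this also means a query of “male” matches “Female”, which is one reason chip filters for gender are useful alongside free-text search.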

Pick a backdrop. The Backdrop tile opens a popover with curated color presets, a custom color picker, and an image uploader. Drop a brand still (or pick from connected Asset nodes) to put the presenter in front of a custom backdrop. When both color and image are set, image wins.
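The precedence rule (image beats color) fits in a few lines. This is a hypothetical sketch, assuming the node stores an optional color and an optional image reference; `resolve_backdrop` is an illustrative name, not part of Theo.

```python
from typing import Optional, Tuple


def resolve_backdrop(color: Optional[str], image: Optional[str]) -> Optional[Tuple[str, str]]:
    """Pick the backdrop to render; an image always wins over a color."""
    if image:
        return ("image", image)
    if color:
        return ("color", color)
    return None  # no custom backdrop set


print(resolve_backdrop("#0A2540", "brand-still.png"))  # ('image', 'brand-still.png')
```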

Chain multiple presenters. Add several Avatar nodes for a multi-host deck. Each one picks its own avatar and voice, and you can drag them into running order alongside Scene nodes on the timeline strip.

Fallback is automatic. If Theo Anchor is unavailable at render time, the chain falls back to a Theo Reel b-roll clip rendered from the same script so the deck still ships.
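The fallback chain can be pictured as a try-then-degrade step. The sketch below is illustrative only: it assumes the presenter renderer signals an outage with an exception, and `AnchorUnavailable`, `render_presenter_scene`, and the two render callables are hypothetical stand-ins for Theo Anchor and Theo Reel.

```python
from typing import Callable, Dict


class AnchorUnavailable(Exception):
    """Hypothetical error raised when Theo Anchor can't take the job."""


def render_presenter_scene(
    script: str,
    render_anchor: Callable[[str], str],
    render_reel: Callable[[str], str],
) -> Dict[str, str]:
    """Render with Theo Anchor; on an outage, fall back to Theo Reel b-roll."""
    try:
        return {"engine": "Theo Anchor", "clip": render_anchor(script)}
    except AnchorUnavailable:
        # Reuse the same script so the deck still ships.
        return {"engine": "Theo Reel", "clip": render_reel(script)}


def anchor_down(script: str) -> str:
    # Simulates Theo Anchor being unavailable at render time.
    raise AnchorUnavailable("service unavailable")


result = render_presenter_scene("Welcome to Q3.", anchor_down, lambda s: f"reel:{s}")
print(result)  # {'engine': 'Theo Reel', 'clip': 'reel:Welcome to Q3.'}
```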

Scene engines (b-roll)

Every Scene node has an Engine pill row that controls how its b-roll clip renders. Theo Auto lets Theo's routing intelligence decide based on your prompt and uploaded image; the five brand-name engines lock the scene to a specific model. Avatar nodes do not use this pill row; they always render through Theo Anchor.

Theo Auto

Recommended

Theo picks the best engine for each scene based on your prompt and uploaded reference image. It routes text-heavy content to Veo 3.1, illustrations to Seedance 2.0, and drafts to LTX-2.

Mixed storyboards — different scenes get different engines
When you’re not sure which engine fits
Text-heavy reference images (screenshots, dashboards, charts)

Veo 3.1

Google’s flagship video engine. Best text rendering, dialogue, lip-sync, and physics in the catalog. Native audio. 4K capable.

Screenshots, dashboards, charts, documents (best text fidelity)
Dialogue, lip-sync, talking-head shots
Calm, prompt-faithful narration shots

Sora 2

OpenAI Sora 2. Best-in-class physics simulation and longest single-clip durations (up to 20 seconds). Native audio + lip-sync.

Natural human movement and physics-heavy shots
Long takes up to 20 seconds
Narrative continuity across a single shot

Seedance 2.0

ByteDance Seedance 2.0. Multi-shot stories from a single prompt with director-level @ mentions. Ranked #2 on Arena. 1080p + native audio.

Illustrations, animation, and stylized motion
Multi-shot sequences and camera transitions
Director-style camera instructions baked into the prompt

LTX-2

Lightricks LTX-2. Open-weight 4K video at the cheapest premium tier ($0.06/sec at 1080p). Fast iteration without sacrificing resolution.

Drafts and rapid iteration
4K renders on a budget
Volume jobs where cost matters more than peak quality

Hailuo 02

MiniMax Hailuo 02. Best-in-class human characters with subtle micro-expressions and natural body language.

Portraits and character-driven scenes
Emotional shots with subtle facial movement
Body language + acting beats

Resolution & quality

Each Scene node has a three-tier Draft / Standard / High picker that maps to the active engine’s native resolution menu. Draft is the fastest and cheapest option; High is the highest fidelity each engine offers. On engines where a tier isn’t available, the picker still stores your choice, so switching to an engine that supports that tier applies the matching resolution immediately.

Engine	Draft	Standard	High
Theo Auto	Router picks	Router picks	Router picks
Veo 3.1	720p	720p	1080p (HD)
Sora 2	n/a	720p	n/a
Seedance 2.0	480p	720p	1080p (HD)
LTX-2	1080p (HD)	1440p (2K)	2160p (4K)
Hailuo 02	512P	768P	n/a

Theo Auto: the tier you choose is forwarded to whichever engine actually renders the scene.
Sora 2: Draft and High aren’t available on the standard Sora 2 endpoint; the Pro endpoint adds them.
LTX-2: High costs about 4× a Draft render, so use it sparingly.
Hailuo 02: High isn’t available on the standard Hailuo 02 endpoint; we surface 768P instead.

Backwards compatibility. Persisted canvases that only carried the old Standard / High values continue to render exactly as before — the new Draft tier is opt-in for fresh picks.
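One way to picture “the picker stores your intent” is a node that keeps the tier as data and resolves it against the current engine on demand. This is a minimal sketch, assuming the resolution menu is a plain lookup table; `RESOLUTIONS`, `SceneNode`, and the Standard default are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Native resolution per tier, transcribed from the table above.
# None marks a tier the engine's standard endpoint doesn't offer
# (for Hailuo 02 High, the table note says 768P is surfaced instead).
# Theo Auto is omitted: the router forwards the tier to whichever engine renders.
RESOLUTIONS: Dict[str, Dict[str, Optional[str]]] = {
    "Veo 3.1": {"draft": "720p", "standard": "720p", "high": "1080p"},
    "Sora 2": {"draft": None, "standard": "720p", "high": None},
    "Seedance 2.0": {"draft": "480p", "standard": "720p", "high": "1080p"},
    "LTX-2": {"draft": "1080p", "standard": "1440p", "high": "2160p"},
    "Hailuo 02": {"draft": "512P", "standard": "768P", "high": None},
}


@dataclass
class SceneNode:
    engine: str
    tier: str = "standard"  # stored intent; kept even if the engine lacks the tier

    def resolution(self) -> Optional[str]:
        return RESOLUTIONS[self.engine][self.tier]


scene = SceneNode(engine="Sora 2", tier="high")
print(scene.resolution())  # None: High isn't on the standard Sora 2 endpoint
scene.engine = "LTX-2"
print(scene.resolution())  # 2160p: the stored High intent resurfaces
```

Because older canvases only stored Standard or High, those keys resolve exactly as before; Draft only appears when someone picks it.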

Reference images (image-to-video)

Anchoring a scene on your image

Every Scene node accepts an uploaded image as a render anchor. Drop a screenshot, dashboard, illustration, or photograph onto the dropzone and the renderer uses it as the first frame, so the motion stays faithful to your upload instead of inventing an unrelated visual.

Theo Auto routes based on your content. When you leave the engine on Theo Auto and attach a reference image, Theo routes deterministically based on what kind of image you uploaded:

Screenshots, dashboards, charts, documents, photos → Veo 3.1. Best text rendering in the catalog (Vidguru benchmark: 5/5 vs Kling 2.6 1/5).
Illustrations, animation frames, drawings → Seedance 2.0. Cinematic motion on stylized content; ranked #2 on Arena.
Prompt mentions “draft” / “quick” / “preview” → LTX-2. Cheapest premium tier for rapid iteration.
Prompt mentions “multi-shot” / “sequence” / “scene transition” → Seedance 2.0. Multi-shot storytelling specialist.
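The four rules above are deterministic, so they can be written as a small decision function. The sketch assumes the image has already been labeled with a content kind, and it checks prompt keywords before image kind. That ordering, the label strings, and `route_auto` itself are assumptions for illustration; the article does not say which rule wins when several match.

```python
import re
from typing import Optional

# Hypothetical labels for the kind of reference image that was uploaded.
TEXT_HEAVY = {"screenshot", "dashboard", "chart", "document", "photo"}
STYLIZED = {"illustration", "animation", "drawing"}

# Prompt keywords from the routing rules, matched as whole words.
DRAFT_HINTS = re.compile(r"\b(draft|quick|preview)\b", re.IGNORECASE)
MULTI_SHOT_HINTS = re.compile(r"\b(multi-shot|sequence|scene transition)\b", re.IGNORECASE)


def route_auto(prompt: str, image_kind: Optional[str]) -> Optional[str]:
    """Pick an engine for a Theo Auto scene. Rule order is an assumption."""
    if DRAFT_HINTS.search(prompt):
        return "LTX-2"
    if MULTI_SHOT_HINTS.search(prompt):
        return "Seedance 2.0"
    if image_kind in TEXT_HEAVY:
        return "Veo 3.1"
    if image_kind in STYLIZED:
        return "Seedance 2.0"
    return None  # no rule matched; this case isn't documented


print(route_auto("Slow pan across the revenue chart", "chart"))  # Veo 3.1
```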

Theo deliberately does not silently fall back to a different engine when this routing fails. The previous fallback path (Kling 2.6 by default for Auto with an image) produced renders that ignored uploads containing text. If you want a different look, pick a specific engine on the pill row.

Per-engine behaviour. Each engine treats the reference frame slightly differently:

Veo 3.1 uses your image as the first frame and animates outward — highest fidelity to text and detail.
Sora 2 anchors physically accurate motion on your image — great for natural movement and long takes.
Seedance 2.0 drives multi-shot sequences from your image — best for illustrations, animation, and dynamic motion.
LTX-2 renders fast, high-resolution iterations from your image — best for drafts and budget-friendly clips.
Hailuo 02 brings people in your image to life with micro-expressions — best for portraits and character work.

Aspect-ratio handling

The renderer matches your reference image to the scene’s aspect ratio (16:9 by default, 9:16 for vertical). When your image already matches the target within ±5%, its bytes are passed through unchanged. When the aspect ratio differs by more than that, the renderer center-crops to fit and tells you what happened: the “Anchored on your image” pill appends either “Cropped to match your aspect ratio” or “No crop needed — your aspect ratio already matched.”
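The tolerance check and center crop can be sketched as follows. This assumes “within ±5%” means the relative difference between the image’s width/height ratio and the target ratio; `center_crop_box` is a hypothetical helper that returns a pixel box rather than touching image bytes.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def center_crop_box(
    width: int, height: int, target_w: int = 16, target_h: int = 9, tolerance: float = 0.05
) -> Optional[Box]:
    """Return a centered crop box, or None when the image is within tolerance."""
    target = target_w / target_h
    actual = width / height
    # Assumed reading of "within ±5%": relative difference of the two ratios.
    if abs(actual - target) / target <= tolerance:
        return None  # keep the original bytes untouched
    if actual > target:
        # Too wide: trim equal strips from the left and right.
        new_w = round(height * target)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Too tall: trim equal strips from the top and bottom.
    new_h = round(width / target)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(center_crop_box(1920, 1200))  # (0, 60, 1920, 1140): 16:10 cropped to 16:9
```

A `None` result corresponds to the “No crop needed” pill text; any box corresponds to “Cropped to match your aspect ratio”.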

Detected: … pill (beta). When the content-classifier feature is on for your workspace, Theo runs a quick vision pass on your upload and shows the detected content type (dashboard screenshot, photograph, illustration, etc.). It’s informational today — future iterations may use it to drive smarter routing.

If the anchor fails to load, the Scene node surfaces a clear “Anchor failed” pill with the reason — we never silently substitute a fabricated frame, so you always see why a render didn’t match your image.

Enabling Theo Anchor

Operator setup

The Avatar node is opt-in per workspace. When the underlying service isn't configured, the avatar / voice picker shows a locked hint pointing operators at the environment variable documented in the deployment guide. Setting that key on the server unlocks the node for every user without restarting the canvas.

Scene nodes (Theo Auto + every brand-name engine) and the Voiceover / Music nodes all keep working when Theo Anchor is unavailable — only the Avatar node is gated.

Post-render engine badge

After a render lands, the Scene node surfaces a small badge telling you which engine actually shipped the clip:

Theo Auto · rendered with Veo 3.1: shown when you left the engine on Theo Auto.
Theo · rendered with Sora 2: shown when you explicitly picked the engine.
On a rare fallback, an amber “Theo · you picked Sora 2 — finished on Veo 3.1” pill appears with a one-click Retry button so you can re-render on your original engine pick.
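The three badge states depend only on which engine you picked and which one shipped. A hypothetical sketch, assuming `picked=None` stands for Theo Auto; `EngineBadge` and `engine_badge` are illustrative names, and the label strings copy the UI text quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineBadge:
    text: str
    amber: bool = False  # fallback styling
    retry: bool = False  # offer one-click Retry on the original pick


def engine_badge(picked: Optional[str], rendered: str) -> EngineBadge:
    """Build the post-render badge; picked=None means the scene was on Theo Auto."""
    if picked is None:
        return EngineBadge(f"Theo Auto · rendered with {rendered}")
    if picked == rendered:
        return EngineBadge(f"Theo · rendered with {rendered}")
    # Rare fallback: the engine that shipped differs from the one you picked.
    return EngineBadge(f"Theo · you picked {picked} — finished on {rendered}", amber=True, retry=True)


print(engine_badge("Sora 2", "Veo 3.1").retry)  # True
```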

Need a quick voice-only run instead of a talking head? Drop a Voiceover node alongside a few Scene nodes — Theo will narrate over the b-roll without touching the Avatar surface.
