Category: Content & Media. No API key required.

create-movie

Orchestrated movie creation for Horus persona. Guides through phases: Research → Script → Build Tools → Generate → Assemble. Uses Docker-isolated coding environment, free/open-source tools only, with full memory integration.

Author: jakexiaohub (GitHub)

Orchestrated movie creation for Horus persona. Creates mockumentaries, short films, music videos, and educational content through a phased workflow.

Philosophy

"AI isn't the artist, it's the amplifier" - Nobody & The Computer

Horus uses AI to turn imagination into audiovisual reality. He doesn't just use pre-built tools - he writes code to create his own tools.

Phases

HARDWARE CHECK → RESEARCH → SCRIPT → BUILD TOOLS → GENERATE → ASSEMBLE → LEARN

Phase 0: Hardware Detection (Automatic)

Before any generation, the orchestrator automatically detects hardware via /ops-workstation:

# Automatic hardware check on startup
./run.sh create "prompt"
# → Calls /ops-workstation gpu to detect VRAM
# → Calls /ops-workstation memory to detect RAM
# → Auto-selects optimal model variant

Auto-Selection Logic:

| Detected VRAM | Model Selected | Settings |
|---------------|----------------|----------|
| ≥24GB | LTX-2 19B FP8 | 720p/1080p, audio on, batch=1 |
| 16-23GB | LTX-2 19B FP4 | 720p only, audio on, batch=1 |
| 12-15GB | LTX-2 Distilled 2B | 720p, audio optional, batch=1 |
| <12GB | RunPod suggested | Prompts to use /ops-runpod |

RAM-Based Optimizations:

| Detected RAM | Optimization |
|--------------|--------------|
| ≥128GB | Weight streaming enabled (offload to RAM) |
| 64-127GB | Partial offloading |
| <64GB | No offloading, strict VRAM limits |
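The two tables above amount to a pair of threshold lookups. The following is a minimal sketch of that logic, not the orchestrator's actual code: the `Selection` type and `select_model` function are hypothetical names, and fractional readings (for example 23.5GB) are assumed to fall into the lower tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    model: str     # video model variant, or the RunPod suggestion
    settings: str  # generation settings from the VRAM table
    offload: str   # optimization from the RAM table


def select_model(vram_gb: float, ram_gb: float) -> Selection:
    """Map detected VRAM and RAM (in GB) onto the documented tiers."""
    if vram_gb >= 24:
        model, settings = "LTX-2 19B FP8", "720p/1080p, audio on, batch=1"
    elif vram_gb >= 16:
        model, settings = "LTX-2 19B FP4", "720p only, audio on, batch=1"
    elif vram_gb >= 12:
        model, settings = "LTX-2 Distilled 2B", "720p, audio optional, batch=1"
    else:
        model, settings = "RunPod suggested", "prompts to use /ops-runpod"

    if ram_gb >= 128:
        offload = "weight streaming"
    elif ram_gb >= 64:
        offload = "partial offloading"
    else:
        offload = "no offloading"
    return Selection(model, settings, offload)


if __name__ == "__main__":
    # Example: a 24GB GPU in a workstation with 64GB of system RAM.
    print(select_model(24, 64))
```

Keeping the thresholds in one function makes it easy to compare against what `--model` overrides would bypass.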

Override Auto-Detection:

# Force specific model variant
./run.sh create "prompt" --model ltx2-fp4
./run.sh create "prompt" --model ltx2-distilled
./run.sh create "prompt" --runpod  # Force cloud generation

Phase 1: Research (Library-First)

  1. Check Horus's Library First:
    • horus-filmmaking scope (past techniques, learnings)
    • horus_lore scope (YouTube transcripts, film analysis)
    • Ingested movies with emotion tags
    • Episodic archive (past filmmaking sessions)
  2. Search for New Resources:
    • /ingest-movie search for films to watch
    • /ingest-youtube search for tutorials
  3. Deep Web Research:
    • /dogpile for comprehensive multi-source search
    • /surf for specific tutorials/references

Phase 2: Script (via /create-story)

  • Integrates with /create-story skill for screenplay generation
  • Uses Chutes models (chimera, qwen, deepseek-r1) for creative writing
  • Parses INT./EXT. headings, dialogue, action, audio cues
  • Outputs structured scene breakdown with visual descriptions
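As an illustration of the INT./EXT. parsing step, the sketch below splits a plain-text screenplay into scenes at each scene heading. It is an assumption about how such a parser could look, not the skill's implementation; `SLUG` and `split_scenes` are hypothetical names, and dialogue, action, and audio cues are simply collected as raw lines.

```python
import re

# Scene heading: INT. or EXT., a location, then an optional " - TIME" suffix.
SLUG = re.compile(r"^(INT|EXT)\.\s+(.+?)(?:\s+-\s+(.+))?$")


def split_scenes(screenplay: str) -> list[dict]:
    """Group screenplay lines under the scene heading that precedes them."""
    scenes: list[dict] = []
    for raw in screenplay.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content
        match = SLUG.match(line)
        if match:
            scenes.append({
                "setting": match.group(1),   # INT or EXT
                "location": match.group(2),
                "time": match.group(3),      # None when the heading has no time
                "lines": [],
            })
        elif scenes:
            # Anything after a heading belongs to the most recent scene.
            scenes[-1]["lines"].append(line)
    return scenes


if __name__ == "__main__":
    sample = "INT. STUDIO - NIGHT\nThe robot lifts a brush.\n\nEXT. ROOFTOP\nROBOT\nBlue."
    for scene in split_scenes(sample):
        print(scene)
```

Lines that appear before the first heading are dropped in this sketch; a fuller parser would likely keep them as a title block.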

Format Options:

  • screenplay (default) - Standard INT./EXT. scene headings
  • mockumentary - Interview segments with talking heads + B-roll
  • reconstruction - Historical recreation with narrator framing

Phase 3: Build Tools

  • Write code in Docker-isolated sandbox
  • Create custom tools for specific effects
  • Iterate on approaches

Phase 4: Generate

  • Use ComfyUI, Stable Diffusion for images
  • Use auto-selected video model based on hardware (LTX-2 FP8/FP4/Distilled)
  • Use Whisper, IndexTTS2 for audio
  • If the hardware is insufficient, the orchestrator automatically suggests /ops-runpod

Phase 5: Assemble

  • Combine assets with FFmpeg
  • Output MP4 video or interactive HTML
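One common way to combine clips with FFmpeg is its concat demuxer, which reads a list file and can join clips without re-encoding. The sketch below only builds the list file body and the argument list. The helper names are hypothetical, the skill may assemble differently, and stream copy (`-c copy`) assumes all clips share codec and encoding parameters.

```python
from pathlib import Path


def concat_list(clips: list[str]) -> str:
    """Body of a concat-demuxer list file: one `file '<path>'` line per clip."""
    # Paths containing single quotes would need escaping; not handled here.
    return "".join(f"file '{Path(clip).as_posix()}'\n" for clip in clips)


def ffmpeg_concat_cmd(list_file: str, output: str) -> list[str]:
    """Argument list that joins the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # accept paths outside the list file's directory
        "-i", list_file,
        "-c", "copy",     # stream copy: no re-encode
        output,
    ]


if __name__ == "__main__":
    print(concat_list(["assets/scene_01.mp4", "assets/scene_02.mp4"]), end="")
    print(" ".join(ffmpeg_concat_cmd("clips.txt", "movie.mp4")))
```

To run it, write the list body to `clips.txt` and pass the argument list to `subprocess.run(cmd, check=True)`.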

Phase 6: Learn

  • Store successful techniques in /memory
  • Remember what worked for future movies

Quick Start

cd .pi/skills/create-movie

# Full orchestrated workflow (recommended)
./run.sh create "A 30-second film about discovering colors"

# With options
./run.sh create "film noir detective" \
    --duration 60 \
    --style "high contrast, shadows, venetian blinds" \
    --format mp4 \
    --work-dir ./noir_project

# Individual phases (for manual control)
./run.sh research "film noir lighting techniques"
./run.sh script --from-research research.json --duration 30 --use-create-story
./run.sh build-tools --script script.json
./run.sh generate --tools ./tools --script script.json --style "cinematic"
./run.sh assemble --assets ./assets --output movie.mp4 --format mp4
./run.sh learn --project-dir ./movie_project

CLI Commands

create

Full orchestrated workflow through all phases.

./run.sh create PROMPT [OPTIONS]
  --output, -o       Output file (default: movie.mp4)
  --work-dir, -w     Working directory (default: ./movie_project)
  --duration, -d     Target duration in seconds (default: 30)
  --style, -s        Visual style (e.g., 'cinematic', 'film noir')
  --format, -f       Output format: mp4 or html (default: mp4)
  --store-learnings  Store learnings in memory (default: true)
  --skip-research    Skip research phase if research.json exists

research

Library-first research: checks Horus's memory and ingested content before external search.

./run.sh research TOPIC [OPTIONS]
  --output, -o       Output file (default: research.json)
  --skip-external    Only search library, skip external sources

script

Generate screenplay with scene breakdown. Integrates with /create-story.

./run.sh script [OPTIONS]
  --from-research, -r  Research JSON file (required)
  --prompt, -p         Override topic from research
  --duration, -d       Target duration in seconds
  --use-create-story   Use /create-story skill for screenplay
  --model, -m          LLM model (default: chimera)
  --output, -o         Output file (default: script.json)

build-tools

Generate custom tools in Docker sandbox.

./run.sh build-tools [OPTIONS]
  --script, -s       Script JSON file (required)
  --output-dir, -o   Output directory (default: ./tools)
  --skip-docker      Use host instead of Docker sandbox

generate

Create images, video, and audio assets.

./run.sh generate [OPTIONS]
  --tools, -t        Tools directory (default: ./tools)
  --script, -s       Script JSON file (required)
  --output-dir, -o   Assets output directory (default: ./assets)
  --style            Visual style to apply

assemble

Combine assets into final output.

./run.sh assemble [OPTIONS]
  --assets, -a       Assets directory (required)
  --output, -o       Output file/directory (required)
  --format, -f       Output format: mp4 or html (default: mp4)
  --fps              Frames per second for MP4 (default: 24)

learn

Store filmmaking insights in memory after a project.

./run.sh learn [OPTIONS]
  --project-dir, -p  Project directory (required)
  --scope            Memory scope (default: horus-filmmaking)
  --dry-run          Show learnings without storing

study

Pre-phase: learn filmmaking topics before creating movies. Runs a targeted /dogpile search across internal (memory) and external (web) sources, then stores the results via /memory learn.

./run.sh study TOPIC [OPTIONS]
  --scope            Memory scope (default: horus-filmmaking)
  --deep/--quick     Deep research (dogpile) vs quick (YouTube search)
  --list-topics      Show suggested filmmaking topics

# Examples:
./run.sh study "cinematography lighting techniques" --deep
./run.sh study "camera framing composition" --deep
./run.sh study --list-topics

study-all

Comprehensive learning session - studies all core filmmaking topics.

./run.sh study-all [OPTIONS]
  --scope            Memory scope (default: horus-filmmaking)

Output Formats

MP4 Video

Standard video file, playable anywhere.

Interactive HTML

Web-based experience with:

  • Frame-by-frame navigation
  • Audio controls
  • Scene metadata viewer

Available Skills

Horus has access to all skills in .pi/skills/:

| Skill | Purpose in Movie Creation |
|-------|---------------------------|
| /dogpile | Deep research on techniques, references |
| /surf | Visit websites, tutorials, references |
| /memory | Recall prior techniques, store learnings |
| /create-image | Generate images for scenes |
| /tts-train | Horus's voice for narration |
| /ingest-movie | Ingest reference movies for style analysis |
| /create-paper | Write stories, scripts, creative content |
| /episodic-archiver | Archive movie creation sessions |
| /anvil | Debug and harden custom tools |
| /ingest-book | Search books for story inspiration |

Free/Open-Source Tools

| Purpose | Tool |
|---------|------|
| Image Generation | Stable Diffusion (ComfyUI) |
| Video Generation | LTX-2 (recommended), Mochi 1, CogVideoX (fallbacks) |
| Video Processing | FFmpeg |
| Speech-to-Text | faster-whisper |
| Text-to-Speech | IndexTTS2 |

Video Model Selection Guide

Choose a video model based on your GPU VRAM and use case. VRAM figures include 3-5GB of headroom for pipeline overhead (ComfyUI, loader, audio) and assume batch=1, with FP8/FP4 where noted.

| VRAM | Recommended Models | Best For |
|------|-------------------|----------|
| 12GB (RTX 3060/4070) | LTX-2 Distilled (2B), CogVideoX-2B | Quick iterations, pre-viz |
| 16GB (RTX 4080/A4000) | LTX-2 19B FP4 (720p, ≤10s), WAN 2.2, SVD | Medium quality production |
| 24GB (RTX 4090/A5000) | LTX-2 19B FP8 (recommended), WAN 2.2, Mochi | High quality production |
| 40GB+ (A100/H100) | LTX-2 BF16 (43GB), Full Mochi, Open-Sora 2.0 | Maximum quality |

Safe Defaults (RTX A5000 24GB)

Model: LTX-2 19B FP8
Resolution: 720p
Clip length: 10s
Batch size: 1
Seed: fixed
Audio: on

If runtime VRAM usage exceeds 22GB or instability occurs, lower the resolution to 540p, disable audio, or shorten clips. Avoid parallel jobs on 24GB.

Model Characteristics

| Model | Speed | Quality | Audio | Best Use Case |
|-------|-------|---------|-------|---------------|
| LTX-2 19B FP8 ⭐ | Fast | High | Yes | Recommended - Camera controls, audio sync |
| LTX-2 Distilled | Fastest | Medium | Yes | Rapid iteration, light VRAM |
| WAN 2.2 14B | Slow | Very High | No | Silent films, German Expressionism, art films |
| Mochi 1 | Slow | High | No | Final renders, prompt adherence |
| HunyuanVideo | Medium | High | No | Production quality |
| CogVideoX-5B | Medium | High | No | General purpose (fallback) |

Recommendation:

  • Use LTX-2 19B FP8 for production work with audio sync and camera controls
  • Use WAN 2.2 for silent films or when audio isn't needed (higher visual quality for same VRAM)
  • Fall back to Mochi for maximum quality or CogVideoX for compatibility

LTX-2: Recommended Video Model

LTX-2 is a 19B parameter DiT-based audio-video foundation model.

Model Variants:

| Model | Size | VRAM | Quality | Recommended For |
|-------|------|------|---------|-----------------|
| LTX-2 19B FP8 ⭐ | ~19GB (+3-5GB overhead) | 24GB | High | Production (A5000, 720p/1080p ≤12-15s, batch=1) |
| LTX-2 19B FP4 | ~12GB (+3-5GB overhead) | 16GB | High | Faster, slightly less quality (720p ≤10s) |
| LTX-2 BF16 (full) | ~43GB | 40GB+ | Highest | RunPod/A100 only |
| LTX-2 Distilled 2B | ~4GB | 12GB | Medium | Rapid iteration |

FP8 Compatibility: Requires compatible CUDA/cuDNN/PyTorch builds. Follow LTX-Video docs for driver requirements.

Key Features:

  • Synchronized Audio-Video Generation: Generates coherent audio + video together
  • Camera Controls: Dolly, jib, static shots with natural camera motion
  • IC-LoRA: Style transformations (anime, sketch, etc.) with ~1GB VRAM
  • Keyframe Interpolation: Morphing between keyframes
  • Pose/Depth/Canny Controls: Precise composition control (Canny edge detection)
  • Text-to-Video and Image-to-Video: Both workflows supported

ComfyUI Templates:

| Template | Use Case |
|----------|----------|
| LTX2 Text-to-Video | Generate from text prompts |
| LTX2 Image-to-Video | Animate a still image |
| LTX2 Canny-to-Video | Edge detection guided generation |
| LTX2 Distilled | Fast iteration, lower VRAM |

Installation:

# ComfyUI (recommended)
# Install "LTX-Video" from ComfyUI Manager
# Templates appear automatically

# Or standalone
pip install ltx-video

ComfyUI VRAM Optimization Flags:

# Run from the ComfyUI directory.
# Reserve VRAM (in GB) for other operations (prevents OOM during generation)
python main.py --reserve-vram 5

# Low VRAM mode - offloads to system RAM (slower but prevents OOM)
python main.py --lowvram

# Weight streaming - NVIDIA/ComfyUI collaboration for 256GB RAM systems
# Automatically offloads model weights to system RAM when VRAM exhausted

Camera Control Reference (LTX-2)

LTX-2 supports cinematic camera movements via prompt keywords:

| Movement | Prompt Keywords | Effect |
|----------|-----------------|--------|
| Static | static shot, locked camera | Fixed camera position |
| Dolly | dolly in, dolly out, push in | Camera moves toward/away from subject |
| Jib/Crane | jib up, jib down, crane shot | Vertical camera sweep |
| Pan | pan left, pan right | Horizontal rotation |
| Tilt | tilt up, tilt down | Vertical rotation |
| Tracking | tracking shot, follow shot | Camera follows subject |
| Zoom | zoom in, zoom out | Focal length change |

Example Prompts:

# Dramatic reveal
"Dolly in slowly to a detective examining evidence, noir lighting, static hold on face"

# Action sequence
"Tracking shot following runner through city streets, handheld, dynamic"

# Interview setup
"Static medium shot, subject centered, shallow depth of field, jib down to hands"

Combining Movements:

"Jib up while dolly out, revealing vast landscape, golden hour, cinematic"

WAN 2.2: Silent Film Alternative

WAN 2.2 is a 14B parameter model optimized for visual quality without audio:

Best For:

  • Silent films and art cinema
  • German Expressionism era aesthetics (Nosferatu, Metropolis, Cabinet of Dr. Caligari)
  • High visual fidelity when audio isn't needed
  • Projects where audio will be added separately

Comparison to LTX-2:

| Aspect | LTX-2 19B FP8 | WAN 2.2 14B |
|--------|---------------|-------------|
| Audio | Synchronized | None |
| Speed (10-sec HD, A5000) | ~3.5-4.5 min | ~5-6 min |
| Visual Quality | High | Very High |
| VRAM (24GB) | Works | Works |

When to Choose WAN 2.2:

  • Creating silent films with intertitles
  • German Expressionism homages
  • Music videos where audio is pre-recorded
  • Art films with separate sound design

Practical Notes: Fix the seed for stable multi-shot outputs. Prefer 720p on 24GB for consistent speeds.

Performance Expectations

Video generation is compute-intensive. Plan for overnight batch processing rather than real-time iteration.

Local Generation Times (RTX A5000, 24GB VRAM)

| Video Length | Resolution | Model | Time |
|--------------|------------|-------|------|
| 5 seconds | HD (720p) | LTX-2 19B FP8 | ~1-1.5 min |
| 10 seconds | HD (720p) | LTX-2 19B FP8 | ~3.5-4.5 min |
| 10 seconds | Full HD (1080p) | LTX-2 19B FP8 | ~5-6.5 min |
| 15 seconds | HD (720p) | LTX-2 19B FP8 | ~6-7.5 min |
| 10 seconds | HD (720p) | WAN 2.2 | ~5-6 min |

Notes:

  • Timings based on Alex Ziskind's benchmarks (RTX 5080) with +15-25% buffer for A5000
  • Audio synchronization adds ~10-15% time vs video-only runs
  • IO/storage affects throughput; prefer local NVMe, avoid network mounts

Realistic Workflow

For a 2-minute film (12 x 10-second clips):

  • Generation time: ~42-54 min (LTX-2, 720p) to ~60-72 min (WAN 2.2)
  • With retakes and iterations: 2-4 hours
  • Full production with assembly: overnight task
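The generation-time range above is the clip count multiplied by the per-clip timings for a 10-second 720p clip. A short worked version of that arithmetic follows; the function names are illustrative only, and the per-clip figures are the ones from the timing table.

```python
CLIP_SECONDS = 10

# Minutes per 10-second 720p clip on an RTX A5000 (low, high), from the timing table.
PER_CLIP_MIN = {
    "LTX-2 19B FP8": (3.5, 4.5),
    "WAN 2.2": (5.0, 6.0),
}


def clips_needed(film_seconds: int) -> int:
    """Number of 10-second clips needed to cover the film, rounded up."""
    return -(-film_seconds // CLIP_SECONDS)  # ceiling division


def estimate_minutes(film_seconds: int, model: str) -> tuple[float, float]:
    """Raw generation time range in minutes, excluding retakes and assembly."""
    low, high = PER_CLIP_MIN[model]
    count = clips_needed(film_seconds)
    return count * low, count * high


if __name__ == "__main__":
    for model in PER_CLIP_MIN:
        print(model, estimate_minutes(120, model))
```

For a 120-second film this gives 12 clips, so 42-54 minutes with LTX-2 and 60-72 minutes with WAN 2.2, matching the figures above.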

Recommendation: Queue video generation as overnight background tasks. Use /task-monitor to track progress.

# Example: Run generation overnight (nohup keeps the job alive after logout)
nohup ./run.sh generate --script script.json --output-dir ./assets > generate.log 2>&1 &
# Check progress next morning
tail generate.log

RunPod for Large Tasks

Use /ops-runpod when local generation would cause OOM errors.

When to Use RunPod

| Scenario | Local (A5000 24GB) | RunPod Needed |
|----------|-------------------|---------------|
| LTX-2 19B FP8, 10-sec HD | Works | No |
| LTX-2 19B FP8, 15-sec 1080p | Works (batch=1) | No |
| 1080p clips >12-15 sec (FP8) | May OOM | Prefer 720p or split; RunPod optional |
| LTX-2 BF16 (43GB full model) | OOM | Yes (A100 40GB+) |
| Very long videos (>20 sec 1080p) | Likely OOM | Yes |
| Batch processing (10+ clips) | Slow but works | Optional (faster) |
| WAN 2.2 + LTX-2 parallel | High OOM risk | Prefer sequential or RunPod |

OOM Threshold Guidance (A5000 24GB):

  • LTX-2 FP8: 1080p clips over ~12-15s may OOM with audio; use 720p, shorten clips, or disable audio
  • Control nets (pose/depth/canny) and multiple LoRAs increase memory; enable selectively
  • Monitor runtime VRAM; keep ≤22GB to avoid instability
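One way to watch the 22GB ceiling is to poll `nvidia-smi`, whose `--query-gpu=memory.used --format=csv,noheader,nounits` options print used memory in MiB, one line per GPU. The sketch below is an assumption about how monitoring could be done, not how the skill does it; the function names are hypothetical, and the 22GB figure is treated as 22 × 1024 MiB.

```python
import subprocess

VRAM_CEILING_MIB = 22 * 1024  # the 22GB guidance, interpreted as MiB


def parse_memory_used(csv_text: str) -> list[int]:
    """Parse nvidia-smi output: one used-memory value in MiB per GPU line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_memory_used() -> list[int]:
    """Ask nvidia-smi for current used memory on each GPU (requires NVIDIA drivers)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_used(result.stdout)


def over_ceiling(used_mib: list[int], ceiling: int = VRAM_CEILING_MIB) -> list[bool]:
    """Flag each GPU whose used memory exceeds the ceiling."""
    return [mib > ceiling for mib in used_mib]


if __name__ == "__main__":
    # Parsing is shown on canned text so the example runs without a GPU.
    sample = "21504\n23000\n"
    print(over_ceiling(parse_memory_used(sample)))
```

Calling `query_memory_used()` in a loop during generation would give a simple trigger for the mitigations listed in this section.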

RunPod Workflow

# Provision GPU for large task
/ops-runpod provision --gpu a100-40gb --task "LTX-2 BF16 generation"

# Run generation on RunPod
/ops-runpod run --script generate.sh

# Download results and terminate
/ops-runpod download --output ./assets
/ops-runpod terminate

RunPod GPU Options:

  • BF16/full precision: A100 40-80GB, H100 (required)
  • FP8/FP4 tasks: L40S 48GB, A10G 24GB (cheaper alternatives)

Cost Consideration: RunPod charges by the hour. For overnight tasks, local generation is more cost-effective. Consider spot/preemptible instances for savings.

Troubleshooting & Fallbacks

OOM Mitigation:

  1. Reduce resolution (720p → 540p)
  2. Shorten clip length
  3. Set batch=1
  4. Switch FP mode (BF16 → FP8 → FP4)
  5. Disable audio
  6. Split long clips into segments
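The mitigation list can be read as a ladder: apply the first step that still has room, retry, and move on if the job still runs out of memory. The sketch below encodes steps 1 through 5 that way. The `Job` fields, the halving of clip length, and the 5-second floor are assumptions made for illustration, and step 6 (splitting into segments) is left out because it changes the number of jobs rather than one job's settings.

```python
from dataclasses import dataclass, replace
from typing import Optional

RESOLUTIONS = ["1080p", "720p", "540p"]  # highest to lowest
PRECISIONS = ["BF16", "FP8", "FP4"]      # heaviest to lightest
MIN_CLIP_SECONDS = 5                     # assumed floor, not from the skill


@dataclass(frozen=True)
class Job:
    resolution: str = "720p"
    clip_seconds: int = 10
    batch: int = 1
    precision: str = "FP8"
    audio: bool = True


def next_mitigation(job: Job) -> Optional[Job]:
    """Return the job with the next mitigation applied, or None if none remain."""
    res = RESOLUTIONS.index(job.resolution)
    if res < len(RESOLUTIONS) - 1:
        return replace(job, resolution=RESOLUTIONS[res + 1])      # 1. lower resolution
    if job.clip_seconds > MIN_CLIP_SECONDS:
        shorter = max(MIN_CLIP_SECONDS, job.clip_seconds // 2)
        return replace(job, clip_seconds=shorter)                 # 2. shorten clip
    if job.batch > 1:
        return replace(job, batch=1)                              # 3. batch=1
    prec = PRECISIONS.index(job.precision)
    if prec < len(PRECISIONS) - 1:
        return replace(job, precision=PRECISIONS[prec + 1])       # 4. lighter precision
    if job.audio:
        return replace(job, audio=False)                          # 5. disable audio
    return None


if __name__ == "__main__":
    job: Optional[Job] = Job()
    while job is not None:
        print(job)
        job = next_mitigation(job)
```

A caller would wrap generation in a retry loop, catching the out-of-memory error and asking for the next `Job` until one succeeds or the ladder is exhausted.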

Stability:

  • Fix seed for reproducibility
  • Avoid parallel jobs on 24GB
  • Reduce control nets and LoRA stacks

Fallback Path: If LTX-2 fails, switch to WAN 2.2 (video-only) or CogVideoX; add audio separately in post.

Memory Integration

After each movie, the skill stores:

  • Successful prompts
  • Working tool code
  • Technique insights
  • Concept relationships

Scope: horus-filmmaking

Workflow Patterns (from Nobody & The Computer)

Multi-Model Collaboration

Different AI models handle different creative aspects, inspired by "Bach x Coltrane x Kuti x Takemitsu":

  • Model A (Claude): Structure, composition, narrative arc
  • Model B (GPT): Improvisation, dialogue, variation
  • Model C (Grok): Energy, rhythm, pacing
  • Model D (DeepSeek): Texture, atmosphere, silence

Each model builds on the previous model's work. Constraint: each turn is limited to 100 words to keep the output focused.
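A turn-taking loop with a word cap could look like the sketch below. It is illustrative only: each "model" is an ordinary callable standing in for a real LLM client, and `collaborate` and `clamp_words` are hypothetical names.

```python
from typing import Callable

WORD_LIMIT = 100  # per-turn cap from the constraint above


def clamp_words(text: str, limit: int = WORD_LIMIT) -> str:
    """Keep only the first `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])


def collaborate(
    models: dict[str, Callable[[str], str]],
    brief: str,
    rounds: int = 1,
) -> list[tuple[str, str]]:
    """Let each model take a turn in order, each seeing everything written so far."""
    transcript: list[tuple[str, str]] = []
    context = brief
    for _ in range(rounds):
        for role, ask in models.items():
            turn = clamp_words(ask(context))
            transcript.append((role, turn))
            context = context + "\n" + turn  # later models build on earlier turns
    return transcript


if __name__ == "__main__":
    # Stub callables; a real setup would call the respective model APIs.
    stubs = {
        "structure": lambda ctx: "Three acts: arrival, doubt, colour.",
        "texture": lambda ctx: f"Silence after line {len(ctx.splitlines())}.",
    }
    for role, turn in collaborate(stubs, "A film about discovering colors"):
        print(f"{role}: {turn}")
```

Passing the models as a dict keeps the turn order explicit and makes it easy to swap a role's backend.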

Critique Loop

From "A.I.thoven" sessions - "roast the piece with love":

  1. Generate initial draft
  2. Critique constructively (what works, what doesn't)
  3. Iterate based on feedback
  4. Repeat until satisfied
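The four steps above map onto a short loop. In this sketch the `generate`, `critique`, and `revise` callables and the `max_rounds` cap are assumptions for illustration; "satisfied" is modelled as the critique returning no notes.

```python
from typing import Callable


def critique_loop(
    generate: Callable[[], str],
    critique: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    """Draft, critique, and revise until no notes remain or rounds run out."""
    draft = generate()                 # 1. initial draft
    for _ in range(max_rounds):
        notes = critique(draft)        # 2. what works, what doesn't
        if not notes:
            break                      # 4. satisfied: no notes left
        draft = revise(draft, notes)   # 3. iterate on the feedback
    return draft


if __name__ == "__main__":
    # Stub callables: the critic asks for more energy until the draft ends in "!!".
    final = critique_loop(
        generate=lambda: "draft",
        critique=lambda d: [] if d.endswith("!!") else ["more energy"],
        revise=lambda d, notes: d + "!",
        max_rounds=5,
    )
    print(final)
```

The round cap matters in practice, since a critic that always finds something would otherwise loop forever.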

Iteration Speed

Use LTX-2 Distilled for rapid iterations during creative exploration. Use LTX-2 19B FP8 for production with camera controls and audio sync. Fall back to Mochi for maximum quality when camera control isn't needed.

Example Session

Horus: I want to create a mockumentary about AI learning to paint.

[RESEARCH] Searching for documentary interview techniques, AI art history...
[SCRIPT] Breaking into 5 scenes: intro, discovery, struggle, breakthrough, reflection
[BUILD TOOLS] Writing code for interview framing effect, paint brush animation...
[GENERATE] Creating 45 frames, 3 audio tracks, 2 voice segments...
[ASSEMBLE] Combining into 2-minute video with transitions...
[LEARN] Storing 8 insights in memory for future films.

Output: ai_painter_mockumentary.mp4 (2:14)

Dependencies

  • Docker (for isolated code execution)
  • FFmpeg (video processing)
  • Python 3.11+ (orchestrator)
  • GPU recommended (for Stable Diffusion, video models)