create-movie

Orchestrated movie creation for the Horus persona. Creates mockumentaries, short films, music videos, and educational content through a phased workflow.

Philosophy

"AI isn't the artist, it's the amplifier" - Nobody & The Computer

Horus uses AI to turn imagination into audiovisual reality. He doesn't just use pre-built tools - he writes code to create his own tools.

Phases

HARDWARE CHECK → RESEARCH → SCRIPT → BUILD TOOLS → GENERATE → ASSEMBLE → LEARN

Phase 0: Hardware Detection (Automatic)

Before any generation, the orchestrator automatically detects hardware via /ops-workstation:

# Automatic hardware check on startup
./run.sh create "prompt"
# → Calls /ops-workstation gpu to detect VRAM
# → Calls /ops-workstation memory to detect RAM
# → Auto-selects optimal model variant

Auto-Selection Logic:

| Detected VRAM | Model Selected | Settings |
|---------------|----------------|----------|
| ≥24GB | LTX-2 19B FP8 | 720p/1080p, audio on, batch=1 |
| 16-23GB | LTX-2 19B FP4 | 720p only, audio on, batch=1 |
| 12-15GB | LTX-2 Distilled 2B | 720p, audio optional, batch=1 |
| <12GB | RunPod suggested | Prompts to use /ops-runpod |
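
The VRAM thresholds in the table above can be read as a simple cascade. The following is a minimal sketch, not the orchestrator's actual code: the function name `select_video_model` is hypothetical, and treating fractional values between bands (for example 23.5GB) as belonging to the lower band is an assumption.

```python
def select_video_model(vram_gb: float) -> str:
    """Map detected VRAM (GB) to a model variant using the table's thresholds.

    Hypothetical helper: values that fall between bands are assigned to the
    lower band, which is an assumption rather than documented behavior.
    """
    if vram_gb >= 24:
        return "LTX-2 19B FP8"
    if vram_gb >= 16:
        return "LTX-2 19B FP4"
    if vram_gb >= 12:
        return "LTX-2 Distilled 2B"
    return "RunPod suggested"


for vram in (24, 16, 12, 8):
    print(vram, "->", select_video_model(vram))
```

Checking the largest threshold first keeps each branch to a single comparison.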

RAM-Based Optimizations:

| Detected RAM | Optimization |
|--------------|--------------|
| ≥128GB | Weight streaming enabled (offload to RAM) |
| 64-127GB | Partial offloading |
| <64GB | No offloading, strict VRAM limits |
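
The RAM tiers follow the same pattern. This sketch assumes the hypothetical name `ram_optimization` and uses illustrative mode labels of its own; the skill's real identifiers are not given in this document.

```python
def ram_optimization(ram_gb: float) -> str:
    """Map detected system RAM (GB) to an offloading mode from the table.

    The returned labels are illustrative assumptions, not the skill's names.
    """
    if ram_gb >= 128:
        return "weight_streaming"  # offload model weights to RAM
    if ram_gb >= 64:
        return "partial_offload"
    return "no_offload"  # strict VRAM limits apply


print(ram_optimization(256), ram_optimization(96), ram_optimization(32))
```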

Override Auto-Detection:

# Force specific model variant
./run.sh create "prompt" --model ltx2-fp4
./run.sh create "prompt" --model ltx2-distilled
./run.sh create "prompt" --runpod  # Force cloud generation

Phase 1: Research (Library-First)

Check Horus's Library First:
- horus-filmmaking scope (past techniques, learnings)
- horus_lore scope (YouTube transcripts, film analysis)
- Ingested movies with emotion tags
- Episodic archive (past filmmaking sessions)

Search for New Resources:
- /ingest-movie search for films to watch
- /ingest-youtube search for tutorials

Deep Web Research:
- /dogpile for comprehensive multi-source search
- /surf for specific tutorials/references

Phase 2: Script (via /create-story)

- Integrates with /create-story skill for screenplay generation
- Uses Chutes models (chimera, qwen, deepseek-r1) for creative writing
- Parses INT./EXT. headings, dialogue, action, audio cues
- Outputs structured scene breakdown with visual descriptions
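
As an illustration of the heading-parsing step, the sketch below splits a screenplay into scenes on INT./EXT. lines. It is an assumption-laden example: `split_scenes`, the regular expression, and the output keys are all hypothetical, and the real parser also handles dialogue, action, and audio cues, which this sketch lumps together as body lines.

```python
import re

# Matches "INT. LOCATION - TIME" or "EXT. LOCATION"; the time part is optional.
SCENE_HEADING = re.compile(r"^(INT\.|EXT\.)\s+(.+?)(?:\s+-\s+(.+))?$")


def split_scenes(screenplay: str) -> list[dict]:
    """Group screenplay lines under the most recent scene heading."""
    scenes: list[dict] = []
    for raw in screenplay.splitlines():
        line = raw.strip()
        match = SCENE_HEADING.match(line)
        if match:
            scenes.append({
                "setting": match.group(1),
                "location": match.group(2),
                "time": match.group(3),  # None when the heading has no time
                "body": [],
            })
        elif line and scenes:
            scenes[-1]["body"].append(line)
    return scenes


sample = "INT. STUDIO - DAY\nAn easel stands empty.\n\nEXT. ROOFTOP\nWind."
for scene in split_scenes(sample):
    print(scene["setting"], scene["location"], scene["time"], scene["body"])
```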

Format Options:

- screenplay (default) - Standard INT./EXT. scene headings
- mockumentary - Interview segments with talking heads + B-roll
- reconstruction - Historical recreation with narrator framing

Phase 3: Build Tools

- Write code in Docker-isolated sandbox
- Create custom tools for specific effects
- Iterate on approaches

Phase 4: Generate

- Use ComfyUI and Stable Diffusion for images
- Use the video model auto-selected for the detected hardware (LTX-2 FP8/FP4/Distilled)
- Use Whisper and IndexTTS2 for audio
- If the hardware is insufficient, the orchestrator automatically suggests /ops-runpod

Phase 5: Assemble

- Combine assets with FFmpeg
- Output MP4 video or interactive HTML
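
One common way to combine clips with FFmpeg is its concat demuxer, sketched below. This is not necessarily how the assemble phase works internally: the helper names and file paths are hypothetical, and `-c copy` only joins cleanly when every clip shares the same codec, resolution, and frame rate.

```python
def concat_list(clips: list[str]) -> str:
    """Render a concat-demuxer list: one "file '<path>'" line per clip, in order.

    Paths containing single quotes would need escaping, which is omitted here.
    """
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_command(list_file: str, output: str) -> list[str]:
    """Build the FFmpeg argument list; -c copy joins clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


clips = ["assets/scene_01.mp4", "assets/scene_02.mp4"]  # hypothetical paths
print(concat_list(clips), end="")
print(" ".join(concat_command("clips.txt", "movie.mp4")))
```

Write the list text to `clips.txt` and pass the argument list to `subprocess.run` to perform the actual join.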

Phase 6: Learn

- Store successful techniques in /memory
- Remember what worked for future movies

Quick Start

cd .pi/skills/create-movie

# Full orchestrated workflow (recommended)
./run.sh create "A 30-second film about discovering colors"

# With options
./run.sh create "film noir detective" \
    --duration 60 \
    --style "high contrast, shadows, venetian blinds" \
    --format mp4 \
    --work-dir ./noir_project

# Individual phases (for manual control)
./run.sh research "film noir lighting techniques"
./run.sh script --from-research research.json --duration 30 --use-create-story
./run.sh build-tools --script script.json
./run.sh generate --tools ./tools --script script.json --style "cinematic"
./run.sh assemble --assets ./assets --output movie.mp4 --format mp4
./run.sh learn --project-dir ./movie_project

CLI Commands

create

Full orchestrated workflow through all phases.

./run.sh create PROMPT [OPTIONS]
  --output, -o       Output file (default: movie.mp4)
  --work-dir, -w     Working directory (default: ./movie_project)
  --duration, -d     Target duration in seconds (default: 30)
  --style, -s        Visual style (e.g., 'cinematic', 'film noir')
  --format, -f       Output format: mp4 or html (default: mp4)
  --store-learnings  Store learnings in memory (default: true)
  --skip-research    Skip research phase if research.json exists

research

Library-first research: checks Horus's memory and ingested content before external search.

./run.sh research TOPIC [OPTIONS]
  --output, -o       Output file (default: research.json)
  --skip-external    Only search library, skip external sources

script

Generate screenplay with scene breakdown. Integrates with /create-story.

./run.sh script [OPTIONS]
  --from-research, -r  Research JSON file (required)
  --prompt, -p         Override topic from research
  --duration, -d       Target duration in seconds
  --use-create-story   Use /create-story skill for screenplay
  --model, -m          LLM model (default: chimera)
  --output, -o         Output file (default: script.json)

build-tools

Generate custom tools in Docker sandbox.

./run.sh build-tools [OPTIONS]
  --script, -s       Script JSON file (required)
  --output-dir, -o   Output directory (default: ./tools)
  --skip-docker      Use host instead of Docker sandbox

generate

Create images, video, and audio assets.

./run.sh generate [OPTIONS]
  --tools, -t        Tools directory (default: ./tools)
  --script, -s       Script JSON file (required)
  --output-dir, -o   Assets output directory (default: ./assets)
  --style            Visual style to apply

assemble

Combine assets into final output.

./run.sh assemble [OPTIONS]
  --assets, -a       Assets directory (required)
  --output, -o       Output file/directory (required)
  --format, -f       Output format: mp4 or html (default: mp4)
  --fps              Frames per second for MP4 (default: 24)

learn

Store filmmaking insights in memory after a project.

./run.sh learn [OPTIONS]
  --project-dir, -p  Project directory (required)
  --scope            Memory scope (default: horus-filmmaking)
  --dry-run          Show learnings without storing

study

Pre-phase: learn filmmaking topics BEFORE creating movies. Runs a targeted /dogpile search across internal (memory) and external (web) sources, then stores the results via /memory learn.

./run.sh study TOPIC [OPTIONS]
  --scope            Memory scope (default: horus-filmmaking)
  --deep/--quick     Deep research (dogpile) vs quick (YouTube search)
  --list-topics      Show suggested filmmaking topics

# Examples:
./run.sh study "cinematography lighting techniques" --deep
./run.sh study "camera framing composition" --deep
./run.sh study --list-topics

study-all

Comprehensive learning session - studies all core filmmaking topics.

./run.sh study-all [OPTIONS]
  --scope            Memory scope (default: horus-filmmaking)

Output Formats

MP4 Video

Standard video file, playable anywhere.

Interactive HTML

Web-based experience with:

- Frame-by-frame navigation
- Audio controls
- Scene metadata viewer

Available Skills

Horus has access to all skills in .pi/skills/:

| Skill | Purpose in Movie Creation |
|-------|---------------------------|
| /dogpile | Deep research on techniques, references |
| /surf | Visit websites, tutorials, references |
| /memory | Recall prior techniques, store learnings |
| /create-image | Generate images for scenes |
| /tts-train | Horus's voice for narration |
| /ingest-movie | Ingest reference movies for style analysis |
| /create-paper | Write stories, scripts, creative content |
| /episodic-archiver | Archive movie creation sessions |
| /anvil | Debug and harden custom tools |
| /ingest-book | Search books for story inspiration |

Free/Open-Source Tools

| Purpose | Tool |
|---------|------|
| Image Generation | Stable Diffusion (ComfyUI) |
| Video Generation | LTX-2 (recommended), Mochi 1, CogVideoX (fallbacks) |
| Video Processing | FFmpeg |
| Speech-to-Text | faster-whisper |
| Text-to-Speech | IndexTTS2 |

Video Model Selection Guide

Choose a video model based on your GPU VRAM and use case. VRAM figures include 3-5GB of headroom for pipeline overhead (ComfyUI, loader, audio) and assume batch=1, with FP8/FP4 precision where noted.

| VRAM | Recommended Models | Best For |
|------|-------------------|----------|
| 12GB (RTX 3060/4070) | LTX-2 Distilled (2B), CogVideoX-2B | Quick iterations, pre-viz |
| 16GB (RTX 4080/A4000) | LTX-2 19B FP4 (720p, ≤10s), WAN 2.2, SVD | Medium quality production |
| 24GB (RTX 4090/A5000) | LTX-2 19B FP8 (recommended), WAN 2.2, Mochi | High quality production |
| 40GB+ (A100/H100) | LTX-2 BF16 (43GB), Full Mochi, Open-Sora 2.0 | Maximum quality |

Safe Defaults (RTX A5000 24GB)

- Model: LTX-2 19B FP8
- Resolution: 720p
- Clip length: 10s
- Batch size: 1
- Seed: fixed
- Audio: on

If runtime VRAM exceeds 22GB or generation becomes unstable, lower the resolution to 540p, disable audio, or shorten clips. Avoid running parallel jobs on a 24GB card.

Model Characteristics

| Model | Speed | Quality | Audio | Best Use Case |
|-------|-------|---------|-------|---------------|
| LTX-2 19B FP8 ⭐ | Fast | High | Yes | Recommended - Camera controls, audio sync |
| LTX-2 Distilled | Fastest | Medium | Yes | Rapid iteration, light VRAM |
| WAN 2.2 14B | Slow | Very High | No | Silent films, German Expressionism, art films |
| Mochi 1 | Slow | High | No | Final renders, prompt adherence |
| HunyuanVideo | Medium | High | No | Production quality |
| CogVideoX-5B | Medium | High | No | General purpose (fallback) |

Recommendation:

- Use LTX-2 19B FP8 for production work with audio sync and camera controls
- Use WAN 2.2 for silent films or when audio isn't needed (higher visual quality for the same VRAM)
- Fall back to Mochi for maximum quality or CogVideoX for compatibility

LTX-2: Recommended Video Model

LTX-2 is a 19B parameter DiT-based audio-video foundation model.

Model Variants:

| Model | Size | VRAM | Quality | Recommended For |
|-------|------|------|---------|-----------------|
| LTX-2 19B FP8 ⭐ | ~19GB (+3-5GB overhead) | 24GB | High | Production (A5000, 720p/1080p ≤12-15s, batch=1) |
| LTX-2 19B FP4 | ~12GB (+3-5GB overhead) | 16GB | High | Faster, slightly less quality (720p ≤10s) |
| LTX-2 BF16 (full) | ~43GB | 40GB+ | Highest | RunPod/A100 only |
| LTX-2 Distilled 2B | ~4GB | 12GB | Medium | Rapid iteration |

FP8 Compatibility: Requires compatible CUDA/cuDNN/PyTorch builds. Follow LTX-Video docs for driver requirements.

Key Features:

- Synchronized Audio-Video Generation: Generates coherent audio + video together
- Camera Controls: Dolly, jib, static shots with natural camera motion
- IC-LoRA: Style transformations (anime, sketch, etc.) with ~1GB VRAM
- Keyframe Interpolation: Morphing between keyframes
- Pose/Depth/Canny Controls: Precise composition control (Canny edge detection)
- Text-to-Video and Image-to-Video: Both workflows supported

ComfyUI Templates:

| Template | Use Case |
|----------|----------|
| LTX2 Text-to-Video | Generate from text prompts |
| LTX2 Image-to-Video | Animate a still image |
| LTX2 Canny-to-Video | Edge detection guided generation |
| LTX2 Distilled | Fast iteration, lower VRAM |

Installation:

# ComfyUI (recommended)
# Install "LTX-Video" from ComfyUI Manager
# Templates appear automatically

# Or standalone
pip install ltx-video

ComfyUI VRAM Optimization Flags:

# Run these from the ComfyUI checkout directory.
# Reserve VRAM (in GB) for other operations (prevents OOM during generation)
python main.py --reserve-vram 5

# Low VRAM mode - offloads to system RAM (slower but prevents OOM)
python main.py --lowvram

# Weight streaming - NVIDIA/ComfyUI collaboration for 256GB RAM systems
# Automatically offloads model weights to system RAM when VRAM exhausted

Additional Resources:

ComfyUI_LTX-2_VRAM_Memory_Management - Nodes for long videos on consumer GPUs

Camera Control Reference (LTX-2)

LTX-2 supports cinematic camera movements via prompt keywords:

| Movement | Prompt Keywords | Effect |
|----------|-----------------|--------|
| Static | static shot, locked camera | Fixed camera position |
| Dolly | dolly in, dolly out, push in | Camera moves toward/away from subject |
| Jib/Crane | jib up, jib down, crane shot | Vertical camera sweep |
| Pan | pan left, pan right | Horizontal rotation |
| Tilt | tilt up, tilt down | Vertical rotation |
| Tracking | tracking shot, follow shot | Camera follows subject |
| Zoom | zoom in, zoom out | Focal length change |
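
Because the movements are plain prompt keywords, prompts can be assembled programmatically. The sketch below is illustrative only: `build_prompt`, the dictionary keys, and the "while" joiner are assumptions chosen to match the phrasing of the combined-movement example in this section, and LTX-2 itself imposes no such structure.

```python
# A subset of the keywords from the table; the keys are hypothetical.
CAMERA_KEYWORDS = {
    "static": "static shot",
    "dolly_in": "dolly in",
    "dolly_out": "dolly out",
    "jib_up": "jib up",
    "jib_down": "jib down",
    "pan_left": "pan left",
    "tracking": "tracking shot",
}


def build_prompt(movements: list[str], subject: str, style: list[str]) -> str:
    """Join camera keywords with 'while', then append subject and style tags."""
    camera = " while ".join(CAMERA_KEYWORDS[name] for name in movements)
    return ", ".join([camera.capitalize(), subject, *style])


print(build_prompt(["jib_up", "dolly_out"],
                   "revealing vast landscape",
                   ["golden hour", "cinematic"]))
```

An unknown movement name raises `KeyError`, which surfaces typos before a long generation run starts.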

Example Prompts:

# Dramatic reveal
"Dolly in slowly to a detective examining evidence, noir lighting, static hold on face"

# Action sequence
"Tracking shot following runner through city streets, handheld, dynamic"

# Interview setup
"Static medium shot, subject centered, shallow depth of field, jib down to hands"

Combining Movements:

"Jib up while dolly out, revealing vast landscape, golden hour, cinematic"

WAN 2.2: Silent Film Alternative

WAN 2.2 is a 14B parameter model optimized for visual quality without audio.

Best For:

- Silent films and art cinema
- German Expressionism era aesthetics (Nosferatu, Metropolis, Cabinet of Dr. Caligari)
- High visual fidelity when audio isn't needed
- Projects where audio will be added separately

Comparison to LTX-2:

| Aspect | LTX-2 19B FP8 | WAN 2.2 14B |
|--------|---------------|-------------|
| Audio | Synchronized | None |
| Speed (10-sec HD, A5000) | ~3.5-4.5 min | ~5-6 min |
| Visual Quality | High | Very High |
| VRAM (24GB) | Works | Works |

When to Choose WAN 2.2:

- Creating silent films with intertitles
- German Expressionism homages
- Music videos where audio is pre-recorded
- Art films with separate sound design

Practical Notes: Seed control recommended for stable multi-shot outputs. 720p preferred on 24GB for consistent speeds.

Performance Expectations

Video generation is compute-intensive. Plan for overnight batch processing rather than real-time iteration.

Local Generation Times (RTX A5000, 24GB VRAM)

| Video Length | Resolution | Model | Time |
|--------------|------------|-------|------|
| 5 seconds | HD (720p) | LTX-2 19B FP8 | ~1-1.5 min |
| 10 seconds | HD (720p) | LTX-2 19B FP8 | ~3.5-4.5 min |
| 10 seconds | Full HD (1080p) | LTX-2 19B FP8 | ~5-6.5 min |
| 15 seconds | HD (720p) | LTX-2 19B FP8 | ~6-7.5 min |
| 10 seconds | HD (720p) | WAN 2.2 | ~5-6 min |

Notes:

- Timings based on Alex Ziskind's benchmarks (RTX 5080) with +15-25% buffer for A5000
- Audio synchronization adds ~10-15% time vs video-only runs
- IO/storage affects throughput; prefer local NVMe, avoid network mounts

Realistic Workflow

For a 2-minute film (12 x 10-second clips):

- Generation time: ~42-54 min (LTX-2, 720p) to ~60-72 min (WAN 2.2)
- With retakes and iterations: 2-4 hours
- Full production with assembly: overnight task
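
The generation-time range above is the per-clip timing multiplied by the clip count. A small sketch of that arithmetic, using a hypothetical helper name and the per-clip ranges from the timing table:

```python
import math


def estimate_minutes(film_seconds: int, clip_seconds: int,
                     per_clip_min: tuple[float, float]) -> tuple[int, float, float]:
    """Return (clip count, low estimate, high estimate) in minutes.

    Covers raw generation only; retakes and assembly are excluded.
    """
    clips = math.ceil(film_seconds / clip_seconds)
    low, high = per_clip_min
    return clips, clips * low, clips * high


# 2-minute film in 10-second clips, per-clip ranges from the timing table.
print("LTX-2 720p:", estimate_minutes(120, 10, (3.5, 4.5)))
print("WAN 2.2:   ", estimate_minutes(120, 10, (5.0, 6.0)))
```

Both calls give 12 clips, matching the 42-54 minute and 60-72 minute ranges quoted above.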

Recommendation: Queue video generation as overnight background tasks. Use /task-monitor to track progress.

# Example: Run generation overnight
./run.sh generate --script script.json --output-dir ./assets &
# Check progress next morning

RunPod for Large Tasks

Use /ops-runpod when local generation would cause OOM errors.

When to Use RunPod

| Scenario | Local (A5000 24GB) | RunPod Needed |
|----------|-------------------|---------------|
| LTX-2 19B FP8, 10-sec HD | Works | No |
| LTX-2 19B FP8, 15-sec 1080p | Works (batch=1) | No |
| 1080p clips >12-15 sec (FP8) | May OOM | Prefer 720p or split; RunPod optional |
| LTX-2 BF16 (43GB full model) | OOM | Yes (A100 40GB+) |
| Very long videos (>20 sec 1080p) | Likely OOM | Yes |
| Batch processing (10+ clips) | Slow but works | Optional (faster) |
| WAN 2.2 + LTX-2 parallel | High OOM risk | Prefer sequential or RunPod |

OOM Threshold Guidance (A5000 24GB):

- LTX-2 FP8: 1080p clips over ~12-15s may OOM with audio; use 720p, shorten clips, or disable audio
- Control nets (pose/depth/canny) and multiple LoRAs increase memory; enable selectively
- Monitor runtime VRAM; keep ≤22GB to avoid instability

RunPod Workflow

# Provision GPU for large task
/ops-runpod provision --gpu a100-40gb --task "LTX-2 BF16 generation"

# Run generation on RunPod
/ops-runpod run --script generate.sh

# Download results and terminate
/ops-runpod download --output ./assets
/ops-runpod terminate

RunPod GPU Options:

- BF16/full precision: A100 40-80GB, H100 (required)
- FP8/FP4 tasks: L40S 48GB, A10G 24GB (cheaper alternatives)

Cost Consideration: RunPod charges by the hour. For overnight tasks, local generation is more cost-effective. Consider spot/preemptible instances for savings.

Troubleshooting & Fallbacks

OOM Mitigation:

- Reduce resolution (720p → 540p)
- Shorten clip length
- Set batch=1
- Switch FP mode (BF16 → FP8 → FP4)
- Disable audio
- Split long clips into segments
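
The mitigation steps can be treated as a ladder of progressively lighter settings to retry after an OOM. The sketch below is one possible encoding under stated assumptions: `GenSettings` and `oom_fallbacks` are hypothetical names, halving the clip length with a 5-second floor is an arbitrary choice, and clip splitting is left out.

```python
from dataclasses import dataclass, replace

PRECISION_ORDER = ["bf16", "fp8", "fp4"]  # heaviest to lightest


@dataclass(frozen=True)
class GenSettings:
    resolution: str = "720p"
    clip_seconds: int = 10
    batch: int = 1
    precision: str = "fp8"
    audio: bool = True


def oom_fallbacks(start: GenSettings) -> list[GenSettings]:
    """Apply the mitigations cumulatively, in the order listed above."""
    steps: list[GenSettings] = []
    current = start
    if current.resolution == "720p":
        current = replace(current, resolution="540p")
        steps.append(current)
    if current.clip_seconds > 5:
        # Halving with a 5-second floor is an assumption, not a documented rule.
        current = replace(current, clip_seconds=max(5, current.clip_seconds // 2))
        steps.append(current)
    if current.batch != 1:
        current = replace(current, batch=1)
        steps.append(current)
    index = PRECISION_ORDER.index(current.precision)
    if index < len(PRECISION_ORDER) - 1:
        current = replace(current, precision=PRECISION_ORDER[index + 1])
        steps.append(current)
    if current.audio:
        current = replace(current, audio=False)
        steps.append(current)
    return steps


for step in oom_fallbacks(GenSettings()):
    print(step)
```

Each returned entry keeps the earlier reductions, so a caller can retry with the entries in order and stop at the first one that fits in memory.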

Stability:

- Fix seed for reproducibility
- Avoid parallel jobs on 24GB
- Reduce control nets and LoRA stacks

Fallback Path: If LTX-2 fails, switch to WAN 2.2 (video-only) or CogVideoX; add audio separately in post.

Memory Integration

After each movie, the skill stores:

- Successful prompts
- Working tool code
- Technique insights
- Concept relationships

Scope: horus-filmmaking

Workflow Patterns (from Nobody & The Computer)

Multi-Model Collaboration

Different AI models handle different creative aspects, inspired by "Bach x Coltrane x Kuti x Takemitsu":

- Model A (Claude): Structure, composition, narrative arc
- Model B (GPT): Improvisation, dialogue, variation
- Model C (Grok): Energy, rhythm, pacing
- Model D (DeepSeek): Texture, atmosphere, silence

Each model builds on the previous model's work. Constraint: every turn is capped at 100 words to keep the output focused.
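
A turn-taking loop with a word cap might look like the sketch below. It assumes each model is exposed as a plain callable that takes the running context and returns text; `collaborate`, `clamp_words`, and the stand-in lambdas are hypothetical, and real model calls would replace them.

```python
from typing import Callable


def clamp_words(text: str, limit: int = 100) -> str:
    """Keep at most `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])


def collaborate(models: list[tuple[str, Callable[[str], str]]],
                prompt: str, rounds: int = 1,
                word_limit: int = 100) -> list[tuple[str, str]]:
    """Let each model respond in turn, seeing everything written so far."""
    transcript: list[tuple[str, str]] = []
    context = prompt
    for _ in range(rounds):
        for name, respond in models:
            turn = clamp_words(respond(context), word_limit)
            transcript.append((name, turn))
            context = f"{context}\n{name}: {turn}"
    return transcript


# Stand-in callables; real model clients would go here.
models = [
    ("A", lambda ctx: "Three acts: arrival, doubt, first brushstroke."),
    ("B", lambda ctx: f"Riffing on {len(ctx.splitlines())} lines so far."),
]
for name, turn in collaborate(models, "A film about an AI learning to paint"):
    print(f"{name}: {turn}")
```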

Critique Loop

From "A.I.thoven" sessions - "roast the piece with love":

- Generate initial draft
- Critique constructively (what works, what doesn't)
- Iterate based on feedback
- Repeat until satisfied
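
The loop above can be sketched as a small driver that takes the critique and revision steps as callables. This is an illustration under assumptions: the names are hypothetical, and the `max_rounds` cap is added so the loop always terminates; the source does not specify one.

```python
from typing import Callable


def critique_loop(draft: str,
                  critique: Callable[[str], list[str]],
                  revise: Callable[[str, list[str]], str],
                  max_rounds: int = 5) -> str:
    """Critique and revise until no notes remain or max_rounds is reached."""
    for _ in range(max_rounds):
        notes = critique(draft)
        if not notes:  # satisfied: nothing left to fix
            break
        draft = revise(draft, notes)
    return draft


# Toy stand-ins: the "critic" wants at least three words.
def too_short(text: str) -> list[str]:
    return [] if len(text.split()) >= 3 else ["too short"]


def pad(text: str, notes: list[str]) -> str:
    return text + " more"


print(critique_loop("draft", too_short, pad))
```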

Iteration Speed

Use LTX-2 Distilled for rapid iterations during creative exploration. Use LTX-2 19B FP8 for production with camera controls and audio sync. Fall back to Mochi for maximum quality when camera control isn't needed.

Example Session

Horus: I want to create a mockumentary about AI learning to paint.

[RESEARCH] Searching for documentary interview techniques, AI art history...
[SCRIPT] Breaking into 5 scenes: intro, discovery, struggle, breakthrough, reflection
[BUILD TOOLS] Writing code for interview framing effect, paint brush animation...
[GENERATE] Creating 45 frames, 3 audio tracks, 2 voice segments...
[ASSEMBLE] Combining into 2-minute video with transitions...
[LEARN] Storing 8 insights in memory for future films.

Output: ai_painter_mockumentary.mp4 (2:14)

Dependencies

- Docker (for isolated code execution)
- FFmpeg (video processing)
- Python 3.11+ (orchestrator)
- GPU recommended (for Stable Diffusion, video models)