Category: Other | No API key required

Codex automated talking-head (koubo) editing skill

Use when editing Douyin/TikTok-style talking-head short videos, especially vertical Chinese spoken videos that need picture and lighting correction, tight speech editing, smooth transitions, HyperFrames/plugin animation overlays, subtitles, music mixing, verification, and iteration.

Author: user_268e6861 (community)

Douyin Koubo Edit

Purpose

Turn a raw vertical talking-head video into a publishable Douyin-style short video.

Core pipeline:

  1. Picture adjustment
  2. Speech editing
  3. Plugin animation packaging

When available, use this skill together with video-use for video processing and HyperFrames for animation packaging.

Start Gate

Before editing, state the working assumptions and success target in plain language:

  • Target platform and aspect ratio, usually Douyin 9:16.
  • Target duration.
  • Whether to keep the original voice.
  • Whether to use subtitles, animation, and music.
  • Any must-cut or must-keep content.

If the target is unclear, ask before editing. Do not silently choose a creative direction.

Workflow

1. Source Inventory

Inspect the source before touching it:

  • Duration, resolution, frame rate, audio tracks.
  • Face position and safe text areas.
  • Lighting: face brightness, background brightness, overexposed lamps/windows.
  • Audio clarity and noise.
  • Whether the video is a single take or multi-take.

Output a short diagnosis:

  • What is usable.
  • What must be fixed first.
  • Where text and animation can safely go without covering the face.
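
The probing part of this inventory can be scripted. Below is a minimal Python sketch that assumes ffmpeg's `ffprobe` is installed and on PATH. `probe_command`, `summarize_probe`, and `inventory` are hypothetical helper names. Face position, lighting, and noise still have to be judged by looking at frames and listening.

```python
import json
import subprocess
from fractions import Fraction


def probe_command(path: str) -> list[str]:
    """ffprobe call that prints container and stream info as JSON."""
    return ["ffprobe", "-v", "error", "-show_format", "-show_streams",
            "-of", "json", path]


def summarize_probe(probe: dict) -> dict:
    """Pull the inventory fields out of parsed ffprobe JSON."""
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    summary = {
        "duration_s": float(probe.get("format", {}).get("duration", 0.0)),
        "audio_tracks": len(audio),
    }
    if video is not None:
        summary["resolution"] = (int(video["width"]), int(video["height"]))
        # r_frame_rate is a ratio string such as "30000/1001".
        summary["fps"] = round(float(Fraction(video["r_frame_rate"])), 3)
        summary["vertical"] = int(video["height"]) > int(video["width"])
    return summary


def inventory(path: str) -> dict:
    """Run ffprobe on a file and summarize it (requires ffprobe installed)."""
    out = subprocess.run(probe_command(path), capture_output=True,
                         text=True, check=True).stdout
    return summarize_probe(json.loads(out))
```

Phone footage often stores rotation as metadata. In that case the reported width and height can be swapped relative to what actually plays back, and this sketch does not correct for it.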

2. Picture Adjustment

Always correct the picture before subtitles, animation, or music.

Priority order:

  1. Brighten face and main subject.
  2. Keep skin tone natural.
  3. Reduce excessive dark overlays.
  4. Preserve subtitle readability with local caption backgrounds instead of globally darkening the video.

For underexposed indoor talking-head footage, start conservatively:

  • Slight brightness lift.
  • Slight contrast lift.
  • Slight saturation lift.
  • Gentle gamma or midtone lift.

Verify with sampled frames before continuing. If the face still looks dark, fix this step before moving on.
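
One way to express this conservative lift is ffmpeg's `eq` filter, which has `brightness`, `contrast`, `saturation`, and `gamma` options. The default values below are assumed starting points, not tuned settings. `PictureLift`, `eq_filter`, and `picture_command` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureLift:
    """Assumed starting values for ffmpeg's eq filter, not tuned settings."""
    brightness: float = 0.03   # eq range -1.0..1.0, 0 = unchanged
    contrast: float = 1.05     # 1.0 = unchanged
    saturation: float = 1.08   # eq range 0.0..3.0, 1.0 = unchanged
    gamma: float = 1.08        # eq range 0.1..10.0, >1 lifts midtones


def eq_filter(lift: PictureLift) -> str:
    """Build an eq filter string, rejecting values outside eq's ranges."""
    if not -1.0 <= lift.brightness <= 1.0:
        raise ValueError("brightness must be in [-1, 1]")
    if not 0.0 <= lift.saturation <= 3.0:
        raise ValueError("saturation must be in [0, 3]")
    if not 0.1 <= lift.gamma <= 10.0:
        raise ValueError("gamma must be in [0.1, 10]")
    return (f"eq=brightness={lift.brightness:g}:contrast={lift.contrast:g}"
            f":saturation={lift.saturation:g}:gamma={lift.gamma:g}")


def picture_command(src: str, dst: str, lift: PictureLift) -> list[str]:
    """Re-encode the video with the lift applied and copy the audio untouched."""
    return ["ffmpeg", "-i", src, "-vf", eq_filter(lift), "-c:a", "copy", dst]
```

`brightness` is an additive offset, so a few hundredths is usually enough. `gamma` above 1 raises the midtones while leaving pure black and white in place, which is often gentler on faces than a large brightness step.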

3. Speech Structure

Transcribe or otherwise derive the speech structure before cutting.

Look for:

  • Repetition.
  • Long pauses.
  • False starts.
  • Sentences that do not support the main point.
  • Strong hooks, definitions, examples, and ending lines.

For spoken videos, preserve semantic completeness. Never cut purely for pace if the cut breaks the argument.
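
If the transcript includes word timestamps (many speech-recognition tools can output them), long pauses and immediate repeats can be flagged automatically before the human review. This sketch assumes the transcript is a list of `Word(text, start, end)` records. The 0.6 s threshold and the helper names are assumptions.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def long_pauses(words: list[Word], min_gap: float = 0.6) -> list[tuple[float, float]]:
    """Return (gap_start, gap_end) for silences between words of at least min_gap."""
    return [(a.end, b.start) for a, b in zip(words, words[1:])
            if b.start - a.end >= min_gap]


def repeated_runs(words: list[Word], n: int = 2) -> list[int]:
    """Indices where an n-word run is immediately repeated ("we need we need").

    The index points at the first copy, which is often the false start.
    """
    tokens = [w.text for w in words]
    return [i for i in range(len(tokens) - 2 * n + 1)
            if tokens[i:i + n] == tokens[i + n:i + 2 * n]]
```

These flags are only candidates. Deciding whether a repeat is a false start or deliberate emphasis is still a judgment about meaning.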

4. Speech Editing

Build the cut around meaning:

  • Keep the hook.
  • Keep the core claim.
  • Keep one or two concrete explanations.
  • Keep a clear ending.
  • Remove repeated phrases, filler, and long pauses.
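
Once the cut list is settled, ffmpeg's `trim`/`atrim` and `concat` filters can assemble the kept ranges. This sketch assumes a single-take input and cuts given as (start, end) pairs in source seconds. `keep_segments`, `concat_graph`, and `cut_command` are hypothetical names. The output uses hard cuts; transitions are a separate step.

```python
def _fmt(x: float) -> str:
    """Seconds with millisecond precision, without trailing zeros."""
    return f"{x:.3f}".rstrip("0").rstrip(".")


def keep_segments(duration: float, cuts: list[tuple[float, float]],
                  min_len: float = 0.05) -> list[tuple[float, float]]:
    """Invert (start, end) cut ranges into the ranges to keep."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(cuts):
        start, end = max(0.0, start), min(duration, end)
        if end <= start:
            continue
        if merged and start <= merged[-1][1]:
            # Merge overlapping or touching cuts into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    keeps, cursor = [], 0.0
    for start, end in merged:
        if start - cursor >= min_len:  # drop slivers too short to matter
            keeps.append((cursor, start))
        cursor = end
    if duration - cursor >= min_len:
        keeps.append((cursor, duration))
    return keeps


def concat_graph(keeps: list[tuple[float, float]]) -> str:
    """filter_complex that trims each kept range from input 0 and joins them."""
    parts, pairs = [], []
    for i, (start, end) in enumerate(keeps):
        s, e = _fmt(start), _fmt(end)
        parts.append(f"[0:v]trim=start={s}:end={e},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={s}:end={e},asetpts=PTS-STARTPTS[a{i}]")
        pairs.append(f"[v{i}][a{i}]")
    parts.append(f"{''.join(pairs)}concat=n={len(keeps)}:v=1:a=1[outv][outa]")
    return ";".join(parts)


def cut_command(src: str, dst: str, keeps: list[tuple[float, float]]) -> list[str]:
    return ["ffmpeg", "-i", src, "-filter_complex", concat_graph(keeps),
            "-map", "[outv]", "-map", "[outa]", dst]
```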

If requested, apply speed changes after the semantic cuts. Common speed-up ranges for Douyin talking-head videos:

  • 1.05x-1.15x: natural.
  • 1.2x: brisk but still understandable.
  • Above 1.25x: use only if the speaker remains clear.
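
Picture and voice can be sped up together with ffmpeg's `setpts` filter (video) and `atempo` filter (audio, pitch preserved). The helper names here are hypothetical. The 0.5 to 2.0 guard reflects the limit on a single `atempo` instance in older ffmpeg builds.

```python
def speed_filters(rate: float) -> tuple[str, str]:
    """Video and audio filters that speed playback up by `rate` together."""
    if not 0.5 <= rate <= 2.0:
        raise ValueError("keep rate in [0.5, 2.0] for a single atempo stage")
    # setpts divides timestamps so frames play faster; atempo keeps pitch.
    return f"setpts=PTS/{rate:g}", f"atempo={rate:g}"


def sped_duration(duration_s: float, rate: float) -> float:
    """Length of the clip after the speed change."""
    return duration_s / rate


def speed_command(src: str, dst: str, rate: float) -> list[str]:
    video, audio = speed_filters(rate)
    return ["ffmpeg", "-i", src, "-filter:v", video, "-filter:a", audio, dst]
```

For example, a 66 s semantic cut played at 1.1x comes out at about 60 s. This is useful when the target duration is fixed.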

Use smooth transitions for visible jumps:

  • Prefer short audio crossfades.
  • Prefer subtle visual crossfades for large pose changes.
  • Avoid flashy transitions unless the user asks for them.
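
The sketch below joins kept ranges with short crossfades, using ffmpeg's `xfade` (video) and `acrossfade` (audio). It assumes a single input and the same fade at every joint. Giving both streams the same fade length shortens them by equal amounts, so audio and video stay aligned. The names and the 0.15 s default are assumptions.

```python
def _fmt(x: float) -> str:
    """Seconds with millisecond precision, without trailing zeros."""
    return f"{x:.3f}".rstrip("0").rstrip(".")


def crossfade_graph(keeps: list[tuple[float, float]], fade: float = 0.15) -> str:
    """filter_complex joining kept (start, end) ranges of input 0 with crossfades."""
    if len(keeps) < 2:
        raise ValueError("need at least two segments to crossfade")
    durations = [end - start for start, end in keeps]
    if any(d <= fade for d in durations):
        raise ValueError("every segment must be longer than the fade")
    parts = []
    for i, (start, end) in enumerate(keeps):
        s, e = _fmt(start), _fmt(end)
        parts.append(f"[0:v]trim=start={s}:end={e},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={s}:end={e},asetpts=PTS-STARTPTS[a{i}]")
    v_prev, a_prev, elapsed = "[v0]", "[a0]", durations[0]
    last = len(keeps) - 1
    for k in range(1, len(keeps)):
        # The fade starts `fade` seconds before the joined stream so far ends.
        offset = elapsed - fade
        v_out = "[outv]" if k == last else f"[vx{k}]"
        a_out = "[outa]" if k == last else f"[ax{k}]"
        parts.append(f"{v_prev}[v{k}]xfade=transition=fade:duration={_fmt(fade)}"
                     f":offset={_fmt(offset)}{v_out}")
        parts.append(f"{a_prev}[a{k}]acrossfade=d={_fmt(fade)}{a_out}")
        v_prev, a_prev = v_out, a_out
        elapsed = offset + durations[k]  # the overlap shortens the total
    return ";".join(parts)
```

`xfade` expects both inputs to have the same size and frame rate. Variable-frame-rate phone footage may need an `fps` filter first. Mixing hard cuts and crossfades in one graph takes more timing bookkeeping than this sketch handles.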

5. Animation Design

Add HyperFrames/plugin animation only after the picture-corrected, cut edit base is ready.

Before writing animation HTML, define a minimal visual identity:

  • Mood.
  • Light/dark canvas treatment.
  • 3-5 colors with roles.
  • Typography.
  • What not to do.
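
Recording the identity as data makes it easy to check and to reuse across animation files. This sketch assumes the animation HTML reads CSS custom properties. `VisualIdentity` and the `--color-*` variable naming are hypothetical and are not part of HyperFrames.

```python
from dataclasses import dataclass, field


@dataclass
class VisualIdentity:
    mood: str
    canvas: str                 # "light" or "dark"
    colors: dict[str, str]      # role -> hex, e.g. {"accent": "#FF4D4F"}
    font_family: str
    avoid: list[str] = field(default_factory=list)

    def validate(self) -> None:
        if self.canvas not in ("light", "dark"):
            raise ValueError("canvas must be 'light' or 'dark'")
        if not 3 <= len(self.colors) <= 5:
            raise ValueError("use 3-5 colors, each with a role")

    def to_css(self) -> str:
        """Emit CSS custom properties that animation HTML can reference."""
        self.validate()
        lines = [f"  --color-{role}: {value};" for role, value in self.colors.items()]
        lines.append(f"  --font-main: {self.font_family};")
        return ":root {\n" + "\n".join(lines) + "\n}"
```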

For talking-head videos:

  • Keep the face clear.
  • Put title and line animations in upper safe areas.
  • Put keyword cards and captions in lower safe areas.
  • Use animation to clarify the point, not to decorate every sentence.
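
Given a face bounding box (from a face detector or marked by hand), the upper and lower text bands can be computed in pixel rows of the vertical frame. The margin and the heights reserved for the platform's own UI (`top_ui`, `bottom_ui`) are assumed values; measure them against the real app. The names are hypothetical.

```python
from typing import NamedTuple, Optional


class Band(NamedTuple):
    """Horizontal band of pixel rows, top inclusive, bottom exclusive."""
    top: int
    bottom: int


def text_bands(frame_h: int, face: Band, margin: int = 40,
               top_ui: int = 150, bottom_ui: int = 320) -> dict[str, Optional[Band]]:
    """Upper band for titles and lines, lower band for cards and captions.

    A band is None when the face leaves no room for it.
    """
    upper = Band(top_ui, face.top - margin)
    lower = Band(face.bottom + margin, frame_h - bottom_ui)
    return {
        "upper": upper if upper.bottom > upper.top else None,
        "lower": lower if lower.bottom > lower.top else None,
    }
```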

Typical animation layers:

  • Opening hook title.
  • Keyword cards.
  • Progress or accent line.
  • Simple metaphor animation, such as bridge, path, connection, map, or resource flow.
  • Ending emphasis.
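
Keyword cards are easier to time when they are anchored to the transcript. This sketch assumes transcript words are `Word(text, start, end)` records and matches keywords by exact token equality. It places each card at the first mention that does not overlap the previous card. `hold` and `gap` are assumed defaults, and the names are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def schedule_cards(words: list[Word], keywords: set[str], hold: float = 1.8,
                   gap: float = 0.3) -> list[tuple[float, float, str]]:
    """Return (start, end, keyword) cues with at most one card on screen.

    Each keyword gets one card, at its first mention that starts at least
    `gap` seconds after the previous card ends.
    """
    cues: list[tuple[float, float, str]] = []
    placed: set[str] = set()
    for w in words:
        if w.text not in keywords or w.text in placed:
            continue
        if cues and w.start < cues[-1][1] + gap:
            continue  # would overlap the previous card; wait for a later mention
        cues.append((w.start, w.start + hold, w.text))
        placed.add(w.text)
    return cues
```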

6. Subtitles

Choose subtitle mode based on the content:

  • Use concise summary captions when speech is clear and animation should carry key points.
  • Use near-verbatim subtitles when speech is fast or noisy, or when accessibility matters.

Rules:

  • Do not cover the face.
  • Keep lines short.
  • Use a local semi-transparent background if needed.
  • Check mobile readability.
  • When compositing with ffmpeg, burn in subtitles after the animation overlays so captions sit on top.
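
This sketch covers the subtitle mechanics. It wraps captions into short lines, writes them as SRT, and builds a filter chain that overlays the animation layer before burning in subtitles with ffmpeg's `subtitles` filter (which needs an ffmpeg build with libass). It assumes input 1 is an animation render with transparency. The 12-character line limit and the helper names are assumptions.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def wrap_caption(text: str, max_chars: int = 12) -> list[str]:
    """Hard-wrap by character count, which suits unspaced Chinese text."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def to_srt(cues: list[tuple[float, float, str]], max_chars: int = 12) -> str:
    """Number each (start, end, text) cue and render it as an SRT block."""
    blocks = []
    for n, (start, end, text) in enumerate(cues, 1):
        lines = "\n".join(wrap_caption(text, max_chars))
        blocks.append(f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{lines}\n")
    return "\n".join(blocks)


def overlay_then_subtitles(srt_path: str) -> str:
    """Overlay input 1 (animation) on input 0, then burn subtitles on top."""
    return f"[0:v][1:v]overlay=0:0,subtitles={srt_path}[outv]"
```

Paths that contain colons or quotes must be escaped inside the filter string.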

7. Music Mix

Music should support speech, not fight it.

Start conservatively, then adjust based on feedback:

  • Low bed: subtle, mostly emotional support.
  • Medium bed: audible energy, still below speech.
  • Avoid vocals unless requested.

After changing music volume, verify:

  • Speech remains clear.
  • No clipping or harsh peaks.
  • Intro and ending do not feel abruptly cut.
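
Below is a sketch of a fixed-level music bed and a peak check using ffmpeg. `volume`, `afade`, and `amix` are real filters. The `normalize=0` option stops `amix` from scaling the speech down, but only newer ffmpeg builds have it. The -24 dB and -18 dB starting points for low and medium beds are assumptions, applied as gain on the music file rather than as absolute loudness.

```python
import re
from typing import Optional

LOW_BED_DB = -24.0     # assumed starting gain for a subtle bed
MEDIUM_BED_DB = -18.0  # assumed starting gain for an audible bed


def music_bed_graph(music_db: float, total_s: float,
                    fade_in: float = 1.5, fade_out: float = 2.0) -> str:
    """Mix music (input 1) under speech (input 0) with fades at both ends."""
    fade_out_start = max(0.0, total_s - fade_out)
    return (f"[1:a]volume={music_db:g}dB,"
            f"afade=t=in:st=0:d={fade_in:g},"
            f"afade=t=out:st={fade_out_start:g}:d={fade_out:g}[bed];"
            # duration=first ends the mix when the speech track ends.
            f"[0:a][bed]amix=inputs=2:duration=first:normalize=0[outa]")


def max_volume_db(log: str) -> Optional[float]:
    """Read max_volume from `ffmpeg -i out.mp4 -af volumedetect -f null -`."""
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB", log)
    return float(match.group(1)) if match else None
```

volumedetect writes its report to stderr. A `max_volume` at or very near 0 dB suggests the mix is clipping.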

8. Export And Self-Check

Before delivering:

  • Confirm duration, resolution, frame rate, audio track.
  • Sample frames from beginning, middle, and end.
  • Check face brightness.
  • Check text does not cover the face.
  • Check animation timing.
  • Check music does not overpower speech.
  • Preserve prior versions with clear filenames when making revisions.
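
Parts of this self-check can be scripted. The sketch below computes timestamps for beginning, middle, and end frame grabs, builds the ffmpeg command that grabs a single frame, and picks a versioned output name so revisions never overwrite an earlier final. The `_vN` naming scheme and the helper names are assumptions.

```python
import re


def sample_times(duration_s: float, edge: float = 0.5) -> list[float]:
    """Beginning, middle, and end timestamps, kept inside the clip."""
    edge = min(edge, duration_s / 4)
    return [edge, duration_s / 2, duration_s - edge]


def frame_grab_command(src: str, t: float, out_png: str) -> list[str]:
    """Extract one frame at time t for a visual check."""
    return ["ffmpeg", "-ss", f"{t:.3f}", "-i", src, "-frames:v", "1", out_png]


def next_version(stem: str, ext: str, existing: set[str]) -> str:
    """Return stem_vN.ext, with N one above the highest existing version."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(ext)}$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{stem}_v{max(versions, default=0) + 1}{ext}"
```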

Standard Decision Notes

When reporting the plan or result, mention only decisions that affect the output:

  • What was cut and why.
  • How the picture was corrected.
  • What animation style was used.
  • Music level change.
  • Final duration and output path.

Keep the response short and practical.

Hard Rules

  • Do not start animation before correcting lighting.
  • Do not hide a dark video under heavier overlays.
  • Do not place text over the speaker's face.
  • Do not add features the user did not ask for.
  • Do not overwrite the only final file when making a revision; use a clear versioned name.
  • Verify after every major stage: picture base, cut base, animation render, final export.