金大哥 - youtube-to-shorts-zh Skill Details

Use $youtube-interview-shorts-zh to turn this YouTube interview URL into multiple Chinese-subbed short clips.

Skill:

Use this skill to convert one long YouTube talk, interview, or podcast into multiple short clips with Chinese hard subtitles.

Workflow

Create an output folder for the source video under the current task workspace: work/<video-slug>/source/ and work/<video-slug>/clips/.
Download the video with English subtitles. Try these approaches in order:
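- If yt-dlp is available in the sandbox, run it directly: yt-dlp --write-subs --write-auto-subs --sub-langs en --convert-subs srt -o "source/original.%(ext)s" <url> (YouTube does not serve SRT directly, so the captions are converted after download; --write-auto-subs covers videos that only have auto-generated captions.)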
- If the sandbox has no internet access (proxy errors, connection failures), create a macOS .command file in the workspace folder. A .command file is a shell script that auto-executes in Terminal.app when the user double-clicks it in Finder. Write the yt-dlp command into the file, instruct the user to double-click it, and wait for them to confirm the download is done before continuing.
Inspect the downloaded files and identify:
- the .mp4 source video
- the .en.srt subtitle file
- the optional .jpg thumbnail
Convert the subtitle file into a readable analysis artifact with scripts/srt_to_json.py.
Analyze the English transcript before cutting any video. Produce a generous candidate list rather than a tiny final list.
For a 1-hour interview, target roughly 10 to 15 candidates unless the material genuinely does not support that many.
Write the candidate decisions to selected_clips.json using the schema in references/clip-schema.md.
Present the candidate list to the user in a review-friendly format:
- exact start and end time
- duration
- one provocative working title
- exactly two sentences describing what that clip is about
Let the user choose which clip ids to export. If the user explicitly asks you to proceed without review, choose the strongest set yourself.
For each chosen clip:
- cut the video with scripts/clip_video.py
- extract a local English subtitle window with scripts/window_srt.py
- translate that local SRT into Chinese (clip.zh.srt)
- run python scripts/fix_srt.py clip.zh.srt clip.zh.srt to eliminate timestamp overlaps before burning (see Translation Rules below for why this is mandatory)
- create a short, attention-grabbing title
- create a description under 140 Chinese characters
- append the title and description to a text file
- burn the Chinese SRT into the clip with scripts/burn_subtitles.py
- burn the title into the first second of the video, centered, font size 48, outline width 3
Return the exported clip paths and the metadata text file path.

Selection Rules

Select segments that can stand on their own without heavy context. Favor material that is:

informative
opinionated
counterintuitive
motivating
memorable
emotionally sharp
quotable

Reject segments that are mostly setup, filler, greetings, sponsor reads, long digressions, or references that depend too much on earlier context.

Prefer clips with these properties:

20 seconds to 3 minutes
one clear idea per clip
a strong opening line within the first 3 seconds
minimal dependence on visuals that are not visible in a crop
complete thought by the end of the segment

When a promising moment needs context or a clean ending, extend the boundaries by a few seconds or even longer. Do not cut off the last sentence, and do not let clips overlap heavily unless the overlap is necessary.

Do not be overly conservative. For long interviews, err on the side of too many candidates rather than too few. A one-hour interview should usually yield many options for the user to review.

How To Analyze The Transcript

Use scripts/srt_to_json.py first so the subtitle file is easier to scan and quote precisely.

Then analyze the transcript in passes:

Skim for candidate stretches with memorable or information-dense lines.
Re-read each candidate with neighboring subtitles to choose a clean start and end.
Ensure the first line hooks and the final line resolves the thought.
Ensure the clip is content-complete, not just a strong sentence fragment.
Assign a short working title, exact timestamps, and a two-sentence user-facing summary.
Double-check that the end timestamp lands after the speaker has finished the thought, not in the middle of a sentence.

If the source subtitles are auto-generated and noisy, infer the intended meaning conservatively. Do not invent claims that are not supported by the spoken content.

Read references/analysis-prompt.md for the default prompts. Use Stage 1 first, then Stage 2 for user review formatting.

Translation Rules

Translate the clip-local English SRT into simplified Chinese SRT.

Important — YouTube rolling SRT format: YouTube's auto-generated captions use a "rolling" style where adjacent cues heavily overlap in time (e.g., cue 1 ends at 5.7s but cue 2 already starts at 2.9s). This is by design for the web player, but when burned into video it causes 2–3 subtitle lines to appear on screen simultaneously, making them unreadable. After writing the Chinese SRT, always run scripts/fix_srt.py to eliminate overlaps before passing the file to burn_subtitles.py. This step is not optional — skipping it guarantees overlapping subtitles in the final video.

Do not try to manually fix timestamps during translation; use fix_srt.py afterward instead.
Keep subtitle meaning faithful, but rewrite into natural spoken Chinese.
Prefer concise Chinese lines that fit short-form video pacing.
Retain named entities, numbers, product names, and quoted phrases accurately.
If an English sentence is broken across multiple subtitle cues, you may rebalance text between neighboring cues while preserving timing order.
Do not collapse a cue-dense subtitle file into a tiny summary subtitle file.
Preserve the original subtitle cadence as much as possible. Dropping duplicate auto-caption fragments is allowed, but the Chinese SRT should usually retain most non-duplicate cues.

Packaging Rules

For each chosen clip, generate:

one short, sharp title suitable for short-video distribution
one description in Chinese of 140 characters or fewer

The title should feel clickable and opinionated. It may be provocative, contrarian, or tension-creating, but it must still be faithful to the speaker's meaning.

Keep the on-screen title around 12 Chinese characters when possible so it fits cleanly in the first-second overlay.

The description should mention:

who is speaking
what show or interview this came from
what topic is being discussed
what the key claim or takeaway is

Write the packaging copy to a text file such as analysis/clip-packaging.txt or clips/<clip-id>/metadata.txt.
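
The length limits above can be checked before the copy is written to disk. The following is a minimal sketch, not part of the skill's scripts: the function name check_packaging and the "error"/"warning" message format are hypothetical. It treats the 140-character description limit as a hard rule and the 12-character title length as a soft target, and it counts characters with Python's len(), which counts each Chinese character as one.

```python
def check_packaging(title: str, description: str,
                    max_description: int = 140,
                    title_target: int = 12) -> list[str]:
    """Return a list of problems; an empty list means the copy passes."""
    messages = []
    if not title.strip():
        messages.append("error: title is empty")
    elif len(title) > title_target:
        # Soft target only: a long title may not fit the first-second overlay.
        messages.append(
            f"warning: title has {len(title)} characters, target is about {title_target}"
        )
    if not description.strip():
        messages.append("error: description is empty")
    elif len(description) > max_description:
        messages.append(
            f"error: description has {len(description)} characters, limit is {max_description}"
        )
    return messages
```

Running the check on each clip before appending to the packaging file keeps over-long descriptions out of the final metadata.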

File Layout

Use a layout like this unless the user asks for another structure:

work/<video-slug>/
  source/
    original.mp4
    original.en.srt
  analysis/
    transcript.json
    selected_clips.json
    candidate-review.txt
    clip-packaging.txt
  clips/
    01-<slug>/
      clip.mp4
      clip.en.srt
      clip.zh.srt
      clip.hardsub.mp4
      metadata.txt

The downloader may emit title-based filenames. If so, keep them, but normalize the per-clip folders.

Scripts

scripts/srt_to_json.py

Parse SRT into JSON records with cue index, start/end timestamps, start/end seconds, and text. Use this before transcript analysis.
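
The actual script is not reproduced in this document. As an illustration of the record shape described above, here is a minimal sketch under stated assumptions: the field names (index, start, end, start_seconds, end_seconds, text) and the function names are hypothetical, and malformed blocks are simply skipped.

```python
import json
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 into seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> list[dict]:
    """Split SRT text into cue records; blocks without a timing line are skipped."""
    cleaned = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    records = []
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->"))
        end = end.split()[0]  # ignore any trailing position hints
        records.append({
            "index": int(lines[0].strip()),
            "start": start,
            "end": end,
            "start_seconds": to_seconds(start),
            "end_seconds": to_seconds(end),
            "text": " ".join(line.strip() for line in lines[2:]),
        })
    return records


def srt_to_json(text: str) -> str:
    """Serialize the cue records; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(parse_srt(text), ensure_ascii=False, indent=2)
```

Keeping both the original timestamp strings and the numeric seconds makes the records easy to quote in a candidate list and easy to compare when choosing clip boundaries.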

scripts/window_srt.py

Extract the subtitle cues that overlap a selected clip window and shift the timestamps so the new SRT starts at 00:00:00,000.
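
The windowing step can be sketched as follows. This is an illustrative sketch rather than the script itself: cues are assumed to be (start, end, text) tuples in source seconds, the function names are hypothetical, and clamping partially overlapping cues to the window edges is an assumption about how boundary cues should be handled.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = max(0, round(seconds * 1000))
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def window_cues(cues, clip_start: float, clip_end: float):
    """Keep cues that overlap [clip_start, clip_end) and shift them to clip time."""
    shifted = []
    for start, end, text in cues:
        if end <= clip_start or start >= clip_end:
            continue  # cue lies entirely outside the clip window
        # Assumption: clamp cues that straddle a boundary to the window edges.
        new_start = max(start, clip_start) - clip_start
        new_end = min(end, clip_end) - clip_start
        shifted.append((new_start, new_end, text))
    return shifted


def to_srt(cues) -> str:
    """Render (start, end, text) tuples as SRT text with renumbered cues."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Because the clip's first frame corresponds to clip_start in the source, subtracting clip_start from every cue keeps the subtitles aligned with the cut video.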

scripts/clip_video.py

Create a re-encoded MP4 clip for an exact time range. Use this for each selected segment.
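
A re-encoding cut of this kind might be invoked as sketched below. The function names and the codec choices (libx264 video, AAC audio) are assumptions for illustration; the real script's options are not shown in this document. The command is built as a list of separate arguments so that paths with spaces or non-ASCII characters need no shell quoting.

```python
import subprocess


def build_clip_command(source, output, start: float, end: float) -> list[str]:
    """Build an ffmpeg argument list that re-encodes [start, end) into output."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek to the clip start in the input
        "-i", str(source),
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-c:v", "libx264",            # assumed video codec
        "-c:a", "aac",                # assumed audio codec
        str(output),
    ]


def cut_clip(source, output, start: float, end: float) -> None:
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_clip_command(source, output, start, end), check=True)
```

Re-encoding, rather than stream-copying, lets the cut begin on the requested frame instead of the nearest keyframe, at the cost of extra processing time.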

scripts/fix_srt.py

Fix YouTube-style rolling subtitle overlaps. Run this on every Chinese SRT before burning it into a clip. It trims each cue's end time so it does not overlap the next cue, adds an 80ms gap between cues, and ensures each cue shows for at least 800ms.

Usage: python scripts/fix_srt.py clip.zh.srt clip.zh.srt (in-place is safe)
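
The overlap fix described above can be sketched as follows. This is one possible reading of the rules, not the script's actual code: cues are assumed to be (start_ms, end_ms, text) tuples, the function name is hypothetical, and pushing a cue's start later when the previous cue had to be extended to reach the minimum duration is an assumption about how the two rules interact.

```python
GAP_MS = 80    # gap between consecutive cues, as stated above
MIN_MS = 800   # minimum display time per cue, as stated above


def fix_overlaps(cues, gap_ms: int = GAP_MS, min_ms: int = MIN_MS):
    """Return cues with no overlaps, a fixed gap, and a minimum duration."""
    ordered = sorted(cues, key=lambda cue: cue[0])
    fixed = []
    for position, (start, end, text) in enumerate(ordered):
        if fixed:
            # Assumption: never start before the previous cue has ended plus the gap.
            start = max(start, fixed[-1][1] + gap_ms)
        if position + 1 < len(ordered):
            # Trim the end so it stops short of the next cue's original start.
            end = min(end, ordered[position + 1][0] - gap_ms)
        # Enforce the minimum on-screen time, even if that extends the cue.
        end = max(end, start + min_ms)
        fixed.append((start, end, text))
    return fixed
```

With the rolling-caption example from Translation Rules, a cue spanning 0 to 5700 ms followed by one starting at 2900 ms is trimmed to end at 2820 ms, so only one line is on screen at a time. In very dense stretches the minimum duration can push later cues slightly behind the audio, which is the trade-off of this reading.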

scripts/burn_subtitles.py

Render a Chinese SRT into the clip with the ffmpeg subtitles filter.

Pass a title when exporting the final video so the first second includes:

centered title
font size 48
3px outline
a CJK-capable font (see Font Discovery in Execution Notes)
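
One way to combine the subtitle burn and the first-second title in a single ffmpeg pass is sketched below. It is an illustration only: the function name and the title_file and font_file parameters are hypothetical, the script's real interface is not shown here, and centering the title both horizontally and vertically is an assumption. The sketch also assumes the paths contain no characters that need filtergraph escaping (such as ":", "'", "," or "\"), and that ffmpeg was built with libass and libfreetype.

```python
def build_burn_command(clip, srt, output, title_file, font_file) -> list[str]:
    """Build an ffmpeg argument list that burns subtitles and a one-second title."""
    title_overlay = (
        f"drawtext=fontfile={font_file}:textfile={title_file}"
        ":fontsize=48:fontcolor=white"
        ":borderw=3:bordercolor=black"      # 3px outline
        ":x=(w-text_w)/2:y=(h-text_h)/2"    # centered in the frame (assumption)
        ":enable='lt(t,1)'"                 # visible only during the first second
    )
    video_filter = f"subtitles={srt},{title_overlay}"
    return [
        "ffmpeg", "-y",
        "-i", str(clip),
        "-vf", video_filter,
        "-c:a", "copy",                     # audio is left untouched
        str(output),
    ]
```

Reading the title from a UTF-8 text file through textfile avoids having to escape Chinese punctuation or quotes inside the filter string.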

References

references/clip-schema.md

Schema definition for selected_clips.json. Read this before writing candidate decisions.

references/analysis-prompt.md

Default prompts for transcript analysis. Contains Stage 1 (candidate identification) and Stage 2 (user review formatting) prompts.

Execution Notes

The downloader depends on local cookies or --cookies-from-browser. If download fails due to expired cookies, refresh them instead of changing the skill.
Prefer exact source files from the downloader output over guessing names.
Verify that ffmpeg is available before clip extraction or burning.
When paths contain spaces or non-ASCII characters, pass arguments as separate shell arguments instead of building unsafe command strings.
By default, the deliverable is subbed short clips in the original frame. Crop to 9:16 only if the user asks for vertical shorts, and retain the original audio when you do.
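
For the vertical case, the crop geometry is simple arithmetic. The sketch below computes a centered 9:16 region and returns it as an ffmpeg crop filter string. The function name is hypothetical, centering the crop is an assumption (a speaker who sits off-center may need a manual offset), and dimensions are rounded down to even numbers because common encoders such as libx264 with yuv420p require them.

```python
def vertical_crop_filter(width: int, height: int) -> str:
    """Return an ffmpeg crop filter for the largest centered 9:16 region."""
    if width * 16 >= height * 9:
        # Source is wider than 9:16: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * 9 // 16
    else:
        # Source is narrower than 9:16: keep the full width, trim top and bottom.
        crop_w = width
        crop_h = width * 16 // 9
    crop_w -= crop_w % 2  # round down to even dimensions
    crop_h -= crop_h % 2
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return f"crop={crop_w}:{crop_h}:{x}:{y}"
```

For a 1920x1080 source this keeps the full 1080-pixel height and a 606-pixel-wide column from the middle of the frame. The filter only touches video, so the audio stream passes through unchanged.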

Font Discovery: burn_subtitles.py already searches common CJK font paths automatically. Do not hardcode any machine-specific font path. If the user has a custom font they prefer, pass it via the --font argument. Otherwise let the script find a fallback (it will try NotoSansCJK, DroidSansFallbackFull, and others before defaulting to ffmpeg's built-in font).

macOS .command file pattern: When the sandbox has no internet access and yt-dlp must run on the user's machine instead, write a .command shell script to the workspace folder (the user's Desktop folder that is mounted into the sandbox). A .command file opens in Terminal.app and auto-executes when double-clicked in Finder. This lets the user run the download with a single click. After the download completes, the files appear in the shared workspace folder and the sandbox can read them normally.
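
Writing such a file might look like the sketch below. The function name and the default file name download.command are hypothetical, and the download command itself is passed in by the caller. Two details matter in practice: the script changes into its own folder first, because Terminal.app does not start in the folder that contains the file, and the file needs its executable bit set. Whether that bit survives the sandbox mount is an assumption worth verifying.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_command_file(workspace, shell_command: str,
                       name: str = "download.command") -> Path:
    """Write an executable .command script that runs shell_command in its own folder."""
    script = "\n".join([
        "#!/bin/bash",
        "# Run from the folder that contains this file.",
        'cd "$(dirname "$0")" || exit 1',
        shell_command,
        'echo "Done. You can close this window."',
        "",
    ])
    path = Path(workspace) / name
    path.write_text(script, encoding="utf-8")
    # Add execute permission for user, group, and others.
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    # Demo in a temporary folder with a harmless placeholder command.
    with tempfile.TemporaryDirectory() as folder:
        created = write_command_file(folder, "echo hello")
        print(created.name, os.access(created, os.X_OK))
```

After writing the file, tell the user which file to double-click and wait for their confirmation before reading the downloaded assets.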

This skill covers:

candidate selection
user-facing candidate summaries
Chinese subtitle translation
short-video title and description generation

Output Contract

Return:

the source asset folder
the candidate clip list with timestamps, duration, title, and two-sentence summaries
the packaging text file with title and 140-character descriptions
the final clip.hardsub.mp4 path for each exported short

If the workflow cannot finish, report the exact blocker: missing cookies, failed download, missing English subtitles, missing ffmpeg, or unclear transcript quality.