Category: Other | No API key required

youtube-to-shorts-zh

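# YouTube long-video to Chinese short-clip editing assistant

## 🎬 In one sentence

Turn any YouTube interview, talk, or podcast into multiple short clips with Chinese subtitles, ready to post, in one step.

---

## ✨ What it does

You have a great one-hour interview and want to cut it into 10 clips of 30-60 seconds each for Douyin, Bilibili, or Xiaohongshu? This Skill handles everything:

- **Smart segment selection**: automatically analyzes the English subtitles and picks highlights that are information-dense, emotionally charged, and able to spread on their own
- **Fully automatic editing**: cuts the video, extracts the subtitles, translates them into Chinese, and burns in hard subtitles in one pass
- **Viral packaging**: generates an eye-catching title plus a Chinese description of 140 characters or fewer for each clip, ready to publish

---

## 🎯 Who it is for

| Audience | Pain point | How this Skill helps |
|------|------|------------------|
| Knowledge bloggers | No time to watch a full interview to find material | AI picks out the key quotes, opinions, and stories |
| Short-video operators | Translation plus editing takes too much labor | Fully automatic output, 10 clips per hour |
| Content teams | Cannot follow high-quality overseas content | Chinese subtitles plus localized packaging |
| Individual creators | Cannot use editing software | Zero barrier, just pick a URL |

---

## 🔧 Core capabilities

### Smart selection engine

- Automatically filters out opening small talk, sponsor reads, and long-winded setup
- Prioritizes counterintuitive opinions, emotional peaks, quotable lines, and practical takeaways
- Each segment runs 20 seconds to 3 minutes and stands alone without extra context

### Subtitle translation optimization

- Keeps the spoken rhythm of the original video instead of forcing a "summary"
- Automatically fixes the overlap problem in YouTube rolling subtitles so the final clip is readable
- Simplified Chinese in natural, conversational phrasing

### One-click packaging

- Each clip gets a **title + Chinese description of 140 characters or fewer**
- The title is burned into the opening frames of the video (centered, large type, outlined)
- The description covers who is speaking, which show it is from, and the core point

---

## 📦 Output example

Input: the URL of a one-hour interview

Output:

```
clips/
  01-ai-risk/
    clip.hardsub.mp4    # short clip with Chinese subtitles
    metadata.txt        # title + description, ready to copy and publish
  02-future-prediction/
    clip.hardsub.mp4
    metadata.txt
  ...
```

---

## 🚀 How to use it

1. Send me the YouTube link
2. Tell me how many clips you want (default 8-12)
3. Wait a few minutes, then collect the finished clips and copy

> The whole process is automated. You do not need to know editing, translation, or subtitling.

---

## 💡 Example scenarios

- Cut a Lex Fridman podcast into 10 short clips
- Split a TED talk into a collection of key quotes
- Turn an interview with a tech leader into a trending Douyin video
- Convert the highlights of an English course into Chinese explainer content

---

## 📌 One-sentence summary

**Go from "watched it all and don't know what to cut" to "already cut and ready to post."**

---

> All you need is a YouTube link. Leave the rest to this Skill.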

Author: user_f88477d0 (community)

Use $youtube-interview-shorts-zh to turn this YouTube interview URL into multiple Chinese-subbed short clips.

Skill:

Use this skill to convert one long YouTube talk, interview, or podcast into multiple short clips with Chinese hard subtitles.

Workflow

  1. Create an output folder for the source video under the current task workspace: work/<video-slug>/source/ and work/<video-slug>/clips/.
  2. Download the video with English subtitles. Try these approaches in order:
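    • If yt-dlp is available in the sandbox, run it directly: yt-dlp --write-subs --write-auto-subs --sub-langs en --convert-subs srt -o "source/original.%(ext)s" <url> (--write-auto-subs also fetches auto-generated captions, and --convert-subs srt converts whatever format YouTube serves into SRT)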
    • If the sandbox has no internet access (proxy errors, connection failures), create a macOS .command file in the workspace folder. A .command file is a shell script that auto-executes in Terminal.app when the user double-clicks it in Finder. Write the yt-dlp command into the file, instruct the user to double-click it, and wait for them to confirm the download is done before continuing.
  3. Inspect the downloaded files and identify:
    • the .mp4 source video
    • the .en.srt subtitle file
    • the optional .jpg thumbnail
  4. Convert the subtitle file into a readable analysis artifact with scripts/srt_to_json.py.
  5. Analyze the English transcript before cutting any video. Produce a generous candidate list rather than a tiny final list.
  6. For a 1-hour interview, target roughly 10 to 15 candidates unless the material genuinely does not support that many.
  7. Write the candidate decisions to selected_clips.json using the schema in references/clip-schema.md.
  8. Present the candidate list to the user in a review-friendly format:
    • exact start and end time
    • duration
    • one provocative working title
    • exactly two sentences describing what that clip is about
  9. Let the user choose which clip ids to export. If the user explicitly asks you to proceed without review, choose the strongest set yourself.
  10. For each chosen clip:
    • cut the video with scripts/clip_video.py
    • extract a local English subtitle window with scripts/window_srt.py
    • translate that local SRT into Chinese (clip.zh.srt)
    • run python scripts/fix_srt.py clip.zh.srt clip.zh.srt to eliminate timestamp overlaps before burning (see Translation Rules below for why this is mandatory)
    • create a short, attention-grabbing title
    • create a description under 140 Chinese characters
    • append the title and description to a text file
    • burn the Chinese SRT into the clip with scripts/burn_subtitles.py
    • burn the title into the first second of the video, centered, font size 48, outline width 3
  11. Return the exported clip paths and the metadata text file path.
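
The candidate review step in the workflow above could be rendered along these lines. This is a minimal sketch: the Candidate class and its field names are hypothetical and are not taken from references/clip-schema.md, which remains the authoritative schema.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical field names; the real schema lives in references/clip-schema.md.
    clip_id: str
    start: float  # seconds from the start of the source video
    end: float    # seconds from the start of the source video
    title: str    # provocative working title
    summary: str  # exactly two sentences


def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS for a human-readable review list."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_review(candidates: list[Candidate]) -> str:
    """Render one block per candidate: time range, duration, title, summary."""
    lines = []
    for c in candidates:
        duration = c.end - c.start
        lines.append(f"[{c.clip_id}] {fmt_ts(c.start)} - {fmt_ts(c.end)} ({duration:.0f}s)")
        lines.append(f"  Title: {c.title}")
        lines.append(f"  About: {c.summary}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

The returned text can be written to analysis/candidate-review.txt and shown to the user before any video is cut.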

Selection Rules

Select segments that can stand on their own without heavy context. Favor material that is:

  • informative
  • opinionated
  • counterintuitive
  • motivating
  • memorable
  • emotionally sharp
  • quotable

Reject segments that are mostly setup, filler, greetings, sponsor reads, long digressions, or references that depend too much on earlier context.

Prefer clips with these properties:

  • 20 seconds to 3 minutes
  • one clear idea per clip
  • a strong opening line within the first 3 seconds
  • minimal dependence on visuals that are not visible in a crop
  • complete thought by the end of the segment

When a promising moment needs context or a clean ending, extend the boundaries by a few seconds or even longer. Do not cut off the last sentence, and do not let clips overlap heavily unless the overlap is necessary.
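
As a rough illustration of the duration and overlap guidance above, the following sketch checks candidate windows. The 0.5 overlap threshold is an assumption chosen for the example; the skill itself only says overlap should not be "heavy".

```python
MIN_SECONDS = 20.0   # lower bound from the selection rules
MAX_SECONDS = 180.0  # upper bound (3 minutes) from the selection rules


def duration_ok(start: float, end: float) -> bool:
    """True when the clip length falls inside the preferred range."""
    return MIN_SECONDS <= (end - start) <= MAX_SECONDS


def overlap_ratio(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Fraction of the shorter clip that is covered by the other clip."""
    intersection = min(a[1], b[1]) - max(a[0], b[0])
    if intersection <= 0:
        return 0.0
    shorter = min(a[1] - a[0], b[1] - b[0])
    return intersection / shorter


def heavy_overlaps(clips: list[tuple[float, float]], threshold: float = 0.5):
    """Return index pairs whose overlap exceeds the (assumed) threshold."""
    pairs = []
    for i in range(len(clips)):
        for j in range(i + 1, len(clips)):
            if overlap_ratio(clips[i], clips[j]) > threshold:
                pairs.append((i, j))
    return pairs
```

Flagged pairs are candidates for merging or trimming, not automatic rejections, since the rules allow overlap when it is necessary.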

Do not be overly conservative. For long interviews, err toward too many candidates rather than too few. A one-hour interview should usually yield many options for the user to review.

How To Analyze The Transcript

Use scripts/srt_to_json.py first so the subtitle file is easier to scan and quote precisely.

Then analyze the transcript in passes:

  1. Skim for candidate stretches with memorable or information-dense lines.
  2. Re-read each candidate with neighboring subtitles to choose a clean start and end.
  3. Ensure the first line hooks and the final line resolves the thought.
  4. Ensure the clip is content-complete, not just a strong sentence fragment.
  5. Assign a short working title, exact timestamps, and a two-sentence user-facing summary.
  6. Double-check that the end timestamp lands after the speaker has finished the thought, not in the middle of a sentence.

If the source subtitles are auto-generated and noisy, infer the intended meaning conservatively. Do not invent claims that are not supported by the spoken content.

Read references/analysis-prompt.md for the default prompts. Use Stage 1 first, then Stage 2 for user review formatting.

Translation Rules

Translate the clip-local English SRT into simplified Chinese SRT.

Important — YouTube rolling SRT format: YouTube's auto-generated captions use a "rolling" style where adjacent cues heavily overlap in time (e.g., cue 1 ends at 5.7s but cue 2 already starts at 2.9s). This is by design for the web player, but when burned into video it causes 2–3 subtitle lines to appear on screen simultaneously, making them unreadable. After writing the Chinese SRT, always run scripts/fix_srt.py to eliminate overlaps before passing the file to burn_subtitles.py. This step is not optional — skipping it guarantees overlapping subtitles in the final video.

  • Do not try to manually fix timestamps during translation; use fix_srt.py afterward instead.
  • Keep subtitle meaning faithful, but rewrite into natural spoken Chinese.
  • Prefer concise Chinese lines that fit short-form video pacing.
  • Retain named entities, numbers, product names, and quoted phrases accurately.
  • If an English sentence is broken across multiple subtitle cues, you may rebalance text between neighboring cues while preserving timing order.
  • Do not collapse a cue-dense subtitle file into a tiny summary subtitle file.
  • Preserve the original subtitle cadence as much as possible. Dropping duplicate auto-caption fragments is allowed, but the Chinese SRT should usually retain most non-duplicate cues.

Packaging Rules

For each chosen clip, generate:

  • one short, sharp title suitable for short-video distribution
  • one description in Chinese of 140 characters or fewer

The title should feel clickable and opinionated. It may be provocative, contrarian, or tension-creating, but it must still be faithful to the speaker's meaning.

Keep the on-screen title around 12 Chinese characters when possible so it fits cleanly in the first-second overlay.

The description should mention:

  • who is speaking
  • what show or interview this came from
  • what topic is being discussed
  • what the key claim or takeaway is

Write the packaging copy to a text file such as analysis/clip-packaging.txt or clips/<clip-id>/metadata.txt.
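
A small check like the one below can catch packaging copy that breaks the limits before it is written out. It is a sketch under stated assumptions: it counts every character (including punctuation) toward the 140 limit, and the "Title:" and "Description:" labels in the file are illustrative rather than a format the skill prescribes.

```python
from pathlib import Path

MAX_DESC_CHARS = 140      # hard limit from the packaging rules
TARGET_TITLE_CHARS = 12   # soft target for the on-screen title


def check_packaging(title: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the copy fits."""
    problems = []
    if len(description) > MAX_DESC_CHARS:
        problems.append(f"description has {len(description)} characters (max {MAX_DESC_CHARS})")
    if len(title) > TARGET_TITLE_CHARS:
        problems.append(f"title has {len(title)} characters (target about {TARGET_TITLE_CHARS})")
    return problems


def write_metadata(path: Path, title: str, description: str) -> None:
    """Write the title and description to a per-clip metadata file as UTF-8."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"Title: {title}\nDescription: {description}\n", encoding="utf-8")
```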

File Layout

Use a layout like this unless the user asks for another structure:

work/<video-slug>/
  source/
    original.mp4
    original.en.srt
  analysis/
    transcript.json
    selected_clips.json
    candidate-review.txt
    clip-packaging.txt
  clips/
    01-<slug>/
      clip.mp4
      clip.en.srt
      clip.zh.srt
      clip.hardsub.mp4
      metadata.txt

The downloader may emit title-based filenames. If so, keep them, but normalize the per-clip folders.

Scripts

scripts/srt_to_json.py

Parse SRT into JSON records with cue index, start/end timestamps, start/end seconds, and text. Use this before transcript analysis.
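
To make the record shape concrete, here is a minimal sketch of this kind of parser. It is not the actual srt_to_json.py; the key names (index, start, end, start_seconds, end_seconds, text) are assumptions based on the description above.

```python
import json
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


def ts_to_seconds(ts: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = TIMESTAMP.fullmatch(ts.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {ts!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> list[dict]:
    """Split SRT text into cue records; malformed blocks are skipped."""
    records = []
    cleaned = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", cleaned):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = (part.strip().split()[0] for part in lines[1].split("-->"))
        index = int(lines[0]) if lines[0].strip().isdigit() else len(records) + 1
        records.append({
            "index": index,
            "start": start,
            "end": end,
            "start_seconds": ts_to_seconds(start),
            "end_seconds": ts_to_seconds(end),
            "text": " ".join(ln.strip() for ln in lines[2:]),
        })
    return records


def srt_to_json(text: str) -> str:
    """Serialize the cue records, keeping non-ASCII text readable."""
    return json.dumps(parse_srt(text), ensure_ascii=False, indent=2)
```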

scripts/window_srt.py

Extract the subtitle cues that overlap a selected clip window and shift the timestamps so the new SRT starts at 00:00:00,000.
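
The windowing step can be sketched as follows. This is an illustration, not the real window_srt.py: cues are plain (start, end, text) tuples in seconds, and clamping partially overlapping cues to the clip boundaries is an assumption about how edge cues are handled.

```python
def seconds_to_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def window_cues(cues, clip_start: float, clip_end: float):
    """Keep cues that overlap the window and shift them so the clip starts at 0."""
    shifted = []
    for start, end, text in cues:
        if end <= clip_start or start >= clip_end:
            continue  # cue lies entirely outside the clip window
        new_start = max(start, clip_start) - clip_start  # clamp, then shift
        new_end = min(end, clip_end) - clip_start
        shifted.append((new_start, new_end, text))
    return shifted


def to_srt(cues) -> str:
    """Serialize cues back to SRT text with indices renumbered from 1."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, 1):
        blocks.append(f"{i}\n{seconds_to_ts(start)} --> {seconds_to_ts(end)}\n{text}\n")
    return "\n".join(blocks)
```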

scripts/clip_video.py

Create a re-encoded MP4 clip for an exact time range. Use this for each selected segment.
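
One way such a cut could be expressed is shown below. This is a sketch, not the real clip_video.py: it assumes an ffmpeg build with the libx264 and aac encoders, and it builds the command as an argument list so paths with spaces or non-ASCII characters need no shell quoting.

```python
def build_clip_command(source: str, start: float, end: float, output: str) -> list[str]:
    """Build an ffmpeg argument list that re-encodes [start, end) into output."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek on the input before decoding
        "-i", source,
        "-t", f"{end - start:.3f}",  # clip duration in seconds
        "-c:v", "libx264",           # re-encode video (assumed encoder)
        "-c:a", "aac",               # re-encode audio (assumed encoder)
        output,
    ]
```

The list can be passed to subprocess.run(cmd, check=True). Re-encoding rather than stream-copying is what allows the cut to land on the exact timestamps instead of the nearest keyframe.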

scripts/fix_srt.py

Fix YouTube-style rolling subtitle overlaps. Run this on every Chinese SRT before burning it into a clip. It trims each cue's end time so it does not overlap the next cue, adds an 80ms gap between cues, and ensures each cue shows for at least 800ms.

Usage: python scripts/fix_srt.py clip.zh.srt clip.zh.srt (in-place is safe)
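
The trimming logic described above might look roughly like this. It is a sketch and not the actual fix_srt.py: times are integer milliseconds, and two behaviors are assumptions, namely that the gap to the next cue wins over the minimum duration when both cannot be satisfied, and that a cue with no room to display at all is dropped.

```python
GAP_MS = 80            # gap left before the next cue
MIN_DURATION_MS = 800  # minimum time a cue should stay on screen


def fix_overlaps(cues):
    """Trim (start_ms, end_ms, text) cues so that no two are on screen together."""
    ordered = sorted(cues, key=lambda cue: cue[0])
    fixed = []
    for i, (start, end, text) in enumerate(ordered):
        end = max(end, start + MIN_DURATION_MS)  # stretch very short cues
        if i + 1 < len(ordered):
            limit = ordered[i + 1][0] - GAP_MS
            end = min(end, limit)  # assumption: the gap takes priority
        if end <= start:
            continue  # assumption: drop cues that have no room to display
        fixed.append((start, end, text))
    return fixed
```

With the rolling example from the Translation Rules (one cue ending at 5.7s while the next starts at 2.9s), the first cue would be cut back to end at 2.82s.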

scripts/burn_subtitles.py

Render a Chinese SRT into the clip with ffmpeg subtitles filter.

Pass a title when exporting the final video so the first second includes:

  • centered title
  • font size 48
  • 3px outline
  • a CJK-capable font (see Font Discovery in Execution Notes)
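
The subtitle burn and title overlay could be combined in a single ffmpeg filter chain, as in the sketch below. This is not the real burn_subtitles.py. It assumes an ffmpeg build with libass and libfreetype, simple paths without quotes, colons, or backslashes (filter strings need extra escaping otherwise), and white text with a black outline, which the skill does not specify. The title is read from a hypothetical text file via drawtext's textfile option to avoid escaping the title itself, and subtitle font selection is left out.

```python
def build_burn_command(clip: str, srt: str, output: str,
                       title_file: str, font_file: str) -> list[str]:
    """Build an ffmpeg argument list that burns subtitles plus a 1-second title."""
    subtitles = f"subtitles='{srt}'"
    drawtext = (
        f"drawtext=textfile='{title_file}':fontfile='{font_file}'"
        ":fontsize=48:fontcolor=white"      # colors are assumptions
        ":borderw=3:bordercolor=black"      # 3px outline
        ":x=(w-text_w)/2:y=(h-text_h)/2"    # centered horizontally and vertically
        ":enable='lt(t,1)'"                 # show only during the first second
    )
    return [
        "ffmpeg", "-y",
        "-i", clip,
        "-vf", f"{subtitles},{drawtext}",
        "-c:a", "copy",  # audio is untouched, so it can be stream-copied
        output,
    ]
```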

References

references/clip-schema.md

Schema definition for selected_clips.json. Read this before writing candidate decisions.

references/analysis-prompt.md

Default prompts for transcript analysis. Contains Stage 1 (candidate identification) and Stage 2 (user review formatting) prompts.

Execution Notes

  • The downloader depends on local cookies or --cookies-from-browser. If download fails due to expired cookies, refresh them instead of changing the skill.
  • Prefer exact source files from the downloader output over guessing names.
  • Verify that ffmpeg is available before clip extraction or burning.
  • When paths contain spaces or non-ASCII characters, pass arguments as separate shell arguments instead of building unsafe command strings.
  • By default, the deliverable is subbed short clips in the original frame. Crop to 9:16 only if the user asks for vertical shorts, and retain the original audio when you do.
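
A preflight check along these lines can surface a missing tool before any clip work starts. It is a minimal sketch: the find_blockers helper and its message wording are hypothetical, and the lookup function is injectable only so the check can be exercised without the tools installed.

```python
import shutil


def find_blockers(required=("ffmpeg",), which=shutil.which) -> list[str]:
    """Return one 'missing <tool>' message per required tool not found on PATH."""
    return [f"missing {tool}" for tool in required if which(tool) is None]
```

For example, find_blockers(("ffmpeg", "yt-dlp")) returns an empty list when both executables are on PATH, and the messages it returns otherwise can be reported to the user as the exact blocker.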

Font Discovery: burn_subtitles.py already searches common CJK font paths automatically. Do not hardcode any machine-specific font path. If the user has a custom font they prefer, pass it via the --font argument. Otherwise let the script find a fallback (it will try NotoSansCJK, DroidSansFallbackFull, and others before defaulting to ffmpeg's built-in font).
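
The fallback search can be pictured like this. It is a sketch, not the code in burn_subtitles.py: the candidate paths are illustrative examples of where such fonts are often installed, and the real script's list and order may differ.

```python
from pathlib import Path

# Illustrative locations only; the real script's search list may differ.
CANDIDATE_FONTS = [
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
    "/System/Library/Fonts/PingFang.ttc",
]


def find_cjk_font(user_font=None, candidates=CANDIDATE_FONTS):
    """Prefer the user's font when it exists, else the first candidate found."""
    if user_font and Path(user_font).is_file():
        return user_font
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None  # caller falls back to ffmpeg's default font
```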

macOS .command file pattern: When the sandbox has no internet access and yt-dlp must run on the user's machine instead, write a .command shell script to the workspace folder (the user's Desktop folder that is mounted into the sandbox). A .command file opens in Terminal.app and auto-executes when double-clicked in Finder. This lets the user run the download with a single click. After the download completes, the files appear in the shared workspace folder and the sandbox can read them normally.
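
Generating the .command file could look like the sketch below. The write_command_file helper and the download.command file name are hypothetical. The script changes into its own folder first so relative output paths resolve inside the workspace, and whether the executable bit survives on the mounted folder depends on how the folder is shared.

```python
import shlex
import stat
from pathlib import Path


def write_command_file(workspace: Path, command: list[str],
                       name: str = "download.command") -> Path:
    """Write a double-clickable macOS shell script that runs the given command."""
    script = "\n".join([
        "#!/bin/bash",
        'cd "$(dirname "$0")" || exit 1',  # run from the script's own folder
        shlex.join(command),               # quote each argument safely
        'echo "Download finished. You can close this window."',
        "",
    ])
    path = workspace / name
    path.write_text(script, encoding="utf-8")
    # Terminal.app only runs the file on double-click if it is executable.
    path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

The command argument is the same yt-dlp invocation that would have run in the sandbox, passed as a list such as ["yt-dlp", ..., url].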

This skill covers:

  • candidate selection
  • user-facing candidate summaries
  • Chinese subtitle translation
  • short-video title and description generation

Output Contract

Return:

  • the source asset folder
  • the candidate clip list with timestamps, duration, title, and two-sentence summaries
  • the packaging text file with each title and a description of 140 characters or fewer
  • the final clip.hardsub.mp4 path for each exported short

If the workflow cannot finish, report the exact blocker: missing cookies, failed download, missing English subtitles, missing ffmpeg, or unclear transcript quality.