Category: Data and Analysis. No API key required.

YouTube Transcript (yt-dlp)

Uses yt-dlp to extract text from a YouTube video's existing captions (manual or auto-generated), optionally with timestamps and a local SQLite cache; suitable for...

Author: mpbshhxhubclawhub

YouTube Transcript (Captions-Only)

Extracts transcripts from existing YouTube captions using yt-dlp. Prefers manual subtitles; falls back to auto-generated captions.
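The manual-first, auto-fallback preference can be sketched against the metadata dictionary that yt-dlp returns for a video, which lists manual tracks under `subtitles` and auto-generated ones under `automatic_captions`, each keyed by language code. The function name `pick_caption_track` and the `"auto"` label are assumptions for illustration (only `"manual"` appears in this skill's sample output); this is not the script's actual code.

```python
from typing import Optional, Tuple


def pick_caption_track(info: dict, lang: str) -> Optional[Tuple[str, list]]:
    """Return (source, formats) for `lang`, preferring manual subtitles.

    `info` is a yt-dlp style metadata dict. Returns None when neither a
    manual nor an auto-generated track exists for the language.
    """
    manual = info.get("subtitles") or {}
    auto = info.get("automatic_captions") or {}
    if manual.get(lang):
        return "manual", manual[lang]
    if auto.get(lang):
        # Hypothetical label; the real script may name this differently.
        return "auto", auto[lang]
    return None
```

A caller would treat a `None` result as the "No captions available" case.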

Prerequisites

  • Python 3.7+
  • yt-dlp installed and on PATH (pip install yt-dlp or system package)

How to Run

Script path: {baseDir}/scripts/yt_transcript.py

# Basic usage
python {baseDir}/scripts/yt_transcript.py <youtube_url_or_id>

# Specify language
python {baseDir}/scripts/yt_transcript.py <url> --lang en

# Plain text output
python {baseDir}/scripts/yt_transcript.py <url> --text

# Text without timestamps
python {baseDir}/scripts/yt_transcript.py <url> --text --no-ts

# Custom cache path
python {baseDir}/scripts/yt_transcript.py <url> --cache /path/to/cache.sqlite
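Because the script accepts either a URL or a bare video ID, some normalization step is presumably involved. The sketch below shows one way to do that with the standard library; `extract_video_id`, the list of hosts, and the path prefixes it recognizes are assumptions, not the script's documented behavior.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from this alphabet.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url_or_id: str) -> str:
    """Return the 11-character video ID from a bare ID or a full URL."""
    value = url_or_id.strip()
    if _ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = ""
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            candidate = parts[1]
    if not _ID_RE.fullmatch(candidate):
        raise ValueError(f"could not find a video ID in {url_or_id!r}")
    return candidate
```

Inputs without a scheme (for example `youtube.com/watch?v=...`) are rejected by this sketch, since `urlparse` does not detect a host in them.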

Output Formats

JSON mode (default)

Returns a JSON object:

{
  "video_id": "dQw4w9WgXcQ",
  "lang": "en",
  "source": "manual",
  "segments": [
    { "start": 0.0, "duration": 4.2, "text": "We're no strangers to love" }
  ]
}
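To illustrate how a caller might consume this object, the sketch below parses the sample above and derives a few summary values. It assumes the JSON arrives as a string (for instance, captured from the script's standard output); `summarize` is a hypothetical helper.

```python
import json

raw = """
{
  "video_id": "dQw4w9WgXcQ",
  "lang": "en",
  "source": "manual",
  "segments": [
    { "start": 0.0, "duration": 4.2, "text": "We're no strangers to love" }
  ]
}
"""


def summarize(transcript: dict) -> dict:
    """Collapse a transcript object into a few summary fields."""
    segments = transcript["segments"]
    # End of the last caption: the largest start + duration over all segments.
    end = max((s["start"] + s["duration"] for s in segments), default=0.0)
    return {
        "video_id": transcript["video_id"],
        "source": transcript["source"],
        "segment_count": len(segments),
        "end_seconds": end,
        "text": " ".join(s["text"] for s in segments),
    }


summary = summarize(json.loads(raw))
print(summary["segment_count"], summary["text"])
```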

Text mode (--text)

Newline-separated transcript lines. Use --no-ts to omit timestamps.
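The relationship between the JSON segments and the text output can be sketched as follows. The `[MM:SS]` prefix is an assumed timestamp style, since the exact format the script prints is not documented here; `format_timestamp` and `render_text` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS once past one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_text(segments, with_timestamps: bool = True) -> str:
    """Render one line per segment, mirroring --text and --text --no-ts."""
    lines = []
    for seg in segments:
        # Collapse internal newlines and runs of whitespace in caption text.
        text = " ".join(seg["text"].split())
        if with_timestamps:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```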

Caching

Results are cached in a local SQLite database: {baseDir}/cache/transcripts.sqlite

Subsequent calls with the same video, language, and output format are served from the cache.

To use a custom cache location: --cache /path/to/transcripts.sqlite
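A cache keyed on video, language, and format could look roughly like the sketch below. The table name, column names, and helper functions are all assumptions for illustration; the script's real schema is not documented here.

```python
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            video_id TEXT NOT NULL,
            lang     TEXT NOT NULL,
            fmt      TEXT NOT NULL,
            payload  TEXT NOT NULL,
            PRIMARY KEY (video_id, lang, fmt)
        )
        """
    )
    return conn


def cache_get(conn, video_id: str, lang: str, fmt: str) -> Optional[str]:
    """Return the cached payload, or None on a cache miss."""
    row = conn.execute(
        "SELECT payload FROM transcripts WHERE video_id = ? AND lang = ? AND fmt = ?",
        (video_id, lang, fmt),
    ).fetchone()
    return row[0] if row else None


def cache_put(conn, video_id: str, lang: str, fmt: str, payload: str) -> None:
    """Store or overwrite the payload for this (video, lang, format) key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?)",
            (video_id, lang, fmt, payload),
        )
```

Including the format in the key means a JSON result and a text result for the same video are stored as separate entries.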

Cookies (optional)

For age-restricted or members-only videos, provide a Netscape-format cookies.txt:

export YT_TRANSCRIPT_COOKIES=/path/to/cookies.txt
python {baseDir}/scripts/yt_transcript.py <url>
# or
python {baseDir}/scripts/yt_transcript.py <url> --cookies /path/to/cookies.txt

For security, the cookies file must be stored under ~/.config/yt-transcript/; the /path/to/cookies.txt values above stand for a path inside that directory.
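One way such a location restriction might be enforced is to resolve the supplied path and confirm it sits inside the allowed directory. This is a sketch under assumptions: `resolve_cookies_path` is a hypothetical helper, and giving the `--cookies` value precedence over the `YT_TRANSCRIPT_COOKIES` variable is a guess rather than documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cookies_path(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    allowed_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Return the cookies file path, or None if no cookies were supplied.

    Raises ValueError when the file lies outside the allowed directory.
    """
    env = os.environ if env is None else env
    raw = cli_value or env.get("YT_TRANSCRIPT_COOKIES")
    if not raw:
        return None
    allowed = (allowed_dir or Path.home() / ".config" / "yt-transcript").resolve()
    # resolve() normalizes ".." components and follows symlinks, so a path
    # cannot escape the directory through either trick.
    path = Path(raw).expanduser().resolve()
    if allowed not in path.parents:
        raise ValueError(f"cookies file must be stored under {allowed}")
    return path
```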

Troubleshooting

  • No captions available: Video has no manual or auto-generated captions
  • yt-dlp not found: Install with pip install yt-dlp or brew install yt-dlp
  • Age-restricted video: Provide cookies from a logged-in YouTube session
  • Rate limited: Wait and retry; reduce request frequency
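The "wait and retry" advice for rate limiting can be sketched as exponential backoff around whatever function fetches a transcript. `retry_with_backoff` is a hypothetical helper, and `RuntimeError` stands in for whichever error the real call raises when it is rate limited.

```python
import time


def retry_with_backoff(fetch, attempts: int = 4, base_delay: float = 2.0, sleep=time.sleep):
    """Call fetch(), retrying on RuntimeError with doubling delays.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises the
    last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.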