Web Video Transcribe Docx

Overview

Use the bundled scripts to extract media, download it, transcribe it offline, and render DOCX output.

Prefer the deterministic scripts before hand-rolling new code.

Use {baseDir} when constructing file paths inside this skill so the instructions stay portable across agents and marketplaces.

Require Python 3 and local filesystem access.
Require network access for first-run model download and for fetching page or media URLs.
Require a local Chrome or Edge browser only when extracting media from a web page.

Run python {baseDir}/scripts/bootstrap_env.py once in the target environment.
For a generic web page URL, run python {baseDir}/scripts/pipeline_web_to_docx.py <url> --output-dir <dir>.
For a direct media URL, run python {baseDir}/scripts/download_url.py <url> <output> and then python {baseDir}/scripts/transcribe_sensevoice.py --input <file> --output-txt <txt> --output-docx <docx>.
For a local media file, run python {baseDir}/scripts/transcribe_sensevoice.py --input <file> --output-txt <txt> --output-docx <docx>.
If the user asks for a polished reading version rather than a raw transcript, read references/cleanup-guidelines.md, produce a refined .txt, and then render it with python {baseDir}/scripts/transcript_to_docx.py.

"Transcribe the Chinese audio from this web video and export it as a Word document."
"Turn this MP4 into a transcript, then reorganize it into chaptered reading notes."
"This page needs a Referer header for media download. Extract the media stream and convert it to DOCX."

Generic page URL: Use python {baseDir}/scripts/pipeline_web_to_docx.py first. If the page is especially stubborn and it is a Toutiao page, python {baseDir}/scripts/pipeline_toutiao_to_docx.py and python {baseDir}/scripts/extract_toutiao_media.py remain available as site-specific fallbacks.
Direct media URL: Use python {baseDir}/scripts/download_url.py, then transcribe.
Local file: Transcribe directly.

If a page exposes a dedicated audio stream, prefer downloading that instead of the full video stream.
If the page only exposes a video stream, let ffmpeg decode audio during transcription.
If the page exposes HLS or DASH manifests, prefer downloading them through the bundled downloader or pipeline instead of raw HTTP GET.

Preserve meaning.
Fix obvious ASR mistakes, punctuation, paragraph breaks, headings, and chapter boundaries.
Do not invent quotes or historical claims that are not supported by the transcript.
If a passage is too noisy to restore confidently, keep it neutral instead of fabricating detail.

Only download URLs that the user supplied directly or that the extractor captured from the target page.
Do not request, store, or exfiltrate cookies, access tokens, or account credentials.
Do not attempt to bypass DRM, login walls, or geo-restriction controls.
If a page requires authenticated browser state that is not already available, say so plainly and stop at the supported boundary.

Use python {baseDir}/scripts/transcript_to_docx.py for generic TXT-to-DOCX rendering.
Use the raw transcript for auditability and the refined transcript for reading quality.

scripts/bootstrap_env.py Install or verify the Python packages used by this skill.
scripts/extract_web_media.py Open a generic web page in a real browser, capture likely media URLs plus common download headers, and emit a JSON manifest.
scripts/extract_toutiao_media.py Open a Toutiao page in a real browser, capture audio/video URLs, and emit a JSON manifest with the same schema as the generic extractor.
scripts/download_url.py Download a direct media URL to disk with a stable user agent, optional headers, and HLS/DASH handling.
scripts/transcribe_sensevoice.py Download the SenseVoice model on demand, segment media, run offline ASR, and emit TXT and optional DOCX.
scripts/transcript_to_docx.py Render timestamped transcripts or chapterized notes into a Word document.
scripts/pipeline_web_to_docx.py Run the generic end-to-end pipeline: extract, download, transcribe, and render.
scripts/pipeline_toutiao_to_docx.py Run the Toutiao-specialized end-to-end pipeline for cases where the generic extractor is not preferred.

Read references/workflow.md for dependency expectations, command examples, output layout, and troubleshooting.
Read references/cleanup-guidelines.md before polishing a noisy transcript into a chapterized reading copy.
Read references/publishing.md when preparing marketplace metadata, tags, versioning, or ClawHub publish commands.

Run python {baseDir}/scripts/bootstrap_env.py before first use in a fresh environment.
Validate the skill folder with skill-creator/scripts/quick_validate.py.
Prefer testing --help and one representative happy path after changing the scripts.
If extraction fails on a page, capture a direct media URL with browser tooling and continue with the downloader + transcriber.
Do not promise support for DRM-protected streams, authenticated cookies, or sites that only expose encrypted EME playback.