WeChat Article Extract

Use this skill to extract a public WeChat Official Account article into portable Markdown or JSON. It is intentionally local and generic: it does not depend on the user's Feishu workspace, knowledge-base profile, database, or API keys.

Quick Start

Run the bundled script from the skill directory:

python3 scripts/extract_wechat_article.py "https://mp.weixin.qq.com/s/..." --format markdown --output article.md
python3 scripts/extract_wechat_article.py "https://mp.weixin.qq.com/s/..." --format json --output article.json

For an HTML file already saved from a browser:

python3 scripts/extract_wechat_article.py --html-file article.html --source-url "https://mp.weixin.qq.com/s/..." --format markdown

Workflow

Confirm the input is a public https://mp.weixin.qq.com/s/... article URL or a saved HTML file. Private drafts, logged-in backend pages, and non-WeChat URLs are out of scope.
Extract with scripts/extract_wechat_article.py.
If network fetching fails because WeChat blocks the request, ask the user to save the article HTML from a browser and rerun with --html-file.
Use Markdown for human-readable archives and JSON for downstream import pipelines.
Respect copyright: when sharing externally, summarize or transform the extracted content; do not republish full articles unless the user has the rights to do so.
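
The first workflow step, checking the input, can be sketched as a small pre-check run before the script is called. This is a minimal illustration, not part of the bundled script: the function name is hypothetical, and it accepts only the https://mp.weixin.qq.com/s/... form named above. Whether the script itself accepts other URL forms is not stated here.

```python
from urllib.parse import urlparse


def is_public_wechat_article_url(url: str) -> bool:
    """Return True if url has the form https://mp.weixin.qq.com/s/<id>.

    Hypothetical helper: it checks only the URL shape, not whether
    the article is actually public or still online.
    """
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and parts.hostname == "mp.weixin.qq.com"
        and parts.path.startswith("/s/")
        and len(parts.path) > len("/s/")  # require a non-empty article id
    )
```

A URL that fails this check is either out of scope or should be handled through the saved-HTML path instead.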

Outputs

Markdown output contains:

article title, account name, publish time, source URL, and image count
full text with blank-line paragraph separation
tables converted to Markdown tables when possible
inline image placeholders like [[WECHAT_IMAGE_1]]
image URL list at the end
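
The image placeholders can be turned into Markdown image links after extraction. The sketch below assumes that marker N refers to the Nth entry of the image URL list, counting from 1; that mapping is an assumption, and the function name is hypothetical rather than something the script provides.

```python
import re

# Matches placeholders such as [[WECHAT_IMAGE_1]].
MARKER = re.compile(r"\[\[WECHAT_IMAGE_(\d+)\]\]")


def inline_images(text: str, image_urls: list[str]) -> str:
    """Replace each placeholder with a Markdown image link.

    Assumes markers are 1-based indexes into image_urls. Markers
    without a matching URL are left unchanged.
    """
    def replace(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        if 0 <= index < len(image_urls):
            return f"![image {index + 1}]({image_urls[index]})"
        return match.group(0)

    return MARKER.sub(replace, text)
```

Leaving unmatched markers in place makes missing images easy to spot in the final document.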

JSON output contains:

articleId
title
author
publishTime
sourceUrl
content
contentWithImageMarkers
imageEntries
imageUrls
imageCount
coverImageUrl
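
A downstream import pipeline may want to confirm that these fields are present before using a record. The following is a minimal sketch that assumes the output is a single JSON object with the keys listed above at the top level. The value types are not specified in this document, so the check covers key presence only. Both function names are hypothetical.

```python
import json

# Field names as listed in this document.
REQUIRED_FIELDS = (
    "articleId", "title", "author", "publishTime", "sourceUrl",
    "content", "contentWithImageMarkers", "imageEntries",
    "imageUrls", "imageCount", "coverImageUrl",
)


def missing_fields(record: dict) -> list[str]:
    """Return the documented field names absent from record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def load_article(path: str) -> dict:
    """Load an extracted article and fail early if fields are missing."""
    with open(path, encoding="utf-8") as handle:
        record = json.load(handle)
    absent = missing_fields(record)
    if absent:
        raise ValueError(f"missing fields: {', '.join(absent)}")
    return record
```

For example, `load_article("article.json")` would return the parsed record produced by the JSON command in Quick Start, or raise if the layout differs from the assumption.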

Notes

The script uses only the Python standard library.
It preserves image positions with markers but does not download images by default.
Add --download-images <dir> when the user explicitly wants local image files.
WeChat article pages change over time; if live extraction fails, saved HTML is the most reliable fallback.
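
When the script is driven from another Python program, the flags mentioned in this document can be assembled into an argument list. The helper below is a hypothetical sketch: it uses only the flags shown above, and it assumes that --output and --download-images can be combined with either input mode, which this document does not confirm.

```python
from typing import List, Optional

SCRIPT = "scripts/extract_wechat_article.py"


def build_command(
    url: str,
    output_format: str = "markdown",
    output: Optional[str] = None,
    html_file: Optional[str] = None,
    download_images: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the extraction script.

    With html_file set, the saved-HTML fallback is used and url is
    passed as --source-url; otherwise url is fetched live.
    """
    command = ["python3", SCRIPT]
    if html_file is not None:
        command += ["--html-file", html_file, "--source-url", url]
    else:
        command.append(url)
    command += ["--format", output_format]
    if output is not None:
        command += ["--output", output]
    if download_images is not None:
        # Only pass this when the user explicitly asks for local files.
        command += ["--download-images", download_images]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the skill directory. If the live run fails, calling `build_command` again with `html_file` set reproduces the saved-HTML fallback.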