返回 Skill 列表
extension
分类: 数据与分析无需 API Key

WeChat Article Extract

从 mp.weixin.qq.com 链接或已保存的 HTML 中提取微信公众号的公开文章,生成干净的 Markdown 或结构化的 JSON,包含标题、公众号名称等。

person作者: joannaxinghubclawhub

WeChat Article Extract

Use this skill to extract a public WeChat Official Account article into portable Markdown or JSON. It is intentionally local and generic: it does not require the user's Feishu, knowledge-base profile, database, or API keys.

Quick Start

Run the bundled script from the skill directory:

python3 scripts/extract_wechat_article.py "https://mp.weixin.qq.com/s/..." --format markdown --output article.md
python3 scripts/extract_wechat_article.py "https://mp.weixin.qq.com/s/..." --format json --output article.json

For an HTML file already saved from a browser:

python3 scripts/extract_wechat_article.py --html-file article.html --source-url "https://mp.weixin.qq.com/s/..." --format markdown

Workflow

  1. Confirm the input is a public https://mp.weixin.qq.com/s/... article URL or a saved HTML file. Private drafts, logged-in backend pages, and non-WeChat URLs are out of scope.
  2. Extract with scripts/extract_wechat_article.py.
  3. If network fetching fails because WeChat blocks the request, ask the user to save the article HTML from a browser and rerun with --html-file.
  4. Use Markdown for human-readable archives and JSON for downstream import pipelines.
  5. Keep copyright boundaries: summarize or transform extracted content when sharing externally; do not republish full articles unless the user has rights to do so.

Outputs

Markdown output contains:

  • article title, account name, publish time, source URL, and image count
  • full text with blank-line paragraph separation
  • tables converted to Markdown tables when possible
  • inline image placeholders like [[WECHAT_IMAGE_1]]
  • image URL list at the end

JSON output contains:

  • articleId
  • title
  • author
  • publishTime
  • sourceUrl
  • content
  • contentWithImageMarkers
  • imageEntries
  • imageUrls
  • imageCount
  • coverImageUrl

Notes

  • The script uses only the Python standard library.
  • It preserves image positions with markers but does not download images by default.
  • Add --download-images <dir> when the user explicitly wants local image files.
  • WeChat article pages change over time; if live extraction fails, saved HTML is the most reliable fallback.