WeChat MP Reader

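Use this skill for fetching WeChat Official Account (公众号) articles, resolving the account behind an article, listing an account's articles, and extracting full text.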

What this skill should do

Support these user intents:

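Given an Official Account article link, extract the full text
Given an Official Account article link, identify the account and list that account's articles
Given an Official Account name, find candidate accounts and fetch the article list
Check, save, and reuse the WeChat Official Account backend (MP) session
Normalize article content into markdown / structured JSON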

Operating principles

URL-first is the default path. If the user gives an article URL, resolve from it first.
Name search is best-effort. If account-name search is unreliable, ask for any article URL from that account.
Full text matters more than stats. Article extraction is core; read/like stats are optional.
Use layered fallbacks. Try plain HTTP first, but for WeChat articles treat browser fallback as normal whenever the page looks non-canonical (verification page, shell page, or mixed JS page). The current fallback is local Playwright WebKit only.
Keep outputs structured. Return normalized account/article objects rather than loose text.
Recover fakeid via search when needed. Article pages often expose the biz and account name, but not a stable fakeid; when an MP backend session is available, try search-based recovery.
Treat session validity as first-class state. Report whether session is present/valid, instead of hiding failures in generic warnings.
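
The layered-fallback principle above can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the marker strings and both function names are assumptions, and the two fetchers are injected as callables so the sketch stays independent of any HTTP or Playwright setup.

```python
from typing import Callable

# Assumed signals, not confirmed by the skill's code.
BODY_MARKER = 'id="js_content"'          # assumed container of a canonical article body
VERIFY_MARKERS = ("环境异常", "完成验证")    # assumed text shown on a verification page


def looks_non_canonical(html: str) -> bool:
    """Return True when the HTML looks like a verification, shell, or JS-only page."""
    if not html.strip():
        return True
    if any(marker in html for marker in VERIFY_MARKERS):
        return True
    return BODY_MARKER not in html


def fetch_article_html(
    url: str,
    http_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
) -> dict:
    """Try plain HTTP first; fall back to the browser fetcher when needed."""
    try:
        html = http_fetch(url)
    except Exception:
        html = ""  # a failed HTTP fetch is treated like a non-canonical page
    if not looks_non_canonical(html):
        return {"html": html, "source": "http"}
    return {"html": browser_fetch(url), "source": "browser"}
```

Recording which layer produced the HTML (the hypothetical `source` field) keeps the fallback visible in the structured output instead of hiding it.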

Default workflow

Path A — article URL provided

Parse the article URL and extract __biz, mid, idx, sn.
Fetch the article page.
Extract account metadata from HTML / embedded JS.
Load MP backend session from env or session file.
Validate session and report session.present / session.valid / session.reason.
If fakeid is missing and session is valid, search by account name and match candidates using biz / name.
Extract and clean full article content.
If requested and fakeid is available, list more articles for that account.
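
The first step of Path A, extracting __biz, mid, idx, and sn, can be sketched with the standard library alone. The function name `parse_article_url` is hypothetical, and the sketch assumes the long-form URL that carries these values in its query string.

```python
from urllib.parse import urlparse, parse_qs


def parse_article_url(url: str) -> dict:
    """Extract __biz, mid, idx, sn from an article URL's query string."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns lists; take the first value, or None when the key is absent.
    return {key: (query.get(key) or [None])[0] for key in ("__biz", "mid", "idx", "sn")}
```

A URL without these query parameters yields None for every key, in which case the identifiers would have to come from the fetched page instead.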

Path B — account name provided

Load and validate MP backend session.
Attempt account-name search via the search adapter.
Return ranked candidates.
If a confident match exists, fetch article list.
If search fails or is ambiguous, ask for any article URL from that account and switch to Path A.
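
The ranking and "confident match" steps of Path B could look roughly like this. It is a sketch under stated assumptions: candidates are dicts shaped like the Account object, name similarity uses `difflib` from the standard library, and the threshold and margin values are illustrative rather than taken from the prototype.

```python
from difflib import SequenceMatcher


def rank_candidates(query_name, candidates, biz=None):
    """Sort candidate accounts by match quality, best first.

    A biz match, when biz is known, outranks any name similarity.
    """
    def score(candidate):
        if biz and candidate.get("biz") == biz:
            return 2.0
        name = candidate.get("name", "")
        return SequenceMatcher(None, query_name.lower(), name.lower()).ratio()

    ranked = sorted(candidates, key=score, reverse=True)
    return [dict(c, score=round(score(c), 3)) for c in ranked]


def confident_match(ranked, threshold=0.9, margin=0.1):
    """Return the top candidate only if it is clearly ahead; otherwise None."""
    if not ranked or ranked[0]["score"] < threshold:
        return None
    if len(ranked) > 1 and ranked[0]["score"] - ranked[1]["score"] < margin:
        return None
    return ranked[0]
```

When `confident_match` returns None, the workflow would ask the user for an article URL and switch to Path A.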

Path C — session operations

Use the bundled CLI to:

session check — validate current env/file-backed session
session show — report non-sensitive session presence/length/status
session save — persist env-provided session to local cache file
session login-start — start QR login, return scan state, and write a real scannable QR PNG under scripts/cache/wechat-login-qr-real.png
session login-status — poll login status and capture fresh session when ready
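
Behind these commands, session loading (env vars first, then the saved session file) might be sketched as below. Everything named here is an assumption for illustration: the env variable names, the cookie/token fields, and the default file path are hypothetical and may differ from what scripts/wechat_mp_reader/session_store.py actually uses.

```python
import json
import os
from pathlib import Path


def resolve_session(env=None, session_file="scripts/cache/session.json"):
    """Return session credentials: env vars first, then the saved session file."""
    env = os.environ if env is None else env
    # WECHAT_MP_COOKIE / WECHAT_MP_TOKEN are hypothetical variable names.
    cookie, token = env.get("WECHAT_MP_COOKIE"), env.get("WECHAT_MP_TOKEN")
    if cookie and token:
        return {"source": "env", "cookie": cookie, "token": token}
    path = Path(session_file)
    if path.is_file():
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            data = None
        if isinstance(data, dict) and data.get("cookie") and data.get("token"):
            return {"source": "file", "cookie": data["cookie"], "token": data["token"]}
    return None


def describe_session(session):
    """Report non-sensitive presence and length only, never the secret values."""
    if session is None:
        return {"present": False, "source": None, "cookie_length": 0}
    return {"present": True, "source": session["source"], "cookie_length": len(session["cookie"])}
```

Returning only presence, source, and length mirrors the intent of session show, which reports status without exposing credentials.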

Expected outputs

Session object

{
  "present": true,
  "valid": false,
  "reason": "invalid session",
  "base_resp": {}
}

Account object

{
  "name": "",
  "biz": "",
  "fakeid": "",
  "avatar": "",
  "signature": ""
}

Article object

{
  "title": "",
  "url": "",
  "publish_time": "",
  "publish_time_raw": "",
  "author": "",
  "account_name": "",
  "content_html": "",
  "content_markdown": "",
  "images": []
}
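
One way to keep these shapes consistent in code is to mirror them as dataclasses. The sketch below copies the field names from the Account and Article objects above; the string types, the list-of-URLs type for images, and the `to_payload` envelope are assumptions, not the prototype's confirmed types.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Account:
    name: str = ""
    biz: str = ""
    fakeid: str = ""
    avatar: str = ""
    signature: str = ""


@dataclass
class Article:
    title: str = ""
    url: str = ""
    publish_time: str = ""
    publish_time_raw: str = ""
    author: str = ""
    account_name: str = ""
    content_html: str = ""
    content_markdown: str = ""
    images: List[str] = field(default_factory=list)  # assumed: list of image URLs


def to_payload(account: Account, article: Article) -> dict:
    """Bundle normalized objects into one JSON-serializable result (hypothetical envelope)."""
    return {"account": asdict(account), "article": asdict(article)}
```

Defaults of empty strings and an empty list mean a partially extracted article still serializes to the full shape.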

Implementation notes

Prefer the bundled Python prototype at scripts/wechat_mp_reader.py.
Default live validation path: run the skill's own session commands (session check, session login-start, session login-status), then run article <url> --with-account-articles directly via scripts/wechat_mp_reader.py. Helper bridge scripts are no longer the default path.
session login-start now persists a real scannable QR image to scripts/cache/wechat-login-qr-real.png and returns its path in qr_image_path.
Session resolution order is: env vars first, then saved session file.
The current article pipeline is URL-first and will automatically fall back to local Playwright WebKit when direct HTTP HTML looks non-canonical.
Treat article body extraction as the MVP.
Treat account-name search and historical article listing as adapters that can evolve.
Treat engagement stats as optional and isolated from the main flow.
Cache article HTML and parsed results when repeated fetching is likely.
Cache resolved account mappings (biz / name -> fakeid) locally to reduce repeated searchbiz lookups.
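
The account-mapping cache in the last note could be as small as a JSON file keyed by biz and name. This is a sketch only: the `AccountCache` class, its key prefixes, and the file location passed in by the caller are hypothetical, not the prototype's actual cache layout.

```python
import json
from pathlib import Path


class AccountCache:
    """Tiny JSON-file cache mapping biz or account name to fakeid."""

    def __init__(self, path):
        self.path = Path(path)
        try:
            loaded = json.loads(self.path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            loaded = {}  # a missing or corrupt cache file starts empty
        self.data = loaded if isinstance(loaded, dict) else {}

    def get(self, biz=None, name=None):
        """Look up by biz first (more specific), then by name."""
        for key in (f"biz:{biz}" if biz else None, f"name:{name}" if name else None):
            if key and key in self.data:
                return self.data[key]
        return None

    def put(self, fakeid, biz=None, name=None):
        """Store the mapping under every key that is known, then persist."""
        if biz:
            self.data[f"biz:{biz}"] = fakeid
        if name:
            self.data[f"name:{name}"] = fakeid
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(
            json.dumps(self.data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
```

Checking this cache before calling search means a repeated request for the same account needs no searchbiz lookup at all.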

Files to use

scripts/wechat_mp_reader.py — Python prototype and CLI
scripts/wechat_mp_reader/auth.py — session validation helpers
scripts/wechat_mp_reader/session_store.py — session load/save helpers
references/design.md — architecture, implementation phases, and caveats

Read references/design.md when you need the detailed design, adapter responsibilities, or future roadmap. Read references/usage.md when you need the human-facing usage guide, CLI examples, or natural-language invocation patterns for triggering this skill through an agent.