WeChat MP Reader
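Use this skill for fetching WeChat Official Account articles, identifying the account behind an article, listing an account's articles, and extracting full text.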
What this skill should do
Support these user intents:
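- Given an Official Account article link, extract the full text
- Given an Official Account article link, identify the account and list its articles
- Given an Official Account name, find candidate accounts and fetch the article list
- Check, save, and reuse the WeChat Official Account backend (MP backend) session
- Normalize article content to markdown / structured JSON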
Operating principles
- URL-first is the default path. If the user gives an article URL, resolve from it first.
- Name search is best-effort. If account-name search is unreliable, ask for any article URL from that account.
- Full text matters more than stats. Article extraction is core; read/like stats are optional.
- Use layered fallbacks. Try plain HTTP first, but for WeChat articles treat browser fallback as normal whenever the page looks non-canonical (verification page, shell page, or mixed JS page). The current fallback is local Playwright WebKit only.
- Keep outputs structured. Return normalized account/article objects rather than loose text.
- Recover fakeid via search when needed. Article pages often expose biz / account name, but not a stable fakeid; when an MP backend session is available, try search-based recovery.
- Treat session validity as first-class state. Report whether the session is present/valid, instead of hiding failures in generic warnings.
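The layered-fallback principle can be sketched as below. This is a minimal illustration, not the skill's actual detection logic: the marker strings, the function names, and the injected http_fetch / browser_fetch callables are all assumptions, and the real heuristics for spotting a verification, shell, or mixed JS page may differ.

```python
from typing import Callable, Dict

# Assumed markers (not taken from the skill's code): article pages are
# commonly observed to carry a js_content container, and verification
# pages a notice about an abnormal environment. Adjust to real pages.
ARTICLE_BODY_MARKER = 'id="js_content"'
VERIFICATION_HINTS = ("环境异常", "完成验证后即可继续访问")


def looks_canonical(html: str) -> bool:
    """Return True only if the HTML looks like a real article page."""
    if any(hint in html for hint in VERIFICATION_HINTS):
        return False
    return ARTICLE_BODY_MARKER in html


def fetch_article_html(
    url: str,
    http_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Try plain HTTP first; use the browser whenever the result looks off."""
    try:
        html = http_fetch(url)
    except Exception:
        # Any HTTP-layer failure is treated like a non-canonical page.
        html = ""
    if looks_canonical(html):
        return {"html": html, "fetched_via": "http"}
    return {"html": browser_fetch(url), "fetched_via": "browser"}
```

Passing the two fetchers in as callables keeps the decision logic testable without network access; in the skill itself the browser side would be the local Playwright WebKit fallback.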
Default workflow
Path A — article URL provided
- Parse the article URL and extract __biz, mid, idx, sn.
- Fetch the article page.
- Extract account metadata from HTML / embedded JS.
- Load MP backend session from env or session file.
- Validate session and report session.present / session.valid / session.reason.
- If fakeid is missing and session is valid, search by account name and match candidates using biz / name.
- Extract and clean full article content.
- If requested and fakeid is available, list more articles for that account.
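The first step of Path A, pulling __biz, mid, idx, and sn out of the URL, might look like the sketch below. The function name and the sample URL are hypothetical. Short-form links of the shape mp.weixin.qq.com/s/<token> carry none of these query parameters, so in that case every field comes back as None and the values would have to be recovered from the fetched page instead.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse

ARTICLE_KEYS = ("__biz", "mid", "idx", "sn")


def parse_article_url(url: str) -> Dict[str, Optional[str]]:
    """Extract the identifying query parameters from a long-form article URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep the first, or None.
    return {key: (query.get(key) or [None])[0] for key in ARTICLE_KEYS}


# Hypothetical URL, for illustration only.
example = (
    "https://mp.weixin.qq.com/s"
    "?__biz=MzA5OTk5OTk5OQ==&mid=2650000001&idx=1&sn=abcdef0123456789"
)
print(parse_article_url(example))
```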
Path B — account name provided
- Load and validate MP backend session.
- Attempt account-name search via the search adapter.
- Return ranked candidates.
- If a confident match exists, fetch article list.
- If search fails or is ambiguous, ask for any article URL from that account and switch to Path A.
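One way to decide between "confident match" and "ambiguous" in Path B is sketched below. It is an assumption, not the skill's ranking method: candidates are scored by name similarity with difflib, and a match counts as confident only when the top score is high and clearly ahead of the runner-up. The function names and both thresholds are illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Candidate = Dict[str, str]


def rank_candidates(query: str, candidates: List[Candidate]) -> List[Tuple[float, Candidate]]:
    """Score each candidate by name similarity, best first."""
    scored = [
        (SequenceMatcher(None, query.lower(), c["name"].lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def pick_confident(
    ranked: List[Tuple[float, Candidate]],
    min_score: float = 0.9,  # illustrative threshold
    min_gap: float = 0.1,    # illustrative margin over the runner-up
) -> Optional[Candidate]:
    """Return the top candidate only if it is both strong and unambiguous."""
    if not ranked:
        return None
    top_score, top = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if top_score >= min_score and top_score - runner_up >= min_gap:
        return top
    return None
```

When pick_confident returns None, the workflow above applies: ask for any article URL from the account and switch to Path A.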
Path C — session operations
Use the bundled CLI to:
- session check: validate current env/file-backed session
- session show: report non-sensitive session presence/length/status
- session save: persist env-provided session to local cache file
- session login-start: start QR login, return scan state, and write a real scannable QR PNG under scripts/cache/wechat-login-qr-real.png
- session login-status: poll login status and capture fresh session when ready
Expected outputs
Session object
{
"present": true,
"valid": false,
"reason": "invalid session",
"base_resp": {}
}
Account object
{
"name": "",
"biz": "",
"fakeid": "",
"avatar": "",
"signature": ""
}
Article object
{
"title": "",
"url": "",
"publish_time": "",
"publish_time_raw": "",
"author": "",
"account_name": "",
"content_html": "",
"content_markdown": "",
"images": []
}
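The three output shapes above could be modelled with dataclasses, as in this sketch. The class names are hypothetical; only the field names and defaults come from the objects listed above, and the prototype may represent them differently (for example as plain dicts).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Session:
    present: bool = False
    valid: bool = False
    reason: str = ""
    base_resp: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Account:
    name: str = ""
    biz: str = ""
    fakeid: str = ""
    avatar: str = ""
    signature: str = ""


@dataclass
class Article:
    title: str = ""
    url: str = ""
    publish_time: str = ""
    publish_time_raw: str = ""
    author: str = ""
    account_name: str = ""
    content_html: str = ""
    content_markdown: str = ""
    images: List[str] = field(default_factory=list)


# asdict keeps field order, so the JSON mirrors the shapes shown above.
session = Session(present=True, valid=False, reason="invalid session")
print(json.dumps(asdict(session), ensure_ascii=False, indent=2))
```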
Implementation notes
- Prefer the bundled Python prototype at scripts/wechat_mp_reader.py.
- Default live validation path: use the skill's own session commands (session check, session login-start, session login-status) and then run article <url> --with-account-articles directly via scripts/wechat_mp_reader.py; helper bridge scripts are no longer the default path.
- session login-start now persists a real scannable QR image to scripts/cache/wechat-login-qr-real.png and returns its path in qr_image_path.
- Session resolution order is: env vars first, then saved session file.
- The current article pipeline is URL-first and will automatically fall back to local Playwright WebKit when direct HTTP HTML looks non-canonical.
- Treat article body extraction as the MVP.
- Treat account-name search and historical article listing as adapters that can evolve.
- Treat engagement stats as optional and isolated from the main flow.
- Cache article HTML and parsed results when repeated fetching is likely.
- Cache resolved account mappings (biz / name -> fakeid) locally to reduce repeated searchbiz lookups.
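The session resolution order noted above (env vars first, then the saved session file) can be sketched as follows. The environment variable names, the default file name, and the function name are assumptions made for illustration; the real names live in the bundled session helpers.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional, Union

# Hypothetical variable names; the skill's real env vars may differ.
ENV_KEYS = ("WECHAT_MP_TOKEN", "WECHAT_MP_COOKIE")


def resolve_session(
    env: Optional[Mapping[str, str]] = None,
    session_file: Union[str, Path] = "session.json",  # hypothetical default
) -> Optional[Dict[str, Any]]:
    """Return session data from env vars if complete, else from the saved file."""
    env = os.environ if env is None else env
    from_env = {key: env[key] for key in ENV_KEYS if env.get(key)}
    if len(from_env) == len(ENV_KEYS):
        return {**from_env, "source": "env"}

    path = Path(session_file)
    if path.is_file():
        try:
            saved = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return None  # unreadable file counts as "no session"
        if isinstance(saved, dict):
            return {**saved, "source": "file"}
    return None
```

Returning the source alongside the data makes it straightforward to report where a session came from when filling in the session object's present / valid / reason fields.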
Files to use
- scripts/wechat_mp_reader.py: Python prototype and CLI
- scripts/wechat_mp_reader/auth.py: session validation helpers
- scripts/wechat_mp_reader/session_store.py: session load/save helpers
- references/design.md: architecture, implementation phases, and caveats
Read references/design.md when you need the detailed design, adapter responsibilities, or future roadmap.
Read references/usage.md when you need the human-facing usage guide, CLI examples, or natural-language invocation patterns for triggering this skill through an agent.