WeChat Article Fetcher (微信文章抓取)

Fetch WeChat Official Account (微信公众号) articles and convert them to clean Markdown using Playwright browser automation.

Features

- Real Chromium browser to bypass WeChat anti-bot protections
- Automatic lazy-loaded image handling (data-src → src)
- Auto-generated filename from publish date + title (YYYYMMDD format)
- Metadata extraction (author, publish time)
- Clean Markdown output with preserved images
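
The lazy-loaded image handling can be pictured as a small HTML rewrite done before Markdown conversion. The sketch below is an illustration only: `promote_lazy_images` is a hypothetical helper, not a function taken from scripts/fetch_weixin.py, and the real script may perform this step inside the browser instead. It assumes double-quoted attributes and no `>` characters inside attribute values.

```python
import re

# Hypothetical helper, not part of scripts/fetch_weixin.py.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
DATA_SRC = re.compile(r'\bdata-src\s*=\s*"([^"]*)"', re.IGNORECASE)
# Matches a plain src attribute but not the "src" inside "data-src".
PLAIN_SRC = re.compile(r'(?<![-\w])src\s*=\s*"[^"]*"', re.IGNORECASE)


def promote_lazy_images(html: str) -> str:
    """Copy each <img> tag's data-src URL into its src attribute."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        lazy = DATA_SRC.search(tag)
        if lazy is None:
            return tag  # nothing lazy-loaded on this tag
        url = lazy.group(1)
        # Drop any placeholder src, then turn data-src into src.
        tag = PLAIN_SRC.sub("", tag)
        return DATA_SRC.sub(lambda _m: f'src="{url}"', tag, count=1)

    return IMG_TAG.sub(fix, html)


if __name__ == "__main__":
    print(promote_lazy_images('<img data-src="https://example.com/a.png">'))
```

Tags without a data-src attribute are returned unchanged, so the rewrite is safe to run over the whole article body.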

Dependencies

pip install playwright markdownify
playwright install chromium

Usage

# Auto-generate filename (YYYYMMDD+Title format)
python scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/xxxxx"

# Custom filename
python scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/xxxxx" article.md
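
When no filename is given, the name is built from the publish date and the title. A minimal sketch of how such a name could be assembled is shown here; `build_filename` and its sanitising rules are assumptions made for illustration, and the script's actual handling of unusual titles is not documented in this file.

```python
import re
from datetime import datetime


def build_filename(title: str, published: datetime) -> str:
    """Return a YYYYMMDD+Title Markdown filename (hypothetical helper)."""
    # Remove characters that are not allowed in filenames on common platforms.
    safe = re.sub(r'[\\/:*?"<>|\r\n\t]+', "", title)
    # Collapse runs of whitespace and trim the ends.
    safe = re.sub(r"\s+", " ", safe).strip()
    if not safe:
        safe = "untitled"  # assumed fallback for empty titles
    return f"{published:%Y%m%d}{safe}.md"


if __name__ == "__main__":
    print(build_filename("关于财政政策和货币政策的关系", datetime(2025, 12, 14)))
```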

Response Pattern

When the user asks to fetch a WeChat article:

1. Validate URL: ensure it is a WeChat article URL (host mp.weixin.qq.com).
2. Execute fetching:
   ```
   python scripts/fetch_weixin.py <url> [output_filename]
   ```
   The output filename is optional; when omitted, it is auto-generated as YYYYMMDD+Title.
3. Report results:
   - Confirm the file was saved, with statistics (characters, words, images)
   - Show the auto-generated filename
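
The URL validation step can be sketched with the standard library alone. `is_wechat_article_url` is a hypothetical name used for illustration. Comparing the parsed hostname, rather than searching the string for mp.weixin.qq.com, avoids accepting look-alike URLs that merely contain that text.

```python
from urllib.parse import urlparse


def is_wechat_article_url(url: str) -> bool:
    """Return True if the URL points at the mp.weixin.qq.com host."""
    parsed = urlparse(url.strip())
    # hostname is lower-cased by urlparse and excludes any port number.
    return parsed.scheme in ("http", "https") and parsed.hostname == "mp.weixin.qq.com"


if __name__ == "__main__":
    print(is_wechat_article_url("https://mp.weixin.qq.com/s/xxxxx"))
```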

Example Workflows

Auto-generated filename

# User: "抓取这篇微信文章" ("Fetch this WeChat article")
python scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/xxxxx"

# Result:
# ✓ Saved: 20251214关于财政政策和货币政策的关系.md
# ✓ Statistics: 12,345 characters, 8,234 words, 5 images
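
The statistics line could be produced along the following lines. This is a sketch under stated assumptions: how the script counts words is not documented here, so the whitespace-based word count below is a guess (and will undercount Chinese text), and `markdown_stats` and `format_stats` are hypothetical names.

```python
import re

# Markdown image syntax: ![alt text](url)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def markdown_stats(markdown: str) -> dict:
    """Count characters, whitespace-separated words, and images."""
    return {
        "characters": len(markdown),
        "words": len(markdown.split()),  # assumption: words are whitespace-delimited
        "images": len(IMAGE_PATTERN.findall(markdown)),
    }


def format_stats(stats: dict) -> str:
    """Render the counts with thousands separators."""
    return (
        f"{stats['characters']:,} characters, "
        f"{stats['words']:,} words, "
        f"{stats['images']:,} images"
    )


if __name__ == "__main__":
    sample = "# Title\n\nHello world ![a](https://example.com/a.png)\n"
    print(format_stats(markdown_stats(sample)))
```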

Custom filename

# User: "Fetch this WeChat article, save as economy.md"
python scripts/fetch_weixin.py "https://mp.weixin.qq.com/s/xxxxx" economy.md

# Result:
# ✓ Saved: economy.md

Troubleshooting

| Issue | Solution |
|-------|----------|
| WeChat blocked | Script uses real browser to bypass anti-bot |
| Timeout | Script has 60s timeout with retry - usually succeeds on second attempt |
| Playwright not installed | Run: pip install playwright && playwright install chromium |
| Empty content | Wait for page to fully load; check if article is still accessible |
| Missing images | Script auto-converts lazy-loaded images; check network connectivity |
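
The timeout-with-retry behaviour in the table can be illustrated with a generic wrapper. This is a sketch rather than the script's own code: `fetch_with_retry` is a hypothetical helper, and the attempt count and delay are assumed values. With Playwright, the callable would typically wrap a `page.goto(url, timeout=60_000)` call and `retry_on` would be Playwright's own `TimeoutError` class.

```python
import time


def fetch_with_retry(fetch, attempts=2, delay_seconds=1.0, retry_on=(TimeoutError,)):
    """Call fetch() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before retrying
    # Every attempt timed out: surface the last error to the caller.
    raise last_error


if __name__ == "__main__":
    calls = []

    def flaky():
        # Simulates a page load that times out once, then succeeds.
        calls.append(1)
        if len(calls) == 1:
            raise TimeoutError("first attempt timed out")
        return "page content"

    print(fetch_with_retry(flaky, delay_seconds=0))
```

Errors other than those listed in `retry_on` propagate immediately, so a missing Playwright install or an invalid URL is not masked by the retry loop.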