Web Scraper & Summarizer
Scrape web pages and generate structured summaries with an Ollama LLM.
Usage
Basic Summary
python scripts/scrape_and_summarize.py <url>
Outputs a 2-3 sentence brief summary.
Custom Styles
python scripts/scrape_and_summarize.py <url> detailed
python scripts/scrape_and_summarize.py <url> bullet
Styles:
- brief (default) - 2-3 sentence overview
- detailed - Comprehensive with bullet points
- bullet - Key points as bullet list
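One way the three styles could map onto prompts is sketched below. The names `STYLE_PROMPTS` and `build_prompt`, the prompt wording, and the fallback to `brief` for an unknown style are all assumptions made for illustration; the actual script may handle these differently.

```python
# Hypothetical style-to-instruction mapping; the real script's prompts may differ.
STYLE_PROMPTS = {
    "brief": "Summarize the following page in 2-3 sentences.",
    "detailed": "Write a comprehensive summary of the following page, using bullet points.",
    "bullet": "List the key points of the following page as a bullet list.",
}


def build_prompt(content: str, style: str = "brief") -> str:
    """Return a prompt for the requested style (assumed: unknown styles use 'brief')."""
    instruction = STYLE_PROMPTS.get(style, STYLE_PROMPTS["brief"])
    return f"{instruction}\n\n{content}"
```

Keeping the styles in a single dictionary means a new style only needs one new entry.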
Workflow
- Detect URL - User provides a URL to analyze
- Fetch - Script uses curl to retrieve page content (30s timeout)
- Clean - HTML stripped, content extracted and cleaned
- Truncate - Content limited to ~16k tokens to fit context
- Summarize - Ollama generates summary based on requested style
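The fetch, clean, and truncate steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the regex-based HTML stripping, and the 4-characters-per-token estimate behind `MAX_CHARS` are assumptions.

```python
import re
import subprocess
from html import unescape

# Assumption: roughly 4 characters per token, so ~16k tokens is about 64,000 chars.
MAX_CHARS = 64_000


def fetch(url: str, timeout: int = 30) -> str:
    """Fetch a page with curl; raises CalledProcessError if curl exits non-zero."""
    result = subprocess.run(
        ["curl", "-sSL", "--max-time", str(timeout), url],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate pages that are not valid UTF-8
        check=True,
    )
    return result.stdout


def clean(html: str) -> str:
    """Strip script/style blocks and tags, decode entities, collapse whitespace."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def truncate(text: str, max_chars: int = MAX_CHARS) -> str:
    """Cut the text to a character budget that approximates the token limit."""
    return text[:max_chars]
```

A regex pass is a crude way to strip HTML; a real parser would handle malformed markup more reliably, but the regex keeps the sketch dependency-free.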
Requirements
- curl available on PATH
- Ollama installed with llama3.2 model pulled
- Python 3.8+
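A quick pre-flight check for these requirements might look like the sketch below. The function name `missing_requirements` is hypothetical, and the check only confirms that the executables are on PATH and that the Python version is new enough; it does not verify that the llama3.2 model has been pulled.

```python
import shutil
import sys


def missing_requirements(tools=("curl", "ollama"), min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"{tool} not found on PATH" for tool in tools if shutil.which(tool) is None]
    if sys.version_info < min_python:
        needed = ".".join(str(part) for part in min_python)
        problems.append(f"Python {needed}+ required")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem, file=sys.stderr)
```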
Error Handling
- Fetch fails: Shows error, exits with code 1
- Content too short: Detects and reports empty pages
- Ollama unavailable: Falls back to raw content display (truncated)
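The short-content check and the Ollama fallback could be structured along these lines. The thresholds `MIN_CONTENT_CHARS` and `RAW_PREVIEW_CHARS`, the `render` function, and the choice to raise `ValueError` for short pages are illustrative assumptions; the summarizer is passed in as a callable so the fallback path can be exercised without Ollama running.

```python
import sys

MIN_CONTENT_CHARS = 100    # hypothetical threshold for "content too short"
RAW_PREVIEW_CHARS = 2000   # hypothetical size of the raw fallback preview


def render(content: str, summarize) -> str:
    """Return the text to print, falling back to raw content if summarize() fails."""
    if len(content.strip()) < MIN_CONTENT_CHARS:
        raise ValueError("page content is empty or too short to summarize")
    try:
        return "=== SUMMARY ===\n" + summarize(content)
    except Exception as exc:  # e.g. Ollama not running or the model is missing
        print(f"Ollama unavailable: {exc}", file=sys.stderr)
        return "=== PAGE CONTENT (raw) ===\n" + content[:RAW_PREVIEW_CHARS]
```

The caller would print the returned text to stdout and turn a fetch failure into exit code 1, for example with `sys.exit(1)`.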
Output Format
Summary is printed to stdout. Progress/debug info goes to stderr.
=== SUMMARY ===
[AI-generated summary here]
On Ollama failure, raw content is shown instead:
=== PAGE CONTENT (raw) ===
[truncated page text]
Performance Notes
- Fast pages: ~5-10 seconds total
- Large pages: Content truncation happens at ~16k tokens
- Ollama latency: Depends on model size and hardware (typically 10-30s for summary)
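Because summary latency varies with hardware, it can help to set an explicit timeout on the Ollama call. The sketch below assumes the script talks to Ollama's local HTTP API at its default address; the script may instead invoke the `ollama` CLI, and the helper names and the 120-second timeout are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, model: str = "llama3.2", timeout: float = 120.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A timeout well above the typical 10-30 second range leaves headroom for slower machines while still preventing an indefinite hang.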