Firecrawl Search & Scrape
Keyless web search and page scraping via the Firecrawl v2 API. Works without an API key. Uses only the Python standard library, so no pip install is required. API key resolution order: --api-key flag, then the FIRECRAWL_API_KEY environment variable, then keyless mode.
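The key resolution order above can be sketched as a small function. This is an illustrative sketch, not the scripts' actual code: the name resolve_api_key and its parameters are hypothetical, and treating an empty or whitespace-only value as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_key: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key to use, or None for keyless mode.

    Hypothetical helper: mirrors the documented precedence only.
    """
    env = os.environ if env is None else env

    # 1. An explicit --api-key value takes priority.
    if cli_key and cli_key.strip():
        return cli_key.strip()

    # 2. Otherwise use the FIRECRAWL_API_KEY environment variable.
    env_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if env_key:
        return env_key

    # 3. Nothing set: run in keyless mode.
    return None
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.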
Search
python scripts/firecrawl_search.py "query" [options]
| Flag | Default | Purpose |
|------|---------|---------|
| --limit N | 5 | Max results (1-100) |
| --sources | web | web, images, news (comma-separated) |
| --country | US | ISO country code |
| --include | - | Only these domains |
| --exclude | - | Exclude these domains |
| --categories | - | github, research, pdf |
| --scrape | off | Include page markdown in results |
| --output | text | text, json, markdown |
| --api-key | - | Optional (or set FIRECRAWL_API_KEY env) |
python scripts/firecrawl_search.py "latest AI news" --limit 5
python scripts/firecrawl_search.py "AI breakthroughs" --sources web,news --limit 5
python scripts/firecrawl_search.py "Python tutorial" --scrape --limit 3
python scripts/firecrawl_search.py "react hooks" --include "react.dev,medium.com"
python scripts/firecrawl_search.py "LLM fine-tuning" --categories research
python scripts/firecrawl_search.py "climate policy" --output markdown
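To make the flag table concrete, here is one way a command line with these flags and defaults could be declared using argparse. It is a sketch under assumptions, not the source of firecrawl_search.py: the helper names csv_list, limit_type, and build_parser are hypothetical, and treating --exclude and --categories as comma-separated (like --sources and --include) is an assumption.

```python
import argparse
from typing import List


def csv_list(value: str) -> List[str]:
    """Split a comma-separated flag value into a list, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def limit_type(value: str) -> int:
    """Accept only the documented range of 1-100 results."""
    n = int(value)
    if not 1 <= n <= 100:
        raise argparse.ArgumentTypeError("--limit must be between 1 and 100")
    return n


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flags and defaults are taken from the table above.
    parser = argparse.ArgumentParser(prog="firecrawl_search.py")
    parser.add_argument("query")
    parser.add_argument("--limit", type=limit_type, default=5)
    parser.add_argument("--sources", type=csv_list, default=["web"])
    parser.add_argument("--country", default="US")
    parser.add_argument("--include", type=csv_list, default=[])
    parser.add_argument("--exclude", type=csv_list, default=[])
    parser.add_argument("--categories", type=csv_list, default=[])
    parser.add_argument("--scrape", action="store_true")
    parser.add_argument("--output", choices=["text", "json", "markdown"],
                        default="text")
    parser.add_argument("--api-key", default=None)
    return parser
```

For example, build_parser().parse_args(["react hooks", "--include", "react.dev,medium.com"]) yields an include list of two domains and leaves every other flag at its default.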
Scrape (single page)
python scripts/firecrawl_scrape.py "https://example.com" [options]
| Flag | Default | Purpose |
|------|---------|---------|
| --formats | markdown | markdown, html, rawHtml, links, screenshot |
| --only-main | off | Strip nav/footer, keep main content only |
| --output | text | text, json |
python scripts/firecrawl_scrape.py "https://docs.python.org/3/tutorial/" --only-main
python scripts/firecrawl_scrape.py "https://example.com" --formats markdown,links
Research Strategy
For complex topics, run multiple targeted searches instead of one broad query:
- Broad scan: use --limit 5 to survey the landscape and identify key sources and terminology.
- Focused search: use --include to target authoritative domains found in the broad scan.
- Deep read: use firecrawl_scrape.py on the 2-3 most relevant URLs for full content.
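The three stages above can be sketched as functions that build the corresponding command lines. The function names and the max_pages parameter are hypothetical, and adding --only-main to the deep read is a choice made for this sketch. The domains and URLs must come from reading the earlier results; nothing here selects them automatically.

```python
from typing import List, Sequence

SEARCH = "scripts/firecrawl_search.py"
SCRAPE = "scripts/firecrawl_scrape.py"


def broad_scan(topic: str) -> List[str]:
    """Stage 1: a small survey query."""
    return ["python", SEARCH, topic, "--limit", "5"]


def focused_search(topic: str, domains: Sequence[str]) -> List[str]:
    """Stage 2: restrict to domains picked from the broad scan."""
    return ["python", SEARCH, topic, "--include", ",".join(domains)]


def deep_read(urls: Sequence[str], max_pages: int = 3) -> List[List[str]]:
    """Stage 3: one scrape command per URL, capped at max_pages."""
    return [["python", SCRAPE, url, "--only-main"]
            for url in list(urls)[:max_pages]]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root. Building argument lists rather than shell strings avoids quoting problems with multi-word queries.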
Gotchas
- Uses curl internally (best keyless compatibility); falls back to urllib if curl is unavailable.
- UTF-8 output handled automatically on Windows (CJK-safe).
- HTTP 403 = IP blocked in keyless mode: provide --api-key or set FIRECRAWL_API_KEY.
- HTTP 429 = rate limited: reduce --limit or wait.
- ~2 credits per 10 results on free tier.
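A wrapper script could turn the two status codes above into actionable hints and a wait schedule. This is a minimal sketch: advise and backoff_delays are hypothetical helpers, and the exponential schedule is an assumption, since the notes above only say to wait.

```python
from typing import List, Optional


def advise(status: int) -> Optional[str]:
    """Map an HTTP status code to the remedy listed in the gotchas."""
    if status == 403:
        return ("IP blocked in keyless mode: pass --api-key "
                "or set FIRECRAWL_API_KEY")
    if status == 429:
        return "Rate limited: reduce --limit or wait before retrying"
    return None  # no specific advice for other codes


def backoff_delays(retries: int = 3, base: float = 2.0) -> List[float]:
    """Seconds to wait before each retry, doubling each time (assumed policy)."""
    return [base * (2 ** attempt) for attempt in range(retries)]
```

With the defaults, backoff_delays() gives waits of 2, 4, and 8 seconds. Retrying only makes sense for 429, because a 403 in keyless mode will not clear without a key.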