返回 Skill 列表
extension
分类: 开发与工程无需 API Key

firecrawl

抓取和爬取网页,将其转换为干净的Markdown或结构化的JSON,以便LLM(大语言模型)使用。当需要从URL中提取内容、爬取整个网站、映射站点结构、通过抓取搜索网络或从页面中提取结构化数据时,请使用此功能。最适合用于网络抓取、站点爬行、URL发现以及将网络内容转换成适合LLM使用的格式。

person作者: jakexiaohubgithub

Firecrawl Web Scraping

Converts web pages into clean, LLM-ready markdown or structured data. Handles JavaScript rendering, anti-bot measures, and complex sites.

When to Use

Use Firecrawl when you need to:

  • Scrape a specific URL and get its content as markdown/HTML
  • Crawl an entire website or section recursively
  • Map a website to discover all its URLs
  • Search the web AND scrape the results in one operation
  • Extract structured JSON data from web pages
  • Handle JavaScript-rendered or dynamic content
  • Get screenshots of web pages

Protocol

Step 1: Scrape a Single URL

scripts/firecrawl.sh scrape "<url>" [format]

Formats: markdown (default), html, links, screenshot

Example:

scripts/firecrawl.sh scrape "https://docs.firecrawl.dev/introduction"
scripts/firecrawl.sh scrape "https://example.com" "html"

Step 2: Search Web + Scrape Results

scripts/firecrawl.sh search "<query>" [limit]

Example:

scripts/firecrawl.sh search "firecrawl web scraping API" 5

Step 3: Map Website URLs

scripts/firecrawl.sh map "<url>" [limit] [search]

Example:

scripts/firecrawl.sh map "https://firecrawl.dev" 50
scripts/firecrawl.sh map "https://docs.firecrawl.dev" 100 "api reference"

Step 4: Extract Structured JSON (Single Page)

scripts/firecrawl.sh extract "<url>" "<prompt>"

Uses Firecrawl's LLM extraction to return structured JSON from a single page.

Example:

scripts/firecrawl.sh extract "https://firecrawl.dev" "Extract company name, mission, and pricing tiers"

Step 5: Crawl Entire Site

scripts/firecrawl.sh crawl "<url>" [limit] [depth]

Example:

scripts/firecrawl.sh crawl "https://docs.firecrawl.dev" 20 2

Critical Rules

  1. Scrape for single pages - Use scrape when you have specific URLs
  2. Map before crawl - Use map to discover URLs, then scrape specific ones
  3. Search for discovery - Use search to find relevant pages when you don't know URLs
  4. Extract for structure - Use extract when you need JSON, not markdown
  5. Respect rate limits - Script auto-retries on 429 with key rotation
  6. Current year is 2026 - Use this when recency matters; omit for timeless topics or use older years when historically relevant

Resources

See reference/troubleshooting.md for error handling, configuration, and common issues.