Back to skills
extension
Category: Data & AnalyticsAPI key required

Headless Brave Browser

Headless web search and content extraction via the Brave Search API. Features exponential-backoff retry, circuit breaker fault isolation, bounded-concurrency...

personAuthor: kelexinehubclawhub

brave-search

Headless web search and content extraction via the Brave Search API.

Setup

Run once before first use:

cd <skill-root>
npm ci

Required environment variable:

export BRAVE_API_KEY="your-key-here"

Get a free API key at brave.com/search/api.

Usage

Search

node scripts/search.js "query"                        # Basic (5 results)
node scripts/search.js "query" -n 10                  # Up to 20 results
node scripts/search.js "query" --content              # Include page content
node scripts/search.js "query" -n 3 --content         # Combined
node scripts/search.js "query" --json                 # Newline-delimited JSON
node scripts/search.js --help                         # Full options + env vars

Extract page content

node scripts/content.js https://example.com/article
node scripts/content.js https://example.com/article --json
node scripts/content.js https://example.com/article --max-length 8000

Output format (plain text)

--- Result 1 ---
Title:   Page Title
URL:     https://example.com/page
Snippet: Description from Brave Search
Content:
  # Page Title

  Extracted markdown content...

--- Result 2 ---
...

Pass --json to get one JSON object per line instead, suitable for piping.

Exit codes

| Code | Meaning | |------|--------------------------------------------------| | 0 | Success | | 1 | Invalid input or configuration error | | 2 | Page had no extractable content (content.js) | | 130| Interrupted (SIGINT) |

Configuration (environment variables)

All behaviour is configurable without touching code:

| Variable | Default | Description | |------------------------|----------|------------------------------------------------| | BRAVE_API_KEY | — | Required. Brave Search subscription token | | LOG_LEVEL | info | debug · info · warn · error · silent | | LOG_JSON | false | Emit logs as newline-delimited JSON to stderr | | FETCH_TIMEOUT_MS | 15000 | Per-page fetch timeout | | SEARCH_TIMEOUT_MS | 10000 | Brave API call timeout | | MAX_CONTENT_LENGTH | 5000 | Max chars of extracted content | | MAX_RETRY_ATTEMPTS | 3 | Retry attempts on transient errors | | RETRY_BASE_DELAY_MS | 500 | Base delay for exponential backoff | | RETRY_MAX_DELAY_MS | 30000 | Backoff delay cap | | CONCURRENCY_LIMIT | 3 | Parallel page fetches when --content is set | | CB_FAILURE_THRESHOLD | 5 | Consecutive failures before circuit opens | | CB_RESET_TIMEOUT_MS | 60000 | Circuit breaker reset window |

All variables are validated at startup — misconfigured runs fail immediately with a descriptive list of every bad value rather than crashing mid-execution.

Architecture

See references/ARCHITECTURE.md for a full module breakdown.

scripts/
├── search.js            ← Search CLI entry point
├── content.js           ← Content extraction CLI entry point
├── content-fetcher.js   ← HTTP fetch + Readability + DOM fallback
├── config.js            ← Schema-validated env config
├── circuit-breaker.js   ← Fault isolation (CLOSED → OPEN → HALF_OPEN)
├── retry.js             ← Exponential backoff with full jitter
├── concurrency.js       ← Bounded parallel execution pool
├── utils.js             ← htmlToMarkdown, smartTruncate, parseURL
├── logger.js            ← Structured leveled logger → stderr
└── errors.js            ← Typed error hierarchy